My name is Mastavina. I am a student at the University of Montreal. I have just completed my bachelor's degree in criminology with a specialization in analysis, and I am now starting my master's degree, also in criminology at the same university, but this time focusing on cybercrime. Today I will be sharing with you the research project that I completed through a mentorship with Kampanab's Québec, in their cybersecurity department, which is called Dijon. The research project is called Social Bots: Malicious Use of Social Media. So today we will be talking about social bots, about social media platforms, about trolls, about political interference, about disinformation, and other very interesting things. Bear with me: English is my second language, so I hope I won't make too many pronunciation mistakes, and I hope you will enjoy this talk. For the agenda of the day, I will go through every general step of the research to give you a real insight into what we did, and at each step I will also try to give you some tools and tricks, so that if you would like to replicate a study similar to this one, you could do so after this talk. I will also focus on the results, because that is the part I think is the most interesting; and if you are here simply because the subject interests you, this is probably the moment you are waiting for. So without further ado, let's start with the definition of political interference, because I don't want to take for granted that everyone knows what it is. It is quite a complex concept to understand and to explain in 30 seconds.
The simplest definition I can give you is that political interference happens when a country or a group of people tries to sow discord between citizens, to manipulate public discourse, or to disrupt the functioning of an electoral system through different kinds of tactics, for their own gain. Political interference has been happening for thousands of years, so this is not new; the only thing that is changing is how it is done. And we are starting to see it happen more with the rise of social media, especially with the help of disinformation. I know there will be other talks today specifically about disinformation, so I won't go too deep into that, but put simply, disinformation is false information that is spread online to fool its readers. We also see that social bots are used to spread that disinformation online and to manipulate public discourse during a political campaign. So, social bots: what are they? They are basically robots (actually, they are scripts) that are put on a social media platform to act like a human online. They can do exactly what any user can do on their social media profile: create an account, create tweets, post, share, like, and also exchange with other humans or other bots. What are they used for? They have multiple purposes, depending on their goals. There are legitimate uses, such as facilitating a company's marketing by putting content online for them, or answering common questions; we have all talked with a social bot online at least once, and they did the job, so that is really a legitimate use of them. But there are also other uses of social bots that are illegal, or we could say immoral.
What I mean by that is when they are used to create harassment online, to spread hate speech, or to spread disinformation. So keep in mind that they will act differently depending on their goals. That brings me to the problematic, the why: why are we doing this, why should we care, and why should you be listening today? The first reason is the rise in cases of political interference globally. We all saw it in the 2016 US presidential election. We saw it happen again in 2017 with Emmanuel Macron in France, again in the Netherlands in 2018, and also in Africa, especially in Ghana in 2018. Also, the CSE, our Communications Security Establishment, said in a 2019 paper that there was a high likelihood of foreign interference activities online, on social media, during the election. We also see a sophistication and a humanization of social bots online, which means it is getting much harder for researchers, practitioners, and even the social media platforms themselves to spot those social bots. Obviously, because of the 2016 US election and other events, the platforms came under more social and political pressure to do something about it: people wanted the social media platforms to control and prevent those kinds of events from happening again. They did put out some measures, and we are now at the stage where it is time to validate the effectiveness of those measures. To do so, we wanted to draw a portrait of the activities of social bots during the 2019 Canadian federal election campaign. This is a descriptive research project with three sub-objectives. The first one was to establish a list of accounts identified as possibly being active social bots between September 26 and November 11, 2019.
The second one was to determine how the identified social bots inserted themselves into the political discussion during the collection period. The third one was to study the point of view of experts on the role of social bots in matters of political interference, in order to deepen the first results we obtained. For this research, we used a mixed-methods methodology, combining quantitative and qualitative data. For the quantitative part, the data collection was done well before I started working with St. Canas Quebec. It covered a period of 47 days, starting at the end of September and finishing in early November. The data source was Twitter directly, with the help of the Twitter API, and the initial sample went up to four million tweets, representing almost 400,000 profiles. To collect the data, we used hashtags, because that is how people organize discussion on Twitter. Some of the 48 hashtags were really generic, basically just referring to the election in general; others were clearly in favor of certain parties, and others clearly against other parties. We really concentrated our efforts on the two main parties, the most popular ones in that election. Also, when collecting the data, there are some important limits you need to work with when using the Twitter API; I am talking about the limits on the requests you can make. For example, at the time, there was a limit on the number of requests every 15 minutes, so you have to work with it. So what they did is create three machines, each one basically running a Python script. Each script ran every hour, at a different offset but still hourly, and each one was responsible for different hashtags.
When I say we were collecting hashtags, I mean we were collecting tweets, replies, and retweets that contained those hashtags. In the end, all this data went into an MS SQL Server database, and we also had an algorithm on the server to make sure there were no duplicates, because yes, people tend to use more than one hashtag in a single tweet. So, how do you replicate this? I will give you a couple of tricks if you would like to replicate something like that. First, you need to know how to program. The big advantage of the Twitter API is that you can choose the language of your choice; we used Python, but you could use another one. Next, you need a virtual machine or a computer that is fully dedicated to the data collection, because you cannot work on it at the same time. For us, for example, it took almost seven weeks, so keep in mind that you could not work on that computer for seven weeks. Also, you need enough database storage for all that data. Just to give you an idea, and this is Canada, which is not the country where people use Twitter the most, in comparison to the States, for example: we had a total of 2,500 tweets per day on average, and it went up to 200,000 tweets per day on election day. That is really something you have to keep in mind: all that data needs to go somewhere. Also, a quick piece of advice from me: make sure you really understand the limits of the API, because if you don't, this can really impact the generalizations you can make from your research and your results, since you won't actually have all the tweets that were put out there in your dataset. You could also use the premium Twitter API, which could save you some time if you have the money, because yes, it does cost money.
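As a rough illustration of the deduplication step just mentioned (the real pipeline did this in MS SQL Server; the tweet layout and hashtag names below are invented for the example), a Python sketch could look like this:

```python
def deduplicate(tweets):
    """Keep only the first copy of each tweet.

    The same tweet is collected once per tracked hashtag it contains,
    so we key on the tweet's unique ID.
    """
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            unique.append(tweet)
    return unique

# Example: one tweet carrying two tracked hashtags arrives twice.
collected = [
    {"id": 1, "text": "#hashtag_a #hashtag_b good morning"},
    {"id": 1, "text": "#hashtag_a #hashtag_b good morning"},  # duplicate
    {"id": 2, "text": "#hashtag_b another tweet"},
]
print(len(deduplicate(collected)))  # 2
```

Keying on the tweet ID rather than the text is the safer choice, since two different tweets can share identical text (for example, two retweets of the same post).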
To come back to my research: because it was an internship of less than four months, I had to concentrate all my efforts on a smaller sample. I only used 5% of all the users collected, but I chose the users that were posting the most. What I mean by that is that those less than 5% of users represented 73% of all the tweets, so roughly three million tweets posted in less than seven weeks. That is a lot of tweets per account and per day. For the qualitative method, I did three interviews with different experts in the field. I won't go too deep into who they are and their specialties, because I want to keep their anonymity, but there will be more information directly in the paper that is coming out soon, if you are interested in knowing more. Now for the results, my favorite part. The goal of the first objective was to find social bots online; we wanted to create a list of social bots. So how can you find social bots online? The first thing to consider is that, yes, in the scientific literature you can find some indicators that can help guide you toward bots online. The thing is, if you do it that way, it takes quite some time: you can take each account and check all the boxes to see whether it does or does not show each specific indicator, but that will take you a lot of time. I know other people have done graph-related work: they started with one social bot and checked who it was exchanging with, who it was following, who was following it, and so on, to draw a sort of network from it. That is a really good way to go, but it is a really long way to go if you have thousands of users to look at. So in my case, I used the Botometer tool, from the Observatory on Social Media at Indiana University.
The first result we got was that more than 3,000 users were not found by this tool, mainly because more than half of them had been deleted (the users really deleted their own profiles) and the other half had been suspended, meaning Twitter itself suspended those accounts because they went against its rules. So we got the results with the remaining accounts. How Botometer works: it has four indicators, each indicator gets a score from 0 to 5, and a total is then made from those for the final score. As you can see, the vast majority, actually 80% of my sample, got a score between 0 and 0.9, so they were very likely human. Then we have a huge drop, until we reach 5 out of 5, the highest score you can get, which means there is a high likelihood that those accounts are robots, that they are social bots. Here we can see the list of those social bots. As you can see (I asked for some color in there to help you recognize them), some of those accounts were openly saying that they were social bots, either in their username or in their description. So clearly they are not here to fool anyone; they are saying out loud that they are social bots. Something else that was kind of surprising is that some of those accounts were referring to drugs, with chemical products popping up online and things like that, and others were only talking about arms and guns. That was quite surprising. I put some pictures in here so you can have an idea of what they look like. You could go and check them out; last time I checked they were still alive, still online, but it is possible they have been suspended since. I have no control over that.
I did grow the list, because I included all the accounts that had a score of 4.8 or 4.9, simply because other researchers have done so, and because, as you can see, even an account with a 4.8 score still says itself that it is a robot. All right. For the qualitative results: after doing my interviews, coding them, and analyzing them, I ended up with some really interesting themes. The first one is about social bots, obviously. The first thing the experts said is that we need to remember the notion of intentionality. What we mean by that is that sometimes, when we focus too much on researching robots, we forget that there are humans behind them. Yes, they are scripts, they are robots, but somebody wrote that script, somebody created them, and we must not forget that. Also, the mixed impact: what we mean by that is that just because you find a social bot online does not automatically mean that everything it says is disinformation, or that it is trying to distort the political discourse online through political interference. We really need to be more careful about that. The final one is about legal and illegal uses. Yes, I did not know this and had not seen it in the scientific literature, but some bots will try to use a popular event like a political campaign to spread out and do some kind of marketing or propaganda for their illicit products online, so they can get more views, basically. This is what I found in my sample: the reason I found those accounts is that they posted a lot of tweets including hashtags about politics, but they ended up selling drugs, selling arms and guns, and so on. That was quite surprising. The other theme was about trolls. What the experts were saying about this one is that it is getting really easy to be wrong about them.
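To make the thresholding step concrete, here is a minimal Python sketch of flagging accounts whose overall 0-to-5 score reaches the 4.8 cutoff described above. The handles and scores are invented for illustration; in the real study the scores came from Botometer:

```python
BOT_THRESHOLD = 4.8  # scores of 4.8 to 5.0 are treated as likely bots

def flag_likely_bots(scores):
    """Return the handles whose Botometer-style score meets the cutoff.

    `scores` maps an account handle to its overall 0-5 score.
    """
    return sorted(handle for handle, s in scores.items() if s >= BOT_THRESHOLD)

# Hypothetical scores, mirroring the distribution described in the talk:
# most accounts look human (< 1.0), a few spike at the top of the scale.
sample = {"@human_a": 0.3, "@human_b": 0.7, "@bot_x": 5.0, "@bot_y": 4.8, "@maybe": 3.9}
print(flag_likely_bots(sample))  # ['@bot_x', '@bot_y']
```

Note that accounts in the middle of the scale (like the 3.9 here) are exactly where attribution gets risky, which is the point the experts make about trolls below: a high-volume human can look bot-like without crossing the cutoff.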
What I mean by that is that it is getting much harder to differentiate trolls from bots. Just like one expert said: they were trying to find bots online, and they found one account that was posting more than 250 times a day. They went out and met that person, and he was a guy who really did not like Justin Trudeau and really did not want him to win the election, so he was just constantly posting things against him. That is something we really need to keep in mind: even though some accounts did not get a score of five out of five, we need to be careful about the attribution we make about an account and about who is behind it. The other theme is reactive measures. Something really interesting is the fact that, yes, as I said at the beginning, the social media platforms are putting out some measures: they are suspending accounts that are very likely social bots, or that are putting disinformation online, and so on. But the sad thing is that they often do it too late, after those accounts have already posted all that content. Just as we saw in my sample, a lot of those accounts were suspended, but they were suspended after the election. They had the time to post everything they wanted during the election campaign, and only then were they suspended; at that point it does not accomplish much, because the damage is already done. The other thing we need to keep in mind is the perverse effect of suspending accounts, because, as the experts said, we see more and more users moving to alternative social media platforms with a real free-speech marketing approach, like Gab, Parler, and others. I could do a whole talk about that, but it is something to keep in mind: is suspension really the best solution? There is also the legislative level, because yes, there are not enough laws about this.
We are only counting on, and depending on, social media platforms to do something about this. The problem (disinformation, political interference, and so on) is on social media, and we depend on social media to do something about it. So, the next objective. This one was really to give you an idea of the content they put out online during that campaign. We got a lot of results in this part of the research because we did a lot of analysis. The first result, and the most surprising to me, was the fact that 72% of the accounts that got 5 out of 5, the highest score, were actually posting only in Japanese. Also, 87% of the content was actually retweets, which means the social bots were clearly there to repost content, not to create it. We also found a lot of links in the content, and we found that three accounts, Jersey Party, Nostalgia, and Kindness Boost, which to this day are suspended by Twitter, were the main generators of content in my sample. We also found, through the hashtag analysis, that there were clearly more hashtags in favor of Justin Trudeau than of Andrew Scheer. You can see the results here, but time is running out, so I won't go too deep into that; if you have a question about this one, it will be my pleasure to answer. For the qualitative results, the most important thing that came out was the tactics, because there are plenty of tactics online around disinformation and political interference. I found a total of 16 tactics, but there were clearly two tactics used by the social bots that I found. The first one is to share and comment a lot on the same profiles, with the goal of spreading them out and helping that specific account get more views.
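As a small sketch of how a retweet share and hashtag counts like the ones above can be computed (the tweets and hashtag names below are invented; the real analysis ran over roughly three million tweets), assuming each tweet is a dict with a text field:

```python
from collections import Counter

def content_stats(tweets):
    """Compute the share of retweets and the most common hashtags.

    Each tweet is a dict with a "text" field; in the classic Twitter
    text format, retweets start with "RT @".
    """
    retweets = sum(1 for t in tweets if t["text"].startswith("RT @"))
    hashtags = Counter(
        word.lower()
        for t in tweets
        for word in t["text"].split()
        if word.startswith("#")
    )
    return retweets / len(tweets), hashtags.most_common(3)

# Toy sample with made-up hashtags.
sample = [
    {"text": "RT @someone: #hashtag_a vote!"},
    {"text": "RT @other: #hashtag_b #hashtag_a debate tonight"},
    {"text": "my own thoughts on #hashtag_b"},
    {"text": "RT @x: #hashtag_a again"},
]
share, top = content_stats(sample)
print(round(share, 2), top)  # 0.75 [('#hashtag_a', 3), ('#hashtag_b', 2)]
```

In practice you would rely on the API's retweet metadata rather than the "RT @" prefix, which misses quote tweets and retweets with truncated text; the string check is only a shortcut for the sketch.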
As we just saw, three of those accounts are suspended to this day, but they were clearly the generators of content. The other tactic is to discredit the election of a candidate. As we saw in the 2016 US presidential election, they were clearly trying to help the election of one candidate to the detriment of the other candidates. And if we look at the hashtag analysis, we can clearly see that the social bots I found were trying to help the election of one party to the detriment of the other. Just like every other research project out there, there are some important limits to my study. The first one is that I only used a sample of 5% of the users; even though they were the users that posted the most, this still has real implications for the attribution and interpretation we can make of my results. Just because I found some results does not mean we can generalize them to the entire campaign, or take for granted that, because we found more hashtags in favor of one party, this somehow influenced the end result of the campaign. For further studies, I would love to invite other researchers and practitioners to work together and compare results, because in the scientific literature it is clear that we see a sophistication and humanization of social bots, but we do not know exactly how; working together could really help with that. Also, at some point, we will have to focus on locating those social bots, not only detecting them. Right now we do not have their localization; if we really want to make hypotheses about political interference, and if we want to know where those social bots are coming from, we will have to put more interest into that. So thank you so much for listening. I really enjoyed doing this talk.
I hope you won't hesitate to ask me your questions. I will be around for the rest of the day, and I will be on the discussion panel at 2 p.m. You can find me on LinkedIn; here is my email for other questions, and there will obviously be a publication about this coming really soon, so if you follow me, you won't miss any of it. Thank you so much for listening, and have a great conference.