Hello everybody, my name is Master Chen and this is "Can I Make My Own Social Threat Score?", brought to you by the Recon Village at DEF CON 29.

Alright, so before we get started, I'd like to say who I am and what brought me here. I have been a hacker and a martial artist for most of my life. I have a degree in psychology from the university here in Las Vegas. And because of all this, I have become a teacher and mentor to anybody who decides to listen to me. With all that, I branched out into writing, specifically for 2600, associated with Telephreak. Hello, Telephreakers out there. And through all this research, I've gone on to become a con speaker in recent years here during hacker summer camp: DEF CON, BSides, Recon Village, DEF CON Skytalks, all-around good fun. My research has been mainly focused on stalking, anti-stalking, data scraping, and a little bit of phone phreaking in there as well.

Alright, so before we get started with the actual topic today, I'd like to put forth a couple of standard disclaimers, and these have carried over with me for quite some time now. The first one is IANAL, which is "I am not a lawyer." Not that there are any legal terms today, really, but I still have to put it in there, I think. IANAS: I am not a stalker. And the last one is IANADS: I am not a data scientist, although I am trying to get better at becoming one. And my last disclaimer here is that I use Twitter and social media as a way to scrape data and test theories for psychology. So if you are following me, you might be a guinea pig, so be warned. But also, thank you very much. I really do appreciate the follow. Tell your friends about me.

Alright, so before we talk about social threat scores, let's take a step back and talk about credit scores. That's something that we know a little bit more about, right? So we have a financial credit score.
You might know this as the FICO score, which is a combination of metrics that determine your trustworthiness in finance or your risk in finance. This could be the age of your credit cards or credit accounts, the debt-to-income ratio, how many accounts you have open and for what amounts, and so on and so forth. And these carry different percentages based on their impact and weighted score.

More recently, we have heard in the news about the social credit system, predominantly in China, but it's starting to show up in other Western countries and in the United States. This is something like rating your Uber driver, or your Uber driver rating you as a passenger, or, for instance, food delivery services. Those are rated with a social credit system. Yelp might be a good example. You've probably seen this in the Black Mirror episode, I think in season three. I've got to go back and fact-check myself on that, but I think it was season three or four, something along those lines. The point is that we are slowly becoming more and more familiar with this idea of a social credit system, not just financial credit.

But what do these scores have in common? Well, they're both scores based on risk assessments that try to determine what your behavior as an individual is going to be. They also shape behavior. Since you know that a score is being placed on you, it shapes the way that you address financial responsibility or social responsibility. And both of these scores give an at-a-glance look at a particular individual, a particular profile. With a simple glance, you can see what this person or entity is all about, right?

So conversely, what is a threat score? Well, a threat score, the way I see it, uncovers imminent threats or unknown threats that we might not foresee at the moment unless they are analyzed and then determined to be a threat.
And I see it as a tool for an individual to assess whether the profiles and entities around them are a threat. This would be a form of self-defense: advance knowledge of who or what may pose a threat in the near future. I see the threat score as a risk score in the opposite direction, where instead of a score being placed on me as the individual, I can look externally, or we can all look externally, to see threats coming up on the horizon. What I say is the threat score pays attention to what matters. In a sea of data, these metrics will start to rise to the top after analysis, and you can pay attention to those metrics instead of looking at the data as a whole.

Now, when it comes to presentations like this, I like to try to answer the what, who, why, where, and how of what we're doing.

So, what? What we're trying to do today is determine through these analyses which profiles out there would be considered a threat. We're going to use threat metrics to see: okay, is this person negative? Is this person positive? Are they going to send a mob after us? What would they be doing?

Now, who do we suspect will be coming up in these metrics? Well, I suspect stalkers, cyberbullies, and just people who are generally espousing negativity and negative vibes.

Why? Well, like I said, I'm a martial artist and, not a psychologist, but an armchair psychologist with my degree. That's where the interest comes from. I see this as a way for us to have self-defense and advance knowledge, and it's just a psychological interest.

Now, how are we going to be doing it? We're going to be using a little bit of descriptive statistics, finding a baseline summary of what we should be looking for. And then we're going to jump off of that to meaningful metrics. Again, what scores are the most impactful to us in these types of analyses and data scrapes?
Now, where will we be scraping this? Well, this is going to be scraped from Twitter. And the reason for that is because it is such a large database of publicly searchable, shared sentiment. It's just a treasure trove of everything you can think of psychologically: neuroses, narcissism. It's all there. It's a great way to really just do an evaluation of the human species. It's actually kind of enjoyable, and I kind of hate myself for it.

Now, initial thoughts and hypotheses coming into thinking about this threat score. Like I said in the last slide, Twitter is a publicly available database of sentiment. And I believe that we can look at this database and see two things: influence, and motivation for expressing negative sentiment and promoting adverse behavior, adverse activity. I feel like we can uncover that quickly. And at a glance, right then and there, we know that this particular profile is going to be negative. I personally don't want to go through a timeline to determine that somebody is a threat. Now, that's not to say that I want to judge a book by its cover, because I don't believe in that. However, if I can get the computer to read it for me and then just tell me what I missed in a CliffsNotes form with reasonable accuracy, then I think that's all just the same and just as beneficial.

All right, so in thinking about the metrics, just being a user of Twitter, these are the metrics that I came up with that seem to be what I would consider very impactful in devising a threat score. Starting on the left side and going in clockwise order, the bottom one is follow ratio. I was thinking this was a score that would go from 0.0 all the way up to infinity, with 1.0 being kind of a medium base. What I mean by that is this would be your followers versus who you are following. So taking my own profile as an example, if I'm following 1,000 people and about 1,000 people are following me back, that's a 1 to 1 ratio, a follow ratio.
1,000 following divided by 1,000 followers. That would give me a ratio of 1.0. So what that tells me is that the person who has a 1.0 ratio is following their followers and vice versa. It's more of a mutual followship or friendship than it is somebody having a high influence on the public. It seems to be more of a close-knit circle at a 1.0 ratio. And that's why I call that the medium ground. Now, using simple math, simple numbers, if somebody is following 200 people but they only have one follower, 200 divided by 1, that ratio would be 200, as a whole number. I wouldn't consider that profile to be highly influential if they only have one follower. Again, we're using simple, simple numbers. Conversely, if I was only following one person but I had 200 followers, that metric would be on the lower side: 1 divided by 200, or 0.005. So that would be a higher influence, and that would be higher on the scale of something that I would pay attention to.

The next metric here is verified percentage. I would take the people in my follower list who are verified, divided by how many total followers I have, and that would give me a verified percentage: how many of the people following me are verified. I thought that would be an interesting metric to follow.

Word frequency, and there's a screen cap of a word cloud.

We have tweet frequency. So how often is a particular subject tweeting? Are they tweeting once a day, twice a day, once an hour, once a week? I feel like that frequency would also be a determining factor in, one, how often and active they are on Twitter, but also how likely they are to go in and share their thoughts with the community on a regular and consistent basis.

Sentiment analysis is one that I spent a lot of time on in this project. And so I thought it was a very, very big metric.
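As an illustration of the follow ratio and verified percentage just described, here is a minimal Python sketch. It is not the code from the talk's notebook. The function names and the handling of zero followers are assumptions, and the ratio is taken as following divided by followers because that is what the worked examples (200 and 0.005) compute.

```python
import math
from typing import Sequence


def follow_ratio(following: int, followers: int) -> float:
    """Following divided by followers, as in the talk's worked examples.

    Lower values suggest higher influence; 1.0 is the mutual "medium ground".
    """
    if followers == 0:
        # Assumption: with no followers the ratio is unbounded.
        return math.inf
    return following / followers


def verified_percentage(follower_is_verified: Sequence[bool]) -> float:
    """Percentage of an account's followers that are verified."""
    if not follower_is_verified:
        return 0.0
    return 100.0 * sum(follower_is_verified) / len(follower_is_verified)


print(follow_ratio(1000, 1000))  # mutual follows
print(follow_ratio(200, 1))      # follows many, little influence
print(follow_ratio(1, 200))      # follows few, higher influence
print(verified_percentage([True, False, False, False]))
```

An account that follows nobody comes out at 0.0 under this definition, the most "influential" end of the scale.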
Initially, sentiment analysis was just something that I thought would be an interesting metric, and it turns out that I was right. Sentiment analysis is a very interesting metric. I saw which profiles were more positive-based, negative-based, or neutral.

And last is hashtag usage. Hashtag usage is something I spoke on at the Recon Village last year. It's different from word frequency because when you use a hashtag, you're trying to get a topic or a particular word trending. So what is that usage like for profiles of interest or influencers?

So again, these are the meaningful metrics that I came up with just thinking at the start of this project.

So with any data science process, what is the process? Well, first we have to scrape all that data: timelines, our follower metrics, or anything that we're trying to look at. We're going to scrape all that data, and we're going to sanitize it and normalize it to make it easily workable with our programming language, which by the way was Python for this project. Sanitizing and normalizing means we take out punctuation and anything that is a little bit garbage in the sense of data and data analytics. We're going to analyze the data that we've scraped and normalized. We're going to gather insights, and if we need to, we're going to rinse and repeat. So if I've come up with an insight based on the data that I'm looking at and I want to analyze it further, I would go through the process again.

These screen caps on the side here are a little bit of the descriptive statistics that I saw. That was just a cool screenshot that I wanted to share. You'll see the average following count, the average follower count, the average tweets of the profiles in my followers list, and so on and so forth. So yes, these are all based on my own personal followers. Please don't unfollow me if you are following me.
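The sanitize-and-normalize step and the word-frequency count described above could be sketched roughly as follows. This is an assumed illustration, not the notebook's code: the names `normalize` and `top_words` are hypothetical, and the cleaning rules (lowercasing, dropping URLs and punctuation) are one reasonable reading of "take out punctuation and anything that is a little bit garbage."

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")


def normalize(tweet: str) -> list:
    """Lowercase a tweet, drop URLs and punctuation, and split into words."""
    text = URL_RE.sub(" ", tweet.lower())
    # Remove apostrophes so contractions stay one token ("you're" -> "youre").
    text = text.replace("'", "").replace("\u2019", "")
    # Everything else that is not a letter, digit, or whitespace becomes a space.
    # Note: this also strips '#' and '@', so hashtags are counted as plain words.
    text = NON_WORD_RE.sub(" ", text)
    return text.split()


def top_words(tweets, n=10):
    """Return the n most common words across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(normalize(tweet))
    return counts.most_common(n)


timeline = ["LOL, you're RIGHT! https://t.co/abc", "lol you again", "You you you"]
print(top_words(timeline, 2))
```

No stop words are removed here, since the talk treats a common word like "you" as a signal rather than noise.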
And to my followers: I really appreciate you as a data point.

Also, when it comes to sanitizing the data, one thing that I chose to leave out when determining influence, and also threats, was corporate accounts and accounts that are bots or serve some business purpose, because I just did not see them being any sort of potential threat, going by anything that I've seen out there. I don't foresee, for instance, a KFC or some cybersecurity firm advocating for a Twitter mob to go after you. Now, I could be wrong, but I haven't seen anything like that coming from those types of accounts. So I decided to leave those out of the influencer list.

Now, after I came up with the influencer list and looked for any sort of sentiment in their profiles, I could then, in step two, look at accounts external to my own followers, accounts that are just out there in the wild, public accounts that I could analyze to see if I could determine any negative sentiment in those profiles. Those would notably be reporters or journalists, politicians, celebrities, just people of high interest or high influence.

And speaking of high influence and celebrity status, this is what I call the Teigen baseline. Chrissy Teigen, for anybody who, like myself, doesn't really follow pop culture, is a model, and she's also the wife of singer John Legend. She's been recently in the news for cyberbullying and using Twitter really as a sounding board for such negative behavior. And so that is the reason why I chose to use her as a baseline. There's no vendetta against her or anything like that. But it has been noted in public that she has been deemed a cyberbully. So I figured it would be a good thing to look at her profile first and see if there's something in her profile that I could find in profiles elsewhere out there on Twitter.
So, right off the bat, and these are screen captures now from her account, you see that her bio or user description is "de-motivational speaker." I don't know if that's her being ironic, but there it is. That's how she describes herself. So, okay, take that at face value. You'll notice her negative percentage is about 17%. 17% of her entire timeline was deemed negative by the sentiment analysis. And then here on the right screen cap you'll see her top words. Notably, you'll see "LOL" is her top word. And I also circled "you", as in the pronoun, as a point of interest, because that shows some tendencies that I'll explain further on in the presentation. I found that to be very interesting to see in her profile. But again, this is a baseline of what I might be looking for in other negative profiles.

So speaking of the negative profiles, these were profiles that I put together from my scrapes that were generally negative, generally spouting off undesirable or mean sentiment, directed at particular individuals or particular groups or entities or even organizations. But it was all directed. These tweets were directed at somebody. Now, these are three separate accounts. At the top you'll see their word usage in word cloud form. So the bigger the word, of course, the more it's being used. And below that is the top-words breakdown of the words that they've decided to use. All three of these accounts were retweet heavy. They retweet a lot of what else is going on on Twitter. That was also something that I found to be interesting, because these accounts are ones that share a lot of what is going on there. So they amplify the messages of others.
So it's not just their own sentiment; they're expanding on other people's thoughts, or they're just resharing that sentiment and pushing that message further out. And if you look down at the bottom, these are the screen captures of their sentiment analysis percentages. With Chrissy Teigen being the baseline, I saw that these negative sentiments were kind of close to that percentage of Chrissy Teigen, you know, 12% to 15% and 12%. That was something that I found interesting. But something that I found even more interesting is that the positive percentages were also very low compared to either neutral or more positive profiles. The neutral and more positive profiles would be somewhere in the mid-to-high 30% range, up to 40% and even 50% positive messages. But of all the sentiments, the neutral sentiment seems to be the highest percentage of all. So not determined one way or the other, negative or positive; it's just neutral. So that's an interesting point on the negative side.

To contrast, we have a couple of examples of the positive sentiment. Now, these are only two profiles, but I hope it's enough to show a difference in the word usage that these two particular profiles have compared to the negative sentiment. Up at the top again, you see the word clouds. Again, for anybody who knows how to read these graphics, the bigger the word, the more that word has been used. And there's a top-words breakdown right below that. You'll notice one does have a high retweet frequency, but that's probably just how things are if you're sharing positive sentiment, so that was cool. But you'll also see a change in word choice. There are a lot more of what I would consider positive words here in their list: happy, love, compassion. That's nice to see there. And you can see the difference in their positive and negative sentiments.
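The positive, negative, and neutral percentages shown on these slides can be illustrated with a small Python sketch. It assumes each tweet has already been given a polarity score by some sentiment library (this part of the talk does not name one) and uses the simple scale the speaker describes near the end of the talk: below zero is negative, above zero is positive, exactly zero is neutral. The function name is hypothetical.

```python
def sentiment_breakdown(polarities):
    """Turn per-tweet polarity scores into percentage shares.

    Assumed scale: < 0 negative, > 0 positive, exactly 0 neutral.
    """
    total = len(polarities)
    if total == 0:
        return {"negative": 0.0, "neutral": 0.0, "positive": 0.0}
    negative = sum(1 for p in polarities if p < 0)
    positive = sum(1 for p in polarities if p > 0)
    neutral = total - negative - positive
    return {
        "negative": 100.0 * negative / total,
        "neutral": 100.0 * neutral / total,
        "positive": 100.0 * positive / total,
    }


# Hypothetical scores for a four-tweet timeline.
print(sentiment_breakdown([-0.5, 0.0, 0.0, 0.3]))
```

Comparing the "negative" share against a baseline profile's share is then a single dictionary lookup per account.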
Like I said in the previous slide, these positive sentiments were closer to the 40% range, and their negative percentage is very, very low. Lower than 10% of their timeline is considered negative sentiment. And then, of course, neutral is the highest number.

Now, the profile on the right, I put an asterisk next to it because I found this one to be very interesting. It is a Buddhist account. Not that religion has anything to do with it, I don't think, but it was a very positive profile with zero following. What I mean by that is this particular profile was not following anybody, but they had, if I remember correctly, about 20,000 followers. So they had a high following, and it didn't seem like this profile was a bot in any way. It had a very high follower count without following other profiles. So that was a very interesting point to note there. That puts it at the extreme end of the follow ratio, which is interesting, although I don't have that screen cap up there.

Okay, so you said something about a score, right? Yes. At the beginning of this project, I asked myself, could I make a threat score? And I feel like I can. Did I? I'll let you guys decide that. I mean, I don't have a quantifiable number, but one thing that I learned in doing this whole project is that scoring is hard. And the reason why scoring is hard is because these different metrics that we're using don't scale the same way, and they can't score the same way, because you're looking at the same data from a different angle or a different perspective. So there are weights involved. There are percentages involved. I've looked at other threat models, for instance threat scoring for CVEs, or threat scoring that some government agencies use to determine threats. And there are some complicated formulas out there to determine if an entity is a threat or not.
So it was a lot harder than I thought it was going to be. It's something that I feel like I would have to analyze further. However, one thing that I can say is that I can at least categorize these metrics on a scale from severe to low.

So again, going around the clock here, the follow ratio I would deem to be a high metric. So that'll fall under the high category. And I'll come back to the verified percentage and why I put an X on that.

The word frequency I would actually put in the low category, because the word frequency in these profiles didn't seem to be very indicative of negative or positive sentiment or negative or positive activity. For anybody who has a good grasp of the English language, or any language that is being used on Twitter, words are words; it really depends on how you use them. So I would put word frequency in the low category. And depending on what other metrics I may introduce into this threat model, I may get rid of word frequency altogether, now that I think about it.

Now, the tweet frequency. We touched upon this at the beginning of the presentation, but we really didn't dig deep in the way that we did with the sentiment analysis. I would put this in the medium category. That's because what you'll see, if you download the Jupyter notebook that I'll make available, is that this is an interesting range. And the range didn't determine whether it was a negative or positive person. A very positive person could tweet just as much as a very negative person. So while it is still a good metric, it's a metric in a different sense: how often that message will be going out, whether it be positive or negative. So for instance, if you're tweeting twice a day on average, or five times a day on average, that could be a good thing if you're a positive profile, and it could be a bad thing if you're a negative profile. And so again, I'd put the tweet frequency in the medium range.
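For the tweet-frequency metric, an average such as "twice a day" or "five times a day" can be computed from tweet timestamps. The sketch below is one assumed way to do that in Python; the function name and the choice to treat any span shorter than a day as one day are illustrative, not taken from the talk's notebook.

```python
from datetime import datetime


def tweets_per_day(timestamps):
    """Average tweets per day over the span covered by the timestamps."""
    if not timestamps:
        return 0.0
    span_seconds = (max(timestamps) - min(timestamps)).total_seconds()
    # Assumption: a timeline spanning less than one day counts as one day,
    # which avoids dividing by zero for a single tweet or a burst of tweets.
    span_days = max(span_seconds / 86400.0, 1.0)
    return len(timestamps) / span_days


# Hypothetical timeline: ten tweets spread over five days.
stamps = [datetime(2021, 8, 1)] * 5 + [datetime(2021, 8, 6)] * 5
print(tweets_per_day(stamps))
```

As the talk notes, this number says how often a message goes out, not whether the message is positive or negative, so it is most useful read alongside the sentiment percentages.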
Now, sentiment analysis. As you can tell, this presentation spent a lot of time explaining sentiment analysis, and I would actually put that in the severe category. Looking through the data, I saw that sentiment analysis was a pretty big factor, of course, and it determined with a reasonable amount of certainty whether a particular account would be threatening or not. So that would definitely be in the severe category.

The hashtag usage, I'm kind of torn on this metric. I would say it's between the high and medium range, because you're using a hashtag to get something trending, actually all the time. But is that a positive trend? Is that a negative trend? That is to be determined in other ways. You could take a seemingly positive hashtag and add it to negative sentiment, or vice versa, take a negative hashtag and add it to positive sentiment. So that, again, was interesting. I'd say that would be about medium to high.

So, some insights and improvements on this project that I'd like to make. Let's start with the insights first. Like I said in the last slide, word frequency just doesn't seem to be a very big indicator. That was kind of surprising to me, thinking about it in retrospect. But one really important insight that I think is very interesting is about negative sentiment in the sentiment analysis. Negative sentiment may also be self-deprecating. I've seen a couple of profiles that are very self-deprecating. And sadly, that negative percentage shows up very high in the data. It was second only to Donald Trump. So that's interesting and kind of sad, but it is what it is, I guess.
Oh, and so the "you" tendency. One thing that I found kind of interesting with the "you" tendency is that a lot of the accounts that I deemed to be negative used "you" as an external motivator. To quote some of the things that I've seen: "Go fuck yourself." "You don't know what the fuck you're talking about." "You don't know what you are talking about." Et cetera, et cetera. It was very negative, very pointed at a particular organization or individual. So that might be a future metric that I want to incorporate in a threat analysis.

Moving on to the improvements that I'd like to make to this project. I'd like to start tracking profanity. That is something that I kind of wish I did for this project, but I didn't think about it right off the bat. But yeah, profanity tracking, I think, would be very interesting.

Word phrase searching. I started to do this for this project, but I didn't do it in a clean way in the Jupyter notebook, and it's something that I'd like to incorporate. So for instance, you'd be able to search for phrases such as "go kill yourself" or bad things like that.

And of course, aside from the word clouds that we saw in this presentation, I'd like to graph these metrics a little bit better. I showed a lot of raw data, and I'd like to put that in a better graphic, because, again, the goal is to look at a profile and at a glance know that this particular profile is a threat. You can do that with raw data or raw screen captures, but it would be much better in graphic form.

And lastly, one thing I'd like to do with the sentiment analysis is fine-tune it a little bit.
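The "you" tendency and the phrase searching mentioned above are described as future work, so the following Python sketch is purely an assumed illustration of how such metrics might be computed. The function names are hypothetical, and the phrase list is supplied by the caller; the example uses a harmless placeholder phrase.

```python
import re

# Matches "you" as a whole word ("you", "You're") but not "your" or "YouTube".
YOU_RE = re.compile(r"\byou\b", re.IGNORECASE)


def you_rate(tweets):
    """Percentage of tweets that address someone with the word 'you'."""
    if not tweets:
        return 0.0
    hits = sum(1 for tweet in tweets if YOU_RE.search(tweet))
    return 100.0 * hits / len(tweets)


def phrase_hits(tweets, phrases):
    """Count how many tweets contain each phrase, ignoring case."""
    counts = {phrase: 0 for phrase in phrases}
    for tweet in tweets:
        lowered = tweet.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                counts[phrase] += 1
    return counts


timeline = ["You don't know", "nice day", "thank you!", "your call"]
print(you_rate(timeline))
print(phrase_hits(["Go away now", "please GO AWAY", "hello"], ["go away"]))
```

The same `phrase_hits` function could serve for profanity tracking by passing in a word list, with the caveat that plain substring matching will also match inside longer words.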
The sentiment scale I used was a very simple one: 0.0 being neutral, anything below 0 being negative, and anything above 0 being positive. But that was a very broad range, so if I tweak the metric a little bit, I may be able to eliminate some of the false positives that I've seen while looking through the data and spot checking. That's something I'd like to improve in the future.

Now, with all that said, here are my resources, and at the bottom there is the GitHub link where you can find my project. By the time this is broadcast, it should be public, and it should also be sanitized so that you won't see any of the data that I analyzed, but you may use it to analyze maybe your own Twitter timeline. Feel free to use it however you'd like. And with that said, I will see you all in the live Q&A, and thank you for watching.