All right. Hello, everybody. My name is Master Chen, and you are watching or listening to Twitter Word Frequency. Welcome to my talk.

All right, so let's start with who I am. I'm a prior BSides speaker, where I spoke about being a con man and working in Vegas surveillance. I spoke at DEF CON on VoIP security, as well as at the DC Skytalks on Automate Your Stalking, which we'll cover a little bit today, and most recently at the Recon Village in the last few years, where we talked about Stalker in a Haystack; URL Shortened by Any Other Name, which was last year; and of course this year, this talk. It's just virtual now. You can find me at Chen box; that is my handle on most social media, so feel free to follow.

All right, so let's get into this, and let's start from where we are at this point in time. I think that'll help us understand where my mindset is when it comes to this talk. Of course, I started with Automate Your Stalking, where I was digging into my Twitter research. This is bad, and we learned that during the DC Skytalks a couple of years ago. So the following year I did Stalker in a Haystack, where we figured out how to not be stalked, or at least how to detect cyberstalking, and that was good. Now I've recently gotten into political sports betting, or market speculation, which is part of the story, and profiling, meaning social media profiling on different accounts. And of course I do have a degree in psychology. So all of this wraps up into where we are at this point: these five pieces are what this research has grown out of.

So, about this talk: I'm going to start with the original goal and how I came up with what I was researching at that moment, why that failed, and why
we move on to what it is currently, now as I'm presenting this, and what I would like my current research to be in the future.

So before we get into all of that, let's talk about caveats and disclaimers and warnings, oh my. I'm bad at that. Anyway, the first caveat or disclaimer: those who have followed my work before know that I've said IANAL, or IANAS, which is "I am not a lawyer" and "I am not a stalker." I'm going to add something new to that, because I'm very new to the whole data science thing. I've only recently gotten into using pandas and scikit-learn and NumPy and all these different data analytics tools. So I've added IANADS: I am not a data scientist. I'm not a lawyer, I'm not a stalker, and I am definitely not a data scientist, but I'll try my best.

This project is a work in progress, and I'm hoping that after I explain what we're doing here today, maybe I can get some help from the community to further this project, or at least, now that I've defined the basics, I can continue and make this something very strong. This is a neutral tool. I'm not building this for stalking purposes. This is just a tool; you use it how you want. But of course, as always, I'm not going to be held responsible for what you do with my research.

And I did not stand on the shoulders of giants for this research. What I mean by that is, I know that there's a lot of research already in this space when it comes to word frequency analysis, sentiment analysis, and NLP (not neuro-linguistic programming, but natural language processing). I know that all that research is out there, and I barely used any of it. I would like to integrate that later, but I want to let you know that for this particular talk it was really not used. So it is definitely a work in progress.

All right. So again, for those of you who know me,
I'm not a sports guy. I've only recently gotten into hockey, and I've always been a UFC fan, but other than that I'm not a sports person, so I've never been into the sports book. But one thing I know from being in Vegas my entire life is that there's a sports book for every market. It doesn't matter what the market or the speculation is; there's a sports book for it, or a betting market for it. And that's where we start our story.

If anybody's ever heard of PredictIt.org, it's a market for political sports betting. You can bet on who you think the next president is going to be, not just in America but globally. Pick a country, pick a state. Impeachment hearings, who will run the White House, who's going to be the VP pick: all of these different markets are on PredictIt.org. You can make a fair amount of money if you know what you're doing, or you can just have a casual bet and be right with a few dollars and cents. Specifically, we're going to be talking about President Donald Trump's tweets, as that's become kind of a famous thing, and how it pertains to the political sports betting market.

So let's get into that now, because right now we're talking about pi count Com, which is a tweet counter that's out there on the internet. It keeps track, on a weekly basis, of the president's tweets, as well as the VP, the White House, and the POTUS accounts. This is all geared toward the tweet market that PredictIt offers, or used to offer, and we'll get into that in a second. We're talking about Twitter markets. Again, this is on a weekly basis. The question, or the market, is how many tweets the president will tweet out for this particular week, starting from Wednesday of this week and going on to
Wednesday of the next week. It could be anywhere from 50, which is highly unlikely, all the way up to 300 or 400, and that's what I was analyzing beforehand. I wanted to know if the profits could be based off of analytics: could I anticipate what he would be tweeting based off of current events, news, drama, anything that's out there in the global space? The goal with this original research was to get the president to pay for my energy bill. If I made enough money on this PredictIt.org market, could I pay my energy bill at the house with Donald Trump's tweets? That's where I was going with this whole thing.

So again, the prior research is the frequency of the tweets: how often he would tweet in a given week. As you'll see in the CSV file on the right side of the screen, I was capturing, on a weekly basis, how many tweets he did in that week. We see 68 in a week of April 2019; we see 137, 140, 141. These are all a running tally of his tweets. Then of course you graph it using a Jupyter notebook and see the frequency, and you ask yourself: was this peak due to impeachment drama? Was this due to some sort of scandal or emergency in the Middle East? What caused these peaks of tweets? Was it just somebody getting on the president's nerves? These were all the questions that I was asking at the time. But there's a problem: PredictIt.org actually stopped their tweet markets.
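A minimal sketch of the weekly tally described above, for readers following along. It is not the speaker's code: the function names are hypothetical, the timestamps are made up, and it assumes each market week simply runs from Wednesday 00:00 to the following Wednesday, which may not match the market's real cutoff time.

```python
from collections import Counter
from datetime import date, datetime, timedelta

WEDNESDAY = 2  # datetime.weekday(): Monday is 0, so Wednesday is 2


def market_week_start(ts: datetime) -> date:
    """Return the Wednesday that opens the (assumed) market week containing ts."""
    days_since_wednesday = (ts.weekday() - WEDNESDAY) % 7
    return ts.date() - timedelta(days=days_since_wednesday)


def weekly_tally(timestamps):
    """Count tweets per Wednesday-to-Wednesday week, oldest week first."""
    counts = Counter(market_week_start(ts) for ts in timestamps)
    return sorted(counts.items())


# Made-up timestamps, for illustration only
sample = [
    datetime(2019, 4, 3, 9, 0),    # a Wednesday
    datetime(2019, 4, 5, 22, 15),  # Friday of the same market week
    datetime(2019, 4, 9, 23, 59),  # Tuesday, still the same market week
    datetime(2019, 4, 10, 0, 1),   # the next Wednesday opens a new week
]

for week, count in weekly_tally(sample):
    print(week.isoformat(), count)
```

The resulting (week, count) pairs are the kind of rows that could be written to a CSV and plotted in a notebook to look for the peaks the talk asks about.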
So while PredictIt is still around, they no longer offer you the ability to bet on tweets, and so all of this research kind of just stopped. They changed their offerings. They started to offer other markets that just weren't working fast enough; these other markets aren't resolving on a weekly basis, and so my profits slowed to a halt. Because of that, this whole project that I was working on, as far as tweet frequency, took a back burner, and I kind of forgot about it for a couple of months. But it got resurrected, so the project lives, and I'll tell you why.

I tweet on a regular basis, and I tweet very vague things. That's on purpose. It could be a zen-of-the-day quote, it could be something obscure, or just something that's on my mind, because that's exactly what Twitter is used for: you post something just because it's something that you're thinking about, and then you let it out there. It's not always political. It's not always socially driven. It's just a thought that's on your mind. Well, I had one of my followers get a little bit offended, with a little bit of ruffled feathers, over one of my tweets, and so he kind of got mad at me. And you know what? I wasn't being political, but the vagueness of my tweet allowed this particular person to project their political ideals onto my tweet. So I thought about that for a while. I could have gotten mad and gone back and forth with him, but I didn't. Instead: I don't get mad, but I will turn you into research, and that's what's bringing us here today.

So again, the target projected their beliefs onto me and the meaning of my tweet, even though there was no correlation there; there was nothing to draw conclusions from. I was blindsided by rage, meaning that I had no idea that this was going to happen. But I asked myself: could I have seen this coming?
Could I have predicted that maybe what I tweeted would result in a certain outrage, not just from a certain demographic but maybe from specific people? This is still the work in progress.

So who are we targeting with what we're working with today? Well, as stated in the last slide, we're talking about political opponents, not because we're trying to be political, but just because this is a use case for what we're doing. This is a scenario. Corporate marks: we could use what we're going to do today for spear phishing, or maybe password lists, or some sort of targeted execution of an attack. Cyberbullies: there are plenty out there, and wouldn't you want a leg up if you knew the type of vocabulary that they were using, the way in which they tweet, the sentiment behind their tweets? Maybe you could work toward a better outcome against your cyberbullies, because yes, they are definitely out there. Now, sadly, I have to address this: this also has some stalker interest. But remember what I said before, in my previous research, about stalker interest: this is bad. While it can be used for stalking, I do not condone it. I think we've covered that already.

Okay, so what are we talking about? Well, everybody knows Twitter is a goldmine for OSINT and recon. It's not a secret. It's very well known, and it's that way by design. Not everybody has a private account, and it kind of ruins the fun if you are on a private account. So Twitter, we know, is publicly shared sentiment. If you're going to put it on Twitter, you have to expect that it can be analyzed, it can be scraped, it can be mined, it can be used for research such as today's research. What we're doing, though, is making an at-a-glance picture of what a target profile is all about.
So, the term at-a-glance: what I'm talking about here is, instead of having to scroll through an entire timeline to understand what this particular profile is all about, what if we could just take a look at a dashboard, or just one screen, and understand the sentiment of that Twitter profile? That's something that I think would be very valuable.

Why are we doing this? Well, because I can, because we can. We're hackers, and this is what we do. We look at a problem, or maybe a scenario that has recently happened to us, and we say, okay, I know how to fix that, or I know what I can do about that. So yes: because I can, because we can. Remember, we're hackers. We also want to do this because it gives us an insight into the tendency of the profile. Again, the goal here is to find quicker red flags, maybe catch that outrage before it bubbles up and blindsides us. And of course, as with anything, we can use this for later weaponization. We could turn this into an offensive tool if needed. We're talking about password lists, further social profiling, maybe psychological profiling, and troll bots; maybe they could have a conversation with themselves.

Now, how are we going to do this? We're going to look at a couple of things. We're going to track a small sample of the recent tweets in the timeline, and an ongoing sample as we add to that programmatically. And we're going to be doing our analysis on word frequency, word choice, retweet frequency, and hashtag usage, which, as we know, is pretty important.

All right, so let's take a look at the code.
But before we do, a couple of notes here. You can play the game at home, actually: by the time that you're watching this, this should be available and public for your consumption, so you can follow along with what I'm doing in the next few slides. I'll go ahead and give you a minute to take that link down, and we'll go from there. Actually, there's a couple more points here. It's basically just my GitHub handle and the name of the project, Twitter Word Frequency.

Now, I scraped this information with Python, and all of this is Python. But I scraped with an actual Python file, and then I used a Jupyter notebook for more analysis. There's a reason for that as well, and we'll see that in the next couple of slides.

In the background here, you'll see a code snippet of the actual script, which pulls the timeline. There's a couple of things here. It checks to see if the timeline has already been saved on your machine, and if it hasn't, it then goes out using the Twitter API to grab the latest timeline of your subject, or of your target. In the foreground, you're seeing a snippet from the Jupyter notebook. I used the Jupyter notebook because maybe I wanted to run a little snippet of the code without running the entire script. So a lot of the code is going to be the same, but with a Jupyter notebook, and I'm sure everybody who's in data science already understands this (remember, I'm not), you can take these snippets of code and run them individually, and change things as new data becomes available to you.

All right, so let's talk now about the analysis. This first analysis is actually myself.
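The check-the-disk-first, then-call-the-API behavior described for the script can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the slides: `load_timeline`, `fake_fetch`, and the JSON-file-per-handle cache layout are all hypothetical, and the real Twitter API call is replaced by a stand-in so the sketch runs offline.

```python
import json
import tempfile
from pathlib import Path


def load_timeline(handle, fetch, cache_dir="timelines"):
    """Return saved tweets for handle; call fetch(handle) only if nothing is saved."""
    cache_file = Path(cache_dir) / f"{handle}.json"
    if cache_file.exists():
        # Timeline already saved on this machine: reuse it
        return json.loads(cache_file.read_text(encoding="utf-8"))
    tweets = fetch(handle)  # hypothetical callable that wraps the API request
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(tweets), encoding="utf-8")
    return tweets


def fake_fetch(handle):
    """Stand-in for a real API call, returning made-up tweet text."""
    return [f"hello from {handle}", "RT @someone: a retweet"]


with tempfile.TemporaryDirectory() as tmp:
    first = load_timeline("my_target", fake_fetch, cache_dir=tmp)   # cache miss: fetches
    second = load_timeline("my_target", fake_fetch, cache_dir=tmp)  # cache hit: reads the file
    print(first == second)
```

Passing the fetch function in as a parameter keeps the caching logic testable without credentials; a saved file also means notebook cells can be re-run without touching rate limits.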
This is my own analysis, on my handle, at Chen box. On the left side, you'll see a couple of the words that I've used on a regular basis. Of course, with the most recent current events, you see "mask" is used there six times, and "Twitter", because I've had opinions about Twitter on Twitter. So that's there. Now, you'll notice that these red blotched-out pieces are actually Twitter accounts, and how often I've referenced these Twitter accounts. So this can be a way that, again, at a glance, we can see who we're looking at and who we're associating with, quite quickly and right up front.

In the middle, on the top side, you'll see I have a red dot right next to RT, meaning of course retweet, and you'll see that my retweet frequency is about 54 times in a 200-tweet sample. That's what that means: out of 200 tweets, 54 of those were retweets of other people. And of course, with all these Twitter accounts that are blotched out, I think it would be safe to assume that if I'm not mentioning somebody, I'm definitely retweeting them. At the bottom, you'll see 2,638, which is the word count associated with the last 200-tweet sample of my own personal account. I'm using mine as a baseline.

On the right side, you'll see my use of hashtags and their frequency, and at the bottom is the histogram associated with the hashtags there at the top. And there's my handle, so you know that this particular slide was for my own profile analysis.

Okay, so remember that person that I said got kind of angry with my vague tweet? Well, this is the analysis slide of his account.
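The numbers read off that dashboard (retweet count, referenced accounts, hashtag counts, total word count) can be approximated with a short sketch like the one below. It is an assumption-laden illustration, not the speaker's notebook: `summarize` is a hypothetical name, a retweet is assumed to be any tweet whose text starts with "RT @", and the sample tweets are invented.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
NOT_PLAIN_TEXT = re.compile(r"https?://\S+|[@#]\w+")  # URLs, mentions, hashtags


def summarize(tweets):
    """Build an at-a-glance summary from a list of tweet texts."""
    words = Counter()
    for text in tweets:
        plain = NOT_PLAIN_TEXT.sub(" ", text)  # count only ordinary words
        words.update(w.lower() for w in WORD.findall(plain))
    return {
        "sample_size": len(tweets),
        # Assumption: retweets are identified by the leading "RT @" convention
        "retweets": sum(1 for t in tweets if t.startswith("RT @")),
        "mentions": Counter(m.lower() for t in tweets for m in MENTION.findall(t)),
        "hashtags": Counter(h.lower() for t in tweets for h in HASHTAG.findall(t)),
        "words": words,
        "total_words": sum(words.values()),
    }


# Invented tweets, for illustration only
sample = [
    "RT @alice: Wear a mask #staysafe",
    "Twitter is having a day. Again.",
    "Thanks @alice and @bob for the mask tips #staysafe #osint",
]

summary = summarize(sample)
print(summary["retweets"], "retweets out of", summary["sample_size"])
print(summary["mentions"].most_common(2))
print(summary["words"].most_common(3))
```

Because "RT" survives as an ordinary word in this sketch, it also shows up in the word tally, which is roughly how the red-dotted RT row on the slide doubles as a retweet count.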
So you'll see that there's a couple of differences, obviously, because we're not exactly the same. On the left side, he's mentioning a couple of people, who are blotched out, not as frequently as maybe I do, but these profiles are in the scan that I've seen. You'll see his retweet frequency is 178. So out of a 200-tweet sample, we're talking about 178 of those being retweets. I also put a red dot on the word "I", because I was kind of curious to see how often somebody is using "I" in a sentence. I'm sure that there's a psychological profile based off of the use of that word, right? At the bottom we see the word count, 3,788. I know that there's math that we can do on this word count, as far as calculating the types of words that we're using, how big they are, how small they are, et cetera. Another interesting thing about this particular analysis is that the histogram on the right side indicates that this particular individual only uses a hashtag once. One and done. I found that to be interesting. Again, this is labeled "at my target". This is not the actual Twitter account, obviously, but I thought I had to identify it somehow.

Now lastly: I used myself as a baseline, that was the target's dashboard analysis, and this next one is actually President Donald Trump. We'll see a couple of key features here. On the left, we don't see the word "I"; actually we do, but it's no more than 20 times. For those of us who might think that President Trump is a little bit of a narcissist, we might be kind of surprised to see that he's only tweeted the word "I" about 20 times in a 200-tweet sample. At the bottom there, we see 3,400 words.
So these are all kind of interesting, the same stats. He's retweeted only 96 times in the past 200-tweet sample. Of course, this piece is not so surprising, because we know that he uses a lot of his own words in his tweets. But back to the left side really quick: the only blotched-out account there, I don't know if anybody wants to take a quick guess at what that is. I know I can't necessarily wait for your answer, as this is the closest that we can get to audience participation at the moment. But yes, if you guessed that that was his own account, you are correct. What I'm saying here is he's mentioned himself in his own tweets about 46 times. So that might be an indicator of narcissism, maybe tweeting in the third person; we'll have to analyze further for that.

On the right side, we'll see again our histogram and hashtag matching. President Trump has maybe tweeted a couple of things with hashtags, but of course the recurring one is MAGA. Is anybody surprised? I don't think anybody is. I'm not. And actually, that would have been three if he didn't add emojis to that third MAGA there at the bottom.

So again, my point is that all of this was programmatically scraped and put together so that we can, at a glance, take a look at what a target is talking about. I understand that this could easily be done through APIs and other research. This is my own basic whiteboard, my own started-from-scratch analysis of these targets: myself; President Trump, because of the sports betting; and of course somebody who got really mad at me for something that was really vague.

Okay, so that's an overview of the data, really quickly. Let's take a look at the insights here. Some of the insights: it seems like, of course, retweets indicate shared sentiment.
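On the hashtag tally mentioned a moment ago, where an emoji glued to the third MAGA kept it from being counted with the other two: one possible fix is to extract hashtags with a word-character pattern instead of splitting on whitespace. The sketch below is an assumption about how that miscount could arise, not a description of the speaker's code; the function names and sample tweets are hypothetical.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")  # \w does not match emoji, so trailing emoji are dropped


def naive_hashtags(tweets):
    """Whitespace tokens starting with '#', kept exactly as typed (lowercased)."""
    return Counter(
        token.lower() for text in tweets for token in text.split() if token.startswith("#")
    )


def normalized_hashtags(tweets):
    """Hashtags reduced to their word characters, so emoji variants merge."""
    return Counter("#" + tag.lower() for text in tweets for tag in HASHTAG.findall(text))


# Invented tweets; the third has a flag emoji attached to the hashtag
sample = [
    "Big rally tonight! #MAGA",
    "Thank you! #MAGA",
    "#MAGA\U0001F1FA\U0001F1F8",
]

print(naive_hashtags(sample)["#maga"])
print(normalized_hashtags(sample)["#maga"])
```

Whether merging the variants is desirable is a judgment call: the emoji itself may carry sentiment worth keeping in a separate column.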
But the question is: from whom? Who are we sharing a sentiment with? This might give us insights into who a person is associating with, how frequently, or how closely related the ideals they share may be, depending on the retweet, of course.

And of course we see narcissism in hashtags. So you may see a lot of your own... actually, okay, there's a couple of things here. I did forget to include a screenshot that I have of my target tweeting his own handle as a hashtag: he hashtagged his own handle. I found that to be very narcissistic and kind of interesting, and that's something that I wanted to piece in there. So I do apologize for not having that screen cap, but I found it interesting, and that's why I put it here in the insights.

Repeating hashtags, to me (and again, this is kind of obvious when it comes to somebody who's a regular Twitter user), depict brand or focus, and of course we see that with the repeating hashtags.

Now, I've noticed that the word frequency count significantly drops off after stop words. I have a rudimentary introduction to natural language processing, so I understand a little bit about tokenization and what stop words are, stop words being the very small words that kind of glue the English language together. After those words are scraped off the top, the frequency of the other words really drops, and of course those who are linguists may already know that.

Now, obviously, with what I've presented today, we know that descriptive statistics are as far as we can get without further analysis with better tools, better tools being, of course, natural language processing (not neuro-linguistic).
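The stop-word drop-off mentioned above is easy to see on a toy sample. The following sketch uses a tiny hand-picked stop-word set purely for illustration (real lists, such as the one shipped with NLTK, are much longer); `top_words` and the sample tweets are hypothetical and not taken from the project.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; an assumption, not a standard list
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "it", "for", "on", "that", "this", "i"}


def top_words(tweets, n=5, drop_stop_words=True):
    """Return the n most common lowercase words, optionally skipping stop words."""
    words = (w.lower() for text in tweets for w in re.findall(r"[A-Za-z]+", text))
    if drop_stop_words:
        words = (w for w in words if w not in STOP_WORDS)
    return Counter(words).most_common(n)


# Invented tweets, for illustration only
sample = [
    "The mask is on and the mask stays on",
    "It is the end of the day",
    "Wear the mask",
]

print(top_words(sample, n=1, drop_stop_words=False))  # dominated by a stop word
print(top_words(sample, n=1))                         # the first content word
```

Note that filtering "i" as a stop word would hide exactly the pronoun the talk singles out for profiling, so the stop-word list itself is an analysis decision.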
Maybe some other types of data science tools and techniques, too, and that is where this work in progress is going to be heading next.

Speaking of which, where do we go from here? Well, of course I want to clean up the data collection methods. Right now it is pretty rough, and I'd like to turn that into something that is easy to read and easy to follow. Better data visualization: I quickly grabbed screenshots together, but I'd like to incorporate word clouds and other charting, maybe better charting. The dashboard of data: I'd like to clean that up so that, again, the whole goal is an at-a-glance Twitter analysis.

Now, what can we do with this? Well, we could talk about profile scoring. For instance, leanings: whether it's a sentimental leaning, a political leaning, or a corporate leaning, I'd like to see if we can understand that at a quick look. The potential for extremism: if somebody's getting mad at you for a tweet that was vague enough for them to project meaning onto it, then what is the potential for somebody to go a little bit more off the rails? That's something else that I'd like to look at. And of course, me being a psychology buff, ongoing psychological profiles and red flag notifications. If there's a red flag with anything that we see in their timeline, can we detect that? I'm almost certain that there's already research out there, and I'd like to continue looking at that myself and building from there.

So, features to incorporate. I've said it a lot already: natural language processing. I think that'll be a very good thing to add in here. Sentiment analysis, which is also part of that space. A mirror bot: really quickly, what I want to talk about there is, if we know how our target speaks, can we make a bot that kind of
talks to themselves? I think that would be kind of funny to watch. Hopefully it doesn't get anybody banned, but I think it'll be funny to see, just as a funny project. It's just jokes, folks. It's just jokes.

Now, I want to build off of the prior research that I myself and the community have done, including my stalker research and anti-stalking research, and anything else that the community has done I'd like to incorporate into what I'm doing now. And even though today I've only shown three profiles, I'd like to do a quick mass scripted analysis of maybe everybody that is following me, or everybody that I'm following, and just see what kind of results are in their timelines. These are public timelines, and of course I don't want to hit any rate limits, so we'll do it slowly.

Some future questions that are on my mind when it comes to this. Can I get a target to talk to themselves, and agree or disagree? I think that'll be funny. That's what I was mentioning with the mirror bot: can a person have a conversation with themselves and disagree on their own, maybe extremist or non-extremist, points? The next question is, can outrage be predicted? That's something that I brought up earlier in this talk as well. Could that be predicted based off of what is in current events, in the news, and what's going on globally at the moment? And next, of course, any questions that the audience may have for me. I would definitely want to put those into this project.

So, to summarize: who said nothing good came out of sports betting? I think there's some benefit here. I'm glad that I went down this rabbit hole. Sharing vague sentiments can still make frenemies, apparently. I was not expecting that, even with today's climate, but hey, it is what it is. Now remember, don't get mad.
Don't feed the outrage. Don't send the rage back. Just get productive. Take what you have and turn it into research. That's been my guiding light for the past five years, maybe longer. Start somewhere. Like I said, this project is definitely a work in progress, but I'm going to start somewhere, and of course we build.

Okay, so fork me. There's the GitHub link again, if you didn't get it before: Twitter Word Frequency. Oh, sorry, I'll give you a couple of seconds on that one. A couple more.

And resources. The top link is actually a college speech on natural language processing and how that goes. And these next two links are very good. I'll let you check those out, because they are actually kind of in line with what I'm trying to do. I'm trying to do on a massive scale what these last two links are doing specifically for Donald Trump's Twitter account.

Thank you very much, and that concludes my talk.