So, first year I've actually tried making the slideshow in the presenter room, getting the slides on the screen and testing that out, and it worked fine 15 minutes ago. So what can I tell you? Anyway, this is the SQL, no, the Twitter talk. I'm Chris Sumner. I'm representing a small charity called the Online Privacy Foundation, where we've been doing some work to raise awareness of, or point out, issues that I think are of interest and concern to this sort of audience in particular. This is my co-presenter, Randall Wald, from Florida Atlantic University. He came to the Q&A session last year after our Facebook talk and said, "Hey, did you guys think about machine learning?" We said, "Well, we don't know the first thing about machine learning." He said, "Well, I'll help you out." So it's really neat that he's been able to join us this year.
With this talk, we're really looking at the question of whether you can or can't predict personality traits through Twitter and, indeed, other social media. We approached it with a healthy degree of skepticism because this image here is phrenology, which used to be pretty popular not so many years ago, and it had a whole host of problems. So we decided, as we say in the UK, to have a bloody good look and see what you could actually determine. Let me just explain the talk outline, and you can see if this is the right talk for you. We're going to start with why we approached this. It's a question we've been asked a number of times in the media.
Then we'll give an introduction to psychopathy; briefly talk about data collection and processing (if people want more on that, come to the Q&A session); share the statistical results we've got from the analysis; and then spend a big portion of time talking about machine learning, where we'll probably start off somewhat basic and go into something where I haven't got the foggiest what he's talking about, but you might; and then finish up with some conclusions and, I guess, more questions.
So why did we start this activity? Well, last year we had a project that we spoke about at DEF CON called the Big Five experiment, where we looked at personality traits in Facebook, and the reason we did that is because we were seeing figures and statistics like this. This figure comes from this year's Jobvite survey, which looks at recruiting, essentially, and what it's saying is that 48% of hiring managers are always using online searches. So we were a bit concerned about whether they were doing that with any real weight behind it: could science, if you like, support their decisions? Last year's figure was 45%, so it's a slight rise. So they're making judgments about people, and we weren't sure whether those judgments were actually going to be accurate or full of problems. So that's what drove some of our research. There was also something else at the time, and I think it's probably useful to be open and give that context.
At the time I'd just had a child, and anybody who's had a child will probably... no, actually, my wife had the child, but I had some involvement, I think. Anyway, let's move on. But sleepless nights: sleepless nights really affected me pretty badly, and I recognised that I was maybe acting differently on social media than I would have normally, and I was wondering whether that sort of stuff could get picked up by people and used against me, as in, "This guy's a fruitcake, we don't want to hire him."
Then last year, after our Facebook talk, we saw this study come out called "Hungry Like the Wolf" by Professor Jeff Hancock at Cornell. "Hungry Like the Wolf" is a play on two things. One is that the wolf is a predator and a psychopath is a predator, but I also emailed him and said, "Hey, has that got anything to do with Duran Duran?", expecting him to say, "Go away, you idiot," and he was like, "No, actually, me and the co-author really liked that." He studied the word-pattern analysis of psychopathy in criminals, but that attracted this headline, which is the same sort of headline as the talk, "Can Twitter expose psychopath killers' traits?", and that's the question we're trying to at least examine critically, and have you guys maybe examine along with us. What his paper actually said was that these findings on speech "begin to open the window into the mind of the psychopath". Note that he didn't say, "Oh, it detects psychopaths"; it's giving some idea that something's going on and might help our understanding of how their minds operate. Bruce Schneier was concerned that there would be a high degree of false positives, and quite rightly.
Earlier in the year we saw this post in the Telegraph. I can't confirm whether it's true or not, but it looks like there might be something there: "FBI using Twitter to predict crime". That doesn't necessarily mean they were using personality to draw the link for crime prediction. However, there is a paper that came out this year called "Automatic Crime Prediction Using Events Extracted from Twitter Posts". That is also not personality driven, but it is language driven, and there have been some studies into that with the London riots last year, most notably from Manchester University. We were either thrown a bone by the FBI, in light of the media that came out, or it was just pure coincidence, but this week there was an FBI bulletin about language and psychopathy and how they deal with that. This was a quote from that piece, which is available on the web: "Individuals' language is one of the best ways to glean insight into their thoughts and general outlook." I don't know if that was pure coincidence; I suspect it probably was.
Crime prediction isn't necessarily new. This is a system run by IBM to detect juveniles who may go on to become a problem for society. We can look at that as being a bad thing, but it might be not such a bad thing if you're intervening in somebody's life and preventing them from going down a really, really bad path themselves and also impacting other people's lives. So I don't think it's as clear-cut as saying this is really bad and spooky, but maybe it is. There certainly needs to be discussion on that, I think.
Next, with psychopathy, the media seem to maybe fixate on the idea that psychopathy leads to crime, but there are a number of papers that show, at least in terms of violent crime, that there's still a high false positive rate. Just because somebody's scoring highly in psychopathy doesn't mean they'll go on to commit violent crimes. And then our paper got some coverage on Fox News saying something we weren't actually saying at all; as you'll bear witness today, we're not going to say psychopaths can be spotted on Twitter. But maybe the answer is not so black and white. So let's have a quick look at psychopathy.
There's a general public interest in this. You've got here the Oklahoma City bomber; the Night Stalker, Richard Ramirez; and of course Ted Bundy, who's got a whole catalog of different crimes behind him. There are also movies people have got a fascination with: Psycho, The Shining, The Texas Chain Saw Massacre; obviously The Silence of the Lambs is a good example. My personal favorite, and it's a pretty good film if you've not seen it, is No Country for Old Men. A guy walks around with an air canister and just pops people off; it's like a machine used for killing cattle. Anyway, it's a great film; go and see it if you've not.
So let's look at the PCL-R, which is the Psychopathy Checklist-Revised from Bob Hare, who's almost the godfather of psychopathy to some degree, some will argue. It consists of about 20 areas. You'll score a zero for each of those categories if it doesn't apply at all; if it applies to some extent, you'll get a one; if it applies and it's a reasonably good match, then you'll get a two. So you can score zero, one or two on each of those question areas, or omit the answer if there's not enough evidence to make a decision, and you can have up to five omits. The way I conceptualize it is thinking about an equalizer, or something more static
than an equalizer. Most people are going to score fairly low, so this is good news for society; otherwise we'd all be bumping each other off. I hope that translates. If somebody scores high in certain categories, it doesn't necessarily make them a psychopath. They might be an unpleasant individual, but they're not necessarily a psychopath, or they may just be a highly charged individual in certain circumstances. However, when you're scoring high in a number of those questions, that's when you've got the settings for an interesting problem, shall we say. I went on the PCL-R course earlier this year, and speaking to Reid Meloy, who's the author of The Psychopathic Mind, he categorizes it like this, moderate and severe, between 20 and 40, although there are a number of psychologists who don't like the idea of a cut-off. In the general UK population we see about 2.3 male, 0.9 female scoring within the 25-or-over category. This is a slightly different scale, but it maps to the PCL-R. In the prison community you see that there's a different distribution. This is from a Dutch study, where you see between 22 and 37%, depending on where you draw the cut-off, so there's an abnormal distribution of psychopaths in prison.
Psychopathy is really divided into two primary factors, "personality: aggressive narcissism" and "socially deviant lifestyle", and we're going to whiz through these. There are four facets involved with that: interpersonal and affective, which is regarding emotion; and lifestyle and antisocial, which is burglary, crime record. And then you've got sex-related stuff and relationships under just general social deviance.
So, glibness and superficial charm: smooth-talking, engaging, charismatic, slick. If you watch the TV show The Apprentice, you'll see these characteristics displayed pretty nicely. I'm not saying they're psychopaths; I'm saying that they're... sorry. But in the cab on the way, you've got Steve Wynn talking about how good the Wynn is, and that is a good example of superficial charm and glibness. I checked with the taxi driver, said, "Does that get on your nerves?" and he was like, "Tell me a story, oh man." And this is one of the things that led to Hervey Cleckley's work The Mask of Sanity, where he talks about there being less than meets the eye. They deliver a good talk, but they can't walk it, essentially.
Then we've got grandiose sense of self-worth, which is really related to narcissism: cocky, opinionated, self-assured. These are the people who come to training classes that can be taught by the world's leading experts, and they're going in with the mentality that "this guy's not going to teach me anything, because I already know it." There's also the concept of a narcissistic takeover, where you're claiming success for somebody else's successes. So if I was here saying, "Look at this, this is really cool, they made this all because I was going to be here," that would be an example of a narcissistic takeover. The fact that it's true is... no, I'm kidding.
Pathological lying. People lie for different reasons, but generally most people lie to manage anxiety in some way. Your psychopath is almost a born liar. They're weaving elaborate stories; some of them may seem implausible, but they deliver them with that glibness of speech, so that it translates to a lot of people. They're doing it all the time, and that is almost a perfect toolbox for conning people. I don't know how many people saw the Art of Con, but this is, say, a guy scoring high in those traits, ripping off these
old people here, and he's able to do that pretty well, maybe giving financial advice that's never going to benefit them in their life and will leave them distraught, penniless and homeless, when they've got no opportunity to make any money. You don't have to look far in the media to see that. But just because they're like that, they're still not necessarily a psychopath; that's an important point. They're able to do this really well because they've got a lack of remorse and guilt. They're not going to worry about what they've done when they get home at night; they are completely devoid of guilt. Guilt is a different construct, although it's related to empathy: you can feel guilt from running a red light, but not necessarily empathy.
We've also got the concept of shallow affect, a limited range or depth of feeling, a stony emotionlessness. So here's a psychopath displaying joy, and here's a range of other emotions that the psychopath is displaying. They're emotionless beings, almost. This is a quote from Ted Bundy; it says, "I'm the most cold-blooded son of a bitch that you'll ever meet." You could also load that onto the narcissistic factor as well, but it really shows that the guy is cold-blooded.
And then in terms of empathy, they've got a general lack of consideration or concern for the impact of whatever it is they're doing. A non-psychopath may look at the dog and think the dog's going to be better off with a new home or what have you, but a psychopath isn't going to be thinking like that. The empathy of the situation isn't even going to occur to them. If you want to read a good description of that, there's a great book by Simon Baron-Cohen, and that name may be familiar because he's the cousin of Sacha Baron Cohen, or Borat, or Ali G, or whoever. Turns out they're both pretty smart. This book is excellent if you want to understand more about the
general differences between guilt and empathy. Then you move into failure to accept responsibility. Here a guy is describing what he did, and does it in a very detached manner, almost like he had no choice: "I was pissed off. He stepped into my space. I did what I had to do." I had a similar experience buying a soda this morning, when I was asked, "Do you want a banana?", and part of me wanted to say, "If I wanted to buy a banana, I'd have bought a banana as well." A psychopath would have maybe taken that a step further and smashed the lady's head onto the desk, possibly. I was able to stop myself doing that, fortunately. Failure to accept responsibility is also seen in corporate America and that sort of context. This is a quote related to the recent LIBOR scandal in the UK, where they've been fixing interest rates. It was described by one of the senior managers, who's no longer a senior manager, as "an unfortunate series of events", and that would be a complete failure to accept responsibility for their own actions. You'll see that in the corporate context a lot as well, and if you want to see just how ugly that can get, then I really recommend reading the book Snakes in Suits by Babiak and Hare, who are the leaders, if you like, in talking about psychopathy.
A need for stimulation. We can look at a skydiver here, although evidence suggests that with skydivers the risk doesn't necessarily translate into other stuff they're doing, so they can make reasonably good financial judgments, for example. This is more like a Vegas-style need for stimulation, but it's not the odd spliff; it's doing this all the time and having a real fascination for this sort of stimulation. Another way of considering this: if you look at a number of shooting-related murders and crimes, it's not uncommon to see them describe it as being, well, I was a little bit bored, I wanted to mix things up and just see what would
happen. You can Google for that, and you'll find some interviews like that.
There's a parasitic lifestyle, so they're leeching off people, and they're going to move from person to person to meet their basic needs of food and shelter, essentially, and money, which they'll burn on the sensation-seeking stuff. That's why they're such manipulative people. You'll also see that at work as well: they'll quite happily let others carry them along, but then they'll use that grandiose sense of self-worth to have a narcissistic takeover and claim success for a project that they've pretty much done nothing on. So you can watch out for that.
The astronaut here is a metaphor, really. When questioned in prison settings, the psychopath will often not have a realistic idea of what he's able to do. Maybe he wants to be a CEO or something, where he's probably more likely to be delivering pizzas, but not for long, because he's probably not going to hold his job down for too long before he gets angry and wastes somebody. This is from Gary Gilmore, a guy who has been executed since, and one of his quotes was, "I wasn't thinking, I wasn't planning, I was just doing." It was very impulsive.
Irresponsibility: repeatedly stoned at work, not honoring debts, fired, driving under the influence, but doing this all the time, a consistent pattern in your life. Hot-headed, hot-tempered: there's a caveat on Rage Boy, it turns out it's maybe more of a media thing picking up on him, but it's still a reasonably nice image. Early behavior problems: this is ages 12 and under, a history of truancy, robbery, vandalism, class disruption, bullying, cruelty to animals. When I talk about cruelty to animals, one of my previous co-workers, who worked in the psychiatric industry, said, oh yeah, in terms of cruelty to animals, they had one patient who buried a cat up to its neck in the lawn and ran a mower over it. It's that kind of cruelty to
animals, not burning the odd ant. So it's pretty bad already at that stage, and then it progresses into juvenile delinquency, where you've got violence, fire-starting, murder, attempted murder, rape, and you're seeing all of that sort of stuff over and over again. So now you can get the idea that psychopathy isn't just because somebody tells a good lie; they're going pretty well beyond that. They're probably going to skip bail, escape from prison, try and escape from prison.
These are the ones that aren't loaded. Promiscuous sexual behavior: not one or two instances, but a continuous pattern through their lives. Many short-term relationships, so I've used the image of a player. And this was my favorite from the course; it's called criminal versatility. You can't just be a good murderer; you've actually got to have theft, robbery, kidnapping, arson. Six of these will get you a score of two, or four or five will get you a score of one, so it's not just that you've committed a few robberies.
So psychopaths are more violent, they commit more crimes, and they're more likely to use a weapon in their crimes. Their violence is both predatory and affective, which means they're going to react to something and they're also going to go after people. They're likely to go after strangers, because they can smell weakness. They've got no guilt, so they're really the perfect predator, so the image is a nice match.
There's discussion that there's not really any treatment for psychopathy, or there's a school of thought that says that, but it gets pretty interesting. There's a guy called Jim Fallon, who's spoken at TED, and he'd looked at brain images, and these are the kinds of images they saw. In the control set here there's more brain activity in the frontal lobe; for a murderer there's less. It turns out his own brain was like that of a murderer, so this spooked him a little bit. There's more evidence emerging here. This
says: "You might think this guy is being a total tosser," says Dave Owens, an officer for 19 years, "but when he describes the abuse he had as a child from his father, the jigsaw fits." And this is from a study that's come out in Norway and is gaining some traction now as well: "All offenders have experienced gross instability, neglect and/or abuse from the family of origin." So this is either extreme neglect or extreme over-controlling, and that leads to a metaphor that Simon Baron-Cohen uses in his book, where he talks about the pot of gold. The formative years of a child's upbringing can influence even the brain structure; having a decent upbringing can reduce the possibility of them going off into a life of violent crime. So there's a lot of research going on in that space now to look at whether there's some way of, essentially, an intervention.
So let's talk quickly about the data collection. We created an application, and the application essentially asked a number of questions: "I like to get revenge on authorities." "I avoid dangerous situations" (that one's reversed). "Payback needs to be quick and nasty." "People often say I'm out of control." "It's true that I can be cruel." "People who mess with me always regret it." "I have never gotten in trouble with the law" (that's also reversed). "I like to pick on losers." And "I'll say anything to get what I want." We took that and a whole bunch of Twitter attributes from 2,927 participants in the study, and essentially analyzed that on a little HP MicroServer running Ubuntu and CouchDB, with some custom Python scripts to call the Twitter API.
Now, in terms of the analysis, we saw this distribution, which shows that most people, fortunately, are not high scoring on the psychopathy scale. If we look at the GB study from earlier, the Great British study, and ours, you can see a similar curve, almost, although it's important to note that we used different measures and scales, so
there's some variation. In terms of correlations between the Big Five traits, what you see here with the blue bars was a study conducted by Del Paulhus at British Columbia, and the green bars are from our study. So we see, for example, with agreeableness a similar direction, a negative correlation, and the same with conscientiousness. There's a difference in neuroticism, but this could well be because of the measures that we've used. Interestingly, though, we saw the same sort of correlations when we looked at the attributes in Twitter, so that is either pure chance or adds some credibility. The attributes that were significant were geo-enabled, number of tweets, notifications: fairly weak, but nevertheless significant. The Klout score for psychopaths was 0.051; for narcissists it was 0.088, so a little bit higher, as you might expect. For linguistic analysis, we got swearing, anger, death, negative emotion, filler words. You can see that the significance is higher than it is for the attributes there. And in terms of negative correlations, meaning they say less of these things: positive emotion, family, work (work relates to the parasitic lifestyle, so that's kind of interesting) and the use of "we", because they're not really social creatures, so they're going to use less of "we", "us", "our", because they're lone wolves.
This was an area that we found pretty fascinating, so we coded this into our data set as well. When we looked at low-scoring psychopathy, we saw people tweeting around this sort of frequency. Of course there are differences, but generally we saw around this area. When we look at high-scoring psychopaths, we see many more tweets, and when we look at those tweets, they're more angry tweets as well.
So we see much more erratic tweeting, almost, and that may be related to this sense of boredom or what have you. Overall it's interesting: the results are consistent with two other papers that have been done on linguistics and the dark triad, but the statistics are still relatively weak as a basis for a decision that's going to impact somebody's life, potentially.
So as we look at machine learning, I'll just introduce the topic here. One of the things we're attempting to do is classify the people, the tweeters, as instances. They're going to have a number of independent attributes, so those will be the features that we've selected, the linguistic analysis and suchlike, and then a number of dependent attributes, either a class, high or low, or a psychopathy score. Then we'll take all of those independent attributes, the features, to predict the classes, build a model using training data and validate it with test data, essentially. The reason we're doing that is there's too much data to understand quickly, and it's too challenging, really, to do with statistical analysis; it's not impossible, but too difficult, and stakeholders will typically want results sooner. How you approach this is going to depend on the question you're really trying to answer, and that will influence the sort of results you're going to get, but for our study we really focused on spotting the top end of psychopathy scores, either the top 10% or the top 1.4%, as we'll discuss. So we have the concept of true and false positives, which I think Randall is going to talk to you guys about, and he'll talk a little bit more about the results we've seen through machine learning.
Okay, so, to repeat, I'm Randall Wald, and as Chris was discussing, we have true and false positives; in this case positive is for the psychopathic class.
Now, obviously psychopathic is not positive, but for convenience we refer to the minority class as the positive class, and as discussed, psychopaths are the minority, fortunately. True positives are when a model predicts the minority class correctly. A false positive is when you say someone is a psychopath but you're wrong; they're actually perfectly healthy. True negatives and false negatives work analogously: a true negative is correctly saying someone is normal. A false negative is when you take someone and say they're normal, but they're actually a psychopath.
The problem with trying to evaluate models using these true and false positives and negatives is that you don't have the same number of individuals who are positive and negative. For example, this is a skewed distribution: the majority of individuals are normal, and the minority are at the extreme end of the psychopathic scale; they're much higher in this measure than everyone else. Now, if you simply say, "What's the most accurate model I can find?" or "I want to minimize the average error of my model," you'll get everyone in this big blue box here correct. You'll say, "I guess most people are normal, so I'm going to just predict normal," or "I want to get an average value close to the population mean." That will give you a lower average error or a higher overall accuracy, but you're going to miss all of these individuals here. You're going to miss the individuals who are at the extreme end, who are in the positive minority class, and unfortunately those are the ones you care about most. Your goal here is to figure out who are the psychopaths, who are in the minority class, so you can't just evaluate models by looking at your average error or the overall accuracy.
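The accuracy trap described above can be seen in a few lines of Python. This is a minimal sketch, not anything shown in the talk: the 10-in-100 population is made up for illustration, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(actual, predicted):
    """Count (tp, fp, tn, fn), treating 1 as the positive (minority) class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1          # positive, and the model said positive
        elif a == 0 and p == 1:
            fp += 1          # normal, but the model said positive
        elif a == 0 and p == 0:
            tn += 1          # normal, and the model said normal
        else:
            fn += 1          # positive, but the model said normal
    return tp, fp, tn, fn


# Hypothetical population (made-up numbers): 10 positives among 100 people.
actual = [1] * 10 + [0] * 90

# A "model" that predicts normal for every single person.
always_normal = [0] * len(actual)

tp, fp, tn, fn = confusion_counts(actual, always_normal)
accuracy = (tp + tn) / len(actual)

print("tp, fp, tn, fn =", tp, fp, tn, fn)
print("accuracy =", accuracy)
```

The accuracy looks high, yet every positive instance ends up as a false negative, which is exactly the group the study cares about.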
So you need to use performance metrics that are based on more refined views of these things. For example, true positive rate, which is true positives over all positives; true negative rate, just true negatives over all negatives; false positive rate, et cetera. Then the arithmetic mean and geometric mean are ways of trying to find models that balance your true positive rate and your true negative rate, because you don't want an extreme for either of those; that's just not going to give you a usable model. One way to find models that balance these well is using what's called the ROC curve. This is a graph: as you change the decision threshold, you change how likely you are to classify a person as positive, and as you move this along you're going to affect both your true positive rate and your false positive rate. This graph shows you how these values change over the full scale of the threshold. You see the dotted line going up the middle there: that is what you'd expect if you had a random model, where as you change the threshold you increase both true and false positives. A good model should do better than that; it should go to the upper left of that line, giving you higher values. And one way to evaluate the quality of a model, instead of looking at any one point on this curve, is using what's called the area under the curve, AUC, which is just the integral, the area under the curve. The higher the value the better; one would be ideal, but you take what you can get. So I'm going to let Chris talk briefly about Kaggle.
So one thing we did, and this might be interesting, actually: we found a data science company in San Francisco called Kaggle.com, and they host data science competitions. You can either run them as public competitions and invite people from all over the place to participate, or private competitions where it's closed and just limited to a number of data scientists.
They work with you to shape and create your data set. Essentially, you then host a competition where data scientists from around the world compete to build the best models. This, we felt, in terms of crowdsourcing, was an interesting way of looking at how we can find maybe the best models the quickest. In doing this we had 113 teams compete, and they created over 1,000 models. So if you've got a data science problem but you don't have the skills, we found this actually quite useful, and for non-profits the fee is minimal. So Randall's going to talk a little bit more about the results from that and then talk about some of the results that he's had in his research.
Okay, so this curve is exactly what we were looking at earlier. It's a ROC curve with true positive rate and false positive rate as you change the decision threshold, and across the middle you have that gray line, which is how well the random model would do. The other two lines you'll see are the black line, which is the benchmark Kaggle model, a simple random forest model (come to the Q&A if you want more details), and then the winning Kaggle model, which was able to do a little better than that; that's the red line there. And you can see along the side, kind of small, the area under the curve: the benchmark Kaggle model got an AUC of 0.64, while the winning Kaggle model got 0.66, and some of the other models did more poorly than that. Again, it's important to look at the AUC and look at all the decision thresholds, because if you look at accuracy you get very misleading results.
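To make the ROC curve and AUC concrete, here is a minimal sketch in plain Python. It is an illustration only: the labels and scores are invented, `roc_points` and `auc` are hypothetical helper names, and the area is computed with the trapezoid rule, which is one common way to integrate the curve (the talk does not say how the competition computed it).

```python
def roc_points(labels, scores):
    """Sweep the decision threshold from high to low.

    Returns a list of (false positive rate, true positive rate) points,
    starting at (0, 0) where nobody is classified as positive.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented example: 1 = positive class, scores are a model's confidence.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
print("ROC points:", curve)
print("AUC:", auc(curve))
```

A model that ranks every positive above every negative gives an area of 1, while a model that gives everyone the same score collapses to the diagonal and an area of 0.5, which is the random baseline drawn on the slide.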
For example, these are the same results, the same models, looked at in terms of accuracy, and there's a dotted line at 90%, which is the accuracy of the trivial model, because in this case we're trying to classify the top 10% psychopathic individuals versus the 90% normal. So if you assign everyone as normal, every single person, so the model is as trivial as possible, you're going to get 90% accuracy, which sounds good until you remember you're missing all the actual positive instances. You can see that the Kaggle winner and the Kaggle random forest benchmark are very close to 90% in terms of accuracy, but they don't necessarily do what you want; they don't necessarily capture the actual positive instances. For example, this is the confusion matrix for the winning model. You see that of the 125 individuals who are considered to be psychopaths here, only two were correctly predicted to be psychopaths. The other ones were all false negatives, people who were really positive instances but were labeled as negative. On the other side, there were no false positives. So it always is a matter of finding the threshold that balances your errors the way you want them to, because you can never have no false positives and no false negatives; you need to decide where you want to draw the line.
My own research has focused on a slightly more difficult problem: rather than looking at the top 10% of individuals as being psychopaths, looking at the top 1.4%. That's a very small number. The data set we're using had about 2,914-ish instances, plus or minus, close enough. Only 41 of those were considered psychopaths from this data set. So the goal is to build a model which could actually detect, with some accuracy, some performance, that very small number of individuals while minimizing as much as possible our false positives and false negatives. One way to try to alleviate the problem of having this hugely imbalanced data is to use what we call sampling.
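As a worked example, the winning model's confusion matrix can be turned into the rate-based metrics mentioned earlier. The true positive, false negative and false positive counts below come from the talk; the true negative count is not stated, so 1,125 is an assumption, chosen so that the 125 positives make up 10% of the set.

```python
import math

# From the talk: 125 actual positives, 2 predicted correctly, no false positives.
tp, fn, fp = 2, 123, 0

# ASSUMPTION: the number of true negatives is not given in the talk.
# 1125 is used here so that the 125 positives are 10% of 1250 instances.
tn = 1125

tpr = tp / (tp + fn)                      # true positive rate
tnr = tn / (tn + fp)                      # true negative rate
arithmetic_mean = (tpr + tnr) / 2
geometric_mean = math.sqrt(tpr * tnr)
accuracy = (tp + tn) / (tp + fn + fp + tn)

print("TPR:", round(tpr, 4))
print("TNR:", round(tnr, 4))
print("arithmetic mean:", round(arithmetic_mean, 4))
print("geometric mean:", round(geometric_mean, 4))
print("accuracy:", round(accuracy, 4))
```

Under that assumption the accuracy sits just above 90%, while the true positive rate is under 2% and the geometric mean is pulled down with it. That gap is why the geometric mean is a useful way to flag a model that is extreme on one side at this particular threshold.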
And the principle here is very simple. Your data set is imbalanced. You want to fix it to make it balanced. And a simple technique for this is just to either randomly remove instances from the majority class, i.e. remove negative instances until the balance is more like what you want, or to duplicate extra copies of the minority class instances, the positive instances. And now, it sounds very naive. You're just throwing away data. You're just duplicating data. It actually works extremely well. We've done research comparing this to other techniques that are more algorithmically intensive, and this actually works extremely well. And the one other benefit of under-sampling specifically, which is what was done in my research with Chris here, is that it also reduces the data set size. So you can get your results faster because you're not working on 2,900 instances all the time. You're working on a smaller number because you've thrown away some of that data. Now obviously, if you throw away data like that, you're going to have a weaker model. It's not going to be as accurate. One way you can try to fix that is called ensembles. Two heads are better than one. So you want to take these multiple weak models and put them together to give you a stronger, more powerful resulting model. So one trick here is that if all the weak models are just the same, adding them together does nothing. You need to have this diversity among the models so that even if one only captures part of the data, one only captures certain ways of identifying psychopaths, you put all these models together and you're able to get an ensemble model that will actually give you accurate results. So one technique we developed is called RUSBoost, Random Under-Sampling Boost. This combines random under-sampling, which I discussed earlier, with boosting, which is an established form of ensemble classification.
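The two sampling options described above, dropping majority instances or duplicating minority ones, can be sketched as follows. This is a minimal illustration, not the code used in the research: the function names, the 50/50 target balance, and the use of integer IDs as stand-in instances are all assumptions. The class counts (41 positive out of roughly 2,914) are the approximate figures quoted in the talk.

```python
import random


def random_undersample(instances, labels, minority_fraction, rng):
    """Randomly drop majority (label 0) instances until the minority
    class makes up roughly `minority_fraction` of the data."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_neg = round(len(pos) * (1 - minority_fraction) / minority_fraction)
    keep = pos + rng.sample(neg, min(n_neg, len(neg)))
    rng.shuffle(keep)
    return [instances[i] for i in keep], [labels[i] for i in keep]


def random_oversample(instances, labels, minority_fraction, rng):
    """Randomly duplicate minority (label 1) instances until they make up
    roughly `minority_fraction` of the data."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(len(neg) * minority_fraction / (1 - minority_fraction))
    extra = [rng.choice(pos) for _ in range(max(0, n_pos - len(pos)))]
    keep = pos + extra + neg
    rng.shuffle(keep)
    return [instances[i] for i in keep], [labels[i] for i in keep]


# Approximate class counts from the talk; instances are just IDs here.
labels = [1] * 41 + [0] * 2873
instances = list(range(len(labels)))
rng = random.Random(0)  # fixed seed so the sketch is repeatable

under_X, under_y = random_undersample(instances, labels, 0.5, rng)
over_X, over_y = random_oversample(instances, labels, 0.5, rng)
print("under-sampled size:", len(under_y), "positives:", sum(under_y))
print("over-sampled size:", len(over_y), "positives:", sum(over_y))
```

Under-sampling to a 50/50 balance leaves only 82 instances, which is why it is fast and also why a single model built on it tends to be weak.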
The premise of boosting is you build a model and then you look at which instances, which people, were classified correctly and incorrectly. And you give extra weight to those who were classified incorrectly and reduced weight to those who were correct. Then you build a model again. And you iterate this process multiple times, each time building the model and changing the weights so that those that were classified incorrectly are more likely to be classified correctly in the next round. Then once you have all of these models, you put them together into your ensemble, weighted by the accuracy of each model, to take different views of the data. So each of these individual models will only classify certain instances well, but collectively, weighted by how accurate they are, they'll give you good performance over all of the instances. And RUSBoost is a variant on this which applies random under-sampling within each round of the boosting to provide an additional source of variation and also to help the models work better because they have more balanced data. So the model building process that would normally fail with a 1.4 percent minority class, suddenly it's not doing so poorly. It's got much more usable data to work with. One other aspect of the data we have here is that there are a number of different features. We discussed features and classes earlier. In this data set we had nearly 100 features, between the profile information like number of tweets, frequency of tweets, number of friends, and then the text information like obscenities, anger, "we", all those text attributes, almost 100 of them collectively, but they're not all going to be equally useful. So feature selection will rank them by their usefulness towards the class and then say, I only care about the top 20 of these. The rest of them are all irrelevant.
And this will help reduce your data set size, because you're using many fewer features to build your model, and it can also give you more accurate results, because the irrelevant information has been thrown away. So the combined model we're using, with all of these techniques together, is called SelectRUSBoost: Select Random Under-Sampling Boost, which adds feature selection. I'm going to talk a little bit about how this works, trying to go quickly. Unfortunately I can't shoot the laser around the corner, so the red dot is where the laser pointer would be if I had one. So the model starts with your data, and you have an initialized weight vector where all the values are equal. Everything has starting weights equal to each other, because you don't know anything initially. Then you're going to repeat this algorithm n times. In this study n equals 10, so basically you're going to create 10 models. Each time, you first use random under-sampling to sample your data, to reduce it down and make it more balanced, using the weights as well for that. Then you perform your feature selection on your sampled data to figure out which features are most useful, from that data, toward identifying psychopaths. Then you actually build the model, and you're going to do two things. You see two lines coming off of this. You're going to save a copy of this model for the final ensemble, but you're also going to evaluate it to say how good this model is based on what you have. You store that weight parameter on the side as well and use it to figure out what your new weight vector should be, to update which instances were correct and which instances were incorrect, so that when you repeat the entire process, starting back at the sampling, you're going to focus more on the instances that were classified incorrectly.
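The rounds described above (sample, select features, build a model, evaluate it, update the weights, and finally combine the models by their weights) can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the speakers' implementation: the weak learner is a one-feature decision stump, the feature ranker is a plain weighted class-mean gap, negatives are sampled uniformly rather than by weight, the weight update is a basic AdaBoost-style rule, and the toy data is synthetic. All function names are hypothetical.

```python
import math
import random


def train_stump(X, y, w, features):
    """Weighted decision stump: best (feature, threshold, polarity)."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[f] >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    _, f, t, pol = best
    return lambda row: pol if row[f] >= t else -pol


def rank_features(X, y, w, k):
    """Keep the k features with the largest weighted class-mean gap
    (a stand-in ranker; the talk does not say which one was used)."""
    def gap(f):
        wp = sum(wi for yi, wi in zip(y, w) if yi == 1)
        wn = sum(wi for yi, wi in zip(y, w) if yi == -1)
        mp = sum(wi * r[f] for r, yi, wi in zip(X, y, w) if yi == 1) / wp
        mn = sum(wi * r[f] for r, yi, wi in zip(X, y, w) if yi == -1) / wn
        return abs(mp - mn)
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def select_rus_boost(X, y, rounds=10, k=2, seed=0):
    """Labels are +1 (minority) and -1 (majority)."""
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                      # equal starting weights
    pos = [i for i in range(n) if y[i] == 1]
    neg = [i for i in range(n) if y[i] == -1]
    ensemble = []
    for _ in range(rounds):
        # 1. Random under-sampling: all positives, equally many negatives.
        idx = pos + rng.sample(neg, min(len(pos), len(neg)))
        Xs, ys, ws = [X[i] for i in idx], [y[i] for i in idx], [w[i] for i in idx]
        # 2. Feature selection on the sampled data.
        feats = rank_features(Xs, ys, ws, k)
        # 3. Build the weak model on the selected features.
        h = train_stump(Xs, ys, ws, feats)
        # 4. Evaluate on all the data and derive the model's weight.
        err = sum(w[i] for i in range(n) if h(X[i]) != y[i])
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # 5. Raise weights of misclassified instances, lower the rest.
        w = [w[i] * math.exp(-alpha * y[i] * h(X[i])) for i in range(n)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(row):
        # Final ensemble: weighted vote of all saved models.
        return 1 if sum(a * h(row) for a, h in ensemble) >= 0 else -1
    return predict


# Synthetic toy data: feature 0 separates the classes, features 1-2 are noise.
rng = random.Random(1)
X = [[2 + rng.random(), rng.random(), rng.random()] for _ in range(10)] + \
    [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
y = [1] * 10 + [-1] * 200
predict = select_rus_boost(X, y, rounds=10, k=2)
print("training errors:", sum(predict(r) != t for r, t in zip(X, y)))
```

The toy data is deliberately easy so the sketch stays short. On real, overlapping data the reweighting in step 5 is what pushes later rounds toward the instances earlier models got wrong.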
And finally, once you've done this as many times as you want, you combine all of your models into the final model, based on the weights, to give you the classification values for your instances. So the results of using both RUSBoost and SelectRUSBoost on the data are as follows. With six of the seven underlying modeling techniques we're using, we found that RUSBoost was better than using no boosting. However, with SelectRUSBoost, we found that it was always an improvement. We always got better AUC values, better true positive and true negative rates, than when we had no boosting. So this shows the importance of these ensemble techniques: when you have such complicated data, such imbalanced data, individual models are going to be weak and not as powerful as you'd like. You need to use ensemble approaches to combine them and create more accurate models. So we also performed ANOVA analysis and Tukey analysis to verify that using boosting, using these ensembles, is statistically significantly better, with an alpha of five percent, than not using an ensemble. So this is validated statistically. And the best results we had, using the best choices of modeling techniques, feature selection, number of features, et cetera: we got a true positive rate of 0.707, a true negative rate of 0.719 and an AUC, area under that ROC curve, of 0.746. And to put that into confusion matrix terms, you can see that of the 41 individuals who are really positive, 29 we got correctly. That's 29 individuals, I should say, not percent. 12 individuals weren't correct, unfortunately. Those are false negatives. And we had 808 false positives, i.e. people who were not actually psychopaths but who the model nonetheless said were psychopaths. And this underscores the problem that you're never going to eliminate both false positives and false negatives. You need to make sure that when you use the model, you deal with that fact.
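The rates quoted above can be cross-checked against the confusion-matrix counts with a few lines of arithmetic. The true positives, false negatives and false positives come from the talk. The total of 2,914 instances is the approximate figure given earlier, so the true-negative count derived from it is an assumption. Precision (the share of flagged people who really are positives) is not quoted in the talk and is computed here only for illustration.

```python
# Counts stated in the talk.
tp = 29    # psychopaths correctly flagged
fn = 12    # psychopaths missed (false negatives)
fp = 808   # non-psychopaths flagged (false positives)

# Assumption: the "about 2,914" total quoted for the data set.
total = 2914
tn = total - tp - fn - fp   # everyone else is a true negative

tpr = tp / (tp + fn)         # true positive rate (recall)
tnr = tn / (tn + fp)         # true negative rate
precision = tp / (tp + fp)   # share of flagged people who are real positives

print(f"TN = {tn}")
print(f"TPR = {tpr:.3f}, TNR = {tnr:.3f}, precision = {precision:.3f}")
```

Under that assumed total, the derived rates round to the 0.707 and 0.719 reported, while the precision shows that only a few percent of the people the model flags are actual positives.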
And I think Chris had some additional comments to make about that, and concluding remarks.
Thanks very much, Randall, much appreciated. I mean, this is one of the points that we've seen. There's been a number of news articles over the last couple of years about how personality can be predicted from Facebook to within 10 percent, but, you know, that really falls into the whole accuracy issue which Randall described earlier. So, okay, you're identifying 29 out of 41, but you've also pulled in 808. You know, what possible use is that? So, first of all, 808 is a sort of high number of false positives, but if you combine that with closed source data, such as criminal records or backgrounds, genetic history, or what have you, you're probably going to be able to lower that margin quite considerably. There's research from a gentleman called Mitja Back, who's looked at combining multiple measures of personality and in all cases increased the correlation coefficient significantly, so improving the accuracy of personality prediction. So, if you had access, let's say you're, you know, the FBI or what have you, I'm imagining, then you'd probably be able to weed down those 800-and-something false positives even further. So, I mean, how could that be of use? Well, you know, if for example you're in the FBI and you've got somebody, I don't know, in custody, having an idea about whether they may be psychopathic or not might influence the decision to use somebody who's been through a certain interview technique or something like that. It's a possibility. I don't know. I was kind of reading a little bit about this on the FBI site, and you can read more about that as well. If you just Google sort of "FBI psychopathy", you'll probably, you know, find this, or I can give you the link in the Q&A, but I think it's probably more useful really for observing changing levels of anti-social personality traits, whether they are changing or not.
And depending on your viewpoint, that's either going to be a good or a bad thing. And I'm not going to put a position forward that says I think it's good or bad. I'm really just saying I think it's interesting and requires more research and a lot more discussion. If you want to read about those anti-social changes in society, then this is a really great book to have a look at as well, Unmasking the Psychopath, which talks about social crisis psychopathy, and there's a section in there where a gentleman in 1830-something was talking about the impact of industrialization causing a moral insanity, in brackets, psychopathy. And I wonder whether, you know, we're seeing changes in psychopathy or anti-social traits as well. In the more agentic countries like the US and the UK, which are more individualistic, you see higher levels of psychopathy than you do in communal countries like Taiwan, for example. So here in the US and in the UK, you've got that sort of one to five percent, depending on the population that you measure. But in Taiwan, you're seeing that as point one or so. You know, so you're seeing a lot less in communal countries, but we don't really know why. So from a research perspective, I think it's a fascinating area of research, if you can just put the more sensational headlines aside and look at that. So it might help us understand a little bit more about this whole concept of cyberbullying, which is, you know, probably not very well understood really, but it's going to have a massive impact on the next generation of people coming up. My little boy, for example. You know, how do we understand the impacts on our young people? And of course, as Jeff Hancock had mentioned in his paper, knowing a little bit more about the inner workings of sort of the psychopathic or anti-social mind probably helps us understand that, and might even help us in slightly other contexts. It's important to mention a couple of the limitations that we've had.
Obviously, it's a self-assessment. So you could argue that people have gamed that maybe, but, you know, the research, at least on a broad scale, is consistent with other papers. It's not a widely used assessment, so we didn't use the PCL-R or PCL:SV, for example. We used a relatively new test by Del Paulhus at British Columbia called the Short Dark Triad. You could argue that there's a selection bias, because we got, you know, a tweet from Stephen Fry, which bumped our numbers up from 24 participants to just over 2,000 overnight. So you could argue that that influenced it. And also there's the linguistic analysis of Twitter. I spoke to the gentleman who created that linguistic program, James Pennebaker, at the University of Austin, or University of Texas in Austin, one of the two. And the language that people use in social media and on Twitter is obviously going to be different in some ways. So those are limitations which research would have to overcome in a variety of contexts when looking at social media. Overall, really, it's important to keep in mind that the research in social media is really at the sort of tip of the iceberg. And we as a group at DEF CON, for example, have a real opportunity to partner with people who are studying psychology or social sciences and help add the data mining, but we also have an important part to play in terms of generating discussion about the social implications for privacy, unfair invasion of privacy, particularly with those 48% of hiring managers. Is that ever acceptable? So that's kind of the residual question. We're showing that we can determine things about personality, and the evidence suggests that the more we know about people, the more different measures we have, the better accuracy we're going to get, but we haven't really tackled the question of should we, and that's a question or a thought I'd like to leave with the audience.
So I appreciate everyone turning up. Thank you very much for your time. We'll be in the Q&A afterwards. Thank you.