... shine it, and the answer was clear.

So why would you want to stalk someone? Well, there's online dating. Tinder. It's so easy: you just need to know someone's name, their profession or something related to what they do, and a city, and you can pretty much find their LinkedIn. You can get to their Facebook or their Twitter, or, if you're trying to date an 18-year-old hipster, maybe their Tumblr. I don't know.

But that's one-to-one stalking. What about stalking on a grand scale? I have to say, this talk is not going to be about how you can stalk your lover. It's going to be about research projects at universities where people take all this publicly available data and try to infer things about people. Some of the results are a bit creepy, but also very cool, and I think they give you an impression of what you can learn from this data.

The first project I'm going to talk about I call "you are what you like". What Kosinski and colleagues did was take the Facebook likes of a lot of users and try to infer personal traits from them. This ranges from fairly easy things, like gender and ethnicity, which are not too hard to infer, to creepier ones: sexual orientation, whether you take drugs or smoke, and very weird ones like whether your parents were still together when you were 21. A bit strange.

The cool thing is that a lot of these predictions were very accurate. The first two I mentioned, gender and ethnicity, were very, very accurate, and across all the personal traits they tried to predict, I think the lowest accuracy was about 63%, which is nice. Well, "nice". They also tried to predict more subjective things, such as whether you are an open person or a competitive person. I'll just talk about their method very quickly, and if you
want more detail, you can refer to the paper, or ask me during the questions.

Essentially, they have data from users, namely their Facebook likes, and from the same people they also have results of tests they took, such as psychometric tests and IQ tests. These serve as the validation labels. So how is it done? You have a huge matrix with all your users along one side and all the likes along the other. It's a zero-one matrix: if you like something, there's a one in that entry; if not, there's a zero. They then apply a singular value decomposition to collapse the dimensions of this matrix, and classify on the reduced representation: linear regression for continuous variables and logistic regression for categorical variables. That's the most technical thing I'll be talking about.

I thought it would be interesting to show you some of the likes that were the strongest predictors of the labels. If you find yourself liking RPGs, fan fiction, programming and anime, basically all the stuff I like, it turns out you might be shy and reserved. That's not that strange, right? If you like these things, which sounds like a pretty random list, you might be spontaneous, and also a serial killer. If you like Screamo, "Not dying", "It's awkward if you hate everyone", "Hate you", The Police and "When I totally did not write that down", you might be a competitive person. And this is my favorite, the last one: if you like "walking with your friends and randomly pushing them into someone or something", you might not have a lot of friends.

The complete list is at the link. Check it out, it's actually very fun and there are a lot of interesting things there. Again, refer to the paper for more information.

The second thing I'm going to talk about is inference from mobile data, from cell phones. Cell phones are used to communicate, but
they're also pretty much a tracking device. I'm going to talk about two studies.

In the first, they tried to infer relationships between people purely from cell phone data, such as call logs and location logs: whether two people are from the same family, are friends, or are co-workers. They represent this as a partially labeled graph. Sometimes people actually label these connections; on my phone, for example, my brother is saved as "brother", so you could get the label from that. But most of the time nothing is labeled. The study tracked 107 users over 10 months, and their best predictor reached an accuracy of around 83%, which is pretty neat.

The second study also uses mobile data. Here, they asked people who work in the same place whether they are friends with their co-workers, and then tried to infer those answers from the phone data. If that's not clear, please shout. What data did they have? Call logs, cell tower IDs, and other devices in proximity, tracked over nine months, plus the survey in which people said whether each co-worker was their friend or not.

There are some interesting results from this. This slide is a bit hard to see, but essentially they defined variables to capture the raw data: proximity between people on Saturday night, phone communication, proximity at work, proximity outside of work, and so on. There are seven variables; again, you can find more in the paper. They then express these relative to a baseline: they took the pairs in which both people said they were friends, and treat those values as the maximum.
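To make the baseline idea concrete, here is a minimal sketch of that normalization, assuming it amounts to expressing each variable's average for a group of pairs as a percentage of its average for pairs where both people said they were friends. The variable names and every number below are invented for illustration; they are not the study's data, and the study's seven variables are not reproduced here.

```python
def relative_to_baseline(group_means: dict[str, float],
                         baseline_means: dict[str, float]) -> dict[str, float]:
    """Express each variable as a percentage of the mutual-friends baseline.

    100.0 means "same as pairs where both people said they were friends";
    values above 100.0 exceed that baseline.
    """
    return {
        var: 100.0 * group_means[var] / baseline_means[var]
        for var in baseline_means
    }


# Hypothetical per-pair averages (illustrative only, not from the study).
mutual_friends = {"calls_outside_work": 10.0, "saturday_night_proximity": 4.0}
mutual_non_friends = {"calls_outside_work": 0.4, "saturday_night_proximity": 0.1}
one_sided = {"calls_outside_work": 3.0, "saturday_night_proximity": 5.0}

print(relative_to_baseline(mutual_non_friends, mutual_friends))
# {'calls_outside_work': 4.0, 'saturday_night_proximity': 2.5}
print(relative_to_baseline(one_sided, mutual_friends))
# {'calls_outside_work': 30.0, 'saturday_night_proximity': 125.0}
```

The made-up one-sided pair shows the shape of the pattern described in the talk: low on one variable, yet above 100% of the friends baseline on another.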
And then you express all the other variables in terms of this baseline. For instance, if both people say they're not friends, their communication is less than five percent of what it would be if they were friends. That's the idea.

Some pretty interesting things come out of this. First, you can predict proximity at work by looking at proximity at home. You can also find the two best indicators of whether people are friends or not: just by looking at communication outside of work, you can determine pretty accurately whether two people are friends.

So the picture is that you can tell whether people are friends or not. But the awkward case, when one person says they are friends and the other does not acknowledge it, is interesting: some variables sit around the not-friends level, while others actually exceed the friends baseline. This could just mean that friendship is not a categorical variable and there are different levels of friendship, or it could reflect cultural differences. Or, in scientific jargon, this is what we like to call the friend zone.

No, but in all seriousness, I think this is pretty interesting. It's sort of obvious: if you're friends with someone, you probably communicate with them a lot more. But the paper goes on to look at other things. They also try to determine which parts of this data matter most for determining friendship, and they relate it to satisfaction at work: if you have a lot of friends at work, maybe you are more satisfied. For more information, look at the paper.

So what happens with all this? There are
websites, one of them literally called You Are What You Like, where you log in with your Facebook account; it looks at your likes and tries to predict what sort of person you are. There's also something a bit creepier called Please Rob Me, which looked at your Twitter and your Foursquare and tried to determine whether you're at home or not. Slightly weird; I think it has closed down, I hope. And then there's the ultimate creepy thing, which is actually not machine learning, but I felt the need to show it: a program where you enter someone's Instagram, Flickr and Twitter usernames, and if they geotag their posts, it gives you the history of where they've been. You get a map with all the timestamps. Really creepy, and sort of awesome.

So what's the take-home message from this talk? Well, when you go on an online date... no, just kidding. Be mindful of how much information you leave out there, I suppose. And also, machine learning is really cool; you can do a lot of stuff with it. That's it, thanks a lot for listening. Follow me on Twitter, find me on Scriptogram, or talk to me in real life. Any questions? Yes?

Hi. I was reading an article a few weeks ago about a person who had basically liked every single thing on Facebook, and Facebook got really confused about what to show them in their feed. I was interested in how those kinds of very simple DIY hacks against machine learning relate to the research you've been describing.

Yes, of course you're going to have... So first, I think the data set they used here is a clean one, so if someone looked at that data point,
the person with all these likes, they would probably label it as an outlier. I'm not really aware of how Facebook actually does its machine learning; I'm pretty sure they mostly try to show you things like articles your friends liked. But yes, I suppose you can easily confuse it as well. I'm not really sure. I suppose that if that person's profile entered these data sets, you would easily be able to rule it out as an outlier, because it wouldn't give you any information. I hope that answered the question.

Thank you very much.
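Tying the answer back to the method from the first study, here is a minimal sketch, assuming NumPy and a tiny invented like matrix. Users whose like count is an outlier are dropped (here with a modified z-score on the median absolute deviation, a swapped-in choice that neither the talk nor the paper specifies), the remaining zero-one user-by-like matrix is reduced with a truncated SVD, and logistic and linear regression are fit on the reduced features. The function names, the cutoff of 3.5 and the gradient-descent settings are illustrative assumptions; a real pipeline would use a sparse or randomized SVD and a held-out test set.

```python
import numpy as np


def keep_mask(likes: np.ndarray, cutoff: float = 3.5) -> np.ndarray:
    """Boolean mask of users whose like count is NOT an outlier.

    Modified z-score: 0.6745 * (count - median) / MAD. This outlier rule is an
    illustrative assumption, not the one used in the study.
    """
    counts = likes.sum(axis=1)
    med = np.median(counts)
    mad = np.median(np.abs(counts - med))
    if mad == 0:  # no spread at all: nothing to flag
        return np.ones(len(counts), dtype=bool)
    return np.abs(0.6745 * (counts - med) / mad) <= cutoff


def svd_features(likes: np.ndarray, k: int) -> np.ndarray:
    """Project users onto the top-k singular components of the like matrix."""
    u, s, _vt = np.linalg.svd(likes, full_matrices=False)
    return u[:, :k] * s[:k]


def with_intercept(x: np.ndarray) -> np.ndarray:
    """Append a column of ones so the models learn an intercept."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_linear(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares linear regression, for continuous traits."""
    w, *_ = np.linalg.lstsq(with_intercept(x), y, rcond=None)
    return w


def fit_logistic(x: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, steps: int = 2000) -> np.ndarray:
    """Logistic regression by batch gradient descent, for binary traits."""
    x1 = with_intercept(x)
    w = np.zeros(x1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x1 @ w)))
        w -= lr * x1.T @ (p - y) / len(y)
    return w


def predict_proba(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Predicted probability of the positive class for each user."""
    return 1.0 / (1.0 + np.exp(-(with_intercept(x) @ w)))


# Invented toy data: 7 users x 10 pages. Users 0-2 like pages among 0-4,
# users 3-5 like pages among 5-9, and user 6 likes every page.
likes = np.zeros((7, 10))
for user, pages in enumerate([[0, 1], [0, 1, 2], [1, 2, 3, 4],
                              [5, 6], [5, 6, 7], [6, 7, 8, 9],
                              list(range(10))]):
    likes[user, pages] = 1.0

mask = keep_mask(likes)  # user 6 is flagged as an outlier and dropped
features = svd_features(likes[mask], k=2)
trait = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # hypothetical binary label
w = fit_logistic(features, trait)
print(mask)
print(predict_proba(w, features).round(2))
```

`fit_linear` would be used the same way for a continuous trait, with one score per kept user in place of the binary labels.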