Hi. So I'm Heather Nolis. I am a machine learning engineer. And this is Jacqueline Nolis, a data scientist, and we are going to talk to you about a very interesting project that we had last year. But before I get started, this is a talk about sex. If sex is going to make you uncomfortable, go ahead and leave. And it's not just sex, it's other people's sex lives, which can get really interesting. So if that makes you uncomfortable at all, please leave. If at any point you are uncomfortable, please leave. We blacked out the worst of it, and everything's consensual. There is a reference to bestiality. But if that's okay with you, it's okay with me, and we'll keep going.
So I bet you're wondering how we ended up giving a talk about sex journalism. What happened was, periodically, The Stranger, which is an alternative newspaper in Seattle, does a sex survey. And it's super unscientific. They just put out a SurveyMonkey link, and they have a bunch of multiple choice questions, and people give their various responses to this survey. And like I said, incredibly unscientific. No researcher designed this survey. Strange questions. They designed it to be interesting to take.
And so, Jacqueline and I, we are married lesbians, and we own an artificial intelligence consulting company. And coming up through the sciences, I've seen what happens when you give certain people the power to pick apart a data set to really amplify differences. So I've read research papers about how the brains of gay people are different from the brains of straight people. And I'm totally uninterested in that, and have no faith in the journalism complex to handle this concept with dignity. So, as a pair of married lesbians, and my wife is trans, we were like, well, I think that we can handle this data responsibly. And so we saw this one fateful tweet, and we were like, if anybody can do some good data analysis on it, it's us.
So we literally just went to The Stranger's website and wrote them an email being like, please give us all of this private data about people's sex lives, I promise we're trustworthy, and hoped for the best. And so, getting the data was really strange. It started with a few emails where we had a link to our website. See, I promise that we're real people and we're not creepy. And then it moved on to phone calls where they were like, well, we have somebody who's going to do really basic analytics on this. And I was like, but we do data science, and I think it will be more fun if we can do an additional article that applies some machine learning techniques to the stuff that's happening. And then we had to have an in-person meeting where they just looked us in the face and were like, okay, you guys don't look like skeezeballs. Great, we finally trust you. And then they handed us the single fateful flash drive with a single SPSS file on it. And they were like, here you go.
So then panic sets in. What are we going to do with all of this data? And so we looked at it. The survey was about 50 questions, and 8,000 people responded. And again, they made it entertaining to take and not necessarily entertaining to analyze. So some of the questions would be, if you were going to commit a lewd act with a CEO of a major corporation, who would it be? Like, FMK style, who would you pick? Which for most people ends up random. There's not a whole sector of people who are like, yes, it's Jeff Bezos and he's the love of my life. So it was a really interesting thing. And then some of the answers were actually free form responses where people just got to type whatever they wanted, which sometimes would be one word, and sometimes it would be 4,000 words, because people get really into telling you about their sex lives. So we didn't really know what approach to take with this because, again, it wasn't designed for analysis. It was designed to be interesting to take.
So we did the classic technique of spaghetti on the wall. We're just going to throw analysis, throw analysis, see what sticks, and follow those threads. So Jacqueline's gonna talk to you a little bit about our spaghetti.
All right, so the key point is we had 50 different features in a data set and 8,000 respondents. There's all sorts of things we could do, but there was no clear objective, except we needed to write an article that was interesting for The Stranger. So we tried lots and lots of things. Here are five things we tried that are barely even worth mentioning in this deck. I just want to put them on the board so you guys are aware that we tried lots and lots of stuff. And most of the stuff wasn't interesting at all, for a number of reasons. And what I'm going to go through now is five things we tried that were interesting. They didn't all make it to print, but they were interesting. And what we learned about doing data science on this data. So here we go.
Idea one, sexy stories neural network. So the question was, what's the sexiest thing you did last year? And this is free form text. And there's lots of interesting neural networks out there that you can train to come up with new Pokemon names or new Jane Austen novels. And so we thought, well, what if we train a neural network to come up with new sexual experiences? This will be funny, people will laugh. Well, here's the result. The generated things were less fun than the actual answers. For instance, "to have sex with a guy who had sex with a" just isn't that interesting of a result. Nothing could beat what people actually wrote. And this took hours of programming in R with TensorFlow to get this neural network even to this point, and the payoff for the joke wasn't worth it.
Idea two, dating apps and porn. So there are two questions. One was, the people I have sex with, I usually meet through, and it was multiple choice. You could select multiple.
And the next question was, how many hours of porn do you watch per week? And we said, well, some of these apps, maybe they feel a little lazier, so is there some sort of correlation between the dating apps you use for finding sex and how much porn you watch? And so this ended up being a bad result. What we did was we ended up using a regression to try and figure out what was important for predicting porn use. And while it did find significant things, like if you use Grindr, you use more porn, the definition of importance we used, because it had to do with regression, just ended up being too confusing for the editors of The Stranger. And so while the data was interesting, the result that we got just didn't end up making sense to other people.
Idea three, related kinks. So there are two questions that were relevant. One was, what activities have you done, with all sorts of sexual acts. And another question, what kinks are you into, with a whole bunch of possibilities. And these were great because each one of these had about 30 answers they could give yes or no to, which meant there's a lot of relationships within each question. So on the right, I guess it's a little small, but you can see a list of some of the kinks people could pick, like pain, latex. I don't want to list any more because I feel uncomfortable reading this. And we really wanted to know, how do these things relate? So the first thing we tried was we made a network where we put edges between related activities. And there's a lot of things blacked out here. When we were making this presentation, it kind of felt like the Mueller report, just how many times we had to put black bars everywhere. But you can see that there are some interesting things. At the top, you can see people who have cheated and have been cheated on are close to each other. "Had sex while watching Game of Thrones" is disconnected from almost everything.
So there are two kind of clusters of black boxes, one on the lower right and one on the upper left. These are both related to the gender of the person doing the act. And so there were some relationships in there, but it ended up being largely confusing. So we didn't go with this, but it did inspire us. Well hey, let's actually take that data and try doing a k-means analysis on it. And so here what we did was we took the answers people gave for the list of kinks, and we did just a traditional clustering analysis. And we ended up finding that there were four clusters that were really interesting. There was the cluster of people who have really no kinks listed at all. We have the group of people who really are into spanking, blindfolding, BDSM. That's a second group. The third group is people who like to show off. So they're into group sex, exhibitionism, voyeurism. And the last group, which we dubbed the kitchen sink, is for people who want all of it. So this was quite clear to the newspaper editors. They thought it was great, to the point that the actual title of the article that ultimately got written about our work was based on that graph. So that ended up being good.
OK, idea four, checking out sexting. So there were two questions about sexting. One question: when I'm sexting, I'm most likely to send a picture of my blank, select one. And: when I am sexting, the thing I want to receive from the other person is a picture of their blank, select one. So the thing about The Stranger is they were already going to publish each question individually. So just looking at the raw questions on their own, we wouldn't add any value. But this is an obvious opportunity for us to take two separate questions and put them together in a clear way and get an interesting result. So how did that look? So we made a Sankey diagram, which might even be the weirdest Sankey diagram ever made.
And on the left, we have what people like sending. And on the right, we have what people want to receive. And if you look, at first, we were like, well, this is just as complicated as that network we couldn't use. But then we realized there are like eight funny things in here. And in particular, Heather noticed that one of the things you could select was that you don't send sexts. But about half of the people who said they don't send sexts actually listed things they want to receive. We quickly dubbed them selfish sexters. And Heather had a hunch that these are probably cis men, cis straight men. And it turns out that we looked into it and that was right. Men just tended to be more selfish than women, which we thought was a fascinating finding.
Idea five, cross tabs with some of the funny questions. So as Heather pointed out, there were some questions that really were put in just for the benefit of the person taking the survey. So for instance, one of the questions was, even though bestiality is illegal in Washington state, if I had to do something sexual with an animal, I would choose a blank, and a list of answers. There's that bestiality one Heather mentioned earlier. The other question is, what would you like your sex act with Donald Trump to be? And then you had a list of things involving pee. So we looked at these and we thought, these are really funny questions. What if we could find relationships between these questions and other questions? And so we looked into this, and both of these failed for two different reasons.
For the bestiality question, everyone picks centaur. And so this is really small, but that bar graph is, for all the different dating apps you used, how likely are you to say centaur? We figured, oh, the people who only hang out in coffee shops to get dates are most likely to say centaur because that's boring. That ended up being true. But they are all so ridiculously similar because just everyone wanted a centaur.
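Both the Sankey diagram and the cross tabs described above come down to counting pairs of answers across two questions. Here is a minimal sketch of that counting step in Python. It is not the speakers' code (they mention working in R), and the answer labels, the toy rows, and the function names `sankey_flows` and `selfish_share` are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy responses: (what I send, what I want to receive).
# Labels and rows are invented for illustration; this is not survey data.
responses = [
    ("face", "face"),
    ("face", "body"),
    ("nothing", "nothing"),
    ("nothing", "body"),
    ("body", "body"),
    ("nothing", "face"),
]


def sankey_flows(pairs):
    """Count each (send, receive) pair.

    Each count would be the width of one link in a Sankey diagram,
    or one cell in a cross tab of the two questions.
    """
    return Counter(pairs)


def selfish_share(pairs, none_label="nothing"):
    """Among people who send nothing, the fraction who still want to receive something."""
    wants = [recv for send, recv in pairs if send == none_label]
    if not wants:
        return 0.0
    wanting = sum(1 for recv in wants if recv != none_label)
    return wanting / len(wants)


flows = sankey_flows(responses)
share = selfish_share(responses)

for (send, recv), count in sorted(flows.items()):
    print(f"{send} -> {recv}: {count}")
print(f"share of non-senders who still want to receive: {share:.2f}")
```

The same pair counting works for any two single-select questions, such as dating app against animal choice. For a multi-select question, each respondent would first need to be expanded into one row per selected answer.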
For Trump, the data was more distinct. Like really, there was a good distribution of answers. But we couldn't find meaningful relationships to other data. So for instance, we again tried to use a logistic regression to figure out what would predict someone saying they would pee in Donald Trump's mouth. And of all the different kinks, almost none of them were statistically significant, except for pain and group sex, and we could not figure out what that meant. So with that, we went through, we came up with some of the things that we thought were interesting enough to be in a newspaper article, and it was time to go to print. And so back to Heather.
Okay, so we finished the analysis, which actually just involved a lot of emailing back and forth with the editors and being like, I found this, is this cool? And they were like, no, it's not. Try again. Okay, sure. And then we would have, as Jacqueline showed, these beautiful, well, they would become beautiful graphs if we wanted them to be. And still, it was too confusing for the average reader of The Stranger, which is a free newspaper in Seattle that you can pick up as you exit the grocery store, to really parse. And so what we ended up doing is we had three major visualizations and then a bunch of supporting facts. Because, for instance, people over the age of 55 are twice as likely to have had sex on poppers. It's not a good visualization because we don't care about the other ages. However, it's a very, very interesting fact. And so when we first wrote it, we thought, these are not extremely data literate people who are reading this. And so we were like, well, we'll just make it really funny. To the point where the first line of our article was, "If data is the new oil, then surely these sex survey results are crude," and the editor came back and said, that's way too cheesy. Okay, okay. And then he also said that our article wasn't technical enough.
He was like, just by throwing out terms like you've thrown out in our conversations, it establishes you as an expert. So go ahead and add k-means back in and add the words "Sankey diagram," because even if people have no idea what it is, it makes you look important. Which is a thing that I never thought an editor would say to me. And so we went to print very quickly. We finished writing the article on June 29th and it was published on July 4th in the sex issue. And like I mentioned, there was another article that just included all the survey responses and some excerpts from some really funny stories that people submitted. But then there was us, and we were doing weird data science on the sex information, and it included our company logo and our names. And we were a brand new company at that time, so it was kind of nerve-wracking. This was the first consulting project that we'd done, and it's not the most professional.
And so this was one of the diagrams that we included. Specifically, the word that's blacked out is a word for vulva or vagina, which I really wanted them to print, and they were like, you can't say dick and say vulva or vagina. And so they used a different word that I don't like, so we blacked it out. But this is our final Sankey diagram, and I think it does a really good job of demonstrating the selfish sexters. If you look at the far photo, you can see that by bringing the blue forward you're actually able to see all of these people who are asking for things that they aren't actually sending. And if you notice, compared with our earlier Sankey diagrams, this took a bunch of the smaller details out, so all sorts of body parts and other things that you could have wanted are just completely removed to really simplify the image.
And then even after hours and hours of editing the clustering on our own, they actually went through and made it even more simplified, removing a lot of the features that we considered really essential. And you think, oh, when I do a data visualization I want to visualize all of the data. But that's not what newspaper editors want. They just want the data that seems interesting to them, so we let them cut it down.
So then we went to print and we were like, what's this going to look like? Because for context, Jacqueline and I have a family together. At the time we had like a nine-month-old son. Nolis is the name of our company and our last name, which we made up. We're the only Nolises in the world. So the four beings that have the last name Nolis are me, my wife, my son, and this company. And our first thing going to print is a sex article. So, poor child, in the future, for anybody who Googles his name, his parents' sex article is going to come up. So we were really scared and we kind of braced for impact, and nothing happened. The majority of responses we got were from acquaintances that we had fallen out of touch with, who live across the country and saw our article and were like, I had no idea that this is what you're doing now. Like, this is incredible. And so it just goes to show that you can go and do something really weird and interesting with data that might seem a little bit risque and receive essentially no negative professional feedback, to the point where our coworkers at T-Mobile would be like, hey, bring in a copy of the paper. And there was a copy of the paper going around the office because everybody wanted to read what we did.
And so what did we learn from going on this sex journalism journey where we really tried to be data centric? The first thing that we learned is what happens when somebody says, here's some data, please tell us a story, and we can't do simple aggregation.
We really become the Nolis LLC spaghetti factory, and we just throw every tool in our toolkit at it until something teases out. And then we learned good data science doesn't equal good journalism. So like I said earlier, a lot of our graphs had to be deeply simplified to end up going to print, which I wouldn't consider good data science, but they did make extremely beautiful visualizations. And then the visualizations need lots and lots of editorial cleanup. I'm finishing my master's degree right now, and my professor had to be like, yes, you can go to this conference and miss the class. So he of course read the newspaper article and was like, it's a shame that I could never show these beautiful visualizations to all of my students because they are so horribly inappropriate, because this is really a master class in how to make beautiful visualizations. And then the last thing is that sex stuff is honestly really endlessly funny. Like, looking at people's sex lives is endlessly interesting and intriguing, and we thought that we would get bored of it by the time that we were done, and we were still like, no, no, let's put it in again. Let's see what we can do. Let's play with the neural network. We couldn't really pull our hands off of it.
And so just to round it all out, we want to say thank you to The Stranger, who gave us approval to do this talk. You can see the article at the Bitly link. Jacqueline and I are pretty active on Twitter, and then if you want like super, super professional machine learning consulting, you can go to nolisllc.com.