Thank you so much to CSV Conference for having me. I'm a big fan of CSVs. I was actually going to include a screenshot showing you that emoji data begins as a CSV, then you extract the emojis, and then it becomes a pretty graph, but I'll spare you that. I'm really excited to be here today. Many people ask me, how did you become an emoji data scientist? Is that something you always wanted to be when you were growing up? Given that emojis weren't around when I was growing up, and neither was data science, I did not always know I wanted to be an emoji data scientist. I kind of fell into this by accident. For many years now, I've been a data scientist. I've worked in a bunch of industries, and I really enjoy it. I also really like emojis. People remember the first time they saw a Ford Mustang or the first time they saw their favorite rock star. I remember the first time someone sent me a tears of joy emoji, and I asked her, does this literally mean you're crying when you're laughing? Because I thought, what else could it possibly mean? That's how my fascination with emojis began, and one day I realized I can actually combine my passions for emojis and data science, and that's what I'll be sharing today.
I was really inspired by Angela's talk yesterday, when she talked about industry data science versus indie data science, and I think my journey is also one of transitioning from an industry data scientist to an indie data scientist. The industry side is the big building: I worked at Facebook in New York, we had a huge building, and it kind of looked like that. The guy with the beaker is just there to say "scientist," because otherwise the bar graphs just look boring. The indie data scientist is just me on my laptop.
And I think throughout this journey the most meaningful realization I had is that I could harness my data science skills and abilities, finding a good question, finding data, analyzing data, cleaning that data, telling a story, building models which I had done in advertising and demography and brain imaging and all of these fields where I learned a lot but I really wasn't asking questions that I personally was passionate about. And I realized that I could use these same technical skills to answer questions that I personally was really curious about, that I wanted to know, that I thought weren't being answered and that is the story of the birth of Prismoji which is an emoji data science lab in New York. So how did this begin? The story, I'll take you back to last summer when Brexit happened and I was scrolling through my Facebook feed and I saw this link to an article in Quartz. The world's reaction to Brexit in emoji. And I was super pumped when I saw this link because I was like, this is so cool, like someone used emojis to understand sentiment in response to Brexit. I've always thought of doing that, like, oh my God, they beat me to it but I was excited because I believe in open source and collaboration. So I was like, all right, let's find out what they did. And I read the actual article and this is not a snippet of the article. This is the entire article. There is no data in this article as we would say when listening to an academic talk. It says François Hollande, one tear coming out of his face, Nicola Sturgeon, bicep emoji. It's just a bunch of emojis and people's names. That's the whole article. And I read that and I was like, okay, if this is the bar for emoji sentiment analysis, it's a pretty low bar that Quartz is setting over here. And why don't I try to surpass that? So first I had to react to this article in emoji and that was my reaction. So what I did is that I spent the next three days engaged in a one-man hackathon. 
I locked myself in my room, drinking only water and coffee. And I just spent three days learning how to download tweets from the Twitter API, how to read them into R, how to extract emojis from those tweets, how to make graphs, how to visualize that data. It was like I was on the moon. It was so exciting to me because it hadn't been done before. And since Quartz had just published this not very well done article, it was timely. So I thought, if I do it right away, I can get it published somewhere. And three days later, I finished the article. It gets published in Motherboard. Here are the most popular emojis from the Brexit reaction. And that's how I became an emoji data scientist. This was the most fun I'd ever had, a lot more fun than I'd had doing advertising research or working in grad school or doing all of the other data science I'd done. This was really cool.
So let me walk you through some of my work on this Brexit piece. And then later on, I'll talk about some of the other analyses I've done. The question we were trying to answer is a simple one: how do people use emojis to react to a major political event in real time? The first step is always getting the data and cleaning the data, which takes 80% of a data scientist's time, as the maxim goes. And the approach was straightforward. I used the Twitter API to sample 100,000 tweets for five hashtags related to Brexit. I removed retweets. I used regular expressions in R to extract the emojis. I computed the emoji counts. So basically all I did is I downloaded a bunch of tweets about Brexit and I counted how often each emoji appears. I matched those tweets with an emoji dictionary, and I figured out the count of each emoji in this data set. And then I compared that with a control group, with a baseline. I used Emoji Tracker, a great site built by this guy, Matthew Rothenberg, which has every emoji that's been tweeted in the past five years.
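The steps just described (drop retweets, pull emojis out with a regular expression, count them, compare against a baseline) can be sketched roughly as follows. This is a minimal Python illustration, not the R code used for the Brexit piece. The regex ranges, the "RT @" retweet test, the `over_index` helper, and all of the toy tweets and baseline numbers are assumptions made up for the example.

```python
import re
from collections import Counter

# Assumption: a deliberately rough emoji pattern. It treats a pair of
# regional-indicator characters as one flag and otherwise matches single
# code points, so skin-tone modifiers and ZWJ sequences are not merged.
EMOJI_RE = re.compile(
    "[\U0001F1E6-\U0001F1FF]{2}"   # flags: two regional indicators
    "|[\U0001F300-\U0001FAFF]"     # most pictographs and emoticons
    "|[\u2600-\u27BF]"             # miscellaneous symbols and dingbats
)


def is_retweet(text):
    # Assumption: retweets are recognised by the classic "RT @" prefix.
    return text.startswith("RT @")


def emoji_counts(tweets):
    """Count every emoji occurrence across all non-retweet tweets."""
    counts = Counter()
    for text in tweets:
        if is_retweet(text):
            continue
        counts.update(EMOJI_RE.findall(text))
    return counts


def over_index(sample, baseline):
    """Ratio of an emoji's share in the sample to its share in the baseline.

    Values above 1 mean the emoji is over-represented in the sample.
    Emojis missing from the baseline are skipped to avoid dividing by zero.
    """
    sample_total = sum(sample.values())
    baseline_total = sum(baseline.values())
    ratios = {}
    for emoji, n in sample.items():
        base_n = baseline.get(emoji, 0)
        if base_n == 0:
            continue
        ratios[emoji] = (n / sample_total) / (base_n / baseline_total)
    return ratios


# Toy data, invented purely for illustration.
tweets = [
    "RT @someone: Brexit 🇬🇧",
    "Well, that happened 😭😭 #Brexit",
    "We did it 🇬🇧🎉 #VoteLeave",
    "Thinking of everyone today 😭 #VoteRemain",
]
baseline = Counter({"😭": 50, "🇬🇧": 5, "🎉": 45})  # stand-in for general usage

counts = emoji_counts(tweets)
ratios = over_index(counts, baseline)
for emoji, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(emoji, counts[emoji], round(ratio, 2))
```

Dividing shares rather than raw counts is what lets a small sample be compared with a much larger baseline; an emoji that is common everywhere does not over-index just because it is common.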
So that gives you a baseline of how popular these emojis are in general. Yeah, that was my reaction too. And then I also looked at the hashtags in these tweets. So, high level, these are the top 10 emojis people use to talk about Brexit. And the ones in blue are the ones that over-index on Brexit. This was my first time playing around with emoji data. This is untested code. I'd hacked it together. And the thing here that made me happy is that the flag of the UK is number two on the list, which tells me that my code worked. If it was the flag of Papua New Guinea, then my 72-hour hackathon would have been a waste, and I would have had to start from scratch. So that was cool. What was also cool to me is that there's a tremendous diversity of reaction in this set of emojis. You have positive emojis: the thumbs up, the clapping hands sign, the heart. You have sad emojis: the crying face, the pensive face. You have the see-no-evil monkey. There's a lot of variety here. And this is something real. This is Brexit. People's lives were changed. Europe is figuring out how much of a penalty they're going to assess on Britain. And here are people on Twitter using emojis to react to something that's happening in real life and is affecting them tremendously. So this was really heartening for me, because I think a lot of people who don't use emojis think emojis happen in a vacuum, that people are just randomly picking emojis. But no, emojis actually correlate with the emotions people are trying to express.
Another thing we did is we came up with the idea of a hashtag signature of a given emoji. We had five different hashtags we looked at: Not My Vote, Vote Remain, EU, Brexit, and Vote Leave, spanning the spectrum from pro-Brexit to anti-Brexit. And then I figured out, across our entire data set, what is the distribution of these hashtags? And it's something like this.
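One way to read the hashtag-signature idea is as a comparison of two distributions: the share of each tracked hashtag across all tweets, and the share among only the tweets that contain a given emoji. The Python sketch below works under that reading. The exact hashtag spellings, the ratio used to compare the two distributions, and the toy tweets are all assumptions for illustration, not the method from the original analysis.

```python
import re
from collections import Counter

# Assumption: the five tracked hashtags, lowercased, spelled without spaces.
TRACKED = ["notmyvote", "voteremain", "eu", "brexit", "voteleave"]
HASHTAG_RE = re.compile(r"#(\w+)")


def tracked_hashtags(text):
    """Return the tracked hashtags that appear in one tweet."""
    found = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    return [tag for tag in TRACKED if tag in found]


def distribution(tweets):
    """Share of each tracked hashtag among all tracked-hashtag uses."""
    counts = Counter()
    for text in tweets:
        counts.update(tracked_hashtags(text))
    total = sum(counts.values())
    return {tag: (counts[tag] / total if total else 0.0) for tag in TRACKED}


def hashtag_signature(tweets, emoji):
    """Per-hashtag ratio: share among tweets with `emoji` / overall share.

    A value above 1 means the emoji over-indexes on that hashtag.
    """
    overall = distribution(tweets)
    with_emoji = distribution([t for t in tweets if emoji in t])
    return {
        tag: with_emoji[tag] / overall[tag]
        for tag in TRACKED
        if overall[tag] > 0
    }


# Toy tweets, invented purely for illustration.
tweets = [
    "Proud day 🇬🇧 #VoteLeave",
    "🇬🇧 #VoteLeave #Brexit",
    "Gutted 😭 #VoteRemain",
    "What now? #Brexit #EU",
    "😭 #NotMyVote #Brexit",
]

flag_signature = hashtag_signature(tweets, "🇬🇧")
for tag, ratio in flag_signature.items():
    print(tag, round(ratio, 2))
```

In this toy data the flag's ratio is well above 1 for the Vote Leave hashtag and 0 for Vote Remain, which is the kind of contrast a signature is meant to expose.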
And then I asked myself, for a given emoji, what does this distribution look like? Comparing those two allows me to understand how people are using certain emojis to express certain emotions. And the findings are really intuitive, and that also makes them interesting. You find that the British flag heavily over-indexes on Vote Leave. People are much more likely to include the British flag when they're using the hashtag Vote Leave, suggesting it's a sign of patriotism or nationalism. The party popper emoji, people are using that when they tweet Vote Leave. And when people tweet Vote Remain, they're crying. It's the loudly crying emoji. Subsequent research has shown that the loudly crying emoji, with the two streams of tears, is often ironic, so we'll have to explore that in future research. And it's also the praying emoji, because when Brexit first happened, a lot of people didn't know what they had voted for. There was a lot of voter regret. So there was talk of whether they were going to do a do-over. This is like the day after Brexit. They didn't end up doing that, of course. But here you see people are praying for a do-over, which is really cool. Here are emojis, and people are using them to express their emotions about a real-time event.
So that was my first step as an emoji data scientist. And I was like, this is cool. What else should I work on? And then one day, I don't know if you guys listen to pop music, my friends, and I have a younger sister who's in high school, they all start telling me about a feud between Kanye West and Taylor Swift. Does anyone know what I'm talking about? Oh, wow, awesome. A few people know what I'm talking about, awesome. When I give this talk to high school students, everyone starts jumping up and down at this point. So that was my next emoji analysis: the data scientist's emoji guide to Kanye West and Taylor Swift. And the quick backstory is that Taylor and Kanye had a pretty serious beef.
Kanye West mentioned Taylor Swift in one of his songs in a derogatory manner. He claimed he had her permission to do that. And she denied that. She said he's making that up, he's insulting me. And then Kim Kardashian jumps in, and she released a leaked Snapchat video of Taylor giving him permission to use that line. Justin Bieber jumped in, maybe Drake even jumped in. It was a huge, huge controversy in the hip hop world. And I was like, okay, this is really cool, because I'm really curious to know how people are using emojis to talk about this controversy. So what I did is I looked at tweets mentioning Taylor and tweets mentioning Kanye, and I looked at the top five emojis in these tweets. And the thing that immediately jumps out is the snake emoji. A lot of people are using the snake emoji to tweet about Taylor. For those of you who've been following the controversy, it's because people said Taylor was a snake. She was lying, she was ungrateful to Kanye, she couldn't be trusted, and that was the meaning of the snake emoji. This got so serious that Instagram, for 24 hours, banned any comments with the snake emoji on Taylor Swift's profile, because something like 80,000 or maybe 800,000 people were leaving snake comments on Taylor's Instagram. They banned that because she called it cyberbullying and she appealed to Instagram's, like, directors. And yeah, that got pretty real.
Another way of visualizing this data is to plot each emoji on a spectrum, where the farther to the left it is, the more likely it is to be used in a tweet mentioning Taylor Swift, and the farther to the right it is, the more likely it is to be used with Kanye. And here again, you see something really interesting, which is that the more feminine emojis, the pink hearts, the purple heart, the rose, the kiss, are much more likely to be used in tweets mentioning Taylor Swift.
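The talk does not say how each emoji's position on the left-to-right spectrum was computed. One common choice, assumed here, is a smoothed log ratio of the emoji's share in each group of tweets: negative scores lean toward the left-hand subject, positive scores toward the right-hand one. The Python sketch below uses that assumption; the `spectrum` function, the add-one smoothing, and the emoji counts are all hypothetical.

```python
import math
from collections import Counter


def spectrum(left_counts, right_counts, smoothing=1.0):
    """Score each emoji by log2(right share / left share).

    Negative scores lean toward the left group, positive scores toward
    the right group. Smoothing keeps emojis seen in only one group from
    producing a division by zero or an infinite score.
    """
    emojis = set(left_counts) | set(right_counts)
    left_total = sum(left_counts.values()) + smoothing * len(emojis)
    right_total = sum(right_counts.values()) + smoothing * len(emojis)
    scores = {}
    for emoji in emojis:
        left_share = (left_counts.get(emoji, 0) + smoothing) / left_total
        right_share = (right_counts.get(emoji, 0) + smoothing) / right_total
        scores[emoji] = math.log2(right_share / left_share)
    # Sort from most left-leaning to most right-leaning.
    return dict(sorted(scores.items(), key=lambda kv: kv[1]))


# Made-up counts, for illustration only.
taylor = Counter({"🐍": 40, "💖": 30, "❤": 20, "🔥": 10})
kanye = Counter({"🔥": 50, "❤": 20, "🐍": 5, "💖": 5})

scores = spectrum(taylor, kanye)
for emoji, score in scores.items():
    print(emoji, round(score, 2))
```

A log scale makes the axis symmetric: an emoji four times more likely on one side sits as far from zero as one four times more likely on the other.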
The snake emoji of course is there also. That's the snake-bombing, because it shouldn't be there. The heart is there. Kanye West, his emoji action isn't that interesting in comparison. He has the lit fire, which people use with Kanye West. This got me thinking, I wonder if there is a common emoji language we use when we talk about our favorite celebrities. As DJ Khaled would say, what are the emojis we use to talk about fan love? That's his term for how we adore our favorite celebrities. So I extended this by looking at five celebrities, and I looked at the top five emojis people used to talk about them. And the fascinating thing here, it might be a little too small for you guys to see, is that four of the five emojis are the same for four of these celebrities. It's the heart eyes emoji, it's the red heart, it's tears of joy, it's the loudly crying emoji. Which is fascinating, because there's no collusion happening between all these people tweeting about these celebrities. This is just a free market, it's chaos, it's a bunch of 250,000 people tweeting about how much they like these celebrities, and a lot of them are using the same emojis. That's crazy, because we don't learn emojis in school. We learn words in school, but no one is teaching these people that when you talk about your favorite celebrity, you use the heart eyes emoji. This is order coming out of chaos. And it's public data. It's open data, it's all out there. It's staring us in the face, it just needs to be analyzed, and that's really cool. Obviously the interesting scientific question is, how does this order emerge from this chaos? How are all these people using these same emojis? Are they using them to mean the same things? And I think there's a lot of interesting research to be done there. You also see uniqueness, right? Beyonce has the queen bee emoji. Justin Bieber, people use the fire emoji.
Drake is very interesting because his number one emoji is the tears of joy. And after doing this, I Googled, why is everyone laughing at Drake? And apparently he had challenged someone to a rap battle and like Twitter just attacked him for that. Yeah, it was not good. And DJ Khaled is in the middle. He's a man on his own. His emojis are unique to him. He has his own unique emoji signature. And many folks have picked up on this now. This is because everyone's talking about DJ Khaled and his unique emojis because he talks about major keys to success. He talks about bless up. He talks about give thanks to the sky. Like he has all of these motivational sayings and people just associate emojis with those sayings, which is really cool. In just a few minutes, I wanna talk about some of the more recent work I've done. I did some work on Election Day, a much more sensitive topic to many of us in this room. And what was really interesting to me, and I think just looking at it from a bit of a high level, is that tweets are data, right? They're structured data, but it's just a bunch of data floating around in the ether. And how do we make sense of that data? That's the question, right? And the thing is natural language processing is hard. So if you try to analyze the sentiment of tweets based on the words in those tweets, it's really hard for a computer to do that. So what happens in most journalism about tweets is very anecdotal. It's like, oh, today people on Twitter are saying X, Y, and Z, and it's based entirely on the social networks of those journalists. It's not based on rigorous data. And I think this is a first step in applying that type of rigorous data science, rigorous data analysis, to understanding what social media reaction is on a given topic. So without any further ado, this is emoji reaction on election night, looking only at tweets after midnight, eastern time. And the emojis to the left are more likely to be used in tweets mentioning Hillary Clinton. 
And the emojis to the right are more likely to be used in tweets mentioning Donald Trump. And when I looked at this, it's just like, I wanted to cry. It's just beautiful. Because this is like, what? It's like half a million tweets, like crazy data, like someone could write a PhD dissertation on how do you summarize half a million tweets. You couldn't teach a computer to do that in a sensible way. And here, all I do is I look at the emojis and I plot their frequencies. And I get this amazingly fast, this incredible plot where the emojis that are used with Hillary Clinton are sad emojis. They're deeply crying, distressed, praying emojis, the heart emoji, the blue heart, the broken heart. It's like all emojis of people really sad that Hillary lost. And then you look on the Trump side and there's a huge binary going on where you have happy emojis. You have the party popper emoji, the hundred, the train, which is like, I think the Make America Great train emoji. People use that train. And then there's also a lot of anger. There's like, what just happened? There's rolling eyes. There's the middle finger emoji, which I had never seen prior to this work. Used in the wild. This was like a sighting of the middle finger emoji in the wild. And that was really cool to me. Another way of looking at this is if you look at the time course of reaction in tweets mentioning Hillary Clinton. So everything on the left was used more in tweets before midnight. Everything on the right was used more in tweets after midnight. And here you can see that in a very crude fashion how that emotion shifted over the day. Where throughout the day it was happy, like you got this girl, like biceps emoji, kissy face, and then after midnight it's just like night and day, which is like amazing to me. I also looked at four specific emojis and did a more intricate time series. And here you see something, a few interesting things. 
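Both views described here, the before-and-after-midnight split and the finer time series for a single emoji, come down to bucketing tweets by timestamp. A minimal Python sketch is below. It assumes each tweet is a `(timestamp, text)` pair with timestamps already in Eastern time, counts tweets rather than emoji occurrences, and uses invented tweets; none of these details come from the talk.

```python
from collections import Counter
from datetime import datetime


def hourly_emoji_counts(tweets, emoji):
    """Number of tweets containing `emoji`, bucketed by the hour posted."""
    counts = Counter()
    for posted_at, text in tweets:
        if emoji in text:
            hour = posted_at.replace(minute=0, second=0, microsecond=0)
            counts[hour] += 1
    return dict(sorted(counts.items()))


def split_at(tweets, cutoff, emoji):
    """Tweets containing `emoji` before the cutoff, and at or after it."""
    before = sum(1 for ts, text in tweets if emoji in text and ts < cutoff)
    after = sum(1 for ts, text in tweets if emoji in text and ts >= cutoff)
    return before, after


# Toy tweets with invented timestamps (assumed to be Eastern time).
tweets = [
    (datetime(2016, 11, 8, 22, 5), "Come on 🙏"),
    (datetime(2016, 11, 8, 22, 40), "🙏🙏 please"),
    (datetime(2016, 11, 8, 23, 15), "still hoping 🙏"),
    (datetime(2016, 11, 9, 0, 30), "😭"),
    (datetime(2016, 11, 9, 0, 45), "🙏 ... 😭"),
]
midnight = datetime(2016, 11, 9, 0, 0)

pray_by_hour = hourly_emoji_counts(tweets, "🙏")
for hour, n in pray_by_hour.items():
    print(hour.strftime("%H:%M"), n)
print(split_at(tweets, midnight, "😭"))
```

For a real comparison across hours, the counts would usually be divided by the total number of tweets in each bucket, since overall volume also changes through the night.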
The American flag emoji spikes in tweets mentioning Trump after the results are announced. The middle finger emoji also spikes in tweets mentioning Trump after the results are announced. The loudly crying emoji is very heavy in tweets mentioning Hillary Clinton starting at 10 p.m., because people are seeing the results are not going the right way. And the most interesting thing, the prayer emoji is used at about 10:30 p.m., 10:30 to 11, in tweets mentioning Hillary Clinton, because that's when people thought Michigan or Pennsylvania might turn the other way, and they thought Hillary could eke out a victory. A lot of this is storytelling on my part. I dabble in storytelling. But I think what's so interesting here is that process of taking raw data and finding an interesting story and communicating it back to the public. That's what excites me so much about emoji data science and emoji data journalism. Because this work isn't buried in a journal somewhere. It's not buried in the confines of a major tech company where it's never going to see the light of day. It's just out there for the public to make of it what they will, and hopefully be inspired to play around with emoji data on their own.
This is just a quick summary I did of the divided emoji states of America, where I think word clouds are also really interesting, where you see how people are using these emojis. You see the thumbs up: congratulations, Mr. President, make America great again. The middle finger is just, I don't know if I'm allowed to say that word, but yeah, whatever. This tweet is, "fuck Trump, fuck the people who voted him. And the people who didn't vote at all," middle finger emoji. That's like a mic drop tweet right there. That's really cool. And then the thinking face emoji, which is like, what just happened? And it's just so fascinating. People's emotional lives are represented in the emojis they use. That's what seems to me the burning story here.
And I think that's really fascinating. I'll just talk briefly about the last piece of analysis I did, which is on emojis of resistance. This is right after the Muslim ban, the travel ban, was announced. There were all these protests happening at the airports. And I live in New York. My sister works for the city government. So we went to JFK, and she huddled with the lawyers, and she's doing government stuff. And I'm just like, I'm an emoji data scientist. Can I help anyone here? And the lawyers didn't really know what to make of me. And I said, I watch a lot of Law & Order, if that helps. And it did not help. It did not help. So I went back home and I did another one-man hackathon. This time it took like six hours, because I've been doing it for a while. I'm much better at it. And I looked at the emojis people use when they're using these protest hashtags. The hashtags that were big that weekend were No Ban No Wall, Not My President, The Resistance. The Women's March had also just happened, so a lot of people were still using that hashtag. And the interesting thing here is that it's the fist emoji. I mean, that was my main finding. But it's also the American flag. It's also the red heart, which is counterintuitive to a lot of people who think protestors are coming from a place of anger or a place of anti-Americanism. But if you just look at the emojis they're using, a lot of people are using patriotic emojis to talk about how this ban is not in line with our values. And I thought that was a really cool way of looking at it. Here are some examples of tweets. Here's a tweet of the Statue of Liberty hugging a young woman in a hijab. And the tweet is, "Such a beautiful picture," heart emoji, American flag emoji, hashtag Muslim Ban, hashtag No Ban No Wall, hashtag Not My President. Which is just really cool.
One of the limitations of this work is that it's only using Twitter data. I think next steps are to extend this using Instagram and other social platforms. And I think there are just so many interesting directions to take this kind of work in. Speaking of which, speaking of open data science, you too, all of you, guys and girls, can become an emoji data scientist. Not collectively. You can each become an individual emoji data scientist. There's a tutorial on our site, prismoji.com, on emoji data science in R. The code for this latest analysis is open source, and you can download the data. I walk you through, step by step, how to download tweets and how to extract emojis. And what really excites me about this work is that when we talk about open source and open science and open journalism, the idea of accessibility is really important to me. Making data science seem complicated is easy. Making it seem highly technical, like you need five PhDs and you need to know deep learning and you need to be HAL from 2001: A Space Odyssey, that's easy to do. But it's hard to take something so complicated and make it accessible, make it resonate. So when I give this presentation to high school students, to college students, if I said "data science," they would have been like, what is that? But when they see this, they're like, oh, this is really cool. This is a powerful set of tools for me to understand my world. It's a way for me to go out and look at the questions I care about and find data on them. And I think it's a powerful educational tool, and that's why I enjoy giving a lot of these talks. Some quick next steps: study emoji usage across countries and cultures. The interesting thing is that all of this is public emoji usage, whereas for most people, their most meaningful emoji usage is in private conversations. So that's a whole other ball game.
Obviously there's the question of getting that data. I think some academics are beginning to try to figure out ways of getting that data in an informed manner. But that's a whole other story, and that's what we really want to know, right? When she sends you that heart eyes emoji, is she flirting with you? Is she just, like... what does that even mean? And yeah, I think this is a starting point. And then also using this as a kickoff point to dive deep into data journalism and storytelling, and use that to answer unexplored questions in society and culture. And I'll stop there. Thank you so much for your time. Feel free to reach out to me on Twitter or by email if you have questions or tips or if you'd like to collaborate. And I'm happy to take questions now. Thank you.