Snack Overflow. I only have two interests in life, and they're food and computers. So we're going to talk about low-res NLP for your everyday life.

Hello. I hope your lunch was good. I'm Duretti. You can find me on the internet; my handle is my first name. It's amazing. Thanks to my parents for such a unique name, I have that handle on everything, although on Snapchat some teen has it. You can find me hanging out the most on Twitter, where I post a lot of jokes and occasionally a sad computer essay. And I guess this is a picture of me grinning at a wall.

So I work at Slack. You might be thinking, what's Slack? It's a group messaging app for teams. We have clients for desktop, iOS, Android, and even Windows Phone, for the rare Windows Phone user. We have 3 million daily active users, and about a third of them pay us, which is pretty good, I think. There are over 600 people working at Slack. I've been there a year and a half now, which is an eternity in San Francisco, a real long time. I'm a backend engineer on the growth team, which is kind of ironic given how fast Slack has grown. My team focuses on getting in, which is signing up for Slack, and getting it, which is: once you're in Slack, how do you use it? One of the ways we do that is via Slackbot. So yes, that means I'm a maintainer of Slackbot. I'm sorry.

Today we're going to talk about NLP. I don't mean neuro-linguistic programming, which is apparently a discredited psychotherapy practice about connections between your mind and your behavior. I had no idea. I'm actually here to talk about natural language processing.

So: we are creating more data than we ever have. Like Joe said yesterday, it's like breadcrumbs in a forest; we're just throwing data everywhere. And because there's such an overwhelming amount of it, humans can't actually process it.
We're not good at scanning through large data sets like that, but computers are great at it. For our purposes today, we want to tightly scope what we're going to build: automating replicable tasks, which is something computers are really great at, and, since I work at Slack, doing it in a chat interface.

But before I get into what natural language processing is, I want to delve into some things I think are important when approaching problem solving. There's a lot of jargon coming, and I want you to know that jargon can blur your comprehension. Computer science concepts carry a lot of jargon, and it's like an iceberg: a bit of understanding at the top, an unreal chasm at the bottom, and always more underneath the surface.

We've talked a lot in the last couple of days about resiliency, specifically when it comes to our computer systems, but I want to talk about resiliency as a human trait. Resilient people don't let failure overcome them; instead, they see failure as a form of feedback. I think programmers have this trait in spades: if you've ever debugged anything, you know how to be resilient. Closely related is the idea of grit, which is perseverance and passion, making your way through problems. High-achieving individuals tend to have grit in spades, and that matters because longevity in any industry demands it, especially in software, where things change so fast.

One last thing. This is from Kyle Kingsbury's guide Clojure from the Ground Up, and I know a huge wall of text is a talk no-no. But this is generally what we want to take away: you can program, you can do math, and it doesn't matter what the media or your colleagues say.
You don't have to be white or a man or straight or anything. As long as you have the right tools, curiosity, and the passion to do the thing, you can figure this out. Everyone can do this. Thanks for humoring me.

Now we're going to talk about NLP: what it is, a very high-level overview of the space, and what we can do with it day to day. So what is NLP? Here is a slide of acronyms. Natural language processing is chiefly concerned with the interactions between human language and computers, primarily through algorithms, and specifically through machine learning algorithms. Machine learning is the ability of a machine to learn without being explicitly programmed for a specific task. That's closely related to artificial intelligence, which is when a computer mimics human cognitive functions like learning and problem solving. And all of this sits within human-computer interaction, which is its own field: it observes the ways humans interact with computers and designs technologies that make that interaction better. But for our purposes, we're only going to talk about NLP.

I think there's a perception that machine learning and NLP and AI and neural networks are just going to solve the world's problems, that they'll power smart devices, learn from our behavior, and be totally great. I don't know about that. This is a still from the 2002 movie Minority Report. I don't know if anyone's seen it. I love this movie, and it's definitely how I imagine the future: AI gets out of the way and becomes a regular part of our lives. A lot of the technology from that movie actually exists today, like the Kinect and the way those gesture interfaces work.
Or the first time I saw Google Glass, I thought of that retinal display, the scanning of your eye. So yeah, that's how I think of the future. But in reality, we're far from that Minority Report future. This is a tweet from a Twitter account called Internet of Shit, about how a researcher found a flaw in a smart plug and posted it in an Amazon review of the plug, and then they pulled the product. They were like, we can't sell this anymore, because regular humans can't patch their own devices. It's proof that it's easy to imagine this future but hard to execute it. Anyway, that's an aside.

Aside from the problems of putting chips in literally everything, there are really hard problems in natural language processing, specifically the gap between processing language and actually understanding it. Spell check is a great example of natural language processing that doesn't require understanding or sentiment. As you type, your computer is constantly checking against a list of known words, like a dictionary, and suggesting close variants of the word. So you're like, oh cool, my computer figured out how to spell a thing. But it's an example of NLP that's stateless: it doesn't need to know anything about you to run.

Another example of NLP is calling customer service. On my way here, I had to call Delta to change a reservation, and they were like, "Hello. For English, say English." And I'm just pressing zero, zero, get me through this whole tree of stuff. But voice recognition is actually one of the hardest problems in AI and natural language processing: because of the lack of pauses between spoken words, it's very hard for computers to understand. And there are other hard problems in natural language processing, like detecting human emotion and context. Luckily, humans are amazing at context.
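As an aside, the stateless spell check just described can be sketched in a few lines of Python. This is my own illustration, not anything from the talk: the tiny word set is a hypothetical stand-in for a real dictionary.

```python
# Minimal stateless spell check: look a word up in a known-word list,
# and if it's missing, suggest dictionary words one edit away.
# WORDS is a tiny hypothetical stand-in for a real dictionary.
WORDS = {"delete", "channel", "message", "slack", "the", "how"}

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def check(word):
    """Return the word if it's known, otherwise nearby dictionary words."""
    if word in WORDS:
        return [word]
    return sorted(edits1(word) & WORDS)

print(check("chanel"))  # -> ['channel']
```

Note that nothing here depends on who is typing or what they typed before, which is what makes this kind of NLP stateless.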
We've got a built-in bullshit detector, which is our brains, and we can always tell when something's a little off. If you see something that's been Google Translated into your native language, it always reads as ridiculous, right? That's because we're really good at context.

But it's not all great. If you recall from Toby's talk yesterday, he talked a little about how machine learning works: you have your training data and your test data, and the test data is what we use to check our work. The promise of machine learning and AI is that this logic machine would never replicate human biases. That's not necessarily true.

I don't know if you all remember Tay. This is amazing. Microsoft released this machine learning bot on social networks in March, and you could interact with it on the internet. It was so Microsoft could understand the young: they wanted to know how people from 15 to 24 actually talked on the internet, because they were like, what's happening? What are teens doing? Within 24 hours, they had to delete the bot, because people had taught it to be racist. It was terrible. It tweeted some really horrific stuff, like "Bush did 9/11." You can't find the tweets now.

So we've already taught these bots to be terrible. And there are some Princeton University researchers looking into the idea that the training data we're feeding into these neural networks already has our human biases in it; they've done studies, and every data set has our human assumptions baked in. It's great that we're researching this, but we also need to get in front of the fact that our training data can be garbage.

But it's not all bad. Watson really did a great job winning Jeopardy.
It did have access to about 200 million pages of structured and unstructured content, including all of Wikipedia, which is totally fair to the human contestants. But it wasn't connected to the internet during the game, so there's that.

Anyway, all this aside, we're going to take some tactics from NLP and human-computer interaction to solve the problems we actually have today. Earlier I said there's data everywhere, so much of it. But in most production systems, there isn't actually as much data as you think, not enough to need machine learning to get through it all. So what we want to do is solve the problems that are close to home, the ones we have today. I think a great place to start is a chat bot that solves a problem in your actual everyday life. GitHub is fantastic at this. This is a schematic for their Hubot, which is a bot written in CoffeeScript. I don't know why you'd do that, but they did, and it does all kinds of things for them, like deploying and telling jokes.

Okay. Now that we've talked about NLP at a very high level, we're going to take ideas from it and apply them to our everyday lives. I think chat interfaces are excellent. This is a screenshot of my actual terminal. I think the real interface of the future is going to be like SMS, just texting: hey, do you have this thing for me? Yeah, sure, I'll text you back. But I may be biased. I work at Slack, so of course I think bots in chat are the way to go. If you recall, I told you I'm one of the maintainers of Slackbot, and we recently shipped a feature that I'm going to walk you through in detail, to show how to use natural language processing.
The project was this: you can ask Slackbot questions about Slack, and it should respond with, yes, here are some things I know about Slack. So imagine I want to learn how to delete a channel, and I say, "Hey, how do I delete a channel?" This is a diagram of the sentence I just typed: the interjection, the verb, the noun, and how they connect. Actually, no, this is the wrong slide. It should have said something else. Quality control is really excellent. Whatevs. This is a diagram of the sentence from before. It's a really cool visualization provided by a library called spaCy, so you should totally check that out, even though I got the slide wrong.

Anyway, you're going to want to clean the input that comes in, and I'm going to show you how with some pseudocode. Imagine this is what comes in. You want to strip the whitespace at the end, and you want to lowercase your input. Then you can use a regex, if you want, to get rid of anything that isn't alphanumeric, so you're left with no comma and no question mark anymore. Then you want to tokenize your string. This is fairly straightforward in English, because every word is separated by a space. It's much harder in a language like Japanese or Chinese, where the writing is more pictographic in nature and it's hard to split the words apart. So we tokenize: we split on spaces. Cool.

Then we want to take the sentence apart further by getting rid of what are known as stop words. Stop words are usually the most common words in a language, usually function words like "is," "on," "at," and computationally they're thought of as filler. This is a minimal array of stop words: "and," "are," "as," "at," in alphabetical order. You can choose whatever you want.
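To make that pseudocode concrete, here's a sketch of the cleaning step in Python. This is my own illustration of the steps described, not Slack's actual (proprietary) code:

```python
import re

def normalize(text):
    """Strip whitespace, lowercase, drop non-alphanumerics, split on spaces."""
    text = text.strip().lower()
    # Keep only lowercase letters, digits, and spaces: the comma and
    # question mark disappear in this step.
    text = re.sub(r"[^a-z0-9 ]", "", text)
    return text.split()

print(normalize("Hey, how do I delete a channel?  "))
# -> ['hey', 'how', 'do', 'i', 'delete', 'a', 'channel']
```

Splitting on whitespace is the whole tokenizer here, which only works because English separates words with spaces, as noted above.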
This is what we chose; you don't have to choose these words. And then, just to add some for loops, we're going to do two of them. First, you loop over that stop-word array and build a hash where every stop word is a key with a value of true. That's useful because it lets us loop over our array of input tokens and do a fast lookup for each one: is this a stop word? True, skip it; not in the hash, keep it. Any word that isn't a stop word gets pushed into a new array. So we go from the input "how do I delete a channel" to just the words "delete" and "channel," which are the main words we care about.

Now we're going to fuse these words together, and here's where I tell you that I cheated. For this particular project, we had a seed of information we thought would be interesting: people commonly ask how to delete a channel, how to pay for Slack, how to do X or Y. So we have a database of all these potential keyword strings, stored in a form like "delete channel." Then we do this lookup, querying the database for our expected response. I go out to the database and get back a string that I can push into our interface. So now when I say, "Hey, how do I delete a channel?", Slackbot immediately responds with, oh, this is how you can delete a channel.

We did a lot of this by hand. You don't have to. There are a bunch of really handy libraries out there. There's a JS library called SuperScript.
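The stop-word hash and the keyword lookup described above can be sketched like this. The stop-word list and the in-memory answers table are tiny hypothetical stand-ins for Slack's real seeded database, which is proprietary:

```python
# A small illustrative stop-word list (Slack's real list differs).
STOP_WORDS = ["a", "an", "and", "are", "as", "at", "do", "hey", "how", "i", "is", "on", "the"]

# First loop: build a hash so each membership check is a fast key lookup.
stop = {word: True for word in STOP_WORDS}

# Hypothetical keyword -> response table standing in for the database.
ANSWERS = {
    ("delete", "channel"): "Here's how you can delete a channel: ...",
    ("pay", "slack"): "Here's how paying for Slack works: ...",
}

def keywords(tokens):
    """Second loop: keep only the tokens that aren't stop words."""
    return [t for t in tokens if t not in stop]

def respond(tokens):
    """Fuse the remaining keywords together and look up a canned response."""
    return ANSWERS.get(tuple(keywords(tokens)), "Sorry, I don't know that one.")

tokens = ["hey", "how", "do", "i", "delete", "a", "channel"]
print(keywords(tokens))  # -> ['delete', 'channel']
print(respond(tokens))   # -> "Here's how you can delete a channel: ..."
```

In the shipped feature this lookup goes out to a database of seeded keyword strings rather than a dict, and toolkits like the SuperScript library just mentioned can handle this kind of matching for you.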
It's a node package; you can download it, use it, play with it. Stanford has a really great set of tokenizers and related tools written in Java, with support for most major languages: German, Spanish, French, all these things. And if you want to build something specifically for Slack, there's Botkit, a bot toolkit made by Howdy, and it makes it really easy to get started: listening for input, figuring out how to parse it, all that kind of stuff.

All right, moving right along. If I had to tell you one thing, it's that it's important to take ideas from other disciplines, and that we can get 80% of what we really want with 20% of the work. This is a very naive algorithm. There's no machine learning; it's just pulling strings apart and using them to get the information we want today. Also, I think we should fix what annoys us. This whole project was based on the fact that people literally couldn't figure out how to do stuff in Slack, and we wanted to give them an in-product way of helping themselves, as opposed to searching the internet or tweeting at us. It's more powerful to be able to do this stuff for yourself. So now you've got this, and I'm actually done. Thanks.

My first question is, I think it's really cool what you've done with Slackbot, but what other use cases do you see for this NLP stuff now? Things like: if you're a project manager and you want to know what your team is up to, you can say, via a bot or chat interface, tell me what my team is doing, and it can go query each person on your team and give a roll-up to the project manager. Which is great, because then you get to be mad at the bot and not the person asking, so there's less friction at work.
Another thing is deploying your website; you can do that through this kind of interface. Things like that.

Okay, there are loads of questions flying in now. I think you surprised everyone. Oh, I'm interested in this one: where else do you use NLP at Slack? Not in as many places as you'd think. We're just beginning to look at actual machine learning; this is the only instance in the product today.

How do you deal with synonyms? That's a good question. We've been trying to get a sense of what people are actually typing into this interface so we can manually train what we look for, which is pretty terrible. We also haven't started doing stemming yet, which is the idea that "deletes," "deleting," and "delete" all have the same core, at least in English; we haven't yet put the time into doing that programmatically either.

And then there's a question: do you have any example code you can share for low-res NLP? Oh, it's all proprietary, so the answer is no. Sorry. I signed many NDAs.

One more question: how do deep learning methods apply to NLP? How are these things related? That's all it said, so, okay, I don't know. Well, you can imagine that if you ran deep learning over, say, Slack message data, you could unearth a lot about people's work patterns, how people get things done, maybe the seasonality of when people actually use Slack. But we haven't even started to scratch the surface on that.

All right, there's a lot still to come, I think, in the machine learning and AI world. Well, thank you so much, Duretti. A round of applause for Duretti. Introduction to NLP. Thank you.