And everyone sees screens, because I'm going to try to do a live demo, because, you know, let's live on the edge a little dangerously. So I'm Aja Hammerly. I go by Thagomizer most places on the internet; Thagomizer was taken on Twitter, so it's the thagomizer there. I really like it when people tweet at me questions, comments, "you're completely wrong about that" type of things during talks. My phone is somewhere over there, and I believe Twitter's turned off on the laptop, so y'all won't see it anyway. So feel free to go nuts. And I blog once a week, this week's going to be a little challenging, at thagomizer.com. Various topics: DevOps, Ruby, the art of speaking, all sorts of stuff. And I work at Google on the cloud platform. So if you're interested in Google Cloud, Kubernetes, other things we do, machine learning APIs, I'm happy to answer questions. I also have a plethora of opinions that I am more than happy to share with you. You can come find me, we'll chat. And since I work for a large company, oh, my clicker, there we go. The lawyer cat has to say that any code in this talk is copyright Google and licensed Apache v2. It's not currently on GitHub, but it will all go up on GitHub in the next day or two, and I will post about it on my blog when it gets there, or post it on Twitter.

So hopefully, since the room is relatively empty, people aren't just randomly here because they didn't like the last talk they went to, or walked in a little late. But what is NLP? Like, Aja, why did you use the three-letter acronym? NLP is natural language processing, which might just get you a bit closer to the idea behind my talk, but it's still a little bit fuzzy. So let's go to Wikipedia. Natural language processing is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages, and in particular concerned with programming computers to fruitfully process large data corpora. So that is one exceptionally long sentence with a lot of big words, so here's the definition I use: teaching computers to understand and, ideally, respond to human languages. And by human languages, I mean things like English, Japanese, American Sign Language, British Sign Language, things like that. Languages humans use.

So to echo millions of middle schoolers everywhere: when are we ever gonna use this? Slash, why should I care? And the reason is, bad NLP is already here. Who has had to interact with a phone system where it's like, say your reservation number, and you say it, and it's like, I didn't understand you, say it again? I know I have. I remember screaming at one in a parking lot at 11 PM to make sure that my flight reservation went through. Who here has logged on to a website, and you're on the website, and magically this window pops up with this disembodied person and a sentence like, "If you need help, I'm here to help you"? And you're like, I don't understand if there's really a person behind that or not. I had that happen to me last week; it was creepy. In all of these, natural language processing of some sort, frequently bad NLP, is likely involved. But the promise of NLP is actually better user experiences. I want to live in a world where instead of having to teach people how to interact with computers, we can teach computers how to interact in ways that people already interact successfully.
And so an example of non-ideal NLP that many of you are probably familiar with, this is my favorite slide in the entire talk: "Computer: tea, Earl Grey, hot." That right there is an example of bad NLP. The way the request is phrased is very specific to what I want to have happen; I wouldn't use that phrase with any other person ever. But when that was on the air, it seemed really futuristic. Turns out we can do better than that already. These random things that live in your house, and potentially order dollhouses for you or other things, are in lots of people's houses right now. I saw a great talk by Jonah and Julian yesterday, where they were arguing with multiple Alexas up here on the stage. I have a Google Home. I really like it. And depending on the particular brand, they have a relatively large, in some cases nearly limitless, set of voice commands that they can respond to. And if you really want one of these, we're raffling one off at the Google booth at 4:30 tomorrow. Come find us. If you stick around to the end of the talk, I'll show you how to enter. And then, as I mentioned before, we also have tech support and phone trees. These are actually getting a lot better in my experience. I can actually call my credit card company and say, "I never got my credit card," and they will send me to the right person who can answer that question for me. And I really hope that this stuff gets better over the next couple years.

But here are some use cases you may not have thought about. Accessibility. In the end-of-the-day keynote yesterday, one of the points that was made was that children often use voice interfaces because they can't read yet. I used to work on software for kids, and we specialized in kids with learning disabilities. The ability to use voice interaction and not have to type and spell correctly is fantastic for people with dyslexia. Or perhaps someone has a broken arm and can't type consistently right now, or maybe they're holding a baby. There are all sorts of use cases where this kind of accessibility matters, in addition to the thing many of you were probably thinking when I put the word accessibility on the slide, which is blind people. Blind people can use voice interaction. And the other thing NLP can help us with is improving our understanding and our ability to analyze large amounts of data. So who works on an app that has a feedback button somewhere? When I worked in EdTech for kids, the stuff we got via that feedback button was amazing. Five-year-olds have the best ways of telling you they hate your software. No, really, it was awesome. But once we got a little more popular, processing all that feedback was really hard. Initially, everyone got every feedback email that came in, and then we're like, no, this has to go to a folder. And we wanted to be able to route feedback to the right people. If it was from an adult about billing, that should go to the team that did billing processing and set prices and stuff. If it was a kid saying, "You are a poopy head and I hate you," that should go to the "things to read on your bad day" folder. And if it was, "I don't understand what to do here," that should go to the team working on that interface, to try to figure out if we can make the instructions better. But we didn't have a way to do that. It would have taken one person going through and flagging all of that, so we just didn't bother.
And NLP can also be used to assist us in other ways. One of my coworkers made a tool called Deep Breath that analyzes your emails as a Gmail plugin, and if you try to send an email that comes off as hostile, it tells you you might want to rethink that. Imagine how many GitHub flame wars could be stopped if everyone had that thing. So why don't we have it already? Because NLP is hard, like really, really hard. You don't have to believe me. Why is NLP hard? Because English is horrible. Pop quiz: this is a word. This word is seal. Everyone imagine what you think of when you think of seal. Who thought of this? Who thought of this? Who thought of a musician? Who thought of something completely different? So one word, at least four different things that people in this room thought of. And it was actually fairly evenly distributed, which was kind of cool, because when I ran this with some other people, everyone thought of that guy. And I really liked the picture, which is why I went with the word seal. And then we have our famous homophones. Yeah, there, their, and they're. So many English teachers, so little time. And then we have words that can be multiple parts of speech. For example, love can be a verb: she loves her wife. It can also be a noun: love lasts forever. So I repeat, English is horrible. And I didn't even get into stuff like irregular verbs, slang, idioms, and all the other bits that make a language a human language and make learning a foreign language really annoying.

And it turns out that English isn't unique. I personally believe that English is especially horrible, but all human languages are actually pretty horrible. They're all horrible in the ways I've discussed, or in different ones. Even the ones we've tried to manufacture to be less horrible are still horrible, because humans make idioms, humans make slang, language evolves. And they're particularly horrible for computers because there is no formal closed grammar for human languages. Some languages are more regular than others, but if you think about the grammar for a computer language, it frequently fits on a page or two of printed text. They're generally very regular. There's a limited vocabulary. And if there's an unlimited vocabulary, we have rules like: this can be a symbol, a symbol can start with the following characters, can't contain these others, and can't contain spaces. If you don't understand the words "formal closed grammar," come have a chat with me afterwards. I'll drag out the Chomsky. It'll be awesome.

Also, human language is horrible because humans are really bad at being precise. For example, if I say I'm starving, there's a remote chance that that's true, but it's probably not. And if I say to someone, you look freezing, again, remote chance it's true, I hope not, and it's probably not true. We exaggerate a lot. And I was reading a cool article while doing research for this talk about how the word unique has become less unique over the last 30 years. Thirty years ago in newspaper articles, unique was relatively rare; frequently, editors and reporters would use unusual, which is the word they probably want nowadays. But now, I think it said, you're almost as likely to see the word unique as unusual when you mean unusual. Unique is something like seven times more common in printed articles than it used to be. So I'm using that as an example of how language evolves. My other favorite example is "literally," because as of a couple years ago, it doesn't mean literally anymore.
There's also the problem that computers suck at sarcasm. Like really, truly, and this comes back to: it's not just humans who suck, it's computers who suck too. This is an example I used in a previous talk. I could say, "Sure, I'd love to help you out with your problem." I could also say, "Sure, I'd love to do the dishes." Depending on how I say that, the meaning changes. It also changes by the context, the words that are around it. And despite what we learned from The Hitchhiker's Guide to the Galaxy, computers are really bad at distinguishing sarcasm, and they're even worse at generating it. Yay, somebody got my Hitchhiker joke. So, I ask again, why is this hard? I hope I've confused you and convinced you. But the big thing is that English is hard, and natural language processing is hard, and languages are hard because humans, I'm a human, many of you are humans, use language in weird ways. We use sarcasm, we use exaggeration, and it's hard because our human languages are complicated and always changing. But since humans created human language, we can simplify this whole thing. Why is it hard? Natural language processing is hard because humans. Natural language processing is hard because people. We are complicated and we make things hard for computers.

So hopefully I've convinced you that this is hard, but not all is lost, because we've been working on this problem for a really long time. These two famous mathematicians were some of the first people to talk about natural language processing. They proposed ways to do algorithmic translation between languages. And I am of the generation where Google Translate became a thing while I was conscious and it was somewhat useful for me, and I've seen automated translation improve greatly over the last 15 to 20 years. But we've been working on this problem for several hundred years at this point. Another great example: in 1950 Alan Turing proposed the Turing test. Who's heard of the Turing test? Hands? Okay, there are a couple people who didn't raise their hand, so I'm gonna actually give the definition, because a lot of people think they understand it and they don't, including me originally. The Turing test is a way to test for machine intelligence, and it uses natural language. An impartial judge watches a conversation between a human and a machine without knowing which one is which. Usually this means the text is going across a screen. The judge knows that one of the two is a machine but doesn't know which one. If the judge can't tell which is the human and which is the machine, the machine passes the Turing test. And people have tried to solve this. One of my favorite examples is ELIZA. This came out shortly after the Turing test, actually. And if you're an Emacs user, you can do M-x doctor to play with it in real time. And because I have the time, we're gonna do that right now. Awesome, so this is the Emacs psychotherapist. So I'm gonna say, "Hi, I'm feeling sad," and it says, "Hi, how do you do? What brings you to me?" And I'm gonna say, "I already told you. I'm feeling sad." "Why do you say you already told me that you're feeling sad?" "Because I already told you, you stupid machine." "Is it because you already told me, you stupid machine, that you came to see me?" "Yes." "Well, I see. What makes you believe this is so?" So this is a very classic example of NLP. There are lots of implementations of this. My project for the plane ride home is to port a Python implementation to Ruby, because why not?
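That kind of port is smaller than it sounds, because ELIZA-style bots are at their core a pile of pattern-and-response rules with a little text reflected back at you. As a toy illustration of that idea only, not the Emacs doctor's actual rules, here's a tiny Ruby sketch where every pattern and canned response is invented:

```ruby
# Toy ELIZA-style responder, illustration only; the real doctor's rule set
# is much bigger. Every pattern and canned response here is made up.
RULES = [
  [/i(?:'| a)m feeling (.+)/i,   "Why do you say you are feeling %s?"],
  [/i already told you,? (.+)/i, "Is it because you already told me %s that you came to see me?"],
  [/\Ayes\b/i,                   "I see. What makes you believe this is so?"],
].freeze

def respond(input)
  RULES.each do |pattern, template|
    match = input.match(pattern)
    return format(template, *match.captures) if match
  end
  "Please, go on."   # fallback when nothing matches
end

puts respond("Hi, I'm feeling sad")
puts respond("I already told you, you stupid machine")
puts respond("Yes")
# => Why do you say you are feeling sad?
# => Is it because you already told me you stupid machine that you came to see me?
# => I see. What makes you believe this is so?
```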
I always seem to go to a conference and end up with some really silly hallway programming project. But it's also really fun, and building one of these is one of my favorite programming projects for working with middle schoolers. They like being snarky, they like talking back, and making things that talk back based on what you say is a lot of fun. So that's one of my examples. Come on. There we go, awesome. And ELIZA is actually a classic example of a chatbot. So who's written a chatbot? They're lots of fun, you should write one. But the fact that chatbots have been around since the late 50s and early 60s kind of surprised me. I thought we invented that stuff with IRC. Turns out no, we didn't. It's been around forever.

So I've now rattled on for about 15 minutes and I haven't shown you a single line of code, and that's not really fair of me. So, code. I have a thing: whenever I do talks, I like to give super impractical examples, so that hopefully no one will actually use what I do but will use the ideas instead. At RubyConf 2015, I gave a talk, this was the talk, called Stupid Ideas for Many Computers. And in that talk, I demonstrated how I could do sentiment analysis of tweets by scoring the emoji they contained. Sentiment analysis is a subfield of natural language processing, and the goal of sentiment analysis is to figure out if a given body of text is generally positive, generally negative, or something else. And to use emoji to do this, I gave common emoji scores based on how positive and negative they were. This was my general scale. I may have partly done this talk so I could put the poop emoji on the screen many, many times. And I used emoji at the time, and if you go back and watch the video on Confreaks you'll see why: because NLP is super, super hard. In November of 2015, I didn't have the skills or the ability to train up a model to do accurate natural language processing on all the crazy stuff that people tweet during a conference, in Ruby, in real time. But it turns out that most folks who do machine learning of any kind don't actually build their own models, because building models is hard, takes a lot of time, and takes a lot of knowledge about the various ways they can go wrong. And over the last year, tons and tons of pre-trained models that you can access via API have come out. And I'm gonna use this one. We released Cloud Natural Language, and I'm gonna use that for my demo today instead of my hacky emoji scoring scheme.

So we have a gem. There's actually supposed to be a dash in there, google-cloud-language, I'm sorry about that. gem install google-cloud-language. I believe it's alpha, it might be beta. It's on RubyGems. All the development happens on GitHub. It's all open. And here's all the code I needed to do the scoring. It's an analyze method. Require the thing, create a new client object, blah, blah, blah. In the analyze method I create a document using the text of the tweet. I call the sentiment method, and it gives me back an object that has a score and a magnitude. Score is how positive or negative it is, negative one to one. Magnitude is how much of that it is. So if I get a score of 0.1 and a magnitude of five, that means that is a very, very neutral tweet. If I get a score of negative 0.9 and a magnitude of 0.1, it's negative, but it's not excessively negative; there are bits of it that are really negative. I have a blog post about how to interpret those if you're curious, because understanding those two things is a little bit difficult. Don't multiply them together. It doesn't work that way. I tried that.
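The slide code itself isn't reproduced here, so this is a minimal sketch of that analyze method, assuming the current google-cloud-language client surface (the gem at the time of the talk was slightly different, but score and magnitude mean the same thing):

```ruby
require "google/cloud/language"

# Sketch of the analyze step described above, using the current
# google-cloud-language client. Assumes GOOGLE_APPLICATION_CREDENTIALS
# points at a service account key with access to the Natural Language API.
client = Google::Cloud::Language.language_service

def analyze(client, text)
  document  = { content: text, type: :PLAIN_TEXT }
  sentiment = client.analyze_sentiment(document: document).document_sentiment
  # score: -1.0 (negative) to 1.0 (positive); magnitude: how much emotion overall
  { score: sentiment.score, magnitude: sentiment.magnitude }
end

p analyze(client, "RailsConf is amazing and I love this talk!")
# Rough expectation: a strongly positive score; exact numbers vary.
```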
I'm gonna massively hand-wave over how I set this all up. It's a Kubernetes cluster; I've got a MapReduce-ish system with something that's pulling data from Twitter. I'm happy to explain it afterwards; come to the booth where I can show you it all running. I explained it all in great detail in my original talk, because my original talk was about how to set up distributed systems to do crazy things. And I'm going to do a demo at the end of the talk if we have time, but I need your help. I'm pulling stuff from that hashtag. So if everyone wants to get a phone out or something and make some sort of tweet, you can blame my talk at some point in the tweet, and try to give it a sentiment. Generally, sentiment's been running pretty positive for the conference thus far. This is not surprising. I'm gonna take a drink of water. Y'all go tweet something. I'll be back in a second.

Okay, I see eyeballs. You can keep tweeting while I'm talking, because the next bit just involves me telling you about grammar for a bit, so that's good. So when this tool came out and I was playing with it, I'm like, dude, did anyone else here have to diagram sentences in grade school? Because I totally did, and I hated every minute of it. Doing this thing over and over and over again drove me nuts. If you've never seen these before, this is one form of sentence diagram. You have a subject, a vertical line, the verb, a half vertical line, a direct object, and all the stuff that modifies those things goes on the lines below. So anyone else draw diagrams like this? Because I drew tons of them, okay. I talked to a bunch of other people and they drew diagrams that look more like this. In both methods of diagramming, the verb is at the center and everything that modifies it goes off to the sides. And also, all words are organized so that they're connected to the word that they modify. And when I started showing this to some of my friends at Seattle.rb, they're like, ew, grammar, I hate you. And then I'm like, so, blah, blah, blah, direct object, and they're like, I don't actually remember any grammar.

So, brief side quest: grammar. And I'm gonna apologize in advance to all of the non-native English speakers in the audience. Y'all know this already because you've all had to study this. You can tune out, I'll be back in a couple minutes. So one of the ways we understand words is by labeling them based on their function. This is called parts of speech. Verbs: verbs are actions. You can't have a sentence without a verb. One kind of action might be jump. But you can also have verbs that aren't active, state-of-being verbs, like thinking. And then we have nouns. Lots of us learned that a noun is a person, place, or thing. That's mostly true. A person, like Matz, or Alan Turing. A place, like the bathroom, or Phoenix. A thing, like a cactus, or a mountain goat. But you can also have nouns that are ideas. You may have heard the phrase abstract noun. Democracy, love, those are abstract nouns in this way of breaking down words. Adjectives: adjectives describe or modify other words, usually nouns. There's a great podcast that explains why that isn't strictly true, because, again, humans are awful. Adjectives specify the attributes of things: blue, small, five, those are all adjectives. They can also compare things: near and far. And if you are of the same generation I am, you are thinking about a sketch from Sesame Street when I say those words. I almost embedded it, but I didn't, because copyright.
But please go search for "near and far Sesame Street" on a video search, it's hilarious. And then we have things that I learned as articles, a, an, and the, but modern grammar calls them determiners. They help clarify which noun. And determiners also include words like this and that. All articles are determiners; not all determiners are articles. It's like squares and rectangles, it's fine. And then we also have parts of a sentence, which are different than parts of speech. So the root is the thing you need to have a sentence. You need a verb, therefore the root is the verb. The subject is the noun that does the verb. And the direct object is the thing that the verb happens to, most likely a noun. That's all you need to know, but pop quiz: here's a sentence, "The cat eats fish." The subject is cat, the verb is eats, the direct object is fish. Side quest complete. Everyone else can pay attention again now. That was your reminder of how grammar in English works.

So, back to sentence diagramming. Here's the basic idea of these sentence diagrams: subject, verb, direct object, other stuff. And I wanted to figure out how to do this. And to do this, I need to know what role each of these words is playing. Well, there are tools for that. Instead of using the sentiment method, now I'm using the syntax method. And it returns a list of tokens. And it returns way more information about every single token than you would ever want, at least than I would ever want. There's a ton of stuff here. This is the token for the word cat in the sentence "The cat eats fish." Here's the text itself and where it appears in the sentence, with the offset. Here's the part of speech: it is a noun and it is singular. English doesn't actually have a lot of modifiers on nouns. We don't tend to use grammatical cases. We don't have grammatical gender at all in English, which is kind of cool. Grammatical mood and tenses generally don't apply to nouns; they usually apply to verbs. All that information is available if it's relevant. Go use this on something like German or Spanish or something that has more cases, more grammatical gender, things like that, to see how that works. And then this token is labeled nsubj, for nominal subject. And so what this is saying is that cat, in the sentence "The cat eats fish," is the subject of the sentence.

So with this, I have enough to make myself some awesome rockin' ASCII art diagrams. Here I'm just finding whatever token has the label subject, taking the text, doing the same thing to find whatever is labeled root, calling that the verb, ASCII art. And I get some absolutely amazing ASCII art here. One cool thing, if you haven't seen it before, I know we have a lot of new Rubyists: you can multiply a string by a numeric. It has to be in that order, you can't switch it. I'm multiplying a space by the length of the subject word here so that everything lines up correctly. So awesome, let's add the direct object. Go find the thing that's labeled direct object. Even better, more rockin' ASCII art. So that's all easy, but I'm missing a word. I need to figure out how "the" fits in. So the Natural Language API gives me one other useful thing: it gives me a thing called head token index for each word. And this is the index of the parent word in the corpus of text for the current node, well, not parent word, parent token. Most of the tokens are words, but as you'll see shortly, that's not always the case. So this is the token for "the"; its head token index is one.
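Again, the slide code isn't in the transcript, so here's a minimal sketch of the syntax call and the subject/root/direct-object lookup described above, assuming the same modern google-cloud-language client as the earlier sentiment sketch (dependency labels come back as symbols like :NSUBJ, :ROOT, and :DOBJ):

```ruby
# Sketch only: pull the token list for a sentence, print what the API knows
# about each token, then pick out the diagram pieces by dependency label.
# Reuses `client` from the earlier sentiment sketch.
document = { content: "The cat eats fish.", type: :PLAIN_TEXT }
tokens   = client.analyze_syntax(document: document).tokens

tokens.each do |t|
  # word, part of speech, dependency label, and the head token index
  puts format("%-6s %-6s %-6s head=%d",
              t.text.content,
              t.part_of_speech.tag,
              t.dependency_edge.label,
              t.dependency_edge.head_token_index)
end

subject = tokens.find { |t| t.dependency_edge.label == :NSUBJ }
root    = tokens.find { |t| t.dependency_edge.label == :ROOT }
dobj    = tokens.find { |t| t.dependency_edge.label == :DOBJ }

# The top line of the diagram: subject | verb | direct object
puts [subject, root, dobj].compact.map { |t| t.text.content }.join(" | ")
```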
And that's the list of tokens. So what this is saying is that "the" refers to "cat." Everyone follow that? It got a little tricky, okay. So I'm just gonna go through all my tokens and find everything that refers to the subject, everything that has the same head token index as the index of my subject, and I'm gonna print that out. And you know, a little more ASCII art, and I have a basic sentence diagram. I couldn't figure out how to do diagonals in ASCII art, and I'm pretty okay with that, as it turns out. So this is awesome, and I was very proud of myself, and if you follow me on Twitter, you saw that I tweeted about it because I was super happy about this. And then I tried a sentence like this, and that didn't go so well, as you can probably imagine. So I took a step back and I talked to some of my friends, and they're like, well, I didn't do sentence diagrams that way.

So I fell back on one of my favorite tools. Everyone's got their favorite tool set, and one of my favorite tools is a gem called graph. This is actually the gem that I gave my very first conference talk about, a long time ago. All graph is, is a gem that makes creating node-and-edge graphs, not charts, like bar charts, graphs, like math graphs, easy. It provides a simple DSL, and then it creates dot files, and a program called Graphviz reads dot files and builds visualizations for you. And all you need to know about graph is that it has two methods inside the DSL: node, which takes two arguments, a required ID and an optional label, and edge, which takes an ID for the to and the from. And they can be the same if you want a loopback. So that's all you need. And this is all the code I need to build a graph-based sentence diagram. This little bit here is some graph boilerplate; it just says that in this block my DSL applies, and digraph because it is a directed graph. You know, math, it's awesome. I'm gonna use the index as the ID and the text as the label for each of my nodes, and make a node for every token. I'm going to make an edge from the current node to its head if it doesn't refer to itself. I skipped the loopbacks, they got confusing, and they mostly only applied at the root. So that's "The cat eats fish," and you'll notice that I actually have the punctuation; the punctuation is considered a token as well. And then that's the more complicated sentence that didn't work at all with my first set of code: "The cat eats the fish with a side of milk." You can see the prepositional phrase there, because milk's head token is of: milk is the end of the prepositional phrase, and of is the beginning of it, and grammar is awesome.

So I've got a couple minutes left. I showed you some silly examples. This is what I really like to do when I give these talks, because if I have to make serious code all the time, I get bored. But there are all sorts of practical uses of NLP. We talked about customer feedback. We talked about summarizing. We talked about ways to make our products more usable for a wider variety of people. And I'm hoping that some of you have ideas of your own at this moment. And hopefully I did my job. And if you want to get started, if you want to play around with this, the Google Natural Language API is a good way to do it. You get the first 5,000 requests a month to each endpoint, syntax, sentiment, and we have another one called entities, free, and it's priced per 1,000 requests after that. I can't quote the pricing from memory, we can look it up, but it's very, very reasonable.
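Her graph code isn't in the transcript either; this is a rough sketch of the digraph construction described above, assuming the graph gem's digraph/node/edge/save DSL and reusing the tokens array from the syntax sketch (Graphviz has to be installed for the PNG to render):

```ruby
require "graph"  # the `graph` gem; rendering the image also needs Graphviz installed

# Sketch only: one node per token (index as ID, word as label), and an edge
# from each token to its head token, skipping the root's self-reference.
digraph do
  tokens.each_with_index do |token, i|
    node i.to_s, token.text.content
    head = token.dependency_edge.head_token_index
    edge i.to_s, head.to_s unless head == i
  end
  save "sentence_diagram", "png"   # writes sentence_diagram.dot and .png
end
```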
And just because I'm like, dude, this is so much fun: I ran the Jabberwocky through it, and it got the syntax analysis correct. I was a little bit surprised by that. So it's a good way to play around, experiment with the technology, and just kind of see what it can do. I highly encourage you to just play, because we learn all these new concepts by playing and digging in. I work for Google, we're at RailsConf, we have code labs and stickers and answers at our booth. I also have dinosaur stickers up here if you come hang out with me afterwards. We had a talk this morning called Google Cloud Loves Ruby in the sponsored track. You already missed it, so watch it on Confreaks. And one of my coworkers is doing a talk on instrumentation, what my app is really doing in production, tomorrow at 3:30 in this same room. And as I said earlier, we're giving out a Google Home. There's the link. It's also at our booth. I wrote it on the whiteboard, or on the chalkboard, myself. So that's what I've got. Thank you. And I'll ask if you have any questions. And again, 30 minutes exactly. So I have time for a couple of questions.

Yes, over there. So the question was, how does the sentence diagramming approach deal with incorrect grammar? And I tried it. A great example I tried was "Bunnies hop." I tried it without the period, and it couldn't figure out that bunnies was the subject. But I put the period on, and it figured it out. So it takes its best guess. It is a machine. Machines make mistakes, and it's getting better. All of these models get improved over time; they get better and better. For the most part, it's pretty darn accurate. My grammar, as a general rule, is really bad. I have a copy editor for my blog just to make sure that I don't do horrible things. Mostly I do horrible things with commas. And it's generally pretty good, especially for the common mistakes that people make, improperly using semicolons and improperly using commas. Less common mistakes it's not as good at, because it doesn't have as much training data.

So the question was, how do emojis play in, probably because I used them as an example. I don't actually know. I've been running this on Twitter the whole time, and I promised you guys the closing of the demo. Here, I'll show you. So this is running in real time, and you may not be able to see that. Oh, you can, yeah. So, yes. Here, let me hit the button so that it can start streaming again. Thank you very much. So our current sentiment is 57. So even if you guys are trying to be horrible, you are outweighed by the positive, because it's extraordinarily positive. And there are emojis in the tweets, and thus far they haven't seemed to affect things significantly either way. But I don't know. I actually don't know the science behind that, and I don't know enough about our underlying model to be able to answer that accurately. It doesn't seem to completely ruin it, though. I know that.

So I'm gonna summarize your question; let me know if I get the summary right. How does cleaning up the data and ensuring I have higher quality data improve the syntax analysis? Okay, we agree on that summary, awesome. I've actually been really lazy and I haven't done any cleaning of the data, because I'm a tester at heart and I like to try to break things, and anytime I have to put additional pre-processing in, that slows me down. I can't imagine that it would hurt, but I've been pretty happy with what it's done without it. The entity analysis is actually pretty good without it.
I didn't show this in the talk, I didn't have time, but I ran "I love pecan pie with ice cream" through entity analysis, and it identified that pecan pie and ice cream were the two most important entities in that sentence, and I agree with that sentiment. Entity analysis is generally used for identifying proper nouns: cities, companies, things like that. Somebody I used to work with does language processing of SEC filings, looking at various things to try to understand what a company is saying, and there you care about big-name companies and cities and those types of things. But you know, for the sentence "I love pecan pie with ice cream," pecan pie and ice cream is a pretty good analysis of that. Other questions before I give out dinosaurs and sit up here? Okay, you can come get dinosaurs. Thank you very much for your time, I appreciate it.
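For anyone who wants to poke at the entity endpoint mentioned in that last answer, a minimal sketch, again assuming the current google-cloud-language client rather than the exact code from the talk:

```ruby
# Sketch only: the entities endpoint on the pecan pie sentence, using the
# same client setup as the earlier sketches.
document = { content: "I love pecan pie with ice cream.", type: :PLAIN_TEXT }
entities = client.analyze_entities(document: document).entities

entities.each do |entity|
  # salience (0..1) is roughly how central the entity is to the text
  puts "#{entity.name} (#{entity.type}) salience=#{entity.salience.round(2)}"
end
# Expect pecan pie and ice cream near the top of the list.
```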