All right, great, it's working. I wanted to make sure that was working. So welcome. How is everyone's RailsConf going? Good.

So I'm going to start with who I am. I go by Goose, but I'm also Matthew Mongeau, or halogenandtoast online. You can find me pretty much anywhere under that name; that's the usefulness of it, no one else uses it. You may be wondering where Goose comes from. It actually comes from my last name, Mongeau. Now, you might not be able to spell or pronounce that, and because of that you might call me "Mongoose," and that just ends up getting shortened to Goose. Everyone frequently reminds me that this comes from the movie Top Gun, but I haven't seen Top Gun, and there's a very dark truth I found in Top Gun that led me to do a little bit of an investigation: there are a number of movies that feature characters named Goose. The first one is Mad Max, and the Goose in that movie dies in a horrible fire. The movie City of God has a character named Goose who gets shot and dies. And there's the movie Top Gun, and I assume he dies in some kind of automobile accident.

The alternative to being called Goose was that people once tried to shorten my name to "Mongo," and that was all fine and dandy, but when you're in developer chats it becomes kind of confusing, and people don't always say the best things about Mongo. So you run into this problem where you're not sure if they're talking about you or the database, and you don't want to have an existential crisis with a NoSQL database, because it makes it really hard to build new relationships. Perfect. You know everything you need to know about me now.

All right, so what we're getting started with is "Is It Food? A Journey of Minds, Machines and Meals." When I give talks, I like to think about why I want to give this particular talk, and the reason here was that I wanted to learn about machine learning, and when I watched a bunch of talks and read other things on it, I always felt the information was either way too high-level or way too low-level for me, and I couldn't find the happy medium that would make me happy. One of the problems I often had was that I'd watch these talks and I'd see things like this: a lot of charts like this, in reference to linear regression or gradient descent or other things like that. And then I'd try to look up a definition for something like gradient descent and I'd get this: "Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient, or of the approximate gradient, of the function at the current point." That is how I feel when I read these types of things.

So I wanted to set up some goals so that my talk wouldn't be this, and I have two main goals with this talk. I want to focus on practicality, and I want to talk about a real-world use case and frame everything I'm talking about around that idea. But I should first start with some disclaimers. I'm going to try my best to balance between practical and technical, and as a result I might end up oversimplifying some things. This is really just meant to be a high-level overview, and my goal is to have a few of you at the end of this go, "Oh, that was neat. Maybe I'll try it too."
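As an aside, that gradient descent definition is less scary than it reads. Here is a minimal sketch minimizing f(x) = (x - 3)^2, where "steps proportional to the negative of the gradient" just means repeatedly nudging a number downhill:

```python
# Gradient descent on f(x) = (x - 3)**2, whose gradient is 2 * (x - 3).
x = 0.0
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (x - 3)         # slope of f at the current point
    x -= learning_rate * gradient  # step proportional to the negative gradient
print(x)  # converges toward 3.0, the function's minimum
```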
So I want to start off with a brief experiment. I want to find out: is something food? Or, more specifically, since people like to remind me that images can't be food: is it a photograph of food? I hope this will be a fruitful experiment.

Is this food? Sorry, sorry: is this a photograph of food? How do we know? Exactly. We've seen things that look like this, or exactly this kind of thing, and we know it's food. But that doesn't really get to the meat of our problem. Some of you may never have seen this particular dish before, yakitori. So how do we know that this is food? It smells good. Okay. So part of it is that we can immediately recognize the ingredients, break it down, process it, and it looks like something we've seen before.

That's pretty sweet, but what about this? Is this food? See, this one's really questionable. But if I give you the contrast here and I say "is this food?" and then "is this food?", you might say that this one isn't food but this one might be, because we can tell there's something inside, and we actually have a lot of experience opening up this package, eating the delicious crispy Kit Kats, and being very happy. And it's really awesome if you hand a child an empty package of Kit Kats when they're not used to this experience yet: they get very, very disappointed, because there's nothing in there.

All right, I just have a couple more. Is this food? Is it a picture of food? I'm not sure. All right, I think we can clearly identify this as food, but what about this? I really hope nobody said yes.

What I really want to talk about is that we're able to look at these pictures and identify that they're food because we're good at pattern recognition, and in many ways that was core to our survival. Pattern recognition is one of the key factors that led us to developing language, tools, and agriculture, and we're really exceptional at it, except we hate slow and menial tasks. If I gave you 10,000 images and asked whether they're all food, you would get bored really quickly and not want to do it. So that makes us venture out and automate the problem.

Now, this is an interesting kind of problem to try to automate, because as humans we can recall information in an extraordinarily fuzzy way, and from past experiences we can piece together the information we had before to form new information or understanding about the world around us. But this process is very hard to program, and machine learning can be used as an approximation of this kind of behavior.

So I want to talk about three different types of machine learning. First there's unsupervised learning, which refers to taking information and clustering it, grouping it together into individual parts, where the computer decides what the groups mean. So you might have a chart with all of these things plotted out, and here you can see there are these centroids that say: ah, this is the blue group, this is the green group, and this is the red group. It doesn't really matter where a dot lies; the algorithm just looks at which centroid it's closest to. Anything in the green area is green, anything in the red area is red, and anything in the blue area is blue. That's unsupervised learning.
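That "closest centroid" rule is the whole trick. A minimal sketch, with made-up coordinates:

```python
import numpy as np

# Three centroids a clustering algorithm has settled on (made-up positions).
centroids = np.array([[1.0, 2.0],   # the "blue" group
                      [5.0, 5.0],   # the "green" group
                      [9.0, 1.0]])  # the "red" group

def nearest_centroid(point):
    # A point belongs to whichever centroid it is closest to.
    distances = np.linalg.norm(centroids - point, axis=1)
    return int(np.argmin(distances))

print(nearest_centroid(np.array([4.5, 4.0])))  # -> 1, the "green" group
```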
So unsupervised learning just clusters everything together into individual buckets, and that's useful, but sometimes we want to know whether we can label something, and that's where supervised learning comes in. In supervised learning you provide all the information up front, saying "these things are this, these things are this, and these things are this," and then the system tries to learn how to identify new information the same way. The last of these three categories is reinforcement. Reinforcement is kind of similar, except that every step of the way you tell it whether it's right or wrong: it makes a guess, you say "yes, you are right," and it takes that information and applies it to the next iteration so it can get better over time. For the purpose of identifying images, we're only going to be talking about supervised learning here, and specifically about neural networks.

So I want to give you a brief history of artificial neural networks. One of the first artificial neural networks created was the perceptron, invented in 1957 by Frank Rosenblatt, and the machine was actually designed for image recognition, back in 1957, which I find extraordinarily fascinating. But one of the problems with this style of image recognition was that it couldn't learn certain kinds of functions: it could only learn linearly separable patterns. And that's how I felt about that definition. Now, the problem was that the research couldn't progress forward, and that caused neural networks to stagnate for quite a bit. Not a lot happened, and because of that I'm just going to fast-forward to 2015 and talk about TensorFlow. And that's the brief history of artificial neural networks.

All right, so what is TensorFlow? TensorFlow was developed as a system for building and training neural networks, which are represented as something called a data flow graph. They can look something like this. Each layer of the graph, represented by one of those oval shapes, takes in a tensor and returns a tensor, performing some operation on it. What is a tensor? When I looked this up, I got an image that looked like this, and when I saw it I just assumed I would never be able to understand what a tensor was. But the simple answer is that a tensor is just an n-dimensional array of information. You can choose whatever dimensions you want to represent it with: a set of information goes in, and another n-dimensional array comes out, with whatever dimensions are necessary for solving the problem (there's a concrete sketch of this at the end of this section). So in this situation, some array comes in at the beginning and goes through each individual step, sometimes splitting out into more steps and converging back into a single step, working its way all the way through until eventually you get some result at the end.

Now, this particular data flow graph represents Inception, specifically Inception v3. So you might ask: what is Inception? Inception is a pre-built data flow graph, useful for categorizing images, that was originally trained on the ImageNet dataset. ImageNet is a pretty interesting website, because you can use it to get human-classified images for categories, and there are tons of them available to you. It's really useful, if you're trying to do any kind of machine learning, to be able to download these sets of images and use them in your projects for training purposes.
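To make "n-dimensional array" concrete, here's a quick sketch. The shapes are illustrative, though 299 by 299 by 3 happens to match the Inception v3 input size that comes up later:

```python
import numpy as np

scalar = np.array(5.0)                # 0-dimensional tensor
vector = np.array([1.0, 2.0, 3.0])    # 1-dimensional tensor
image  = np.zeros((299, 299, 3))      # 3-dimensional: height x width x RGB
batch  = np.zeros((32, 299, 299, 3))  # 4-dimensional: a batch of 32 images

print(batch.ndim, batch.shape)  # 4 (32, 299, 299, 3)
```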
So that covers what Inception is, but how does this all tie together? I want to return to my original question: is it food? Why am I trying to answer this question? Well, as you may have surmised from my shirt, I work for a company called Cookpad, and we happen to have lots and lots and lots of data about food, and specifically lots of images of food. But when you have a website that allows users to submit content, you run into some problems. They might not always be trying to submit food to your website, and you care about that, because if you don't, very bad things can happen and people can get upset really, really quickly. So there are a couple of things we want to do. We want to protect our users by ensuring that the images posted are actually food, for a number of reasons. We don't want to show them something that's inappropriate. We also found that users like to do things like put text on their images saying "you can find the real recipe over on this other competing website," and we don't want those types of things to happen either.

Where I work at Cookpad, we essentially do that already: we have an app that looks at an image and classifies whether or not it's food. So I thought it would be really interesting to try to recreate what was already created, and do it with a Rails app. Our main application is a Rails website, but the machine learning work is all done in Python. So I decided I was going to build a Rails app. Mistake number one. Actually, in all fairness, mistake number one was that when I went into this problem I told myself: no Python. I'm not going to use Python, I'm going to use Ruby, because I'm giving this talk at RailsConf and no one wants to hear the dirty word "Python." This would be fine: there's a gem for handling this called tensorflow.rb. Now, I tried it a number of times, but I wasn't able to get it running on my machine. My guess is that there's an issue with Clang, and this gets into compilers, and no one enjoys trying to debug a compiler on their machine; you just want to download a thing and have it work. So that didn't happen, and one of their suggestions was to use Docker. Mistake number two was, of course, Docker. After setting up the Docker image, there were a couple of programs written in C and C++ for solving the problems I wanted to solve. I tried compiling those, and they didn't work. So I kept retrying and retrying and retrying, and then I got my favorite Docker problem: "your startup disk is almost full." After figuring out from the documentation how to delete all of my images, I decided to start over.

So I want to document my road to success for solving this kind of problem, with one small note: your mileage may vary. Let me go over installation and setup. This was my starting point: I finally decided to install Python, set up a virtualenv, and go through the process of installing TensorFlow. Now, I want to make a quick note here. As a Rubyist, I want to embrace the Ruby community. I want things to work in the Ruby community, and here I am telling you to use Python. But I want to be clear: this isn't saying that we as a Ruby community shouldn't embrace machine learning and try to make it part of our community. It's that if you're getting started, you do not want to fight against your tools, and right now the best toolset I've found is in Python. The machine learning community there is really strong, and they've put together the right tooling.
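That setup amounts to roughly `pip install tensorflow` inside a fresh virtualenv, plus a sanity check. A minimal sketch, assuming the TensorFlow 1.x releases that were current at the time:

```python
# The classic "hello world" sanity check from the TensorFlow 1.x docs.
import tensorflow as tf

hello = tf.constant('Hello, TensorFlow!')
with tf.Session() as sess:
    print(sess.run(hello))  # prints b'Hello, TensorFlow!' if the install works
```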
I highly suggest not fighting against it. I suggest going with the flow and using the tools that work, and for me this setup just worked; I was able to use TensorFlow.

Now, the next step after installing TensorFlow was to figure out how to make it solve my particular problem. There's already a tool, Inception, that works on ImageNet images and knows how to classify those, but I wanted to do something different. I wanted to do retraining: to change the image set it's able to identify to an image set I actually care about. We can do this through something called transfer learning. Transfer learning is basically this: if we look at this data flow graph, there's one step at the end, and if we pull off that step we can replace it with our own step. The benefit is that we can use everything that was previously learned and apply it to our current set of data.

So if you want to work with Inception v3 and have it recognize your own image set, you basically just need to collect a bunch of data into a folder. You can call it whatever you want; I called it "data" because I'm not really creative. Inside that folder, you decide which categories you're going to have. Which categories you choose is really crucial. It's not enough just to have a bunch of pictures of food: if all you've ever seen in your entire existence is food, then all you know is food. You want to be able to identify things that are not food, so if you can think of cases particular to your situation that you want to identify against, you should have images for those too. In our situation we didn't really care about flowers, but those came with an example, so my company used them for some reason. But we did care about people. We don't want people showing up in any of our images; we want to protect our users' privacy, so we want to remove any pictures of humans. And additionally, we wanted to avoid text, for that problem I mentioned before: we had lots of users who wanted to say "ah, go watch the video for this recipe over here." We don't want that to happen.

So we created these categories, and inside them we put lots and lots and lots of images. For my training purposes, the folders each had between 1,000 and 2,000 images, except for text, which had around 600. And the nice thing is that the images can be really small: TensorFlow actually only operates on images that are about 299 pixels by 299 pixels, and if your image isn't that size, it will automatically resize it. This is something to care about with your data: if you don't resize an image properly ahead of time and its subject isn't in the center, you might lose that information when it gets resized. So it can make sense to resize your images ahead of time to make sure the subject is in there properly; there's a small sketch of that after this section.

All right, so now that I'd collected all of my data, my next step was to retrain. I'm no expert in machine learning, so I didn't want to write this script myself. Luckily, there's already a script that does it, so I pulled it down from the TensorFlow repository on GitHub. I changed some directories so that instead of outputting to /tmp it would output its few files into my local directory, and then I ran the retrain program, telling it that the image directory was my data directory. And then I waited, and waited, and waited, and waited.
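As promised, here's what that pre-resizing step could look like. A minimal sketch, assuming the Pillow library and my `data/food` folder; center-cropping to a square before resizing is one way to keep a centered subject intact:

```python
from pathlib import Path
from PIL import Image

SIZE = 299  # Inception v3's input size

for path in Path('data/food').glob('*.jpg'):
    img = Image.open(path)
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    # Crop the largest centered square, then scale it down to 299x299.
    img.crop((left, top, left + side, top + side)).resize((SIZE, SIZE)).save(path)

# The retraining run itself was just the downloaded script, along the lines of:
#   python retrain.py --image_dir data
```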
While I waited, it output things like this. It said it was looking for images in all of those directories, and then it started creating bottlenecks. What is a bottleneck? I did not have any clue. So I looked it up, and the definition of a bottleneck was "an informal term referring to the output of the previous layer." The reason we care about this is that in order to train the network, it has to keep passing images through and seeing how well it's doing, and you don't want each image to have to go all the way through your data flow graph every single time. So it actually computes that output once and caches it so it can be reused. And if you do this from scratch, you won't get this pretty printing of "100 bottleneck files created"; it will go file by file, telling you it's creating the bottlenecks. What you see here is what happens after they've been cached: if it has already cached those values and can find them, it just prints this instead of doing them all one by one.

The next thing you'll see is this: it starts reporting things like train accuracy, cross entropy, and validation accuracy. This was kind of interesting to me; I wanted to know why it printed these three things. What's really neat is that TensorFlow does a very interesting thing where it splits your data into three individual parts: one for training, one for validation, and one for testing. Training is the data you tune your model on; it's going to say "ah, these look like these" and keep making things better along the way. And you make sure that through the process it never sees the testing data. Once it's finished, the program runs the testing data through and checks whether it gets those correct, but it doesn't use them as a method of training, only for validation. That's used to avoid overfitting: we want to make sure improvements in training accuracy actually appear on the unseen data, and if an improvement doesn't show up there, the process doesn't keep those particular weights. The last part, cross entropy, is your loss metric, and this is important: any time you want to train a neural network, you need some kind of loss metric to say "this is bad," and you want to minimize that value so your system gets better. And all of these values are focused on accuracy.

The results of running this are two files that get output: the graph and the labels. The graph is kind of complex; it's an encoded format of that data flow graph I showed you earlier, with each step in it, and it's not really interesting to open and look at, because it's a weird encoding, a mixture of text and binary. But the labels file is really simple. It looks like this. The reason is that when we're talking about a tensor, it's mostly dealing with numbers, and at the output you get some category index it belongs to: zero, one, two, three, four. You care about the actual label of that category, so the retraining outputs this file with those labels, and afterwards the code can look up a label and say: ah, this was food, this was a person, this was a flower.

So let's get to the point where we can use it. Once I'd retrained, I wanted to take that trained network and label my image, and again, I just found some code online to do that. I copied label_image.py into my current directory, changed the graph and labels paths to my local directory, and it ended up looking something like this.
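A condensed sketch of what such a label_image.py boils down to, assuming the TensorFlow 1.x API of the era, retrain.py's default output file names, and its default final tensor name, `final_result`:

```python
import sys
import tensorflow as tf

# Read the image and the labels file that retraining produced.
image_data = tf.gfile.FastGFile(sys.argv[1], 'rb').read()
labels = [line.rstrip() for line in tf.gfile.GFile('output_labels.txt')]

# Load the retrained graph.
with tf.gfile.FastGFile('output_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    # 'final_result' is the retrained last layer; feed the JPEG bytes in.
    softmax = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax, {'DecodeJpeg/contents:0': image_data})[0]
    for i in predictions.argsort()[::-1]:
        print('%s (score = %.5f)' % (labels[i], predictions[i]))
```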
So this is actually fairly simple compared to the retrain script, which I won't go into; the retrain script is about a thousand lines, but this is approximately 50 or so. And all it does is basically take that last layer, the final-result part on there, and run your image through just that last layer, since that last layer has already been trained to do everything we needed it to do. So this process is actually pretty quick. At the end here, you can see that it just looks through the labels file and finds which label it needs to pull out. So great, I've done this step; I've gotten to the point where I can potentially label images. But let's take a look at that actually working.

All right, so I have a few test images here: a directory with some images in it. I have sushi pizza, which I suggest you never eat; this post box; a hamburger; a pizza; and some dirt. Let's find out which of those are food and which ones aren't. To do this, I just run `python label_image.py` with an image. Let's start with the sushi pizza. You can see here (hopefully I can make it a little bigger) that for food it gave 0.9 as its score. That's really high, so it thinks this is probably food. Let's try the dirt. Here we can see it lands in the category of "other." That's great; we don't want dirt showing up as food. So we can use this on all of our images. Test three is the pizza, and of course it's not so sure about that one: 84% instead of 90 for food, but still relatively high.

And that's a point to take into consideration. It's never going to be 100% accurate. You're never going to get "100% this is food" unless all you've ever trained it on is food and it only has one category; there's always going to be a small percentage that it could be something else. And you never want to achieve perfect accuracy anyway, because you're probably overfitting, and the model is going to be bad at solving the general problems you want.

All right, so let's go back. We have this, and we can see that it's been trained correctly and can identify food. So I got to the point where I want to make this work with a web server; I want to be able to use it from my Rails application. How do I do that? I have code in Python, and I want it inside of Rails. As I mentioned before, we're already doing this exact thing at the company where I work, and my suggestion is to treat the machine learning aspect as a separate service and call it from your Rails application. To build the service, I decided to use something called Flask, which is kind of like Sinatra. Again, it was just `pip install --upgrade flask`, and then I converted that label_image.py file into a small server that returns the response I want. It actually didn't take too much: it looks very similar to the label_image.py file I had before, but here I've created a server out of it. It has only one route, classify, and it returns the predictions. It's kind of important to return all of the predictions, because you want to look at not just the top prediction but all of your percentages, to see if there's a high probability it's something you're not looking for. Maybe it's worthwhile to flag something the model thinks is 30% a person or 30% text, if your system cares about that.
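A minimal sketch of that classifier-as-a-service idea, reusing the same graph-loading code as above; Flask is the only new dependency, and the route and field names here are illustrative rather than Cookpad's actual API:

```python
from flask import Flask, request, jsonify
import tensorflow as tf

app = Flask(__name__)

# Load the retrained graph and labels once, at startup.
labels = [line.rstrip() for line in tf.gfile.GFile('output_labels.txt')]
with tf.gfile.FastGFile('output_graph.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')
sess = tf.Session()
softmax = sess.graph.get_tensor_by_name('final_result:0')

@app.route('/classify', methods=['POST'])
def classify():
    image_data = request.files['image'].read()
    predictions = sess.run(softmax, {'DecodeJpeg/contents:0': image_data})[0]
    # Return every label's score, not just the top one, so the caller can
    # red-flag anything with, say, a 30% chance of being a person or text.
    return jsonify({label: float(score) for label, score in zip(labels, predictions)})

if __name__ == '__main__':
    app.run(port=5000)
```

The Rails side then just POSTs the uploaded image to this service (with Net::HTTP, Faraday, or whatever you prefer) and decides what to do with the scores.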
We basically red-flag images that fall under those categories, and then they're reviewed by an actual person if they need to be. But we found that we actually get about 98% accuracy on all images that are uploaded.

All right, so let's use this. I built a small Rails application that takes images and identifies whether or not they're food by calling out to the service we built. All right, so here it is. We're going to choose a file, and let's start with our sushi pizza. Food. All right, let's try something harder: I'm going to give it a taco baby. Let's find out whether a taco baby is food or not. Not food. Yes, good job. All right, we could try that with any number of images, but again, it'll make the same predictions we saw before. So if you want to get started, that's all it really took: downloading a few scripts, collecting a bunch of images, putting them together, running them, and finding a way to make it work.

Now, when I first started down this path, again, I said I was going to use Ruby and not Python. I want to make sure no one makes that same mistake. I spent an entire day just messing around with Docker and messing around in the command line, trying to find which flags I could use with Clang; okay, it's not going to work with Clang, let me try to set up GCC on my system and make it work with GCC, and going through that whole song and dance. After spending an entire day on that, I went the Python route, and it took five minutes. So the lesson here is to make sure you understand what you're trying to do and to not fight against it. Don't try to do it your own unique way while you're learning. Once you've learned it, then you can go back, do it your own unique way, and suffer, if you'd like.

Now, in my process, I read some things that I suggest as additional material. There's "Demystifying Deep Neural Nets" by Rosie Campbell, which I highly, highly recommend after my talk; it's really great and goes more in depth into some of the more technical aspects of things. And if you're interested in a book, I highly suggest Python Machine Learning by Sebastian Raschka. And that's it.

What was the probability for the taco baby? All right, we can find this out. It's "other." It might be a person; 30% it might be a person. I don't actually know the answer to that question. So, oh, sorry, I have to repeat the question. The question was: how many images a day is Cookpad classifying? I'm actually not sure of the exact number, so I can't say. I know that it's quite a few, but that's it. Sorry.

Yes, I can. Okay, the question was: when we're classifying text, what does that look like? What kind of fonts are we using, or is it an overlay? I'll answer that question first, but I want to make the point that, as an international company, we actually have our biggest market in Arabic-speaking countries, and we found that Arabic text was the majority of our problem. For that reason, most of the text in here is Arabic: lots of examples of Arabic text, with some emoji and other things we're trying to identify. You can see some of them are on top of things that look like recipes. Some of it is English text; I'm not sure why it's about hookworms, but we have lots of different examples in here, with different fonts, different backgrounds, things like that.

I have not messed... okay, so the question was about Google Cloud's Vision API. I have not used that yet.
I'm not on the machine learning team specifically, so I haven't messed with those tools. I've only gotten a kind of trickle-down education about machine learning, and I decided I was interested in this and wanted to explore it more. So I'm a Rails developer, not a Python developer, and I kind of want to stay as far away from Python as I can, but I'm becoming more and more interested in it as an avenue for learning these things.

So the question was: what would it take to be able to do this in Ruby and have it not suck? Honestly, it's going to take the community taking a look at tensorflow.rb and trying to get it compiling on all systems. The main developer doesn't use OS X, he only uses Linux, and he's not necessarily writing it for that particular platform. So if people who have more knowledge about TensorFlow itself, and C++ coding, and Ruby, and making extensions for those types of things can get involved, that would be extraordinarily helpful in making this a reality in Ruby.

The question was: when classifying images, how long does it take per image, how does it scale as you add more images to the system, and what's the memory usage like? I don't have those particular statistics. I found that categorizing images didn't take too long, and even on our system, where we have a lot more training data, it's still a roughly short amount of time per image, because, as I mentioned, you're only putting the image through the last layer of that data flow graph; the whole graph has been trained to the point where the last layer gets to reuse all of that information. That process is fairly quick, but I would still suggest doing it asynchronously rather than synchronously like I did in my example.

Okay, so the question was: as you're collecting images, how do you know the right number of images to collect, and if your accuracy isn't right, how do you correct for it? The answer is that there's no way of knowing the correct number of images, because again, it's an imperfect science. You want to collect a lot of images, but there's no cut-off point where that number becomes the correct number. The more images you have, the better, but it's not only more images of, in my case, food; it's also more images of things that aren't food. And when you notice things aren't working quite the way you want, if you can figure out "okay, we wanted it to identify this, but it didn't," then getting more examples that are closer to that, more training data that represents the specific form you're looking for, can help improve it.

Okay, so the question is: what size data set do I think is appropriate for doing this kind of image classification in production? It really comes down to a couple of things. I think the data set I had for just this testing worked well enough to use in production. My accuracy is probably going to be a little bit lower, around 80%, but I can get it moving and then find out whether it's working or not. If I'm always pushing for some exact percentage, I might never hit it. I think our goal at Cookpad was to get to about 98%, and if we could do that, we were fine with the 2% that wouldn't get categorized correctly. So it's about figuring out how important it is for this classification to happen, and for which specific cases you actually care about it being right.
We don't want humans showing up on our site, so we want the system to be really good at identifying "this is a human, and we don't want that picture," for any number of reasons. So we care about accuracy in that situation, and I think you need to specifically decide which situations you care about. If a picture of a duck ends up on the website, I care a little bit less about that, especially if it's a cooked duck; even if it's not a cooked duck, I probably don't care as much as if a person or text showed up, because those are the specific situations that are actually difficult for us to mitigate.

Oh, sorry, I should repeat the question: how often do we retrain? We don't actually retrain that frequently, far less than you would think. Part of it is that once you get the system stable and you don't see any problems, there really isn't a reason to keep retraining it; if you're getting the numbers you're looking for, it's fine. Unless there's some catastrophic problem, where we realize we missed a situation, or users find a new way to abuse our system and we want to say "all right, how can we prevent this?", then we might retrain it on an additional set of data to get better at classifying that new set of problems.

Have I ever participated in the image classification competition? No.

So that's cool. Yeah, image classification has a lot of potential uses moving forward, and as we get better and better at it, we'll find more situations to use it in. My favorite story from reading about image classification was about a farmer in Japan, mainly because I live in Japan right now. His family was a farming family, and they sold cucumbers. I don't know if you know about food in Japan, but there are very particular distinctions they draw, like "oh, this isn't a regular cucumber, this is a premium cucumber, and we're going to charge you $30 for this premium cucumber," and all of that classification was done by hand. So an engineer, who wasn't really a programmer but could build the hardware around it, used image training to have it identify "ah, this cucumber is this category, and this one, and this one," and automatically sorted his cucumbers, making that process a lot better. There's an article online you can read about that particular story. But the point is that image classification can be used in any number of industries for any number of reasons, and I think as we move forward as developers, it's going to become more and more commonplace to want to be able to do things like this.

So I think I'm out of time. Thank you, everyone. Thank you.