I'm going to try to give you the shortcut, the cheat's way to do this stuff on mobile. OK, how many iOS developers do we have? All right, cool, quite a few, more than I expected, that's great. And how many of you have watched the TV show Silicon Valley? Almost everyone, yeah? So probably one of the most famous episodes is the hot dog / not hot dog one. Everyone knows what I'm talking about, yes? OK, so I'm going to teach you how to do this. But rather than hot dog / not hot dog, we're going to do satay / not satay.

So basically what I've done, and I'm going to go through this relatively quickly so we don't run too late, is I'm going to show you how you would take something like this, if you were given one night or a really short amount of time, and how you would approach the problem.

The first thing is the data set, and I'll talk about some of the challenges I had with that as well. The data set I started out with was 2,000 satay images. One thing I realised I didn't put in here: if you ever have to download a whole bunch of images from the web very quickly, there's a very handy Chrome extension called Fatkun (F-A-T-K-U-N) that lets you bulk-download a whole page of images. Of course, you will check that you have the right to use those images first.

The biggest challenge with this, though, is what is not satay. This is the same challenge the people who built the HBO hot dog model had: how do you define the negative class? Defining what satay is, that's pretty easy. What's not satay? So I took 2,500 images, mostly other foods, but we threw in a bunch of other weird, random things as well. If I were doing it again I'd probably increase that a lot and put in even more out-there images. I'll talk about some of the results later on.

One of the big things you have to do, though, is train it with basic image augmentation. We haven't really talked about this a lot; obviously the more advanced people here will know what image augmentation is. In Keras it's very easy to do: you just use an ImageDataGenerator, and you tell it that each time it loads these images it should randomly rotate them, randomly rescale them, randomly zoom in a bit, maybe randomly flip them so that something that was on the left is now on the right, that kind of thing. That allows you to take that small data set of 2,000 images and really turn it into a lot more images. You're basically just sampling from the data set and running each sample through the augmentation, and you can define how many batches of that you want to generate.

So, the models. Both Martin and Andrea have talked a lot about models, so I'm not going to harp on about this too much. Obviously, for mobile, the smaller the better. A lot of it is about the number of parameters in your model, but sometimes you find it's even more than that, hence the weight rounding, quantization and all the other things you can use to improve it. So I took these models, and one of the reasons I took them is that all of them are available as off-the-shelf models in Keras.
You can basically go into keras.applications, load one of these models trained on ImageNet, cut off the top, and retrain it for whatever you're trying to classify.

These are some of the results I got with that. A lot of the models don't turn out to be the size they're supposed to be. You even saw in Andrea's talk that his SqueezeNet was 12.7 MB, and Martin told you that in the academic paper it's only about 500 KB; you can see mine came out at about 5.6 MB. Things change, right? When you get out into the real world you find you need to be flexible. One really odd one here is VGG16: if you download the ImageNet weights for it, it's about 550 MB, yet it's come out at only 110 MB. You can see I've put the Keras size, which is basically the size when you save the model, and the number of parameters; I'll do a walkthrough in a minute and you'll see some of these things. What we're going to be using is Apple's new iOS 11 framework, and I'll talk about that more in a minute. Apple has also released some pre-made models for it, and those pre-made models, I'm almost 100% certain, weren't made in Keras; they were most likely made in Caffe. But you can see how the sizes compare.

So let me show you a quick walkthrough. I need to zoom in. How's that, better? The whole idea is that we're basically just using transfer learning: we're using everything that ImageNet has been trained on to extract features from images, and we're only looking to retrain the last few layers. You can see that here I'm defining an input, and then I'm loading up, in this case, VGG16; I did one of these for each of the five models from before. I'm saying that I want the ImageNet weights, and that I don't want the last few layers of the network, the top of the network. Then I just run through those layers and set trainable to false, meaning I don't have to retrain all the convolutions; I just use whatever the convolutions learned for ImageNet and retrain the last few layers. Martin has talked about this in depth before when we've covered transfer learning.

OK, so in Keras we can do a quick summary to see the model and our total parameters. Then I tack on the other layers: take the output of that VGG, flatten it, add some dense layers, occasionally add a dropout (the dropout could be left out here too), and then just run it through a softmax down to two classes, right? Satay, not satay. That's all I care about here. Obviously in ImageNet this would be 1,000 classes. I was a bit disturbed to see on the Apple forums, where people were talking about how to do this, that a lot of people seem to think that if you're retraining an ImageNet model to classify satay / not satay, or something along those lines, you should have 1,002 classes. It doesn't work like that, all right? We're trying to classify two things, so we have two classes.

OK, so there's the summary of the whole thing, and here's my code for the image augmentation: basically, I'm just defining an image data generator.
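Roughly, a minimal sketch of the pieces I've just described might look like this: a frozen VGG16 base from keras.applications, a small new head, the augmentation generators, and a folder-per-class layout. The directory names, layer sizes, and epoch counts here are just illustrative, not necessarily the exact values from my notebook.

```python
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Input, Flatten, Dense, Dropout
from keras.preprocessing.image import ImageDataGenerator

# Input size for VGG16; the other models in the table use different input sizes.
inp = Input(shape=(224, 224, 3))

# Load the convolutional base with ImageNet weights and without the top classifier layers.
base = VGG16(weights='imagenet', include_top=False, input_tensor=inp)

# Freeze the convolutions; we only retrain the new head.
for layer in base.layers:
    layer.trainable = False

# New head: flatten, a dense layer, optional dropout, then a softmax over two classes.
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
x = Dropout(0.5)(x)
out = Dense(2, activation='softmax')(x)  # satay / not satay

model = Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

# Augmentation: rescale 0-255 down to 0-1, a bit of zoom, horizontal flips.
train_gen = ImageDataGenerator(rescale=1. / 255, zoom_range=0.2, horizontal_flip=True)
val_gen = ImageDataGenerator(rescale=1. / 255)

# One folder per class, e.g. data/train/satay and data/train/not_satay.
train_flow = train_gen.flow_from_directory('data/train', target_size=(224, 224),
                                           batch_size=32, class_mode='categorical')
val_flow = val_gen.flow_from_directory('data/valid', target_size=(224, 224),
                                       batch_size=32, class_mode='categorical')

model.fit_generator(train_flow, steps_per_epoch=100, epochs=10,
                    validation_data=val_flow, validation_steps=20)

model.save('satay_vgg16.h5')
```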
I'm setting the rescaling of it, so I'm basically changing the pixel values from 0 to 255 down to 0 to 1. I've got some zoom in there, I've got horizontal flips. I set up two generators, one for training and one for validation, and then I basically just do a simple fit. All this stuff is very basic; if you're brand new, if you're a beginner, then this is great stuff to learn. I'm guessing most of the people here have been through things like this before, but I want you to see that even with really basic stuff you can get pretty decent results. So then all I do is save that out, and I just save it as a Keras model. Back to my slides.

Now, the cool thing is that Core ML Tools has the ability to read a Keras model and convert it from a Keras model into a Core ML model. Core ML is Apple's wonderful new machine learning framework. It has the ability to run on Metal, so it can be GPU optimized and improved in a whole lot of ways, all behind the scenes; you don't need to worry about any of that.

To set it up, there are really two parts. When people talk about Core ML, Core ML itself is the SDK that runs on the phone. Core ML Tools is a Python package we use to convert Caffe models or, in this case, Keras models into the Apple format that we then put on the phone. Setting this up is really simple. The one thing that sucks about it is that it has to be Python 2.7; for whatever reason, there are a few limitations like that. It also needs TensorFlow, and at the moment it's pinned to TensorFlow 1.1. I imagine that will be updated over time; don't forget iOS 11 is still in beta, so I'm sure these things will change. They also recommend Keras no higher than version 2.0.4. At the moment we've got 2.0.6 out, and I found that most of the time when I used 2.0.6 I had no problems, but occasionally I did. The problem, though, is that if you train a model in 2.0.6, it doesn't load very well under 2.0.4 on a different machine, or even on the same machine. To install it, it's very simple: pip install coremltools. I'll go through and show you what you do.

Now, here's the thing. When you're defining a model to convert, you have to define what the inputs and outputs of the model are. This is very similar to what Andrea talked about. Is your output a classifier with two classes? Or, some of you have seen me do the super resolution models, if I were doing something like that, the output would be a multi-dimensional array rather than an actual classification. The other things you've got are settings like is_bgr: is your model's channel order red, green, blue, or blue, green, red? Different models use different orders. And you also have colour biases. The colour biases are exactly what Andrea was showing you in C++: with models like VGG16, and certain other models, you subtract the mean before you feed the image in, so the colour bias for red would be whatever that subtraction is going to be. Very, very simple things to look up.

So let's look at it in Jupyter. This is really simple: you basically just import coremltools and Keras.
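The conversion step itself looks roughly like this, as a sketch. The file names, class order, and metadata strings are illustrative, and the image_scale matches the 0-to-1 rescaling used at training time.

```python
import coremltools
from keras.models import load_model

# Load the Keras model we trained and saved earlier.
model = load_model('satay_vgg16.h5')

# Convert to Core ML: the input is an image, the output is a two-class label.
# The class order needs to match whatever Keras assigned to the folders
# (check train_flow.class_indices), otherwise the labels come out flipped.
coreml_model = coremltools.converters.keras.convert(
    model,
    input_names='image',
    image_input_names='image',
    class_labels=['not_satay', 'satay'],
    image_scale=1. / 255,
    is_bgr=False)  # RGB order; a BGR, mean-subtracted model would set this and the biases

# A bit of metadata so it reads nicely when you click on the model in Xcode.
coreml_model.author = 'Me'
coreml_model.short_description = 'Satay / not satay classifier'
coreml_model.input_description['image'] = 'A photo of some food'

coreml_model.save('satay01.mlmodel')
```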
Actually, I didn't even need to import TensorFlow there; I only imported it to show you which version of TensorFlow I'm running. You then basically just load the model as you would in Keras. The only thing you need to import from Keras is load_model; you don't really need anything else. You can see the same sort of summary when the model is loaded into memory (this may actually be a different model from the one I showed earlier), and that's what we're going to use.

Then, to convert it, all we do is this one line: coremltools.converters.keras.convert. We pass in our model, our input names, and the outputs, which are going to be the classes, and you can see I've got the class labels named there as well. It then goes through and converts all that Keras stuff into a Core ML model.

You'll remember from the list of models I showed earlier that two of them weren't working: SqueezeNet wasn't working at the moment, and neither was MobileNet. That's mostly because they use things that are a bit unusual in Keras, and Apple hasn't written converters for them. The way Core ML Tools works is that it knows what a particular Keras layer is and writes the conversion for that layer, so anything that's a custom layer will often cause problems. MobileNet's model, for example, uses depthwise convolutions, which aren't supported at the moment. I suspect over time you'll see more of these things supported. For whatever reason, Apple seems to have preferred to support Caffe more so far, so more things are supported there, but I suspect the rest will come later. You can even go in and write your own converters if you want to. I also suspect that in the future we'll start to see native TensorFlow support for this as well; the challenge is they just need to write it all.

OK, so once we've converted our model with that single line of code, we add some metadata to it, and then you can actually see what the model is; we can see what the converter has put together. Now, if you're running the latest version of macOS, which is still in beta, you can actually make predictions with the Core ML model on your Mac to test it. Unfortunately, I'm running 10.12, not the 10.13 beta, so it would throw an error for me at the moment. But once that's out, you'll be able to use a few lines of code to load a picture in, make a prediction on it, and check that it's doing what you think it's doing. Especially at the start, with something like satay / not satay, you may have got the class order wrong, because Keras assigns that automatically, and you might find it's flipped, so you need to check. And then we just save the model; that's all I do to save it.
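If you are on the 10.13 beta, that sanity check looks roughly like this. The file names here are made up, and the classLabel key assumes the default name the converter gives the class output.

```python
import coremltools
from PIL import Image

# Prediction only works on macOS 10.13+, where coremltools can call into Core ML itself.
mlmodel = coremltools.models.MLModel('satay01.mlmodel')

# Image inputs are passed as PIL images at the size the model expects.
img = Image.open('some_test_photo.jpg').resize((224, 224))

result = mlmodel.predict({'image': img})
print(result['classLabel'])  # e.g. 'satay' or 'not_satay'
```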
So now, bringing that model into Xcode and iOS couldn't be easier. All you literally do is take the exported model and drag it into your Xcode project, and add it to your build target; that would be on the right here, I haven't shown it. When you click on it, this is what you should see: it gives you all that metadata we added before.

It has also generated an interface file for the model, and it can do that in both Swift and Objective-C. It shows us here what our input needs to be: an image, RGB, 229 pixels by 229 pixels, and it's going to return a dictionary of probabilities and a class label.

Now, to use that in your app, all you have to do is instantiate the model. I can't really mouse over it, but you can see the line: let model = try VNCoreMLModel, and all I've got there is just satay01.model. That's the whole code for instantiating your model. Then I add a completion handler so that after I've passed an image in, I've got something to handle the response that comes back. And that response handler looks like this. It's very simple code that basically says: if the classification confidence is less than 97%, say it's not satay, because it tends to get more things wrong in that direction than in the other. I've also got some code in here because I figured, if you're going this far, you might as well get Siri to read it out for you and tell you whether you actually got satay or not. You can see down the bottom I synthesise speech from a string and just pass in the complete sentence I've put together. And you end up with something like this.

So, do we have audio? "This looks like a satay, I'm 100% sure." "This looks like a satay, I'm 100% sure." "It's not satay." "It's not satay." "This looks like a satay, I'm 100% sure." OK.

Someone asked what I was using to do that live, because they were trying to work it out: it's just QuickTime, it does that now. I forgot all about that; I used to have Reflector installed. But anyway, I want you to see how simple it can be to just build an app like that. It's not perfect. With the amount of data it's been trained on, it's very good at spotting the satay sticks, so anything that looks like a stick it will jump onto: oh yeah, OK, that's satay. Where the model is probably flawed is that anything with too much fine, straight detail in an image may get confused for the sticks as well. But I wanted you to see where this is going, and I think this is not just iOS; I think you're going to start seeing the same thing with Android as well, where a lot of these things get easier and easier to do. Certainly, for the more advanced stuff Andrea was doing, with the object detection, or anything more advanced you want to do, you are going to need to go down to C++ at some stage. But to do a lot of these simple things, you could build a classifier and stick it in an app, and excluding training time you're looking at a few hours' work.

So to sum it up, I would say that Core ML makes getting a model onto iOS very, very quick, and I think it's going to get better over time, meaning we'll be able to convert things from Keras and from TensorFlow more easily over time. It is still rather limited in what you can do with it, and there will always be cases, especially if you want really small models or things that are really optimized, where you're definitely better off going Andrea's way.
The cool thing with this, though, is that it's all being done on the Metal 2 framework, so it's actually using the GPU in your iPhone. And I think we're going to see one of the things Martin talked about: much more powerful GPUs in the next generation of iPhones. It's certainly happening with Qualcomm's chipsets and some of the other chipsets for Android as well. You can see that this is the direction to go. More and more, if you don't have to, there's no point pushing something up to the cloud to be classified; if you can do it on the phone, you get the result much quicker, it doesn't cost the person much in bandwidth, all those sorts of things. More and more, we're moving things to the edge.

Like I said before, I think TensorFlow support will probably come; Apple has kind of hinted at it a few times but hasn't announced anything. The other thing I would say is that last week TensorFlow 1.3 dropped, and that also has some cool new things for use on iOS, with CocoaPods and being able to import things that way. I've put the full source code to the app, Satay Not Satay, up on GitHub, and I'll put up the Keras and the Core ML stuff later on. I've also put up a link to a really good Medium article written by the person who actually made the hot dog / not hot dog app. They spent something like three months on it and tried all the different models and all the different ways of doing it. Interestingly, in the end they built it with React Native, so it's not iOS-native or anything like that.

That's it. Any questions? Or does everyone just want to run?

OK, good question: how did I label the images? I didn't need to label them, because by using the image data generator I can just tell it that each folder is one label. So I literally just have one folder called Satay Images and chucked all the satay images in there, and one folder called Not Satay Images with all the other images in it. Just images in there.

You mean just download them, Google the images, and trust that they're satay? Yeah, that could be one way you could do it; you're trusting your search results. That extension I mentioned really helps you there: you can just highlight all the images on a page and click download, then do the next page. In fact, it will automatically highlight all the images and you just untick the ones you don't want, so you can sit there and get a thousand images quite quickly. It's a good way of hacking together labelled data. And it's a really interesting case, too, because technically I'm not shipping any of those images, right? So have I violated the copyright of those images or not? Nobody knows. Of course, I wouldn't have done that, though.

Any other questions? The question is about adding more classes: rather than a binary classifier, could the model recognise what something actually is when it's not satay? The problem with a "not" class is that it includes everything, right?
So in the end I kind of decided, OK, most people are going to be comparing this against some other kind of food versus satay, so most of those 2,500 images, about 2,000 of them, were just different types of food. One of the problems I had at the start is that I got one of my staff to download those 2,000 images, and she had downloaded a lot of them from recipe sites. The problem with images from recipe sites is that they often have text in the image, which started to throw the model; the model was clearly starting to look for text. I'm sure that if I'd had an image of satay with the word satay written across it in big letters, it would have decided, oh no, that's not satay, because there's text there. So you want to be very aware of things like that: look at the distribution of your data set and ask what's common here and what's not. By just going through, culling out the ones with text and putting in some new ones, that fixed the problem straight away, very, very quickly.

Is that a class imbalance, in a sense? In some ways it is, but, yeah, the question was whether this is a class imbalance.

And on the 97% threshold, whether that's really about making sure you've got a proper satay image? Yes. By requiring 97% confidence, partly I didn't want anything that has a bunch of lines that look like the sticks to throw the model. It's only every now and then that it gets thrown off, but when it does it's really weird. I was showing Andrea earlier: when I took a picture of something like this it was fine, but when we took a picture out that way, where there were lots of details in the image, it said, oh, it's satay. So yeah, you want to be aware of that. The cool thing about training with these image data generators, though, is that you're constantly sampling batches from those two folders, so the model actually sees a certain number of satay images and an equivalent number of not-satay images; in that sense it's random. You could also use a similar technique to what I did last time with pseudo-labelling, where I tried to make sure the model saw 75% from data I knew was right and 25% from data that was maybe not 100% right. So it is a little bit of a class imbalance, but because you're sampling from both at random rather than going through every single image in a row, it works out. Now, ideally, if I were trying to make a production app like this, I would definitely want a lot more images for both satay and not satay, and I would train it a lot more. With these, after about 10 epochs it was getting very good results, and after 20 epochs it was probably around 95% accurate, so I figured, OK, that's enough.

OK, so for something like that, yes, you would need to work out how to handle individual frames. You would probably use either object detection for that, which would be a little bit different, or maybe even something like segmentation, a segmentation model.
All the self-driving car stuff, for example, uses segmentation models, not classification models. Any other questions? No? That's it, then. As was mentioned earlier, if you've got any questions or anything like that, please feel free to come and ask us. We've sent things out to everyone who signed up to show interest, and most of the spots are already gone, so I think that covers most of it. The big thing I would say about it is that it's very much a balance between theory and implementation: everything we teach you in theory, we expect you to be able to build. In fact, we'll get you to build your own model doing that sort of thing. But anyway, that's it. Thank you for coming.