Thumbs up? Yeah, I guess we'll get going. Good morning. Let me just reset this and bring something up on this one. Wonderful. Good morning, and welcome to day two. I gather everyone had a very good first day from what I heard, and hopefully you're well fed now. Just to get to know you: who arrived and traveled to the conference from outside New Zealand? I expected this one hand. I have a few hands. Excellent. You are well informed; you will know half my talk, because you came through the international border, and I'm going to talk about biosecurity and what we are doing in that space. Have you seen one of these before? I'm talking about the poster, not the bug; if you have seen the actual bug, talk to me straight away. You find these at the airport and in magazines. It's one of the media campaigns that the Ministry for Primary Industries, which is a big organization and which I happen to work for, is running. Part of the mission is biosecurity, keeping pests out of the country, and one of the pests we're really, really concerned about is the brown marmorated stink bug. It's a mouthful; yes, "stink bug" will do. We don't want it in the country, and I'm going to talk a little bit about why and how that happens. To give you an idea: we want to know what's out there, so we have lots of traps. To warm you up, what do you think the figure is? A thousand? Ten thousand? A hundred thousand? How big is New Zealand? OK, I'll give you a hint: over 260,000 square kilometers. Across all pests, not only insects and bugs but all kinds of things, there are about 60,000 traps deployed throughout the country right now, not evenly. But if you just spread them out, it's about four and a half square kilometers per trap. That's not too bad, but we have to cover so many different kinds of places.
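The per-trap figure is simple division. A quick check in Python, assuming an even spread (which the talk says is not the case) and taking about 268,000 km² as New Zealand's land area, an assumed figure consistent with the "over 260,000" quoted above:

```python
"""Back-of-envelope trap density, assuming traps were spread evenly.

Real traps are NOT evenly spread; this only checks the
"about four and a half square kilometers per trap" figure.
"""

NZ_LAND_AREA_KM2 = 268_000  # assumption: commonly cited land area; the talk says "over 260,000"
TRAPS_DEPLOYED = 60_000  # figure quoted in the talk, across all pests


def area_per_trap(area_km2: float, traps: int) -> float:
    """Return the average area covered by each trap, in square kilometers."""
    if traps <= 0:
        raise ValueError("traps must be a positive number")
    return area_km2 / traps


if __name__ == "__main__":
    # prints: 4.5 km^2 per trap
    print(f"{area_per_trap(NZ_LAND_AREA_KM2, TRAPS_DEPLOYED):.1f} km^2 per trap")
```

With the lower bound of 260,000 km² the same division gives about 4.3 km² per trap, so "four and a half" holds for the commonly cited area.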
Covering all of that is really hard, so we rely on other measures to find out whether something is in the country, because traps work as an early warning system. That's the first line. If you're from Auckland: about two years ago there was the Queensland fruit fly, and you'll have heard about that. It was here in Grey Lynn, right in the city, not somewhere in the countryside. They found larvae, and it can really destroy crops. By the end of the whole thing we had found over 20; it got eradicated and hasn't been seen since. So that was successful, which was really good. There are many different species, but today we're really interested in actual bugs: shield bugs. Why are we doing this? Trade. Over 65 percent of New Zealand's GDP is primary industries: forestry, fisheries, fruit, veg, dairy, all that stuff. If that gets hit, everyone gets hit in the pocket, so protecting that industry is utterly important. There are outbreaks, and you will see them in the press from time to time, so we have to deal with them, and when one happens we want to control it. Again, we can't do it on our own, so a constant question is how to engage the public. Most people have a mobile phone with a camera with pretty good resolution. I'm going to talk about how we're trying to use image recognition and eventually put it into the hands of the public, so that when someone thinks "oh, this looks suspicious, I've heard about it", they can snap a photo and find out whether it's fine or not. That's what I'm going to run you through: mainly the idea, the whole concept of what we're trying to do. It's a proof of concept, and we're right in the middle of it; it runs until roughly March, so you won't see anything polished and finished. I'm just giving you an insight into what we tried, what we learned and how we put it together. There is a ton of Python in the back end and a bit of machine learning, but this is not a deep dive or anything.
It's just high level: what are we using, and what does it look like? Don't worry too much about it. As they say, any sufficiently advanced technology is indistinguishable from magic, so we're going to use that. There are many good libraries, lots of help and good stuff out there, so you don't have to be a total expert to get at least a prototype out, or to give it a try and get a feeling for it. Then a bit of taxonomy, which will be your biology class for the day. We're doing a bit of Taxonomy 101, because that becomes really important for matching up data and for what we're feeding in and using. Then how we go about getting the data; tools and environments, which is the most Pythony bit of the whole talk; and then putting it all together and showing how far we got, what we have so far and what it looks like. The Ministry for Primary Industries has been running a research and technology practice for almost two years now. The idea is that we pick up ideas from the business and from the public; a sponsor in senior management says "yep, let's try that" and puts a bit of money into it; and we bring people together, often from outside the agency: other agencies, universities, industry. For this effort it was, and currently is, the University of Canterbury, Waikato, Landcare Research and a few other entities. The idea is to reduce the workload on our staff, because everything that gets reported (we have a hotline where people can call up and say "I found this in my dustbin" or "in my shed") gets investigated and triaged. It's quite a process. We run three laboratories throughout the country, for animal health and plant health. Animal health is things like sick horses: if you're importing or exporting a racehorse, that's where it has to go. Then fungi: myrtle rust, you will have heard about it in the news.
That's plant health, and apart from insects and fungi there are all kinds of bacteria, viruses and whatnot as well. So if we launch a public campaign, the workload increases and our scientists look at a lot of material that is not relevant at all, which, with a bit of knowledge and insight, could be sorted and triaged out. Then, because New Zealand has a lot of coastline and borders, we want to increase surveillance of what comes in through the ports: cruise ships, for example, freighters, airplanes and so on. And we want to engage people who want to help, and make it easy. Right now it's not necessarily that easy. Yes, you can call our hotline or report it on the web page, and if you're a farm manager you'll be more knowledgeable about what to do, but for members of the public it's still pretty hard to make it quick and easy. You might have seen this with city councils: there's a broken street lamp and you want to report it, and by the time you've told the council where it is and asked whether they can fix it, it's quite a process. So you see all these apps rolling out where you just snap it, report it in ten seconds, click, done. That's behind this as well: public engagement, and hopefully detecting things early, because the earlier we get in there, the easier it is to do something about it and hopefully eradicate it. In the laboratory, as I said, people report things and material comes in, and you typically see images like these. And yes, this could be identified as a kiwi, but it's actually there for scale.
So we get images like: OK, that's how big the dollar coin is, and that's how big the thing I found is. We see things from packaging; we see close-ups of the actual bug, turned around; and then, of course, where they were found directly on fruit or in the wild, at early or later stages. Every case that's reported and looked at gets a case file that's investigated in the laboratory. That's not only visual matching by entomologists, who know what to look for (how many legs, what the antennae of the insect look like and so on) and then match it, because a lot of them look quite similar but are different species with a different impact on the environment. There's also gene testing, where the sample gets crunched up, and that could be the technology of the future: instead of taking a photo, for example in pheromone traps, parts of the insect get DNA-analyzed on the go, and once that flags something, it also triggers an investigation. But right now the focus is on whether we can use everything that's coming in, all our case files, and make analysis and triage a bit easier, so that the experts only look at what's really necessary. We have databases, information systems, all kinds of things, and I just picked one of them, which captures every case we analyzed. I'm showing a little of the data: what usually happens, what the species was, where it was found. We have information around it, so we can go back in time and look for patterns and so on. That also informs public engagement, but it doesn't contain everything, and we as MPI aren't the only ones in this space; there are other sources as well. So part of the trouble is bringing all that information together and then analyzing it. A little bit about the machine learning part.
It's a buzzword; you heard about it in another talk here as well, and you hear about it all the time. For me it's quite a paradigm shift from discrete programming, where you have the insight and know how to set up and write your filter, to something more fuzzy: you can't really hang your debugger on it anymore, and you can't really trace it. It is, in a way, magic. Someone told me "well, the magic is just fancy stats". It is fancy stats, but it's actually a wider field than that. Machine learning is a wide field, so just a brief overview. We distinguish between supervised and unsupervised learning. Unsupervised is really nice because you set the machine up to learn by itself; a recent case was learning to play Go without being trained to do so, just given the basic rules and told to figure it out, playing so many games. As an entry point, though, because we also wanted to understand what we were doing while developing this app, we said no, we're not even starting to look at that, and we went down the supervised learning route. As you've seen in the images that come in, we have to be pretty clear about what goes in and what we are training on, so at the end we know our data set pretty well.
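Knowing the data set well usually also means holding some labeled images back to check the trained model on. A minimal sketch of a per-class (stratified) train/validation split using only the standard library; the `(path, label)` record shape, the 20% fraction and the seed are illustrative assumptions, not the project's settings:

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # hypothetical record shape: (image path, species label)


def stratified_split(samples: List[Sample], val_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Split samples so that every label keeps roughly val_fraction in validation."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[Sample] = []
    val: List[Sample] = []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        # Hold back at least one example for any label that has two or more samples.
        n_val = max(1, round(len(group) * val_fraction)) if len(group) > 1 else 0
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


if __name__ == "__main__":
    samples = [(f"img_{i}.jpg", "Halyomorpha halys" if i % 2 else "Nezara viridula")
               for i in range(10)]
    train, val = stratified_split(samples)
    print(len(train), len(val))  # prints: 8 2
```

Splitting per label rather than over the whole list keeps a species from ending up only in training or only in validation, which matters when a category has only around a hundred images.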
So the supervised route shouldn't be that much more effort. From there we went down the regression route and then into neural networks, and especially convolutional neural networks. It's a mouthful, but it's because images as input data are big. Imagine a standard smartphone and the resolution of an image you get from it: each pixel has red, green and blue values, and one image can easily be a couple of megabytes. And that's just one input, and then you map it all out, usually in this fashion, and it becomes really, really big. The "deep" in deep learning and deep neural networks comes from how many layers you have in the middle. I guess a lot of you will have seen this before: every pixel of the image maps to an input node, there's stuff in the middle, and here are the output nodes, which determine the classes. This example is the 101 of machine learning, recognizing handwritten digits, so the classes are zero to nine. In our case it'll be "what species is it?", so our classes back here will look different. Setting this up is quite tricky. It's hard, and you need a lot of knowledge. Luckily, people who are that smart make pre-cooked, pre-trained networks available, and there are plenty of them around; a well-known one is ResNet, for example. If you look at the bigger players like Facebook and Google, they put them out there as open source, usually on GitHub, so you find them there. They all have their pros and cons, because there is no free lunch: it's not "here's the universal algorithm, here's your neural network, and it can classify anything". It really depends on the input data and what you're trying to solve, so in that sense it's still specific. Facial recognition is different from identifying insects, and funnily enough there isn't that much out there on insects, not even at scale. If you look into it, it's a hard target.
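Two pieces of that picture can be made concrete: how many input values an image produces when every pixel contributes one value per color channel, and how the raw output-node scores become per-class probabilities (here with a softmax, the usual choice for classification). A stdlib sketch; the photo size, the 224×224 input size (common for ImageNet-style models) and the species list are illustrative assumptions:

```python
import math
from typing import List, Tuple


def input_values(width: int, height: int, channels: int = 3) -> int:
    """Number of input values when every pixel contributes one value per channel (RGB)."""
    return width * height * channels


def softmax(logits: List[float]) -> List[float]:
    """Turn raw output-node scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: List[float], labels: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Rank class labels by softmax probability, highest first."""
    ranked = sorted(zip(labels, softmax(logits)), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(input_values(4000, 3000))  # 12-megapixel photo: 36000000 values
    print(input_values(224, 224))    # typical downscaled network input: 150528 values
    species = ["Halyomorpha halys", "Nezara viridula", "Cermatulus nasalis"]
    print(top_k([0.5, 1.0, 3.0], species, k=2))  # highest-scoring species first
```

The gap between the two input sizes is one reason networks work on downscaled images rather than full-resolution phone photos.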
Insects are small, a lot of them look the same, and even the same species changes quite a lot as it grows into an adult. There's a lot of variety, almost like with us humans. We were lucky enough that we could fall back on one of these pre-cooked networks, some of them pre-trained on something like a thousand categories: chairs, dogs, cats, whatnot. Then you can cross-train them: you take your pre-trained network, create your own classes, and only change the last layer here so that it maps to your new categories. So you don't need the enormous grunt that building and running your own deep neural network from scratch takes. I hadn't touched this before; I only got into this part about a year ago, and it's amazing how well it works. I was really thinking "OK, do we train from scratch, and can I convince my boss to invest in nice machines with fancy GPUs and good graphics cards?" Again, we were a bit lucky, because at work we have a big GIS (geospatial information systems) department, and they have nice grunty machines.
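Cross-training (transfer learning) can be shown in miniature: keep the pre-trained part fixed, compute its features once, and fit only a new output layer for your own categories. This pure-Python toy is an illustration, not the project's code: `frozen_features` is a hypothetical stand-in for a real pre-trained network (such as MobileNet or ResNet with its top layer removed), and the new head is plain softmax regression trained by stochastic gradient descent:

```python
import math
import random
from typing import Callable, List, Sequence

Features = List[float]


def frozen_features(x: Sequence[float]) -> Features:
    """Hypothetical stand-in for a pre-trained network minus its old output layer.

    Nothing here is ever updated during training, i.e. it is "frozen".
    The trailing 1.0 acts as a bias input for the new layer.
    """
    return [x[0], x[1], x[0] * x[1], 1.0]


def softmax(z: List[float]) -> List[float]:
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def scores(weights: List[Features], feats: Features) -> List[float]:
    """One score per class: dot product of that class's weights with the features."""
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]


def train_new_head(data: List[Sequence[float]], labels: List[int], n_classes: int,
                   extract: Callable[[Sequence[float]], Features] = frozen_features,
                   lr: float = 0.5, epochs: int = 200, seed: int = 0) -> List[Features]:
    """Fit only a new softmax output layer; the feature extractor is left untouched."""
    rng = random.Random(seed)
    feats = [extract(x) for x in data]  # computed once, because the base never changes
    n_feat = len(feats[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_feat)] for _ in range(n_classes)]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            probs = softmax(scores(weights, f))
            for c in range(n_classes):
                # Gradient of the cross-entropy loss with respect to this class's score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_feat):
                    weights[c][j] -= lr * grad * f[j]
    return weights


def predict(weights: List[Features], x: Sequence[float],
            extract: Callable[[Sequence[float]], Features] = frozen_features) -> int:
    s = scores(weights, extract(x))
    return max(range(len(s)), key=s.__getitem__)


if __name__ == "__main__":
    # Toy data: class 0 on the left, class 1 on the right.
    data = [(-1.0, -0.5), (-0.8, 0.3), (-1.2, 0.1), (1.0, 0.2), (0.9, -0.4), (1.1, 0.5)]
    labels = [0, 0, 0, 1, 1, 1]
    head = train_new_head(data, labels, n_classes=2)
    print([predict(head, x) for x in data])
```

In a real framework the same idea amounts to marking the base network's layers as non-trainable and attaching a new classification layer sized to your own categories, which is why it needs far less compute than training from scratch.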
So we could hijack their machines. However, remember the initial use case for why we're doing this: we want it running on mobile phones. They're amazingly grunty, but not so grunty that they can run a standard network for recognition after you've trained it, so you want something a bit more lightweight. The other thing is that when you're training, you want to match your model to what you're training for. If I find a model trained on chairs, it's not a good match for insects, but if I have something trained on animals or birds, somewhere in that area, it might be a good basis to cross-train and then use for insects. There are mechanisms for that, and the tool set for verifying it is getting really good. In the back end we're using TensorFlow, because there's so much out there for it, and especially at the universities everyone is using it, at least in New Zealand, so that was almost by default. The tool set around it is becoming really good; you can visualize the data and get an insight into it. Remember I said this isn't discrete programming anymore and you can't hang your debugger on it? Well, this becomes your debugger and gives you insight: you can tell whether the model is fitting the training data too closely.
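TensorBoard shows this as curves, but the underlying signal is simple: training accuracy keeps climbing while validation accuracy stalls or falls. A hedged sketch of that check on per-epoch accuracies you might export from training runs; the rule and its `gap_threshold` and `patience` values are arbitrary assumptions, not a standard:

```python
from typing import List, Optional


def overfitting_epoch(train_acc: List[float], val_acc: List[float],
                      gap_threshold: float = 0.10, patience: int = 2) -> Optional[int]:
    """Return the first epoch index at which the model looks overfitted, else None.

    Heuristic (an assumption): validation accuracy has not improved for
    `patience` epochs AND training accuracy exceeds it by more than `gap_threshold`.
    """
    if len(train_acc) != len(val_acc):
        raise ValueError("both curves must have the same number of epochs")
    best_val = float("-inf")
    epochs_since_best = 0
    for epoch, (tr, va) in enumerate(zip(train_acc, val_acc)):
        if va > best_val:
            best_val, epochs_since_best = va, 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience and tr - va > gap_threshold:
            return epoch
    return None


if __name__ == "__main__":
    train = [0.60, 0.75, 0.85, 0.93, 0.97, 0.99]
    val = [0.55, 0.68, 0.74, 0.73, 0.72, 0.70]
    print(overfitting_epoch(train, val))  # prints: 4
```

A model flagged this way has started memorizing its training images rather than learning features that carry over to new photos.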
Is it an exact match, or is it loose enough that it can pick up things that are similar? Think of it as a puzzle. You have all the pieces, and the picture on the puzzle is a beautiful New Zealand landscape: blue sky and Mount Cook. A lot of the pieces will be just blue, others will be green and grassy, and then there are rocks and crevices and so on. I don't know how you do puzzles, but I actually sort by color first: all the blue, the rocky stuff, the grass and so on. Once you have it categorized that way, you say: OK, there's not much in the sky pieces, so I'll start with something that's easy to recognize, where I can find a pattern, and then look for those patterns. That's what the model is trained on, and that's what you can read out of these tools as well: what it is picking up on and how it matches. After a few tries and some back and forth, we landed on this one: Google's MobileNet model. It's out there, everyone can download and use it, and there's a lot of documentation around it. It is really meant as mobile first: the researchers who came up with it really had in mind that this thing has to run on a mobile phone at the end of the day. Compared with the other models we tried, it is really small, so its footprint on the device is small, and it's insanely fast. I was really amazed at how fast it can crunch an image that comes directly from your camera. And yes, there's a payoff in terms of accuracy: even with the same training and being careful, you don't hit the same rate. With some nets we were hitting up to around 92%, and here we had a drop of something like 28%. So you have to trade off, but there are a couple of parameters in there you can use to make that trade-off, and as mobile phones become more powerful you can dial it up without changing much, or pick another model. So far the experience has been really good with
it, and I can recommend you use it or at least look into it. So now, with all this, we have picked the model we want to train, and we have input data; I showed you the material from the lab. Now we have to label it, because we have to define our categories. We can be pretty sure we want a stink bug category, or a nasty-stink-bug category, or a "don't want it in New Zealand" category and things like that. But if you look at what's actually available to feed in, that doesn't work very well. Biology itself has had the problem of labeling specimens, animals, plants and so on for a long time, and taxonomy is an important part of entomology: identifying that this species is exactly that. It's not a science where things are stable once something is identified. Things get moved around, and you find that in the labeling as well: "oh yeah, in 1758 it was here, but in 1982 it was moved over here, and it's actually the same thing." Like: "no, you don't belong to the dogs, you belong to the cats." For New Zealand, NZOR, the New Zealand Organisms Register, is the reference. If you're making policy, importing or exporting, everything is determined by how something is categorized in there. It's the set of species of interest to New Zealand, so it's not necessarily just the species in New Zealand. For example, the brown marmorated stink bug is in there because it's of interest. It's also not everything worldwide, so it's really specific, and it gets extended all the time; that's why the numbers there are just a snapshot of where it currently sits. It's a lot, but we also found there are gaps, and we need things we can label through this mechanism, automatically if we can. So that's where we get our labels from; that's what we agreed on. So we don't call it "stink bug".
We call it by its scientific name. To give you a better feel for it, because I found it pretty abstract (I didn't finish biology at school, so I had to go back and learn all about it), let's look up our common house cat. It's an open interface: you can query it any time, you don't need an API key or anything, and it's really nicely documented. If you go there and fire this request with, here, the scientific name Felis catus, you get an API response. It's a pretty long response; if I put it on the screen it would be really small. But it contains a lot of information: not only where it sits in the taxonomy, in the tree, which I'm going to show you in more detail in a moment, but also whether it is present in New Zealand, whether it's related to farming and husbandry or out in the wild, where to find more references, and so on. It's all kept in there, so if you're interested in the topic it's well worth a look; it's a nice resource. One of the things that comes out of the response is the actual classification. This is just one way of doing it; if you look into taxonomy, there are many ways. For us the really important bits are the class, Mammalia (I guess that much), the family, Felidae, the cats, and then the species. Even if we can't get down to the species level, having the genus or the family is really good, because that gives us an idea of where we are on the tree of life and whether this is of interest or not: does it have to go in front of a scientist, does it need investigation? You have to set up a perimeter and do the whole spiel of "we have an inclusion zone, please don't take your grapes out of Grey Lynn". So keep these ones in mind, because everything else is going to disappear again, and that's how we do it with our filtering of the imagery. I've picked four target species that are of interest.
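The fall-back described here (species if possible, otherwise genus or family) plus an inclusion rule can be sketched in a few lines. This assumes a simplified classification as an ordered list of `(rank, name)` pairs, which is not NZOR's actual response format, and uses superfamily Pentatomoidea (the shield bugs) as an illustrative cut-off node:

```python
from typing import List, Optional, Sequence, Tuple

# Simplified classification: ordered (rank, name) pairs, kingdom first.
# This shape is an assumption for illustration, NOT the NZOR response format.
Classification = List[Tuple[str, str]]

HALYOMORPHA_HALYS: Classification = [
    ("kingdom", "Animalia"), ("phylum", "Arthropoda"), ("class", "Insecta"),
    ("order", "Hemiptera"), ("superfamily", "Pentatomoidea"),
    ("family", "Pentatomidae"), ("genus", "Halyomorpha"),
    ("species", "Halyomorpha halys"),
]
FELIS_CATUS: Classification = [
    ("kingdom", "Animalia"), ("phylum", "Chordata"), ("class", "Mammalia"),
    ("order", "Carnivora"), ("family", "Felidae"), ("genus", "Felis"),
    ("species", "Felis catus"),
]


def is_within(classification: Classification, rank: str, name: str) -> bool:
    """True if the taxon sits at or below the given node of the tree."""
    return (rank, name) in classification


def most_specific(classification: Classification,
                  preferred: Sequence[str] = ("species", "genus", "family")
                  ) -> Optional[Tuple[str, str]]:
    """Return the most specific usable rank: species, else genus, else family."""
    lookup = dict(classification)
    for rank in preferred:
        if rank in lookup:
            return rank, lookup[rank]
    return None


def label_for(classification: Classification,
              cutoff: Tuple[str, str] = ("superfamily", "Pentatomoidea")) -> Optional[str]:
    """Return a training label only for taxa below the cut-off node; drop everything else."""
    if not is_within(classification, *cutoff):
        return None
    found = most_specific(classification)
    return found[1] if found else None


if __name__ == "__main__":
    print(label_for(HALYOMORPHA_HALYS))       # prints: Halyomorpha halys
    print(label_for(HALYOMORPHA_HALYS[:-1]))  # species missing, so: Halyomorpha
    print(label_for(FELIS_CATUS))             # not a shield bug, so: None
```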
These four aren't brown marmorated stink bugs, but they can look pretty similar; some of them are harmful and some are not. There's your taxonomy, and hopefully we can get down to the last level. However, for what we're actually interested in, we cut off everything to the left of shield bugs: from there on we dive down, our labeling contains only things below that, and we're not going to train on anything else. So we're specializing our network and our setup from there down, and we pick our sources accordingly. Our environment lab, which also looks after plant health, is here in Auckland, and they have a collection: a room with all kinds of specimens. It looks like a museum; you have these big drawers, you pull them out, and you can pick out a green shield bug and actually look at it, and the scientists literally compare against the specimens or against libraries and so on. All the case files get dumped into a semi-structured file system, which was never geared for machine learning. Our first try was just saying "oh yeah, there will be a scientific name in there, we'll just grab everything and feed it all in", and it blew up horribly. It didn't work, and that's where we found the dollar coins and where the labeling was wrong and so on. It's not a huge set that we were looking at initially, just over 18,000 images, but you still don't want to do it all by hand. So, Python to the rescue: we did some mapping, and we used the API I showed you before. There are also a few other APIs around, so if that failed and NZOR wasn't able to provide us with the details, or it wasn't a good match, we could use other APIs. They also provide a score for how precisely you've hit a name. If you look at the hierarchical path here, you have different levels of the taxonomy, and different hits. Say on this one it matched exactly. All right.
That's a high score, a high hit. This one here, well, yes, it'll find it, but it doesn't know what to do with this bit, so it'll be a match, but not with a high score. Then, combining the scores along the whole path, you can say: OK, that's a pretty good match, and you have your whole taxonomy tree. Then we map it out, fill the gaps, find our label, and more or less rely on it: this image, this species, that's our label. That's how we built our training catalogue at the end of the day. It wasn't enough initially. Our first run, which was a bit more stringent, had about a thousand images across ten categories. That's good enough to give it a go, but you land at roughly a hundred images per category, which is not a huge input for machine learning; 500 would have been really good, and more than a thousand even better. And if you Google it, it doesn't really match: a lot of what comes back from Google is actually not right if you have an expert look at it: "oh no, that's a third instar, and it's not that species." So we couldn't use that. It's hard to get hold of, even though different organizations in Canada, Chile, Australia and so on are running similar programs in this area. What we're diving into now is a DIY approach: we have the collection, we have the actual specimens, so take a camera and take plenty of images. That's the next step; we haven't done it yet, so I'm really curious to see what the result is. All right, what did we use? Our office environment is Windows, and line-of-business applications run in that environment, so there's no way around it. But for certain systems we have Linux VMs out there, and also on Amazon, where it's easy to get a pre-prepped machine. You don't have to do the whole install; it's like Docker on demand: just deploy it, and it's all set up.
It's right there and you can start using it, and you don't have to fiddle with "do I have the right driver for my graphics card, is my CUDA version matching my TensorFlow" and all these things. Then there's macOS as well, especially from academia, because that's what people there use. So it has to work across everything. Python works really nicely under Linux and macOS, but under Windows it was always a bit of a mission, from what I found and heard from other people. Anaconda was really nice and made our job of moving between these different environments much, much easier. So if you haven't looked at it as a package manager, or you have to move between environments, I highly recommend it. It was really good, and it actually behaves under Windows, especially for compiled binary packages like NumPy and whatnot. It does a lot, and it worked for the architectures we had. We were still on Python 3.5 and 3.6, the latest version; TensorFlow itself works fine on that, but a lot of the stuff around it hasn't caught up. And please, no Python 2; it's really not necessary, so we haven't touched it. TensorFlow 1.4 is pretty recent and came with a nice update to the tooling I alluded to for debugging: there's a tool, TensorBoard, that has been around for a while but now comes as part of the package with no extra install, so it's almost "pip install tensorflow" and you're done. That's how nice it has become. After I've said that, it won't work anymore, but that has been the experience so far. Jupyter was really helpful. Jupyter is an environment that's not like an IDE: you treat your code as a document. Some people really hate it and some people really like it, because you don't do "proper" programming in Jupyter. And yes, I wouldn't use it for everything, but it's great if you're doing analysis, if it's exploratory, or if you have to exchange information with other people.
You document it on the fly and output it. A really nice environment, and it helped us a lot. Then pandas, which is everywhere in the data science community, was really nice to use; I almost used it as an in-memory database for tabular data, and it made the whole job of handling our data easier. And for the prototype, Bottle, the micro web framework. I guess most of you will be familiar with it; the nice thing is that it's literally just one file, and if you have, say, three pages or endpoints, you write them yourself and that's it. It's really easy and really fast, and it was nice to use. There's lots of other stuff around it, but especially for the data ingestion (trawling through file systems, downloading things from databases, combining and matching them with a taxonomy API, all the lifting, shifting, filtering, extracting, transforming and loading) we did it all in Python, and it worked beautifully, really, really well. So that's what it currently looks like. It's not really a mobile app; it's just a browser window made to look like a mobile app. We haven't done the part where you deploy it as a native app and run it with a MobileNet back end on the phone, most likely Android; we haven't really figured that bit out yet. But to give a look and feel, we have this as a working thing: you can go and upload your image, plug a few tags in there, crunch, crunch, and it currently comes back with a hit score. It doesn't look beautiful or anything, but 0.87, so 87 percent: it's pretty sure it's Cermatulus nasalis, that one there. And I'm not going to show you an example.
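The file-system side of that ingestion work can be sketched with the standard library: walk the tree, take a label from the folder structure, and drop labels with too few images to train on. The folder layout (parent folder named after the species) and the `min_images` threshold are hypothetical; the real case files were only semi-structured and needed the taxonomy matching described earlier:

```python
import os
from collections import Counter
from typing import Dict, List, Tuple

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_catalogue(root: str, min_images: int = 100
                    ) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Pair each image under `root` with a label and keep only well-populated labels.

    Hypothetical layout: the immediate parent folder of an image is its
    scientific name, e.g. root/<case>/Halyomorpha halys/photo1.jpg.
    Images sitting directly in `root` have no label and are skipped.
    """
    root_abs = os.path.abspath(root)
    pairs: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.abspath(dirpath) == root_abs:
            continue
        label = os.path.basename(dirpath)
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() in IMAGE_EXTENSIONS:
                pairs.append((os.path.join(dirpath, filename), label))

    counts = Counter(label for _, label in pairs)
    kept = {label: n for label, n in counts.items() if n >= min_images}
    return [pair for pair in pairs if pair[1] in kept], kept
```

Usage: `pairs, kept = build_catalogue("case_files", min_images=100)` returns the labeled image paths plus a per-species count, which makes thin categories (like the roughly hundred images per category mentioned earlier) visible before any training starts.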
It didn't work, right? Yeah. So you can now go and upload an image, and that's what we took out to our business and showed. I could have shown that on the command line, right? I just run the shell script, point it at the image, and it spits out the numbers, the same thing. It's amazing how much more attention this gets because it's a web page. So, pro tip: if you're showing something and you think "all the magic is just running in the background and there's one line of output", make a web page. Make it with colors. Make this green; green is good. So, just to wrap it up. As I mentioned, it was a lot of Python. It comes natively with TensorFlow, so we didn't have a choice there, but it was also for the ETL: getting the data, looking at images and talking to APIs. It was a good experience, with very little context switching. Usually in our environment we go between Java, C#, a bit of PowerShell and a bit of Bash, and that actually takes a toll. Here it was almost a monoculture, which doesn't sound good, but it actually really, really helped. The tooling and the frameworks, as I mentioned before, are getting really, really good, and there's so much help and so many tutorials out there. Someone told me there is actually no good business model in machine learning right now apart from providing courses and training people, or being Nvidia and selling GPUs; everything else isn't really making money yet. That will change, but there's a ton of good stuff out there, and you can really play and experiment with it. At some point you just have to go home. That really helps, because otherwise you think "no, this has to work". I guess that's a shared experience here as well, but yes, sleep helps; highly recommended. So with that, I hope you learned a bit about stink bugs, biosecurity in New Zealand and what we're trying to do with this project. Thank you very much.