Welcome everyone. If you're watching this now on Twitch, welcome, and thanks for being here. And if you are watching this later on YouTube, then thank you for watching it on YouTube as well. And thank you guys for subscribing. Actually, I'm up to 225 YouTube subscribers. Never thought that that would be possible by just recording live streams and putting them on YouTube. But thank you so much. So for today, we have three very different topics and I just decided to smash them into one, because we have three hours anyway. So the overview for today is: first we will do camera trap image analysis, and that will be presented by our guest speaker, Aimee Freiberg. Aimee is a former bachelor student of mine and she now is at the University of Bern. So she will be talking about her master project. So ask a lot of questions, because that just makes it more fun for her as well. So let me move then to the glitter room for today. So, after the camera trap image analysis, I will be talking about standards for analysis. And then in the end, the third hour will be me teaching you guys how to make an R package. I have two more announcements to make. The first one is: the 17th of February, there will be the exam. So this date is final. I actually have a second date as well, which will be on the 7th of April. I think the first date is in Agnes. The second date is not fixed yet, so I also don't think that that's in Agnes yet. And besides that, I have to do a mea culpa, mea maxima culpa, because last week I forgot to upload the assignments. That was stupid of me to not upload them. I am sorry. I'm really, really sorry. So to make it up to you guys, this week there will be no new assignment, but it will be the assignment of last week. So in the end, you win some, you lose some, but it just means that the lecture for today will not come with any assignments. Good. So I'm really sorry about that.
I don't think anyone noticed because I didn't get an email saying, where are the assignments? Or people just thought, no assignments, woohoo, which is also fine. But yeah, please do the assignments or at least look at them, and then we'll be good. All right. So, our invited speaker. Let me actually put Aimee on. So hello, Aimee. Welcome. Welcome. Let me also show your presentation. Hi. And then the floor is yours. Okay. I will move you up a little bit. Okay. You do your thing on the stream and I'll just do my presentation. So just before we start: if you have any questions or something, you can write them in the chat. I'll try to read it, but I think that my internet is lagging a little bit, so I might read it later, but any questions are welcome. I'm going to present to you my master thesis, where I decided to do camera trap image analysis using machine learning, and what that exactly means I will explain to you. So, next slide. Danny already said a little bit before, but currently I'm doing a bioinformatics and computational biology master's at Bern. And before that I did my biology bachelor's at the home of university, where Danny taught me how to code and where I decided I want to do more programming because it's a lot of fun. I finished up all my exams and now the only thing left is my master thesis, and I am doing it on a topic in conservation ecology. Next, please. The grand picture of the project is that we have a partner in Central Africa who runs this project in the Chinko Nature Reserve. And the goal there is to do environmental conservation. And by that I mean that we want to detect which species are there, how many of which species are there, and what their habits are.
So, in which habitats are they, when do they sleep, when do they eat, do they occur in groups. These kinds of habits are overall pretty difficult to detect, because when humans are there, animals are not going to come out and do their normal thing. But especially in conservation, we're always interested in the shy species, the species that are really hard to detect, and the species that are super rare, because these are the ones we need to conserve the most. And to conserve something, we need to know what's missing or how to help the animal. Very special in this reserve, which will be interesting later in the image analysis, is that it has very different landscapes. So we have a little bit of jungle, we have a little bit of savanna, we have wetlands. So we have all types of habitats that are possible, which means that the images will also be very different from each other, even though the reserve isn't actually that big. Next, please. So before I talk about all the technical stuff: you might not know what a camera trap is. A camera trap is pretty much just this little box with a camera in it, and there are different techniques for how it takes images, but ours just has a motion sensor. So anytime something moves by it, it will take a series of 10 images and save those. We have around a hundred cameras in the reserve, at different locations. We have the coordinates of them, and they just drive out, set them up and then leave them alone for about six months, because that's how long the batteries last. And when the batteries are dying, we pick them up again and then we wait until next year. These six months we call a season, and we produce around 100,000 to 200,000 images per season, which, as you can imagine, is a lot of images and a lot of data for a person to look through every year. Next, please. This is one of the images that we take. So you can see there is an animal on there.
It's daytime, it's in the jungle, but it's a little bit camouflaged. This is generally the type of images you get. Of course, there are different backgrounds, different habitats, different animals. Also, overall for this presentation, sometimes it is going to be super hard to see things in the images, so having maximum brightness on your desktop will be helpful for some parts of this presentation. Okay. Next, please. Yeah. This is just another one. This is a night-time picture, same camera, same location, just a different animal, just so you see what all we capture. Next one. Sorry. Oh God. I just covered my own notes. That's fine. That's fine. This is a technical difficulty. Now I just, I want to be small again. I can't be small. Okay. So it put me in grid mode and now all my notes are covered and I can't see them anymore. I can't remove it. But I'm going to read from the side, so I might look to the left now. Sorry. I don't know what I did. I can just flip you if you want. Yeah. Okay. So, this is a super big project that I'm working on. I'm not the only one that's working on it. I'm just, you know, a little wheel in the whole machine, but the overall idea of what I am doing is to analyze the images in a way that reduces work effort and financial strain. By that I mean that theoretically, at the moment, we would have to hire someone full time to look at these images, just all day, every day, look at these images and tell us what's on them. It's a pretty simple task for people, but a difficult task for computers, and overall it is very expensive to hire a person full time just to look at images. So that's why we want to reduce the effort to actually get the data out of these images, which we can do by automating the process and having the computer do most of the work for us, or even just the part of the work that's the most tedious.
And for this, I plan to make a pipeline, so to have different steps after each other to analyze these images, and to use several different machine learning models that have different functions to do so. And the best hope for the future would be that these models could be loaded onto the camera traps themselves, and then the images could be analyzed in real time and the data could be sent in real time. So we would have real-time information on what is happening in this reserve without having to be there, because right now we still haven't analyzed images from about six years ago. And, you know, time is very urgent when it comes to actually doing something and seeing if measures work. So having current data on what is happening right now is pretty crucial to actually do something efficiently. So that is the best hope for the goal of this project, and we'll see if or when or how we can achieve it. Yeah, next one. Doesn't change. I mean, one second. I'm just going to retry myself. Oh, God, I've never used Skype. Okay. So the overall question is how we can deal with such a large amount of complex data, because overall the task seems really easy for people, but it's actually a pretty difficult task for computers. Next, please. So we have the usual analysis of data, which is that you have your normal data and then you say what your computer should do with it. For example, you tell it to calculate the median of your data. So you tell it, I give you the data and you have to do this, and then the computer tells you what the median is. So it gives you the result. And machine learning has a little bit of a different setup. Next, please. I switched it. Sorry, I have to look at the three minute lag. Yeah, I know. It was like a two to three second lag, which is always a little bit annoying. So with machine learning, we give it the data, for example the image you saw before with the bird on it, and we tell it that there's a bird on it. So we already give it the result.
And when we give those two pieces of information to the computer, it has the task to learn how to get to the right result. So we're not telling it, you have to do A, B, C to achieve this result. We're telling it, I will tell you what the answer has to be and you have to figure out how to get this answer correctly. So that's what we mean with machine learning and all that stuff: it has to learn how to get the correct answer. This is just an overview of a neural network. I'm not expecting you to completely understand this or to have to explain this to anyone else. I just wanted you to see it once. We have these little dots, which we call nodes. You can equate them to neurons in your brain, and they're connected to each other. And there are three main parts of a neural network. There's the input layer, and you have all these different nodes in the input layer. For example, when you put an image into your neural network, each node will get one pixel. So you have a certain number of pixels, and that is the number of input nodes that you will get. Then you have the hidden layers. These are the layers where the neural network will adjust things and test things out and learn how it can get the right results. And this is something where you have no influence. The computer will do this itself. Which is, I guess, a plus, because you don't have to do this, but also a minus, because you have no control over what is happening, and even if it's working well, you don't really know what happened to make it work that well. But generally, we call these the hidden layers because you have no influence. You do nothing. You can decide on what the layers are, but you cannot decide on what is happening within them. And the last layer is called the output layer. So that's the result you get. And for example, you have the result options for the image: you have an animal and you have a car and you have a house.
So those are four options. Then you would have four output nodes. That's the general layout of a neural network. Next, please. Of course, there are different types of neural networks, and the ones we use for images are called convolutional neural networks. With these, we have the standard input layer and we have a convolutional layer. What that means is that we will convolve the image, so we will put a filter on it. A filter is useful because it can highlight things that you generally wouldn't see that well. For example, you can put on a filter that highlights the edges. So when something goes from really bright to really dark, there's usually an edge between them. If you put a certain filter on it, you can highlight that edge. And sometimes this information is really interesting. There are many, many different filters that can be used, and this is what the computer figures out. It's like, oh, which filter do I have to put on to see the differences that I need to see? That's what the convolutional layer does, and it is always paired with the pooling layer, which is just that we merge pixels together. And then you can repeat this. So you always put the convolutional layer and the pooling layer together. And of this double layer you can have one, two, three. Yeah, three is a pretty common number, but you can have even more, and the more you have, the deeper your network will get. And then you have a few other layers, which are pretty common to other types of analysis, not just images, and your standard output layer. But this is the special part about images, that we put filters on them. Of course, I always talk about machine learning, and the question is, okay, what exactly is learning, or what do we need to do? So for it to learn, we have to train it. We have a full data set, and what we call the full data set is the data that we have labeled. So the data that we have information on, that somebody has looked at.
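As an illustration of the convolutional and pooling layers described above, here is a minimal sketch in plain Python. It is not the network used in the project: the toy image, the hand-written vertical-edge filter, and the function names `convolve2d` and `max_pool` are assumptions made for this example, and max pooling is only one common way of merging pixels. In a real convolutional network the filter values are learned during training rather than written by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products.

    Like most CNN libraries, this does not flip the kernel
    (strictly speaking it is a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Merge each size x size block of values into one value (its maximum)."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Toy 6x8 grayscale "image": dark (0) on the left, bright (9) on the right.
image = [[0, 0, 0, 0, 9, 9, 9, 9] for _ in range(6)]

# Hypothetical hand-written filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

edges = convolve2d(image, kernel)   # large values only where the brightness jumps
pooled = max_pool(edges)            # smaller map that still marks the edge

for row in pooled:
    print(row)
```

The filtered map is zero in the flat dark and flat bright regions and large only at the brightness jump, and the pooled map keeps that edge while halving the width and height.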
For example, before we had that image with the bird, and I give it the image and I tell it it's a bird. So this would be called labeled data. And we have this full data set, and we take only part of it as a training set and part of it as a testing set. Because we always want to have some data left to see how well my network is actually doing. If I give it data it hasn't seen before, not random data, but similar data that it hasn't trained on, will it still do well? That's why we need to keep back a testing set, to see the performance. There's also something called a validation set, which is part of the training set. And it's linked to something that happens a lot in machine learning, which is called overfitting. If you've ever done some things with statistics, you know you have this graph with the random dots, and you have the option to either connect all the dots with a line, to make a squiggly line, or you could make a straight line through them, which might not hit all the dots but would describe the average between all the dots pretty well. So you make a linear fit. And the squiggly line will work best with the training data, because you hit all the points, that's perfect. But if you add more dots that are not within the data that you've trained on, it might be worse than just putting a straight line through it. So if we have a perfect fit, it might do worse on data it hasn't seen before. And that's why we try to avoid overfitting the data. So we want it to do well, but we don't want it to just be like, oh, I know this training set perfectly, but anything that isn't exactly what I've trained on, I will not be able to do, because I'm too focused on just my training data. And in the graph on the right, you can see that we see how strongly it's overfitting with the validation set. Yeah, please next. Okay. With all the models, there are many different ones and it's a topic that is rapidly evolving.
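The straight-line versus squiggly-line picture of overfitting can be tried out on a few hand-made points. This is only a sketch under stated assumptions: the data are invented for the example, and the "memorizer", which answers with the value of the closest training dot, is a simple stand-in for the squiggly line that passes through every training point. The function names are hypothetical.

```python
def fit_line(points):
    """Least-squares straight line through (x, y) points; returns a predictor."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(points):
    """Overfitted stand-in: answer with the y of the closest training x."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mean_squared_error(model, points):
    """Average squared difference between predictions and true values."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


# Training set: the trend y = 2x with noise of +2 / -2 added by hand.
train = [(0, 2), (2, 2), (4, 10), (6, 10), (8, 18)]
# Held-back test set: points on the same trend that neither model has seen.
test = [(0.5, 1), (2.5, 5), (4.5, 9), (6.5, 13)]

line = fit_line(train)
memorizer = fit_memorizer(train)

for name, model in [("straight line", line), ("memorizer", memorizer)]:
    print(name,
          "| training error:", round(mean_squared_error(model, train), 2),
          "| test error:", round(mean_squared_error(model, test), 2))
```

The memorizer has zero error on the training set because it reproduces every training dot, yet its error on the held-back test set is larger than the straight line's. That gap between training and held-back performance is what a validation set is used to watch during training.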
Even if you have done something two years ago, you might want to redo it today, because there are more models and they are currently improving. So the current top-one accuracy of the best model is around 90.88%. Top-one accuracy means: if you show it the bird image and it tells you, as its first guess, my best guess is a bird, it does that with 90.88% accuracy. And the top-five accuracy is about the top five guesses. So somewhere in the top five guesses that my network will give me, it will have the right answer in there 98.8% of the time. Just to show you how strongly networks are developing: the 12 best neural networks have all been released in the last year. So, as I said, the networks from two years ago, we pretty much don't use them much anymore, because there are so many new ones that we can implement that just have better performance. Of course, all these models have been evaluated on a standard image set. These are pretty simple images of a shirt or a face or something, and they're generally a lot easier than camera trap images, which, as you saw, have a lot of leaves. Animals are kind of camouflaged. They're not always straight in the middle. The best results we get with machine learning for camera trap image analysis are around 86%. And just so you have a reference, humans get around 96% accuracy. If you train a volunteer, they will get around 96% accuracy. So this is what we're trying to achieve. So we're around 10% away from what we want to get. So the question is why we are missing these 10%, and there are many reasons why machine learning models have difficulties. One of the big ones is that animals are often camouflaged and hard to detect. I gave an example image, and here you can see... it hasn't shown up for me yet. I can't even see it in the main image, but here you can see the animal in the back. I think it's a small deer and it's pretty similar to the background.
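The top-one and top-five accuracy figures quoted above can be computed from a model's ranked guesses. The following is a small sketch with invented guesses and labels; the function name `top_k_accuracy` and the example species lists are assumptions for illustration, not output from any of the models mentioned.

```python
def top_k_accuracy(ranked_guesses, true_labels, k):
    """Fraction of images whose true label is among the model's first k guesses."""
    hits = sum(truth in guesses[:k]
               for guesses, truth in zip(ranked_guesses, true_labels))
    return hits / len(true_labels)


# Hypothetical ranked guesses (best guess first) for four images.
ranked_guesses = [
    ["bird", "bat", "squirrel", "monkey", "leaf"],
    ["antelope", "deer", "pig", "zebra", "bird"],
    ["pig", "antelope", "deer", "monkey", "bird"],
    ["monkey", "bird", "bat", "pig", "zebra"],
]
# What a human labeler said is actually on each image.
true_labels = ["bird", "deer", "pig", "giraffe"]

top1 = top_k_accuracy(ranked_guesses, true_labels, k=1)  # first guess correct on 2 of 4
top5 = top_k_accuracy(ranked_guesses, true_labels, k=5)  # truth in top five on 3 of 4
print("top-1:", top1, "top-5:", top5)
```

Top-five accuracy can never be lower than top-one accuracy, since a correct first guess is also inside the first five guesses.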
So I have it on my left, pretty small, and I wouldn't see it if I didn't know. I think it's a miniature deer, that's what it's called. I think the animal is here, right? Yes. This little black spot in the background. So you see, as I said, putting your computer up to full brightness might help you to detect the little deer. But this is often what we're dealing with, especially in these twilight moments where the night camera hasn't really activated but the daylight is also pretty weak. That's when these animals really blend in, and we humans have difficulties, and the computer has kind of the same difficulty detecting this animal. It's also that sometimes animals walk right in front of the camera, and sometimes they're really nice in the center, and sometimes they're super far in the back. As you can see in the next picture, you have these two antelopes, and one is kind of on the side and in the back, and the other one is nice and straight in the middle. So the computer has to understand that this is the same animal: the size is not really the same and the color is not the same, but it's the same species. So even doing the same thing doesn't work on all images, because the images themselves can be very different even though the same thing is depicted, and this is a task that the computer also has to overcome. Next.
Sometimes animals are pretty quick, and we take 10 images of each animal, so sometimes there might be a motion detection and in two of the 10 images the animal is just there with a tail, or sometimes just a snout, or it's really at the bottom of the image. Or, as you can see on the picture that I provided, the animal is just super close, you know, it's kind of blurry. I don't know, it looks like some kind of antelope. Whether it has some kind of coloring that would differentiate it from other antelopes, I don't know, and what the exact size is, is really hard to determine. So the question is, if even a person would say, I wouldn't be able to get this exact species, how would the computer distinguish it? So these problems we also have. As I said before, day and night images are also really different, and I show one with the same background, so this is the exact same camera, just day and night, and it also highlights really different things. So there you can see the light in the front, and on the night images you can kind of see the trees much more. And animals also look pretty different day and night, and the night images especially are also in black and white, or grayscale, so we can't depend on just using color differences between animal species to distinguish them. Also, animals often occur in groups, and sometimes they're right behind each other, or they're 10 in a row, and sometimes you only see one horn. And even for a person it is hard: I know an image where there was a large herd of bison or something, and it was just pretty much legs, it was like 30 legs, and I don't know how many belonged to actual animals or full animals, so it was pretty hard. And in the image after, you can see these two pigs that walk behind each other, and for us this one is pretty easy to distinguish, but the computer could just think it's a pretty long pig, it's just a really long pig. So it sometimes has a hard time to say, if animals are
overlapping, if that's one or two animals. And we have something else that's happening, and especially in our images this is a big problem, which is called background bias. I'm going to explain it to you with the images. So somebody made a test, and they wanted to see what the machine learning model looks at. Red is what it really focuses on and blue is what it doesn't really care about, and as you can see, it really cares about the trees in the back and it really doesn't care about the animal. Computers are not that smart. They have this big picture and they're like, I don't know what to look at, I don't know what's interesting in this picture. I will focus in on the animal, and of course this is what's interesting to me, but the computer doesn't know that; pretty much each pixel could hold equal value in the beginning. And so we have... yeah, no, I'm not ready yet, you're too fast. So we have images with different backgrounds. For example, here we have a little bit of grass and a little bit of trees in the background, and sometimes we have jungle, as you saw before, and some animals occur in some habitats much more than others. For example, you will rarely see a zebra in the jungle, and you will rarely see specific types of monkeys in the savanna or the desert. So the computer might learn, oh, anytime I have this background there's a zebra, or like 80% of the time, so it's going to learn, okay, this background means that there's a zebra on there, which of course is not what we want it to learn. Or sometimes things like, oh, this tree: there's a giraffe on the image, but it's the tree that's the giraffe, and there might be a different picture that has some other animal on it and it's like, oh, the tree is on there, that's a giraffe. And that's what we call background bias: it learns the background based on the results, but it's not looking at the animals themselves, so it bases the results on the environment. And it happens a lot. As you can see here, this could be like, okay, there's an
antelope there and I see trees, so anytime these trees are there, that's an antelope, and that of course is not the case. Next one. So here you can see as well, you have the exact same background and two different types of animals, but if we see one of the animals much more than the other one, it might think, okay, this is the identical background, it's always this one animal. And we really want to avoid this, because this is one of the main problems we're actually facing. Next one, please. Yeah, there is also the problem, which people have as well, that species are super similar, that maybe they're not really different in shape, but the only difference that they have might be in some color. Like some birds, they might have different chest colors, but the overall shape and size is super similar. So how could we differentiate these birds with night images, which just have grayscale? So sometimes it might not even be possible, even for people. So that's also a problem that we have. And the biggest problem that there is, which is the whole reason why we're trying to get computers to solve it, is that we don't have enough training data. So we don't have enough images that have the labels on them, and to train machine learning models you need a lot of labels. We currently are working with around 124 gigabytes of labeled data, and that's, I think, 100,000 to 150,000 images, and somebody had to look at them all. And especially if you're in a small project and you don't have a lot of people working on it, doing this type of work is really difficult. And the really big models that are currently happening, I think they're even trying to have half a million images or something. And that makes it super difficult to train a good model if you don't have the training data. And it's kind of a conundrum, because you want to train the model to not have to look at the images, but to train the model well you will have to look at a lot of images. So there are also some models that are trying to work on
solving this problem as well, to need less training data. Next, please. Okay, so I started my project and I was thinking about what I want to do, what I need to do to classify these animals. There are essentially two tasks to classify an animal: one is seeing if there is an animal or not, and the other one is, if there is an animal, what is the animal. And the first step we do is just making the camera trap images smaller, because having a smaller size usually just makes things run faster. For example, I'm currently trying to do a test with about these 100 gigabytes of images, and I calculated this is going to take me about 12 days. So the smaller the size, the faster your program is going to run, and you really want to be as time efficient as possible, because overall it's already going to take you a long time. Then the second task was to see if there is an animal or not, and there I had two options: one is called Animal Spotter and the second one is called MegaDetector. Animal Spotter is a statistical program that runs with Markov chain models, and MegaDetector is a machine learning model, so a neural network. So Markov chains are also machine learning, but MegaDetector is a neural network, and I wanted to see which one is better. And by the way, if you're interested in this at all, MegaDetector is made by Google and you can just download it on your own laptop. If you know how to use the terminal, it will take you about 10 minutes to set up, so if you want to analyze your own images with MegaDetector, it's pretty easy. I did some tests with MegaDetector and I found it to be a lot better and much, much faster. I think the next slide I prepared should show us this. Yes. So MegaDetector was much faster than the other program. One time I did a test, and the other program, Animal Spotter, took around 7 hours and MegaDetector took around 1.5 hours, so a lot less time, which is really important when you have to do a lot of images. And even
though I resized the images, it didn't really make any difference for the MegaDetector program, but for the other program it was much faster. And I also checked how good MegaDetector actually is, because it's a model that wasn't trained by me, so it was trained by Google on all kinds of images. MegaDetector has the option that it can detect empty images but also different categories of what could be on them, and I will get into this in a second. But what I found is that it's really good at seeing if an image is empty, so the more empty images you have, the higher your accuracy is going to be. The 92% accuracy I had when around 30% of the images had animals on them, and the 99.8% accuracy I had when, I think, 0.04% of the images actually had animals on them. And with this you also see, for example, if a person looks through it and wants to see if there's an animal on it, how much unnecessary work they do. So if only 0.04% of the images actually have animals on them, the question is, why do we have to look at the other ones as well? So even just using that one step could make the whole process a lot faster. Of course, we're also missing a few percent, so the question is what kind of mistakes it makes. And it always makes the same mistakes: as I showed you in the beginning, the little deer that was super hard to detect, the model really struggles with that as well. It's pretty much as bad as we are with this. And there are always options to retrain models, so I could say, okay, the model is pretty good, but I want to retrain part of it so it's really specific to our problem. So there are still options to improve the model if I want to have higher accuracy, but that's something I just didn't get to yet. We'll see how it goes when I finish up, and if I think, oh, I need to tweak some things, I might go back to this model and just try to improve it. A huge advantage of MegaDetector is that it tells you what was found. So we have the options that it finds empty
images, it finds animals, or it finds humans, or it finds vehicles. Vehicles at this moment are not really that interesting for us, but detecting animals and humans is really what we're interested in, because there are a lot of privacy laws. Maybe not in our case, but for example in Switzerland we try to monitor lynx, and a lot of these cameras are located on hiking trails, and of course we can't just blast these images of people everywhere, because that's just against the law. So because this detector gives us the category, we can just filter out anything that is not an animal. This is important for us, and it also gives us less work to do, because we're very specific on what images we want to look at and what images we just don't care about. Another advantage is that it also gives us the confidence level of what it has detected. So it tells us, I found something, but it might not be sure at all. So it says, okay, I found something and I think it's an animal, but I'm 10% sure, or it's like, okay, it's an animal, no doubt. And this is important because under a certain confidence threshold it's not that good. I would say if it has 10% confidence in what it found, I wouldn't trust it. And with this we don't have to take everything it gives us, but we can find out how sure the model has to be for us to actually trust it, and we can filter by that. And the last one is that it doesn't just tell us that it found an animal but also where it found it, and it draws a little rectangle on it, which is really nice, and we also can use this rectangle to just cut out the animal, which will come in handy later. But next slide, please. So here you can see the little animal, and this is pretty much what we get if we visualize the output of MegaDetector. So we get the little face and we have a little red square around it, which is the color for animals, and then it's written, I think, that there is an animal on there, and it's super sure: it says 100%, it's 100% sure that this is the animal looking
into the camera trap. But just so you can see what the detector actually does with the images, and I would say it's pretty good. It's cutting it pretty close to the ears and also the top of the ears, so it doesn't have a lot of room around the animal and it doesn't cut anything off, so it's pretty good at actually precisely locating the animal itself. Next, please. Yeah, so I thought we're going to get a little bit interactive, have a little bit of fun, and the question is: are you better than MegaDetector? We're going to look at the 92% accuracy images, and I just want you to write down on which images you see an animal and on which images you think it's empty. And then after that I'm going to tell you if there is an animal on there or not, and then you can see. So I have 10 images; if you get 9 out of 10 right, you're approximately there, you're a little bit worse, but you can't achieve exactly 92% since I didn't put that many images in there. Yeah, so get a pen and paper ready. And how long did you want to show them the images? Yeah, so I will show the images for 5 seconds, because that's approximately how long MegaDetector takes per image. Good. So get the pen and paper ready, just write the numbers 1 to 10 already on your sheet of paper, right, and then you're just going to have 5 seconds to decide: animal, yes or no. Right, and I would suggest, as I said before, put it on maximum brightness, because sometimes it's kind of easy but sometimes it's not easy at all. It is really hard. It is really hard. It's really hard, and some of the images I show you, the detector itself also got wrong, so I wanted to make it difficult for you. Good, so throw it in chat if you're ready, so we know that at least one person is doing it. So I guess Misha is the only one still listening, but just throw it in chat when you have your pen and paper ready. I have my pen and paper ready, so I'll be joining you guys. I haven't done
it. We quickly scrolled through it yesterday, but I haven't classified them, so I don't know either. So I'll just have the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. All right, we get it ready. Ha ha, perfectly fine. All right, so then we are going to start, right? At the first image I'm just going to count to 5, and then we're going to switch to the next one. That's how we're going to do it. Good. All right, first image. 1, 2, 3, 4, 5. 1, 2, 3, 4, 5. 1, 2, 3, 4, 5. 1, 2, no. 1, 2, 3, 4, 5. 1, 2, 3, 4, 5. 1, 2, 3, 4, 5. 1, 2, 3, 4, 5. 1, 2, 3, 4, 3, 2, 1, and we're done. Good. So how many images did have an animal on there? Yeah, how many did you see? You missed one image, or you missed one animal? Just tell us how many you think. 5? All right, so Misha thinks 5. Any other guesses from the audience? Anyone else think they saw more or less? All right, Xanaxin says that she thinks she saw 3. And Bacon, you were doing it as well, how many did you think you saw? It takes a little while, right, people have to type. Yeah, we're not in a hurry, I think. All right, Bacon also thinks 3 animals. I think I spotted more. I think I spotted like 5. I think Misha's closest. But I have a big screen, right, so that's one of my... All right, so take it away, next slide. Right, yep, the next slide has the answer on it, and it's 7. 7 images had animals on them. So yeah, some of them were super hard to detect. I'm going to go through them again, and I made little red boxes around the animals. If you didn't see one, don't worry: I really zoomed in and took my time to actually look at them, so don't stress that you didn't see it. All right, this one was pretty easy. Yeah, it's a little bit hidden, but the ears give it away. Yeah, I mean, I was like, oh, we're going to start with the okay one. It's not the easiest, but it's going to show. But yeah, the next one is going really slow. The next one was empty. Yeah, the second one is empty. The third one should also be empty, I think. Yeah, empty. That's true, that's true,
although I thought that in the middle here, this thing here... but it's probably just kind of an artifact. But yeah, it looked like a big chicken. But that's actually pretty interesting, because this is just a big root, and this is something the model routinely thinks is an animal with 100% accuracy. It's very determined that this stupid root is an animal, and all the mistakes it makes of thinking there's an animal where there isn't one are because of this single root. But it's the same as what humans do, right? If you see a garden hose, the first thing that you think is: a snake. So that's kind of the way. But yeah, it's really convinced that this is... Yeah, I would have misclassified this one. All right, fourth one. Yeah, this one is super, super sneaky. This one is super mean: it's in the bottom left and it's barely on there. You can see the little tail slinking around. But this is actually something the model detected, so it did see it. For this one I personally placed the little squares around it, but the model did detect it. So if you saw this, that's good. This one I haven't seen at all. For me this is just a black picture, even on full brightness. I actually thought about taking this one out of your presentation, putting it in Paint, and then just inverting the colors. The question from Misha, actually: can you tell the code that if it's in the same position on all 10 pictures, it's not an animal? Well, the problem is that there is pretty much no code, you know. We don't really program anything, we make a model. That's kind of the annoying thing about it, because we just give it the images and we give it the result. We don't tell it what to do, it does it itself. So yeah, we know it always does this wrong, but we can't reach into the code of the model itself. Yeah, so machine learning without code. Yeah, in theory it is, right? You just have neurons that kind of see images, and then they adjust themselves, and then they
give you an answer, right? So you can't go into the neuron and fix some things. Yeah, you can't adjust it. So generally the way it learns is: you give it, for example, 10 images with the results, and the nodes have some preset. It's like, okay, we just kind of try random things in the beginning. Then it's going to go through the nodes, and it's going to give out a result, and then it's going to compare that to the results we gave it. And then it's like, oh no, I didn't really get it, I have to change something. There are different techniques for how you see what you did wrong, but after this it's like, okay, I have to change what is happening in these nodes. This is actually the learning part, and that's how it knows if there's something or not. And as I said before, because I know these problems happen all the time and the model is not trained on our data: we have all these, as we call them, hidden layers. The upper layers are for the big, big things, like the big outlines or big shapes, and the deeper layers are for the small details. And it is possible to not have to retrain the whole model, but just some layers. For example, I take two layers and retrain the details with my data. So it's like, okay, the general shape of animals is always the same, the general color of the animals is generally the same, but maybe the snap color is a little bit different, or there are a few pixels of difference. This we can adjust by retraining; it's called retraining the deeper layers. This is how you can improve this problem. But there's not really code you can adjust. The only thing you can do is train it, and train it better. That's how you get your model to be better. Yeah, but this one is for me, like, it's black. No code in the world is going to make this visible for me. Yeah, I don't know, I looked at it on my computer, I did full zoom in, and it's also a miniature deer that's on there. There are a lot of
miniature deer in the area there. Yeah. All right, this one was easy. I think everyone spotted this one, right? There's a porcupine, and the eyes light up. I think that's one of these telling giveaways: for animals in these nighttime pictures you can see the eyes very clearly. All right, number 7, I didn't spot that one. It's a monkey, right? No, it's a bison. A what? A buffalo. It's the hind legs of the buffalo, it's the butt, from the back. It's a buffalo. All right, it's a buffalo butt. Yeah, but there you can see a good example: sometimes animals are super hidden, they're in the background, they're pretty small, and they're kind of a similar color. These are definitely hard to see. All right, picture number 8 was empty. Oh, picture number 9, I didn't see that one either. Ah yeah, that's also a miniature deer, just hiding in the bushes. Actually, in my first run I didn't see it, but then I looked through the results of MegaDetector, and only then did I see it. So isn't that overfitting in a way? The computer is telling you there's an animal, and then you're going to zoom in and look, because in the end the original image was labeled as no animal by you. Well, um, I mean, yeah, I'm sure that you can see that there's an animal, especially if you compare it to other images of the same type. Yeah, probably. Image number 10: again, for me it's just a big black image. Yeah, even when I zoomed in, honestly, this is like a 50-50. I might just be hallucinating that there's something on there, but I believe there's an antelope in the front, and I think you can see the front leg a little bit, and also, behind it, kind of the hooves. But if you missed that, I can completely understand. For me it's just a black image, I can't see anything at this resolution. But yeah, I just wanted to give you... I see a red square. Yeah, there's definitely a red square there, but that's not hard to detect. So I just wanted to show you what kind of images we deal with, and also
the problems that people face with making labels. Because of course, if we train a model, we ideally want to make it perfect, to get 100% accuracy. But how do you get 100% accuracy if the stuff you train it on is not 100% accurate? With these images you're like, I don't know if there's something on there or not, I can't tell you. And if I can't tell the machine what the answer is, the machine can't really give me the answer. I also can't check if it's right or wrong, because I personally don't know. So these are the kind of struggles we face. I hope that you had a little bit of fun to loosen it up today, and I'm going to continue with the super technical stuff. Um, so after we do the whole detection stuff, as I said before, we want to sort the images, because sometimes 50% of the images have animals on them, and sometimes only a really, really tiny part have animals on them. Because the steps after that will be categorizing the animals, we don't really need any empty images, and we definitely don't want any humans on the images, so we separate them. If you imagine you have 100,000 images and only 1% actually has animals on them, we have a lot less work in front of us. So that's why we try to isolate only the animals. Next, please. Sorry, the lag is happening. Um, after this is where the testing phase of my master thesis comes in. So if we follow the middle line, which goes straight to the last box, that's the training of the image classification model. I'm going to go into what it is and why I chose it a little bit later. The standard way of doing it is that you have your full image and you just give it to the model to train and analyze. But as I said before, we face huge problems with background bias, and one way I thought we could eliminate it is just to remove the background. That's why MegaDetector is so handy: it makes the box around the animal, as you saw before it does that pretty well, and we can just cut out the box with the animal
on it. So we remove about 80 or 90% of the background, depending on how close the animal is to the camera, which makes the probability of background bias much lower, is what I'm guessing. That's my estimation. So the lower pathway would be cutting out the animals with the rectangle and then training the image classification model on these cut-out rectangles. There have also been other tests, and it does seem to improve things. So I'm hoping that it also does a little bit better in our case when we cut out the animals, especially because we have many different and also very obscure backgrounds, with a lot of leaves and things happening in the background, and the stupid root. And there could be a third step. The question is: is cutting out the animals with just the rectangle enough? If we follow the upper path, we have cutting the animal out with a polygon, and the polygon is pretty much the exact outline of the animal. V7 is also a neural network model, and it's really good at making precise outlines of animals, but it's not good at doing it when the image is super big and there are a lot of things on it. So you kind of have to tell this model where the animal approximately is, and that's why I put the step of cutting out the animals with the rectangles before it. So MegaDetector will make the overall analysis of the image to see, oh, this is where the animal approximately is, and then V7 will be like, okay, I can make the outline of this animal. I personally am interested in whether cutting out the exact shape of the animal, so really having no background, is helpful, or whether it's a little bit of an overstep. Having very little background with just the rectangle, does the model really care, or would cutting out the exact shape of the animal be a good step to do as well? I don't know what the answer is, I haven't gotten that far yet. As you can see, it's not implemented yet, but that's kind of the path of my master
thesis. I'm really interested in seeing what steps we can do beforehand so we can help the model be better, maybe not by improving the model itself, but by removing some of the mess, some of these hurdles the model has to jump over. Yeah, and the model I want to train is Meta Pseudo Labels. It's a little bit of a weird name, but it will hopefully make sense in a second when I explain it to you and why I chose it. I think that's the next slide, if I'm right. Yes. So you can ignore the image above for a second, I will get to it later. The accuracy of Meta Pseudo Labels, not with camera trap images but with these much easier test images, is 90.2%. As I said, the top accuracy that we currently have is 90.88%. So the question is: why do I not take the best model there is? Well, because models can become super, super big, and the size of the model is usually determined by its parameters. As I said, you have these little nodes and they adjust themselves to make better filters, and within these nodes you have these parameters. The best model has about 25 million of these parameters, which makes it a huge model. This one is still pretty big, but it only has 5 million. And as I said at the very beginning, the ultimate dream goal is to load these models onto the camera traps themselves and have a real-time analysis of these images. But if you have huge models that are just too big to load onto small mobile camera traps, then this is not possible. So you always have to keep in mind the restrictions you have. For example, what about improving on the sensor side, for example adding a microphone array? I don't really know what a microphone array is. So, a microphone, right? Animals make sounds. So you take 10 photos and you record a 10-second sound clip at the same time. You see something which looks like an owl, and you hear it. So, use multiple data sources together?
Yeah, I mean, generally I think this could be an idea. But from where I'm sitting, one: a lot of the animals that we look at don't make sounds. The miniature deer, most of the time, doesn't really make a sound. So they don't make that many sounds, or not that often. And the question is, if an animal that's not on camera but pretty close makes a sound, would that be linked to the wrong animal? You know, it happened at the same time: you see a deer in front, but as you said there's an owl in the back, and then we have an owl sound at the same time as we have the deer on the image. So it might actually cause more confusion. Yeah, it might, because then everything would be classified as a cricket, because there are always cricket sounds in the background. But what about updating the image, so instead of using a certain megapixel count you just boost it tenfold? Will that help, or will that actually hurt? Because of course the images will start getting bigger, but the amount of detail in the images will of course be higher. Contrary to popular belief, having higher-resolution images doesn't actually increase performance. Sometimes reducing the size of the images increases performance. That might be counterintuitive, but for example the animal sputter program I talked about in the beginning often gets even better results when we have fewer pixels. So that's good to know for the UFO field, with all of these shaky cell phone camera UFO pictures: that's what we want to have. We don't want high-definition pictures, we just want shaky iPhone pictures. And, uh, I think what we could improve... I think with the cameras the biggest problem is this twilight zone, because these are the images nobody can detect anything on: it's super dark, but the night mode hasn't turned on yet. So this sensor is really bad. And the problem the camera has is also that it kind of gets worse once
the battery gets really low. But I mean, investing in new, better cameras is a huge expense, and it won't help you with the hundreds of thousands of images that you already have. Yeah, and overall, only having something that works with the best possible machines is kind of a very limited success, you know. I personally would say that the better your model performs even though your data is bad, the better your model is, because it's not feasible that every project has the best stuff there is. Um, did you ever think of building your own camera traps, um, based on, I don't know, an Arduino, or like a Raspberry Pi or the like, the little... Um, so I have nothing to do with the camera traps. This is a project that has been running for, I think, six years, and it's a guy that used to work with our university. Um, I don't know, I think that having once invested in these cameras, I don't see that they're going to change them very soon. I don't know how much camera traps actually cost. They're probably not cheap if they run for six months continuously. Um, they don't do Wi-Fi things, right? So the images are all stored on an SD card on the camera? Yeah, someone has to fly there, get the SD card out, put a new one in, change the battery. And yeah, I mean, all these things are theoretically possible, but that's not at all what my field is. My plan is: I get data, and whatever the data is, I will try to do my best with it. You know, of course these things are possible, improving the camera traps, but my take is that I want to achieve the best possible result with what I was given. You know, it's kind of a cheap way out for a person to say, I can't do it, your camera trap wasn't good enough. My plan is: even if your data is shitty, I want to be able to give you something that's reasonably good. That's why I'm focusing more on what I can do with what I already have, um, to improve what's happening. Yeah, I think that's a good
answer. Misha guesses that they're around 50 euros a piece. Okay, well, we have 100 at least, so 10 to 100. Yeah, so that's like 5,000 euros of investment if they cost 50 euros, because I think that 50 euros is pretty cheap for a camera trap. Yeah. So I'm just going to go back to the model itself, because I think I'm already over time. I'm just going to explain this and then I'm going to be almost done. Oh, you're okay with time. You can talk as long as you want, it doesn't matter to me. I already shortened the whole presentation for myself today. So this is the real stuff, right? This doesn't go on the exam, so I think it's the nice extra for students. So if you want to talk for another 30 minutes, it's fine, um, then we just do that. I'm just here for entertainment. Um, okay, so Meta Pseudo Labels is something, um, that's not really one model in itself: it's two models. We have one model that's called the teacher and one model that's called the student. And now you can look at the little image that I have. In the upper row, um, you can see that we have the teacher in yellow, and there we have the training data to the left of it. The training data is the data that we have given labels to, where we know what's on there: we have written that there's a bird, or an antelope, or a tiny little deer. So this is the labeled data. And then we do the standard stuff, as I said before: we give the images and the results to the model, and then it learns. Once it has learned, the steps become a little bit different. It then produces something that's called pseudo labels, and for that I made another slide, please. Um, I don't see it yet, but I'm just going to start talking. So, um, for example, we have this image of a little dog, and on the left side at the bottom you can see human-labeled data. If I were to say what is on this image, I would say 100% it's a dog. So I'm fully certain: it's not at all a house, it's not at all a wolf. Any
other options are wrong, just 100% wrong, and this one option is 100% right. Um, but the pseudo-label data sees things a little bit differently. It's not just interested in whether the answer is right, but in how close it is to the real answer. For example, here you see that the right answer is still a dog, with 99% certainty, but, uh, a wolf is also kind of close, right? If we have the house and the flower and the wolf, anybody would agree that if you mistake the dog for something, you would be more correct mistaking it for the wolf than mistaking it for the flower. And that's kind of the idea: you reward your model by saying, you're wrong, but you're also kind of close, or you're also super far away. So it's not just that you are either wrong or right; you also have a gradient in between: you detected something that was closer to the right answer than something else. And this is, um, a little bit easier for it to learn from. So that's why we call it pseudo labels, and that's also why that one was called pseudo-label data. And I think you have to go to the slide before this one. You have this one? Okay. So next you see the pseudo-label data, and here comes the real benefit of this model. Pseudo-label data is data that we give to the model, so images, but that we haven't looked at. So we have around two terabytes of data, of which I have 100 gigabytes of labeled data, and I could give an additional 600 gigabytes of unlabeled data, to which this teacher model gives pseudo labels. So as you see, we already have 700 gigabytes of data used for training, in comparison to before, where we only had 100 gigabytes of training data. And as I said before, training data is often a limiting factor, so this is, uh, one of the advantages of Meta Pseudo Labels. Um, so this pseudo-label data is then used to teach a student model. This is a different model, um, independent of
the teacher model, and it's not trained on the data that I have labeled, but, uh, only on the data that the computer has labeled, so these pseudo labels. Then the student learns and takes a little bit of a test, you know, on the validation data. Like you: you will do an exam, and then you will check your exam and see how well you did. There are several outcomes, and there's also the outcome of, oh no, everybody failed, I did poorly teaching them, so maybe you have to do some reflecting on how you actually teach. And this is what the teacher model does. It teaches the student, it gives the student a test, and then it sees the results of the test, and from that it decides how to better learn itself to make better labels. So it updates its label data to get the student to perform better. And this is a circle: the teacher learns, it teaches the student, the student, uh, gives the results, the teacher tries to be a better teacher, and, you know, it goes round and round, where eventually the student becomes better than the teacher. Um, and this is the overall idea. It might not sound like a groundbreaking concept, but this is very new, it hasn't been done before, and it also has never been used on camera trap data. So I actually don't know how it's going to perform. Um, but I have hope, because it's supposed to be good on very diverse data, and it only needs a little training data. Um, usually you need about 10% of labeled data compared to your unlabeled data, so instead of 100 gigabytes of data we can train with 10 times more data, which is a huge advantage. Um, and you kind of always want to think about the bigger picture. Maybe it's not just me who's using it, maybe not just our research group; maybe we want to help other programs that have very different habitats, um, so that they can also use a pipeline where they can train on their data even if they have very limited training data. Um, and this is why I chose it. I'm
currently still trying to work on it. It has very specific requirements for the computer and for how to install it. I'll have to do a lot of adjustments, and I haven't gotten to them yet. Um, but yeah, that's the goal. And then, um, in the future I will see how well it does, um, and I hope it does well. And then of course there are all the adjustments we can do beforehand: how well does it do with full images, how well does it do with cropped images? Um, I will do all these tests in the scope of my thesis, and I hope it, uh, produces something that's very usable for my group. As I said, we have two terabytes of images that nobody has looked at, and it's been just sitting there for six years. So if I can make something that they can use, so that they can finally have some data from this project after so long, I will be super happy. They will be too. Yeah, probably. But, uh, that's the ultimate goal. I don't know, maybe I'll make a model that has like 50% accuracy and it's totally trash, but I mean, that's also an experience. But, um, overall that's the plan. And I think, um, yeah, that was it. I don't know if there are more slides, but I think it's done. And the image pipeline thing, so the last overview that you showed with the different pipelines: you're still working on two parts, right? The cutting out of the animals using the polygons, to remove all of the background, and then, uh, the next one is the classification, where you want to use these, uh, pseudo labels. I mean, logistically, getting Meta Pseudo Labels running is the most important step, so that's what I'm currently working on, and the V7 is kind of like, you know, a little cherry on top. Perfect. Um, because as soon as I get it running at all, I could imagine that, okay, I give it just the images and suddenly it's super good, it's already at 97%, and then the question is why we have to do all the steps in between. Or cutting the rectangles is already super good or something. Or the rectangles, for example, are not better than
giving the full image, and if the rectangles already aren't better than the full image, then it is questionable whether a polygon would be better than the full image. So it's still, uh, up for debate, um, what's happening. I guess I have to wait for the results, but running through these things with the sizes of the data I have can take from two weeks to maybe a month, so I have to be patient. Just training stuff with the amount of data that I use takes a super long time. But yeah, that's it from me. Thank you for listening. I know this was a very long one, but I hope you enjoyed it, and if you have any questions, I guess you can ask them now, or later to Danny, maybe he can answer some. Yeah, well, you can hang around, I can just hide you if I want. Or are you running out to do grocery shopping now? That could be as well. But yeah, if anyone has questions, um, ask them now. Um, of course there might be questions from the people watching it later on YouTube as well, but then just mail them to me and I will mail them to Aimee if I can't answer them. I also use YouTube, so I can just check out the video in a week or two, see if there's anything, and answer the questions. That would be perfect as well. But yes, I hope you found it interesting and see why machine learning is maybe a more difficult thing than the media makes it out to be. It's not a simple solution for everything. Oh, there's a question, actually: why the differential priority in focus when analyzing pictures, random or given? What kind of priority do you mean? It's a difficult question. Um, I'm not sure I get the question completely. Um, maybe could you rephrase it, Xanaxin? Ah, okay, the models. So the slides where you showed the bias the models had when analyzing the picture. So it's about the background, I think, right? Ah yeah, the background. Um, well, I would say generally, um, you can say that computers aren't that smart and they just don't know what's happening. But it, um, can be, for
example... Of course we don't ever just give one image, especially with camera traps. You could think camera trap images are pretty much like all other images, but that's actually not true. Um, because for example, you have the image of me: I have a face and then I have a background, and Danny has a face and a background. And for normal images, you know, a lot of people use Instagram images to train their models, for example, because you have a lot of images of the same stuff, but it's always on a different background. With us it's kind of the opposite: we have the same background and different stuff on that background. So sometimes, if we give the same label with the same background, it's going to look for consistencies within the background or within the whole image, as you saw with the root before. For example, here the trees are always the same. So say there are 10 images with antelopes, and it's always this one, and the antelopes always have different positions, they might be farther away, might be closer, but the trees in the background are always the same. And it's trying to find similarities between these to detect, and that's mainly why it focuses on things that are not the antelopes themselves but the background, because it's trying to find consistencies between them. That's why I think it's focusing on other parts: it's noticing that these are super similar between the images with the same labels. But there might also be other reasons. You know, that's kind of what it does, right? It just looks at an image and then it learns a pattern. It sees hundreds of the same label saying antelope, antelope, antelope, and it just looks at all of these images and tries to figure out what's the same in all of them, right? And since the antelope is always slightly different but the background is the same, it just kind of learns the background. And one of the questions that I had: you say that
most of your images are in many cases empty. You talked about a certain point where you got the best accuracy when only zero point something percent of your images had animals in them. But is that not unfair in a way? Um, let me see if I can find that slide quickly. Because you said that the more empty images, the higher the accuracy of the model, right? And you ended up with an accuracy of 99.8%, but that was at the point where you said that only 0.04% of the images had animals on them. That's of course weird, because then the accuracy should have been 99.96%: just shouting empty for all of them would have given you a better accuracy. Just saying, no, all of the pictures are empty, would in the end give you a better accuracy than the model. And I think that's one of these things: when I did machine learning in the past, I used random forests, and I was always told that the accuracy of the model is not that important. The thing which is important is the amount of correct classifications compared to the distribution of the classes in your training set, right? If you only have one picture of a monkey, then this thing calling something a monkey is of course very, very unlikely, but if you have millions of pictures of monkeys, then it should be a lot better at detecting monkeys compared to something that you don't have a lot of pictures of. Yeah, I mean, I'm still at the beginning of my thesis, and so far I was very limited by my computer, which meant I could only do very small tests. I just wanted to get an overview of what is actually happening, you know, I was just doing quick tests. But currently I want to run a test on all the labeled images we have, so 124 gigabytes, which I said were like 100,000 to 120,000 images. The maximum test I did so far was 800 images. I want to do these pre-tests to just get an overview, and not have something running for
11 days and then it's not good. But with this we're planning to actually make a confusion matrix, so to see how many of the images that had animals were actually detected as animals, how many animals it missed, how many empty images it thought had animals on them, and how many empty images actually were empty. So it's really important to see what kind of mistakes were made, but I can't give you info on that yet because I just don't have the data. Do you run it all locally on your computer? So I have a university computer, but as I said, I need a lot of disk space, and mine only had 100 gigabytes, and I just couldn't do it. So I got a new university computer, and now I have to reinstall everything, and then I can run the test. All right. If you want access, you should still have access to our server here. Yeah, I'm trying to run it on the server, we have a university server, but installing TensorFlow is a difficult thing, they told me. All right, I can put it on my agenda to see if I can install TensorFlow on our server. Yeah, I mean, a big part of my master thesis, and of programming generally, is trying to get things to work. Those are the technical problems. So it's not just writing the code, you also have to, yeah, have the dependencies that your programs need, have the right versions, have things actually fit together, so that one output is compatible with the other program. These are all things that are kind of hidden, and I didn't put them all on my pipeline, I just put the big steps. But, you know, in between each of the steps there are like five tiny steps. And then of course I have to quality check every step, because just doing it is useless if you don't know if you're doing it well. So I always have to run some quality tests, which I will do once we do the big, big test, which will take me two weeks to run, or rather the computer two weeks. I'll send it off and leave it for two weeks, and then I hope it does well, and then I can tell you how well it does with actually
detecting animals when there are animals there, and what kind of mistakes it makes. But this is an important point: just because it does well on this simple test doesn't actually mean it's a good model. All models are wrong, some are useful. And even here, being able to discard 80% of your images, saying that there's nothing in there, is already a big help, right? In the end, if you have a student, like a master student, who has to go through all of these images and classify them, giving him 100,000 to do or giving him 20,000 to do, with pre-sorting by some algorithm, is going to make a massive amount of difference. Of course, yeah, even having just some of these steps is already a huge improvement. So I wouldn't say, like, oh, it's completely useless until we get over this 96%. As I said before, we have the top-1 accuracy and the top-5 accuracy. Even if, for example, you have a model and it outputs the top 5 guesses to a person, and as we saw before, the top 5 guesses usually have a pretty high percentage, even that would increase the speed, because it could be like, oh, I think it's one of these 5, and then you click the one that's right. And my professor thought of doing something pretty funny, like an animal Tinder kind of thing. So you have this algorithm and it shows you the image and gives you the top-1 guess, and then you can swipe right to say, oh yes, that's the right guess, and give that information back to the model so it can learn. Or you swipe left because it's the wrong guess, and then it can give you the top 5 guesses and you can pick from there and it can learn, or it's like, oh no, you were wrong completely. So that's also a point where you can see that we can use it to do some of the steps, but we just validate it, and it's learning through validation. Yeah. All right. Those are also ideas that we have. So it's not all done with this, and we're not saying, like, if... yeah, we can still do part of the
work, but the amount of work that is happening now, where everything is manual, is just not smart to do. I'm waiting for the first camera trap pictures to appear when I log into my email, saying "click the mini deer to continue", right? Now you have all of these training sets for self-driving cars, where you have to click all of the stop signs or click all of the bicyclists, right? And that's just to train Tesla's new vehicles and all of these things, because that's what they're using it for. But good. You're gonna have nightmares about finding the mini deer, and you're like, this picture is black, there is no mini deer in this picture. Well, that should be an option, right? All right, perfect. Aimee, thank you so much for being here. If there are no more questions, then we'll do a short break and I will be back in like 10 to 15 minutes, and then we'll continue with the real part of the lecture, which is... I totally forgot, I'm so thinking about machine learning at the moment... standards for analysis. So less exciting than cool images of deer and porcupines and stuff. And I can still read the chat later, so even if you have questions about my presentation during the lecture, I can still read them. Yeah, and of course, if you're on YouTube and you're watching this in five years, then just ask a question on YouTube and I will send it to Aimee in the future, and then she can answer it in five years anyway. So thanks for watching. I think it's been a long movie, one and a half hours, so it's good. Thank you so much, Aimee, and I will see you guys after the break. So enjoy the break. I think the first break is... what did I select? I selected another animal. Wombats, I think. So I will see you guys in 15 minutes, and enjoy the wombats, I hope. All right, so be right back.
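As a rough illustration of the accuracy question from the Q&A above: when only 0.04% of the images contain animals, a classifier that always says "empty" reaches 99.96% accuracy, so a model at 99.8% is not obviously doing anything useful until you look at the confusion matrix. The sketch below is plain Python with made-up numbers. The 10,000 images, the 4 animal images, and the specific mistakes of the hypothetical model are all assumptions chosen only to reproduce the two percentages mentioned in the discussion; they are not results from the thesis.

```python
def confusion_counts(y_true, y_pred):
    """Count the four confusion-matrix cells for animal (1) vs empty (0)."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1   # animal present, detected
        elif truth == 1 and pred == 0:
            counts["fn"] += 1   # animal present, missed
        elif truth == 0 and pred == 1:
            counts["fp"] += 1   # empty image flagged as animal
        else:
            counts["tn"] += 1   # empty image correctly called empty
    return counts


def accuracy(counts):
    """Fraction of all images classified correctly."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


def recall(counts):
    """Fraction of animal images that were actually detected."""
    positives = counts["tp"] + counts["fn"]
    return counts["tp"] / positives if positives else 0.0


# Hypothetical data set: 10,000 images, 0.04% (4 images) contain an animal.
n_images = 10_000
n_animals = 4
y_true = [1] * n_animals + [0] * (n_images - n_animals)

# Baseline: always answer "empty".
always_empty = [0] * n_images

# Hypothetical model: finds 3 of the 4 animals, flags 19 empty images as animals.
model_pred = [1, 1, 1, 0] + [1] * 19 + [0] * (n_images - n_animals - 19)

baseline = confusion_counts(y_true, always_empty)
model = confusion_counts(y_true, model_pred)

for name, counts in [("always empty", baseline), ("hypothetical model", model)]:
    print(f"{name}: {counts} "
          f"accuracy={accuracy(counts):.4f} recall={recall(counts):.2f}")
```

With these assumed numbers the baseline wins on accuracy but finds no animals at all, while the hypothetical model finds three of the four. That is the reason for looking at the full confusion matrix rather than accuracy alone.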