Our next presentation is from Peter Sadowski, Assistant Professor of Computer Science at the University of Hawaiʻi at Mānoa, who is presenting FishNet: species classification and size regression using deep learning. Hopefully you'll hear a bit more from Peter about deep learning and why it's different, and about the labeled dataset of one million fish. Welcome, Peter. And Ricardo, if you would please play the FishNet video.

Hi, I'm Peter Sadowski, an Assistant Professor of Computer Science at the University of Hawaiʻi at Mānoa, and I'm presenting FishNet: species classification and size regression using artificial intelligence and a dataset of one million fish. Our team is a collaboration between the University of Hawaiʻi, The Nature Conservancy's affiliate in Indonesia, and the Ministry of Marine Affairs and Fisheries in Indonesia. We also have funding from the Walton Foundation.

Fish stock assessment in Indonesia is a challenging task, and we want to use artificial intelligence to help fisheries achieve Marine Stewardship Council certification, which requires monitoring 5% of the snapper fleet of about 10,000 boats. This is challenging because the boats are very diverse, with different types of fishers, and there's an extreme amount of biodiversity spread out over 5,000 kilometers of the Indonesian archipelago. And we want to empower fishers to manage their fishery, their livelihood, and their data.

We have hope of being able to do this because of three key technological developments. Digital cameras have made it cheap and easy to collect photos of what fishers catch, and these photos contain metadata such as time and location that we can use to verify classifications. Computer vision and artificial intelligence have really taken off over the last 10 years; these algorithms are systematic, reproducible, and automated, so we can redo the processing if we need to.
And there have been efforts to collect large databases of images that can be used for training these artificial intelligence systems, which requires big collaborations and crowdsourcing of annotations.

So The Nature Conservancy has worked with Indonesian fishers to create a very large dataset, mainly of snapper and grouper catches. This dataset consists of 200 million digital images collected from over 500,000 vessels. Basically, the team gave fishing captains digital cameras and asked them to take photos of everything they caught after placing the fish on this colored board here on the right. Many of these images were annotated by a team of biologists at the Ministry of Marine Affairs and Fisheries with species and size information for each individual fish, and over one million fish have been annotated by that team. The data was then processed by computer scientists at the University of Hawaiʻi, which includes cleaning up the data and detecting incorrect annotations and such.

Using this dataset, we've built an AI fish stock estimation system that consists of three steps. The first is to segment each image to detect the individual fish, since an image might contain multiple fish. Then each individual fish is classified by species, and its length is estimated.

Step one is segmentation. To do this, we used a big neural network model, Facebook's Detectron2, a region-based CNN. We segmented a number of these images ourselves in order to fine-tune the model to detect fish and the fiducial markers on the board, those little colored boxes, which are used for estimating the length of the fish. The results are very good: for images that contain only one fish, we get over 99% correct segmentation, and for images that contain multiple fish, where the fish might overlap each other, we detect over 93% of individual fish.
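The per-fish detection figures above imply matching each predicted fish to a ground-truth annotation. A common way to do this, shown here as an illustrative sketch rather than the team's actual evaluation code, is greedy one-to-one matching at an intersection-over-union (IoU) threshold:

```python
def box_iou(a, b):
    """Intersection-over-union of two boxes in [x1, y1, x2, y2] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def detection_rate(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Fraction of ground-truth fish matched by some prediction (greedy, one-to-one)."""
    unmatched = list(range(len(pred_boxes)))
    hits = 0
    for gt in gt_boxes:
        best, best_iou = None, iou_thresh
        for i in unmatched:
            iou = box_iou(pred_boxes[i], gt)
            if iou >= best_iou:
                best, best_iou = i, iou
        if best is not None:
            unmatched.remove(best)  # each prediction can match at most one fish
            hits += 1
    return hits / len(gt_boxes)
```

With two overlapping fish in one image, `detection_rate` counts how many annotated fish the segmenter found; averaging this over a test set gives numbers like the 93% quoted above.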
We also measured the intersection over union, basically the overlap between the predicted and annotated segmentations, and we get very good performance on that.

Once the individual fish have been extracted, we can do classification. We use another big neural network for this, which takes in an image of a single fish and outputs probabilities over the 163 different species in the dataset. This is a pre-trained ResNet-50, fine-tuned on 100,000 labeled training images from our data. We get 94% top-1 accuracy, which is very good considering that the dominant class makes up only 30% of the data, and 99% top-5 accuracy, meaning that 99% of the time the correct species appears in the top-five list of the classifier's output.

Step three is length regression. For this, we extract some engineered features, the lengths of the colored boxes and the length of the segmented fish, and plug those into a random forest model. On a held-out test set, we get a root mean squared error of 2.2 centimeters, which is very good. And for fish stock estimation, we don't actually care about the error on individual fish; we care about the distribution of predicted lengths. On the bottom right here, we show that the predicted length distribution agrees with the ground truth as annotated by humans.

Lastly, this has huge potential for transfer learning, because most AI computer vision systems these days use models pre-trained on the ImageNet dataset, which consists of cars, cats, and landscapes. That's really suboptimal for specialized tasks like medical imaging, or even classifying fish species, because the features are just different. But our dataset is about the same size as ImageNet, so we could actually pre-train big image classifiers on it and then use them for other fish species and other fisheries. So we have a lot of hope for being able to do transfer learning from this dataset to other datasets.
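The engineered features feed a random forest in the actual system, but the core geometric idea is simple: the fiducial markers give a pixels-to-centimeters scale. Here is a minimal baseline sketch of that conversion and of the held-out RMSE metric; the 5.0 cm marker size is an assumed placeholder, not the actual board specification:

```python
import math

# Real-world side length of the colored fiducial squares on the board.
# NOTE: 5.0 cm is an assumed placeholder value for illustration only.
MARKER_CM = 5.0

def estimate_length_cm(fish_px, marker_px, marker_cm=MARKER_CM):
    """Convert a fish's pixel length to centimeters via the fiducial marker scale."""
    return fish_px * (marker_cm / marker_px)

def rmse(pred, truth):
    """Root mean squared error over a held-out test set."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))
```

A random forest improves on this single-ratio baseline by combining several such features (multiple markers, fish mask length) and absorbing perspective and placement effects, which is how the system reaches the 2.2 cm RMSE quoted above.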
The other thing we're working on is pushing this technology out into the field to make it more usable. Thank you.

Thanks for sharing. That was eye-opening for many of the listeners, and I think for most of us who aren't very tech savvy, the difference between machine learning and deep learning is something we're only starting to understand. I'd love you to give us some idea of your future vision for this work, going back to when we started with rulers and sitting fish in the right spots. I remember when we started trying to record sizes of abalone in very, very choppy waters in Australia, the only way we could really do it with any efficiency was to put a bar in front of the video cameras and tap the abalone as we swam underwater. That gave us a fixed distance between us and what we were recording, which let us know what the size would be on that plane. I've seen that you've taken a different approach, with a board and segmentation, and I imagine that a few years from today, people will look at both our techniques and think, wow, those guys were in the early stages of development. But I see that with your deep learning, you could potentially let the computer decide whether a species fits a model or not; in other words, build a model around a species rather than having to annotate all the pictures. And potentially in the future we won't need boards, distance-from-camera measures, or double cameras to get these measurements. Can you give us some feeling for where this is going in the research world and what we can look forward to in the future? Thank you.

Yeah, thanks, Kim. So we definitely need some information to extract the size. We were lucky that Peter Mous at The Nature Conservancy thought ahead by having the fishers put the fish on this board, because really, the machine learning algorithms can only do so much, right?
They're not going to magically be able to estimate the distance to the fish and, from that, the size, so having some sort of fiducial marker is definitely important. As for the part at the end where I talked about transfer learning: everyone here doing computer vision is using this idea, where you download a dataset, or a model that's really good at detecting images of cats and landscapes, and then you only need some hundreds or thousands of labeled training examples to classify different fish species. So everybody is using transfer learning; that's what 95% of AI applications are these days. But I just want to stress that you can often do better if you train your own specialized model rather than relying on some downloaded model, and that's what we're excited about with this really big dataset. But yes, there are definitely opportunities for collaboration. The original ImageNet dataset was the biggest user of Mechanical Turk, the crowdsourced annotation system, 10 years ago, and it really drove AI forward just by being a giant dataset of a million labeled images. For fish stock estimation, I think we could do a similar thing: push for one big dataset covering lots of different species, build models on it, and then everybody could use those models for their own specialized projects.

Just before I hand over to Matt for his question, can you give us some insights on the same question I gave to Sherry? Any particular tips for people who are collecting imagery, tips for new players about how to line up your images, or things that threw off some of the models that you could get ahead of on the front end? Thank you.

Yeah, that's a good point. You know, I think having some sort of markers that allow you to estimate the size is important. But beyond that, as long as you have something, I think you're good.
If it's a totally white background, you're kind of stuck.

Yeah, great presentation, Peter. Thanks so much for the assembly and your efforts putting it together, and to the whole team as well for their work. I just have to add to this conundrum of getting accurate sizes and biometrics from two-dimensional images from my own work, which stretches back into the 90s doing photogrammetry, but manually matching points as opposed to using, you know, SIFT or similar feature-matching methods to match points between photographs. From my experience, if I hold up a picture of a whale shark, I know it can be matched, but it's still a picture of a whale shark, not a real whale shark, and there's no way we can tell what size it is unless we either have markers in the background or multiple images of the same object. Mathematically, in my opinion anyway, I don't see it as feasible to train the models without multiple images or some distance-from-camera mathematics in there. But I think your dataset is absolutely critical for building accurately matched, verifiable data for species, and making that accessible globally, I think, is really important work that you've done, if it could be possible. Could it be possible?

Yeah, so the dataset we have is owned by The Nature Conservancy, and we have a data-sharing agreement with them. So we're able to publish our model, but we're not sure if they're willing to share the dataset yet. That's something we see as hugely valuable, so we're definitely interested in doing it. But, you know, there was a lot of effort from those biologists in Indonesia who actually annotated a million fish, so we see that as their data, and we're just using it to build these models.