Introducing the first presentation this afternoon: the speaker to start us off is Zhen Neng Wang of the University of Washington. The presentation centers on the challenges and developments of electronic visual monitoring with AI. Welcome, Zhen Neng.

This is Professor Zhen Neng Wang from the Department of Electrical and Computer Engineering, the University of Washington in Seattle. It is my great pleasure to share some of our ongoing efforts on electronic visual monitoring with AI for the ocean. We have been very lucky to collaborate with many marine scientists from several of NOAA's fisheries science centers, including the Alaska, Northwest, and Southeast centers, on electronic monitoring innovation projects over the past 10 years. The purpose of electronic monitoring of fishing activity is to systematically survey fish species and sizes during catching, whether through onboard monitoring or longline rail catch monitoring, so as to achieve real-time reporting and regulatory compliance. With the collected visual data from onboard cameras, we aim to achieve fish counting, length measurement, and species identification during the catching activity.

Through our work with the Alaska Fisheries Science Center, we also had the opportunity to join an EDF project on chute-based discard monitoring, covering fish counting, length measurement, and species identification for West Coast fishing activity. Here is an example of the onboard camera chute and the observer monitoring the operations. The West Coast chute data set has lots of fish slime on the chute background, as well as camera lens blurring due to water splash. To overcome these challenges, our detection, segmentation, length measurement, tracking, and species identification require special attention. That is, we combine deep learning algorithms with many computer vision techniques, along with the temporal consistency of video object tracking, to overcome errors caused by mis-detection, unreliable segmentation, and tracking drift. Here are two qualitative examples of fish counting and length estimation from the West Coast data.

Next is the species identification performance on the West Coast data set, where only the 17 classes that have more than 15 annotated samples are chosen for the deep learning classifier. With this limited amount of training samples from the West Coast, we need to perform a lot of data augmentation and start with an AI CNN model pre-trained on the Alaska data set, fine-tuned on the West Coast data set. The achieved overall top-1 identification accuracy among the 17 classes is 86%. On the other hand, we can achieve much better identification accuracy on the Alaska chute data set, collected in 2015, 16, and 19, with 98 different classes. Of course, the Alaska chute data set has many more annotated training samples, with very little slime on the chute background and nearly no chute camera blurring. Moreover, our special attention to the long-tail training strategy achieves comparable accuracy even on those minority classes with far fewer training samples. More specifically, it would be even better to directly use the trained identification model from the Alaska data set, which we call the source domain, for identifying fish from the West Coast data set, the target domain, which has fewer and noisier samples, without fine-tuning, so as to avoid the laborious annotation effort.
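[Editor's note: a minimal sketch of the kind of augment-and-fine-tune pipeline described above, assuming a PyTorch setup. The directory layout, checkpoint path, backbone choice, and hyperparameters are illustrative placeholders, not details given in the talk.]

```python
# Sketch: heavy augmentation plus fine-tuning of a pre-trained CNN classifier
# for the 17-class West Coast species set (assumptions: PyTorch/torchvision;
# paths and hyperparameters are hypothetical).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_CLASSES = 17  # West Coast species with >= 15 annotated samples (per the talk)

# Heavy augmentation to compensate for the limited number of training samples.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Hypothetical folder of cropped fish images, one subfolder per species.
train_set = datasets.ImageFolder("west_coast_crops/train", transform=train_tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# The talk starts from a model pre-trained on the larger Alaska chute data set;
# here we stand in with ImageNet weights and a hypothetical saved checkpoint.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
# model.load_state_dict(torch.load("alaska_pretrained.pt"), strict=False)  # hypothetical

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(20):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```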
Unfortunately, when comparing the distribution of the nine overlapping classes, shown in boldface in this table, between these two data sets, we find that even though both are long-tail distributed, they are quite different, and the classes that overlap with the West Coast data set are not the dominant classes in the Alaska data set. This is why the long-tail recognition strategy was used to train on the Alaska data set, that is, the source domain data set. In addition to the distribution difference among those overlapping classes between the two data sets, the so-called label shift between source and target domain, their appearances are also quite different, the so-called domain shift. This appearance difference between the two domains can be further verified by comparing the embedding features extracted by the trained AI CNN identification model, where the high-dimensional embedding features of the Alaska data set can be easily visualized in two dimensions, as shown in the t-SNE plot. Note that species embedding features tend to separate into clusters through the use of metric learning, where data of the same class are forced to be close to each other while different classes are pushed apart. One clustering example, of shortspine thornyhead, is shown in the picture on the right. Note that the embedding features of the West Coast data set based on the same AI CNN model are also illustrated in the t-SNE plot, where the rex sole and spotted ratfish clusters are shown. They have quite different embedding features from those of the Alaska fish of the same kind. If we intend to perform fish identification on the West Coast data set without fine-tuning the trained AI classifier from the Alaska data set, to avoid the laborious annotation effort on the target domain, it is highly desirable to overcome the domain shift due to appearance changes. This requires an unsupervised domain adaptation technique, so that the trained classifier based on the source domain, the Alaska data set, can be easily adapted to classifying the West Coast data set once the extracted features from both domains are better aligned. Thank you very much for your attention.

Thank you very much. That was fascinating, and it reminded me of one of the earlier presentations this morning, which you might not have been awake for, as it may have been past your sleeping time, where they found similar problems when moving their models between boats. What they found, and what others found over the day's presentations, is that just the layout of the vessel and the different light conditions the footage was taken in meant the model needed to be re-tweaked almost between vessels. And this is interesting because I think, if we are to develop a shared model, we also have to work out where these biases are coming from, and work out techniques so that others can learn from your experience of moving between Alaska and the West Coast. Was this a surprise to you, considering you had done the training, and were there any other surprises in your work? Just some idea of the kinds of things that took you by surprise.

Oh, okay. So for most practical AI applications, we have to deal with all different kinds of domain shift, and thus domain adaptation. Domain shift is normally divided into two different types: one is the distribution difference, and the other is the appearance difference. Both of them require very careful techniques when applying an already-trained model to new data.
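[Editor's note: the talk does not name the specific unsupervised domain adaptation method used; the sketch below shows one classical feature-alignment technique, CORAL (Sun et al., 2016), purely as an illustration of aligning source and target embedding statistics without target labels.]

```python
# Illustrative sketch of unsupervised feature alignment between two domains.
# CORAL re-colors source-domain embeddings so their second-order statistics
# match the (unlabeled) target domain's; a classifier trained on the aligned
# source features can then be applied to target features without annotation.
import numpy as np

def coral_align(source_feats: np.ndarray, target_feats: np.ndarray) -> np.ndarray:
    """Align source embeddings (n_s, d) to target embeddings (n_t, d).

    E.g., source = Alaska-domain features, target = West Coast-domain features.
    """
    d = source_feats.shape[1]
    cs = np.cov(source_feats, rowvar=False) + np.eye(d)  # regularized source covariance
    ct = np.cov(target_feats, rowvar=False) + np.eye(d)  # regularized target covariance

    # Whiten source features, then re-color them with the target covariance.
    es, vs = np.linalg.eigh(cs)
    et, vt = np.linalg.eigh(ct)
    whiten = vs @ np.diag(es ** -0.5) @ vs.T   # Cs^(-1/2)
    color = vt @ np.diag(et ** 0.5) @ vt.T     # Ct^(1/2)
    return source_feats @ whiten @ color
```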
And we came up with two different techniques. One is what we call long-tail recognition: no matter what kind of distribution the data has, as long as you are able to make sure all the different fish classes can be correctly recognized and identified with equal accuracy, then whatever new distribution the new data has, we are still able to identify them equally well. The second one is appearance change, which people call domain shift. And we came up with a technique for transferring from one domain to another that has a different kind of appearance. And this was really a big surprise: I did not expect that, with the same model, the Alaska fish compared to the West Coast fish would have such different embedding features. We have to try to shift our model, without labeling the new data set, and still be able to correctly recognize the fish. This kind of technique is called domain adaptation, and we are applying it and seeing some new progress.

It reminds me of the time zones we're in. Some people are going to sleep and some people are waking up, but we are in the same day. Matt, do you have any question for us, please?

Yeah. Thanks so much, Zhen Neng. Something that came up this morning, and also a couple of days ago, was the integration of 3D data combined with visual data. You're obviously making clear breakthroughs using 2D visual data. What's your opinion on the use of 3D data to enrich the learning and training data set?

Okay. If you think about inside the chute: if we know exactly the distance between the camera and the surface, we can infer the 3D information and know the exact length of the fish. That's pretty straightforward in the computer vision field. Really, we do not need a stereo camera or a 3D camera to know the size, to know the length of the fish. So we don't really use that, but we do have another experience. When we are doing this longline rail catching, the camera is mounted outside the boat, looking at the fishing activity as the fish are being pulled up, and because of the catching process the fish deform a lot. So we want to know the exact size of the fish being caught during this pulling up. We find a stereo camera still cannot match the performance of a monocular camera in getting the exact size, because with a monocular camera we have a lot of computer vision techniques: we are able to estimate the deformation and eventually estimate the size exactly, compared to if you use a stereo camera. Yes, with a stereo camera you can get some kind of size, the 2D size of the deformed body, but that deformed body is not the exact length of the fish. So you still need to un-deform it, and that deformation estimation using a monocular camera turns out to be easier to perform. Also, using this monocular CNN AI model technique, we are able to achieve better performance than using three-dimensional information and a three-dimensional camera system. No matter whether you use stereo, or even LiDAR or whatever, in terms of being information-rich they are not as good as a monocular camera. So for species identification and length measurement, we continue to stay with a monocular camera.

Can I just ask one more question, Kim, sorry? No? Is that okay?
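[Editor's note: a minimal sketch of the pinhole-geometry point made above: with a fixed, known camera-to-surface distance, a monocular camera suffices to convert a fish's pixel length into a physical length. The focal length and distance values are illustrative, not from the talk.]

```python
# Pinhole camera model: real size = pixel size * distance / focal length.
# Applies to the chute setting, where the camera-to-surface distance is fixed.

def fish_length_cm(pixel_length: float,
                   focal_length_px: float,
                   camera_to_surface_cm: float) -> float:
    """Convert a measured pixel length to a physical length in centimeters."""
    return pixel_length * camera_to_surface_cm / focal_length_px

# Example: a fish spanning 480 pixels, seen by a camera with a 1400 px focal
# length mounted 100 cm above the chute surface.
print(fish_length_cm(480, 1400, 100))  # ~34.3 cm
```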
We had a presentation yesterday about species with very limited datasets, sharks in particular, and I actually suggested that gathering biometric data, using photogrammetry for example, may be one option for understanding the characteristics that distinguish one species from another. Do you have any solutions from your work with monocular vision that you think might be useful for a species where there are very limited datasets?

Can you say that again? I'm sorry, I didn't really catch your question.

No problem. So we have some species which are obviously really important but have limited datasets, for example some species of sharks. We have limited images, and we were talking yesterday about whether or not three-dimensional data from photogrammetry could enrich the biometric information, relating to species identification and discriminating between species that look very similar. I was just wondering if there was any work you've done using your monocular methods which might be useful in that process, where there is very limited training data.

Okay, I got it. Yes. Number one, to recognize what we call the tail classes of the data, which have only a few training samples: certainly, just for identification purposes, sometimes a two-dimensional monocular camera is good enough. But if you really want to introduce biometric information and also three-dimensional information, these days with monocular cameras we are able to use just one single view, or maybe multiple views, to recover the 3D shape and even the 3D appearance. This is called single-view 3D model creation, and it has been done in many other fields, not really for species or fish kinds of things, but it can be done. And we have shown we are able to use one single view to recover a whole three-dimensional bird model: after one single view, you are able to look at that bird from any viewpoint. But you need to have seen a lot of similar birds from different species; then the AI model is able to perform some generalization, so that even for a bird it has never seen, from one single view you are able to create the three-dimensional model of that bird easily.

Thank you very much, Zhen Neng, and to your team at the University of Washington. Thank you.
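[Editor's note: the long-tail training strategy referenced throughout the talk is not specified in detail; one common approach is a class-balanced loss that re-weights classes by their "effective number" of samples (Cui et al., 2019). The sketch below is an illustration under that assumption; the sample counts are hypothetical.]

```python
# Illustrative class-balanced re-weighting for long-tail recognition,
# so that tail classes with few samples are not drowned out by head classes.
import torch
import torch.nn as nn

def class_balanced_weights(samples_per_class: list, beta: float = 0.999) -> torch.Tensor:
    """Weight each class by the inverse of its effective sample count."""
    counts = torch.tensor(samples_per_class, dtype=torch.float)
    effective = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective
    return weights / weights.sum() * len(samples_per_class)

# Example: one head class with 5000 crops vs. tail classes with 40 and 15.
weights = class_balanced_weights([5000, 40, 15])
criterion = nn.CrossEntropyLoss(weight=weights)  # drop into the training loop
```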