And these Doodleverse tools have kind of been pulled together over the last two years or so. They're really the culmination of many years of wanting to provide software to the geoscience community for carrying out image segmentation, and to try to make it as generally useful as we can for a broad set of objectives. So yeah, really happy to be here. Back to you, Evan. Awesome. So the outline for today is that we want to introduce the whole project, which we refer to as the Doodleverse, as well as image segmentation in general, to motivate why we need these sets of tools and how they can be useful. Then we'll give an intro to the tool Doodler that we'll be talking about today. We'll give a live demo — probably Dan will do the live demo. We'll then pass it off to everybody to doodle all by yourself so you can get a feel for it. It's a tool that needs some practice to use correctly and to understand some of the edge cases, because it's a human-in-the-loop machine learning tool. It's a live tool — it's running live at the time that you're doodling. For everybody who's doodling, Dan subsetted a bunch of images and we provided them on the Google Drive; they're all named with your last names on the Google Drive. We'd love for you to doodle those, and then we'll all collectively do a data release of all the images and labels — we'll talk about that. We'll live-demo some of the tools that are useful after you've done doodling, which is the utils folder. Then we'll pass it off to all of you to use the utils, so you understand how to get a lot of information out. And then we'll wrap up, do some Q&A, and plan for next week's class, which is more of the deep learning image segmentation pipeline, which we call Gym. And I just want to say, feel free to drop questions — we'll try to monitor the chat as much as possible.
While all of you are doodling, that's sort of free time to just have Q&A and do some discussion, if there are any deeper questions you have about anything we're introducing. That's a perfect time to ask questions and get our opinion about what's going on. Like I said, the best way to learn about doodling is to actually doodle. So the two hours we have here are more like a clinic than a webinar where we give you a lecture for two hours — I mean, thank goodness, right, two hours of us talking. Most of the time is people doodling on their own machines and talking about it. And I just want to say that, you know, instead of buying shoes or checking emails or doing other stuff on Zoom calls, I often find myself doodling. Instead of passing it off to other people, I tend to do it myself, because I find it a relaxing activity. Those are two things that Dan and I like to trot out whenever we talk about doodling: we're effectively coloring here, and it's a great relaxation practice when you're on probably more stressful Zoom calls than this one. This one's not stressful. So I'll pass it off to Dan.

All right, thanks Evan. This slide is really just a transition slide, I guess — we're going to be talking about Doodler for the most part today, but first, next slide, I'm going to talk a little bit about the Doodleverse: what it is, and a real brief summary of how all the different tools mesh together. The Doodleverse is named after Doodler, which was the first tool that we made in this series. Of the other two tools, Gym is also basically ready, and Zoo is another tool that is integral to the Doodleverse that we're still working on, but it exists up on the Doodleverse GitHub site, which is down there in the bottom right-hand corner of the screen and also in that QR code.
The motivation was to provide a set of tools that work together for the purposes of image segmentation, which is pixel-wise classification. But more specifically, it tries to address two different things. One is an accessible pipeline for geoscientists like us. It's been tried and tested on imagery that we care about, such as imagery from remote sensing platforms like satellites, and from aerial imagery and things like that, but also gridded data — sort of non-specific gridded data. That could be things like sonar data or geophysical data, or it could even be model outputs. And I know that CSDMS is obviously a group of numerical modelers who are concerned with simulating earth surface processes. So we think that even if you're not particularly engaged with photographic images, you might still get benefit from these tools, because they might facilitate segmentation of your model outputs, and it might lead into new, unexplored areas for you. So Doodler serves two purposes. It's for segmentation of any arbitrary image — "image" being that all-encompassing term for any gridded data set on a regular grid. You might just have one image that you want to segment, or maybe a couple of dozen images that you can quickly segment, and Doodler is designed to take a lot of the work out of that enterprise — rather than an alternative workflow, which might be digitizing polygons, for example. There are a few different downsides to digitizing polygons. One is that it's very time-consuming. Another is that it's quite difficult to get the edges correct — you have to actually line up the edges of your polygons. And another disadvantage is that it's often quite difficult to make a call about what you see at the transition between two different thematic classes that are apparent in the image.
So Doodler is designed to do that heavy lifting, but it's also designed for making a more objective call as to what the pixels represent at those boundaries. It's for generic image segmentation purposes — it's been tested on imagery all the way from cell phone images up to satellite images and beyond. It's purposefully designed to be generically useful for a range of image types, but it could also be repurposed and slightly modified, if you wanted, for specific applications that you may have. And because it's open source code, you are obviously encouraged to fork it and modify it as you see fit. If you like any of the tools that we're going to present, and you see benefit to the general community in the modifications that you've made, then we encourage you to contribute that back — we'll talk a little bit about that later on. The main purpose of Doodler, though, is to be used in that same capacity but for training deep learning image segmentation models — for generating larger sets of labeled images on which you can subsequently train a machine learning algorithm. That algorithm could be from deep learning, or it could be from classical machine learning, or it could be from somewhere else. What we have implemented in the Gym software is the application of fully convolutional deep learning models, specifically U-Nets, which have been proven in a number of fields, including within the geosciences, to be a generally useful and powerful image segmentation implementation. So that's what Gym is: Gym is an end-to-end workflow that helps you ingest images and labels — which could come from Doodler or could come from elsewhere — and get those images and labels into a format that can subsequently be used by machine learning frameworks, specifically TensorFlow, which is a very popular deep learning framework.
It then provides utilities for experimentation, for training those models, and for applying those models. As Evan said, that's what we're going to do in the second class. Finally, we're not going to talk too much about Zoo, but I just wanted to give it a brief introduction here — we probably won't talk about it again until next week. Zoo is basically designed to be a collaborative enterprise, where we — Evan and I and our colleagues — are going to contribute models that do generic things for image segmentation. We're working mostly on photographic imagery, and we're concentrating mostly on models that do generic things like find water, find vegetation, find sediment — things that might be generally useful for the geoscience community but that we have a specific end goal for. It's also becoming a collection of example Jupyter notebooks that demonstrate how to take a model that has been trained using Gym and then apply it in different contexts. Once you have a trained model, you get a sense of how accurate it is by pointing it at a validation set or a test set, but you don't always know exactly what the optimal way to implement that model is, and that takes a little bit of work as well. That's what Zoo is basically designed to do. So those three tools comprise the Doodleverse, and then there's a collection of other tools we're building — other applications that basically use these tools, that we use for our own research.
Next slide, please. So image segmentation is the classification of image pixels — pixel-wise classification — using machine learning. It's not just supervised machine learning; it can be unsupervised too. Supervised is just when you provide the machine examples; unsupervised is where you allow the machine to discover the classes for itself. Supervised machine learning is very much the state of the art, especially within the geosciences — you'll see many, many more applications of supervised machine learning, because it's generally much more powerful. What I'm showing on the screen here is an excellent example paper that came out that used an algorithm very similar to a U-Net for pixel-wise classification of shellfish reefs. It's a generally useful tool because it allows you to classify at the smallest scale, i.e. one pixel. If you're looking at gridded model outputs, for example, that would be your smallest grid size; if you're looking at rectified imagery, that's the smallest spatial scale at which you're able to make any inference. And so you can use it for many different things: looking at the occurrence of a thing, how much area that thing occupies, what distribution of things exists, and what their spatial proximities are. It's also useful as a generic tool for data cleaning, too — you can identify certain things and remove them from consideration in a subsequent process. That could be noise, or it could just be removing a specific feature in the image that you don't want a subsequent process to look at or analyze. Next slide. Here are just a couple of examples of image segmentation that's been used in projects I've been involved with — as I said, this has been developed simultaneously across many research objectives that myself and Evan have had.
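To make one of those uses concrete — measuring how much area each class occupies once you have a pixel-wise label image — here's a minimal numpy sketch. The tiny label array and class codes are invented for illustration; this is not Doodler output:

```python
import numpy as np

# Hypothetical 2-D label image: each pixel holds an integer class code
# (say 0 = water, 1 = sand, 2 = vegetation -- made-up codes for illustration).
label = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [0, 1, 2, 2],
    [1, 1, 2, 2],
])

# Count pixels per class and convert counts to area fractions.
classes, counts = np.unique(label, return_counts=True)
fractions = counts / label.size
for c, f in zip(classes, fractions):
    print(f"class {c}: {f:.2%} of the scene")
```

For rectified imagery, multiplying each fraction by the total ground area of the scene (or each count by the ground area of one pixel) turns these into physical areas.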
These are just two examples from static cameras where you have a particular feature of interest. On the left there, we're interested in just finding water in flooded scenarios from webcams. On the right is a more specific situation, where we're interested in enumerating the different classes that you can see there, for the purposes of coastal processes and tracking landforms in the coastal zone. You can imagine that this could be extended to any number of classes that you care about, from the simple two-class scenario on the left, to the five-class scenario you can see on the right, and more. And here's an example of how I personally use image segmentation for the purposes of data pre-processing and data cleaning. This is a typical series of images collected by my group at the USGS, with the USGS PlaneCam. These images are then used within structure from motion — digital photogrammetry — to reconstruct the 3D elevation that you can see on the bottom. It's really a model that's used to identify water and remove it from consideration, because that allows much, much greater computational efficiency for the subsequent process — we don't have to manually clip out the areas that we're not interested in, and it also greatly reduces the amount of computational resources required to find the solution.

In line with all of the examples that Dan has shown, there's been a collective effort that Dan led called Coast Train, which is a 1.2-billion-pixel data set of human-labeled images that we created using Doodler. Many of the people who were involved in the labeling are on the call.
Sharon and Venus and JC might be on — or not — but here's an example. We labeled a bunch of imagery that comes from many different sensors, both aerial images and satellite images, and this just shows you the range of different imagery that we used, the range of classes that we used, and the utility of Doodler in a bunch of different scenarios. The point was not only to try to doodle in a bunch of different places, to assemble a team, and to understand inter-rater agreement; it was also to create a data set that could be used as a hot start for a model — if somebody else is interested in the same classes but a different set of imagery, then these labels could be used as training data. So Doodler is very useful for making these sets of training data, across a bunch of different sensors, that can be reused. This is a data release and a preprint — it's in revision right now, but the preprint is available on EarthArXiv, and the data set is live through USGS ScienceBase. So you can see, here's Landsat, and USGS orthophotos — we've done a bunch of different ones. These all have several classes; the data set ranges from four-class images to probably 10 or 11-class images. It's been used in a wide range of situations, so the examples that Dan showed in the previous slides are not the only examples that we have — we've used it in a ton of different deployments. In general, all of these pieces tend to fit together, just as Dan said, in what we tend to call the image segmentation design pattern: Doodler is used for labeling the data; then there's the piece in the middle, which is Gym, that takes the images and labels and processes them; and then there's the piece that we're hoping to keep building out, which is Zoo.
So this is mostly rehashing what Dan went over on the first slide, but just to demonstrate that there are clear flows in this pipeline to get all of these tools, and the data from these tools, to be interoperable with each other. It was designed, and we're building it from the ground up, to make sure that all of these things are interoperable and work smoothly with each other. Any data creation that any of you do in the future can, fortunately, be added to this bucket on the right — labeled data. Coast Train is just an example, and all of the use cases of labeled imagery that Dan showed are just examples. All of this labeled data is useful because it can — for other people or for yourself — be fed back into the beginning of the pipeline. Anybody could use the Coast Train data, or any of the labeled data that you all create, to build better models in the future, and sort of bootstrap together models that work, that are generalizable and work in other situations, or do research on which situations they work in, like transfer learning and things like that. So this is just a demonstration of all the tools really working together. Doodler is the focus of this class. The instructions that we provided in email sort of lay out how to get started with Doodler, but in general, when you're thinking of mounting a campaign where you're doodling images — whether you're going to do it yourself or muster a team to do it — the first step is to look at our GitHub repository. The Doodleverse headquarters is really the main port of call for understanding all the activities that we're doing, and all of the different projects that we've talked about live in their own separate repositories in that organization. You can see this is the Doodler one, which is Doodleverse/dash_doodler. We'd love it, if you have issues, for you to drop a new issue in for us, and we can triage it from there.
If you want to do a show-and-tell, or just let us know what's going on, the Discussions tab is open. We'd love to talk in there and see what everybody is doing, so that we can help foster a community of users and give you support — because if you do run into problems, we've probably run into them before, and a lot of the questions or choice points you'll face, we've had to face before. The repository itself is quite active; Dan and I try to respond to issues as soon as they come in. If you're going to start using Doodler, the first step is just to download the code and make the conda environment. I've run it without conda on some machines and it works fine, but the conda environment is the surefire way to make it work. We know we have a current M1 Mac issue that we're trying to figure out how to move forward with, but we haven't had any issues on Linux or Windows machines, or on previous non-Apple-silicon Macs. Then you bring your images — once you've downloaded the code, you select the images that you want to label, and just as Dan said, this can be generic. We're using "images" in a generic sense here: they can be model output, they can be anything that's gridded and sort of looks like an array. You provide a class list — so you say you want to do water, sand, people, or whatever appears in the images that you want to label. And Doodler leverages your browser to operate when you run it. There's a little bit of a hiccup with some browsers that we can describe later, but we've found that it works: Firefox is recommended, but it works on Chrome and Edge, and I think Safari. Then you start Doodler and interactively label images.
Once you've done a set of images, you can stop, if you just have a few images to label, or, as Dan said, you can continue to label for some sort of follow-on machine learning. In general, the pathway out of Doodler is to use the scripts in the utils folder to prep the output for either follow-on machine learning or follow-on analysis. Those give you the masks, the labels, or the overlays — like the nice examples that we showed earlier. So that's sort of the step-wise workflow for how to technically use Doodler. But just as Dan said, I want to say again: we'd love people to join the project, make issues, help us out, and be a part of the team. Now, I promise we'll get on to the live demo in a minute, but this is just a pre-prepared video of Doodler in action, working on a fairly simple image. As you can see, this is a beach, and the two classes that you can see in the top right there are water and land. Those are the classes that you write in that text file: in the repository, if you've cloned it, you'll see that there's a little file in there called classes.txt. You just write in the classes that you're interested in, and they appear as buttons when you launch the tool. It's running in a web browser, as Evan said. It's generally designed to run on localhost, which is the address that you can see at the top there. It's also set up, if you want to put in the effort, so you can get it to run on a server and then serve that out to people. Evan and I have both done that — it's a little hiccupy if you don't know what you're doing with web serving, but it does work. We'd encourage you to do that, and to share experiences of how you did it, as well as to help us with it if you're interested.
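As a rough sketch of that classes file — assuming it's simply one class name per line, which is how the demo describes it (the exact parsing Doodler does may differ):

```python
from pathlib import Path

# Write a classes.txt like the one in the repo: one class name per line.
classes = ["water", "land"]  # the two classes from the demo video
Path("classes.txt").write_text("\n".join(classes) + "\n")

# Read it back the way a tool might, turning each non-empty line
# into a button label.
labels = [ln.strip()
          for ln in Path("classes.txt").read_text().splitlines()
          if ln.strip()]
print(labels)  # ['water', 'land']
```

Editing that one file is the whole configuration step for classes — the buttons in the interface come straight from it.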
It's written in native Python, but it's using a tool called Plotly — actually Dash, from Plotly — which is kind of a port to JavaScript that allows it to be this interactive web viewer. It's got two tabs, as you can see. One tab really just tries to maximize the amount of screen real estate that your image occupies. You have a bank of controls on the side that we will go into during the demo. You can provide doodles of every class that is visible to you in the image, ideally in every part of the scene — you see in this example I'm quickly going across the entire image, because I want to make sure that most of the image is covered. You generally set the settings on the right-hand side for a whole group of images; you don't necessarily have to modify those settings for every individual image. And when you're done with an image, you switch to the next tab, select a new file, switch back again, and off you go. So that's basically the gist of it. It will run in any browser, but we strongly recommend Firefox. There's a paper that we provided a link to, and these are two images from that paper. If you're really interested in going into the guts of what's going on here, there are two ways to do it: one is just to look into the code, and the other is to refer to the code alongside the paper. The paper is designed to introduce this paradigm of labeling images, which is called human-in-the-loop. There are a number of really good labeling tools out there, and they mostly follow the line of polygons — there are simple polygonal workflows that exist, and then there are other toolboxes that take a little bit of the work out of polygonal workflows. But as I said, one advantage of this approach is that it deals with transitions quite nicely: you don't have to label right up to the edge.
And that's especially relevant for earth science imagery, I think, where you have this mixed-pixel situation — where you have a pixel that could be several things — and both gradual and abrupt transitions. In imagery of natural environments, that's especially important. The human-in-the-loop aspect of this is that you have a human providing the annotations to a machine learning algorithm that's running behind the scenes. As soon as you stop and hit that compute button, a bunch of features get extracted from the image — which is really just passing convolution kernels of different sizes over the image to extract certain features. Those then go into a multi-layer perceptron, which is a simple neural network that provides the probability of each class at each pixel. We researched two different ways to go from there. One is taking the maximum probability of each class at each pixel. The alternative workflow, which is a little bit more sophisticated, is called the conditional random field, or CRF. Basically, that's another machine learning model, completely independent from the first one: you provide it those initial probabilities, and it constructs a graphical model that evaluates the likelihood of each pixel given the features it extracts from the image. So it has the ability to undo some of the labels that were provided by the MLP, the multi-layer perceptron. You'll notice, as you gain experience with Doodler, that oftentimes those two solutions are very much the same, but in some specific situations the CRF actually has a little bit more agency. That agency is really governed by the values of the parameters that we're going to go over in the demo. If you have it launched already, you'll see that there are two.
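To make that compute step more concrete — sparse doodles supply labeled pixels, multi-scale convolution features are extracted, and a classifier then predicts every pixel — here's a toy, numpy-only sketch. The box-blur feature bank and the nearest-class-mean classifier are simple stand-ins for Doodler's actual filter bank, MLP, and CRF, and the image and doodle coordinates are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy grayscale "image": dark water on the left, bright sand on the right.
img = np.hstack([rng.normal(0.2, 0.05, (32, 16)),
                 rng.normal(0.8, 0.05, (32, 16))])

def box_blur(a, k):
    """Mean filter with a (2k+1) square window, via padded shift-and-sum."""
    p = np.pad(a, k, mode="edge")
    out = np.zeros_like(a)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out += p[k + dy:k + dy + a.shape[0], k + dx:k + dx + a.shape[1]]
    return out / (2 * k + 1) ** 2

# Multi-scale features per pixel: raw intensity plus two blur scales,
# standing in for a bank of convolution kernels of different sizes.
feats = np.stack([img, box_blur(img, 1), box_blur(img, 3)], axis=-1)

# Sparse "doodles": a few labeled pixels per class (0 = water, 1 = sand).
doodles = {0: [(5, 3), (20, 8)], 1: [(5, 28), (20, 25)]}

# Toy classifier: nearest class mean in feature space (a stand-in for
# the MLP; Doodler's real classifier outputs per-class probabilities).
means = {c: np.mean([feats[r, col] for r, col in px], axis=0)
         for c, px in doodles.items()}
dists = np.stack([np.linalg.norm(feats - m, axis=-1)
                  for m in means.values()], axis=-1)
pred = dists.argmin(axis=-1)

print(pred[:, :16].mean(), pred[:, 16:].mean())  # left mostly 0, right mostly 1
```

In Doodler, the MLP's per-pixel probabilities can then be refined by the CRF, which weighs each pixel's label against the image features and its neighbours; the flow of information is the same as in this sketch.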
There are two banks of settings in the software. One is called post-processing settings, and that's really referring to the CRF; the other is the classifier settings, and that's referring to the MLP. I'll talk about each one, but the two things to know are that the CRF has this blur factor and this model independence factor, and you can tweak those a little to get slightly different results. I think it's important to communicate at this juncture — and I want to say it before I forget — that Doodler is designed for relatively rapid labeling. Its primary goal is really generating training datasets for the subsequent process, which, as we've said, is training deep learning models. Through experience, we've kind of arrived at a decision, and an understanding, that the labels you provide the deep learning algorithm don't necessarily have to be 100% perfect. They can have a little bit of error in them, because deep learning algorithms are inherently probabilistic, and they cut through complexity really well — that's why they're so popular. There's an understanding among machine learning researchers — well, there are two understandings. One is garbage in, garbage out, which is real: if your labels are garbage, then you will definitely get garbage out. However, you can flex that a little, and there's a little bit of leeway in terms of the amount of error that you might see. So Doodler will generally give you good results, but you may see that a few pixels are wrong. You don't need to worry about that too much; it really does depend on your intended purpose. And we offer two things there. One is that we've got post-processing tools that we're developing that will help you refine and smooth over some of that pixel-level noise that you might see.
And then the other thing is that if you're using this for the purposes of training these Gym models, then we have determined through experience and experimentation that if you give the models sufficient imagery, they generally cut through that complexity — to the point that I've got models I'm working on that are actually much better than I could do myself. The outputs of those models are generally better than I can actually label myself. So I wanted to say that before I move on. Next slide. Another important thing I want to communicate — and we will get on to the demo, I promise — is that it's generally designed to work on a commodity laptop or a commodity desktop. We know that geoscientists generally tend to have really souped-up computers with lots and lots of RAM, and this will obviously work better the more RAM you have. But the general advice, if you're in a low-RAM or low-CPU environment, is to use smaller images, because they consume much less memory and you're going to get your outputs in a much more reasonable time. The other motivation for using generally smaller images is that you don't have to zoom in and out and pan around, which actually takes some time. We are developing different tools — and two people on this call I want to give a plug to, Sharon Fitzpatrick and Venus, are young computer scientists and software developers who are helping us basically build on Doodler and Gym. We're building applications, and one of those is another tool that will hopefully come out next year, which is going to enable us to doodle on maps and pan around. But for now, Doodler is really designed for smaller imagery.
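Working with smaller images usually means either chopping a large scene into chips, or downsizing it and later upsizing the labels with nearest-neighbour replication (which keeps labels as clean integers). Here's a minimal numpy sketch of both; the sizes and arrays are invented for illustration:

```python
import numpy as np

def tile_image(img, tile):
    """Chop a 2-D array into non-overlapping tile x tile chips,
    dropping any ragged edge (a real workflow might pad instead)."""
    H, W = img.shape[:2]
    return [img[r:r + tile, c:c + tile]
            for r in range(0, H - tile + 1, tile)
            for c in range(0, W - tile + 1, tile)]

big = np.arange(1024 * 768).reshape(1024, 768)  # stand-in for a large scene
chips = tile_image(big, 256)
print(len(chips), chips[0].shape)  # 12 chips, each 256 x 256

# Upsizing a small label image 2x with nearest-neighbour replication:
# each label pixel becomes a 2x2 block, with no interpolated in-between codes.
small_label = np.array([[0, 1],
                        [2, 3]])
up = np.kron(small_label, np.ones((2, 2), dtype=small_label.dtype))
print(up)
```

The reason for nearest-neighbour (rather than, say, bilinear) resizing of labels is that averaging class codes would invent meaningless intermediate classes at boundaries.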
Doodler does provide tools, in the toolbar in the top right-hand corner, that allow you to zoom in and out, but it's not necessarily the most efficient workflow. So if you do have large images, we advise you to either chop them up or, if you can, downsize them and then subsequently upsize the labels. This is all part of the same message, really: if you're using these labels subsequently for training segmentation models, then some amount of interpolation as a result of upscaling and downscaling is actually tolerable. If it's intolerable for you to change the size or the aspect ratio of your pixels, then we would advise that you chop your images up. For example, I use a lot of very large satellite scenes, and I just chop them up into smaller bits that are much more manageable and much quicker to doodle without zooming in and out. Next slide. Another thing — not necessarily something we encourage you to do, but something to be aware of if you're working on a larger project where you're trying to generate lots of imagery for the purposes of training these models — is that it's quite useful to know how difficult your task is. We did a couple of different experiments with this early on, and we found that even among so-called experts who are well used to looking at this kind of imagery — it's hard to tell from the graphic, but these are aerial images of coastal environments, where you have deep water, breaking waves, very shallow water, and then land — it was actually fairly hard to make the call between shallow and deep water, for example. These are subjective things.
So going through an exercise of comparing the outputs from multiple people on the same images is very worthwhile, for a couple of different reasons. One is that it allows you to decide on a final set of classes that are doable by a group of people. It then further allows you to quantify what the expected error might be — that's really your irreducible error, something you can't do much about. You can change the settings in the software, and you can adopt different strategies — you can talk among yourselves about the best strategy for doodling in groups. We talk about this both in the Doodler paper and in the Coast Train paper that was referred to before. Just bear in mind that you may have some disagreement; there are ways to get around that disagreement, and they're discussed in the paper. These figures are actually from — or very similar to figures in — the Doodler paper, which talks about how you might quantify that. In addition to checking agreement, there's also the question of how many classes to use, and I just want to tell you: we don't have a solution for this. Every time we doodle, it's an argument, or a heated discussion, to settle on the number of classes. The utilities in Doodler provide ways to merge classes, so if you start with, say, 10 classes but eventually think you want to make a model with just three or four, that's possible — but it's not possible to go the other direction. That's always something to keep in mind, and there's some push and pull associated with it: getting better results with a smaller class list versus hoping to make a richer model. We're just putting this up here to tell you that this is a persistent issue — one we can give some guidance on, but we don't have an answer to hold you to. So I think we should just move on to the live demo, right? This is the — yeah.
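The class-merging idea — collapsing a fine class list down to a coarser one, which is possible, while going the other direction is not — can be sketched with a simple lookup table. The label array and mapping below are made up for illustration, and this is not the actual utils code:

```python
import numpy as np

# Hypothetical 9-class label image to be collapsed to 3 classes.
label = np.array([[0, 1, 2],
                  [3, 4, 5],
                  [6, 7, 8]])

merge = {0: 0, 1: 0, 2: 0,   # e.g. water-like classes      -> 0
         3: 1, 4: 1, 5: 1,   # e.g. sediment-like classes   -> 1
         6: 2, 7: 2, 8: 2}   # e.g. vegetation-like classes -> 2

# Vectorised remap via a lookup table indexed by old class code.
lut = np.array([merge[c] for c in range(len(merge))])
merged = lut[label]
print(merged)  # every old code replaced by its merged code
```

Because several old codes collapse onto one new code, the original distinctions can't be recovered from `merged` — which is exactly why merging is a one-way operation and why it's safer to start with more classes than you think you'll need.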
Yeah, this is the live demo slide. So I'll stop sharing my screen, and Dan just needs to share the right screen. Okay, I'm going to give a demo here. I'm using a little tablet, because I do a lot of doodling, as you can probably tell. It works fairly well with a mouse too, and obviously most of you are probably using a mouse rather than a tablet and a stylus; this thing costs $300 and it works pretty well. So first, I've got a couple of images here. I can't remember exactly — I think it's 11 classes — but only a few of those classes are actually present in each scene. This first image I'm going to show you is a situation where you just have one class. All you do in this situation is give it one example doodle, and it will complete the scene. So this is the simplest example you can think of: this is just water. What it's done there is complete the scene, and now I'm on the second tab. This user ID up here is purely optional; it's just for your own accounting purposes, and it will default to some string that you can ignore. It's only relevant if you need to keep track of who's doodling what, which is especially important if you're in a multi-doodler context. I've just got a few images loaded here; if you had many more images, you'd see a little scroll bar, and you can go through and select one. The way it works is that you've got a folder of images that you've already populated — you put them inside your assets folder — and it just steps through each of those, presenting the images that you have left to doodle.
What's going on in the background is a little timer thread that's basically checking which images have been done and which remain to be done. The images that have been done get copied over to your labeled folder, and the program logic is really quite simple: it reads the two lists of folders and then only presents you the images that are left to do. So here I'm in a situation with a little bit more of a complicated scene. I'm just going to go ahead and doodle it, and I'll try to talk as much as I can while I do it. I tend to get the easy stuff done first, so I can readily identify water. You'll see that water here is distinguished from white water, because I'm a coastal scientist and I want to know where the waves are breaking. Water is really a superclass here, and I've got the subclass white water in there as well. You'll see that I'm being very quick and rough about this. I generally tend to be quick because, as I said before, quality is important, but quantity is just as important. So I'm already done with this section here. Now I'm just going to step over. Everyone adopts their own style of doodling, and I'm different every day, so I've got different styles too. This is mostly sediment here, and your intended application really does dictate how detailed you need to be. That's also sediment over here, and I'm trying to go up to the boundaries as much as I can. Let's have a look here: terrestrial vegetation. For the purposes of this class I've got, called terrestrial vegetation, I'm going to just label these big bushes up here.
And maybe this little bush here, and then, as other background, I'm going to basically capture all of this partially vegetated, kind of natural terrain. Again, this is just for demonstration purposes. Here we go. I'm going to be fairly careful about what I'm doing in between these smaller classes; this is really where the benefit of the stylus comes in, where you can really give the algorithm the information it might need. That's probably more than enough. Then I've got development as a class, which is going to be my roads and my buildings. I'm shaking a bit, but hopefully this is going to work out pretty well. I'm going to just scribble on these little buildings. You might take a little more time here, but for the purposes of demonstration, I think this should be sufficient. Here's an interesting situation where I don't really know what to call it: this is a bunch of cars that are on the beach. So I'm going to use the unusual class here, because this isn't necessarily something that I care about in my model, but I don't want to include it as something else that I might care about. Dan, can I cut in here and ask questions from the chat? One is: can you demonstrate how to delete a doodle — make a mistake and then delete it? Yeah, why don't I do that. So here I've slipped — whoops. Now I'm going to select that doodle — it's hard with a stylus — and then here it says erase active shape. So there you go; that's how you do that. The other question is about doodling outside the box. Doodling outside the box is strongly encouraged; it's all part of the way that this is rapidly done. So yeah, that's totally fine. The algorithm won't care about that.
And I just want to mention two things here. If you're watching this, you might think, well, you missed a spot, or you mislabeled a certain area, and that's a very normal thing. That's why we talk about adjudicating different doodles, to check whether you're correctly labeling a place as, say, bare ground, and whether other people are as well. We've also talked about actually getting on a call and doodling all at the same time, so you have a very clear understanding of what these classes mean in the real world, in the actual imagery that you're looking at. It turns out that most people do not have the same understanding, even of these current class lists; even if you are a coastal scientist, you might have a completely different feeling. Absolutely. So you'll see that that took a little bit of time, because I had lots of doodles in lots of different classes across the entire scene. You'll definitely notice that the more classes you have, the longer it will take. The computer I'm running here doesn't have a big amount of RAM or anything like that, and it's not particularly fast, so that's a reasonable example of how long you might expect it to take. Okay. Does anyone else want to see anything, or have any questions about other aspects of this, before we start doodling ourselves? I'm happy to do another one. Dan, can you just show the directory structure, to show exactly where the classes text file sits in the top level of the directory? Absolutely. Okay, so this is what you're looking at. The main thing you need to know is that your images go in the assets folder; these are the images that I put together for the purposes of this demonstration. Use JPEGs — only JPEGs are supported.
But it's usually quite easy to convert your images to JPEGs. As I think Evan already mentioned, we're planning on doing a sprint in the spring, and one of the things I want to do is add support for every common image type, so PNGs and TIFFs and things like that. This is your labeled folder, so these are the ones that have already been done. It also keeps track of the images that you've done in this text file here, which is basically just a list of those images, and it will get overwritten every time you launch a new session. Your results go in the results folder; it's just date-time stamped with the start of the session — I actually started this session three days ago and I haven't turned off my computer since then. There are a couple of different outputs that you see. These PNGs are really just for your reference; they aren't necessarily the things that you use. What you end up using are these .npz files. An .npz is just a compressed zip archive that you can open using any utility on your operating system that you would ordinarily use to open zip folders — that could be something like 7-Zip, or on Ubuntu I can just open it up. Inside here you've got a list of arrays. This is your image. This is the final label that it made, in grayscale; these are just integers, where zero is the first class and your last class is whatever number is associated with it. There are also the settings that you used — oh, I didn't really go into the settings; maybe whilst we're doodling together we can go through them, I'm sure you'll have questions about that. The difference between "image" and "original image" is that Doodler takes the image you provide and creates what's called a standardized image.
So it subtracts the mean and divides by the standard deviation. That's just a generic thing that is useful when applying machine learning models to any type of data: you want something that's distributed in a standardized way. Then there are the doodles that you made. One of the really cool things about Doodler — something we should really hammer home — is that it's fully reproducible: you can go all the way back to the original scribbles that you made, and you can reconstruct them again in a different algorithm. There would be nothing stopping you from taking this file and running these doodles through a different algorithm of your own invention. It also means that we can go all the way back and spot errors and things like that, just by looking at the original doodles. And it provides an interesting opportunity for nerds like Evan and me who are interested in this whole human-computer interaction question — why do people make the decisions that they make? That's especially relevant where you have complex landforms where you might see model error that you're trying to trace down. So we're trying to make this in such a way that we can go all the way back to the root cause of any problems that we might identify. "Color doodles" is just the color scale that we've got built in; that's really just for the purposes of visualization. And then the classes that you used: you don't necessarily want to keep track of all of these classes.txt files that you make — you might have different projects that you work on, and it's difficult to keep track — so everything's basically included in here. If you go through several iterations of doodling — I wonder if I have an example... yeah, here's one — like this image:
I wasn't happy the first time around. So I hit compute segmentation, looked at it, wasn't happy, did some more doodling, hit it again, and then I was happy. So it keeps track of everything you did. There's a little bit of duplication here, but it keeps track of the settings that made the first model and the settings that made the second one, so there's plenty of opportunity to look at what settings might be optimal over a whole image set — sorry, I said class set. And there's also plenty of opportunity to revert: if you wanted to, you can revert back to the original label. They're just prepended with a zero, and if you have multiple rounds of this you'll see 00 or 000; every time you add a new one, it just prepends another zero. So that's the .npz file. If you're not a heavy user of Python and you're not comfortable interacting with this file format, I can tell you that there are .npy and .npz readers for Matlab, C++, C#, or whatever you use — even Fortran, I think — so this is a fairly portable format. And in the simplest situation, we provide, in the utilities folder, a script to reconstitute the images and get overlays and labels, which we'll discuss after we've done this doodling. Yeah, there are a number of utilities here that I'm excited to show you; they're really the way that you interface your Doodler outputs with the rest of whatever you want to do, be it Gym or be it something else. So we'll spend a little time later on that. And as I said before, this is your classes.txt file; it's just a simple text file, nothing more to say there — you just list the classes in the order that you want them to appear.
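Since the .npz output described above is just a compressed NumPy archive, you can inspect it from Python in a couple of lines. The array names below ("label", "doodles") are illustrative stand-ins, not guaranteed Doodler key names — list `data.files` on a real output to see what it actually contains:

```python
import io
import numpy as np

# Build a stand-in .npz in memory so this sketch is self-contained;
# with a real Doodler result you'd pass the file path to np.load.
buf = io.BytesIO()
np.savez_compressed(buf,
                    label=np.array([[0, 1], [1, 2]], dtype=np.uint8),
                    doodles=np.zeros((2, 2), dtype=np.uint8))
buf.seek(0)

data = np.load(buf)
names = sorted(data.files)   # list every array stored in the archive
label = data["label"]        # integer class IDs, 0 .. n_classes - 1
```

The same archive can be opened as a plain zip of .npy files from Matlab, C++, or other languages, as mentioned above.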
I tend to put the most common things at the top and the least common things at the bottom, and I think that's good practice. Yeah, that's pretty much it. At the moment we've got this environment file — this is your conda environment. If you read the README, you'll see that this is just a fallback for situations where the recipe we provide in the README doesn't work. There were a couple of issues with dependencies going out of date, so we had to provide that fallback in case anyone was running an older version of Python, specifically Python 3.6 — 3.6 is pretty old now, so we don't recommend using it. The current conda environment is Python 3.8. As for the code, most of it is in this app.py, and then there's this other pip-installable set of functions that we can talk about later on as well. Well, it's almost the hour, so I think we should get on with doodling, Evan. That sounds great. I'm sure there are many questions that you have, so I'm going to hand over to Evan now so he can introduce the collaborative doodling part, and then, as we get to grips with the Doodler tool together, Evan and I are happy to take any questions you might have. Yeah, I saw a lot of great questions that I didn't get to ask, and I think this would be a great time to do that — just live, or Dan and I can scroll through the chat, or if people just want to ask their questions, I think it would be really valuable for everyone to hear what they are, because there are some great ones. But this is what we intend for everybody to do: get Doodler working on your machine.
If Doodler's working on your machine, put the images — either your own images or the ones we have on the Google Drive — in the assets folder. You can modify the classes.txt file to what you want if you have your own images, or I think there's a classes.txt in every one of the Google Drive folders. Fire up Doodler and doodle some images, and you'll see how it goes. There are some great questions in the chat, and if you have any others, we'd be happy to answer them; I think we can do this for 15 or 20 minutes and it'll work out great. We're very interested, if you doodle the images that are in the Google Drive, in having you put everything from your results folder back onto the Google Drive. Hopefully we'll be able to contact everybody about making a full Zenodo data release with all the contributors as co-authors. So that's the benefit of doing those; you can do them now or finish them up later, but that's where I'll leave it, and I'll stop sharing my screen. It'll be fun to actually get an outcome out of this beyond just understanding, and you can always opt out of the Zenodo release if you don't want to be part of it. I guess at this point everyone's just getting to launching the program, so maybe I'll take a couple of minutes to answer some of the questions that I've already seen. Yeah, I think that's what we should do for sure. Okay, so the first question I want to address asks: do you have tools to automatically split images into smaller sub-images? There's one basic tool in the utility scripts that allows you to resize an image. That's something you could also do with any image manipulation tool that you might have, like GIMP or Photoshop or whatever; ours is just a programmatic way to do it, very simple.
I personally have lots of different codes that I'm willing to contribute — to the Zoo repository, I think, or maybe as extra utilities to the Gym repository. Those specifically would work with geospatial images; I tend to work with things like GeoTIFFs. So I tend to use GDAL, which you may have heard of — the Geospatial Data Abstraction Library. That has a tool called gdal_retile, and I use that basically exclusively to chop my large images up into smaller ones. And as Evan said, he uses ImageMagick, which is another cross-platform tool — the Swiss Army knife of image manipulation. You can do all sorts of stuff with it, including chopping up images into smaller bits, with overlap and things like that. Julie's question I think we covered with the erasing; it's a little clunky to do the erasing, but it generally works fairly well. Then there's a question about storing the softmax. I don't think we store the softmax in the .npz — actually, yes, that's a great idea, Josh. We don't store the softmax, and we should. If anyone doesn't know what the softmax is, it's just the raw outputs from the machine learning model: the probabilities. They're not true probabilities — they're conditional probabilities — but yes, we should do that. Good call. Evan asks: is the classification pixel-based or object-oriented? It's definitely pixel-based. So if you wanted to turn this into an object-oriented classification, you would undergo at least one further step. Typically, you would take the pixels and convert them into vectors — objects, which are essentially polygons of the contiguous regions that you map. That can be done in Matlab, it can be done in Python; there are various tools that allow you to do that. If you're a Python user, I would recommend using scikit-image.
It has a function called regionprops — actually, it's the same function name in Matlab, I believe. So that's what I'd recommend, unless anyone knows of something better. The other step that usually goes along with that is a little bit of smoothing: objects are typically smooth, not defined at the pixel level, so one thing you may want to do as an intermediate step is pass a spatial filter — a low-pass filter — over the image, so you remove some of the noise and things like that. For unclassified areas we tend to provide a separate class, and you'll see there were a couple of different ones in my demo: one was "no data" and the other was "unknown". They're useful as probability sinks if you don't know what you're looking at. The other thing I should mention is that if your images contain black pixels, i.e. (0, 0, 0), then that will get propagated into the label. If there are completely black pixels, you obviously have no idea what they are, so we decided that the best thing to do, instead of interpolating over that space, which could be large, is to just propagate those zeros through to the end result; that's just a null class. How am I doing, Evan? I think you're doing great. Okay. That's an interesting question from Nolan: how close do you need to be to a transition? But it's also, how accurate do you need to be when you're labeling? Yeah, exactly — I think there are actually two questions embedded in one there. In terms of the accuracy, there's a setting on the side that controls how much the algorithm trusts the doodles, which I think is pegged at 90%. That's one thing you can modify.
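The pixel-to-object conversion described above — smooth first, then find contiguous regions — can be sketched with SciPy. scikit-image's regionprops gives richer per-object measurements; this is just the same idea using scipy.ndimage, with a toy label image of my own invention:

```python
import numpy as np
from scipy import ndimage

# Toy label image: class 1 pixels form two separate blobs.
lab = np.zeros((8, 8), dtype=np.uint8)
lab[1:4, 1:4] = 1
lab[5:7, 5:8] = 1

# A median (low-pass) filter is one way to knock out single-pixel
# speckle before vectorizing, so it doesn't become tiny "objects".
smoothed = ndimage.median_filter(lab, size=3)

# Connected-component labeling turns the pixel mask for one class
# into discrete objects; each object could then become a polygon.
objects, n_objects = ndimage.label(lab == 1)
sizes = ndimage.sum(np.ones_like(lab), objects,
                    index=range(1, n_objects + 1))
```

Here the two blobs come out as two objects of 9 and 6 pixels; in a real workflow each connected region would then be vectorized into a polygon.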
It really does depend on the application: long skinny things like roads are just generally difficult for Doodler. However, if you're careful enough, I've seen many examples where it's pretty good. So it's part of the art of Doodler, I would say — developing an intuition for how close, how many, and how accurate your doodles need to be. Maybe when we get around to it, we could write a book on it. I find that also to be application-specific: the intuitions I develop on oblique aerial images don't often transfer to, say, Landsat scenes. So, unfortunately, there's a little bit of trial and error to understand how the coarseness of the pixels is going to interplay with the output of that two-step machine learning process. But let us know if that doesn't answer your question. The next question refers to the number of pixel points to be picked, so I think what you're asking is how much you need to doodle the scene — is that right? If so, it depends on how complex the scene is. If you only have two classes, for example, like that first movie that I showed, which was just a beach and the water, you can generally get away with doing much less. But if you have a much more complex scene where there's lots of stuff going on, like the example that I did, then generally you need to do a few more. Again, it's part of the art of this, and you do need to develop a little bit of an intuition for it. That's the non-deterministic part of doodling. When we say it's fully reproducible, we mean it's deterministic in the sense that we can take the doodles and recreate the scene from the doodles that you've made, because it's the same algorithm.
However, if you were to slightly change the doodles, you'll see that you get slightly different results, so it's really part of the process. Hopefully that answers your question. I'm always surprised at how little I need to doodle, and how, if I focus on the edges between classes — two lines drawn next to each other in different classes — it does a superb job, reducing the number of doodles I need and the time I need to spend. So I'll just say that. Also, we've noticed in the past that you shouldn't add classes iteratively: if there are four classes in an image and you've only drawn two of them, you should not hit compute segmentation just to see how you're doing. You should doodle all of the classes that you see in an image, and then compute the segmentation — do all of them at once, instead of adding classes one by one and checking as you go. That's just another common issue. Yeah, a lot of these answers that we're giving are really dependent on our personal experience. I've gone back and forth over some of the decisions that we've encapsulated in the software, just because they work; as I said before, they work generally across the board. But if you find that you could make tweaks for your specific needs, then we encourage you to do so — that's basically the whole point of providing this open source tool, and we'd be happy to provide guidance too, if you open an issue or a discussion. I saw — I think Sharon answered that question, so maybe I'll skip that one. Julie, about doodling in the white space around the image: did I answer that question already? Yeah. I just wanted to mention that for the user ID, I tend to use initials or something longer. It doesn't really matter what you put in; just ask people, especially if you're organizing a large team, to be consistent.
Then you can easily use a command-line tool to relabel or anonymize all the labels. I'll say that when Dan and I have done doodle releases — releases of images and labels — in the past, we tend to include everybody whose labeled images are used as an author, if they want to be. That's the best practice if you get people to do your labeling: include them as authors on these data releases on Zenodo. But we have also both tried to anonymize the labels, so that if there's an error somebody made, it can't be blamed on a named person going forward into the future. So it's helpful to have people be very consistent with their IDs, so that you can replace them with "labeler 1" or "labeler 2" in the future. Yeah, and I would add that if you're going to embark upon a multi-labeler exercise like this, it would be informative to look at the upstream paper that's up on arXiv, because we talk about how we decided what imagery to use, how many people, and how to do it. The Doodler paper itself has a data release on Zenodo that is anonymized in the same way as CoastTrain. You can download either of those. Evan's got a couple of different Zenodo releases along similar lines; he published one this week, actually, that's over 1,000 or 1,200 images, contributed to by 10 or more people — yeah, 12. So we have some experience with that; just contact us if you want some advice. Julie — yes, please just upload everything from your results folder back to the Google Drive folder. I think I'm going to address the next question. So, yes, the general pipeline is to subset large images into smaller ones, doodle them, and then train a model on that scale of imagery.
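As a sketch of the anonymization step mentioned above — the filename pattern and the helper function here are hypothetical, since how your results embed the user ID depends on your setup — mapping labeler IDs to "labeler1", "labeler2", ... might look like:

```python
def anonymize(filenames, ids):
    """Replace each known labeler ID in a list of result filenames with
    an anonymous 'labelerN' tag. Assumes IDs appear verbatim in names,
    which is why consistent IDs across a team matter."""
    mapping = {uid: f"labeler{i + 1}" for i, uid in enumerate(sorted(ids))}
    out = []
    for name in filenames:
        for uid, anon in mapping.items():
            name = name.replace(uid, anon)
        out.append(name)
    return out

# Hypothetical result files tagged with two doodlers' initials.
files = ["scene01_DB_label.npz", "scene01_EG_label.npz"]
anon = anonymize(files, ["DB", "EG"])
```

The same mapping would need to be applied inside any metadata that records the user ID, not just to filenames.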
We're working on workflows for a downstream application called Seg2Map. This is something that Sharon Fitzpatrick and Venus are helping us with. It's not quite ready yet, but it's going to provide a set of codes for stitching your labels back together: if you're in a situation like this where you have a larger satellite image, your Gym model is going to predict on the smaller subsets, and then a subsequent script will stitch them back together. I've worked those workflows out, and I just need to find some time to get them up onto Seg2Map, so watch this space, and give me a nod if you don't think I'm being quick enough. The other question, though, is an interesting one: we can only visualize three bands, as you know, and we tend to use RGB images. But it's not limited to that; you could use false-color images if you find them more informative. We've gone through some exercises with that — obviously, if you use something like NDWI or NDVI, it's going to bring out the water and the vegetation, respectively. So yeah, we encourage doing that. The doodles that you make, and the label that gets generated from them, would apply to all of the coincident bands that you have. So you could train your Gym model on the three bands that you happened to use for the purposes of doodling, or, if you have more bands, you could use all of the bands, or select the bands that you like. In the paper that describes the Gym toolbox, which Evan and I wrote earlier this year and which we'll get to next week, you can use any number of bands; we used five bands for the purposes of satellite image segmentation because we wanted to include the near-infrared and the shortwave infrared.
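To illustrate the false-color idea mentioned above — the function name is mine, and a real workflow would read actual satellite bands — NDWI is just simple band math that can make water stand out before doodling:

```python
import numpy as np

def ndwi(green, nir, eps=1e-9):
    """Normalized Difference Water Index: (G - NIR) / (G + NIR).
    Water pixels come out strongly positive, vegetation negative,
    which can make the water/land boundary easier to doodle."""
    g = np.asarray(green, dtype=np.float64)
    n = np.asarray(nir, dtype=np.float64)
    return (g - n) / (g + n + eps)   # eps guards against divide-by-zero

# Two toy pixels: water-like (high green, low NIR) and vegetation-like.
green = np.array([[0.40, 0.10]])
nir = np.array([[0.10, 0.40]])
idx = ndwi(green, nir)
```

An NDWI (or NDVI) layer could stand in for one channel of the three-band image you doodle on, while the resulting label still applies to every coincident band.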
The doodles that you make could also be repurposed: you could take the doodles and run them through the machine learning again, applied to a different band combination if you wanted to, though you're always limited to three bands for the doodling itself. How many more minutes should we — I mean, it's quarter after; we'll probably need about 15 minutes for the utilities, but yeah, I think that's fine, whenever you're ready. I think there are a few more questions; I'm answering some direct messages, but a few more came in. Cool. You don't have to restart for each new image; I recommend restarting just when you're done. Pick off a manageable amount of images to doodle — that's usually set by how much time you have — and each session will just pick up from the last one. One of the things that you've probably noticed, if you've stopped Doodler, is that you have to use Ctrl-C, because I'm not a software developer, I'm a geoscientist, and I couldn't figure out how to put an exit button in there. If you can figure it out, please contribute that back. But you just use Ctrl-C to stop the program, and you can relaunch it again and you're basically where you ended up, as long as you haven't moved the images out of the assets and labeled folders. Yeah, they're sticking around. So, that's a question — I'm trying to catch up here — that Sharon already answered: on occasion you'll see that the doodles stick around. That's not something that we've been able to figure out. It's not necessarily something in the code that we wrote; it's more in the way that the code is translated into what the browser understands, so it's maybe more of a JavaScript or caching problem. That's essentially and primarily why we recommend Firefox: it tends to not be so much of a problem if you're using Firefox.
But yeah, it's something that we need to try to get a handle on, and again, please contribute answers if you have them. So, do you geocode images? No, but the Seg2Map thing that I previously talked about will be able to stitch them back together; if you have geospatial inputs, you can have geospatial outputs. We have done Doodler with time-lapse photos — yeah, it's worked quite well. And if you get a grip on the functions that Doodler is using under the hood, you can actually take your doodles and apply them to a time series of images: you can do the one scene, and then apply the doodles to a whole bunch of subsequent scenes, as long as you think they would remain relevant. That's something that we've done, and we actually used to have a utility for it, but I got rid of it because I didn't think it was generally useful — maybe I'll put it back in. On the permission error: I think it's solved. Was that solved? Okay, great. You can do a browser refresh; that basically solves anything, just a browser refresh. That's the same thing if your doodles are sticking around: if you're just stuck there, do a browser refresh. One of the annoying things about Doodler is that, because of this timer that goes on in the background — I think it's because of the timer — it can sometimes take a minute for it to refresh that list of images. So you select an image, then you go back to the other tab quickly and notice that either the doodles have persisted or the image hasn't changed. I just tend to select it twice, but Evan says that he selects it and then waits a couple of seconds. So it's just one of the quirks of using research software rather than canned software.
If all else fails, do a browser refresh; the downside with that, really, is that you have to put your initials back in, and your ID if that's what you're using. Are there examples of doodles that can help? Oh yeah, that's a good call. I did throw some examples up on the Google Drive, Evan, but I think that was an internal folder that I kept for you and me. What should we do? I can move them over — I'll move them over right now. Yeah, we've got some examples, Lindsay, that we can show you and you can have a look at them. The other thing I want to mention is that you can ping us with questions. GitHub would be the best place to do that, even well after the class; Dan and I receive a lot of emails about these tools that we're very happy to answer. If it's a specific software issue, that can go in the Issues tab; if it's a discussion about something, that can go in Discussions. You just make a new one and feel free to tag us. We're very happy to talk about it; obviously it's a tool we're excited about, so it gets us jazzed when people are using it. I also want to make a little plug for this other thing that we're making, called Holodoodler. Basically, when we decided to make Doodler, I was pretty green when it came to writing web applications like this, so I decided to use Plotly and Dash because it had a small learning curve for me. We've identified these little issues with it, and it might just be easier to use a different application API to actually create the GUI. So we had Anaconda have a look at this, and they came up with a little demo of an alternative tool that uses their HoloViews platform — you can just Google HoloViews.
It's something like Plotly but slightly different; it's mostly used for making interactive dashboards and data viewers and things like that. So we have this other thing called Holodoodler. It's working, but it's still under a bit of development, and a lot of the issues that we're looking at here — the doodles persisting, for example — don't seem to be so much of a problem in Holodoodler. So watch this space, and you can always watch the repository on GitHub if you want to get updates, or use it and contribute to it. We're pursuing that route as well, seeing whether we can make a slightly better version of Doodler using a different platform. You'll also notice, if you've been to the Doodleverse GitHub page, that there's a whole bunch of repositories on there outside of the ones that we've spoken about. You're familiar now with dash_doodler, and we've introduced Gym and Zoo, but you'll see that Holodoodler is on there as well. And then there's Seg2Map, which I've talked about, which right now doesn't have any code in it, but we're getting there. You'll also see there's this thing called the doodler engine, and I just want to spend a second talking about that while we're wrapping up the doodling. The doodler engine is basically the code that actually does the machine learning. Because we've got these two downstream applications, dash_doodler and Holodoodler, the way we've set it up is that we have this pip-installable set of Python code that is really just a set of functions that both tools use. They're just for turning your doodles into labels.
So this is where you would need to go if you wanted to modify the way Doodler behaves — the actual machine learning part — or if you wanted to start helping us develop these tools; if you're interested in that, you should become familiar with it, because this is the code that Holodoodler uses too. How do you save your image? It gets saved automatically. You should notice that when you hit "compute/show segmentation" and it completes the segmentation, if you uncheck that box, it just appears in your folder, and you know it's there because a new npz file and a new PNG file have appeared. So, a participant says: you mentioned that the tools help you preprocess for TensorFlow. Yes, we're going to talk a lot more about that next week. That's basically the first step in Gym: taking folders of images and labels and converting them into a format that TensorFlow understands. We're going to talk quite a lot about that next week, so if you're signed up, maybe it's best we talk about it then, if that's okay. Oh, here you are. Yeah, I'll miss it next week, but I'll catch the recording — I have something on that I can't move. I was curious what your experiences were with that, so maybe I'll leave the question for then. Well, I can answer briefly. Basically, it's not too much of a challenge to get the images and labels into tensors, which is what TensorFlow needs. However, there are a couple of different formats that TensorFlow understands. The canonical one is called TFRecords, and we initially adopted TFRecords, but they're quite difficult to work with.
They're quite difficult to access if you're trying to debug stuff, or if you're trying to use them in multiple contexts, or if you're trying to transfer them over to another machine learning library like scikit-learn or PyTorch or whatever. It's not nice using TFRecords, in my opinion, so we adopted basically the same format for Gym as we adopted for Doodler, which is this npz format, and it works really well. It works well in that you can set up a TensorFlow pipeline that efficiently moves your data onto a GPU, or multiple GPUs, for the purposes of model training, but it's also this really accessible format where you can write a really simple Python script to access that data. So that's basically how we do it. What I want to say is that it's set up so that if you're monitoring your CPU utilization with something like top, and your GPU utilization with nvidia-smi, your GPU utilization should be super duper high. The pipeline very quickly feeds data to the GPU for TensorFlow to operate on, which was challenging to get working but is really great — your GPU stays very busy. I'm just learning about this now, and you guys are already 15,000 steps ahead, so I'm gauging how your experience was with passing the data from this stage into that one. And I don't know — it seems like in the CSDMS community there could be a huge step forward if we also start doing this with model imagery. Some of the work I'm seeing used TensorFlow for that too, so it's more of a forward-looking idea than something that I concretely do now. It would be amazing to see more models access the GPU. That would be awesome. Okay.
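Since the accessibility of the npz format is the whole point here, a minimal sketch of writing and reading one with plain NumPy may help. Note the key names "image" and "label" are illustrative stand-ins, not necessarily the exact keys Doodler writes:

```python
import numpy as np

# Illustrative only: "image" and "label" are assumed key names,
# not necessarily the exact keys Doodler/Gym use.
image = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
label = np.random.randint(0, 11, (64, 64), dtype=np.uint8)

# Write a compressed .npz archive, the same container format discussed above
np.savez_compressed("example_result.npz", image=image, label=label)

# Reading it back needs nothing but NumPy
with np.load("example_result.npz") as data:
    print(sorted(data.files))      # ['image', 'label']
    restored = data["label"]

assert np.array_equal(restored, label)
```

This is what makes the format so easy to debug or to hand off to scikit-learn or PyTorch: the arrays come back as ordinary NumPy arrays with no library-specific deserialization.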
Yeah — check out this new glacier model called IGM, the Instructed Glacier Model; it does exactly that. And I know there's an ocean/climate model that uses not NumPy but JAX's numpy, which can operate super fast on a GPU — it just requires an extra letter, jnp instead of np. Right, and TensorFlow is similar in that way: there's a tf in front of everything. Anyway, I'll make sure that I catch the recording; I know I can't make it next week, but I super appreciate that you guys have laid this out for us today. Awesome, thank you. Yeah, it's a new dawn when it comes to using machine learning in numerical modeling, and I'm super excited about what I'm seeing. But we'll talk much more about that next week, and you can catch the recording. So — yeah, we should probably switch to the utils now. Thanks, Sharon, for doing such a great job of answering questions in the chat. Should we switch then? Yes. How's everyone feeling about their doodles? Real briefly, you can tell us on video or in the chat. "It's fun." Awesome. As we've said a couple of times, for those having technical issues, hopefully you can get through them and catch up later on, but for now we're going to switch. On the "permission denied" error: did you install your Python with administrative privileges or something like that? Maybe that's something to look into. For now I think we're going to switch to the utils so we can actually finish on time. I have it called up, if you want me to at least run gen images. Sure, yeah, go for it. Actually, no, don't — there were a couple of things I wanted to do: I want to run gen images and then show another tool. Yeah, go for it, if you don't mind. No, not at all.
I'll take that on; I just have to find it on my desktop — that's the hardest part. As we said earlier, we have a whole bunch of different utilities here. We're not going to be able to go through all of them, but we want to go through the ones that are most common, and especially the ones that are going to interface this class with the next class, for those of you who are joining that one. So here's my window, and I've got another window open here. Once you have a set of results like I have here, there are a couple of different utilities that you might find useful. The first one is gen images and labels, and all it's going to do is take the npz files that you just made. You just point it — it brings up this dialog and you point it to this folder, this top-level folder (you wouldn't necessarily have this one here; that was just me doing some prep). Okay, there. What it's doing right now is stepping through each one of my npz files and generating a set of images, a set of labels, and a set of overlays. That's really the first port of call: it's a way to see how good your labels were, in some sense, and it's also what you need if you're going to carry on to the subsequent process. It's an easy way to get the information out in formats that you might understand. These are actually just the images that I already labeled, but the difference now is that I've got an ID appended to them, so this is useful in the multi-labeler context, if you're trying to keep track of multiple people doodling the same imagery.
And then these are the labels. Don't be scared that they appear black: it's because I only have 11 classes, so my numbers are distributed between zero and 10, while this is a full-range 8-bit image format that accepts zero to 255. It's not a photograph with a high dynamic range; it's a low dynamic range image. The information is in there, and the way you know it's in there is that you can see these overlay images. So that was the one that was all water, the one I did in the demo. And here are a couple that I did earlier — that's kind of me rushing through a few of these — and you can see that it's worked. Those are still usable: even if you notice that sometimes the developed class blurs between houses, that's totally okay; downstream tasks will perform super well even with imagery like this. This points to what I was saying earlier: I could have done a slightly better job here, but I did this as a one-shot, and even so I don't care, because of the model that I'm going to train. Since I can make these labels 10 times quicker than in any other software, I can make 10 times as many labels for training my deep learning model. Next week we'll actually train a model, and you will see live that we have noisy data. This is the data we're planning on using: we'll take all of the images that you do today, where we can, and make a model out of them, using everyone's different opinions about what things look like. I'm fairly confident that we'll get a really good model at the end of it — we'll put our money where our mouths are next week.
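The "black-looking labels" point is easy to verify for yourself: the label image stores small class integers, so stretching them to the 0–255 range makes the classes visible in an ordinary viewer. A minimal sketch, using a synthetic stand-in for a real label image:

```python
import numpy as np

# Synthetic stand-in for a label image with 11 classes (values 0..10)
label = (np.arange(32 * 32, dtype=np.uint16) % 11).reshape(32, 32)

# Stretch 0..10 up to 0..255 so the classes become visible
denom = max(int(label.max()), 1)   # guard against an all-zero label image
stretched = (label.astype(float) / denom * 255).astype(np.uint8)

print(label.max(), stretched.max())   # 10 255
```

The overlay images that the utility writes do this kind of visualization for you, but the same one-liner is handy whenever a label PNG "looks black."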
Let's go to the doodles. Depending on the pen width you adopted and the image size, these can be a little hard to see; we're using small images, and just the default pen size, which is three. But you can verify that you did things correctly that way. Okay, so that was the first utility. The second utility I wanted to show is called label generation. On occasion you might see errors in the data, so this is a way to troubleshoot, but it's also a way to pull out the intermediate outputs. As you remember, there's a two-step process in Doodler: there's the MLP and then there's the CRF, and they produce slightly different results. What gets stored is the CRF version, but you may decide that the MLP version is actually better, and this is a way for you to get at that. It's also a way for you to do any troubleshooting if you need to. What this basically does is step through each of these npz files and regenerate all of the things that happened when you saw the blue box on your screen. It takes the doodles again, runs them through the algorithms again, and generates all of those outputs. The first thing it shows is the difference between the raw image and the standardized image — sometimes you won't notice a difference. It's not in order here. These are the doodles, so once again that's another way for you to verify how you doodled. But then, more importantly, these are the features that it generated. You'll see that one of the classifier settings is the number of scales. That's something we brushed over, but the number of scales is basically just the number of spatial scales over which it extracts features.
It's only two by default, because generally you only need two scales — a really fine scale and a really large scale — but you can add more intermediate scales if you think that will help your classification. It often does help, but be warned that it slows things down as well. This is just a way for you to look at the features it's actually using; these are the features that get passed to the MLP. If you refer to the paper, you'll get a more complete understanding of this, but the spatial component uses a relative coordinate system, because as geoscientists we're usually in a situation where spatial context applies: things that are closer together are more similar to one another. So it uses the relative location, it uses the image intensity and a Gaussian blur of that intensity, it uses edges, and other things extracted from the image that we don't need to go into — but that's what it does, and that's what you can see. Once it's done, it pushes everything into this folder. So if I click on one of these MLP and CRF outputs, you'll notice they're almost identical; only a couple of pixels are actually different. But it will create both the MLP and the CRF versions of your labels, so you can use either going forward. And then the last utility I wanted to show is one that I think is pretty cool, though I don't know how generally useful it is — it doesn't necessarily tie in with the subsequent work — but I wanted to show it anyway because it's a different way you could use these outputs. As we said before, because we're classifying at the smallest possible scale...
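Since the MLP and CRF outputs are both per-pixel label images of the same shape, checking how much they differ is simple elementwise arithmetic. A sketch, with two synthetic label arrays standing in for the real outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
mlp_label = rng.integers(0, 4, (64, 64))   # fake "MLP" output, 4 classes
mlp_label[10:12, 10:12] = 0
crf_label = mlp_label.copy()
crf_label[10:12, 10:12] = 3                # pretend the CRF changed a 2x2 patch

# Count and report the disagreeing pixels
n_diff = int((mlp_label != crf_label).sum())
frac = n_diff / mlp_label.size
print(f"{n_diff} pixels differ ({frac:.2%})")   # 4 pixels differ (0.10%)
```

A quick comparison like this is a useful sanity check when deciding whether the MLP or CRF version of a label is the one you want to keep.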
We have an opportunity to chop the images up into small tiles that are examples of each of the classes that you've doodled. So this "make classified tiles from images and labels" tool is a way to generate small tiles of a certain size, governed by the T parameter. I'm going to make 96-by-96-pixel tiles that each contain a certain class, where the proportion of that class is more than 90% of the tile. So what this is going to do — I'll go in here and provide it the label files that I just generated using the gen images script, then provide it the images as well, and then the classes.txt. It's going to step through each of these, and inside my labels folder it will output a folder called tile 96. These are just example tiles of the classes that I labeled — this is my terrestrial vegetation, for example — and you could then use them in a subsequent application. So this is for interfacing with a completely different type of classification problem, where you're trying to classify a whole image: the situation where you want to classify not at the pixel level but at the tile level. There are other utilities that we won't have time to go into, like the batch resizing one. And if you look at the Coast Train paper example, we have the ability to remap classes. This is for the situation where, say, you had a dozen classes but you want to make super classes out of them — you want to take all your classes and condense them into a smaller set of classes.
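The tiling idea can be sketched in a few lines: slide a T-by-T window over a label image and keep tiles where a single class covers more than 90% of the pixels. The real utility does more — cropping the image, writing tiles to class-named folders — so this is just the core selection logic, run on a synthetic label image:

```python
import numpy as np

T = 96            # tile size, as in the demo
THRESH = 0.9      # keep tiles where one class exceeds 90% coverage

# Synthetic label image: left portion class 0, right portion class 2
label = np.zeros((288, 288), dtype=np.uint8)
label[:, 150:] = 2

kept = []
for r in range(0, label.shape[0] - T + 1, T):
    for c in range(0, label.shape[1] - T + 1, T):
        tile = label[r:r + T, c:c + T]
        counts = np.bincount(tile.ravel())
        dominant = int(counts.argmax())
        if counts[dominant] / tile.size > THRESH:
            kept.append((r, c, dominant))

# The middle column of tiles is mixed, so only 6 of the 9 tiles survive
print(len(kept))   # 6
```

The 90% threshold is what makes each kept tile a clean single-class example suitable for a whole-image (rather than per-pixel) classifier.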
In the example that I showed, I could combine water and whitewater together into a generic water class, and I could combine my three vegetation classes together. There's an example of how you might do that. And that's all we have time for on the utilities front, because we have to start wrapping things up, but if you have any questions about it, we're happy to answer them. Let me go back to the agenda. So the next thing — I think that's really it: people can continue doodling, use the utils, or ask us any other questions about what's going on. That's what the rest of the time is for: continuing to get familiar with some of these tools. And then I'll pull up the last slide that we wanted to leave you with. Is that okay, Dan? Yeah, that's fine. Maybe I brushed over that a little quickly; I think we had a bit of time blocked out for everyone to use the utilities and ask questions. The main one that you're going to use is gen images and labels, so that's the one I recommend playing with — it creates the folders that you would then use. If you can't wait to use Gym and you want to play with it over the next week, feel free to clone it and go over it. There will be instructions, like we did this week, but maybe we'll provide those a little earlier — just after this class, or at the very latest at the beginning of next week — so you'll have a couple of days to go through them. And if you get familiar with the docs, and if you think you're going to become a serious user of these tools, it would be a good idea to come prepared with questions.
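The class-remapping idea mentioned above — condensing water and whitewater into one generic water class, say — boils down to a lookup table applied to the label integers. A sketch, where the class numbers and names are made up for illustration:

```python
import numpy as np

# Hypothetical original classes -> condensed "super classes"
remap = {0: 0,   # other      -> other
         1: 1,   # water      -> water
         2: 1,   # whitewater -> water
         3: 2,   # veg A      -> vegetation
         4: 2,   # veg B      -> vegetation
         5: 2}   # veg C      -> vegetation

# Build a lookup table so the remap is a single vectorized indexing step
lut = np.array([remap[k] for k in sorted(remap)], dtype=np.uint8)

label = np.array([[0, 1, 2],
                  [3, 4, 5]], dtype=np.uint8)
super_label = lut[label]
print(super_label.tolist())   # [[0, 1, 1], [2, 2, 2]]
```

Because the label images are just integer arrays, this remap can be applied to every label in a folder before training, which is essentially what the remapping utility automates.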
And you'll have an opportunity to use those utils for real — maybe you'll do a few more images, maybe you'll do a few of your own images — and figure out some of the kinks, and you'll be able to communicate some of your experiences back to us: what worked, what didn't work, how difficult it was to arrive at classes, how you got the problematic imagery sorted out, those kinds of things. These are all things that we're familiar with, but we would always, always value your feedback on any of this stuff. Yeah, this slide is just the wrap-up and the plan for the next class, which is: everybody continuing to doodle if they want to try out Doodler, and putting results on the Google Drive folder. And you can opt in or opt out of the Zenodo data release that we'll plan and prepare. Next week we'll go over Gym, hopefully using enough of these images from Doodler — or, if not, we have zillions of pre-doodled images to make models from. We'd love for everybody to participate and help and join the effort on GitHub. And we'll say again that we're definitely planning on having a sprint in the spring, perhaps in March or sometime like that, where we'll devote that month to working on a few of the issues that have been identified — issues that you already know about. That goes for both Doodler and Gym, because they work together. If you want to be involved in that, you can get involved by contacting us on GitHub. We don't do this full time, obviously; we have our research that we do too. So if you're frustrated with the pace of progress, you can always hop in and help us. And that's it from my end. I can open the chat again.
Yes — some of you have probably figured this out already: some people had images with four classes and some had others. If you want more imagery, we've got more imagery for you to do — the more you doodle, the better the model next week is going to be. So if you feel motivated to doodle more, we're happy to give you more images; I can generate images all day. Some of the satellite scenes are a little boring — I find them quite boring because they're very samey, like you might do the same thing over and over — so we've got other sites that we can give you beyond the main imagery that we provided. For those who are working on satellite imagery: some of the participants had images with roughly one-meter pixel size, which are a bit more complicated — that's the one example that I did — and the rest of the class has satellite imagery. If you want to do the other images, or if you have your own images, this is the time to do that, and we'll stick around for the next 15 minutes in case you have more questions.