We are the NEUBIAS Academy. We started in April 2020, with the start of the pandemic, organizing webinars, and as you can see there is quite a large crowd of people involved in this. So far we have had 25 webinars, and you can find the recordings online on our YouTube channel, so please check this out. You can also see that we have collected quite many registrations and many views, so we are very happy about the response to all the webinars. We are now in the big data series; we have already had three very exciting and interesting talks about big data, and today is the fourth part of the series, where we will be talking about visualization, sharing and annotation in the cloud. The series has mainly been organized by Romain Guiet at EPFL, Marion Louveaux at the Institut Pasteur and Julien Colombelli at IRB Barcelona. Also with us today is Dr. Rocco D'Antuono, who is also very kindly hosting the session on Zoom.

For today's session we have several speakers. The first part of the session will be about CATMAID, presented by Tom Kazimiers. He is now a consultant in software solutions, but he is still very involved in all the projects he was scientifically involved in before, and we are happy that he is talking about CATMAID today. Together with him are Chris Barnes and Albert Cardona, who will also answer all your questions, which is really great. The second part is about MoBIE, and here we have Christian Tischer, Constantin Pape and Kimberly Meechan; all three will talk and all three will answer your questions. And from the NEUBIAS Academy side it is Rocco D'Antuono again (thanks for hosting the webinar), Julien Colombelli and me. With this, I give the floor to Tom; I should stop sharing. And again, please ask many questions in the Q&A window.

Hello everybody, and welcome to the first talk of today's NEUBIAS Academy session on data in the cloud. There has been a last-minute change today: Albert won't be presenting with me, but he will be there to answer questions, so it will be only me talking about CATMAID, which is a collaborative web-based tool for online image browsing and image annotation, with specialized tools for neuron reconstruction, proofreading and circuit analysis. So there is a clear neuroscience focus in this tool, but it has its broader uses as well. As an open-source project it has been used around the world in quite a few different labs, with many different data sets and data modalities, but even after a ten-year history of the software, serial section transmission electron microscopy is still the most prominent data modality used.

A little bit of background: Albert is a group leader for experimental and comparative connectomics at the MRC LMB in Cambridge in the UK, and he has been working in and contributing to the connectomics field for many years now. Before he was in Cambridge he was at Janelia Research Campus in Ashburn, Virginia, in the United States for many years, and that is also where I worked with Albert for many years and became the main developer of CATMAID. Since last year I am back in Germany as an open-source research software engineer at my own consultancy, kazmos. Before I get into CATMAID I would like to offer a few thoughts on big data in general and the types of data that we deal with in the context of CATMAID and our research.
After that I will give an overview of CATMAID and try to explain the general ideas in this software; the little cat you can see there is CATMAID's logo and mascot, Scali. And after I have provided that big-picture overview, I will go to a live training instance, a tracing instance, where we will explore some features a little bit more: we will look at account creation, basic project setup and some general light microscopy data handling options, and spend most of the time on the tracing tools.

Okay, so maybe some motivational thoughts on data in the cloud, or the data that we typically deal with. When I read "data in the cloud", what I typically understand by it is basically some online resource. There is this saying that the cloud is just someone else's computer, and that is ultimately true: when we talk about data in the cloud we talk about data that is made available through remote services, at least from a client perspective. This remote service can be a single computer, but it could of course also be a whole array of computers, like Google Cloud Storage or Amazon S3 and so on. But in the simplest case it is still something you can host on a server and manage yourself. The data sizes that we talk about in this context are usually multiple terabytes, in some instances more, but typically this is the range of data we work with. This already makes it hard to copy around as traditional image files, since copying a multi-terabyte data set consisting of many files takes time, storage and planning. The setup of these cloud services and cloud service providers is therefore usually more involved, also with respect to permissions that have to be thought through, but ultimately it offers a lot of gains, also for people and users with lower-spec devices, because you can suddenly access the data in an often more optimized fashion: you can access it randomly, in small chunks, just out of this data set, instead of downloading much more. Like I said initially, we mainly deal with serial section transmission electron microscopy data.
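To make the random-access point above a little more concrete, here is a minimal Python sketch of what pulling a small region out of a chunked volume in an object store can look like. The bucket path and the "s0" scale-level name are hypothetical, and this is generic zarr/s3fs usage rather than anything CATMAID-specific; it simply illustrates why chunked cloud storage beats downloading whole multi-terabyte files.

```python
import zarr  # pip install zarr s3fs

# Hypothetical public bucket; any chunked Zarr/N5-style volume on S3 behaves similarly.
group = zarr.open("s3://example-bucket/em-volume.zarr", mode="r",
                  storage_options={"anon": True})
volume = group["s0"]  # assumed name of the full-resolution scale level

print(volume.shape, volume.chunks)  # a huge array, split into small chunks

# Only the chunks overlapping this region are fetched over the network,
# not the whole multi-terabyte data set.
region = volume[500:564, 10000:10064, 10000:10064]
print(region.shape)
```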
I found an old slide from an old publication which explains this very nicely. So just as a quick refresher on what this is: we usually slice up a specimen into individual sections of typically 40 to 50 nanometers thickness and then image each plane at much higher resolution, resulting in gigantic image stacks where of course things can go wrong: sections are lost, sometimes you have artifacts, and so any software that has to deal with this type of data also has to be aware of problems that might come with it, like missing sections for instance. But not to elaborate too much, just as a reference, the data sets I have mostly been working with over the past years have been, for one, Albert's Drosophila L1 data set, which is roughly a terabyte in JPEG tiles, so some pre-processing is required to make this available, and over the last years I have also worked a lot with Davi Bock's FAFB data set, which is stored as compressed JPEG tiles, about 11 terabytes. These data sets are already large enough to make it complicated to not have them accessible online; having everyone copy them to other machines is just not feasible. Then in later years other technologies like FIB-SEM became more popular, which increased the Z resolution much more, and so a lot of data is now also available as chunked data, where each block, in this example of the Janelia hemibrain data set for instance, consists of 64 by 64 by 64 voxels. These are typical data sets that CATMAID can load and talk to, and they represent the realm of data that we typically work with.

So now, speaking more about CATMAID: as most or some of you might know already, CATMAID is an acronym for the Collaborative Annotation Toolkit for Massive Amounts of Image Data, so it fits quite well into the name of this session. Given the size of the data that we work with, it seemed very useful to us initially to have a tool that allows collaborative access, so that not everybody needs to work in their own environment where it is hard to merge data; CATMAID provides an interface that allows multiple people to work on the same data. Since probably quite a few of you have never heard of CATMAID or used it, to put a picture in your mind of what a typical CATMAID user interface looks like, I use this as an example here. Generally you should know that CATMAID is a web application, so you would normally open it up in a browser window and connect to a website, the CATMAID server, and then, depending on the available data, you can dive into the data. In the FAFB universe, for instance, the full adult fly brain data set, a common workspace might look like this, where you see individual widgets and tools in CATMAID that are all interconnected. In this screenshot we see, for instance, on the left-hand side serial section EM data with magenta dots on it representing individual nodes of neurons that have been manually placed and traced. These neurons can then be visualized in 3D, as is visible in the second column, along with synapses and some brain compartments, and in the third column you see a graph representation of the local network and some other connectivity-related tools. Don't worry so much about the details of what exactly is going on here; the takeaway is more that CATMAID consists of many individual tools that are interconnected and allow you as a user to send data between them, so it is easy to send this collection of neurons that is visible in 3D
to a graph widget and quickly inspect a graph representation of this data; we will see more of this later on in the demo. Another typical use case for us in the last years has been to look for similarities between neurons. For this we use Greg Jefferis and Marta Costa's NBLAST algorithm, and we basically put a front end on it to allow users to navigate and use it more easily. It allows us, for instance, to search for similar neurons in the brain, and also to transform neurons of similar shape so that we could, for instance, look for contralateral homologues on the other side of the brain. All this is possible with CATMAID; this is more an example of what you can do with it.

So let's maybe take a step back and think about how CATMAID projects are generally constructed, to explain a little better how these things are structured. Projects in CATMAID are the spaces where users can create data, so everything from nodes that they create to annotations or text is linked to a project, and image data is mapped into those projects. You can map many different images into the same project; maybe sometimes you have, for instance, a higher-resolution re-imaging of a data set that you want to add to your project. This is all possible in CATMAID, so you can have many stacks and overlay them on each other; we will see this later too. Image data can generally have mirrors, because we found that in a collaborative environment that potentially has users all around the world, it is much easier on the users and the latency they experience when loading data if the data can be close to them. Co-location is important, also with data served from the cloud, so there were many instances where we shipped data sets to users and told CATMAID to use the local data instead, which helped in many cases to gain speed for data access. To represent multi-channel data, CATMAID will collect image stacks into so-called stack groups. Regarding image data in CATMAID in general: given that we typically talk about large image data sets, adding image data isn't a frequent operation in most scenarios we worked with, so this is currently still a CATMAID admin operation, that data is added to a server; but there are some ways in which users can create their own projects and their own data themselves, which we will briefly look into later. Generally CATMAID is agnostic with respect to the data, as long as the browser can render it in some way; this means we have used it successfully with light data and electron microscopy data, single- and multi-channel data, all no problem at all. Most common are still regular tiles, but more and more isotropic data sets appear, so block-based representations like neuroglancer precomputed and N5 are more and more common, and CATMAID can talk to and read both of them. The benefit these block-based formats have is that they provide orthogonal views relatively easily, or basically at no cost, meaning it is easy for CATMAID, or generally any other user of this data, to look at it from the side or from different perspectives, because the data is already presented in blocks in a way that makes this much easier. And even though CATMAID is focused on neuron reconstruction, other annotation modes are possible: generally, of course, you can have simple text annotations and location annotations in a data set, and you can have annotations on whole data sets that are backed by an
ontology, meaning that users can define a predefined vocabulary of terms and relations between those terms, which can then be used to annotate image data. Since this is a very structured form of image and term annotation, this data can later be used, for instance, to cluster image data sets based on the annotations that users made on them. But since tracing and neuron reconstruction is the focus of CATMAID, I would like to spend a little more time on this as well and explain the general workflow and thought process that goes on in the manual reconstruction environment.

Generally we often assume that for neuron reconstruction the approach is a question-directed one, meaning that people have an interest in a particular circuit, a particular neuron or group of neurons and so on, and would dive locally into the data set and start reconstructing data there, if no one else already did that at this point. This tracing and exploratory data creation in the data set then at some point allows you to analyze what you already have and get a better picture of the environment of the data you create, meaning that, for instance, after you created a small network, you can quickly look at how this network looks in a graph representation and what typical connectivity patterns you can find there, and this then again informs your decisions for continued tracing. So in CATMAID it is really important that you can quickly switch between these perspectives, between creating and tracing data and analysis, and have this be sort of a workflow when reconstructing neuron morphology and connectivity. Ultimately all these neurons form graphs, of course, and while the morphology is interesting, we are typically more interested in the connectivity as well.

So let me quickly explain how these graphs are represented in CATMAID. Neurons, as you know, connect to each other through synapses, and there are different types of synapses; CATMAID allows you to annotate different types of synapses and different connectivity relations between neurons altogether, but I will stick to regular synapses here. In this example, all these orange, red and yellow neurons are downstream partners of the green neuron we see on the right-hand side, and if we switch on synapses, we see already how quickly these networks grow and how large they can become. To make this approachable in some way on a technical level, and also to constrain what users can do, neurons in CATMAID are strict tree structures. What this means is that they have a single root node and everything else expands from there, and this is important because it makes some traversals of the neurons easier on the technical side, in the database, but it also makes it a little more robust to user error in some cases. In this view, for instance, you would see the segments of the skeleton getting darker the farther away they are from this root node. Looking only at the topology of a neuron would maybe look something like this in this strict tree representation, where in the center you have the soma, the cell body, and everything else branches away from there. And individual neurons in CATMAID then represent these synapses in the following way.
Users place nodes in the data set, regular treenodes that make up a skeleton, a neuron representation, and then at a synapse location they create these connector nodes and use them as hubs, basically, to connect to partner skeleton nodes. There are different types of connectors or synapses, as I said, but in this example it is a regular pre- and postsynaptic connection. The synapses then of course form complex graphs like this one, like we have seen in the screenshot before, and CATMAID helps dissect these graphs and hide away features that might not be relevant. There are a lot of threshold options in many analysis tools, where you can say, for instance, to only respect connectivity over a certain threshold, so that you can quickly get a better understanding of these graphs that are formed by user input.

Another aspect of CATMAID is that it is able to talk to other CATMAID servers. There is a so-called federation mode where a CATMAID user can link in remote data from other public CATMAID instances. For instance, there are a few public read-only CATMAID services; one provides a skeletonized version of a segmentation of the FAFB data set that has been generated by Google, and there is also a synapse segmentation data set living in another CATMAID instance that has been generated by Julia Buhmann, Jan Funke and Stephan Gerhard. These can be administered and worked with independently of a main production instance, but it allows users to link this data in and pick and choose what they need for their work.

With this fast run through some CATMAID features and general principles, I would now like to jump over to an actual CATMAID instance. I hope you see my browser tab switched; I went to a public CATMAID service called spaces.itana.io, and on there a couple of published projects, along with neurons, are hosted and available. Even without being logged in (as you can see in the upper right corner, I am anonymous at the moment), I get presented the start page with all the available data, and I can click on the data, jump in and of course browse the FAFB data set, and all this works fine, since this has been made accessible for anonymous access as well. However, as an anonymous user I would not be able to create data. So maybe the first thing we can do on this server is create an account, and depending on the server setup this can happen in multiple ways: some servers allow you to register directly, but on this server we can use ORCID iDs to log in and have CATMAID create an account. If I hover my mouse cursor over the login button, it says "log in with ORCID", and if I press it, it will redirect me there; luckily I stored my password, and it redirects me back to CATMAID, and now I have an actual CATMAID account backed by this ORCID iD. There are other services that you can interconnect in this way; it uses OAuth2, which is a fairly common protocol for authentication, and this allowed us to create an account here. Ultimately the view doesn't look very different, because as a newly registered user I don't see much else on this service, but we will need this account later when we actually want to modify and create data.
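Before moving on with the demo, here is a tiny conceptual sketch of the skeleton-and-connector model described above: treenodes forming a rooted tree via parent pointers, plus a connector node acting as a hub between a presynaptic node of one skeleton and a postsynaptic node of another. All names and data are invented for illustration and do not reflect CATMAID's actual database schema or API.

```python
# Conceptual sketch only: invented data, not CATMAID's schema.
treenodes = {
    # node_id: (parent_id, skeleton_id); parent_id None marks the root node
    1: (None, "A"), 2: (1, "A"), 3: (2, "A"),
    10: (None, "B"), 11: (10, "B"),
}

connectors = [
    # (connector_id, presynaptic_treenode, postsynaptic_treenode)
    (100, 3, 11),
]

def depth_from_root(node_id):
    """Walk parent pointers up to the root; the strict tree structure makes this trivial."""
    depth, parent = 0, treenodes[node_id][0]
    while parent is not None:
        depth += 1
        parent = treenodes[parent][0]
    return depth

for cid, pre, post in connectors:
    print(f"connector {cid}: skeleton {treenodes[pre][1]} -> skeleton {treenodes[post][1]}")
    print("presynaptic node depth from root:", depth_from_root(pre))
```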
To give you a better idea of general navigation and image display in CATMAID, I would like to look at a few light data sets first. In this list of projects at the bottom here we have a few Drosophila light microscopy projects, and I just click the first link in there. This shows us nothing, mainly because we are at the top of the stack and this blue might be a little dark; I basically chose the first stack available from a CATMAID project. In this image stack (like I said, this is a light stack) we have multiple channels available; in this case we just opened the first, DAPI, channel with blue as a false coloring, and it is easy to add additional stacks. Now, for instance, we can add the second channel in a second viewer in CATMAID, as we can see here, and they move synchronously and zoom synchronously, et cetera. But typically, for light data, it is more interesting to actually add a stack to the focused viewer, so let me close this. As you can see, this way you can overlay individual stacks and get to typical light microscopy representations. In the lower left corner of a CATMAID stack viewer, the thing where you actually see image data, you have this little hamburger menu which provides access to a full host of layer settings for the displayed data; it allows you to modify the coloring, the blending options and so on. Due to the limited time I won't go into much detail here, but generally it is good to know that you can adjust, say, the coloring in some form; for instance, let's make this red and save this as a default. So generally, if you configure data in CATMAID so that it looks best for you and your use case, you can save this as a default, and then every time you open this image stack it will load with these settings. This is especially useful for light data, of course, where you maybe have more complicated setups than for a single EM data set.

Okay, with this in mind, let's close this project again. Good to know as well: we just looked at this one project, which has the different stacks linked into it that we just saw, and of course it is not really practical for light data to always select individual stacks. So you can also teach CATMAID to load all stacks at the same time for light data, and have this be generated automatically (oops, sorry), and this will then bring up the pre-configured version, like the red that we just configured, in one go, which sometimes makes it a little easier. Also, even though we only have four different projects here on this front page of CATMAID, it could get cluttered pretty quickly if you have many projects, and especially in the light microscopy context it is common that you end up with hundreds of those. So it is convenient to build different views that better represent how you actually want to access the data. For instance, it is possible in CATMAID to tag individual projects and build other views for data access. In the upper left corner you see this little menu which provides different views, and for instance I prepared one that picks up tags from our four light data projects, of which we just looked at one, and builds a matrix from these tags. So it is relatively easy to automate a process where you have ongoing light data images coming into your CATMAID, tag them automatically, and have them appear in such a matrix.
Rocco, can you... I mean Tom, can you enlarge the font size a bit? Of course, is that better? Yes, thanks a lot. Okay, yeah, no problem.

Okay, so let's go back to the front view. Now we had a brief look at light data and ways to organize the front page, or at least me saying that this is possible. One last aspect of this is that if you have configured front pages that help you navigate the data better, you can make them your default front view by clicking the little home icon in the menu here. Okay, so now let's move on to EM data and tracing data. Let me maybe zoom out a little again, just a second, otherwise it gets too small here. Okay, so now we opened the EM data set that we looked at earlier already, and here we can also switch on the tracing data, for instance, and you see neurons popping up already; these are published neurons in the FAFB data set. We can click individual neurons and open, for instance, the 3D viewer, which I do by clicking the 3D button in the top toolbar. With a neuron selected I can, like with almost all tools in CATMAID, append the current selection of skeletons, of neurons, to the active widget. In this case, in the lower right corner, I am clicking "Append" to add the active neuron, which is selected in the source panel here as the active skeleton, to this list, meaning that it will show up here in 3D. This 3D viewer behaves just like any other 3D viewer; what you can do, for instance, is give it a little nicer shading. It is not so much about the details of what I pick here, just to show you that you can easily and quickly modify the visual appearance of these neurons, and you can of course load more into this.

One thing that might be handy if you are exploring data this way: sometimes you find something interesting that you want to share with someone else, and you can of course share the location. Let me maybe zoom in a little more so we can actually compare locations. Typically in CATMAID you do this with this "URL to this view" button in the upper right corner. If I just click it and open the link in a new tab, it will bring me back to the very same location we have seen here. However, it will not copy all the front-end state, like the nice visualization that we just selected. So what we can do instead is create a short URL, which allows you to do exactly this. There are different shortcut options in the menu up here, but they can all be configured from this first entry, which gives you a user interface dialog to configure such a link. For instance, if we wanted to send this link to someone else, we could even provide a little message and say "hello", and if we wanted, we could give it an alias, which makes it nice to link from, say, a tweet; otherwise we get YouTube-like short links. Creating this link will copy it to the clipboard, and now opening this link should bring us to the very same view that we have seen before. So in this way, by using customized links and short links, we can basically share the full front-end state of CATMAID, which has proven very useful in the past.

Now, with this neuron loaded, it might be interesting to also look at some meshes and volumes; luckily, in this data set we have quite a few of them. So we might be wondering: which brain compartments does this neuron innervate?
So, just to give you a better idea of the general lay of the land: in the Volumes and Geometry tab I will add a new volume, which is basically CATMAID lingo for meshes, and we can see the full adult fly brain here. Let me have it use faces instead of wireframes, and we can see our neuron here. Of course it innervates this, that is obvious, but we can also check what other volumes it might interact with and intersect. Luckily, in this data set we have quite a few volumes, like I said, and in the Volume Manager widget, which is the little box icon in the upper right corner, you can see all these volumes. Depending on your screen size you might not actually see these icons, because they become invisible if the space isn't there. What you can do instead, like with any widget in CATMAID, is open it through the Open Widget dialog, which is the first icon at the top; that gives you access to all the different widgets in CATMAID, so for instance the Volume Manager that we just opened through the icon button we could also have opened here. Since it is open already, let's use the existing widget. In here we can now ask, for the active skeleton, which volumes are actually intersected by this particular neuron, and we let it run the computation on the backend. In this case it isn't pre-computed, so it actually goes through all the volumes and does the test, and it does take a small moment; it should be there in a second. Okay, and now we got a few suggestions, so maybe let's have a look at SIP right, in some color. And now we found out that apparently this volume intersects this neuron here. It might be interesting now to ask: well, how much of this neuron is in fact in this compartment of the brain? Questions like this are something that CATMAID can answer; let me close the Volume Manager again. As you already noticed, screen real estate in CATMAID is scarce to some extent; that is why this window manager is very handy, because you can move windows around, tab windows, and basically maximize the screen space you have. And since it isn't mentioned often: in the lower right corner you also have a full-screen button, and you can hide all the user controls away if that is needed.

Okay, but we wanted to find out how much of this neuron is included in such a volume. What we can do now is first add a filter to the 3D viewer. Many tools and widgets in CATMAID have this little filter icon in the top toolbar; we see it here in the 3D viewer, and if we click this funnel icon it allows us to add filters, for instance volume filters, and we can add the SLP right volume - no, SIP is what I added here, sorry, SIP right. We don't need to invert. Okay, and as we can see, it is only a really small fraction that is in fact included in this volume here; everything else is basically hidden now that we activated this filter. And we can use the very same filter to actually do measurements. Many widgets in CATMAID support these kinds of filters, and if we go back into our selection table, where we initially added our neuron, we have a Measure button here which opens yet another widget; let me also put this into this tab view. It provides some basic measurements for this neuron, and here as well we could now add a volume filter. Let me do this. Yeah, okay.
This takes a moment, and now it tells us exactly how much cable length and how many nodes are in this particular volume. This is maybe even more useful for connectivity analysis, where you can look at the connectivity in individual brain compartments. Okay, let me close this again.

Now, we are still in an environment here where we have a user account but we are still looking at read-only data. In this data set I cannot create new data: if I enable tracing mode through this little button up here and click anywhere, I get a "you don't have permissions" error. To change this, I of course cannot just override the permissions of existing projects, but I can create my own project based on them. So, for instance, if I were a teacher and wanted to have my class trace in a data set, maybe to learn something about biology, I could create my own project here based off this one, distribute a link to it, and my students would then have write access in this particular new project. If I wanted to do something like this, or maybe for training purposes, I can go into the user menu that appears when I hover my mouse cursor over my username and click "create own space". If I do so, a dialog pops up asking me for a name for my space; I'll just keep the default here, it doesn't really matter. I want all the volumes in this project to also be available in the new project, and to make this scenario work, where I give other people, for instance students, access to this data set, I can create a project token, which is something like an invitation link. I can assign some permissions with it and say, for instance, that everybody who has access to this token should be able to read and write in this project; let's ignore the other permissions for now. And I'm creating a copy. This has now been created successfully in the background; it made a project for me and created this project token, which I can copy. Oh, I think I copied it, let's see. Nope, just a second. Make sure... yeah, okay, now I have copied it. The dialog would now allow me to switch directly to this new project. If I do so and try to create nodes here, this actually starts working and I have permission; this is now my own project, I can create new data. This is of course just example data here, but everybody that I want to have access to this project too, to work collaboratively with me, I can now give this project token, which I can also look up later in a project management tool within CATMAID, and this person would then be able to get access to this project. What this person would need to do is the following: they would need to create an account, and in the user menu click "use project token", and if they do this and paste the token that they maybe got by email, CATMAID will also ask them if they want to switch to this new project. Let's not do this for now, but it registered basically that this new user would get access; in this case it is still me, so I didn't gain any new access because I had access to this project already, but a user who didn't have access already would have gained access now.
Of course, if many people do this, a view like this doesn't help and clutters this space relatively quickly, and what you can do to access these kinds of spaces more easily is to go to the "My Projects" view, which keeps all the projects that you explicitly registered for and now provides access to this new project. This allows you to collaborate relatively easily in a new project, without any external help. This CATMAID project is completely empty at the moment, and because the neurons that we have seen before are part of the other project, we can link them in remotely and make them available as extra layers here in this project, so that we could, for instance, import already published neurons in this data set and work with those. And we can try this: in this case I added remote data from the project that we originally started with to this project, and suddenly data starts to appear here. We can probably load even more data: this, for instance, is remote data that is the skeletonized segmentation of the whole data set that was created in a collaboration with Google, and suddenly our space is filled. In this way we can relatively quickly pick and choose what is relevant for us.

Now, I see that the time is almost running out for this session, so let me get back to the slides. Since the time only allowed for a brief overview of various features, there are many things that could be explained in more detail; we didn't look at any connectivity tools, and there is much more to look at, but maybe this already helped you get a better understanding of overall workflows and possibilities in CATMAID. And with this, I would also like to say thank you to many people. Like I said, CATMAID has been around for 10 years now, pretty much, maybe even a little longer, and over that time many people have contributed to this project and used it for many different purposes, so thanks to everyone involved here. With this, I wonder if you maybe have any questions that can be answered right here; otherwise, there is more information available on the web, send me an email or try CATMAID yourself. And maybe as a last thing to mention: as a project we are looking for students that want to participate in the Google Summer of Code, and the Cardona lab is also looking for software engineers, so if you want to work on software like this and related topics, that might be a good address to ask. Thanks for your attention and your time, and I am happy to answer any questions.

Well, thanks a lot, Tom, it was really nice, very impressive data and handling. There were a few questions, but Albert was extremely fast in replying to all of them. They were about, for example, which species or what kind of data CATMAID can handle, and also about what hardware requirements it has. But since they are already answered, I would have maybe one question: could you give an idea of the numbers for the Drosophila projects that have been worked on, like how many collaborators were working on them and how many neurons were traced, just to get a bit of an impression of that?

Right, so I didn't prepare, I didn't look up any specific numbers.
So I didn't want to say any wrong numbers, but over the time many people obviously were involved in this. I can't say so much for the L1 project, but I remember that the last numbers I looked up for FAFB, about one and a half years ago, were that the total produced by manual reconstruction was something like seven or eight meters of brain wiring in a Drosophila brain, which is a lot of cable length. Of course, with the advent of skeletonized segmentations in CATMAID, that is easy to top, but even seven or eight meters is already an impressive number, and I think it was maybe around 200 people that contributed to that number back then. This is also to say that CATMAID can handle many people working in the same spot at the same time, which is needed for some of those data sets.

Okay, thanks, that's quite a dimension. And maybe there's one last question appearing here: can CATMAID deal with time series of 3D data sets?

Not natively at the moment. There were some attempts in the past to make this happen, but at the moment we can only support three dimensions directly, so for now we don't support time series. One way to represent it, with a sort of different tool in CATMAID, builds on the fact that CATMAID stores all changes: there are history tables associated with every database table, basically, and we can easily roll back changes. This potentially would also allow representing time series, by always only having the latest state represented in the database, but it doesn't sound like the right tool for this.

Okay, then thanks a lot, thanks again for the presentation, and then we hand over to the MoBIE team. Okay, I'll stop sharing. You will need to... exactly, thank you. Okay, so then I'll try to share my screen. Can you see this? Yeah.

Okay, so hello everyone, welcome to the second part of this webinar, where we will talk about MoBIE. First, let me introduce us: I'm Constantin, and also joining me as speakers will be Kimberly and Christian. We are the three main developers behind MoBIE, which is a tool for multimodal big image data sharing and exploration, which we will of course talk about further. All three of us are at EMBL Heidelberg, and we also got a lot of feedback and help in implementing this from Valentina, Martin, Hernando, Detlef, Anna and Janik, who are also at EMBL Heidelberg. I will give you a short overview of what MoBIE is and the ideas behind it, and then both Christian and Kimberly will show you how the tool actually works in live demos.

So what is MoBIE in a nutshell? Like I already said, the idea behind MoBIE is to have a toolkit for sharing and exploring large multimodal data, and for doing this locally and in the cloud. At the core this means MoBIE has basically two parts. The first is a data specification for large multimodal data. Before we dive into this: when we talk about multimodal data here, we mean that you have one data set which shows a common specimen or organism and which represents it in multiple modalities, for example with electron microscopy data and light microscopy data. You can think of many examples here; one prime example would be correlative light and electron microscopy data, where you have EM showing a sample and then also different light microscopy sources for the same sample.
In addition to these primary modalities, we also want to support different kinds of derived data. The main type of derived data we support right now is segmentations that highlight different kinds of objects in the sample, like segmentations of cells or organelles, but another example would be things like neuron traces, which we have already seen in the talk before by Tom. And then lastly, the key thing that makes this all work is that we base this data specification on existing data formats for big image data; I will talk about this more later. The second part of MoBIE is a Fiji plugin that offers a viewer to browse this data, from local sources and from remote sources. It is based on BigDataViewer, the tool in Fiji for looking at large image data, and our plugin is easily available via a Fiji update site, which is just called MoBIE.

Before we talk more about MoBIE itself, I want to quickly give you the history of the project. It is a fairly new project; we maybe started working on it two years ago, and it has actually been released for more like a year. The motivation to start on it comes from the Platynereis atlas project, where the goal is to correlate morphology and genetic expression on a cellular level for a data set that shows the Platynereis dumerilii larva. In this data set we have basically three different modalities. First, we have the serial block-face EM data, which shows the whole animal, the whole larva at six days old, at fairly high resolution, at 10 by 10 by 25 nanometers, and this data set has about eight terabytes of raw data, so it is fairly large already. Then we have these in situ light microscopy images that show different genetic markers; these are much smaller, they are only at 500 nanometer resolution, but we have 220 or so of these volumes showing different genetic markers. And then finally we also have segmentations that are derived from the EM data via some machine learning approaches, which segment all the cells and some organelles and tissues in the data set. The goal, and what motivated us to work on MoBIE, is to bring these three modalities together and be able to explore them in a joint setting. And since we developed it for the Platynereis data set, we have also used it for quite a few other projects, most recently for a different project at EMBL and the University of Heidelberg to share electron microscopy tomograms and volumes that show SARS-CoV-2-infected cells.

Okay, now that we have talked about the history, let's talk a bit more about MoBIE itself. For the Fiji plugin, what are the main features, what do we really want to support? The first important thing is that we want to enable browsing big image data. Very important for this is to support image pyramids, which basically means different resolutions of the data, to be able to browse the data at different resolutions for smooth zooming in and out, and that is luckily something that is already supported by BigDataViewer, so we can smoothly go through the data. Then we support bookmarks, so you can share interesting locations in the data. And importantly, we want to be able to access this data both locally and remotely; for remote access, we support accessing data in a cloud object store like Amazon AWS S3.
Another important part is that we want to enable bridging of these different modalities, like for example the electron microscopy volume and the light microscopy volumes that we have. For this we basically allow displaying arbitrarily many image sources, but importantly we need some way to bring the coordinate spaces of these different modalities together, because as these things come off the microscope they are in very different coordinate systems. What is necessary for this is to register them to each other. This is not something we support in MoBIE directly, but there are a lot of software tools for this, like elastix or BigWarp, and there has already been a NEUBIAS webinar on this too. What we can do here is support on-the-fly application of the transformations that result from the registration, and that is also something that is already provided by BigDataViewer. And then finally, we want to support interaction with the derived data, like I said, mainly with the segmentations. For this we can display the segmentations with random colors, basically having a different color for each id in the segmentation, and in addition we have these interactive tables that can be used to look up properties associated with these objects. Just as an example, for the data set here these tables could store the genetic expression for each of the cells in the segmentation. And in addition, this is also integrated with the Fiji 3D viewer to render these segmented objects in 3D. So that is the viewer in a nutshell, and Christian will show you how this all looks in a second.

But before that, I want to quickly talk about the data specification that we have for MoBIE, and I will first talk about the case when we have all of this just on disk and want to load data locally. Then a MoBIE project is basically just a folder on the file system, where we have this top-level MoBIE project which can have different data sets beneath it, where a data set is basically all the data that can be displayed together. It stores all the necessary metadata that we have in JSON format, stores the images in the N5 format, and the tables as comma-separated values; for details you can look up the exact specification online. Oh, sorry about that, give me one second. Yeah, so like I said, the image data is stored in the N5 format. Probably many of you are not really familiar with this; it is a chunked data format, and chunked means the data in there is stored in small blocks. The idea behind this is very similar to HDF5, which has been around for a long time, but contrary to HDF5, N5 stores all these chunks as separate files. You can see on the right how this would look if we actually look into one of these N5 files: basically just folders, and then these chunks that you see here stored as separate files. The reason for us to use this is that it allows very easy remote access to these files in the object store: basically without having a server, we can just query the individual chunks from the Fiji client. This is very similar to Zarr, another newer file format, and for this there is an emerging image standard coming up, OME-Zarr, driven by the OME group, which we want to support in the future.
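As a small illustration of the "chunks as separate files" idea, here is a sketch that writes a tiny chunked volume with the zarr Python library (the on-disk idea is the same for N5, just with a slightly different layout and metadata files) and then lists the per-chunk files it produced. This is generic zarr usage, not MoBIE code.

```python
import os
import numpy as np
import zarr  # pip install zarr

# Write a tiny chunked volume; each 64x64x64 chunk ends up as its own small file.
data = np.random.randint(0, 255, size=(128, 128, 128), dtype="uint8")
vol = zarr.open("example_volume.zarr", mode="w",
                shape=data.shape, chunks=(64, 64, 64), dtype="uint8")
vol[:] = data

# The directory now holds one file per chunk (plus .zarray metadata),
# which is exactly what makes fetching individual chunks from an object store cheap.
print(sorted(os.listdir("example_volume.zarr")))
```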
So this is basically what we talked about, how we can locally access this data, but one key motivation for developing MoBIE for us was to easily publish data and make it available in the cloud, or remotely. For this the mode is a bit different: instead of accessing the data locally via the file system, we access the data via two sources. First we put the metadata, so basically these JSON files and the tables, on GitHub, and in addition to the previous metadata we now also store the addresses of the images, which are then in an object store. This is separate from GitHub; these images need to be uploaded separately to this object store and then basically live there. The viewer can then query metadata and image addresses from GitHub and query the actual image data from the object store. This may look a bit complicated, but it is nice to have all this smaller data under version control in GitHub. In the future we also want to support reading everything from the object store directly, to make this easy. So yeah, those were the ideas behind MoBIE in a nutshell, and now Christian will take over and show you how this actually works in a live demo.

Okay, thank you, Constantin. I'll share my screen. Okay, can you see my screen? Yes, we can see it, okay. So I want to start with further acknowledgments. I would like to thank Florian Jug and Pavel Tomancak for having me at multiple Fiji hackathons; one of them was really important because Tobias Pietzsch helped me improve all the code behind MoBIE that you will see running in a second, so that was very helpful, thank you. And then I also want to thank Nicolas Chiaruttini, with whom I collaborate a lot on all these BigDataViewer projects.

Okay, so let me browse such a data set now. This is Fiji, and I will say "Open MoBIE Project". This is now the scenario where the data is remote, so the entry point is a GitHub repository, and I think somebody also posted it in the chat, so you could look at what is stored there; this is the project metadata. If I press OK, it goes there and finds out what data is stored, and then it shows the first data set, which is specified by this default bookmark that is required, and this is this eight-terabyte EM volume. So, for example, I can zoom in now, and this works smoothly thanks to the resolution pyramids that Constantin was talking about, so we can see all the details. Then, since this is stored in these blocked chunks, one can also change the orientation here, so I can freely move the viewing plane, and I can also move around in X and Y and look at other structures; for example, we see some muscle cells here, and here is actually some very dense neuronal tissue. Let me zoom out again. The thing is, in 3D you can get lost very easily, so we implemented, specifically for this project, a "level" button, which puts everything into an axis which is natural for this data set, because it has a bilateral symmetry, so we can go back there. And then let me also make it nice so that the head is on the top, more or less; here you see the arms of the animal, here is kind of the brain, the mouth, I think. Okay, then, as Constantin talked about, the point was also to overlay gene expression in situ data. This can be done, for example, here: we can select any of them and say I want to also see this. This is actually a muscle gene, so I can make it red.
And then you can see the EM and the LM data together, and I could add any of the others; there is no limitation on the number of sources that we can show. Okay, then, in addition to having these different modalities in terms of light microscopy and electron microscopy, Constantin was also talking about segmentations, so I will now add the cell segmentation. We have several things segmented here; these are the cells, and now they are displayed here, and there is also a table with all the cells and some features that you might have measured. This is right now a random color lookup table, but you can also color by other features. For this I actually have to load some additional columns from GitHub into the table here. For example, Valentina tried to cluster the cells by their gene expression, so I am adding this table, and now I have a new column telling me, for each cell, which cluster it belongs to. What I can do now is say "color by column" and then choose the clusters, again using a random color lookup table. Now the coloring changes in the table and in the animal, and we can see the predicted cell types based on the gene expression. Let me zoom in a little bit. Okay, and you see sometimes it takes a little bit to load; now it is a bit blurry, but it gets better and better as it loads the higher resolution levels.

We also have an annotation mode. For example, here you might say, hmm, I think this cell was maybe wrongly identified, it should be another cell type. So there is an annotation mode: you can say annotate, start a new annotation, and let's maybe call it "gene cluster correction" or something; then you can create new categories, I don't know, something. And then one can select cells here by control-shift-click, maybe these two; now they are highlighted, and actually another feature is that when you highlight a cell here, it automatically jumps to the table row as well, so that you could check any numeric feature. And then you could say, okay, these I actually think should be this other annotation, and then you have a new column with your annotations that you could store and share with your collaborators. So that is also an annotation mode.

Then finally, I want to talk about the bookmarks. At the moment it is not so interesting because there is only the default bookmark, but there is a context menu which allows me to load additional bookmarks, again from the GitHub project. What we did in fact, for the manuscript that we are publishing, is that we made a bookmark for each figure in the paper. That means people can now, in Fiji, just go there and look at the figure, but live. So I will choose here the epithelial cell segmentation, which is figure 2B in the paper, and if I click, it takes a little bit because now it has to load a bunch of data and go to the place. And actually this is also something where we display the cell segments in 3D with the ImageJ 3D viewer, so you can look at this. And as in CATMAID, all of this is linked: for example, if I now click on this cell in the 3D viewer, the BigDataViewer will zoom in on that part of the cell, and vice versa, if I selected something here, it would go back. So you can interact with all of this quite conveniently. Okay, that was the live demo, and then I will stop sharing and hand over to Kimberly, who will show you how you can create your own MoBIE project.
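As a brief aside between the two demos: the "color by column" step Christian just showed boils down to joining a per-object table onto the segment ids and mapping one of its columns through a lookup table. Below is a small generic pandas/matplotlib sketch of that lookup with an invented example table; it is meant to illustrate the idea, not MoBIE's internal code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented example of a per-object table: one row per segmented cell.
table = pd.DataFrame({
    "label_id":     [1, 2, 3, 4],
    "gene_cluster": [0, 2, 2, 1],
})

# Map each cluster to a color, then each segment label to its cluster's color,
# which is conceptually what "color by column" with a categorical LUT does.
cmap = plt.get_cmap("tab10")
clusters = sorted(table["gene_cluster"].unique())
cluster_color = {c: cmap(i % 10) for i, c in enumerate(clusters)}
label_to_color = {row.label_id: cluster_color[row.gene_cluster] for row in table.itertuples()}

print(label_to_color)  # segment id -> RGBA color used to paint that cell
```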
Or of course, I don't know if we answer questions now, but okay, I'll stop sharing. Okay, perfect, I will share my screen; let me just choose the right one and go in here. All right, cool. So you have seen a bit of what you can do with MoBIE and what it was made for, and my bit is about how you can make your own MoBIE project, how you can get it working with your own data. Currently we have two ways to do this. The first one is the Python library that Constantin wrote. This is really good, especially if you have very large images or you want to run it in a cluster environment, something like that, to make all of these file conversions to N5 very fast and efficient. One thing, though, is that of course you need Python experience to be able to use it. So we also have another way of doing it, which is a Fiji plugin; it is part of MoBIE, so if you enable the MoBIE update site you will have it already. This can handle any medium-size data that can be loaded in Fiji, so it will also handle virtual stacks and all of this; you have just got to bear in mind that if your image is massive, you are going to be better off using the Python library. But for this one we don't need any programming experience.

Cool, so I'm just going to take you through making a really simple MoBIE project. We are using example data from Julio Mism, so thanks very much for letting us use the data. What is nice about this data is that it is a nice example of multimodal data: we have EM and also light microscopy, and everything has been registered together. So we have some 2D EM, some 2D light microscopy (fluorescence data), and some 3D tomograms as well, which are higher-resolution EM images. I think I'll just get into it, because you will see what this data looks like anyway when I start putting it together.

Okay, let me grab Fiji. The first bit of this is you just search for MoBIE and select "create new MoBIE project"; it is going to appear on my other screen, let me just get it back up here. Cool. Then you just say what you want to call that project; I'm going to call this "yeast project", because that is what it is of. We click select, and what it is doing in the background is making the folder structure that Constantin mentioned before: it is making all of this metadata (I'll just drag this up here) and creating this folder on your computer. When it has finished doing that, it makes this little interface for you, which is going to let you make data sets and images. A data set, again, is a bunch of images you want to view in the same coordinate system; it might be different samples or experiments, anything like this. We only have one, and we are just going to call it "yeast". And then we have images. Let me just drag those over from my other screen; you just open them like you normally would in Fiji. This is a little 2D EM image, and then you click "add current open image" and name it something informative; I'm going to call it "EM overview". At the moment we are going to leave this as "image"; if you had segmentations, you would just have to say it was a segmentation, because it will then also calculate all the default tables that you need to be able to browse around, like Tischi was showing. Okay, we are going to leave it as N5. And then we have this affine transform. Constantin also mentioned this, the idea that you can have an affine transform that is then applied on the fly to look at different data sets.
In our case, because this is 2D and we want to compare 2D and 3D data, the one thing I'm going to change here is to scale it quite a lot in Z instead. You'll see why this is relevant later, but it is just so that I can have a small 3D dataset and, when I scroll through it, still see the 2D data in the background; you'll get it in a minute. All right, so I'm just scaling it by some arbitrary amount. Normally we could just use the default settings, but for 2D they are not totally optimal, so I'm going to set them manually. I've already found some good ones, just to save me having to figure them out now, so I'm just going to copy and paste them in here. This is also nice because you can see what is important about N5. Like we said, it has this pyramidal format, and that is what this first part specifies: how many levels you want and how much you want to downsample each level. I'm saying I want three levels; the first one is full resolution, then I downsample by a factor of two (but not in Z, since it is 2D), and then by a factor of four. The part below that is the chunk size, i.e. how big the individual chunks are that we actually load. I've kept it super simple here: 64 by 64 for everything. I'm going to leave the remaining conversion settings as they are; again, you could change them if you want to. We click okay, I drag it up over here, and you can see that it has already done the export: it has converted the image to N5 and written all the metadata we need.

Then we just do that again. We have some fluorescence data, so here we go, and it is the same process: we add the current image, we name it something that makes sense (I probably spelled that wrong, it's fine), I scale it in Z again because this is 2D too, just to make it a bit easier, and I use the same settings as before, so I just copy and paste them in. This looks good, we click okay, and it does it.

Okay, then I have a 3D dataset, and this one will be slightly different. Here it is; you can see the yeast as we go through it in 3D. The difference is that the first two datasets were both of the same area: you can see they have already been registered and lie directly on top of each other. But this one is higher resolution and covers a much smaller region. This area here is, I think, over here, so it is very small in comparison. For this one we have already figured out the registration that matches them together, i.e. that takes our tomogram and puts it in the right place, and I'm going to add that as an affine transform. On the fly, MoBIE will then figure out where to put the tomogram based on the affine transform we calculated. Let me set that up and it will make more sense. I add the current open image and give it a name again, but now for the affine transform I go and find the one we calculated for this registration, pop it in there, and use the defaults for everything else. You see this one is slightly bigger, so writing it will take a few seconds; it is just converting it to N5, and done. All right, let's close this up. At this point we are done: we have our dataset, we have all of our images, everything is on our local file system, all the metadata is there and everything is good.
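To make the pyramid and chunk settings concrete, here is a minimal, generic sketch of writing a three-level pyramid into an N5 container with 64x64 chunks, assuming zarr-python 2.x (which ships an N5 backend) and numpy. It illustrates the layout idea only; it is not the exact structure the MoBIE/BigDataViewer export writes, which needs extra metadata on top, as comes up later in the Q&A.

```python
# Generic illustration of a 3-level N5 pyramid with 64x64 chunking (not the exact
# MoBIE/BigDataViewer layout, which requires additional metadata on top).
import numpy as np
import zarr

image = np.random.randint(0, 255, size=(2048, 2048), dtype=np.uint8)  # stand-in 2D EM image

store = zarr.N5Store("em_overview.n5")   # N5 container on disk (zarr-python 2.x)
root = zarr.open(store, mode="w")

downsampling = [1, 2, 4]                 # full resolution, then 2x, then 4x
for level, factor in enumerate(downsampling):
    data = image[::factor, ::factor]     # naive downsampling by subsampling
    root.create_dataset(f"s{level}", data=data, chunks=(64, 64))
    # a pyramidal viewer only streams the 64x64 chunks it needs, at the level it needs
```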
One thing we can also do is change how images are displayed by default; I'll show a super brief example. For this fluorescence image, I can click edit and drag this up here, and one thing we might want to change is, for example, the color it is displayed with. This was green fluorescence (everything keeps going to my other screen, sorry), so I'm going to make it appear in green. You update the properties and the metadata is updated.

All right, so that is it, and then we can just open it. This is the same command that Tischi was showing, "open MoBIE project", and you just type in the file path to where it is. This should be right, I'm pretty sure I called it the same thing. And then it opens: we have the same thing Tischi was showing, our settings window and our main window. This is our big overview image, and we can browse around and see all of our yeast. We can add our fluorescence data, and again it appears in green like we asked, and it is properly registered, so it is exactly where it should be. Then let's find the tomogram. Okay, this one also appears where it should be, but clearly the brightness is far too high, so let's just adjust it a bit. You can see that it has been placed where it should be based on the affine transform, and it is properly 3D, so if I just get rid of this you can have a quick look at the whole thing and browse around in 2D and 3D.

Okay, I think that covers most of the basics, so I'll go back to my presentation, and here it is. At this point you have the project set up. You have it locally on your file system, or on your local server or wherever, you can browse it and do all the things that Tischi showed you. But at some stage you might want to share your project, and then there are a few extra steps, again mirroring what Konstantin said. You have to take your image data and copy it into your object store; the structure you set up locally is exactly the same, so it should be a one-step process of moving it all onto the object store. You have to add your image locations to the project metadata; this is just a few extra files that say, for example, what the name of your object store is and what kind of authentication it uses, that sort of thing. At some point this will be done automatically; it will be there soon, I just haven't finished writing it yet. And then the last step is to copy all of your project metadata to GitHub, i.e. to put it in a public repository. At that stage you have both parts done, all the images in the object store and all the metadata on GitHub, and now you can just give people the GitHub repository address, they type it in like we did for the PlatyBrowser, and they should be able to view everything together.

All right, with that I think I just have one more slide: if you want more information, this is the GitHub address. There are some READMEs and tutorials there to get you going. We are also happy to receive feature requests or issues there, and feel free to contact us if you want some help getting set up; we are always happy to help. And with that, I think we are done and can take some questions if people have some. Thanks a lot again to all three.
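For the "copy your image data into the object store" step, here is a minimal sketch assuming an S3-compatible store and boto3; the bucket name and local paths are placeholders, and in practice a bulk tool does the same job in one step.

```python
# Minimal sketch: copying the locally converted image data to an S3-compatible object store,
# keeping the same folder structure so the project metadata still points at the right places.
# Bucket name and local paths are placeholders.
import os
import boto3

s3 = boto3.client("s3")              # credentials come from the usual AWS configuration
bucket = "my-mobie-bucket"           # placeholder bucket name
local_root = "yeast_project/data"    # project data folder written locally

for dirpath, _, filenames in os.walk(local_root):
    for name in filenames:
        local_path = os.path.join(dirpath, name)
        key = os.path.relpath(local_path, start="yeast_project")  # preserve the structure
        s3.upload_file(local_path, bucket, key)                   # chunk files are small, so this scales
```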
Very impressive, very nice to see the smooth viewing of the data and the overlays, and it was really good that you also walked us through how to do this, so I really liked it. And again, for the questions, the moderators were very fast in replying. Maybe I can still pick up some, even if they were already answered. For example, one was whether the data sits inside the Git repository, or where exactly it is stored.

Yeah, just to quickly clarify: the actual raw image data is stored on the object store, for example on an AWS S3 store, and GitHub only holds the small metadata files, because of course it is not possible to put these huge files under version control on GitHub.

Let's see. I was wondering, since you gave the example where the cells were already segmented: you get many more features when you have the segmentation, right? Because then you can click on the cells and you get this very nice zooming in to the data and to the measured parameters. Maybe, even if it is not strictly a MoBIE question, you could tell us again how you did the segmentation that you loaded.

Yeah, this was my part of the project. It is basically a U-Net, a deep learning network that predicts the boundaries of the cells, and on top of that a clustering step: a watershed over-segmentation whose fragments are then clustered together to segment all the cells in the volume.

So you did not use ilastik for that? No, not for that, but for some of the easier parts we did use ilastik. We also had segmentations of some of the tissue, like the gut or the neuropil that you may have seen, and for those we used ilastik.

And then, I don't know whether you want to pick another question that was already answered; there are new questions appearing. I could pick one. Yes, please, that's great. Aha, you're very good. Steve is asking: could MoBIE image data come from an OMERO server? I think the current simple answer is no, we haven't tried that. The data would have to be stored in this pyramidal, chunked format for it to be performant, and I don't know enough about the OMERO backend to say. So currently it is not possible, and whether it would be possible I don't know exactly, Steve, but it is definitely an interesting idea.

There is also the question of how you pre-computed the gene expression; I was actually also interested in this. What was done there? Unfortunately, the person who did that is not one of the three of us, so I don't know if somebody dares to answer. Konstantin? I can give a quick overview. We have all these in-situ datasets that show the gene expression, which we have registered. Then we basically measure the volumetric overlap in our common coordinate space: we take each gene and each of the cells, and for every gene we measure how much of the cellular volume overlaps with its expression region. That was the simpler way to do it. We also looked at a somewhat more involved approach that considered how consistent the different expressions are, but for the simple example we showed, we used just this volume overlap.
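Here is a minimal sketch of the volume-overlap idea just described, under assumed inputs: a labelled cell segmentation and a binary gene-expression mask already registered into the same coordinate space. The array and function names are hypothetical, not taken from the project code.

```python
# Minimal sketch of the volume-overlap measure described above.
# Assumed (hypothetical) inputs: `cells` is an integer label volume where each cell has
# its own id, and `gene_mask` is a boolean volume marking where the registered in-situ
# signal for one gene is expressed; both are on the same voxel grid.
import numpy as np

def expression_overlap(cells: np.ndarray, gene_mask: np.ndarray) -> dict:
    """Fraction of each cell's volume that lies inside the gene-expression mask."""
    labels = np.unique(cells)
    labels = labels[labels != 0]                          # 0 = background
    fractions = {}
    for label in labels:
        cell_voxels = cells == label
        overlap = np.count_nonzero(cell_voxels & gene_mask)
        fractions[int(label)] = overlap / np.count_nonzero(cell_voxels)
    return fractions

# Repeating this for every gene gives the per-cell expression profile
# that the clustering mentioned earlier could be run on.
```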
Cool, thanks. Maybe I'll pick the next question, because it is also about segmentation: is there a dynamic mechanism for updating the dataset, for example for re-running the segmentation? Currently there is nothing really dynamic, in the sense that you could do this in a live MoBIE instance. What we usually did instead was create a new version of the segmentation, a completely new volume, and store it beside the old version; that was actually one of the reasons we keep the metadata in Git, because this way we can at least keep track of the different metadata files for the segmentations. But since these are N5 files, in principle you could just rewrite the chunks under a running MoBIE. It might crash a little bit, but it is thinkable, so one could just try it.

Okay, then another one, I think the last here, about the N5 data: is the CATMAID N5 readable in MoBIE, and vice versa? I would have to ask Tom; I don't know what kind of N5 flavor you are using. Is Tom still around? I am, yes. I'm not aware of any particular flavors; so far we just use the regular N5 that came out of the Saalfeld lab. I believe Chris Barnes did a bit more work with this, but so far, if the files conform to the regular attributes JSON format, we can read them directly without issues. Yeah, I think on our side there is a bit of a specification on top that comes from BigDataViewer, which basically just needs some additional metadata stored in this attributes.json. We would need that, plus some more metadata, so I would think it probably doesn't work off the shelf, but it would probably only take a few changes: you wouldn't need to rewrite any of the data, you would just need to add some metadata.

Good, and then there is a new question that appeared: do you have plans to support thin plate spline transforms? I can try to take that. I think that might exist already; the person to ping would be John Bogovic from the Saalfeld lab. I think it may already be possible to have not only affine but also more complex transformations applied on the fly, but John Bogovic is your man, so ask him what the current status is. I know he was looking into this. Good question, thank you.

Okay, so I think with that we have reached the end. Thanks again to all the speakers for a super interesting webinar, and to all participants. There is a survey; exactly, Rocco just sent the link, please fill it out. You will also find all the questions answered at forum.image.sc at some point, once they are uploaded, and the recording will be uploaded as well after a bit of editing. So thanks a lot for joining us, and we hope to see you at the next NEUBIAS Academy webinar.
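Following up on the N5 interoperability discussion above, here is a minimal sketch of opening an existing N5 container and inspecting or extending its attributes.json, assuming zarr-python 2.x with its N5 backend. The specific attribute keys that BigDataViewer/MoBIE expect are not spelled out in the talk, so only generic reads and writes are shown, and the paths and dataset name are placeholders.

```python
# Minimal sketch: opening an existing N5 container and inspecting / extending its
# attributes.json without touching the image chunks (zarr-python 2.x N5 backend).
import zarr

store = zarr.N5Store("existing_volume.n5")   # placeholder path to an existing N5 container
root = zarr.open(store, mode="r+")

print(dict(root.attrs))                      # contents of the top-level attributes.json
print(root["s0"].shape, root["s0"].chunks)   # assumes a scale-level dataset named "s0"

# Adding metadata only changes attributes.json; no image chunks are rewritten,
# which is the point made above about adapting existing N5 data for another viewer.
root.attrs["note"] = "extra metadata added without rewriting any image chunks"
```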