So good morning to the brave ones who managed to get up. I want to tell you about automated circuit reconstruction: what we've done to solve it and where the field stands. Not "solved it": how we've tried to solve it. Because this will be one of the morals of the story: we're not there yet, but I'm hopeful, and I'll explain the reasons why.

This is joint work of many people. Beyond Björn Andres, who is now starting his own lab in Saarbrücken, there is Thorben Kroeger, who is about to defend his PhD, and Ullrich Köthe, who is heading the group along with me. I'll show ilastik, and there's a huge development effort behind that; I want to single out Christoph Straehle and Anna Kreshuk for their work. And finally, we have lots of fascinating data, thanks to Graham Knott, Kevin Briggman, Winfried Denk, and then, from Janelia Farm, Davi Bock, Albert Cardona, and Lou Scheffer from the FlyEM team. I want to thank them for the insight even more than for the data.

So this is where we start: from a volumetric image. We want to find not just one but all the processes in there; we want a dense reconstruction. We want the processes and the synaptic connections between them, to get a wiring diagram. And this will be the end of one journey and the beginning of the next, because even if you have the wiring diagram, you have not understood a thing about the brain; there will be much more network analysis required down the road. But it's also hard to imagine understanding the brain without having its wiring diagram. So this is where the interest comes from.

From the image-processing point of view, it's not an easy problem, for two reasons. One is that all the neurons look alike. Let me show you that in images. Here is some very beautiful data from Graham Knott. You see that the resolution is isotropic: we have three ortho views, and the resolution is as good in all three dimensions. However, if I zoom out a little, you see this neuron here. If I look at its texture, its contents, there is nothing that distinguishes it from the adjacent neuron, or from that one down there, or from that one down there. The only thing distinguishing them is their position. It's not the usual situation in computer vision, where you do semantic labeling or foreground/background segmentation; here, all things look alike. So we have a pure partitioning problem.

Moreover, errors are not forgiven. You know that in electrical circuits it's really important not to have a short: a single short, and smoke rises and your set is broken. Same thing here. If, in some place where the membrane is weakly visible, we pretend that two processes are one, we've created a short circuit, and we have an error that propagates throughout the entire volume. So this is why it's not a trivial problem.

It turns out that basically all of the work in this field now follows a canonical pipeline. In this diagram the brown bullets are data, starting with the raw data and ending with our final segmentation, and green is some kind of processing step. I've cited a few papers here, and I'll go over some of them or allude to some of them, representing something like 50 person-years of work.
So of necessity my overview will be at a very, very coarse level. But you see that people have looked at each and every processing step, trying to optimize it, including the proofreading, though I've not given references for that because it's not the focus of this talk. I want to talk mostly about the automated tracing.

In pictures, this is roughly what it looks like. We start from the raw data, seen at the very left. We obtain some regions that we strongly believe belong together; these are often called supervoxels in the field. Then we try to decide how likely it is that two supervoxels should be merged, and so come to a final segmentation.

Going into a bit more detail on the first step, where we want to find the supervoxels: we need to make predictions somehow, and if we just looked at the gray value of each single pixel, that would not be very informative. So what we do instead is compute many features. An obvious choice is to smooth a little over the neighborhood. We compute many features that capture different aspects of the local environment of a pixel, and then we aggregate all of those features. In the original image we start with just a single gray value per pixel; here I've shown five or six features, so now I have six values per pixel, and each pixel lives in a six-dimensional space. In practice we compute even more features, to capture more of the characteristics of the environment.

Then we can use this for a supervised learning step. For that we need some user annotations: a user tells me that, say, green is not a membrane and red is a membrane. Then I can look at this in feature space. Out of the six features, take two feature dimensions: that's feature number two, that's feature number one. I have a scatter plot where each dot represents one pixel from the original image, and you see the annotations again, this time in feature space. These annotations allow me to learn some kind of decision boundary, which I can then apply to all points in feature space and map the result back. To be totally consistent, I should have made the result on the left-hand side green and red, but I've chosen steps between white and black here, to bring the membranes out more clearly. And if I like, I can define more than two classes, and suppress the mitochondria, and so on.

So I want to show you how this works in practice. This is the tool we're developing; it's called ilastik. So far nothing has happened: I've loaded the data, but no computations have happened whatsoever. You now see that I can compute different kinds of features.
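To make that recipe concrete, here is a minimal sketch of per-pixel features plus a trained classifier, using SciPy and scikit-learn. This is not ilastik's actual code (ilastik has its own feature library and random forest); the filter choices, image size, and label positions are illustrative assumptions.

```python
# Minimal sketch of pixel classification: build a small per-pixel feature
# stack, train a random forest on sparse user labels, predict everywhere.
import numpy as np
from scipy import ndimage as ndi
from sklearn.ensemble import RandomForestClassifier

def feature_stack(img):
    """Stack simple per-pixel features: raw value, smoothings at several
    scales, gradient magnitude, Laplacian."""
    feats = [img]
    for sigma in (1.0, 2.5, 5.0):
        feats.append(ndi.gaussian_filter(img, sigma))
        feats.append(ndi.gaussian_gradient_magnitude(img, sigma))
        feats.append(ndi.gaussian_laplace(img, sigma))
    return np.stack(feats, axis=-1)          # shape (H, W, n_features)

img = np.random.rand(256, 256).astype(np.float32)  # stand-in for one EM slice
X = feature_stack(img)

# Sparse annotations: 0 = unlabeled, 1 = membrane, 2 = not membrane.
labels = np.zeros(img.shape, dtype=np.uint8)
labels[10:20, 10:20] = 1
labels[100:110, 100:110] = 2

mask = labels > 0
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X[mask], labels[mask])               # train on labeled pixels only

proba = clf.predict_proba(X.reshape(-1, X.shape[-1]))
membrane_prob = proba[:, list(clf.classes_).index(1)].reshape(img.shape)
```

The learned decision boundary lives in the feature space described above; mapping `membrane_prob` back onto the image grid is exactly the "map the result back" step.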
I'm using color here at two scales, and texture at some scale. Note that my choice here is not totally random. I will talk later about an alternative, which is to use neural networks; with neural networks, people say, I don't have to make an informed choice, the network will learn by itself which features are meaningful. Here I have to select something that captures relevant properties of the data. However, I can browse these features, and that helps me decide which ones, or which scale, is likely to be meaningful.

Incidentally, you see that as I go from one feature to the next there's always a short waiting time. That is because everything here is computed on demand. This is important, because the data set I've loaded is one gigabyte of raw data, and I've computed (three, six, seven) eight features here. So the features alone would be eight gigabytes, which is the RAM of this commodity notebook. Even if I could compute all the features at once, I would have no space left to do any computations on them, and this is why we compute everything on demand.

So you see the data is being loaded, features are being computed and displayed, and then renormalized so that everything is shown on the same color table; as new values appear, the renormalization of the rest has to be iterated. And this is done in a block-wise fashion. If I now go from one z-slice to the next, this becomes faster, because the data is already there; the features have to be computed again on the fly. This is pretty fast until I hit the boundary of a block, and then new data needs to be reloaded. The attraction of this just-in-time computation is that it allows me to work on data sets that wouldn't traditionally fit into RAM. On data with resolution in both space and time, I've done interactive machine learning on this machine on a hundred-gigabyte data set.

So here's my data. I can now define a couple of classes; let's say I'm using red for membrane and green for not-membrane. And then there's a synapse: you see the vesicles, you see the postsynaptic density, so I'm coloring the postsynaptic density here. In my presentation, giving these labels corresponded to this step here. Having given labels, when I press "live update" I train the classifier and make predictions for all points. So I press live update, and it first needs to compute the features in this area, then trains the classifier with the labels, and then makes a prediction. If I now zoom out, as I come to the border of my tiles, it starts making predictions for the adjacent tiles.

And now we can, for example, say I'm most interested in synapses, so I can give negative examples. Perhaps I even want an extra class for everything that looks vesicular: I'm adding another class here and saying this is not a synapse, and pressing live update again. So it has pulled out the synapse; it has pulled out that synapse. There's a mistake here: it has also pulled out this. Biologists, what is this? Myelin. Myelin, thank you. That goes through the entire volume, so this we need to distinguish from the synapses in a post-processing step.

So let me again show you what interactive means. I click on this point and say I don't like it, and then the classifier is retrained and predictions are remade. Then I can go to other parts of the volume and see if predictions are reasonable there too, and iterate this until I'm happy.
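The just-in-time, block-wise computation I demonstrated can be sketched in a few lines. This is a toy stand-in for ilastik's real lazy-evaluation machinery, not its implementation; the block size, the single Gaussian feature, and the plain dictionary cache are simplifying assumptions.

```python
# Toy sketch of on-demand block-wise feature computation: a block is
# computed the first time it is requested and cached for later views.
import numpy as np
from scipy import ndimage as ndi

class LazyFeatureBlocks:
    def __init__(self, volume, block=64, sigma=2.0):
        self.volume, self.block, self.sigma = volume, block, sigma
        self.cache = {}                      # block index -> feature block

    def __getitem__(self, idx):
        """Return the smoothed feature for one block, computing on demand."""
        if idx not in self.cache:
            b = self.block
            z, y, x = idx
            sl = np.s_[z*b:(z+1)*b, y*b:(y+1)*b, x*b:(x+1)*b]
            # NB: a real implementation needs a halo around each block so
            # the filter response is correct at block borders.
            self.cache[idx] = ndi.gaussian_filter(self.volume[sl], self.sigma)
        return self.cache[idx]

vol = np.random.rand(256, 256, 256).astype(np.float32)
feats = LazyFeatureBlocks(vol)
first = feats[(0, 0, 0)]   # slow: computed now
again = feats[(0, 0, 0)]   # fast: served from the cache
```

This is why revisiting a slice is fast while crossing into a new block triggers a short wait, exactly as in the demo.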
So there's a piece of software called ilastik, and I've just demoed version 0.6, which is not out yet; we hope to bring it out this summer. Version 0.5 only worked on data that would fit into RAM; what I've just shown you is the version that works on data much larger than RAM.

On FIB-SEM data this works very nicely to bring out the synapses. On ssTEM data you have to work a bit harder: you need to find synapse candidates, then compute more features on these, and then make a prediction. But on FIB-SEM it really works out of the box. This picture here was generated by painting five postsynaptic densities; that was enough to bring out all of them. That's again thanks to the very nice data from Graham Knott.

So, to summarize this first part: interactive machine learning, to our mind, makes a big difference. The traditional way would be that, say, on Monday you give labels, Tuesday you train, Wednesday you look at the results, and that makes for a slow turnaround. Moreover, if you don't know in advance which labels to give, you may need to label quite a lot. In this case you've seen me give just a single dot at a point where I was not pleased. So I can bring the human into the loop; we call it interactive learning. If the computer actively asked for points, that would be active learning, and such schemes can be embedded: I can, for example, say I want to look at the uncertainty, and then it color-codes where the uncertainty is highest and where it could use more labels.

Over time, we find that this allows even people who don't make a living out of image processing to understand what helps and what doesn't. For example, if you define too many classes, the program will tell you; if you try to distinguish classes that, given those features, cannot be discriminated, you will see the effect immediately; if you've given a bad label, you will see it immediately. That's what we mean by dialogue, and in our experience it has really helped reduce the time needed to train such a classification scheme.

Once you're happy with your classifier, you can press a button called "batch prediction", and then, depending on how big your data set is, you can grab a coffee, or go for dinner, or go on a holiday. The biggest we've tried is an 11-terabyte data set, of which we've crunched, I think, nine terabytes, and this was only possible using a compute cluster. That was more the holiday kind of endeavor.

For the technically minded: in this new version there will also be an object-level classifier, which I'm happy to tell you more about in the break if you like. Rather than doing pixel classification, we can now do this interactive training at the object level, and that allows us to solve more difficult problems, like finding the synapses in ssTEM images. And if anyone's interested in counting, I am also, and we think we have something going there.

Okay, so you've seen us take this detour: we started from the raw data, computed features, and then trained a classifier to predict whether something is, for example, membrane or not. Neural networks propose to bypass this explicit, separate feature-computation step and do it all in one go. That has some merits. Done well, this performs extremely well; for example, a neural network won a recent ISBI challenge on ssTEM data. Also, once you have your network trained,
it is super fast, and you don't need to specify the features in advance: you saw me click on certain scales and certain features; neural networks learn these by themselves. On the downside, they need a lot of training data. A neural network of the kind used in this field has on the order of 50,000 or 100,000 parameters, so of necessity you need a lot of labels. If you're fortunate enough to be able to pay for an appropriate number of students who will label, that's good; otherwise, that's a problem. And the training is slow: it easily takes a week on a GPU cluster, sometimes longer, sometimes shorter. And where before we put black magic into picking features, people now have to put black magic into specifying the topology of the network: how many hidden layers, how many neurons per layer, and so on. These choices matter.

However, there have been papers that have used them very successfully, and in particular proposed very clever training schemes; I want to single out the MALIS scheme from Srini Turaga. I mentioned these short circuits that you want to avoid. Here's a color map of the probability of a pixel being inside a neuron, and what you want to avoid is these bridges between separate neurons. The idea is to burn the bridges. For that you need to first find the bridges, then give a high weight to those pixels and tell the neural network to lower the estimated probability in those spots. That is done by finding paths between adjacent neurons, then taking the lowest pixel on each path, and then, among those lowest pixels, the highest one. This is the critical point, the worst point that you want to improve, and you can do this iteratively. It's a scheme that works very well.

Okay. So at this point we have a weighted graph at the voxel level. It can be either edge-weighted or node-weighted: edge-weighted would mean "do these two voxels belong to the same neuron", node-weighted would mean "is this a membrane or not". We then want to partition this graph to obtain supervoxels, or small regions. Here's an example where supervoxels have been computed simply on a specific feature. We get these very small regions that are already somehow adapted to the data; perhaps you can see there's something running down here. Having these regions allows us to extract more meaningful, more descriptive features. A few years ago people tried to go from the raw data to the final result in just a single step, and it turns out that's not possible: you need an intermediate stage where you compute features at this region level, the supervoxel level, which you can then process further.

This partitioning into regions traditionally happens by either just using a watershed or connected components. But for some purposes you need not just one partitioning but multiple partitionings, and there have been a couple of suggestions, which I'll list in a moment.
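First, the traditional watershed supervoxel step just mentioned can be sketched with scikit-image. The probability map here is faked, and real pipelines use more careful seed selection, but the structure is the same: many small seeds, each grown into one small region.

```python
# Hedged sketch of supervoxel generation: seeded watershed on a
# membrane-probability map (stand-in data; seeds = local minima).
import numpy as np
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

pmap = np.random.rand(128, 128, 128).astype(np.float32)  # fake membrane prob.

# Seeds at local minima of the probability map (maxima of its negation).
coords = peak_local_max(-pmap, min_distance=3)
seeds = np.zeros(pmap.shape, dtype=np.int32)
seeds[tuple(coords.T)] = np.arange(1, len(coords) + 1)

# Flooding is held back by high probabilities, i.e. likely membranes.
supervoxels = watershed(pmap, markers=seeds)
print(supervoxels.max(), "supervoxels")
```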
As for those suggestions: there's one by Amelio Vazquez-Reina from the Pfister lab; Jan Funke did something together with Albert Cardona and us; and Christoph Straehle from my group tried something else. The idea is that these weights may not be perfect, so it may not make sense to operate with a single partitioning; perhaps you want multiple interpretations of the data, and then work with these multiple interpretations together in subsequent steps.

Then, once you have such regions, you can compute the likelihood that two adjacent regions should be merged. Shown here are all the boundaries of all the regions you had just seen, and color-coded is one specific feature; I don't even know which one. We look at things like: given a boundary between two supervoxels, how different are they in size, or what is the lowest boundary probability along this boundary, and so on. Anything you can invent that looks remotely useful can be applied here. You've seen examples of different features, all color-coded, and all of these features together (we compute many, forty or so) can finally be used in another classifier, which again needs to be trained using some human input. This then predicts whether a boundary should be eliminated, turned off, shown here in red, or kept on, shown in green. If you look closely you see some intermediate colors, where the classifier was not quite sure. That kind of information can now be used to merge adjacent regions.

This merging traditionally happens using a hierarchy, and there are a couple of papers: Viren Jain, who has a group at Janelia Farm, did this; Tolga Tasdizen and colleagues did it; Juan Nunez-Iglesias did it together with colleagues from Janelia. Here's an illustration. I'm showing you three boundary candidates, and then a tree that expresses how likely it is that two adjacent regions should be merged. In this case, the boundary between A and B apparently looks weaker given the data, so if there is any merger it will be first between A and B, and then between the resulting big region and C. Here are the possible segmentations you can produce: I can threshold my tree here and get these three regions out; or I can threshold there, and then A and B have been merged but C is still distinct; or I can put my threshold up here and get a single large region.

However, there are some segmentations that cannot be expressed by such a hierarchical tree; no specific threshold produces them. Still, this tree is a data structure that allows very efficient computation and inference, and these papers have interesting policies on how to learn an appropriate merging strategy.
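Hierarchical merging with a threshold, as in the A/B/C example above, can be sketched as follows. The three boundary scores are made up to reproduce that example; real systems re-learn or re-evaluate edge scores after every merge, which this toy loop does not.

```python
# Toy hierarchical agglomeration: repeatedly merge the two regions whose
# shared boundary looks weakest, stop once all boundaries look real.
import heapq

edges = {('A', 'B'): 0.15, ('B', 'C'): 0.6, ('A', 'C'): 0.7}  # P(boundary real)
threshold = 0.5

parent = {r: r for r in 'ABC'}        # union-find forest over regions
def find(r):
    while parent[r] != r:
        r = parent[r]
    return r

heap = [(p, e) for e, p in edges.items()]
heapq.heapify(heap)
while heap:
    p, (u, v) = heapq.heappop(heap)   # weakest remaining boundary first
    if p >= threshold:
        break                         # everything left looks like a real boundary
    ru, rv = find(u), find(v)
    if ru != rv:
        parent[rv] = ru               # merge the two regions
print({r: find(r) for r in 'ABC'})    # -> A and B merged, C kept separate
```

With the threshold at 0.5 this reproduces the middle cut of the tree: A and B merge, C stays distinct; raising the threshold above 0.7 would yield the single large region.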
So you want to learn these propensities of two adjacent regions to merge, you want to do this right, and then, given the tree, you want to find the proper threshold. I've shown this threshold as a horizontal line; if you have a bigger tree it can be something more complicated, cutting the tree at different heights in different places. So it's a good approach, but limited in that it cannot express all segmentations.

We've done something that can express all segmentations, but it's more costly. We use a different, dual representation, where each boundary can be flagged as on or off. It's good because it allows us to express all segmentations. The alternative to this dual representation would be to give an index to each region: we could say this is region one, region two, region three, and if I want to merge two regions, I give the same index to both. By working in this dual space, however, we can work with a much smaller label space: we don't need a large set of indices, just zero-or-one indicator variables telling us whether each single boundary is on or off. This avoids the graph-coloring problem that's implicit when you work in the primal domain, and it avoids degenerate solutions. What I mean is that with indices I could call these regions one, two, three, or two, three, one, or one, three, two; all of these are different representations of the identical solution. That's just an expression of using a label space which is too large in the primal domain. So this is why we work in the dual.

The price we pay is that consistency is no longer guaranteed. Look at this configuration: I've switched this boundary on and these other two off, but that's nonsensical. If I look at these two points, there is a path separated by a boundary, which tells me these points should be separate; and there is another path that connects the two points without crossing an active boundary, which tells me they should be connected. So I have a dispute here, which I need to resolve.

This is how we do it. It's an approach called correlation clustering, or the multicut partitioning problem. We consider simple cycles (cycles that do not intersect themselves) of a specific length, and we demand that they all be consistent. Each indicator variable for a boundary must be either zero or one: boundaries are on or off. And then one important bit on this slide is this inequality. It says that, around any cycle C, the indicator for one boundary must be less than or equal to the sum of the indicators of all the other boundaries in that cycle: y_e <= sum of y_f over the other edges f in C. That is what it takes for the system to be consistent. So: we have a simple cycle here, this boundary is on, so one of the y's is one, and I demand that it be less than or equal to the sum of the other two y's. But those two have been switched off, so I have 1 <= 0 + 0, which is not true. I have a violated constraint, which I can then use to update my system.

Now take this entire set of constraints. If we take all of them, this is an exponentially large set,
so it's not even possible to write them all down. What we do instead is start without constraints, optimize the problem, then find violations (which can be done efficiently, in polynomial time), add those constraints, and iterate. This is called a cutting-planes approach.

The optimization problem we solve is the following. We have a cost theta associated with each boundary; theta is a vector holding the costs of all the boundaries in my three-dimensional image. Then I have a vector y of indicator variables telling me which boundaries are on or off, and I want to minimize the inner product <theta, y>: the total cost of all boundaries that are on, subject to the solution being a legal, consistent one, i.e. lying in the multicut polytope.

Now, if all of my costs were positive, the solution would be trivial: simply switch all boundaries off. The problem becomes interesting when some of the thetas are positive and some are negative; that is the non-trivial situation we face. And if you look at this problem: we're minimizing over y, and the objective is linear in y. That's as easy as it gets: you have a vector pointing in the direction of theta, and you just want to walk as far as you can against theta. Then we have the multicut polytope, and if it weren't for the condition that y has to be binary, this would be a convex polytope, very high-dimensional but convex. We would be optimizing a linear objective over a convex body, which is very well studied and an easy problem: linear programming. Here, however, we are restricted to binary solutions, which makes it an integer linear program, NP-hard to solve in general; and in fact the correlation clustering, or multicut, problem is NP-hard. However, there are very good solvers around these days, so we simply use one of these industrial-strength solvers to solve the problem to global optimality.

And because we've solved it to global optimality, we know that all mistakes in the final solution are ours: either our modeling has been deficient, or we've not trained the classifier well. It's not because we've not optimized the system well. We really find the globally optimal solution on very large data sets, on up to millions of random variables, and all remaining mistakes are ours. On even larger systems, one can use heuristics to find approximate solutions.

Here's an example. If I just threshold the boundary votes of the single classifier, this is what I get, and if you look closely, the boundary is on here but off there, for example. That's inconsistent; that's taboo. If I now did a connected-components analysis on these thresholded predictions, without the multicut constraint, I would remove most boundaries by transitivity. You can see that most boundaries that were green have turned yellow: the classifier believes locally that the boundary should be on, but somewhere in the third dimension it believes the boundary should be off, and by transitivity all the other boundaries melt away. This is why just thresholding the boundaries is not enough, and why we need the multicut constraint. And here's the solution.
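Before comparing results, here is a toy sketch of that cutting-plane loop on a three-region example. It uses SciPy's generic ILP interface (`scipy.optimize.milp`, SciPy 1.9+) in place of the industrial solvers mentioned above; the graph and the costs are made up, and the separation routine only looks for one violated cycle at a time.

```python
# Toy cutting-plane multicut: solve min <theta, y> over binary y,
# find one violated cycle inequality, add it, re-solve.
import numpy as np
import networkx as nx
from scipy.optimize import milp, LinearConstraint, Bounds

edges = [(0, 1), (1, 2), (0, 2)]      # tiny region-adjacency graph (a triangle)
theta = np.array([-1.0, 2.0, 2.0])    # negative cost = boundary "wants" to be on

rows = []                             # accumulated cycle inequalities
while True:
    cons = LinearConstraint(np.array(rows), -np.inf, 0.0) if rows else None
    res = milp(c=theta, constraints=cons,
               integrality=np.ones(len(edges)), bounds=Bounds(0, 1))
    y = np.round(res.x).astype(int)

    # Separation: for each "on" boundary, look for a path of "off" boundaries
    # between its endpoints; such a cycle violates consistency.
    g = nx.Graph(e for e, yi in zip(edges, y) if yi == 0)
    violated = None
    for i, (u, v) in enumerate(edges):
        if y[i] == 1 and g.has_node(u) and g.has_node(v) and nx.has_path(g, u, v):
            path = nx.shortest_path(g, u, v)
            row = np.zeros(len(edges))
            row[i] = 1.0              # encodes y_e - sum(path y's) <= 0
            for a, b in zip(path, path[1:]):
                j = edges.index((a, b)) if (a, b) in edges else edges.index((b, a))
                row[j] -= 1.0
            violated = row
            break
    if violated is None:
        break                         # no violated cycle: solution is consistent
    rows.append(violated)

print(y)                              # here: all boundaries end up off
```

On this toy instance the first solve switches the cheap boundary on, the violated triangle inequality is added, and the re-solve turns all boundaries off: a consistent partition.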
If you compare to just thresholding, you see we've fixed all these loose ends, and we've done so without introducing a bias. When you have such loose ends, you could have heuristic solutions that say: when there is a gap in a membrane, close it; or: if there's a lone membrane fragment somewhere, erase it. But that's a bias going one way or the other, and the correlation clustering I've presented gives you an unbiased solution. This is why it gets a plus up here. As I said, on the downside, this is computationally very expensive.

Now, I didn't mention how we find those penalties for switching a single boundary on or off. Of course, we want to learn them from training data rather than invent rules by hand. We can either use unstructured learning, ignoring the consistency constraints, or we can do structured learning, potentially even with a structured loss function. The summary of this graph is that the unstructured methods, shown with these horizontal bars in terms of Rand error, do worse than the structured learning.

So this is good enough to produce pretty pictures, and each lab working in this field has its own pretty pictures. However, it's not good enough to do science. Let me show you why. This is a solution we get from this multicut approach, and I've blown up a small part highlighting some errors. You see there's a thin process here which we've totally over-segmented: we have a boundary too many here, and there, and there, and there, and there's more over-segmentation here. We've deliberately biased the system to over-segment rather than under-segment, because in subsequent steps over-segmentation is much easier to fix than under-segmentation. And in this 2D slice I've really picked the worst bit; it looks quite okay in other parts. But it still means we lose the fine processes, and this is why, starting at the soma out here, we don't get the very fine branches of these arbors that really should be there. Shown is only a minute fraction of all the cells; the partitioning has been done on all cells, but if I switched all cells on you would just see a colored cube, so some cells have been selected randomly.

So, this multicut partitioning part: we believe in this dual representation. As I said at the very outset,
it's a pure partitioning problem; we have nothing that locally distinguishes one neuron from another. The scheme we have is unbiased, so it doesn't bias towards either closing holes or opening holes. We get the optimal solution on pretty big systems, and we have heuristics beyond that. Learning this is luckily fast compared to neural networks, but as I pointed out, it still doesn't have the accuracy we need to really get the full circuits out.

I've been talking about the processing of truly 3D data. On the right-hand side you see Graham Knott's data, x-y and then x versus z, and I've tried to indicate two neurons, one from a red slice and one from an adjacent blue slice. I wanted to show that the volume of each neuron only overlaps with itself. If you have that kind of data, you can truly do the segmentation in 3D. If you work with serial-section imaging, the x-versus-z plot looks like this (those are actually the same neurons), but now this red neuron overlaps in the next slice with both this one and that one. You can argue that it overlaps much more with the top one than with the bottom one, and indeed this can be exploited and worked on, but it's a little less easy than working in three dimensions directly. I want to refer to a couple of papers that do this; they all produce multiple segmentations. Here is an excerpt from the Jan Funke paper, where he showed that in these three regions, different thresholds, different parameters of a segmentation method, are best to produce the proper segmentation. All of these possible segmentations are arranged in a tree; you do this for each slice separately, and then you simultaneously select, within each slice, the best segmentation the tree can represent, and connect it with elements from the next slice. That works reasonably well. Reasonably well, but again not with the accuracy you need to get full, dense circuits correctly.

I'll be very brief on the proofreading, even though you can argue it's the most vital part these days. When computer-vision people started in this business seven years ago, they were ambitious and wanted to solve the whole thing automatically, and we still do, but it turned out not to be as easy as hoped. This is why today most work is semi-manual: you start with some conservative automatic result and then try to fix it. There you have to distinguish whether you just want to skeletonize or whether you want a space-filling segmentation. KNOSSOS and CATMAID, for example, let you trace the neurons manually. Then there are semi-manual schemes; I have two in gray which I think will be made available to the public. In all of these schemes you start with an over-segmentation and let the user merge regions; and I think Raveler can also split regions, if there was an under-segmentation in the input you received.

I'll show you Carving, which uses one particular strategy. So I'm adding some data; this is again Graham Knott's beautiful data, smoothed a little in all three dimensions, and I can now compute a watershed on it. I'm simply using the raw data as is, and I'm telling the algorithm that boundaries in this data are dark, because the algorithm needs to know what boundaries are: are they bright, or dark, or what?
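This seeding-and-growing can be sketched with scikit-image's seeded watershed. The seed positions and the smoothing are assumptions for illustration; the real Carving workflow gains its interactive speed by operating on a precomputed supervoxel graph rather than on raw voxels, as described next.

```python
# Minimal sketch of the carving idea: user seeds (object vs. background)
# grown by a seeded watershed, with dark voxels acting as barriers.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

raw = np.random.rand(64, 64, 64).astype(np.float32)   # stand-in EM volume
smoothed = ndi.gaussian_filter(raw, sigma=1.0)

# Boundaries are dark in this data, so invert: the watershed floods low
# values first and is held back by high values (the former dark membranes).
height = smoothed.max() - smoothed

seeds = np.zeros(raw.shape, dtype=np.int32)
seeds[32, 32, 32] = 1        # seed inside the object of interest
seeds[2, 2, 2] = 2           # background seed
segmentation = watershed(height, markers=seeds)
```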
You can do this on the raw data, or you can do it on probability maps. So this now computes a watershed segmentation. Rather than work on the million voxels the raw 3D data set had initially, we now work only on a region-adjacency graph of 60,000 supervoxels, and all subsequent operations happen on this graph. I can, for example, say "make this green", and I have a special background class that grows a bit more aggressively than the rest. Then I get out a segmentation which, as you see, has been bleeding out here, across this membrane; there was a weak point somewhere in the membrane. So I can say this is another class, press segment, and you see that the response times are now very, very fast, because we work only on this supervoxel graph rather than on each and every voxel. Then I can look at the result in 3D and keep working on it, correcting, and so on. The strong point of this algorithm is that, because it relies fully on human input, you can segment anything. However, how much effort it takes depends strongly on how good the boundaries are, because this always grows the seeds until they fill the entire volume, and if your boundaries are weak, a seed will bleed through places you didn't have in mind. Also, if you work in 3D it's important to have an error indicator, because you don't want to keep browsing back and forth in the data to spot the error. I spotted these errors manually just now, but at some point that becomes very hard, certainly in very large data sets.

A few things I haven't talked about: special features that people develop, like Radon-like features or ray features. You can do more work to get a smarter classifier, for example auto-context, stacks of classifiers where the next classifier works on the output of the previous one, which has some relation to the deep learning in neural networks. Or you can have a bias towards closing boundaries: I mentioned that we must avoid these short circuits, and Verena Kaynig, for example, has a prior in her conditional random field that tends to close such small holes.

Okay. As I said, it's around 50 person-years of work on this slide alone. I've tried to give a realistic picture; I hope I haven't sounded too pessimistic or too optimistic, I've tried to render my true feelings here.

I'm going to talk about the future now. As always, and this quote has been attributed to many people: prediction is hard, especially about the future. That's disclaimer number one. Disclaimer number two: people always get the difficulty of computer-vision questions wrong. One reason is that we ourselves do it so well. We have a few million years of evolution to our advantage, and the moment you open your eyes you cannot help parsing everything you see: this is the door, this is the board, this is the seat. It works so seemingly effortlessly that we don't realize how hard it is, and even people who should know better keep making this mistake. One of the most blatant examples, if you want to call it that, is the 1966 summer project of Seymour Papert, who proposed to solve a big part of the computer-vision problem with a couple of visiting students over the summer. This is like the Bill Gates statement about RAM:
the same thing translated to computer vision, somehow.

So those are the disclaimers. Having said that, I am still optimistic, and I think that one of the groups working on this will be able to report complete success a few years from now. Why am I so optimistic? Because the number of categories is small. Humans easily distinguish on the order of 10,000 different categories, whereas in this data, if I look locally, the categories I need for segmentation number maybe ten: I have intracellular, extracellular; I have vesicles, mitochondria, synapses, ER, myelin. It's of the order of ten.

Secondly, as computer-vision people we always hate the noise in the data, and we ask for better registration and so on; but still, the variability we get in these very nice images is much, much smaller than what you see in the real world. In this sense it's easier than the generic computer-vision problem.

Also, some of the errors that we make are still of the embarrassing kind. I've shown you some of them. Here, it's pretty obvious to a human that this should be connected: the boundary evidence on this membrane is so much stronger than the boundary evidence on that one, and apparently we don't yet have a feature that captures these relations between the different boundaries associated with a single object. So some of our errors are still sufficiently stupid that we should be able to fix them. We sometimes get the shape very wrong, or the size very wrong; if we produce something tiny, we should know that cannot be correct.

And then, for humans it's often very little context that is required to make the correct decision. I'm showing you the EyeWire game from the Seung lab. I've tried and traced this: I was given the blue thing, and I've added a couple of supervoxels, colored in green, because this builds on the output of a neural network trained with MALIS. The program then tells me, based on the output of other players, where I was wrong: I was wrong here, and I was wrong there. I want to point out two things. One is that if I as a human had done the blue and the green thing, I wouldn't exactly know where to look for something I might have missed. I would look here, I would look there.
Perhaps I would also look here, because there seems to be some kind of kink. And secondly, if you look closely at where I lost one of these objects, or didn't follow up with this very regular structure here: that, too, should be possible to capture with features, to say that this is a mistake that must be fixed. Okay, so those are my reasons for optimism, which brings me to my wrap-up.

We develop these things open source. We use VIGRA for the image processing and basic machine learning. I want to advertise OpenGM, which is a very nice library for combinatorial optimization; this slide is not really meant to be readable, but OpenGM either implements or wraps all the state-of-the-art inference methods out there. And we use ilastik for the interactive learning, for batch prediction, for seeded segmentation, and, in the new version, for object classification.

Summarizing: I've shown you this canonical pipeline, from raw data to supervoxels to a merging step. Right now, even a single human is still better than us, but redundant tracing, that is, many humans, is required to produce fully accurate results. And for us on the computer-vision side, the sheer amount of data makes everything hard: even just adding numbers becomes hard when you deal with giga- and teravoxels.

"Tomorrow" means a few years down the road, I would say a single handful of years, and I think we will then have automated tracing in high-quality data. Here are some of the directions that we and others are exploring. What's limiting us today? Brains, of course, as always, but also programmers, I think. It's not in the interest of the ordinary PhD student, and they are really the ones who push science, to develop a huge infrastructure; they want to get their paper published. But you want the next PhD student to be able to build on the work of the previous one, and for that we really need programmers. (Those are the "brains" I referred to; programmers are also brains, but it's a different kind of task.) And then, the people who have a great affinity for biology are, with some happy exceptions, not always the greatest experts in computer vision; and the greatest experts in computer vision typically have no incentive, after having published a paper, to go and spend the next two years making a nice piece of software that really gives this tool to biologists. In the computer-vision community there's no incentive or funding for this program-writing step, and biologists on the other hand say: what you're doing is not biology, so we don't fund software development.
So that's a practical problem. And then, a few years ago the lack of data was a big problem; thanks to initiatives like the Open Connectome project and the INCF activities, I think data is fairly accessible now. The next bottleneck is the annotations, places where humans have said "this is connected" or "not connected". We need these both for training the methods, especially the powerful ones with many parameters, and to gauge performance: to go beyond the pretty pictures and measure progress, both within a lab and between labs. So I hope that, at one of the next INCF congresses a few years from now, there will be someone on stage with a "mission accomplished" talk. And in the meantime, I thank you.

[Question] Thank you, spectacular talk. I think it's incredibly impressive the way in which you are able to address these questions, which previously I had always considered a kind of science-fiction challenge. Now, having said that, it's interesting that when you zoom down to the EM level, you're always very much in the weeds: you're looking at intrinsic connectivity within very small volumes of tissue. But it's one of the characteristics of neurons that they can be up to meters in length in terms of their axons, and when people consider connectivity in terms of the organization of systems, they're usually talking about how these long-range connections transform information from one area to another. So: in terms of the partitioning problem, and figuring out how a chunk of tissue is wired up, I can see that's very much in your wheelhouse, and that's what you're aiming for. Could you comment on how this work will inform questions of neural connectivity at the level of populations and ensembles of cells, possibly millions, probably hundreds or tens of thousands of cells, communicating with each other over large distances?

[Answer] I think this is going to happen, or I believe it is going to happen. When I say that I hope someone will be able to claim mission accomplished in a few years, I'm referring to FIB-SEM data, and I've made a slide here which shows that no method is perfect. FIB-SEM currently gives the nicest images, but it covers much too small an area to address the questions you have raised. Even so, it allows interesting biology, and I think having dense FIB-SEM segmentations will offer new insights biologically. In the long run we need to go to one of the other methods, and the serial block-face people and the serial-sectioning people are working seriously towards imaging entire brain regions; I think we will have entire brains not too long from now. The much lower z-resolution obviously makes the automated tracing harder, but once we can do it for FIB-SEM, I am hopeful that at a later stage we, or the community, will be able to do it on these kinds of data sets as well, and they are definitely going in the direction of these long-range connections. Imaging-wise, I think the kind of lengths you refer to will be available a few years from now.

[Chair] Okay. Maybe one short question, very short.
[Question] Mine is a very practical question. You showed us that your current segmentation pipeline makes mistakes mainly at the very fine little branches. If it does not make errors at the larger branches, and the answer seems to be yes, it does not make errors there, does that mean you only need higher resolution and the problem is solved?

[Answer] If we had even higher resolution, isotropically... You know, it's always a meeting in the middle: you have the computer-vision people and the imaging people, and we are walking towards each other. The images are becoming better, we are performing better, and higher resolution would help. Unless we get the kind of images where we can just threshold and take connected components, it won't solve the problem by itself. But it would help to get higher resolution in three dimensions, at low noise levels and with high contrast. Yes, that would help.

[Question] But you still have some errors in the larger profiles as well, or no?

[Answer] No, I think our errors are really in the small parts now.