comment, and then we will have a session of questions and answers. OK, good. So, I talk with the title "In Defense of the Scientific Integrity of Image Data and Analysis." I am often asked to talk about image data ethics, and when I was asked to talk about ethics I started searching for what the situation is, and the first thing, I think the most important thing, is that it is not really about ethics. That is the conclusion, but let me just follow it step by step: what the situation is, and what the problem is. This is a slide about the general scheme of how we do image analysis in the life sciences: we do experiments, we do microscopy and capture image data, we do image processing and analysis, and we get results. What we know is that this image processing and analysis part is pretty much the bottleneck in this flow of work. This survey is already five years old, but I think the situation has not changed much. We asked people: which step in an imaging-based research project is the most difficult for you? There were three choices: the first is experiment, the second is image analysis, the third is microscopy. The green bar is image analysis, and you can see that among these three steps, image analysis is the most difficult one. Why is that? Today's talk proceeds like this: let's first think about why people find image analysis difficult. Many mistakes, or misconducts, in publications probably come out of this difficulty, and I think it is one of the most relevant causes of all those problems. Then we go over different cases of erosion of image data integrity, and then we look at the erosion of image
analysis integrity, which is slightly different from the data problem, and then we think about how we should deal with these problems. OK. First: why is image analysis in the life sciences difficult? There are three reasons that I will now try to explain. Number one: we love images, probably too much. We have a history, even before modern biology started, what we call natural history, and from that time biologists did a lot of sketches. This is one of those sketches, and it is beautiful. When we see images under the microscope we get a really strong impression of beauty, and these days we capture those things with a CCD camera. The thing is that we confuse this beauty with the scientific value the image has; somehow we cannot really separate those two things in a good way. But in fact, what we are doing with digital image data is capturing numerical information: we are measuring. This is an example image of a single cell, and as you see it has a lot of dots. If you zoom into the image you start seeing the pixels, all these squares, each one a single pixel, and underneath are in fact numbers. If you inspect the data, those are the measured values, as numbers. So while we are doing measurements, we are looking at these images and get distracted by the patterns. That, I think, is one of our difficulties: we cannot separate these two different impressions. Difficulty number two is about image analysis versus bioimage analysis. Image analysis is a term that has long been used in computational science; if you open the very famous
textbook by Gonzalez and Woods, Digital Image Processing, you can find the definition. In computational science, image analysis is "a process of discovering, finding, and understanding patterns that are relevant to the performance of an image-based task. One of the principal goals of image analysis by computer is to endow a machine with the capability to approximate, in some sense, a similar capability in human beings." That means that what image analysis tries to do is let the computer do something like human recognition; these days we call it artificial intelligence, but in any case, image analysis in computational science is about letting the computer mimic human recognition. On the other hand, what we try to do in the life sciences is a bit different. Here is the definition I wrote together with Sébastien Tosi in a book in 2016: in biology, image analysis is a process of identifying the spatiotemporal distribution and dynamics of biological components in images and measuring their characteristics, to study the underlying mechanisms in an unbiased way. So there is a contrast between what we want in the life sciences and what image analysis is intended to be in computational science, because in the life sciences we do not have to be bothered with similarity to human recognition. We rather want to get rid of such bias: we do not want human recognition, because we want measurements that are as objective as possible, and we rather think that human recognition is biased. So there are two different types of values here. We are trying to make them work together, but we have not yet resolved this difference in goals; trying to merge them is what bioimage analysis is. This is the second big difficulty. What I would say is that bioimage analysis is a new field in the life sciences, and the
way to teach and learn it is not established yet. We are trying to establish it now, but people still feel it can be learned through image analysis in computational science. Partially yes, but partially not: we have to merge these two fields, life sciences and computational sciences, and create a new field, bioimage analysis, that fulfills those new values. The third point is what I call the Lego block problem. Here are a lot of Legos, and this is in fact similar to the situation of the computational resources we have in bioimage analysis. All these tools are generally called "software tools," but that is too crude; we need a bit higher resolution, some categories within software. We have collections, we have components, we have workflows. The definitions are on the next slide, so let me go through them. Components are implementations of certain image processing and analysis algorithms, such as Gaussian blur. A workflow is a set of components assembled in a specific order to process image data and yield numerical parameters relevant to the study of a biological system, for solving a specific biological question. Workflow templates are a general form of workflow that lets us tune algorithms or swap some of the components; TrackMate is an example. Collections are packages bundling components with an interface or API for using them to construct workflows, and there are also software libraries, such as scikit-image in Python. If you go back to the schematic figure: we have a lot of these components bundled in a collection like ImageJ, which comes with something like 500 components, or in the case of Fiji 800 components, bundled when you download it. You get them as a collection, and from there you have to choose the right component for the right step in the workflow with which you want to explore
the biological image data, and put them in order so that you get some numbers, plots, or visualizations. The difficulty is this: the collection of components is like a bag of Legos. When you download those libraries or software, there is not much of a hint on how to put them together into a workflow; the difficulty is these red arrows right here. In the case of Lego, when you open the box, you start with the instruction manual, and if you follow it page by page, eventually you get a very nice Millennium Falcon or X-wing. But in the case of image analysis, we have all these components without much of an instruction manual. The difficulty is that we have a bag of Legos without the instruction manual, and we need to construct a nice workflow out of those components. So, I have listed three difficulties. Number one: the visual impact hinders the objective examination of images as numbers. Number two: the fact that bioimage analysis is a new field is not well recognized. Number three: the resources are Lego blocks without an instruction manual. Are there any efforts to fight against these difficulties? One, of course, is that we created NEUBIAS, the Network of European BioImage Analysts, which now has 300 members. We are organizing trainings, we are trying to publish, we are making web platforms for bioimage analysis, we are trying to develop a career path, and we are organizing conferences. The funding runs until September 2020, but we are trying to extend this
activity, so that we can take even more organized action to fight against those difficulties. To explain why we started this, here is a simple figure using sushi making as a metaphor. If you seriously try to make sushi by yourself, you go to a knife shop in Tokyo and try to find out which knife you should buy, and you are shocked: there are a lot of different types of knives, and you do not know which one to buy. But of course, the professional sushi chefs know which knife should be used for which fish at which part of the cooking procedure. For the set of knives they have, they know exactly at which step which knife should be used; they sharpen them, make very nice slices of fish, and end up with nice sushi. Then, behind these knives, there are bladesmiths, who are concerned with creating robust, sharp, very good knives. Bladesmiths do not really care about the cooking procedure itself, but they are deeply concerned with each of these knives of different types. This is analogous to the different roles of professionals in bioimage analysis: the bladesmiths are like the developers and software engineers who create, maintain, and implement new algorithms as computational resources, and the bioimage analysts are the ones who actually use those knives, the software tools, to create workflows that end up as nice sushi: good results. If we go back to the schematic figure of these resources, the collections and components,
those are created and maintained by developers or mathematicians, who create each of the components, while bioimage analysts are concerned with the workflow. Developers care about efficiency (the higher the speed, the better), and generic implementation is important for a collection or component. Bioimage analysts, on the other hand, are more concerned with accuracy, with scientific adequacy. One unique aspect of workflow construction is that it is very specific: sometimes the workflow is not general at all, but tied pretty tightly to a very specific biological question. It does not have to be generic. But when I read papers about computational resources in the life sciences, they are mostly written in terms of how general and how efficient the tool is, whereas the bioimage analyst has a different type of interest, which is not generality. So we cannot sell our work the way developers do; that is a difficulty we have. Because there are no textbooks for bioimage analysts themselves, we are trying to make them: we already have two textbooks and are now creating the third. Both are freely downloadable, so you can access these short URLs, download them, and use them for the creation of your own specific bioimage analysis workflows. Another piece of news is that an F1000Research gateway for NEUBIAS is starting up. We do not have content yet, but we are trying to link training materials and other resources there, so that people have a single portal to access all those bioimage analysis resources. OK, now let me come back to the problem. There are many pending issues in bioimage
analysis, but one big one is the integrity crisis. This is a quote from a 2013 paper in The Plant Cell: the journal performed a detailed study over the past decade and found that about 10% of articles accepted for publication included inappropriate manipulation of image data, and that a surprisingly large number of the authors appeared unaware that they had handled image data inappropriately. So you probably notice already that there are problems with how image data are presented and analyzed, and the situation has not really changed; it might be decreasing, but here is the situation in 2016. This is a 2016 Nature news piece about a manual inspection of 20,000 biological papers: a manual survey, checking papers by eye, which found that 4% of papers contained inappropriately duplicated images. This is only one specific type of mistake or misconduct; duplication means copying and pasting an image from one place to another in the same paper, or across two different papers from the same lab, and so on. This 4% is already a lot, but if you include other mistakes or misconducts it is probably much higher. Of course, as you may know, these problems cause a lot of damage: they cause career disasters, the efforts of others building on faked or mistaken results become useless, and when such a problem surfaces there is often a committee organized for verification, which is a lot of work. All of this is the enemy of science. There are two different types of problems in integrity: one is image data and the other is image analysis. We will go over both, starting with the cases of image data problems
and then the cases of image analysis problems. OK, let's look at the erosion of image data integrity. It started in 2004 with an article by Mike Rossner and Kenneth Yamada in the Journal of Cell Biology, titled "What's in a picture? The temptation of image manipulation." A lot of different cases were presented in that article, showing the problems with image data. Just one year after that, there was a case that became really famous, the stem cell cloning case. There were many different types of manipulation; this is already 15 years ago, and seen from now I think it is pretty simple. For example, this panel is rotated and copy-pasted here, or this one and this one are in fact the same image, cropped into two different parts and used as different conditions, and so on. In 2015 the EMBO Journal editor Bernd Pulverer published a study of the cases they actually encountered with submitted papers, categorizing them into three levels: a light one, a slightly serious one, and a really serious one, with different actions depending on the level. In the easy case they allow revision and do not report to the institution; that was 12 percent. Then they may allow revision and report to the institution, depending on how the authors interact; that was 8 percent. The very serious cases, rejection and a report, apply when they find image modification with digital splicing, cloning, insertion, or selective deletion; that was less than 0.5 percent. So in total, about 20 percent of submitted papers had some problem. Of course the serious ones are much rarer, but the troubling part of that 20 percent is that many of the problems arose even
without the authors recognizing it themselves, until they were told. Let's look at the different levels. In the very easy case, there can be a confusion in reuse; this is one of the cases you see often: a beta-actin loading control copy-pasted across different conditions, reused. This is of course not really good. Cloning the same image and using it for different conditions, well, sometimes a lab even recommends something like it, but in a case like this it is not good. The second example here: this image and this image are supposed to be different conditions, but the same image is actually used. What I read about what happened is that the editor asked the authors, and they replied that they were very busy, and while making the figures they mixed up the same image for different conditions and copy-pasted it. That could happen, of course, but we do not know. This one is from the Rossner and Yamada paper of 2004: this is a figure in a paper, but when you increase the contrast or change the lookup table, you see that these cells were actually copied from a different place and pasted in, so that the field somehow contains more cells. Now a bit more serious, level two: data beautification, which can actually change the results. This is the case of a Science paper from 2015, retracted in 2017. It is a rather complex figure, so let me explain it in a bit of detail. The published result is on the left side, and you have three different conditions: a control, an siRNA knockdown, and a rescue condition in the rightmost column, then
this is the siRNA condition and there is the control condition. You have the G1 and G2 cell cycle phases, and three different rows: this row is the DNA signal, this is the Rad21 signal, and this is the Aurora B signal. What I am trying to say is this: when you look at the signal difference between G1 and G2 phase, you see that Rad21 has a stronger intensity in G2 phase in the control, whereas in the siRNA-treated cells the signal does not change; there is no intensification. And with the rescue experiment you see the intensification again. This was questioned, and after the intensity contrast was properly adjusted, you see that there is actually not much difference; you can see it from the original data treated with the right contrast enhancement. Of course, with ImageJ, or really any image processing and analysis software, a single nucleus like this one (an example unrelated to the previous slide) can be intensified, or made dimmer, just by adjusting brightness and contrast. The same kind of thing happened in another figure from the same paper. This is also two conditions: not treated with cold shock, and after cold treatment, labeled minus and plus. After the cold shock you see that the green signal is gone; this is a histone kinase distribution, and around this red dot the green signal is no longer there after cold treatment. The top rows show a rescue of the mutation, where you do not see this loss of the green signal. But again this was questioned, and when the original data was recovered and
the same kind of contrast adjustment was done, you see that the loss is not there; the signal just stays. Again, you can do this with any image: you can freely change the contrast of the red signal against the green signal and make one look dim and the other brighter. Of course, this may not have been manual; maybe the authors just pushed the auto button, we do not know, and depending on the situation you might get such a different contrast from the auto button. There is no way to say whether they did it intentionally or just did not know, but the fact is that they produced wrong results and presented them. Level three: there can be manipulation that is very much intended, in a way that supports the authors' own conjecture. For example, this is the deletion of a band from a gel: this is the original image, and here you do not see the band anymore; it was deleted. This kind of case I see pretty often; I call them "blot DJs" of gels. Here is another example of blot DJing: when you see this figure in the paper you do not feel anything is wrong, but some people found that this part is actually a copy of this part. People are getting more skilled with such duplications: in this case the band was cropped, flipped horizontally, and pasted here. And people are getting cleverer in other ways too. This one is not a gel but cell culture: an experiment with different levels of radiation exposure, 0, 2, 4, and 8 Gray, so there should be different plates, but you see that this part is a copy of this part, and this part too. It seems that they moved the same plate around, took pictures, and named them with different exposure levels.
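Duplications like these can often be caught computationally. As a rough sketch of the idea (not any published tool's method), the following assumes two grayscale figure panels given as NumPy arrays and checks whether any small patch of one reappears pixel-for-pixel in the other; real detectors also handle rotations, flips, and rescaling:

```python
import numpy as np

def find_duplicated_patches(a, b, patch=8):
    """Report patches of panel `a` that reappear verbatim in panel `b`.

    Exact matching via a hash table of raw patch bytes; flat
    (zero-variance) patches are skipped to avoid trivial background hits.
    Returns a list of ((y_a, x_a), (y_b, x_b)) coordinate pairs.
    """
    seen = {}
    for y in range(b.shape[0] - patch + 1):
        for x in range(b.shape[1] - patch + 1):
            seen.setdefault(b[y:y + patch, x:x + patch].tobytes(), (y, x))
    hits = []
    for y in range(a.shape[0] - patch + 1):
        for x in range(a.shape[1] - patch + 1):
            p = a[y:y + patch, x:x + patch]
            if p.std() == 0:          # ignore featureless background
                continue
            if p.tobytes() in seen:
                hits.append(((y, x), seen[p.tobytes()]))
    return hits

# two noisy "panels"; plant an 8x8 region of panel A inside panel B
rng = np.random.default_rng(0)
panel_a = rng.integers(0, 255, (32, 32), dtype=np.uint8)
panel_b = rng.integers(0, 255, (32, 32), dtype=np.uint8)
panel_b[4:12, 4:12] = panel_a[10:18, 10:18]

hits = find_duplicated_patches(panel_a, panel_b)
```

With noisy image data, a verbatim 8x8 match is vanishingly unlikely by chance, so any hit points at a copied region; here `hits` recovers the planted pair.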
So this is a kind of physical duplication: they are not copying and pasting, but they are using the same plate for different things and claiming the images come from different experiments. Duplication can also be done as I already mentioned with the stem cell case: you crop different parts of a single image and paste them in as different conditions. A more severe type, and one that is very hard to detect, is addition and insertion: you start drawing. This is from one paper in 2015, a yeast experiment trying to study the mechanism of cell wall formation, and this is actually a drawing; the cell wall formation was drawn in. This is a really tragic case, because it only came out after a PhD student at the University of Lausanne could not reproduce the data for a manuscript she was preparing. She contacted the former student, who admitted he had fabricated the data, along with two figures in a JCB paper. They could not find it out until they really tried to reproduce the same results and failed, and the person actually said that they had drawn the signal. Such editing can of course also be deletion, and adding dots happens too. In this case it is immunogold data: in the original image there are very low-contrast immunogold dots, and in the submitted paper this one is gone, and the dots surrounding the structure now have more contrast. What I guess is that this is a docking protein, and they wanted to enhance the contrast to insist that the dots are on the surface of this vesicle. Of course, they did not invent the dots entirely; they are faintly there,
you can see that they are there, and then they drew over them; but deleting dots, or marking them up in black, is of course very bad manipulation. OK, so we just went through different cases of misconduct, or mistakes, or maybe fabrication. There are some tools appearing, for example InspectJ, an ImageJ tool for detecting such cosmetic surgery. It lets you find, for example, a part of an image copied and pasted elsewhere: by applying a different lookup table you can visualize the manipulations, like this. There are also studies going on to automate the detection of fraud. Here is one example from Science and Engineering Ethics, 2017, a kind of simulation of manipulations: you see that this region was copied and pasted multiple times to erase some signal, and the algorithm recovers the multiple steps of copy-pasting from the data, with the color coding showing the different steps. You can see that they first copy-pasted this part here, yellow to yellow, and afterwards the larger area was copy-pasted to cover up whatever signal was behind. There are also detection services; this is from Nature news in 2017: one company already offers an automated process that costs 10 to 15 per paper, and I recently saw another Nature news item about a company starting in Germany that does this much more systematically. OK. So far that was copy-pasting and drawing: problems and faking in the handling of image data. But there are also problems with image analysis integrity, which has not been
really discussed much. The erosion of image analysis integrity can show up, for example, when you compare patterns or intensities between conditions; there can be issues with thresholding, especially with spot intensity measurements, with bit-depth treatment, and so on. I will just try to show what kinds of wrong image analysis can cause what types of problems. For image segmentation, you often do image thresholding. For example, here is an artificial image with an intensity gradient. You set a rule: if a pixel intensity is at or above a certain value, it is 1, otherwise 0. Given this pixel intensity distribution, if you say that 100 or above is signal and everything below is not, then this part becomes white and this part black, and that is how you draw the boundary of the signal. That is often the case. And sometimes, something I have experienced several times when people ask me for help: say the goal of the analysis is to measure the volume of detected spots. You have a 3D image dataset and want to measure the volume of three-dimensional spots. When you explore what is happening in detail, in the XY image the spot looks like this, and in XZ it looks like this. What you do is threshold the voxels, and in ImageJ you would use the 3D Object Counter to measure the volume; say you get 0.01155 cubic micrometers, and you collect a lot of spots this way. But if you look at it, you immediately recognize that what you are actually looking at is the point spread function.
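The thresholding rule just described can be written directly. A minimal sketch in NumPy, where `img` stands in for any measured pixel array and the cutoff of 100 is the example value from above:

```python
import numpy as np

def threshold(img, cutoff=100):
    """The rule above: intensity at or above `cutoff` is signal (1), else background (0)."""
    return (img >= cutoff).astype(np.uint8)

# a synthetic image with a left-to-right intensity gradient, as in the example
img = np.tile(np.arange(200, dtype=np.uint8), (50, 1))
mask = threshold(img)
# columns 0-99 end up black (0) and columns 100-199 white (1):
# the "boundary" of the signal is wherever the gradient crosses the cutoff
```

This is also where the caveat about diffraction-limited spots bites: the set of voxels above the cutoff traces the point spread function, not the physical size of the object.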
That is, especially if these are diffraction-limited spots, you are merely measuring the volume of the point spread function at a certain threshold. The scary part is that if you do not know this, you measure it anyway; you have to be careful about what is diffraction-limited and so on. Here is the second case, which you will also sometimes see. Say the goal of the analysis is automatic measurement of DNA content, and you have a DAPI signal. There are a lot of nuclei; you threshold them and get a nicely segmented image, and in ImageJ you use Analyze Particles with the intensity measurements enabled. It does connected component analysis, you get all the labeled objects, and for each of them you get the integrated density and so on. This looks OK, but in fact it is not really right, because the segmented area is affected by the intensity: in the segmentation you use the intensity as the definition of the area, and then you use that area to measure the intensity, so there is a certain tautology there. We can simulate this situation. Here is a simulated image with circles of exactly the same size; only the intensity differs, one darker and one brighter, and Gaussian blur is added to mimic a realistic situation. If you threshold this with the Otsu algorithm, the blue circle is the original boundary, identical for both, but after the Gaussian blurring the segmentation result for the dim spot, the yellow circle, is a bit smaller than the circle you get for the brighter spot.
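A minimal version of this simulation, in NumPy only: a crude box-filter cascade stands in for the Gaussian blur, and a fixed global cutoff stands in for the Otsu threshold (any single global cutoff applied to blurred data shows the same bias):

```python
import numpy as np

def blur(img, passes=4, k=9):
    """Repeated separable box filtering, a crude stand-in for Gaussian blur."""
    kern = np.ones(k) / k
    out = img.astype(float)
    for _ in range(passes):
        out = np.apply_along_axis(np.convolve, 0, out, kern, "same")
        out = np.apply_along_axis(np.convolve, 1, out, kern, "same")
    return out

# two discs of identical radius; the right one is 66.7% brighter (100 vs 60)
yy, xx = np.mgrid[:120, :240]
img = np.zeros((120, 240))
img[(yy - 60) ** 2 + (xx - 60) ** 2 <= 30 ** 2] = 60
img[(yy - 60) ** 2 + (xx - 180) ** 2 <= 30 ** 2] = 100
img = blur(img)

# one global cutoff for the whole image, as any automatic method would give
mask = img > 30
dim_area, bright_area = mask[:, :120].sum(), mask[:, 120:].sum()
dim_mean = img[:, :120][mask[:, :120]].mean()
bright_mean = img[:, 120:][mask[:, 120:]].mean()
# the brighter disc segments to a larger area, and the measured intensity
# ratio bright_mean / dim_mean falls short of the true 1.667x difference
```

The bias arises because the same absolute cutoff sits at 50% of the dim disc's peak but only 30% of the bright disc's peak, so the two masks capture different fractions of their blurred objects.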
In fact, in this simulation the intensity of the signal in the brighter circle should be 66.7 percent higher, but when you measure it using the Otsu threshold, what you get is only 41.1 percent higher. So you would see a pretty significant bias in the comparison results. OK, I will not go into detail on the bit-depth conversion problem, but when you have an 8-bit image and a 16-bit image and you downscale from 16-bit to 8-bit, you lose a lot of information, even though signal-wise it looks pretty similar. This is an example image from ImageJ: if you take the pixel intensity profile, you see a peak here and a peak here, and the distributions look the same, but if you look at the y-axis scaling it is quite different. The problem is that when there is automatic scaling in the workflow, you might compare a brighter and a darker image. Say you have a control experiment and an inhibitor experiment which is supposed to have half the intensity, based on a biochemical experiment done independently of the imaging. You really do have half the intensity, but when you capture those images, if there is an automatic scaling during bit-depth conversion in the workflow, you might end up with images of the same apparent intensity and be puzzled: why do I get contradictory results between biochemistry and image analysis? You have to know these things. In any case: what should we do to avoid all these problems? One approach is to create guidelines and rules; the other is to make everything reproducible, so that it can be checked afterwards.
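The auto-scaling trap is easy to demonstrate. A sketch, again assuming nothing beyond NumPy; `to_8bit_autoscale` is a hypothetical stand-in for any conversion step that stretches each image to its own min/max:

```python
import numpy as np

def to_8bit_autoscale(img16):
    """Naive 16-bit -> 8-bit conversion with per-image min-max scaling,
    the default behavior of many viewers and conversion steps."""
    img16 = img16.astype(float)
    lo, hi = img16.min(), img16.max()
    return np.round((img16 - lo) / (hi - lo) * 255).astype(np.uint8)

rng = np.random.default_rng(1)
# even-valued raw data so that halving is exact
control = (rng.integers(0, 20000, (64, 64)) * 2).astype(np.uint16)
inhibited = (control // 2).astype(np.uint16)   # truly half the signal

a = to_8bit_autoscale(control)
b = to_8bit_autoscale(inhibited)
# after auto-scaling, the two 8-bit images are pixel-for-pixel identical,
# even though the inhibited image had exactly half the raw intensity
```

The twofold biochemical difference is invisible after conversion, which is exactly the puzzle described above.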
Guidelines, in fact, have been discussed for years. One famous statement is Cromey's guideline that digital images are data and should be treated as such, published in Methods in Molecular Biology around 2014. The problem is that hard-and-fast rules that apply to every image-forming discipline are difficult to create; the National Academy of Sciences found this out when they were unable to agree on guidelines. If we think about it, it is probably clear why: depending on what you want to know, there are things you can and cannot do, and it depends very much on the scientific goal of your analysis, so you cannot simply say no to everything; under some conditions an operation is allowed. Still, these guidelines contain a lot of rules; you can look at the paper, and I will pick out some of them. For example: manipulation of digital images should only be performed on a copy of the unprocessed image data file. This is because metadata comes with the image file and has to stay together with the original data, so that you don't lose that information. Or: digital images that will be compared to one another should be acquired under identical conditions, and any post-acquisition image processing should also be identical. We saw that case with the Science paper retraction, where contrast enhancement was applied differently to different conditions before the result was presented; you shouldn't do that. It goes on, and there are many more. You also have to be careful with resolution (this is the Nyquist sampling theorem), because capturing at a different resolution gives a different result. For image data people mostly discuss spatial resolution, but one important thing is the temporal resolution. If you watch this movie, which I got from YouTube, you see a strange helicopter that seems to fly without its rotor rotating. This happens because the video frame capture rate is synchronized with the frequency of the rotor rotation (not exactly the same rate, but synchronized with it), so of course the rotor looks like it has stopped. OK, that is an everyday example, but it is a problem in science too. This is a simulation of Brownian motion: the complete path the particle took looks like this, with a lot of wiggles; it starts here and ends up here. With the highest time resolution, the total path it takes is really this wiggly red line. But if you capture the same Brownian motion with much less time resolution, only this time point and this time point, the total distance traveled is estimated as a straight line from here to here. So if you try to get the speed of the particle's movement, the total distance traveled per unit time is much longer with high time resolution than when you take only those two time points; it becomes much, much faster with higher time resolution. The speed measurement depends on the time resolution, so you have to be careful about these things. And of course there is rotation of the image, or trying to increase the resolution by image processing: you get different pixel values. This is from Robert Haase, who will be talking next week or so. I saw his Twitter entry: he rotated an image, and you can see that degradation happens; if you rotate a second time based on the already rotated image, and continue, you accumulate a lot of degradation like this.
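The dependence of the apparent speed on time resolution can be sketched in a few lines; this is a plain random-walk simulation with made-up parameters, not the exact one from the slides.

```python
import math, random

random.seed(0)

# Simulate a 2D random walk at fine time resolution dt, then estimate the
# speed from the very same trajectory sampled at coarser intervals.
dt, n_steps, D = 0.01, 10000, 1.0
s = math.sqrt(2 * D * dt)          # per-axis step size for diffusion coeff D
x = y = 0.0
path = [(0.0, 0.0)]
for _ in range(n_steps):
    x += random.gauss(0, s)
    y += random.gauss(0, s)
    path.append((x, y))

def mean_speed(path, stride, dt):
    # Path length at a given sampling stride, divided by the elapsed time.
    pts = path[::stride]
    length = sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))
    return length / (dt * stride * (len(pts) - 1))

for stride in (1, 10, 100, 1000):
    v = mean_speed(path, stride, dt)
    print(f"sampling every {stride:4d} frames -> apparent speed {v:7.2f}")
```

The same trajectory yields an apparent speed that shrinks as the sampling gets coarser, because the straight lines between samples miss all the wiggles in between.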
So what I see now is that there could be two types of rules. One is that, like Cromey's guidelines, we try to restrict data handling and analysis behavior; say we would have bioimage analysis regulations. The other way is to let people do anything, with no guidelines at all, but ask them to report everything: report the data analysis so that others can reproduce it. Regarding this second approach, what I often see in the methods sections of papers is methods that are unreproducible. Ten years ago, people would write only "we used ImageJ for image analysis" as the whole image analysis section. We kept repeating that this is wrong behavior, and people started to write more. This example looks like it contains a lot of information about image analysis, but look at the details. Here it says the image was deconvolved with AutoQuant software (Media Cybernetics), and that deconvolution was performed to reduce noise and improve resolution. OK, that's good, but there are various algorithms for deconvolution; which one was used? We already cannot reproduce this method. Here, colocalization analysis was performed with MetaMorph software and ImageJ, with 0.4 micrometers used as the upper limit for the distance between centroids. But there are many colocalization plugins in ImageJ; which one did they use? We don't know, so we cannot verify the results. The same goes for generating the graphs: was it done manually? There is no indication of how they did it. So the methods look pretty detailed, yet they are still not enough. What we propose instead is to make the data handling completely reproducible, which is not so difficult. For example, this is a sample image from ImageJ, and we are making a panel of these two embryos out of this sample image.
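One way to make the report-everything requirement cheap is to let the analysis script itself carry the methods record. Here is a minimal, hypothetical sketch: the file name, parameter names, and the toy analysis are invented for illustration, but the pattern (all parameters in one structure, written out next to the results) is the point.

```python
import json, statistics

# Hypothetical mini-workflow: every parameter lives in one dict, and the same
# dict is written out next to the results, so the script doubles as the
# methods documentation. File and parameter names are made up.
params = {
    "input": "embryos_sample.tif",
    "threshold_value": 30,
    "min_particle_area_px": 5,
}

def analyse(pixel_values, p):
    # Toy analysis: count foreground pixels above the recorded threshold.
    mask = [v > p["threshold_value"] for v in pixel_values]
    return {"n_foreground_px": sum(mask),
            "mean_intensity": statistics.fmean(pixel_values)}

results = analyse([12, 45, 99, 30, 7, 88], params)
record = {"parameters": params, "results": results}
print(json.dumps(record, indent=2))  # archive this next to the figure
```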
Creating figures in this way can be completely scripted. This is just data handling, but the same is possible for every type of image analysis: as long as the results can be reproduced from the source data by anyone else, the image analysis procedure can be evaluated. So we do not ask people to refrain from this or that; instead we ask them to write down everything they did, and then mistakes and misconduct can be discovered. Writing everything as plain text is probably too much, so instead you can use a command recorder or something similar and submit the record when you submit your paper, or you can write your own scripts and submit those. Normally you think of computer programming as something specific to automation, but it is actually the best documentation of your methods: computer programs are not only for automation but for documentation. The important things are open source and data archiving, so that we have access to the data that was analyzed; if the source code is there as well, the analysis is completely reproducible. Here is one slide showing how we can submit reproducible bioimage analysis workflow elements: the workflow code can go on GitHub and Zenodo, minted with a digital object identifier (DOI), and it can be a very short script; the image data can be uploaded to a public server where it is accessible; and the two need to be associated by a short text explaining how to run the code, with which data, and so on. OK, so I come to the conclusion now. There are two types of misconduct in image data handling and analysis. One is image analysis with marginal knowledge: we see cases where people simply didn't know and made a mistake. Then there are the rare cases of the guys who sold their soul
to the devil at the crossroads. You might not know this expression, but among the blues musicians of the Deep South there is a legend: when you want to become a real genius of blues guitar, you go and wait at the crossroads one day, the devil comes down and asks, "Will you give me your soul? Then I will give you the blues." You may know of Robert Johnson, the famous blues musician, who was said to have sold his soul to the devil, because he was that good, I guess. So, those are the two types. Now, this mishandling of data is actually not limited to image data. This is the first page of Mendel's paper; you probably know Mendel even if you are a computational scientist. It is his manuscript about heredity: he counted the numbers of seeds with different shapes and colors, and you see that one plant has so many round-shaped seeds, so many wrinkle-shaped seeds, and so on. This is the famous observation made by Mendel, but later, in the 1930s, the famous statistician Fisher went through his data and said he must have beautified it. We don't know whether that is right or not, but according to that statistical analysis of the results, outliers look as if they were removed from the table. What I am trying to say is that this is not unique to image analysis. Say we have this kind of counting: the original data is 144 cases for the control and 155 for the treated, and so on; maybe it is an inhibitor experiment, and you want the difference to be two-fold. So you start saying: OK, I will decrease this number to 99 and change this one to 210, and then you submit the paper. Is that bad? Of course. But this is the same as enhancing the contrast. Say this is one pixel and that is a second pixel; the first has a value of 144 and the second 157, and if you adjust them with the contrast-enhancement interface, you get a higher contrast, like this: exactly the same operation. And if you want to remove an outlier, you can apply a Gaussian filter or a median filter, and the outlier is gone. So the problem is not specific to image processing and analysis, but two things should be avoided. The first is image analysis with marginal knowledge: if you have not studied image processing and analysis, don't do image analysis. Of course people want to do it anyway, so what you need to do is this: even if you don't know anything, write down everything you did in the methods, and others, especially the reviewers of the paper or your lab mates, can evaluate what you did. As for the bad guys, the ones who sold their soul at the crossroads, it is rather that reproducibility protects us from them: because everything has to be documented, there is theoretically no room to do something wrong. OK, this is the third slide from the end. We often say this is an ethical issue, but I think it is really about the tradition of scientific methods. Since modern science started in the 17th century, we have tried to describe methods as explicitly and clearly as possible so that other people can follow them, and we are losing this scientific tradition in the midst of the complication of computational methodologies.
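Coming back to the contrast-slider analogy: the 144-versus-157 example can be written out directly as a linear window/level stretch, with a made-up window for illustration.

```python
def stretch(v, lo, hi):
    # Linear contrast enhancement: map [lo, hi] onto the full 8-bit range,
    # clipping values outside it. This is the "adjust contrast" slider.
    v = min(max(v, lo), hi)
    return round(255 * (v - lo) / (hi - lo))

px_a, px_b = 144, 157                     # the two pixel values from the example
print("original difference :", px_b - px_a)
sa, sb = stretch(px_a, 140, 160), stretch(px_b, 140, 160)
print("after narrow window :", sa, sb, "difference", sb - sa)
```

A 13-level difference becomes a 166-level one: numerically the same move as inflating the counts in the table, just applied to pixel values.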
But as I said, it is possible to make a complete documentation of methods, so we should do it. Concerning image data ethics: of course ethics matters, but ethics has to do with social values. For example, making a figure that is also visually recognizable by color-blind people, that is ethics, something about the values of society that we have to think about. But image data handling and analysis specifically is more about scientific methods. So this is the take-home message: let's propagate the convention of publishing reproducible image analysis. That is the message I am trying to deliver to as many people as I can. I would like to acknowledge several people. Nuno, who is now in Dresden and used to be in Portugal with Gabi: it was actually on his request that I started thinking about this integrity issue. With Simon I have been discussing this for a very long time, trying to write a manuscript based on it. Berm's slides were an inspiration at the initial stage of this talk. All these slides were heavily discussed with the reviewers, Perrine, Giovanni, John, Sebastian and Martin, and today I thank Rocco, Marion and Jillian for moderating. Thank you. For the questions, let me see what has come in; I will switch to other slides now.

If you want, we can go through the questions that were asked during the webinar; you don't need to show that document. There are several questions about the fact that, when you try to publish a paper, you often choose the prettiest image instead of the most representative one, and even though you have a large data set on which you do the image analysis, you are often not asked by the reviewers, or not given space in the paper or in a database, to upload all the materials. So the question, if I try to summarize, is: don't you think this culture of documentation and open science should also be diffused a bit more among reviewers?

OK, reviewing. I think the primary problem with reviewing right now is that the number of reviewers is normally limited to two or three per article, while the complexity of life-science papers keeps increasing: there are genomics elements, network-science elements, bioinformatics elements, different technologies all put into one paper, and reviewing each of those applications is becoming more than one reviewer can handle. Among all those technological difficulties, image analysis is one. So first of all there should be better reviewing itself, because what I feel with many reviewers is that they do not even look at all the details of the methods; we should ask the publishers to become more serious about the reviewing process, to keep the quality of science to a better standard. That is one thing. Secondly, as mentioned, having things open source might solve the problem, but there is a longer-term issue: the reviewing process could become more interactive, online and in real time, and maybe then the reviewing just continues forever, in which case we would have to make a revolution in the way scientific quality is maintained in the publication system, in a pretty different way. Of course I agree with the idea, and by making everything open, with bioRxiv, arXiv, medRxiv and so on becoming the major paths for sharing scientific results, we might see the changes, but that is far beyond what I can definitely say right now.

OK. There was also the comment that image analysis is often done with commercial software, so not everything is well documented. Do you think that if we change this culture among reviewers and make them more aware of image analysis, companies will be asked to document their software better, and this can become a common advantage for everybody?

Yes, I think so. I think it is out of date to hide an algorithm and sell that hidden procedure as a value. There should be a change in the business model; I am not in a position to tell them to change, but the direction the whole of science is taking is toward open source and open methods. What I would propose to this old business style is: don't be afraid, open up, and you can probably still make money. I really do think that if you cannot evaluate the methods, it is difficult to take the result as real science.

OK, thank you. I ask the other panelists to put some other question, otherwise I will go on with another one.

Sure, I will take one topic. There have been many questions on thresholds, and many referred to the slides, so I suggest Kota answer those afterwards and we post the answers on the forum. But there is one that is a bit more generic, and I can try to phrase it more generally. The question is from Mirene: if you threshold different images with the same equation, you obtain different threshold values for each image, as opposed to thresholding them all with the same fixed number; which would be more appropriate? I suppose the question is: how do we ensure that a mathematical model that calculates the threshold actually adapts to images with very different intensity distributions and always behaves the same way, to yield the same result?

Yes. Technically, the best is always to evaluate the thresholding procedure: you do a segmentation, and any segmentation requires some kind of validation of the results. Segmentation is the act of defining the boundary of a structure, which is very serious business in biology and the life sciences, so you have to be really serious about it. Let me go back to the slide about the thresholding problem, where the boundaries you get depend on the intensity itself. If instead you apply the Otsu algorithm individually, within a same-sized square around each spot, the result is actually OK. So threshold each of them with the same algorithm, but limited to a certain area: instead of one global threshold, let each individual signal be automatically thresholded, and that specific technical issue goes away. Even then there should be some validation procedure, and what is scientifically more plausible is not to work with a single channel only; have a second channel. In the case of the nucleus, you can probably use lamin, for example, as a signal to validate what you define as the nucleus boundary, because lamin is by definition embedded in the nuclear envelope and is thus part of the very definition of the nucleus.
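The global-versus-per-object Otsu point above can be sketched as follows, with 1D profiles and a minimal Otsu implementation; this is an illustration under simplified assumptions, not ImageJ's code.

```python
import math

def otsu(values, nbins=256):
    # Minimal Otsu: choose the threshold that maximises between-class variance.
    lo, hi = min(values), max(values)
    hist = [0] * nbins
    for v in values:
        hist[min(int((v - lo) / (hi - lo) * nbins), nbins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_b = sum_b = 0.0
    best_var, best_i = -1.0, 0
    for i in range(nbins - 1):
        w_b += hist[i]
        sum_b += i * hist[i]
        w_f = total - w_b
        if w_b == 0 or w_f == 0:
            continue
        var = w_b * w_f * (sum_b / w_b - (sum_all - sum_b) / w_f) ** 2
        if var > best_var:
            best_var, best_i = var, i
    return lo + (best_i + 1) / nbins * (hi - lo)

# Two spots of identical size but different brightness (1D profiles for brevity).
spot   = [math.exp(-x * x / 72.0) for x in range(-20, 21)]
bright = [200.0 * v for v in spot]
dim    = [80.0 * v for v in spot]

# One global threshold over both spots together biases the areas:
t_glob  = otsu(bright + dim)
area_bg = sum(v > t_glob for v in bright)
area_dg = sum(v > t_glob for v in dim)

# Otsu applied to each spot's crop individually recovers matching areas:
area_bl = sum(v > otsu(bright) for v in bright)
area_dl = sum(v > otsu(dim) for v in dim)

print(f"global threshold {t_glob:.1f}: bright {area_bg} px vs dim {area_dg} px")
print(f"per-object thresholds   : bright {area_bl} px vs dim {area_dl} px")
```

With a single global threshold the brighter spot always gets the larger segmented area; per-object thresholding removes that bias, though it still needs independent validation as discussed above.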
If you can find the best way to define the boundary, use that signal to validate what you are doing, and then go forward. That, I think, is the more straightforward way to validate your results; it is not limited to image-intensity thresholding. Is that OK?

Yeah, sure, thanks Kota. There are a number of questions about the journals; maybe we can group several into one. What is the current status with the journals? How many of them actually have, and use, software that can automatically detect misconduct or image manipulation? And in your experience, how have you seen this evolve over the past five to ten years?

What I know from one year ago is that it depends very much on the company and the publisher. Some journals are very serious: they even hire several people just for these image analysis issues, and they are tackling them really seriously. But some journals are not spending much human effort to cover this image data and image analysis part. And there is actually no correlation between impact factor and how seriously they take it, so there is a contrast between journals. What has happened since, I don't know, but there could be a more coordinated effort to overcome this problem, for example simply increasing the reviewing. The publication cycle, meaning the speed of reviewing and publication, keeps getting faster and faster, but I am not sure that is good for science. It might be good for careers, but if you think about the long run, say 200 years later, how many of these papers will still stand? At a later time people may start concluding that this period produced a lot of crappy papers, and I think we really want to avoid that.

OK, sure. So a question, maybe the last one I take, and then maybe Rocco wants to take over afterwards. One important question, which is related (it is actually from my father, so I will just read it): what is your opinion about being a bioimage analyst? Should I be responsible for ensuring that the users do not make the kinds of mistakes you talked about today, for example by writing the paper methods for them? What would be the best policy or the best approach here, and what is the role of the bioimage analyst in helping the biologist avoid involuntary misconduct?

OK, so this is a message to bioimage analysts: sometimes you have to really refuse. You should not try to be the nicest person in the institute, because sometimes people come to you with two years of image data that cannot be used at all, and you still need to say: this is unusable. That is an extreme situation, but the best thing is to prevent it: try to convince your research institute or university that a professional should be involved from the start of any project that involves imaging, so that bioimage analysis expertise is there to advise before the tragedy happens of a PhD student spending two years capturing images that mean nothing. Concerning involvement, and this is a very active kind of involvement: when you are involved in a paper project and they need to write the paper, I think the bioimage analyst should also collaborate on the writing, because the others sometimes do not even have the vocabulary to explain the methods and do not know how to put a script or procedure into text or computer code.
I think the publication is much better if such a professional is also involved in authoring the paper. OK, that is my answer.

Can I ask a question? I have a question here about image acquisition, which is: shouldn't we address the issue at the level of image acquisition? Manufacturers need to be more forthcoming and considerate with regard to today's topic. What do you think about the importance of image acquisition? Could it help, or solve at least some of the issues?

Yes, in many ways. The closest collaborators of bioimage analysts are always the microscopy people, but in addition there are the biologists themselves, and there should be a good discussion about what you want to do. For example, I am showing the Brownian motion simulation again: let's say this is a cell track, cells are randomly migrating, you have a path like this, and you want to study some topic. How do we choose the time resolution for capturing all these cell migration paths? It depends on the question. Sometimes you just need two images: you take one at time point zero and another ten minutes later, and if you do this for a thousand cells you may be able to extract the statistically relevant information for your research. Sometimes you really do need very high time resolution. So you discuss with the analysts, the microscopy people and the biologists, and you come up with a strategy, a workflow that includes the capturing step and perhaps a segmentation step, with a time resolution in accordance with the Nyquist sampling theorem and within the limits of your budget. Sometimes you do not want to increase the image number: you can handle two pictures much more easily than ten thousand. In many cases, by discussing, you can narrow what you think you need down to what is actually required to address the question. Is that OK, did I answer your question?

Not completely; it was also about the device itself, the microscope or the camera. Are there ways to improve how we acquire images?

Of course. In the old style of imaging-related research, a single person did the biology, the microscopy and the image analysis, and in that case, as I just explained, you could design the whole experiment yourself, knowing what you want. Nowadays it is team science, and you need to discuss. Let's say you want to study cell membrane fluctuation, or cell migration: what you would probably start doing today is trying to get high-time-resolution, high-spatial-resolution images with very fast capture. But one of the old studies simply took a one-dimensional scan; it is not really an image anymore, or rather it is a one-dimensional image, and the membrane fluctuation was studied from that. You probably would not even imagine such approaches nowadays, because you have really high-end microscopes and image analysis systems, and biologists want the richest data they can get: three-dimensional membrane fluctuation and so on. But it is even possible to use a CCD camera that scans only one line. These things require knowledge of the machines and of how to design them. So this is a device question, but such things used to be done by one person, and now there are three people or more, so it might be even more efficient to discuss together and address the question in the simplest and clearest possible way. I am answering from a different point of view, but with the same message. Was that OK, Mario?

Yes, definitely, thank you Kota.

Good. Maybe we pick a last question and then, because of time, we close. Go ahead.

I had one last question: would you consider writing a set of guidelines for writing methods sections, and for the terms to use, in image analysis?

Actually, I started. Last year Simon invited me to his summer school, and there I gave a practical on how to make an image analysis publication reproducible. I am not sure it was the best way, but maybe we can discuss again among bioimage analysts what the best way would be. Web services are developing really fast, and many publishers are even starting to offer data repositories and so on, so we probably need to revisit periodically what the best practical steps to follow are.

I think the question was also about what to write in the materials and methods section, the parameters and so on.

OK. I think the best is that all the parameters you used should be there, and this can be written as a script. What I strongly recommend, a must so to speak, is that you at least study something like the ImageJ macro language. Anna Klemm gave a very nice tutorial on it several days ago. Something like this becomes the way you write. We can make an analogy with chemical formulas: if you are a chemist, you know how to write chemical formulas and chemical equations; that is what you learn in high school or university, and if you do not know it, as a chemist you are out of business. In the same way, we use these descriptive tools because they are the best way to transfer knowledge to others, and sooner or later scripting will be a must-have technique for the life scientist.

OK, that probably brings us to the end. There are many more questions, and we will make sure all of them are answered: we will compile the answers from Kota into a file and post it on the forum. The video of the webinar should be available on YouTube, hopefully by tomorrow. Let's thank all the participants for staying until the end, and thanks, Kota, for the very nice webinar.

Thank you. OK, thanks for your participation, and I hope you stay healthy in the midst of all this chaos. Bye.