So, good morning everybody, I think it's about time to start. My name is Jan Grewe, I'm from the Institute for Neurobiology at the University of Tübingen. Basically I am a neuroscientist, and in my spare time I am a part-time developer of open source tools. So I want to talk about the good and the bad sides of developing such tools for neuroscience and give you my thoughts on these things, right?

We start off with something that has become rather serious in the scientific community: the reproducibility crisis. I picked a few older papers that show these effects quite nicely. The first one, "Sorting out the FACS: a devil in the details", is a short comment by two laboratories who tried to reproduce each other's results. They were working on the same thing with the same methods, but they came up with different results, and the striking sentence from this paper is: "our two laboratories quite reproducibly were unable to replicate each other's cell sorting profile." They really took the time to figure out what the problem was, and it boiled down to a tiny detail hidden somewhere in the protocols, information that had never been written down anywhere.

The second example: the genetics community is rather advanced with respect to data sharing, tool sharing, et cetera. But even there it is tricky to replicate studies. In this paper they tried to replicate results using material that was published in public databases, and the striking finding, again, is that the authors could replicate two studies in principle, six studies partially, and ten could not be reproduced at all. The main culprit was the unavailability of the relevant data; metadata was the key point that hindered replication here.

So we do have a problem with reproducibility in science, and it seems that background information, metadata, data about data, is the problem. The issue is that some information is hidden somewhere in, let's say, handwritten notebooks, and if somebody had to read my notes, I think they would have a hard time. Other information is hidden within settings and properties of the hardware or software that was used. Some is implicit knowledge buried somewhere in lab traditions; that doesn't even show up in lab books. So I think that open source tools and standardization might play a key role in solving these issues and overcoming this reproducibility crisis.

Before I go into our open source tool development, I want to talk a little bit about what we do scientifically. As I said, I'm from the Department of Neurobiology, and more specifically we are neuroethologists; that means we intend to understand how the brain processes information in the context of the animal's own behavior. Our favorite study system is the weakly electric fish. These creatures produce an electric field that surrounds them, and they use it for prey detection and navigation as well as for communication. I will just briefly skip over these things. We can use this field: go into the wild, put a lot of electrodes into a river, which is the bluish line here, record the electric fields, and figure out where these animals are, which is shown by the three different dots here. Each individual has its own frequency, so we can track it over time and space, which is cool because we can then find out who is talking to, or interacting with, whom.
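As an aside, the tracking idea can be sketched in a few lines: compute a spectrogram of a field recording and pick the spectral peaks per time bin; each persistent peak frequency is one fish. This is a minimal illustration with made-up parameters and synthetic data, not the lab's actual pipeline:

```python
import numpy as np
from scipy.signal import spectrogram, find_peaks

def fish_frequencies(recording, fs, min_power=1e-4):
    """Rough EOD-frequency tracking: spectral peaks per time bin.

    Each weakly electric fish keeps a nearly constant discharge
    frequency, so persistent peaks in the spectrogram correspond
    to individual animals.
    """
    freqs, times, power = spectrogram(recording, fs=fs, nperseg=2**14)
    tracks = []
    for col in power.T:  # one spectrum per time bin
        peaks, _ = find_peaks(col, height=min_power)
        tracks.append(freqs[peaks])
    return times, tracks

# Example with two synthetic "fish" at 700 Hz and 900 Hz.
fs = 20_000
t = np.arange(0, 10, 1 / fs)
signal = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 900 * t)
times, tracks = fish_frequencies(signal, fs)
```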
And finally we can analyze the signals themselves. Each line here is an individual animal; the spectrogram shows time on the x-axis and frequency on the y-axis. The animals usually keep their frequency constant, except when something happens, let's say an interaction with another fish: they can produce communication signals.

The second part of our lab works on the brain itself, so we record the activity of the nerve cells that process this information, in response to something like a communication signal. Don't look at the details; we simply record action potentials and ask how they encode this communication signal. Or we do something like systems analysis: we play tricky white-noise stimuli and figure out what range of frequencies the cells can encode and how they do it.

With this I come to the tool chain that we use, which is almost entirely based on open source, and some of these tools we produce ourselves. On the recording side we use a tool called RELACS. We store information in open formats like NIX or odML. We use different tools for data management, DataJoint or the GIN platform developed by G-Node. And of course we use open source during our analysis and for code management.

Let's go to the data recording side. Our tool is RELACS. It's written by ourselves, by a bunch of people; the main maintainer and developer is the head of our group. With this we record several signals, store them, filter them, extract events like communication signals or action potentials, and store all of these things. We run protocols; for example, for the basic characterization of a neuron we record how it responds as we increase the stimulus intensity, what we call an f-I curve. What you see here is the dialog that allows you to set all the features of the protocol, and this is quite flexible: I can change it on the fly when I think I need to adjust the settings to the neuron I am currently recording. Or there is other information that needs to be stored, say information about the subject or the cell that was recorded.

So RELACS itself is super flexible. Since we created it ourselves, we have full control over it. We can extend it by writing what we call research protocols, or RePros; we can customize it; and stimulus parameters can be adapted flexibly, on the fly, or even automatically in a so-called closed-loop fashion. It features a hardware abstraction layer and real-time functionality, two things I want to talk about in a bit. The point is, it already knows a lot about the settings and properties that we need to store along with the data to make it reproducible, or at least so that we can recreate the conditions under which we were recording.

So does that mean that everything is cool? Not really; there are certain drawbacks. The code base is extensive, and the abstraction levels make it hard for people who come into our lab, or people who want to use the software, to really jump in and create something on their own. The whole thing is maintained by a small team, and they all, to some extent, depend on the main developer. This is partly due to a lack of documentation, which is due to a lack of time, basically. And we also depend on some third-party software: the whole thing is Qt based.
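To make the f-I protocol concrete: it amounts to stepping the stimulus intensity, counting spikes in a fixed window per step, and plotting firing rate against intensity. A minimal sketch with illustrative names and numbers (the actual RELACS RePros are C++ plugins, so this is not their code):

```python
import numpy as np

def f_i_curve(responses, window_s=0.4):
    """Compute an f-I curve: mean firing rate per stimulus intensity.

    `responses` maps stimulus intensity to a list of spike-time
    arrays, one array per repeated trial.
    """
    intensities = sorted(responses)
    rates = [
        np.mean([len(trial) / window_s for trial in responses[i]])
        for i in intensities
    ]
    return np.array(intensities), np.array(rates)

# Example: firing rate grows with intensity, three trials each.
responses = {
    0.1: [np.array([0.05, 0.30]), np.array([0.10]), np.array([0.20, 0.35])],
    0.5: [np.linspace(0.02, 0.38, 10)] * 3,
    1.0: [np.linspace(0.01, 0.39, 30)] * 3,
}
intensity, rate = f_i_curve(responses)
```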
We use two libraries I want to talk about: the Comedi library and the RTAI library. And this leads to some issues with our dependencies. Comedi, for example, is great: it's a Linux interface for data acquisition hardware, the boards in the computer that transform the analog signals from our recording electrodes into something digestible by the software. But it is run by only a handful of maintainers, and the hardware support depends to some extent on the hardware manufacturers, so some of it is outdated. I just ran into the situation that I wanted to buy a new data acquisition board, looked at the documentation, and thought: oh yes, it's supported, everything's cool. But actually the support is so outdated that it doesn't work anymore. The good side: I can address the person who wrote it, ask him, and hopefully reactivate him to fix the support. The second library is RTAI, an interface for real-time processing. It's maintained by a really, really small team, basically one lead maintainer, who is a professor and engineer, I think, and of course he has very limited time. So this makes things really hard.

Okay, so what do we learn from doing this kind of stuff? Developing our own software keeps us very, very flexible. The code is open to everyone, so we can always go back, check how things were really done, and figure out whether there were issues. But RELACS and some of its dependencies are maintained by only small teams, which makes it critical, and it's only a free-time thing. Getting people involved is really hard: whenever we hire a student to work on pieces of it, it's a struggle. And the neuroscience community, well, we have some users outside our lab, but they find it really hard to get accustomed to open source processes; when we tell them "implement it and send a pull request", that is something they don't accept. The final thing, which leads me to my second point: we need flexible solutions for data and metadata storage, since our tool is completely open to changes all the time.

So, the second part: flexible recording requires flexible storage. These are efforts we pursue together with the German Neuroinformatics Node (G-Node), and here the team is slightly larger; I highlighted the ones who are most active, though not everyone contributes all the time. We did two projects: one is called odML, for storing arbitrary metadata in a hierarchical format, and the other one, NIX, is a format, or actually a data model, to integrate data and metadata storage. I want to talk about NIX for a few seconds. The core idea behind it is to come up with a generic data model that allows us to store n-dimensional data irrespective of what the data is, so it can be doubles, floats, Booleans, strings, whatever, and we want to be able to highlight regions and points of interest in this data.
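Since odML only gets a single sentence here: its hierarchical metadata are sections holding typed key-value(-unit) properties. A minimal sketch assuming the odml Python package with its 1.4-style API (Document/Section/Property); the section names and values are made up, and the exact save call may differ between versions:

```python
import odml

# One document; sections nest arbitrarily deep; properties are
# typed key-value(-unit) entries -- "data about data".
doc = odml.Document(author="example author")
subject = odml.Section(name="Subject", type="subject", parent=doc)
odml.Property(name="Species", values="Apteronotus leptorhynchus",
              parent=subject)
cell = odml.Section(name="Cell", type="cell", parent=subject)
odml.Property(name="RestingPotential", values=-65.0, unit="mV",
              parent=cell)

odml.save(doc, "recording-metadata.xml")  # assumed save helper
```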
Back to NIX: further, we want to be able to annotate the data with arbitrary metadata, in principle down to the single data point that we record. And one of the core ideas is that the entities carry just enough information that we can draw a basic plot of the data without any additional sources. So how does that look? This is a part of the data model behind it. The key element is what we call the data array. It stores the information itself, plus what is in there and what the unit of that stuff is. Then we have what we call dimension descriptors, which describe, say, the time axis at this point here, or anything else. And then we have entities called tags that allow us to tag regions or points in this data. The abstraction makes us independent of the kind of axis: we don't only want to store time-dependent information, a dimension could also be something like space. There are different dimension descriptors for different purposes, something that is regularly sampled in time, something that is not regularly sampled, and I will kind of skip over them.

So, as I said, we want to be able to tag points and regions, and this is shown here: we have the data, the blue line, as a function of time, and we can simply store from where to where we want to tag things. This does not only work in 1D but of course also in 2D, or for multiple regions in basically n dimensions. And with this we can create new entities: we can have the data stored somewhere, the information about the events somewhere else, and combine them into something like "we detected events in that signal".

The issue is: we work with Linux, and we are happy working with C++ libraries, but not all people are. So the idea behind this project was to have a core library in C++, with the data potentially stored in different back ends; so far we only fully support the HDF5 back end, which most of you might know. Since we are a small team, we thought it would be a great idea to have this one common library and just maintain language bindings for Python, MATLAB, Java, whatever. But this turns out to be really tricky. And on top of that, the user is supposed to go from the specification of the abstract model to something that is domain specific, and as you can see it's really challenging for us to keep up with everything.

So, abstraction layers: what do we learn from this project? Abstraction layers make it hard to get scientists involved: a generic model is hard for them to accept, they would rather have their domain-specific entities. And when we want to attract people to join the project, we see that the attractiveness of a project depends strongly on the toolchain: Python is more attractive than C++. Further, when we try to find people for different projects, we figure out that UI development is much more attractive to them than working behind the scenes. And, right, scientific communities are diverse, and supporting a large variety of platforms and languages is really challenging, especially for a small team like us. None of these things, and this is true for NIX as well as for odML, would be in the shape they are now if we hadn't had lots of support from G-Node.
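To make the model above concrete before moving on: a data array with a sampled dimension, plus a tag marking a region of interest. A minimal sketch assuming the nixio Python bindings (1.x-style API); file names, units, and numbers are made up:

```python
import numpy as np
import nixio

nf = nixio.File.open("session.nix", nixio.FileMode.Overwrite)
block = nf.create_block("recording-1", "nix.session")

# The central entity: an n-dimensional data array with unit and label.
voltage = np.random.randn(20_000)
da = block.create_data_array("membrane-voltage", "nix.sampled",
                             data=voltage)
da.unit = "mV"
da.label = "voltage"

# Dimension descriptor: regularly sampled in time, 0.05 ms steps.
dim = da.append_sampled_dimension(0.05)
dim.unit = "ms"
dim.label = "time"

# Tag a region of interest, e.g. where a stimulus was on; position
# and extent are given in the dimension's unit.
tag = block.create_tag("stimulus-on", "nix.segment", position=[200.0])
tag.extent = [500.0]
tag.units = ["ms"]
tag.references.append(da)

nf.close()
```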
G-Node actually pays professional, full-time developers to carry this work on. For the sake of time I will skip over these slides.

As I said, we use different open source tools for data and code management; what can we learn from those? I think there are projects that turn from a hobby into something like a service, where people make a living out of it. So open source does not need to be a hobby, and I think there is a misunderstanding that open source means free of charge; open source does not imply that one may not try to make a living from it. Using open source tools for data analysis enables tool sharing. Just imagine: large parts of our institute work with MATLAB, and they happily use all the functions hidden in toolboxes. But if you share that code with someone who does not have the toolboxes, that person is doomed; they cannot run the code. And access to raw data and to the analysis tools is really helpful for fair reviewing of scientific results, during the review process of a paper, for example.

So who does the work, or who should do it? Should it be the PhD students? There is a pro to it: yes, they learn a lot about the setup, the protocols, and data handling in general, and they improve their programming skills. On the other hand, they are on time-limited contracts and must become experts in their science, not necessarily in programming or open source development. Should it be the postdocs or the senior scientists? Again, yes: they know exactly what they want and what is needed, and when they have visited different labs they have a broader perspective, and this experience might come in very handy. On the other side, the pressure for scientific success is even higher at this stage, most are again on time-limited contracts, and senior scientists have little time due to their many other obligations. Professional full-time programmers? Yes: they probably write code much better and much faster than I would. The caveat, and my time is up, is funding: agencies fund software, they are happy to give you ten thousand bucks for buying a specific piece of software, but usually they don't give you money to hire a programmer. We would love to have long-term maintainers involved in open source, and I think it would be really great if there were something like software workshops in institutions.

So, my take-home messages. Open source, in my understanding, is the key to independence and innovation in the sciences, and to eventually overcoming the reproducibility crisis; open source is also the key to data and tool sharing. One drawback: open source development largely depends on volunteers who do it in their free time, and the way open source development works does not always match scientists who just want to use the tools. My claim is that we need long-term support for open source tools in the sciences to overcome this. Finally, open source is definitely fun, and I want to suggest that we keep on hacking. Thank you. Questions?

So, the question was whether it is possible to get the raw data from our experiments. Yes; since a few years we publish all the raw data, together with descriptions of how to read it and so on, on the GIN platform, so that in principle you could really go and dig
into the raw data. And I think this is really critical. We had some experience with published data where people claimed a linear relation between two features and only showed the average. When we could eventually dig out the data, well, not the raw data, but an average across trials, each individual trial showed something like a low-pass curve, not a straight line. But since these individual low-pass-like curves were shifted across trials, the average showed up as a line. So their interpretation, "oh yes, it's a linear relationship", was actually not good, and I think this kind of thing can only be dug out if the information is available somewhere.

The next question was whether it is possible to play back the data. What do you mean by playing back? We could write a tool that makes a movie out of it, or you can read the data and have another program play it back. If the idea is to use some kind of machine learning and train it on the played-back data: I hope so. So yes, we could play back the data. It depends on how you store it: you can decide to store only the events, for example the times of the action potentials that were generated by the neuron. But since we always record the raw data in its completeness, which blows up the file size, we could in principle play back everything, even though I don't know yet what we might use it for.

As for the time scales: the activity of the neurons is in the millisecond range, so the relevant time scales are milliseconds, sometimes tens of milliseconds; definitely much smaller. I had a look at the gravitational-wave data that was published, and it is actually very similar to such data, just maybe a larger data set, but it's no problem for the storage or for the recording side.

Sorry, the question was what software licenses we use and whether it matters to us. We use MIT-like licenses, or sometimes we publish something under the LGPL.

The question was whether we consider a joint endeavor with other communities. Considered, yes; time-wise, not really actively. On the neuroscience side there is the INCF, the International Neuroinformatics Coordinating Facility (thank you), which tries to bring people from different areas of neuroscience together to come up with something helpful for standardization, and common tool development would be awesome. Actually, NIX does not depend on neuroscience at all; it is not limited to neuroscience data, it could store anything. But so far we haven't really been active outside neuroscience.

There were two questions; one was, please correct me if I'm wrong, whether we use in vitro or in vivo measurements and record the responses. We take in vivo measurements, that is, the whole system is alive and at work, with all the connections between the different areas. The second was whether we consider using open hardware for recording. Yes: we are trying out the Teensy controller boards, which basically offer us enough sampling capability, but they are kind of tricky in our circumstances because of other problems. But yes, generally, yes, we…
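Since the playback question hinges on the trade-off between event-only storage and full raw traces, here is a minimal sketch of what reconstructing a playable signal from stored spike (event) times could look like; everything in it (names, sampling rate, the spike waveform) is illustrative and not from the talk:

```python
import numpy as np

def events_to_trace(spike_times_s, duration_s, fs=20_000.0):
    """Reconstruct a continuous trace from stored event times.

    Event-only storage keeps just the spike times (small files);
    a trace can then be re-synthesized for playback by placing a
    stereotyped waveform at every event time.
    """
    trace = np.zeros(int(duration_s * fs))
    # A crude biphasic spike waveform, ~1 ms long -- purely illustrative.
    t = np.arange(0, 0.001, 1.0 / fs)
    waveform = np.sin(2 * np.pi * 1000.0 * t) * np.exp(-t / 0.0003)
    for st in spike_times_s:
        i = int(st * fs)
        j = min(i + waveform.size, trace.size)
        trace[i:j] += waveform[: j - i]
    return trace

# Example: three spikes within a 50 ms window.
trace = events_to_trace([0.010, 0.023, 0.041], duration_s=0.05)
```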