Okay, so we are right on time. Thank you for organizing this track, and thank you for the opportunity to speak to you. This is Marcel, my name is Marcos. We are both research associates at the Department of Medical Informatics in the city of Göttingen in Germany. Now we switch from the neuroscience domain to the medical research domain. Our plan is to give you a very quick overview of some of the tools that we use in our domain of research.

Just to get an overview: who has ever been in contact with the domain of medical informatics? Quite a few. Well, it's overwhelming, actually. So you might recognize some of the tools, but for all the others we will try to introduce the discipline in general a little bit.

We thought about how to characterize our domain, and we settled on four major fields of research in medical informatics. At the core of it, there are systems for primary care. I think everybody can relate to that: when you are sick and you go to the hospital, your case creates data. Your condition is documented, and all of this is done in electronic health record systems. Maybe there are images taken (MRI, CT, echocardiography), and those are put into some information systems. All of this is the primary care domain.

Then we have a couple of research-heavy domains. First of all, biomedical research, where you study the systems biology within the body and derive therapy ideas. The other route would be to go down here, take routine data and run statistics on them. This is called secondary use, because data that was collected in primary care is used again, for research, to develop new therapy options. And finally, because we practice something called evidence-based medicine, we want statistical, scientific evidence that the therapies we run in primary care are actually not harmful and even beneficial for your condition. So all of this goes up to the top, into what's called clinical trials. So basically you do experiments in
humans to make sure that everything we do in therapy is actually sound.

We will go through a couple of software tools that are part of some of those fields. One thing I would like to mention is that primary care is probably the domain with the least open source software, which is due to the regulations that are in place in primary care. I mean, you're treating patients, which means that even software will soon have to comply with the European medical device regulations and so on. So many companies create proprietary software and sell it, because they can say: our software complies with all the regulations.

So, to take you through the software tools that we want to introduce, a short story. This is Bob. Bob suffers from chronic heart insufficiency. This means that Bob regularly has to visit the hospital to have his condition checked. Every time he goes in for a routine checkup, they document the medication he's on, they take blood samples, and they also do echocardiographic imaging. All of this is stored in the clinical information systems.

This is a model hospital, because they actually use an open source tool for primary care documentation. In this case they use XNAT, which is an open source picture archiving and documentation system that can store the images created and the structured data that is derived from those images, so the vital parameters that are actually interesting for further research on the data. XNAT is a very extensible open source tool. You can not only store the images and share them with your colleagues, you can also plug in analysis pipelines, like ImageJ in this image, to run analyses on the images and data stored in there. And in the end it gives you web-based user interfaces for image upload, image viewers, et cetera.

So Bob is not only at a model hospital, he is also a model patient.
He has given consent that his medical data may be used for research purposes, just like that. And this leads us, of course, to Alice. Alice is a health data engineer at a place called the Medical Data Integration Center, so that's basically where we work. Her job is to get all the data that is created and documented in primary care systems out of those systems. So you extract this data and mask the patient identity, because in clinical systems you always have the medical data stored for each patient, and you know who this patient is. In research, we do not want to know who the patient is, so we want to anonymize the data. So the data has to be masked, then the data has to be transformed into the formats that you can use in research, and finally it is put into some kind of research data repository to make it accessible again for researchers.

The tool that Alice uses is open source: it's Talend Open Studio for Data Integration. This is a data workflow manager with a graphical user interface. I'll give you a slightly better picture of that. You can create data transformation workflows with a graphical user interface. It's based on Eclipse. It's also provided as a product by a company, but all in all it's an open source tool. You create these kinds of workflows by dragging and dropping sub-processes that are encapsulated and can be reused in different workflows. You can export all of that as a JAR file, then run it on servers and orchestrate it. This is basically what we do to extract data from all the different clinical information systems and put it into a single research repository in the end.

One of the sub-processes in these ETL jobs run through Talend is the masking of the patient identity. The example tool we use for that is the Mainzelliste. It's a German name for a tool mostly created in Germany, so please excuse that these slides are in German. It just says that we pseudonymize the identifying data of a patient.
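As a rough illustration of the masking step just described, here is a minimal Python sketch. It is not the Mainzelliste algorithm or its API: the class `PseudonymService`, its methods, the demo key and the sample record are all made up for illustration, and the exact matching on normalized fields is a simplifying assumption. The idea it shows is only that identifying data goes in, a random ID comes out, and the same patient always gets the same ID.

```python
import hashlib
import hmac
import secrets


class PseudonymService:
    """Toy in-memory stand-in for a pseudonymization service (hypothetical)."""

    def __init__(self, secret_key: bytes):
        self._secret_key = secret_key
        self._pseudonyms = {}  # keyed fingerprint -> random pseudonym

    def _fingerprint(self, first_name: str, last_name: str, birth_date: str) -> str:
        # Normalize so that " bob" and "Bob" are treated as the same patient.
        parts = (first_name, last_name, birth_date)
        normalized = "|".join(p.strip().lower() for p in parts)
        # A keyed hash, so the lookup table does not hold the plain identity.
        digest = hmac.new(self._secret_key, normalized.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()

    def pseudonymize(self, first_name: str, last_name: str, birth_date: str) -> str:
        """Return a stable random ID for this patient, creating it on first use."""
        fingerprint = self._fingerprint(first_name, last_name, birth_date)
        if fingerprint not in self._pseudonyms:
            self._pseudonyms[fingerprint] = secrets.token_hex(4).upper()
        return self._pseudonyms[fingerprint]


# Made-up example record; the values are not real patient data.
service = PseudonymService(secret_key=b"demo-key-not-for-production")
record = {
    "first_name": "Bob",
    "last_name": "Example",
    "birth_date": "1960-01-01",
    "ejection_fraction": 35,
}
pid = service.pseudonymize(record["first_name"], record["last_name"], record["birth_date"])

# The research copy keeps the medical value but only the pseudonym as identity.
masked = {"pid": pid, "ejection_fraction": record["ejection_fraction"]}
print(masked)
```

The random ID carries no information about the patient; only the service can map an identity to its pseudonym, which is why such a service is usually run separately from the research repository.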
So you see the name on the left, actually, and then you pseudonymize it: you just get some random ID number back. You can also use this masking service to create different secondary IDs that belong to this first ID. There's lots of math behind that, and you can actually do lots of stuff with it. For example, you can do de-identified record linkage. Say we have research data from two different systems that have separate IDs. You do not know that those are data from the same patient, but you can use the service to link those data back together in the end.

Each of those de-identified data packages that we have now created in the first step of our workflow has to be stored. We are working in science, so we want all the pieces of data we use in the scientific process to be archived and stored persistently. For this we use an open source tool called CDSTAR. This is developed in Göttingen. It's basically a data storage middleware. We mask all the underlying block storage options that are used when running the data center, we put a REST API in front of it, and we can just use REST calls to store data items together with access control lists and metadata about each item. We also generate persistent identifiers, so that each data package that was used in one of those processes can be identified afterwards, permanently. Hopefully forever, because we use a data center that's in the forever business, since it is linked to our university library.

So this is the cue to switch.

So, the awkward switching has been done. I hope the cartoon was long enough to distract all of you. We've seen that Alice has stored her data in CDSTAR, and now she has this data in some format.
She may have thought of that format herself, but we would like to have more. We would like to have semantic annotation, meaning that the data file itself should contain something that tells some other person what the data is about. You may know this from CSV files with just data tables, where you don't know what the data means, and at best, most of the time, there is a table header with some weird combination of letters that you don't understand.

In order to circumvent that, we use openEHR. openEHR is actually a foundation, so it creates a lot of stuff, but the two things we want to focus on are the specification and the clinical modeling part of openEHR.

As for the specification, the guys at openEHR created a two-level modeling architecture, where the specification states different reference models. These are the most basic parts that you can store data in, so think of them as the data types in a programming language, like character or integer. Using this reference model, you can go a level higher, into, let's say, user space, where different users with clinical knowledge can take these reference model pieces of information and put them together into an archetype. An archetype would be a, let's say, logical compound value that you can store in a database afterwards. Meaning that instead of just storing a number, you want to have the blood pressure: you need two numbers and maybe some units or something like that. So an archetype would, for example, be Bob's blood pressure.

At the topmost level, templates are an even higher-level collection of archetypes, meaning that you can model even higher-level constructs like a visit. So a patient comes to a hospital, and you are able to define what a visit should look like: which archetypes, which parameters a study nurse has to record in order to map everything to a common data format. If you do that, a template may look like this: pretty complex, pretty big, and it's more or less
really hard to just look at the data itself. Imagine that it will be a really big JSON or XML file; it's hard to find the data in that. Obviously the guys at openEHR thought of that and created the Archetype Query Language, which is a kind of hybrid between SQL and XPath that allows you to traverse the hierarchical data and get the data that you want out of it. So, very nice: we now have the data in a common format that can be understood by other people.

So let's introduce another researcher, called Carmen. She does research on heart insufficiency, so just the condition that Bob has, and she now wants to have some data from the openEHR repository. If she's capable of doing that, she will specify some kind of AQL query, get the data out of the database that is set up at the hospital, and is then able to use another platform, another tool, to analyze the data. That tool is called i2b2/tranSMART. i2b2 and tranSMART are actually two tools, but they are being merged right now. This tool is a clinical data warehouse that allows you to do some simple analytics on data. We will focus on tranSMART for this presentation, because it's the tool that we run at the moment, but in due time we will switch to the merged i2b2/tranSMART tool.

In this picture you can see how this can look. You can look at very basic statistics, like the distribution of age or the gender distribution, to kind of get a feel for the data that you have. And you can even run more sophisticated analyses using R scripts.
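The kind of first look described above, an age distribution and a gender distribution, can be sketched in a few lines. This is a plain Python illustration and not something run inside tranSMART, which the talk describes as using R scripts; the records, field names and the `age_band` helper are made up for the example.

```python
from collections import Counter
from statistics import mean

# Made-up, already pseudonymized records; not real patient data.
patients = [
    {"pid": "A1", "age": 54, "gender": "female"},
    {"pid": "B2", "age": 67, "gender": "male"},
    {"pid": "C3", "age": 71, "gender": "male"},
    {"pid": "D4", "age": 49, "gender": "female"},
    {"pid": "E5", "age": 80, "gender": "male"},
]


def age_band(age: int, width: int = 10) -> str:
    """Map an age to a band label such as '50-59'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Count how many patients fall into each category.
gender_distribution = Counter(p["gender"] for p in patients)
age_distribution = Counter(age_band(p["age"]) for p in patients)
mean_age = mean(p["age"] for p in patients)

print("gender:", dict(gender_distribution))
print("age bands:", dict(sorted(age_distribution.items())))
print("mean age:", mean_age)
```

Even such simple counts are useful as a review step, because implausible categories or empty bands tend to show up immediately.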
You can just create an R script, write it, upload it there, and the analytics engine will give you the output that you desire. We use this tool primarily to do a kind of data review. In this example you can see that there is a correlation between age and height. You can see that there are a lot of data points on the left but only one on the right. This is because the person depicted there has a height of 165 meters, which is, let's say, unlikely. But it just shows that data you integrate from heterogeneous IT infrastructures usually contains errors, and you have to keep that in mind.

Okay, so let's say Carmen had a research hypothesis. She specified it, did her research, and was able to either confirm or reject the hypothesis. She writes up a paper, and everything is nice, everything is clean. She submits this paper to an open access journal. What we would like to see is that she opens up her data as well. So, in the spirit of open research and open data, she should publish the data somewhere. She could do that in a FAIRDOM-SEEK instance run at the hospital. FAIRDOM-SEEK is basically a data repository which follows the ISA standard, which stands for investigation, study and assay. Most importantly, it stores rich metadata with the data files that you upload there. So you can assign a license, and you can say that this piece of data was produced in this study, within this investigation, and so on. With that you can publish the data much better, and it is stored persistently. So even after several years you can provide the same link, you can call that link, and you can see: okay, this data was used in this publication, and I can maybe open up this publication to find out more about the data, or vice versa.

So, thanks to all the tools, and to the people involved who were able to use these tools, maybe
Bob can get healthy, and all his fellow patients can lead longer and happier lives.

So for one hospital, this may even be the case somewhere. I don't know; not in Germany, most certainly. But what about sharing data with other hospitals? This is something that is very hard to do, for us especially. We're working on that in a project right now. And we can see that there is a lot of data infrastructure on the global level that tries to do that, to link the data together and create this internet of data objects. There is a lot of work put into that in different domains. For example, in the medical informatics domain, on the national scale we have the Medical Informatics Initiative, which has basically offered us some money to build this up for Germany. On the international scale, OHDSI or EHDEN do basically the same with other data formats and other technologies, but have the same goal. So it's very good to see a movement that wants to link the data together. And across domains, there are also developments that most of you may know, like the Research Data Alliance or the W3C working group Data on the Web, which has specified the DOIP protocol to access digital objects from different domains or from different IT infrastructures.

So in order to use this data across the boundaries of our hospital, we have to think about security, especially in medicine, since medical data is very sensitive. As we said, we are masking the patient data, but this may still not be enough. If you think of rare diseases, it's very likely that a doctor who knows a lot about rare diseases, or works in the field, is able to spot a specific combination of diseases and think: okay, I know this patient.
I've seen that before, because it's rare.

So this medical data bears high value for research, but also potential for misuse. We think that the benefit of linked medical data has to be, yeah, exploited, basically. So we want to have the data, we want to have it accessible, and we want to do research on it to improve healthcare. But what we need to do is create secure IT infrastructures for that. Not everybody should be able to just download everything, have it in a really nice, big repository and do some, I don't know, big data analytics on it as he wishes. We do need some regulation and some safeguards in place.

To do so, we need accountable and transparent workflows. So we not only have to make it secure; the patient should also know: where is my data going, what is happening with my data, where is research being published that used my data? That is what empowers the patient. Scandals like the Cambridge Analytica data leak, or what they did with the data, don't make it particularly easy to go up to a patient and say: hey, can we have your data, we want to do research. So they're pretty hesitant about that.

All of this counts not only for the data but obviously also for the tools that we use. We try to use tools that are off the shelf and available for download, but not every tool fits our purpose.
So we sometimes have to extend a tool or even build new tools, and from our perspective these have to be open source tools, because this is the only way to really get transparency and empower the patient. Things like public domain and Public Money, Public Code show that we are going in the right direction and have a, let's say, political layer to talk about these things. We need that for medical research especially, and we should emphasize that everyone who is working in this medical field should think about that: not only making the data open, just as is, but also thinking about the tools you are using and the data flows you are creating, and making these open.

So, global data infrastructures: we saw them, they are being built. And we should establish decentralized and free technologies to ensure secure IT infrastructures. Typically the medical information systems are really not free or open source whatsoever. These are the big proprietary blocks that we saw, which are very much black boxes, and nobody touches them. But the tools in medical informatics research are used frequently and could very well be rolled out more into primary care. We have to advance that, though. We have to raise our voice politically and say: hey, there are tools you could use, so please do so if you are capable of it. We saw that there are some people from the medical domain here: raise your voice. It would really be a step forward to have more openness in medical informatics research.

So, for the references: obviously, thanks to the team in Göttingen. We have a lot of guys there who are very open to the whole open source community, and we try to promote its further use. And, like basically everyone said: if you're interested, contact us. We have jobs. So thank you very much.

Questions don't work if I turn off the microphone. So, any questions?
Yes?

So the first comment was that we do not necessarily need open source tools or research tools, but especially open formats and open standards, right? So, like openEHR, which we presented, there are a lot of other standards: FHIR, OMOP CDM. We agree totally. We do need that to exchange data and enrich it semantically, so that others can reuse the same data.

The second comment, or the question, was: where is our field of application, where are we going with that? Basically, as medical informaticists, we're kind of in between everything. We're trying to help the researchers get the data, analyze it and derive research hypotheses from it. We're trying to get to the patient, to empower his view on his data. And we're also trying to go a little bit into primary care and say: hey, look at that, we have these tools, we use this path to get the data from A to B, wouldn't that be a possibility for you guys in IT?

Yes?

Okay, the question was: there are similar movements in the Netherlands right now, so basically, why don't we shout it out, they want to be doing that in the Netherlands too. So, great. Please come to us after the talk; we would like to talk, I guess. I don't know which project you are associated with. There are some projects that do that internationally, but I agree, there's not much communication. So everybody gets their grant and tries to build up something open-source-y, and then we have 60 different versions of open source standards which fit together very, very badly. So I guess the answer is: let's talk. We have to talk.

Yeah, sure.

So your question was regarding the Medical Informatics Initiative in Germany and how the four consortia that are being funded act together.
So, yes, especially relating to the open source tools. We have an organization that sits above all of it and tries to increase the communication between them. But of course, these are professors talking about problems that have to be solved. On a working level, we try to establish that as well. I mean, in Germany it's not a huge community, so basically everyone knows everyone somehow, and we have meetups and things like that. So we have to increase the talking in between, and we try to do so. And, let's say, the standard for open source is set. I've never seen a researcher in medical informatics who says: nah, open source, we don't need that. Everyone agrees on that. But what we have to do is carry that more into primary care, and to the people who are not affiliated with medical informatics itself but are more in medical IT, who are somewhat more hesitant about that.