Hello everyone, I'm Stefano, I'm from Chur in Italy, and I'll start with a question for you: how many people here in the room are handling healthcare data every day? OK. I pray for you too, because handling healthcare data nowadays is a mess, a complete mess. So let me explain some of the issues we have with health data.
The main one is the lack of standardization. Without standardization it is almost impossible to compare data across different formats, or to combine data from different sources and get something good as a result. If different sources produce data formats that are incompatible with each other, it is very difficult to reach a reliable final result from a statistical point of view.
Then there are privacy and security concerns. Health data is very sensitive; it is the most confidential data we have about people. So normally, when we work with health data, the first step is to anonymize all the data, to avoid any match between the data and the patient it belongs to. This first step is mandatory, because when we go on to the statistics we must not be able to dig deeper into a specific patient. In Europe that is illegal, and it is a particular concern when we get data about the same patients from different sources.
Then there is another point that is also critical: the lack of data quality. The main cause is the devices that acquire the data: there are many manufacturers, and not all devices are well certified, so sometimes the data is inconsistent. And even when the device is fine for the certification body, the patient may not have had the training to make the recording correctly.
So the measurement, the data the patient provides, is not a correct picture of the patient. So we have all these issues, and their effect shows up when we try to combine the data: if the input is unusable, or there is something strange inside that we cannot discover, because we have no way to check whether the data is correct, then the statistics we get at the end may be completely useless. They are of no use to anyone, or at best their accuracy is very low.
To avoid this problem, data standards and protocols are being developed. The GDPR also tries to cover health data, not all aspects, but some of them. And the processing we need for health data also requires investment in infrastructure and analysis tools, because we keep collecting more and more data about people and we have a lot of problems. For example, the first step, the anonymization step, can take a lot of time, computational time. So investment in data infrastructure is very important.
So we return to the first point, the lack of standardization. We have a vast and complex universe of data formats in healthcare. One of the challenges is that this universe, like the real universe, can explode. And the problem is that the new formats entering the universe are mostly proprietary. This is a big problem when we try to handle the data. So there must be a way to define a guideline for creating new formats, and the best way, we suggest, is to use open formats. Why open formats? Because, as we said before, there is no standardization, so when a manufacturer of medical devices creates a new device, it just says: OK, I have this data, I'll put it in a format that I invented, created specifically for my device.
But the problem is that we have a lot of devices and a lot of data in different formats. In the past the best approach was: OK, we have different formats, so we create a converter to an open one and then do the comparison and the combining in the open format. But this converter is normally made by the same manufacturer as the device, so very often the converter is also proprietary and we have no access to the source code. If there is an implementation error inside the converter, we don't know. We have converted to the open format, but we don't know for sure that what we get is exactly the same as what we had as input in the original proprietary format. This mismatch is a real problem, and we cannot solve it.
So, as I said before, open formats are the best way to avoid all these problems. But when you create an open format, you then have to adhere to it. You must not create proprietary extensions to an open format. If you want an extension, you have to discuss it with the open community and then try to change the open format, not put something proprietary into an open format without any description of the extension, because in the end you are back to the same proprietary format you had at the beginning. Using open formats also enables collaboration and research in healthcare.
So this is a list of some open formats in healthcare; some of them, not all. The main ones are EDF, the European Data Format, and BDF, the BioSemi Data Format, which are used for medical time series. That means ECG, EEG, whatever measurement we have on a timeline: we can store that data in this type of format. These are open formats. There is an open page, run mainly by university researchers, that maintains the format, increments the version for each major release, and describes the format very well. Then we have ISHNE. ISHNE is the International Society for Holter and Noninvasive Electrocardiology.
Its format is dedicated to Holter, Holter ECG. So it's another option, like EDF. EDF is more open to other kinds of data, so we can use it for medical time series other than Holter; ISHNE is dedicated to Holter. It was created for this society, and it's an old format. EDF is a more recent format than ISHNE, but it's very simple to use.
Then we have the open formats for biological sequences, so DNA, proteins, whatever. These are FASTA, FASTQ and SAM. In these formats you put single characters on a line to describe the sequence, but each format describes the sequence in a different way. So there are different open formats for biological sequences because there are different interpretations of the data inside.
Then we have DICOM, Digital Imaging and Communications in Medicine, which is used for medical images: CT, MRI and other images like that. The difference between DICOM and the other formats on this slide is that it also describes a protocol. The interconnection is described too, so it is not only a description of the images but also of how they should be transmitted from one party to another.
So let's go to Python, because we are at a Python conference, so we have to talk about Python. Python is used for analytics: with Python we can manipulate and analyze a complete data set and compare the data inside it. What can we compare? The main thing is patient information. OK, you'll say: you told us before that there is no patient information because the data is anonymized. Normally, when you anonymize healthcare data, you keep the age and the sex, or something like that. You have to keep some information to do statistics. This information is used for statistics, so we can manipulate, analyze and compare it. And when there are pathologies, they are normally recorded in the patient information too.
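As a rough illustration of the anonymization step described above, the sketch below drops direct identifiers from a patient record and keeps only the fields needed for statistics. The field names (`name`, `patient_id`, `age`, `sex`, `pathologies`) and the sample record are hypothetical and not taken from the talk, and real de-identification involves much more than removing a few keys.

```python
# Hypothetical whitelist of fields that survive anonymization.
KEEP_FIELDS = ("age", "sex", "pathologies")


def anonymize(record: dict) -> dict:
    """Return a copy of `record` containing only the whitelisted fields."""
    return {key: record[key] for key in KEEP_FIELDS if key in record}


# Invented sample record, for illustration only.
raw_record = {
    "name": "Jane Doe",
    "patient_id": "P-0001",
    "age": 54,
    "sex": "F",
    "pathologies": ["hypertension"],
}

anonymous_record = anonymize(raw_record)
print(anonymous_record)
```

A whitelist is used here instead of a blacklist so that any field nobody thought about is dropped by default.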
Then there are the results of blood analysis, another test that is useful for detecting some pathologies and for statistics. Then ECG and EEG, and echography and radiography, which are images.
I suppose you know the ones on the left side of this slide. They are the usual libraries for handling statistical data: NumPy, SciPy, pandas and Matplotlib. They are the standard ones. Then for healthcare data we have other libraries that, combined with them, make it possible to read and manipulate the data and, for example, use Matplotlib to visualize the result. We have Biopython, MNE-Python, pydicom, EDFlib-Python and the ISHNE Holter library for Python. From the names you can see that the last three libraries are each tied to the format they support: pydicom is for DICOM, EDFlib-Python is for EDF and the ISHNE Holter library is for ISHNE. Biopython and MNE-Python are more complex. They support a lot of formats and are more domain-specific.
So let's go inside. Biopython is for biology and bioinformatics. The biological sequence formats we mentioned before are handled by Biopython. With this library we can describe and also create protein structures, do population genetics, work with biological sequences and annotate them. We have input and output for the open formats we mentioned before. Biopython also supports some proprietary formats, not all, and lets you export to an open format, so it works like a converter: an open converter for healthcare data in biology and bioinformatics. It also has some machine learning inside: there are some classes in Biopython that let you work with the data using machine learning.
So this is an example with Biopython, importing SeqIO. You can parse a GenBank file with a protein inside, or whatever you want, and then you can loop over the genomes in the file and create a single FASTA file as output.
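The slide's code is not part of the transcript. A minimal sketch of such a GenBank-to-FASTA conversion with `Bio.SeqIO` might look like the following; it assumes a recent Biopython release, and the record (`DEMO0001` and its sequence) is invented and kept in memory so that the example runs without any input file.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Invented demo record; a real workflow would start from an existing file.
demo_record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGCTGA"),
    id="DEMO0001",
    name="DEMO0001",
    description="hypothetical demo sequence",
    annotations={"molecule_type": "DNA"},  # needed to write GenBank output
)

# Produce GenBank text in memory to stand in for the input file.
genbank_handle = StringIO()
SeqIO.write([demo_record], genbank_handle, "genbank")
genbank_handle.seek(0)

# The conversion itself: parse every GenBank record, write them all as FASTA.
fasta_handle = StringIO()
records = SeqIO.parse(genbank_handle, "genbank")
count = SeqIO.write(records, fasta_handle, "fasta")

fasta_text = fasta_handle.getvalue()
print(f"converted {count} record(s)")
print(fasta_text)
```

With files on disk, the two in-memory handles would simply be replaced by the input and output file paths.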
So this is a conversion written in four lines. Doing this with other software, or with proprietary software, is normally a mess. Biopython is very simple to use and lets you handle a lot of proprietary formats without any problem.
Then we have MNE-Python, which focuses on magnetoencephalography, electroencephalography, stereoelectroencephalography, electrocorticography, near-infrared spectroscopy and others. Unlike Biopython, it also offers visualization and exploration of the files: it uses Matplotlib and other libraries, so you can load the data and immediately visualize the result you want. It supports more formats than Biopython and includes a lot of converters from proprietary formats to open formats.
The problem is that it also has permissive readers. Permissive readers can be a problem for healthcare data. We will come back to this point later, but if a library has a permissive reader and the original data deviates from the open format, the permissive reader does not detect that, and so it allows, for example, proprietary extensions to the open format. So, for example, if we pass an invalid EDF file to MNE, we can still plot it. Yes, we can read the file, but nothing tells us that maybe this file does not correctly adhere to the open format. There is no warning, no error, no issue at all. For MNE it works like this: I can read it, I have no idea whether there is an error in the data format, and I don't care; I just want to get the data out. This can be a problem, because if we don't enforce the open format, then anyone can make extensions to the format. Manufacturers have to follow the open format completely, because without that we do not really have an open format. So we need strict reading of the open format.
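To make "strict reading" concrete, here is a small standard-library sketch that refuses a file as soon as the fixed 256-byte EDF header deviates from the published layout. It is not the implementation of any library mentioned in the talk and covers only a handful of checks; the function name, the exception class and the sample headers are hypothetical.

```python
HEADER_SIZE = 256  # size of the fixed part of an EDF header, in bytes


class InvalidEDFHeader(ValueError):
    """Raised when the fixed header deviates from the EDF layout."""


def check_edf_header(header: bytes) -> int:
    """Strictly check the fixed EDF header and return the signal count."""
    if len(header) != HEADER_SIZE:
        raise InvalidEDFHeader("fixed header must be exactly 256 bytes")
    # EDF header fields may only contain printable US-ASCII characters.
    if any(byte < 32 or byte > 126 for byte in header):
        raise InvalidEDFHeader("header contains non-printable or non-ASCII bytes")
    text = header.decode("ascii")
    # Version field: "0" padded with spaces to 8 characters.
    if text[0:8] != "0".ljust(8):
        raise InvalidEDFHeader("unexpected version field")
    # Start date (dd.mm.yy) and start time (hh.mm.ss), 8 characters each.
    for label, value in (("start date", text[168:176]), ("start time", text[176:184])):
        parts = value.split(".")
        if len(parts) != 3 or not all(len(p) == 2 and p.isdigit() for p in parts):
            raise InvalidEDFHeader(f"malformed {label}")
    try:
        header_bytes = int(text[184:192])
        signal_count = int(text[252:256])
    except ValueError as exc:
        raise InvalidEDFHeader("non-numeric size field") from exc
    # The full header is 256 bytes plus 256 bytes per signal.
    if signal_count < 1 or header_bytes != HEADER_SIZE * (1 + signal_count):
        raise InvalidEDFHeader("header size does not match the signal count")
    return signal_count


# Invented header for a one-signal recording, built field by field.
valid_header = (
    "0".ljust(8) + "X X X X".ljust(80) + "Startdate X X X X".ljust(80)
    + "01.01.24" + "08.30.00" + "512".ljust(8) + "".ljust(44)
    + "1".ljust(8) + "1".ljust(8) + "1".ljust(4)
).encode("ascii")

# The same header with a non-ASCII first byte, as an undeclared variant would have.
tampered_header = b"\xff" + valid_header[1:]

print(check_edf_header(valid_header))
try:
    check_edf_header(tampered_header)
except InvalidEDFHeader as err:
    print("rejected:", err)
```

A permissive reader would skip checks like these and hand back whatever data it can decode, which is exactly how undeclared extensions go unnoticed.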
For example, with the library we mentioned before, EDFlib-Python, if the file is not compliant it simply does not open. It just says: OK, this is not valid. It checks everything that is written in the open format specification, all the definitions, and only if everything is correct does it go on and open the file. This is the right way to do it, because otherwise we make it possible to extend an open format and still have those extensions supported. That is not good, because to be able to compare data we have to avoid supporting proprietary extensions.
Then we have pydicom, for medical image data sets. It also supports the protocol, so we have storage and transfer, the protocol implementation. With pydicom we can plot the image inside a DICOM file directly: if I pass it the DICOM file of my leg, I can plot it directly from pydicom in a few lines. It is very simple to use, and DICOM servers and clients are also written using this library, because it implements the protocol very well.
So let's make an open format. We can try to make an open format for healthcare data, something very simple that anybody else can use. For example, we can take blood pressure, the devices that measure blood pressure. We can create a simple JSON file, which is open, and then add constraints so that people cannot create proprietary extensions and the format cannot change without approval from the community. Why start with blood pressure? Because for blood pressure there is no real standard, so we can make one in a very simple way. We can just create an array with the datetime and all the measurements we can take from the blood pressure monitor. So the datetime, and then the mode, automatic or manual, depending on whether the device took the measurement automatically or someone pressed the button.
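A minimal sketch of such a measurement array, using only the standard library, could look like this. Only the datetime and the automatic/manual mode come from the talk; the value fields (`systolic`, `diastolic`, `pulse`) and the readings are assumptions made for illustration.

```python
import json
from datetime import datetime

# Each entry: when it was taken, how it was triggered, and the readings.
measurements = [
    {
        "datetime": datetime(2024, 1, 15, 8, 0).isoformat(),
        "mode": "automatic",  # taken by the device on its own schedule
        "systolic": 121,
        "diastolic": 79,
        "pulse": 64,
    },
    {
        "datetime": datetime(2024, 1, 15, 8, 12).isoformat(),
        "mode": "manual",  # someone pressed the button
        "systolic": 126,
        "diastolic": 82,
        "pulse": 70,
    },
]

document = json.dumps({"measurements": measurements}, indent=2)
print(document)
```

ISO 8601 strings are used for the datetime because JSON has no native date type.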
We can also add other information, like the patient information and the intervals. For automatic blood pressure monitoring, the device can take automatic measurements during the day and during the night, and these normally depend on the time of day: there is a morning time and a night time, and from each of those points a different interval starts. So we just add this interval as a possible field in the JSON file. This is also customizable, because it is part of the device setup.
We can also add error management, because errors also happen when the device takes a measurement. So the format must support errors too. Some formats don't support errors, because they only want the patient data, but if we cannot get the errors from the device, we cannot know whether the device works correctly. I get the patient data, but the device isn't working well because it makes a lot of errors, and I don't know that. So the format also has to record when the device makes an error.
Then we give the format a version. The format has to be versioned. We can also add some information about the device, for example its firmware, just to add more information to the format, because the more information we have in the format, the more accuracy we have.
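Putting the pieces described so far together, the sketch below shows one possible shape for the whole document, plus a strict check that rejects any field the format does not declare, which is the constraint against undeclared extensions. All key names, the version string and the sample values are hypothetical; the talk does not spell them out.

```python
import json

FORMAT_VERSION = "1.0"  # hypothetical version string

# Declared keys per section; anything else counts as an undeclared extension.
ALLOWED_KEYS = {
    "root": {"version", "device", "patient", "intervals", "measurements", "errors"},
    "device": {"model", "firmware"},
    "patient": {"age", "sex"},
    "intervals": {"day_start", "day_minutes", "night_start", "night_minutes"},
    "measurement": {"datetime", "mode", "systolic", "diastolic"},
    "error": {"datetime", "code"},
}


def check_keys(section: str, obj) -> None:
    """Require exactly the declared keys: nothing missing, nothing extra."""
    if not isinstance(obj, dict):
        raise ValueError(f"{section}: expected an object")
    extra = set(obj) - ALLOWED_KEYS[section]
    missing = ALLOWED_KEYS[section] - set(obj)
    if extra:
        raise ValueError(f"{section}: undeclared fields {sorted(extra)}")
    if missing:
        raise ValueError(f"{section}: missing fields {sorted(missing)}")


def validate(document: dict) -> None:
    """Raise ValueError if the document deviates from the declared layout."""
    check_keys("root", document)
    if document["version"] != FORMAT_VERSION:
        raise ValueError("unsupported format version")
    for section in ("device", "patient", "intervals"):
        check_keys(section, document[section])
    for measurement in document["measurements"]:
        check_keys("measurement", measurement)
        if measurement["mode"] not in ("automatic", "manual"):
            raise ValueError("measurement: mode must be automatic or manual")
    for error in document["errors"]:
        check_keys("error", error)


# Invented sample document.
document = {
    "version": FORMAT_VERSION,
    "device": {"model": "demo-monitor", "firmware": "0.0.1"},
    "patient": {"age": 54, "sex": "F"},
    "intervals": {"day_start": "07:00", "day_minutes": 15,
                  "night_start": "22:00", "night_minutes": 30},
    "measurements": [{"datetime": "2024-01-15T08:00:00", "mode": "automatic",
                      "systolic": 121, "diastolic": 79}],
    "errors": [{"datetime": "2024-01-15T08:15:00", "code": "cuff_loose"}],
}

# Round-trip through JSON text, then validate what was read back.
validate(json.loads(json.dumps(document)))

# A copy carrying an undeclared vendor field is refused.
extended = dict(document, vendor_blob="AAECAw==")
try:
    validate(extended)
except ValueError as err:
    print("rejected:", err)
```

Rejecting unknown keys is a deliberate choice here: a reader that silently ignored them would let vendor-specific fields creep into files that still claim to follow the open format.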
And then, OK, we have the JSON file with all the data we showed before, one after the other: a big JSON file with all the measurements, the device information, the patient information and the error information. So we can create a schema to validate the JSON. There is the JSON Schema format, which lets you define what the JSON should look like so that it can be validated. But we work with Python, so we let Python do that: there is a library called JSL that creates the JSON schema for you. You just describe what you want the schema, and the JSON, to look like, and you get the schema and can then check that the JSON you created is valid.
Why did I create a JSON file? Because normally, as I said before, blood pressure devices produce a proprietary format. Sometimes you can get the information out: the manufacturer creates this binary and you can convert it. But you want conversion to be easier, because that kind of format is not the best starting point for conversion and then statistical analysis.
So after making the schema and an example of the format, you have to create a library that supports the format. When you read the format, you have to check adherence to the schema, as I said before. You have to create converters from other formats to your format. And then the last point, but the main one: you have to spread the open format. That's all. Any questions?
Hi, nice talk. Do you know if these libraries can connect with FHIR files?
Yes, you can use these libraries without any problem. It's a protocol, so it's like a container, and inside this container you can use different formats. So you can use whichever of these libraries you want.
Hi, quick question. You were talking about healthcare formats, especially open source ones. I was wondering if you had a look at the ones that are also open source and fairly widespread in the industry, especially for [unclear]?
Yes, HL7 and FHIR are mainly for communication, and they are also containers for other formats, so you can put a proprietary format inside the container and then [unclear]. So it's not the main purpose of the talk, but yes, these are open too; they are communication protocols more than data formats.
Any other questions?