Welcome to this talk. My name is Tarcisio, and I am a research scientist at the SIB Swiss Institute of Bioinformatics. Today I am going to talk about our work entitled "Enabling Semantic Queries Across Federated Bioinformatics Databases." I want to show you how to answer complex biological questions in a short time and at a reasonable cost. The plan of this presentation is as follows. First, I will briefly introduce the motivation and the data-integration problems we have to solve in order to answer these complex biological questions. Second, I will present our approach to solving these problems. Third, I will show you the applications that we developed on top of this integrated data for exploring it. And finally, I will conclude the talk. To be able to answer complex questions over many databases, a system has to address two preliminary steps. One of them is data source discovery: identifying which data sources are needed to answer a given biological question. The other is data integration. In this talk I will focus on the data integration part, and on how we take advantage of this integrated data to answer complex biological questions such as: what are the human genes associated with a disease, and which of their orthologs exist in the rat and are expressed in the rat's brain? For example, perhaps you are studying potential drugs against a certain type of brain cancer in humans. One thing you might want to do is identify the corresponding genes, in this case the orthologous genes, in a model species such as the rat, and narrow them down to the places where they are expected to be expressed. To answer this question, you are going to need, in this case, three data sources.
For example, for the part about human genes associated with a disease, you could think about the UniProt database; for the orthologs that exist in the rat, you could use the OMA orthology database; and finally, for the gene expression information, you could take the Bgee database. By combining them, we can enable new insights from the wealth of biological data on the web. But to be able to answer these complex questions, we have to combine these data, and first we have to solve some problems. One of them is that biological data on the web are really scattered. One of the reasons is that many biological databases have been developed with few or no coordination among the database stakeholders. In my opinion, one of the main causes is that these databases are often independently maintained and do not serve the same needs: they are domain-specific databases, for example for gene expression, orthology, or protein-centric (e.g., human protein-centric) data. Some of these databases already do some data integration to combine smaller data sets, but usually within a specific domain, such as gene expression. If we consider the Nucleic Acids Research annual database issue, we can find more than 100 key resources featured there, which shows how many databases are available nowadays. So if we want to answer complex questions that require extracting data from these databases, we need a solution for that. To answer complex biological questions, we often have to combine these scattered data on the web, mostly because these databases are domain-specific while the complex questions usually cross domains, and consequently we have to extract data from different databases. Another issue is that biological data integration is quite complex.
One of the problems is that the databases vary in several ways, for example in the way they model the data, that is, how they structure and organize it. It could be a relational model, using a technology such as MySQL or Oracle, or it could be a graph model, using technologies such as Neo4j, Stardog, Blazegraph, and so on. Since they use different technologies, this makes the data integration, and probably the exchange of the data as well, harder. Another problem is the syntax in which the data are represented, which can also be heterogeneous: for example, they could use a JSON file, an XML file, a comma-separated file, and so on. Most importantly, as most of these databases are domain-specific, they also have different contexts and consequently different data meanings. For example, in a gene expression database, the distinction between the concept of a gene and that of a protein is quite important, because it matters whether you say that the protein is expressed or that the gene is expressed. In an orthology database, these meanings can overlap: even though such databases often work at the protein level, they may still say that a gene is the ortholog of another gene, meaning the corresponding gene, so the distinction between gene and protein is not necessarily relevant in that case. But if you want to integrate gene expression and orthology data, you have to think about this kind of semantic conflict. Now I am going to talk about our approach: how we solve these problems to achieve data integration and, furthermore, to be able to answer complex biological questions. This work was published in the journal Database in 2019.
In this approach, what we actually do is a seamless data integration. To integrate these scattered, dispersed data from independent data sources while keeping them independent, as they are, we apply a decentralized approach: we are not loading the data into a central repository or database. By doing so, we significantly reduce the maintenance effort, since the data are maintained by the database owners, and the data are also always up to date, because we retrieve them directly from the original data sources rather than loading them into a new database. Why a seamless integration? Because, as I mentioned, the data are extracted directly from the data sources and are combined at query execution time, that is, while the question is being answered. To be able to connect these different databases, we established some metadata, which we call links, that bridge the different databases so that the complex biological questions can be answered afterwards. The architecture of our system is divided into three layers. The first is the application layer, which is completely independent of the other layers. The second is the structured query interface layer; this is where we do the seamless data integration, where the data are actually combined based on the metadata. Here we rely on the SPARQL query language (the documentation is referenced below). Finally, the third layer is the data store layer, which is composed of the databases that we want to integrate, and they are kept as they are. These are some of the technologies we used to develop this system. Now I want to present some applications that we developed on top of this integrated data.
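To give a concrete idea of what the structured query interface layer does, here is a sketch of a federated SPARQL query that combines the three sources with the SERVICE keyword. This is an illustration, not the exact query our system generates: the endpoint URLs and, in particular, the `orth:` and `ex:` property names are simplified placeholders, not the real vocabularies of UniProt, OMA, or Bgee.

```sparql
PREFIX up:    <http://purl.uniprot.org/core/>
PREFIX orth:  <http://example.org/orthology/>    # placeholder vocabulary
PREFIX ex:    <http://example.org/expression/>   # placeholder vocabulary

# Sketch: human genes associated with a disease, their rat orthologs (OMA),
# and the anatomical entities where those orthologs are expressed (Bgee).
SELECT ?humanGene ?ratGene ?anatEntity
WHERE {
  SERVICE <https://sparql.uniprot.org/sparql> {
    # human proteins annotated with the disease of interest (simplified)
    ?protein up:annotation ?diseaseAnnotation ;
             up:encodedBy  ?humanGene .
  }
  SERVICE <https://sparql.example.org/oma> {
    # orthology relation between the human gene and a rat gene (simplified)
    ?humanGene orth:hasOrtholog ?ratGene .
  }
  SERVICE <https://sparql.example.org/bgee> {
    # anatomical entities where the rat gene is expressed (simplified)
    ?ratGene ex:isExpressedIn ?anatEntity .
  }
}
```

The point of the sketch is that each SERVICE block is evaluated against its own remote endpoint, and the bindings are joined at query execution time, which is exactly why no central copy of the data is needed.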
For example, coming back to the question from the introduction about the human genes related to a disease with a corresponding gene expressed in the rat's brain, we can go to biosoda.expasy.org and type the keyword "disease" to look for the query templates matching that keyword. Afterwards, you can run a template by clicking the green button, and then you will get some results. If you want to see the equivalent SPARQL query, you can do so by clicking the blue button. And if you want to edit the template, you can also change it, for example from human to mouse, and then finally run the query. Another application, still a work in progress, aims at answering questions posed in plain English: the user can directly type a question in natural language, which gives more flexibility. The system then offers some suggested interpretations of the user's question and, finally, the answer to the question. Hopefully the final prototype will be available by the end of this year. To conclude this talk, I would like to emphasize that data integration is the basis for answering several biological questions across multiple databases. Our approach performs a kind of seamless data integration at query execution time; we do that to reduce the maintenance effort and to keep the data always up to date. We also showed a real question-answering application, which we call the BioQuery application, available at the biosoda.expasy.org website. If you want to know more about this work, I would like to invite you to check the GitHub repository of the BioQuery project, as well as the two publications about this work, published in the Database journal and in Lecture Notes in Computer Science.
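To illustrate what editing a template amounts to under the hood, changing the species typically means changing the taxon constraint in the underlying SPARQL query. In this hypothetical fragment, the `ex:` predicate is a placeholder, while the taxon IRIs follow the NCBI taxonomy identifiers (9606 for human, 10090 for mouse):

```sparql
PREFIX up:    <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX ex:    <http://example.org/>              # placeholder vocabulary

# Hypothetical fragment of a template query: restrict results to one species.
SELECT ?gene WHERE {
  ?gene ex:associatedWithDisease ?disease .
  ?gene up:organism taxon:9606 .   # human; switch to taxon:10090 for mouse
}
```

Switching from human to mouse in the web interface corresponds to replacing `taxon:9606` with `taxon:10090` before the query is run.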
Thank you very much for your attention. I would also like to acknowledge the work of my collaborators and the co-authors of this work, as well as the Bgee team and the OMA team. Thank you.