Hello, my name is Emilie Pasche, and together with Anaïs Mottaz we are going to present Variomes, a literature search engine to support the curation of genetic variants. This search engine is intended for biocurators, molecular biologists, molecular pathologists, and any scientist interested in finding information about genetic variants in the literature. In precision medicine, oncologists must identify which of a patient's genetic variants are clinically actionable in order to select the appropriate treatment. Clinical practice guidelines recommend evaluating each variant, but each patient can carry thousands of variants. The work is therefore usually performed by curators, who search genetic variant databases as well as the literature to collect evidence.

So how can we find evidence about genomic variants? The curator can use knowledge bases such as COSMIC and ClinVar, which are valuable sources of information. The curator will find high-quality information there, but the coverage is quite limited, and information about rare variants might be missing. In such cases, the curator needs to access the literature, which can be done with standard search engines such as PubMed and Europe PMC. However, these general-purpose search engines are not well suited to searching for genetic variants: a variant can be expressed in multiple forms, so many queries are required to gather an exhaustive set of documents about a given variant. One solution is to use variant-specific search engines, such as LitVar and Variomes. In this video, I am going to present Variomes and show you the strengths of this system.

So how can Variomes help? First, Variomes is focused on recall. Our objective is to retrieve all possible relevant documents, and recall is very important for rare variants, for which it is sometimes difficult to retrieve even a single document. To maximize recall, queries are automatically normalized and expanded.
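To illustrate why one variant requires several queries in a standard search engine, here is a minimal Python sketch. It is a simplified illustration, not the actual expansion algorithm used in the system presented here: the regular expression, the function names `expand_protein_substitution` and `build_or_query`, and the choice of four spellings are all hypothetical. It handles only protein-level substitutions between the 20 standard amino acids, and ignores transcript and genomic levels, alternative reference sequences, and stop codons.

```python
import re

# Standard three-letter to one-letter amino acid codes (20 residues only).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}

# Hypothetical pattern for a protein substitution written as
# "p.Val600Glu", "Val600Glu", "p.V600E" or "V600E".
SUBSTITUTION = re.compile(r"(?:p\.)?([A-Z][a-z]{2}|[A-Z])(\d+)([A-Z][a-z]{2}|[A-Z])")


def to_one_letter(residue: str) -> str:
    """Convert a one- or three-letter amino acid code to its one-letter form."""
    if residue in THREE_TO_ONE:
        return THREE_TO_ONE[residue]
    if residue in ONE_TO_THREE:
        return residue
    raise ValueError(f"unknown amino acid code: {residue!r}")


def expand_protein_substitution(variant: str) -> set[str]:
    """Return a few common spellings of a protein-level substitution (illustrative only)."""
    match = SUBSTITUTION.fullmatch(variant.strip())
    if match is None:
        raise ValueError(f"unsupported variant format: {variant!r}")
    ref, position, alt = match.groups()
    # Normalize both residues first, then regenerate every surface form.
    ref1, alt1 = to_one_letter(ref), to_one_letter(alt)
    ref3, alt3 = ONE_TO_THREE[ref1], ONE_TO_THREE[alt1]
    return {
        f"p.{ref3}{position}{alt3}",  # HGVS-style, three-letter codes
        f"{ref3}{position}{alt3}",    # three-letter codes without "p."
        f"p.{ref1}{position}{alt1}",  # one-letter codes with "p."
        f"{ref1}{position}{alt1}",    # short form common in abstracts
    }


def build_or_query(variant: str) -> str:
    """Join all spellings into one Boolean OR query of quoted phrases."""
    forms = sorted(expand_protein_substitution(variant))
    return " OR ".join(f'"{form}"' for form in forms)


print(build_or_query("V600E"))
# prints: "V600E" OR "Val600Glu" OR "p.V600E" OR "p.Val600Glu"
```

Normalizing first and then regenerating all forms means any input spelling yields the same query. Even this toy example produces four alternatives for a single variant at a single level; a real expansion that also covers transcript- and genomic-level descriptions and every reference sequence multiplies the number of forms much further, which is why doing it by hand is impractical.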
Second, Variomes preserves the specificity of user queries. This means that we return only documents about the exact variant mentioned in the query, not documents aggregated by dbSNP identifier. However, users who want to search for an aggregation of variants can do so by querying with a dbSNP identifier. Third, Variomes can search for several variants in a single query, which is especially useful for polygenic diseases. And finally, Variomes searches not only abstracts and full texts, as most literature search engines do, but also the supplementary material associated with publications, including images.

In this video, I will first present the architecture and evaluation of Variomes. I will then show how to use the system, and I will finish with a short conclusion.

The Variomes system starts with a query. A query consists of a gene and a variant, and optionally a diagnosis or demographic information. The query is first normalized and expanded: the gene is normalized to a UniProt gene identifier, and the variant is expanded using SynVar. SynVar is a tool developed by the SIB Text Mining group and is freely accessible online. SynVar processes variants at any level and expands them using all possible reference sequences. The expansion is based on the HGVS format, but also on non-standard formats, derived from a list of expressions commonly found in the literature. This means that the user can start with a variant at the protein, transcript, or genomic level, and SynVar will normalize it and expand it with synonyms at all levels. The processed query is then used to retrieve documents in SIBiLS, a back-end search engine also developed by the SIB Text Mining group. SIBiLS contains four collections of interest for variant search. Medline, with about 33 million abstracts. PubMed Central, with about 4 million full-text articles.
Supplementary material, with about 800,000 articles corresponding to about 4 million files, mainly images and tables. And finally, about 300,000 clinical trials from ClinicalTrials.gov. All these documents are processed and annotated with a set of terminologies such as DrugBank, MeSH, and UniProt. Finally, the retrieved documents are ranked so that the most relevant articles are returned first.

The evaluation of Variomes relies on several experiments, and in this video I will present the main outcomes. More details about the evaluation can be found in these two articles. First, we evaluated Variomes as a literature triage system to support variant curation. In other words, we wanted to evaluate the precision of the top-ranked documents returned by Variomes. The experiment is based on a benchmark provided by the TREC Precision Medicine track. The results show that about two-thirds of the documents Variomes returns in the top five are relevant.

The next evaluation compares Variomes with a similar tool, LitVar. At the time of the evaluation, LitVar 2 had not yet been released, so the comparison is based on the first version of LitVar. For the comparison, we used a dataset of 800 variants in BRCA1 and BRCA2. The results show that about half of the documents are shared by both systems. They also show that Variomes misses 9% of the documents retrieved by LitVar. We manually analyzed these documents and found that they are mostly false positives: in particular, they are not about the queried variant but about another variant occurring at the same position, a consequence of the variant aggregation performed by LitVar. Conversely, Variomes retrieves 41% of documents not retrieved by LitVar, and the manual analysis showed that these are mostly relevant documents. We also evaluated the impact of supplementary data on reducing silence, that is, the number of queries that return no results.
This evaluation is based on the same benchmark of 800 variants, extended with a random set of 1,000 variants from ClinVar. On this benchmark, 907 queries returned no results in Medline and PMC, and we wanted to know how many of these queries return results in the supplementary data. The results show that supplementary data significantly reduced the number of silent queries: for more than half of the queries that returned no results using only abstracts and full texts, the supplementary material yielded at least one match. The last evaluation concerns the contribution of supplementary material to the number of documents retrieved. It is based on the same benchmark as the previous experiment, and the results show that recall is strongly increased: the number of documents retrieved per query more than doubled. Indeed, supplementary material yielded on average six documents per query, while abstracts and full texts returned only four. Moreover, on average, fewer than one of the documents retrieved from the supplementary material had already been retrieved from abstracts and full texts.

Now I'm going to present the user interface. There are three ways to query Variomes. The user can search the literature for one variant or a combination of variants. The user can also search the literature for a set of variants by uploading a file. Or the user can reload results previously obtained with Variomes, which makes it possible to share a query with colleagues or to come back later and continue working on it. For this demo, I will use the first query mode, with a single variant. In this mode, you enter the genetic variant in the field and can optionally define some minimal patient information, such as a diagnosis, or set advanced options, for instance a specific date range. Because the response time of Variomes can be quite slow, we provide a link so that you can come back later to see the results.
The results are displayed by collection, and each collection is shown in a separate tab. In this case, we found one document in Medline, six documents in PubMed Central, no documents in clinical trials, and 11 documents in the supplementary material. On the left, there are some options to customize the display, such as the selection of entities to highlight, or facets to filter the result set. For each document, a few passages containing the variant are displayed, and the full document is also accessible with highlighted annotations. Finally, the user can flag documents in order to export the set of documents they consider interesting. The export can be done in CSV or JSON.

To conclude, Variomes is an efficient tool to retrieve literature associated with variants. About two-thirds of the documents it returns in the top five are relevant. Variomes achieves a much higher recall than LitVar. The use of supplementary material significantly reduces silent queries, especially the use of images, which is quite rare for literature search engines. Variomes also makes it possible to search for combinations of variants, and it keeps the specificity of the queries: we do not aggregate variants. As future work, we are improving the response time, and we are also investigating pre-trained language models and ensemble learning models. Variomes, SynVar, and SIBiLS are publicly available; you can find the links to these services on this slide. We also have two publications about Variomes, and you can find the references here. This project has been supported by the Swiss Personalized Health Network, the BioMedIT infrastructure, the ELIXIR Data Platform, and the CINECA project. I would also like to thank all the people who contributed to the development of Variomes, tested it, and gave us suggestions and feedback. Thank you very much for watching this video.