Welcome to this brief introduction to the DISEASES database. The aim of DISEASES is to link human genes to diseases. The database currently holds more than 8 million such associations, which can be accessed via a web interface at diseases.jensenlab.org.
The associations come from several different types of evidence, the first one being curated knowledge. That is, we import associations from the MedlinePlus Genetics database, formerly known as Genetics Home Reference, and from UniProtKB/Swiss-Prot, the manually annotated part of UniProt. These associations are causal associations. That means we are almost certain that variants in these genes are associated with the disease in question. However, the downside is that our knowledge of diseases is very incomplete.
For this reason, we complement the manual curation with automatic text mining of the biomedical literature. We start from a big text corpus consisting of full-text articles from the open access subset of PubMed Central, as well as abstracts from the PubMed database. We then exclude known paper mill publications, that is, fake articles produced just to boost people's publication counts. We take this big corpus and perform named entity recognition to find names in text, using a dictionary of gene names and a dictionary of diseases. Once we've identified them in text, we can do relation extraction by looking for co-mentions of these names. That is, we find papers like this one, where a disease is mentioned together with a gene, and we count up how many such publications we find for a given gene-disease pair. If you want to know more about how this is done in detail, I suggest you start by watching my introduction to the core concepts of biomedical text mining, as well as the more in-depth presentations following it.
Finally, we include data from genome-wide association studies.
These studies provide links between genetic variants in the genome and traits. However, the bad news is that these associations are just statistical correlations. There's no guarantee that a given variant that is statistically correlated with a trait is causal, and this makes the data rather difficult to interpret for non-experts. The data from genome-wide association studies are collected in the GWAS Catalog, and building upon that, the TIGA resource was recently created. TIGA takes the associations from the GWAS Catalog, maps and aggregates the data, and scores the associations in a way that allows us to subsequently put them into the DISEASES database. The scoring scheme takes into account the statistical significance of each association, the number of studies in which it was replicated, and how well cited those studies are.
Lastly, DISEASES integrates all of this data from the different sources of evidence. We do that by first assigning confidence scores, which work like Amazon reviews: associations are rated from one star to five stars, with five stars being the associations we are most sure about. These scores are designed to be comparable across channels, so that you can directly compare the confidence score from automatic text mining to the confidence score coming from a genome-wide association study. We also map all the data to consistent identifiers, meaning that all diseases are represented by terms from the Disease Ontology, and all the genes have been mapped to STRING identifiers. This in turn means that it's very easy to produce disease-gene networks by going into the DISEASES database, retrieving disease genes, and directly producing STRING networks of them.
The database is available via a web interface, as web services if you want to integrate it into your own resources, and as bulk download files. You're free to do anything you want with this data. It's all available under open licenses, and it's updated on a weekly basis.
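
To make the point about comparable confidence scores a bit more concrete, here is a minimal Python sketch of how you might pick disease genes once you have retrieved associations. Everything in it is an assumption made for illustration: the `Association` record, its field names, the channel labels, the placeholder genes and diseases, and the idea of keeping the highest star score per gene are not the actual file format or the integration method used by the database.

```python
from typing import NamedTuple


class Association(NamedTuple):
    gene: str      # gene identifier (placeholder names, not real genes)
    disease: str   # disease term (placeholder names, not real ontology terms)
    channel: str   # evidence channel label (hypothetical labels)
    stars: float   # confidence score on the one-to-five star scale


def disease_genes(records, disease, min_stars=3.0):
    """Return {gene: (best_stars, channel)} for one disease.

    Because the star scores are meant to be comparable across channels,
    this sketch simply keeps the highest-scoring evidence per gene and
    then drops genes that fall below the chosen cutoff.
    """
    best = {}
    for rec in records:
        if rec.disease != disease:
            continue
        if rec.gene not in best or rec.stars > best[rec.gene][0]:
            best[rec.gene] = (rec.stars, rec.channel)
    return {g: v for g, v in best.items() if v[0] >= min_stars}


# Toy data, invented purely for illustration.
records = [
    Association("GENE_A", "disease X", "knowledge", 5.0),
    Association("GENE_A", "disease X", "textmining", 3.2),
    Association("GENE_B", "disease X", "textmining", 2.1),
    Association("GENE_B", "disease X", "gwas", 3.5),
    Association("GENE_C", "disease X", "textmining", 1.4),
    Association("GENE_D", "disease Y", "knowledge", 4.0),
]

genes = disease_genes(records, "disease X", min_stars=3.0)
for gene, (stars, channel) in sorted(genes.items()):
    print(f"{gene}\t{stars:.1f}\t{channel}")
```

With this toy data, GENE_A and GENE_B pass the three-star cutoff for disease X, while GENE_C is dropped. The resulting gene list is the kind of input you would then hand to a network resource to draw a disease-gene network.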
That's all I want to say about the DISEASES database. If you have an interest in topics related to this, I suggest you take a look at this presentation. Thanks for your attention.