Welcome to this short introduction to the core concepts of knowledge graphs. The idea of a knowledge graph is quite simply to take domain knowledge, that is, what we know within a field, and structure it in the form of a graph. A graph is nothing more than a network, and if you are not already familiar with graphs and networks, I strongly recommend that you go watch my introduction to the core concepts of network biology before continuing with the rest of this presentation. Here I'll go over the types of data sources used to build knowledge graphs, the challenges of combining them into heterogeneous networks, how we subsequently store these graphs, how we query them, and lastly how they can be used in practice.
There are two main types of data sources going into knowledge graphs. The first is ontologies, which contain formal descriptions of concepts and how they are related to each other. A good example is Gene Ontology, which for example has the structure of a cell in the form of concepts and relations. The second is databases, from which we take a lot of information. This includes information about entities such as genes and proteins and the relations between them, which could be functional associations between proteins like you would get from the STRING database, or disease-gene associations, which you could get for example from the DISEASES database. This knowledge can come either from manual curation of the literature or from automatic text mining of the literature.
It's important to understand that even though we have knowledge in these databases, they are normally not considered knowledge graphs themselves. To have a knowledge graph you normally need a heterogeneous network, that is, we need many node types and many edge types all within a single network. The result would look something like this, where you have genes and diseases and drugs and many other types of concepts, with many types of relations between them, all in one network.
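To make the idea of a heterogeneous network a bit more concrete, here is a minimal Python sketch of a graph where every node and every edge carries a type. This is only an illustration, not how any particular knowledge graph is implemented: the identifiers, the type names, the source names and the `is_heterogeneous` helper are all made up for this example, and the "more than one type" threshold is my own simplifying assumption.

```python
# Toy heterogeneous network: every node and every edge has a type.
# All identifiers, type names and source names are made-up placeholders,
# not entries from any real database.
nodes = {
    "gene:G1": {"type": "Gene"},
    "gene:G2": {"type": "Gene"},
    "disease:D1": {"type": "Disease"},
    "drug:X1": {"type": "Drug"},
}

# Each edge: (source node, edge type, target node, edge properties).
edges = [
    ("gene:G1", "INTERACTS_WITH", "gene:G2", {"source": "hypothetical_db_a"}),
    ("gene:G2", "ASSOCIATED_WITH", "disease:D1", {"source": "hypothetical_db_b"}),
    ("drug:X1", "TARGETS", "gene:G1", {"source": "hypothetical_db_c"}),
]


def node_types(nodes):
    """Return the set of distinct node types in the graph."""
    return {attrs["type"] for attrs in nodes.values()}


def edge_types(edges):
    """Return the set of distinct edge types in the graph."""
    return {edge_type for _source, edge_type, _target, _props in edges}


def is_heterogeneous(nodes, edges):
    """Assumed criterion: more than one node type and more than one edge type."""
    return len(node_types(nodes)) > 1 and len(edge_types(edges)) > 1


print(sorted(node_types(nodes)))
print(sorted(edge_types(edges)))
print(is_heterogeneous(nodes, edges))
```

A single-source database, say only gene-gene associations, would have one node type and one edge type, so it would fail this check, which is the sense in which such databases are not knowledge graphs on their own.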
The challenge of building such a network is that you need to combine many sources of data, which all come in different formats and use many different identifiers for the same things. For this reason almost all knowledge graph efforts build on top of existing integrative databases that have already taken all the relations we have of one particular type and combined them into a single database.
Once we've made a graph, we need to somehow store it. You can of course store it in a relational database, but since you've just structured everything as a graph, the obvious choice is to store it in a dedicated graph database such as Neo4j. The advantage of this is that the database inherently has graph structure, that is, it has nodes and edges, and associated with those you have node properties and edge properties, which you can use to store any other data about these nodes and edges. This includes provenance information, that is, where you got them from and where people can find more information.
Once you have things in a database, you need to be able to query it efficiently to make use of it. That is, we need a query language that allows us to quickly access nodes, fetch edges between them, or find the first neighbors of a node. If you stored the data in a relational database, you would do that using SQL. If you stored it in Neo4j, you would use the Cypher language instead. The advantage of the latter is that it allows you to do more complex graph operations, such as finding the shortest path between any two nodes in the network. However, it has to be said that in practice you don't often need this, and for that reason the advantages of storing things in a graph database are somewhat academic.
Lastly, a few biomedical examples: what do we use knowledge graphs for? Knowledge graphs have been used for gene-disease prediction, which means trying to find new associations between diseases and genes.
This can be done by looking for metapaths in a knowledge graph, that is, looking for indirect associations where a gene and a disease are both linked to the same intermediate concepts. It's similarly been used for drug repurposing, that is, trying to find new uses for existing drugs by linking drugs to other diseases that they are not currently used for, by looking at complex subgraphs like this. Lastly, and most recently, it's been used for clinical decision making, starting from clinical proteomics data and then using a knowledge graph to identify which are the best drugs to use for the particular patient that we have proteomics data for. This work was done by my collaborators at CPR.
It's important to understand that all these knowledge graph efforts build heavily on open licenses. It's impossible for any single group to build a solid knowledge graph of everything we know. Everybody builds on top of existing databases, and that's only possible because the types of integrative databases that I make are openly licensed. And thankfully the knowledge graphs themselves are also openly licensed, allowing everybody to build on top of what others have already done.
It's also important to keep in mind that what matters when you work with knowledge graphs is the data in the graph. If you have good data, if you have a good graph, you can do a lot of useful things with it. If you have bad data, you can't. It matters a lot less how exactly you store the data. You could always store it in a different way; that's not a lot of work. Building a new knowledge graph is a lot of work.
That's all I have to say about knowledge graphs. If you're interested in data integration in general, I suggest you go watch this presentation as well. Thanks for your attention.