Good morning everyone. Yes, the microphone is on. Welcome, and first of all thank you to XWiki for organizing this devroom. My name is Sandra van Doeren, and today I'll show some of the things we've done over the last few years for a project at the European Commission. My colleague is here as well, so I might pass some questions on to him. We've been running this project for a few years, and we discovered some interesting things while integrating a knowledge graph with a CMS. So who am I? My name is Sandra and I live close by, in Leuven. I'm an independent consultant, so I'm not representing the European Commission here in any way; these are just some of the things we did there. I work there on software architecture, and apart from that I'm intrigued by all sorts of complex systems. These are my contact details. I put the slides on Pentabarf, but I forgot to give the file an extension, so if you download them, just add .pptx to the name. I tried to change it, but the system was a bit overloaded this morning, so I lost that opportunity. The place where I work is ISA, a unit that promotes everything open on the European Commission side, so we work on open source projects. The goal is really to promote interoperability between public services, for example at the level of data models. One of the things we have is the interoperability framework, and we have a reference architecture that supplements that framework. All of these things are clickable in the slides, but I'll skip through them quickly because there isn't a lot of time.
The main thing I want to point out here is Joinup, because that's the action we did all of this work for. It's a content management system based on Drupal. One thing that might interest you beyond the technology behind it is the licensing assistant on the platform: a wizard that helps you select an open source license. Based on your criteria, it helps you determine which license fits you best. A quick agenda: what are knowledge graphs; how Drupal and the CMS world integrate with them; how we put everything together in one system architecture, so how things relate to each other; some conclusions; and then questions and answers. Knowledge graphs. What is a knowledge graph? I don't know how familiar people are with the concept, but I'll quickly explain what goes into it. It is in fact a list of statements, and a statement can be something of the form "AW1.120 is a room in ULB". You'll see that I underlined parts of the statement and put another part in italics. That's to indicate that in this small statement there is a subject, a predicate and an object; I'll come back to this later. Are knowledge graphs something new? Absolutely not. They're a rebranding of what the Semantic Web was around the 2010s; probably somebody at Gartner decided it was time for some new terminology and introduced "knowledge graphs". The big difference with the 2010s is that there is now at least a decent set of tooling around the whole concept, so you can really work with these things. So we have statements; if we throw these statements together they form a graph, and that is our knowledge graph.
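To make the "list of statements" idea concrete, here is a minimal Python sketch (not from the talk) that stores each statement as a (subject, predicate, object) tuple and shows how statements that share a node join up into one graph. The example.org URIs, the second room `R2`, and the `isRoomIn`/`locatedIn` predicates are made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical namespace for this sketch


class Statement(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


statements = [
    Statement(EX + "room/AW1.120", EX + "vocab/isRoomIn", EX + "campus/ULB"),
    Statement(EX + "room/R2", EX + "vocab/isRoomIn", EX + "campus/ULB"),
    Statement(EX + "campus/ULB", EX + "vocab/locatedIn", EX + "city/Brussels"),
]


def to_graph(triples):
    """Group statements by subject: node -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)


def nodes(triples):
    """Subjects and objects are the nodes; a node shared by two statements links them."""
    return {t.subject for t in triples} | {t.obj for t in triples}


graph = to_graph(statements)
print(len(nodes(statements)))    # 4: two rooms, the campus, the city
print(graph[EX + "campus/ULB"])  # the campus node links the room statements to the city
```

Nothing here is specific to any triple store; it only shows that a knowledge graph is the union of its statements.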
You'll see different terms thrown around: linked data, linked open data, Semantic Web, knowledge graphs, RDF. They're not a hundred percent synonyms, but if you look around the web, they're all used for the same kind of ideas. What makes knowledge graphs particularly interesting is that everything is based on standards. A lot of graph databases, think of Neo4j, define their own formats, but on the Semantic Web we use W3C standards, just as we do for HTML. The information itself is described in RDF, which is a specification, and that makes it very robust and very portable. So RDF is the specification that defines how you express statements within a knowledge graph. We can of course make arbitrary statements, but things get more interesting once we put them in a data model; that's where RDF Schema comes in, which lets you express classes of things and how to instantiate them. You need a way to query all of this madness, and that is SPARQL. SPARQL is also standardized: it's a standard query language for RDF data. What it all comes down to is that RDF is to information on the web what HTML is to documents. A URI on the web is a Uniform Resource Identifier, so it's really the identifier of a resource, and a document is one way of representing that resource. In the same way that we can represent a resource with a document, we can also represent it with RDF: a version that is readable for machines, whereas HTML documents are readable for humans. So how does RDF work? Here we have a table of beers, some of which might be familiar from yesterday evening.
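The classes-and-instances idea of RDF Schema can be illustrated with two of the RDFS entailment rules: `rdfs:subClassOf` is transitive, and an instance of a subclass is also an instance of its superclasses. The sketch below applies only those two rules to plain tuples. The `rdf:type` and `rdfs:subClassOf` URIs are the real W3C ones; the example.org classes are invented, and a real reasoner or triple store does much more than this.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # hypothetical namespace for the example classes


def rdfs_closure(triples):
    """Apply two RDFS entailment rules until nothing new can be derived.

    rdfs11: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    rdfs9:  (x type A) and (A subClassOf B)        =>  (x type B)
    """
    graph = set(triples)
    while True:
        sub = {(a, b) for a, p, b in graph if p == RDFS_SUBCLASS_OF}
        derived = {(a, RDFS_SUBCLASS_OF, c) for a, b in sub for b2, c in sub if b == b2}
        derived |= {
            (x, RDF_TYPE, b)
            for x, p, a in graph if p == RDF_TYPE
            for a2, b in sub if a == a2
        }
        if derived <= graph:  # fixed point reached: nothing new was derived
            return graph
        graph |= derived


schema_and_data = {
    (EX + "LectureRoom", RDFS_SUBCLASS_OF, EX + "Room"),
    (EX + "Room", RDFS_SUBCLASS_OF, EX + "Place"),
    (EX + "room/AW1.120", RDF_TYPE, EX + "LectureRoom"),
}
closed = rdfs_closure(schema_and_data)
print((EX + "room/AW1.120", RDF_TYPE, EX + "Place") in closed)  # True
```

The room was only ever stated to be a `LectureRoom`; the schema is what lets a consumer also treat it as a `Place`.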
To express a table in RDF, we take the row ID and turn it into a URI: something that looks like a URL but doesn't necessarily point to a document on the web. We take the column name and turn it into the predicate, and we take the value in the cell and turn it into the object. So in RDF we always have this subject-predicate-object form, and that way we can describe whatever information we want. It can get quite big, because you have to be quite verbose in your statements, but in the end you can model the whole world with this simple subject-predicate-object format. It's important to point out which parts can be URIs: subjects and predicates have to be URIs (subjects can also be blank nodes, which I'll leave aside here), while objects can be either a URI or a literal with an XML Schema datatype, such as a date-time. So what's all of this about? The magic is really in the URIs. When you make statements describing the world, your URIs start with your own domain name. If somebody else describes data in another domain, you know you'll be able to merge everything conflict-free: you have a list of statements on one hand and a list of statements from another place, and you simply concatenate the two. Another very interesting aspect of representing things as facts, or statements, is that you can build applications that operate on this graph of data and ignore everything they don't understand, everything outside their data model. And RDF can be represented in multiple ways: RDF/XML, JSON-LD, Turtle or other serializations. So if your web service works with JSON, you could add a bit of context and turn it into linked data, a graph representation.
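The table-to-RDF mapping described above (row ID becomes a subject URI, column name becomes a predicate, cell value becomes the object) can be sketched in a few lines of Python. The beer rows, the example.org and example.net base URIs, and the choice to emit every value as a plain string literal are assumptions for illustration; real mappings usually attach XML Schema datatypes such as `xsd:decimal`. The output uses the N-Triples serialization, and merging two sources is just a set union of their statements.

```python
VOCAB = "http://example.org/vocab/"  # hypothetical shared vocabulary for column names


def rows_to_triples(rows, id_column, base):
    """Row ID -> subject URI under `base`, column name -> predicate, cell -> literal object."""
    triples = set()
    for row in rows:
        subject = base + str(row[id_column])
        for column, value in row.items():
            if column != id_column:
                triples.add((subject, VOCAB + column, str(value)))
    return triples


def literal(value):
    """N-Triples string literal: escape backslashes, double quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize as N-Triples: one '<s> <p> "o" .' line per statement, sorted."""
    return "\n".join(f"<{s}> <{p}> {literal(o)} ." for s, p, o in sorted(triples))


# Two sources both describe "beer 1", but under their own domains, so the URIs never clash.
ours = rows_to_triples([{"id": 1, "name": "Example Tripel", "abv": "8.5"}], "id", "http://example.org/beer/")
theirs = rows_to_triples([{"id": 1, "name": "Example Stout", "abv": "6.0"}], "id", "http://example.net/beer/")
merged = ours | theirs  # merging two sources is just a union of statements
print(to_ntriples(merged))
```

Because each source mints subject URIs under its own domain, the union keeps all four statements instead of overwriting one beer with the other.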
I'll have to speed up a bit. Why would you want to do this? Separating your application from your data is very important. Instead of putting logic in your application layer that knows how to interpret certain structures in your database, everything is expressed directly in the data itself. And as I already mentioned, applications can ignore facts, so you can subclass a data model, or make an application profile of it, and specialize it for your application context: take a very generic standard, add the parts your application is missing, and the data can still be interpreted by applications that work on the generic data model, simply by ignoring the statements they don't understand. The obvious things big corporations use knowledge graphs for now are interactive assistants, like those Amazon devices you can talk to, what are they called? Alexa. Google, of course, uses them to improve search, because by modeling knowledge and relating things to each other you can greatly increase your insight into how the world fits together, and chatbots are another example. This is how you express a query; I'm going to speed up a bit. Coming to the Drupal side of things: what did we do? We have this project called Joinup, with a catalog of reusable open source solutions, and we try to express everything there in a machine-readable format so that people can also reuse our catalog. It's really a federated portal: member states of the European Union have their own catalogs, and we harvest data from them and put it together into one European catalog. It's licensed under the EUPL and it's open source; if you grab the slides, you can click through to the GitHub page.
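Two ideas from this part can be shown together: a query is a set of triple patterns with variables (the core of a SPARQL basic graph pattern), and an application that only asks for the predicates it knows never sees the extra facts an application profile adds. The toy matcher below is a hand-written illustration, not a SPARQL engine, and the news-item vocabulary, including the local `subtitle` property, is hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

data = {
    (EX + "news/1", RDF_TYPE, EX + "NewsItem"),
    (EX + "news/1", EX + "title", "Knowledge graphs and CMSs"),
    # Added by one site's application profile; generic consumers never ask for it.
    (EX + "news/1", EX + "subtitle", "A local extension"),
}


def match(pattern, triple, binding):
    """Try to extend `binding` so `pattern` equals `triple`; return None on a mismatch.
    Pattern terms starting with '?' are variables."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def select(triples, patterns):
    """Join triple patterns left to right, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in sorted(triples):
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?item ?title WHERE { ?item a ex:NewsItem . ?item ex:title ?title }
generic_view = select(data, [
    ("?item", RDF_TYPE, EX + "NewsItem"),
    ("?item", EX + "title", "?title"),
])
# The local site asks for its extra property on the very same data.
local_view = select(data, [("?item", EX + "subtitle", "?subtitle")])
print(generic_view)
print(local_view)
```

The generic query returns the item and its title and silently skips the subtitle triple, which is what lets a corporate-level application and a locally extended site share one dataset.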
So what was required of us, the team responsible for the implementation? The solution had to be free and open source, which is always very nice. Drupal was part of the requirements because there is a lot of Drupal knowledge in the European Commission, and they know how to host it; if you have a stack like that, try to reuse it. We had to publish our data as open data, which is the RDF part, and harvest other catalogs, so integrate data sources. And of course the whole solution had to stay compatible with the contributed modules in the Drupal ecosystem; that was another thing we wanted to safeguard. What did we do? We swapped out Drupal's database layer; we wrote our own ORM, let's say. We made a module that plugs into Drupal and, instead of doing SQL, does SPARQL. Since Drupal has a quite well-defined database abstraction layer, we can express the same types of queries through the ORM as would go to a SQL database. So whenever we hit save on a form, the data goes to the triple store, the database that stores everything in RDF, instead of to SQL. Our goal was really not to use any linked data concepts outside the infrastructure layer: from the outside it's a normal Drupal site, and only the infrastructure layer knows about linked data. I'm going to skip this slide. Next, the architecture; it's not very readable, the resolution got to me. On the right side we have the two databases we use: we store part of our content in a relational database and part in a triple store, and everything fits together in the CMS. We had to do a bit of trickery, because with two completely different database technologies things get tricky when you have to run a query.
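The storage swap can be pictured as a translation layer: where Drupal's SQL backend would emit INSERT or SELECT statements, a SPARQL backend emits SPARQL 1.1 Update and query strings instead. The Python sketch below only illustrates that idea and is not the actual Drupal module (which is PHP). The field-to-predicate mapping, the graph URI, and the function names are hypothetical, and every value is written as a plain string literal.

```python
GRAPH = "http://example.org/graph/solutions"  # hypothetical named graph
FIELD_TO_PREDICATE = {                         # hypothetical field -> predicate mapping
    "label": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description",
}


def literal(value):
    """SPARQL string literal with backslashes, quotes and newlines escaped."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n") + '"'


def save_entity(uri, fields):
    """Entity save -> SPARQL 1.1 INSERT DATA into the named graph."""
    body = "\n".join(
        f"    <{uri}> <{FIELD_TO_PREDICATE[name]}> {literal(value)} ."
        for name, value in sorted(fields.items())
    )
    return f"INSERT DATA {{\n  GRAPH <{GRAPH}> {{\n{body}\n  }}\n}}"


def query_entities(conditions):
    """Entity query with equality conditions -> SPARQL SELECT over the same graph."""
    body = "\n".join(
        f"    ?entity <{FIELD_TO_PREDICATE[name]}> {literal(value)} ."
        for name, value in sorted(conditions.items())
    )
    return f"SELECT ?entity WHERE {{\n  GRAPH <{GRAPH}> {{\n{body}\n  }}\n}}"


print(save_entity("http://example.org/solution/1", {"label": "Demo", "description": "A demo"}))
print(query_entities({"label": "Demo"}))
```

This covers only a brand-new entity; saving an existing one would also have to delete its old values (for example with a `DELETE`/`INSERT` update), and a real backend must handle typed literals, multi-valued fields, and revisions.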
In the end we decided to index everything in Apache Solr and run our queries against Solr, and when we load objects we just get them from the database. That worked out perfectly fine. We also put a copy of this triple store out in public, with a public SPARQL endpoint, so you can go and query it. Quickly, some lessons learned. Object loads all come from the database, but since Drupal likes caching, we ended up with a Redis server, so our objects also get cached in Redis, and our queries go to Solr. In the end the triple store and MySQL just sit there idle; only when somebody saves something does anything get written there. What I think has huge potential for big organizations is the ability to extend data models while staying compatible with the base data model. Imagine you have 200 websites in an organization, and every organizational unit has a different notion of what a news item is; one director wants a subtitle on his news. You can accommodate that and still query at the corporate level with the shared data model, while at the local level those specific websites use their specific models. It's awesome for data integration: data coming from the member states we can just pipe through into our triple store, with validation in the process, of course. And working with reference data, like our code lists of countries and languages, is really enjoyable, because we can simply remove everything related to a certain topic, replace it with a new version, and everything is still there and working.
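The read path from these lessons learned (searches go to Solr, object loads are served from Redis and only fall through to the right database on a cache miss) is essentially a read-through cache in front of two backends. The sketch below models it with in-memory stand-ins. The class and function names, the entity types, and the rule for which types live in the triple store are hypothetical, and a real deployment would use Redis and Solr clients rather than dicts.

```python
from dataclasses import dataclass, field

# Hypothetical split: which entity types live in the triple store; the rest are relational.
TRIPLE_STORE_TYPES = {"solution", "collection", "licence"}


@dataclass
class ReadPath:
    triple_store: dict                          # stand-in for the SPARQL database
    relational: dict                            # stand-in for MySQL
    cache: dict = field(default_factory=dict)   # stand-in for Redis
    backend_reads: int = 0                      # how often a database is actually hit

    def backend_for(self, entity_type):
        """Route each entity type to the database that stores it."""
        return self.triple_store if entity_type in TRIPLE_STORE_TYPES else self.relational

    def load(self, entity_type, entity_id):
        """Read-through cache: serve from the cache, fill it from the backend on a miss."""
        key = f"{entity_type}:{entity_id}"
        if key not in self.cache:
            self.backend_reads += 1
            self.cache[key] = self.backend_for(entity_type)[entity_id]
        return self.cache[key]


def search(index, text):
    """Stand-in for a Solr query: (type, id) keys whose indexed label contains `text`."""
    return [key for key, label in sorted(index.items()) if text.lower() in label.lower()]


store = ReadPath(
    triple_store={"s1": {"label": "Open data portal"}},
    relational={"n1": {"label": "Release news"}},
)
# One search index holds entities from both databases side by side.
index = {("solution", "s1"): "Open data portal", ("news", "n1"): "Release news"}
for entity_type, entity_id in search(index, "portal"):
    store.load(entity_type, entity_id)
    store.load(entity_type, entity_id)  # second load is a cache hit
print(store.backend_reads)  # 1
```

Because the index spans both stores, a single search can return entities from either database, and only the loads care where an entity actually lives.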
Now let's have a round of questions. Yes? Excellent.

The question was whether everything is stored in every database, or whether the data we cannot express in a relational model goes to the triple store, and which kind of data goes where. The reality is that part of our data model is standardized: we use ADMS-AP, which is a sort of DCAT-AP variation, and all the entities described in that model are stored in the triple store. Everything else, news and events, goes to the relational database. If we had had the tooling we've built now when we started, we would probably have expressed everything in RDF, but at that time it was very difficult to rely on it, and things like versioning are also not so easy in RDF; that's why we decided to make the split. We're using Virtuoso, the open source version. It was what we could get our hands on at the time, and it has been working fine. By now there may be better products on the market, but Virtuoso works well: it scales pretty well and it's still one of the most performant ones. The documentation is horrible, though, so that was more the challenge.

You had a question? Yes. The question was whether you can do everything with Solr, since it's a document database and you sort of lose the expressivity of SPARQL, if I can summarize it like that. Since we don't have everything in our triple store, we had to do something. Drupal is very entity-based anyhow, so it's more of a natural document model than really graph data; it already puts a bit of an opinion on what your graph will look like. What we have is one database and another database, each with its own query layer from Drupal, and on top an object model that is the same for both of them: they're just entities in the CMS. We index everything in Solr, and our queries don't go that wild; it's mostly retrieving related things, simple things actually. For fine-grained SPARQL querying, we offer a public SPARQL endpoint, which is a copy of a subset of all the graphs in our main triple store, so everybody who wants to do fine-grained SPARQL things can use that. Any other questions, or can we start switching?
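For readers who want to try a public endpoint like this: the SPARQL 1.1 Protocol lets you send a query as the `query` parameter of an HTTP GET, and asking for `application/sparql-results+json` returns results in the standard SPARQL JSON results format. The sketch below only builds the request and parses a sample response, so it runs offline. The endpoint URL is a placeholder (the talk does not give the address), and the sample response is made up.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://example.org/sparql"  # placeholder, not the real endpoint address

QUERY = """
SELECT ?s ?title WHERE {
  ?s <http://purl.org/dc/terms/title> ?title .
} LIMIT 5
"""


def build_request(endpoint, query):
    """SPARQL 1.1 Protocol query via GET: the query goes in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json):
    """Flatten SPARQL JSON results into dicts of variable -> value."""
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# Made-up response in the SPARQL 1.1 Query Results JSON Format.
SAMPLE = json.dumps({
    "head": {"vars": ["s", "title"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/solution/1"},
         "title": {"type": "literal", "value": "Example solution"}},
    ]},
})

request = build_request(ENDPOINT, QUERY)
print(request.full_url)
print(rows(SAMPLE))
# Against a real endpoint you would read the body with urllib.request.urlopen(request).read().
```

Dropping the `type` and datatype information in `rows` keeps the example short; a real client would usually keep it to tell URIs from literals.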