Tom and Max are here, and they have a talk with a very complicated title that I don't quite understand yet. It's called "Interactively Discovering Implicational Knowledge in Wikidata", and they told me the point of the talk is that I will understand what it means later. I hope I will, so good luck! Thank you very much, and have some applause.

Thank you very much. Do you hear me? Great, hello again. Thank you very much and welcome to our talk about interactively discovering implicational knowledge in Wikidata. It is more or less a fun project we started for finding rules that are implicit in Wikidata: rules entailed just by the data that people have inserted into the Wikidata database so far. We will start with the explicit knowledge, so the explicit data in Wikidata, with Max.

Right. What is Wikidata? Maybe you have heard about Wikidata; then that's all fine. Maybe you haven't; then surely you've heard of Wikipedia. Wikipedia is run by the Wikimedia Foundation, and the Wikimedia Foundation has several other projects, one of which is Wikidata. Wikidata is basically a large graph that encodes machine-readable knowledge in the form of statements. A statement basically consists of some entities that are connected by some property, and these properties can even have annotations on them. For example, we have Donna Strickland here, and we encode that she received the Nobel Prize in Physics last year with the property "awarded". This statement has a qualifier, time: 2018, and also "for": chirped pulse amplification. All in all, we have some 890 million statements on Wikidata that connect 71 million items using 7,000 properties.

But there's also a bit more. We also know that Donna Strickland has field of work optics and also field of work lasers, so we can use the same property to connect one entity with several other entities. And we don't even need knowledge that connects two entities: we can have her date of birth, which is 1959, and this is then just a plain date and not an entity.

Now, coming from the explicit knowledge to the implicit knowledge, we have some more. Donna Strickland has received the Nobel Prize in Physics, and Marie Curie has also received the Nobel Prize in Physics. We also know that Marie Curie has a Nobel Prize ID that starts with "phys", then 1903, and then some more characters that make up this ID. Marie Curie also received the Nobel Prize in Chemistry in 1911, so she has another Nobel ID that starts with "chem" and has 1911 in it. And then there's also Frances Arnold, who received the Nobel Prize in Chemistry last year, so she has a Nobel ID that starts with "chem" and has 2018 in it.

Now one could assume that everybody who was awarded a Nobel Prize should also have a Nobel Prize ID, and we could write that as an implication: "awarded Nobel Prize" implies "Nobel ID". If you look closely at this picture, there's an arrow conspicuously missing here: Donna Strickland doesn't have a Nobel Prize ID. And indeed there are 25 people currently on Wikidata who are missing Nobel Prize IDs, and Donna Strickland is one of them. We call these people who don't satisfy the implication counterexamples.

If you look at Wikidata at the scale of these 890 million statements, you won't find any counterexamples by hand, because it's just too big. So we need some way to do that automatically. The idea is that if we knew that some implications are not satisfied, this would point to missing or wrong information. We want to represent that in a way that is easy to understand and also succinct, so that it has a short representation and doesn't take long to write down. That rules out anything involving complex syntax or logical quantifiers. So no SPARQL queries as a description of that implicit knowledge, and no description logics, if you've heard of that. We also want something that we can compute on actual hardware in a reasonable time frame.

So our approach is to use formal concept analysis, a technique that has been developed over the past decades. We use it to extract what are called propositional implications: just formulas of propositional logic that are implications of the form "awarded Nobel Prize" implies "Nobel ID". And what exactly is formal concept analysis? Over to Tom.

Thank you. So, what is formal concept analysis? It was developed in the 1980s by Rudolf Wille and Bernhard Ganter, who were restructuring lattice theory. "Lattice" is an ambiguous word with two meanings. One meaning is a grid. The other is about orders, order relations. I like steak, I like pudding, and I like steak more than pudding, and I like rice more than steak. That's an order, right? Lattices are particular orders which can be used to represent propositional logic, so simple rules like "when it rains, the street gets wet".

The data representation those researchers used back then is called a formal context. It is basically just a set of objects ("objects" is just a name), a set of attributes, and an incidence relation, which says which object has which attributes. For example, my laptop has the color black, so this object has some attribute. There's a small example of such a formal context on the right. The objects there are some animals: a platypus.
That's the fun animal from Australia, the mammal which also lays eggs and which is also venomous. Then there are the widow, the spider, the duck, and the cat. We see that the platypus has all the attributes: being venomous, laying eggs, and being a mammal. We have the duck, which is not a mammal but lays eggs, and so on. It's very easy to grasp some implicational knowledge here. An easy rule you can find is: whenever you encounter a mammal that is venomous, it has to lay eggs. So this is a rule that falls out of this binary data table.

Our main problem at this point is that we do not have such a data table for Wikidata, right? We have the graph, which is way more expressive than binary data. We cannot even store Wikidata as a binary table, and even if we tried, we would have no chance of computing such rules from it. For this, the people from formal concept analysis proposed an algorithm to extract implicit knowledge from an expert. Our expert here could be Wikidata. It's an expert.
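The animal table can be written down directly as a formal context. The Python sketch below is illustrative only. The exact crosses are reconstructed from the description: the widow spider is assumed to be venomous and egg-laying, and the cat only a mammal. The helper names `extent`, `intent`, `counterexamples`, and `holds` are hypothetical and not taken from the speakers' tool. An implication holds exactly when no object has all of its premise attributes while missing one of its conclusion attributes.

```python
from typing import Dict, Set

# Assumed reconstruction of the animal table from the slide:
# each object maps to the set of attributes it has (the incidence relation).
CONTEXT: Dict[str, Set[str]] = {
    "platypus": {"venomous", "lays eggs", "mammal"},
    "widow spider": {"venomous", "lays eggs"},
    "duck": {"lays eggs"},
    "cat": {"mammal"},
}
ATTRIBUTES: Set[str] = {"venomous", "lays eggs", "mammal"}


def extent(attrs: Set[str], context: Dict[str, Set[str]]) -> Set[str]:
    """All objects that have every attribute in attrs."""
    return {obj for obj, has in context.items() if attrs <= has}


def intent(objs: Set[str], context: Dict[str, Set[str]], attributes: Set[str]) -> Set[str]:
    """All attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(attributes)
    for obj in objs:
        shared &= context[obj]
    return shared


def counterexamples(premise: Set[str], conclusion: Set[str],
                    context: Dict[str, Set[str]]) -> Set[str]:
    """Objects that have the whole premise but miss part of the conclusion."""
    return {obj for obj in extent(premise, context) if not conclusion <= context[obj]}


def holds(premise: Set[str], conclusion: Set[str], context: Dict[str, Set[str]]) -> bool:
    """An implication premise -> conclusion holds iff it has no counterexample."""
    return not counterexamples(premise, conclusion, context)


if __name__ == "__main__":
    print(holds({"mammal", "venomous"}, {"lays eggs"}, CONTEXT))
    print(sorted(counterexamples({"lays eggs"}, {"mammal"}, CONTEXT)))
```

Here `holds({"mammal", "venomous"}, {"lays eggs"}, CONTEXT)` is true because the platypus is the only venomous mammal. By contrast, "lays eggs implies mammal" fails, with the widow spider and the duck as its counterexamples. Equivalently, an implication holds when its conclusion is contained in `intent(extent(premise))`.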
You can ask Wikidata questions, right, using the SPARQL interface. You can ask: is there an example for this? Is there a counterexample for something else? The algorithm is quite easy. There is the algorithm and some expert, in our case Wikidata, and the algorithm keeps notes of counterexamples and of valid implications. In the beginning we do not have any valid implications, so the list on the right is empty. We also do not have any counterexamples, so the list on the left, the formal context we build up, is empty too. All the algorithm does now is ask: is this implication, X implies Y, true? For example, is it true that an animal that is a mammal and is venomous lays eggs? The expert, which in our case is Wikidata, can answer that; we can query it, as we showed in our paper. So we query it. If the Wikidata expert does not find any counterexamples, it will say: okay, that's maybe a true thing, yes. If it's not a true implication in Wikidata, it can say: no, it's not true, and here's a counterexample. So this is something you contradict by example; you say this rule cannot be true. For example, when the street is wet, that does not mean it has rained: it could have been the street-cleaning car or something else.

Our idea now was to use Wikidata as an expert but also to include a human in this loop. We do not just want to ask Wikidata; we also want to ask a human expert. So in our tool we first ask the Wikidata expert about some rule. After that, we also ask the human expert, who can say: yeah, that's true, I know that. Or: no, Wikidata is not aware of this counterexample, but I know one.
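The question-and-answer loop described here is the attribute exploration algorithm from formal concept analysis. The Python sketch below is a simplified textbook-style version built on Ganter's Next Closure over a fixed attribute order; it is not the speakers' implementation. An expert is assumed to be any callable that either accepts `premise -> conclusion` by returning `None`, or returns the attribute set of a counterexample. The helper `data_expert` is a hypothetical stand-in for Wikidata that answers from a fixed list of rows; a human expert could be plugged in the same way. All names here are illustrative.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Attrs = FrozenSet[str]
Implication = Tuple[Attrs, Attrs]
# An expert returns None to accept "premise -> conclusion", or the attribute set
# of a counterexample: a row containing the premise but not the whole conclusion.
Expert = Callable[[Attrs, Attrs], Optional[Attrs]]


def context_closure(attrs: Attrs, rows: List[Attrs], universe: Attrs) -> Attrs:
    """attrs'' in the context built so far: what all rows containing attrs share."""
    result = universe
    for row in rows:
        if attrs <= row:
            result &= row
    return result


def implication_closure(attrs: Attrs, implications: List[Implication]) -> Attrs:
    """Smallest superset of attrs that respects every accepted implication."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


def next_closure(current: Attrs, order: Sequence[str],
                 implications: List[Implication]) -> Optional[Attrs]:
    """Lectically next implication-closed set after current, or None if none is left."""
    position = {m: i for i, m in enumerate(order)}
    candidate = set(current)
    for i in range(len(order) - 1, -1, -1):
        m = order[i]
        if m in candidate:
            candidate.remove(m)
            continue
        closed = implication_closure(frozenset(candidate | {m}), implications)
        if all(position[x] >= i for x in closed - candidate):
            return closed
    return None


def explore(order: Sequence[str], expert: Expert) -> Tuple[List[Implication], List[Attrs]]:
    """Attribute exploration: returns accepted implications and collected counterexamples."""
    universe = frozenset(order)
    implications: List[Implication] = []
    rows: List[Attrs] = []
    current: Optional[Attrs] = frozenset()
    while current is not None and current != universe:
        while True:
            conclusion = context_closure(current, rows, universe)
            if conclusion == current:
                break  # the counterexamples so far already rule out anything more
            answer = expert(current, conclusion)
            if answer is None:
                implications.append((current, conclusion))
                break
            if not current <= answer or conclusion <= answer:
                raise ValueError("not a counterexample to the asked implication")
            rows.append(frozenset(answer))
        current = next_closure(current, order, implications)
    return implications, rows


def data_expert(known_rows: List[Attrs]) -> Expert:
    """Hypothetical automated expert answering from a fixed list of rows."""
    def ask(premise: Attrs, conclusion: Attrs) -> Optional[Attrs]:
        for row in known_rows:
            if premise <= row and not conclusion <= row:
                return row
        return None
    return ask


if __name__ == "__main__":
    animals = [
        frozenset({"venomous", "lays eggs", "mammal"}),  # platypus
        frozenset({"venomous", "lays eggs"}),            # widow spider
        frozenset({"lays eggs"}),                        # duck
        frozenset({"mammal"}),                           # cat
    ]
    rules, found = explore(["venomous", "lays eggs", "mammal"], data_expert(animals))
    for premise, conclusion in rules:
        print(sorted(premise), "->", sorted(conclusion - premise))
```

The first question is always "nothing implies everything", as in the demo. On the assumed animal rows, the loop collects three counterexamples (spider, duck, cat). It ends with two rules: "venomous implies lays eggs" and "lays eggs and mammal implies venomous". The talk's rule "venomous mammal implies lays eggs" follows from the first. A human expert who answers inconsistently can break this simple version; the sketch does not handle that.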
Or: Wikidata says this is true, but I'm aware of a counterexample. And so on and so on. You can represent this more or less as a mathematical picture; it's not very important. On the left you can see an exploration going on between Wikidata and the algorithm. On the right is an exploration between a human expert and Wikidata, which can answer all the queries. We combine those two into one small tool, which is under development. So, back to Max.

Okay, so for that to work, we basically need a way of viewing Wikidata, or at least parts of Wikidata, as a formal context. And a formal context is a binary table. So what do we do? We just take all the items in Wikidata as objects and all the properties as attributes of our context. Then we have an incidence relation that says: this entity has this property, so it is incident there. We end up with a context that has 71 million rows and 7,000 columns. That might actually be a slight problem, because we want something that we can run on actual hardware and not on a supercomputer. So let's not do that, and instead focus on a smaller set of properties that are related to one another through some common domain. It doesn't make much sense to combine a property that relates to spacecraft with a property that relates to books; that's probably not a good place to look for implicit knowledge. But two different properties about spacecraft, that sounds good, right?

The interesting question is then just how we define the incidence for our set of properties. That actually depends very much on which properties we choose. For some properties it makes sense to account for the direction of the statement. There's a property called parent, actually no, it's child, and then there's father and mother.
And you don't want to turn those around: "A is a child of B" should be something different from "B is a child of A". Then there are the qualifiers, which might be important for some properties. Receiving an award for one thing might be something different from receiving an award for something else. But receiving an award in 2018 and receiving one in 2017 is probably more or less the same thing, so we don't necessarily need to differentiate that. There's also a thing called subclasses, which form a hierarchy on Wikidata, and you might also want to take that into account. Winning something that is a Nobel Prize also means winning an award, and winning the Nobel Peace Prize means winning a peace prize. So there are also implications going on there that you want to respect.

To see how we actually do that, let's look at an example. Here we have Donna Strickland and, ah, I forget his first name, Ashkin, one of the people who won the Nobel Prize in Physics with her last year. And also Gérard Mourou, the third one. They all got the Nobel Prize in Physics last year, so we have all these statements here. These two have a qualifier that says "with" Gérard Mourou. I don't think the qualifier is on this statement here, actually, but it doesn't really matter.
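Before any scaling, the naive context would have 71 million × 7,000 ≈ 5 × 10^11 cells. That is about 62 GB even as a dense bit matrix. The Python sketch below shows what scaling a handful of chosen properties could look like on the Nobel example:

- Each statement only marks its subject, so direction is kept.
- Time qualifiers are ignored.
- The "with" qualifier becomes its own column.
- Award values are lifted along a subclass hierarchy.

Several things are simplifying assumptions, not the tool's data model: the statement list, the readable labels in place of real Wikidata identifiers, and the single-parent `SUBCLASS_OF` map. Real subclass relations on Wikidata can have several parents.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Statement(NamedTuple):
    subject: str
    prop: str
    value: str
    qualifiers: Tuple[Tuple[str, str], ...] = ()


# Hand-written excerpt of the example graph (labels instead of Wikidata ids).
STATEMENTS: List[Statement] = [
    Statement("Strickland", "awarded", "Nobel Prize in Physics", (("with", "Mourou"), ("time", "2018"))),
    Statement("Strickland", "field of work", "lasers"),
    Statement("Mourou", "awarded", "Nobel Prize in Physics", (("time", "2018"),)),
    Statement("Ashkin", "awarded", "Nobel Prize in Physics", (("with", "Mourou"), ("time", "2018"))),
    Statement("Ashkin", "field of work", "lasers"),
    Statement("Arnold", "awarded", "Nobel Prize in Chemistry", (("time", "2018"),)),
    Statement("Curie", "awarded", "Nobel Prize in Physics", (("time", "1903"),)),
    Statement("Curie", "awarded", "Nobel Prize in Chemistry", (("time", "1911"),)),
    Statement("Curie", "field of work", "radioactivity"),
]
SUBCLASS_OF: Dict[str, str] = {
    "Nobel Prize in Physics": "Nobel Prize",
    "Nobel Prize in Chemistry": "Nobel Prize",
}
IGNORED_QUALIFIERS = {"time"}  # 2017 vs. 2018 should not create new columns


def with_superclasses(value: str) -> List[str]:
    """The value followed by its ancestors (assumes a cycle-free, single-parent map)."""
    chain = [value]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def unscaled(statements: List[Statement]) -> Dict[str, Set[str]]:
    """The naive context: item x property, ignoring values and qualifiers."""
    context: Dict[str, Set[str]] = defaultdict(set)
    for st in statements:
        context[st.subject].add(st.prop)
    return dict(context)


def scale(statements: List[Statement]) -> Dict[str, Set[str]]:
    """Build a formal context: each subject maps to its set of scaled attributes."""
    context: Dict[str, Set[str]] = defaultdict(set)
    for st in statements:
        # Only the subject gets the attribute, so "A child B" never becomes "B child A".
        for cls in with_superclasses(st.value):
            context[st.subject].add(f"{st.prop}: {cls}")
        for name, qvalue in st.qualifiers:
            if name not in IGNORED_QUALIFIERS:
                context[st.subject].add(f"{st.prop} {name}: {qvalue}")
    return dict(context)


if __name__ == "__main__":
    for person, attrs in scale(STATEMENTS).items():
        print(person, sorted(attrs))
```

Printed row by row, `scale(STATEMENTS)` gives the same crosses as the table on the slide. Mourou only gets the physics column and the generic Nobel Prize column. Curie gets both prizes, the generic Nobel Prize column once, and radioactivity. Each chosen value becomes its own column, which is why scaled contexts grow far beyond one column per property.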
What we've done here is put all the entities in this small graph as rows in the table. So we have Strickland, Mourou, and Ashkin, and also Arnold and Curie, who are not in the picture, but you can maybe remember them. Then here we have "awarded", and we scale that by the different Nobel Prizes that people have won. The Nobel Prize in Physics is in the first column, the Nobel Prize in Chemistry in the second, and just general Nobel Prizes in the third. Then there's "awarded" scaled by the "with" qualifier, so "awarded with Gérard Mourou". Then there's field of work, with lasers and radioactivity, so we scale by the actual field of work that people have.

Now let's look at what kind of incidence we get for Donna Strickland. She has a Nobel Prize in Physics, which is also a Nobel Prize, and she has that together with Gérard Mourou. She has field of work lasers, but not radioactivity. Mourou himself has a Nobel Prize in Physics, which is a Nobel Prize, but none of the others. Ashkin gets a Nobel Prize in Physics, which is still a Nobel Prize, and he gets that with Gérard Mourou. He also works in lasers, but not in radioactivity. Frances Arnold has a Nobel Prize in Chemistry, which is a Nobel Prize. Marie Curie has a Nobel Prize in Physics and one in Chemistry, and they are both Nobel Prizes. She also works in radioactivity, but lasers didn't exist back then, so she doesn't get field of work lasers. This table is basically a representation of our formal context.

We've actually gone ahead and started building a tool where you can do all these things interactively, and it will take care of building the context for you. You just put in the properties, and Tom will show you how that works.

So here you see some first screenshots of this tool.
Please do not comment on the graphic design; we have no idea about that and have to ask someone about it. We're just into logic, more or less. On the left you see the initial state of the game. You have five boxes, called "countries and borders", "credit cards", "use of energy", "memory and computation", and "space launches". These are just presets we defined. In the case of credit cards, for example, you can explore the Wikidata properties called card network, operator, and fee. So you can just choose one of those. Or, on the right, under custom properties, you can enter the Wikidata properties you're interested in. Take whichever of the seven thousand you like, or some number of them.

I chose the credit card preset, and now I want to show you what happens when you explore these properties. In the first step of the game, the game (I mean the exploration process) asks: is it true that every entity in Wikidata has these three properties? So are they common to all entities in your data? That is most probably not true, right? I mean, not everything in Wikidata has a fee, at least I hope so. So what I will do now is click the "reject this implication" button: the implication "nothing implies everything" is not true.

In the second step, the algorithm tries to find the minimal number of questions needed to obtain the domain knowledge, that is, to obtain all valid rules in this domain. The next question is: is it true that everything in Wikidata that has a card network property also has a fee and an operator property? Down here you can see that Wikidata says there are 26 items which are counterexamples. So there are 26 items in Wikidata which have the card network property but do not have the other two. 26 is not a big number. It could mean there's an error, so 26 statements are missing. Or maybe that's really the true state of affairs, which is also okay. You can now choose what you think is right. You can say: I would say this should be true. Or you can say: no, I think that's okay, one of these counterexamples seems valid, let's reject it. In this case I rejected it.

The next question it asks is: is it true that everything that has an operator also has a fee and a card network? This is probably not true. There are also more than 1,000 counterexamples, one being, I think, a telecommunications operator in Hungary or something. So we can reject this as well.

Next question: everything that has an operator and a card network (card network meaning Visa, Mastercard, all this stuff), is it true that it has to have a fee? Hmm, Wikidata says no: there are 23 items that contradict it. But one of those items is the American Express Gold Card, and I suppose the American Express Gold Card has some fee. So this indicates there's missing data in Wikidata. There is something Wikidata does not know but should know, if you want to reason correctly over Wikidata with your SPARQL queries. So we can now say: that's not a reject, that's an accept, because we think it should be true, even though Wikidata thinks otherwise.

And you go on and on. This is then the last question: is it true that everything that has a fee and a card network should have an operator? And you see: no counterexamples. This means Wikidata says this is true. If you ask Wikidata, there are no counterexamples, so it counts as a valid implication in the data set so far. That could also indicate that something is missing.
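The counterexample lists in the tool come from asking Wikidata, and the speakers mention querying it with SPARQL. The Python sketch below is an assumption about how such a query could look, not the tool's actual query. It builds a query for items that use every premise property but lack at least one conclusion property, and it can optionally send that query to the public Wikidata Query Service endpoint. The `wdt:` prefix only sees best-rank ("truthy") statements. The property identifiers in the usage line are hypothetical placeholders; look up the real IDs for card network, operator, and fee before running it.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Sequence

ENDPOINT = "https://query.wikidata.org/sparql"


def counterexample_query(premise: Sequence[str], conclusion: Sequence[str],
                         limit: int = 10, count: bool = False) -> str:
    """SPARQL for items having every premise property but missing some conclusion property."""
    if not premise or not conclusion:
        # An empty premise would mean scanning every item in Wikidata.
        raise ValueError("premise and conclusion must both be non-empty")
    head = "SELECT (COUNT(DISTINCT ?item) AS ?count)" if count else "SELECT DISTINCT ?item"
    lines = [head + " WHERE {"]
    for i, pid in enumerate(premise):
        lines.append(f"  ?item wdt:{pid} ?p{i} .")
    # A counterexample only needs to miss one of the conclusion properties.
    missing = " || ".join(
        f"NOT EXISTS {{ ?item wdt:{pid} ?c{i} }}" for i, pid in enumerate(conclusion)
    )
    lines.append(f"  FILTER({missing})")
    lines.append("}")
    if not count:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


def fetch_counterexamples(query: str) -> List[str]:
    """Run a SELECT ?item query on the Wikidata Query Service; returns entity URIs."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "implication-sketch/0.1 (example)"})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [b["item"]["value"] for b in data["results"]["bindings"]]


if __name__ == "__main__":
    # Hypothetical placeholders: replace with the real Wikidata property ids.
    CARD_NETWORK, OPERATOR, FEE = "P_CARD_NETWORK", "P_OPERATOR", "P_FEE"
    print(counterexample_query([CARD_NETWORK, OPERATOR], [FEE]))
```

With `count=True` the same pattern returns only the number of counterexamples, like the "23 items" shown in the tool. Calling `fetch_counterexamples` needs network access, and the query service expects clients to send a descriptive User-Agent.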
I'm not sure whether this is possible or not, but to me it sounds reasonable. Everything that has a fee and a card network should also have an operator, which means a bank or something like that. So I accept this implication. And then, yeah, you have won the implication game, the exploration game, which essentially means you've won some knowledge. Thank you. The knowledge is which implications in Wikidata are true, or should be true from your point of view. This is more or less the state of the game as we programmed it in October. The next step will be to show you how much your opinion of the world differs from the opinion currently reflected in the data. Is what you think about the data close to what is true in Wikidata, or maybe Wikidata has wrong information? You can find that out with this. But Max will tell you more about that.

Okay, let me just quickly come back to what we have actually done. We offer a procedure that allows you to explore properties in Wikidata and the implicational knowledge that holds between these properties. The key idea here is about the implications you get when you look. There might be some that you don't actually want, because they shouldn't be true. And there might be ones that you don't get but expect to get, because they should hold. These unwanted or missing implications point to missing statements and items in Wikidata. So they show you where the opportunities to improve the knowledge in Wikidata are. Sometimes you also get to learn something about the world. In most cases, it's that the world is more complicated than you thought it was, and that's just how life is. But in general, implications can guide you in improving Wikidata and the state of knowledge in it. So what's next?
Well, what we currently don't offer in the exploration game, and what we will definitely focus on next, is configurable and filterable counterexamples. Right now you just get a list of some random counterexamples. You might want to search through this list for something you recognize, and you might also want to say explicitly: this one should be a counterexample. That's definitely coming next. Then there's domain-specific scaling of properties. There's still much work to be done there; currently we only have some very basic support for it. So you can have properties, but you can't do the fancy things yet. For example, you can't say that everything that is an award should be considered as one instance of this property. That's also coming. And then, as Tom already mentioned, you can compare the knowledge you have explored through this process against the knowledge currently in Wikidata. That is a way of seeing where you stand: what is missing in Wikidata, and how can you improve it? If you have any more suggestions for features, just tell us. There's a GitHub link on the implication game page, and here's the link to the tool again. So just let us know, open an issue, and have fun. And if you have any questions, I guess now would be the time to ask.

Thank you very much, Tom and Max. We will switch microphones now, so that I can hand this microphone to you if any of you have a question for our two speakers. Are there any questions or suggestions? Yes?

Hi, thanks for the nice talk. I wanted to ask: what's the most interesting implication that you've found?

Yeah, that would have made for a good backup slide. The most interesting implication so far is the most basic thing you would expect. Everything that was launched into space by humans and that landed, that is, has a landing date, also has a start date.
So nothing landed on Earth that was not launched from here.

Yes?

Right now the game only helps you find implications. Are you also planning to let me add data? For example, let's say I have 25 Nobel laureates who don't have a Nobel laureate ID. Are there plans for a simple interface where I can look up and add that ID? That would make adding new entities to Wikidata itself simpler.

Yes, and that's partly hidden behind the "configurable and filterable counterexamples" item. We will probably not build an explicit interface for adding data ourselves. Most likely we will interface with some other tool built around Wikidata, probably something that gives you QuickStatements or similar. But yes, adding data is definitely on the roadmap.

Any more questions? Yes?

Wouldn't it be nice to do this in other languages too?

Actually, it's language-independent. We use Wikidata, and as far as we know, Wikidata has no language itself: it just has items and properties, Qs and Ps. Whatever language you use, they should be shown in it, if there is a label in your language for that property or item. So if Wikidata is aware of your language, so are we. Oh yes, of course, the tool itself still needs to be translated.

Hi, thanks for the talk. I have a question. Right now you can find missing data with this, right, or surplus data. Do you think you'll be able to find wrong information with a similar approach?

Ah, well, actually, we do.
I mean, if Wikidata has a counterexample to something we would expect to be true, this could point to wrong data, right? That's the case if the counterexample is a wrong counterexample, or if there is a missing property on an item.

Okay, I get to ask a second question. The horizontal axis in the incidence matrix: you said it spans seven thousand columns, right?

Yes, because there are seven thousand properties in Wikidata.

But it's actually way more columns, right? Because you multiply the properties by their arguments.

Yes, if you do any scaling, then of course that gives you multiple columns per property; that's basically what scaling means. And as you can see here, seven thousand is already way too big to actually compute with.

How many would it be if you multiplied out all the arguments?

I have no idea. Probably a few million.

Have you thought about a recursive method, where counterexamples might be shown to be wrong by other counterexamples? Like an argumentation graph or something like that?

Actually, I don't get it. How can a counterexample be wrong because of another counterexample?

Maybe some example says that cats can have golden hair. Then another example might say that this is not a cat, or that the property of being a cat, or something cat-ish, is missing.

Okay. No, so far we have not considered deeper reasoning. This Horn propositional logic has no contradictions, because all you can do is contradict a rule with a counterexample. There can never be a rule that is not true. It may be untrue in your or my opinion, but not in the logic. But we do have to think about more powerful reasoning, right?

Sorry, a quick question. You're not considering all of the seven thousand or so properties for each entity, right? So what's your current process for filtering out the relevant properties? I'm sorry.
I didn't get that.

Well, we basically handpicked those. You have this input field here where you can go ahead and select your properties. We also have some predefined sets, okay? And there are also classes for groups of related properties, which you can use if you want bigger sets: for example space, or family, or what was the other one? Awards is one, yeah. It depends on the size of the class. For space it's not that many, 10 or 15 properties; it will take you some hours, but you can do it. For family, I think it's way too many: it's like 40 or 50 properties, so a lot of questions.

I don't see any more hands. Maybe someone who has not asked a question yet has another one; we could take that. Otherwise we would be perfectly on time. Maybe you can tell us where people can find you for deeper discussions.

Probably at the couches. Yeah, or just running around somewhere. Our DECT numbers are also on the slides here: 6284 for Tom and 6279 for me. So just call; we're hanging around.

Thank you again. Have a round of applause. Thank you.