Thank you very much for inviting me. I would like to start my talk by saying I found the poem before very interesting, about the raining facts, and this is exactly what text mining is trying to do: to put some order into all those facts that are constantly raining down upon us. As an overview of my talk: I assume that some people might of course have heard about text mining and natural language processing, but my aim is to motivate why we are working on text mining, and why text mining would be of interest to neuroinformatics. Then I will talk about text mining tools, the types of technologies that not only we but other people have developed to allow semantic annotation. Semantic annotation allows us to create different applications, including search, and the last application I will talk about is how we can link knowledge, pathways and models, with the literature. In order to do that we need to have the infrastructure, so in the middle of my talk I will discuss some initiatives that the community, and also we, have been creating on integrating text mining processing and annotation. A few words about our centre: NaCTeM, the National Centre for Text Mining, was established in 2004. It was really funded because the biologists very much needed text mining. So for the first four years we focused on providing text mining services for biologists, but very soon it was clear that we needed to expand to other areas like medicine and also the social sciences, and since 2011 we have been a fully sustainable centre working in different areas, but still our major focus is the biosciences and biomedicine. I don't think I need to say much about why we need text mining. We all know that we have a tsunami of information.
The types of documents we have to mine and extract information from are varied: from abstracts in MEDLINE to full papers, clinical trials, medical records, textbooks and monographs, but also what we call online discussions and online fora, grey literature and so on. As for the size of MEDLINE: from the 14 million abstracts we had in 2005, we now have more than 22 million abstracts. So the overwhelming majority of information is textual, and all this textual information is of course in unstructured format. What we are doing in text mining is to render this information, which is in an unstructured format, into structured information, and to extract knowledge and facts which are hidden in it. Now I will talk a bit about the processes, the pipelines, we are using to analyse text. If we start from a very simple sentence, a normal sentence that we have in the literature, which is typically unstructured, we go through different processes of text mining. I don't mention all of those here, for obvious reasons, but normally we have to find the words and the sentences, to tokenise, but also to do what we call tagging: is this a noun, is this a verb? That is linguistics, but it is an important part of what we are doing, to be able to understand the text. Depending on the level of sophistication and the application, we use different types of tools. The second part of the pipeline I have here is named entity recognition. By that we mean: we have a token and we want to say, is this a gene, is this a disease, or a metabolite? So we automate these processes as well. And then we need to create more structure in these representations, so we use different parsing methods to create, as you see here, different trees, to build syntactic structures and superimpose semantics on the syntax.
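As a rough illustration of the early pipeline stages just described, sentence splitting, tokenisation and dictionary-based entity tagging, here is a minimal sketch. The entity dictionary and the example sentence are invented for illustration; real systems use trained taggers and parsers rather than lookup.

```python
import re

# Hypothetical entity dictionary standing in for a trained recogniser.
ENTITY_DICT = {"p53": "PROTEIN", "IL-10": "PROTEIN"}

def split_sentences(text):
    # Naive split on sentence-final punctuation followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Keep hyphenated tokens like IL-10 together; punctuation stands alone.
    return re.findall(r"[\w'-]+|[^\w\s]", sentence)

def tag_entities(tokens):
    # "O" marks tokens outside any known entity.
    return [(tok, ENTITY_DICT.get(tok, "O")) for tok in tokens]
```

A parser would then build syntactic trees over these tagged tokens, which is the step the toy sketch leaves out.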
By following these pipelines, which the whole natural language processing community follows, we are taking unstructured text and creating structure. In the end we create different layers of annotations, to be able to extract facts and events dealing, for instance, with negative regulation or positive regulation or phosphorylation, etc. An important part of this are the resources: the ontologies that the community is building, which are actually the resources that give us semantics, and also the different, much more lexical representations that we need. As I will talk towards the end about bridging pathways with text, what we are trying to do is to put together literature and knowledge. Now, this type of mapping is not trivial. In the literature, in text, we have words and phrases, and we call them sentences and paragraphs. But in the knowledge domain, in biology and neuroscience, we have semantic entities, diseases or brain regions, and we have what I will call, and will discuss a lot, events. This is going to be a big chunk of my talk. Events are, roughly, interactions between those entities, and they are quite abstract representations of what a scientist understands, for example phosphorylation or inhibition. How do we bridge those two domains together? We use domain knowledge, and we use repositories and databases and ontologies. But the mapping is still not trivial: it is always context dependent and task dependent, and we then need to create tools for named entity recognition, relation extraction and event extraction, to be able to extract assertions and claims. So, as our previous speaker said, we have a different paradigm right now for sharing information and knowledge. Traditionally we used information retrieval to extract the documents.
We use different databases, but we also have what we call research data, information from the semantic web, and text mining and natural language processing methods. All these disciplines are merging, and all of them allow us to do knowledge sharing. So what is the impact of text mining? In the next part of my talk I will take you step by step through different types of technologies, hopefully not in too much detail, to show how we can start with simple types of information, like discovering concepts, what we call technical terms, and named entities; all of these, of course, are automatic methods. We can go a step further, extracting relationships between the entities, and events as well. For instance, in order to extract causality from text, we need much richer representations of relationships. But we also have methods to extract the dimensions, the views, of these assertions: Are these statements hypothetical? Are they negated? Were they asserted in this source or in another source? Are we certain about this claim? By allowing multiple levels of semantic annotation, we can create tools: we can cluster documents, we can classify documents, we can do information access which goes beyond index terms, and much more advanced applications, like trying to reconstruct, or even construct, pathways and models from the literature. So the way we are enriching content is by layers of semantic annotation. The first is terms, and I will describe very briefly the tools we have developed for that. Then semantic entities, and relations among entities; these are projects and services which are currently online, so you can have a look on our website and see how they work. Then terms and associations, different systems which extract direct and indirect associations, and lastly discourse information from text, which allows us to classify information as negated or not.
The simplest layer, if we think about text mining pipelines in terms of levels of sophistication, is perhaps extracting terms. But this is an important part, because technical terms and concepts characterise documents semantically. We know we are talking about a neuroscience topic, or specifically about Alzheimer's disease, because the terms tell us so. But when we are trying to extract terms, things are not so simple. Typically we have to deal with what we call normalisation, that is, variability. We have lots of spelling variants and acronyms which all mean the same thing, so we need methods that allow us to group these variants into the same concept. In addition, we need to disambiguate. Very often, especially with acronyms, a word means different things in different domains, even within the same area, and terms can also refer to different species. So species disambiguation, and disambiguation in general, is a major challenge in natural language processing, and something the community has been working on for several years. Very briefly, for the first part, term extraction, I will introduce TerMine, which is a very simple term extraction service available on our website. The only thing we need is part-of-speech tagging, which is basically recognising whether a word is a noun or a verb or an adjective. Then you incorporate normalisation, to be able to automatically extract terms. The way it looks is: if you have a text or a collection of documents, the most important terms are automatically highlighted, and then they are ranked. By the ranking we can then automatically identify that this text talks about a specific concept. Now, this is very important, because by doing that we can immediately use those terms as index terms, so we can do classification and clustering.
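TerMine's ranking is based on the C-value measure; here is a simplified sketch of that idea, favouring longer, more frequent candidates and discounting terms that mostly appear nested inside longer ones. Candidate extraction is assumed to have been done already (by a part-of-speech tagger in the real system), so the function just takes a list of candidate term strings.

```python
import math
from collections import Counter

def c_value(candidates):
    """Rank candidate terms with a simplified C-value-style score."""
    freq = Counter(candidates)
    scores = {}
    for term, f in freq.items():
        words = term.split()
        # Frequencies of longer candidates that contain this term.
        nested = [freq[t] for t in freq if term in t and t != term]
        # Discount occurrences explained by the longer terms.
        adjusted = f - (sum(nested) / len(nested) if nested else 0)
        scores[term] = math.log2(len(words) + 1) * adjusted
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, if "gene expression" mostly occurs inside "gene expression profile", the longer term outranks it despite being less frequent overall.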
The next part of normalisation is recognising acronyms. Here I have an example, VLA, which is quite relevant to this audience, and which has about 13 different meanings in MEDLINE. These have to be extracted and normalised, so you find all the different synonyms, and then disambiguated. So by extracting acronyms we are automatically creating databases, dictionaries, giving all the expanded forms. Abbreviations are quite a well-known problem in biomedicine, because 81% of abbreviations are ambiguous, and each abbreviation has about 16.6 different senses. So it is very important to have disambiguation tools. This is a screenshot of an acronym disambiguation service we have created which, when you have a text, finds what we call the local acronyms, those that have been expanded in the text, and the global ones, those that have not been expanded, for which we need to disambiguate by providing the full form. Why we need these tools is, as I will say later, extremely important for search: if we do query expansion, it is very important to be able to incorporate these acronyms and their expanded forms in our search. But now I am moving to something much more challenging; those were the simpler things. If we want to extract facts, we need to deal with more structured representations. One of the major issues is: if we want to extract causality from text, how do we go about it? Again, we start from raw text and go through the pipeline. If we see the following sentences about the interaction of IL-10, we can of course find the named entities we are talking about, proteins, but then the next step is to find the events.
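Finding the "local" acronyms, those expanded in the text itself, is classically done with the Schwartz and Hearst algorithm; the sketch below is a simplified version of that idea, matching "long form (SF)" patterns and accepting a pair only when the short form's letters start the words of the long form. The real algorithm handles many more alignment cases.

```python
import re

def extract_acronyms(text):
    """Find (long form, short form) pairs like '... pulmonary disease (COPD)'."""
    pairs = []
    for match in re.finditer(r"((?:\w+\s+){1,6})\((\w{2,10})\)", text):
        long_form, short = match.group(1).strip(), match.group(2)
        words = long_form.lower().split()
        # Take only the last len(short) words as the candidate long form.
        candidate = words[-len(short):]
        if len(candidate) == len(short) and all(
            w.startswith(c) for w, c in zip(candidate, short.lower())
        ):
            pairs.append((" ".join(candidate), short))
    return pairs
```

Pairs harvested this way across a corpus give exactly the kind of acronym dictionary mentioned above, which then supports disambiguating the "global", unexpanded occurrences.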
Now, I will spend some time on the notion of event, because this is quite fundamental in the types of systems that we, and not only we but the community, are currently building to try to combine knowledge with text. As we can see here with events, we have words like, for instance, "induces", which for a biologist could mean something like a positive regulation or a phosphorylation. If we put all these things together, we want in the end to say that the first sentence causes another, but to do that, we have to go beyond the actual strings in the text. So I will spend a bit more time on event extraction. Take another sentence: "Expression of Aurora B enhances phosphorylation of S6K1." Here we have triggers for a type of event, gene regulation. Normally we have a trigger, something that tells us this event is about, say, expression, and what we call a theme, which is the patient of the event, what is affected by it, here by the gene expression. In text we typically map these to what we call the subject and the object: we can think of the cause as the subject, and of what happens to an entity as the theme. So this simple sentence has three different events, dealing with expression and with phosphorylation, and also a more complex event, positive regulation, which takes other events, the gene expression and the phosphorylation, as participants. When a biologist reads this sentence, people understand that we are talking about phosphorylation and expression; what we are doing is capturing this automatically. This is what I talked about before: bridging knowledge with the text.
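The nested event structure just described can be sketched as a small data model: an event has a trigger, a type, and typed arguments (theme, cause) that may themselves be entities or other events. The class and field names here are illustrative, not the actual representation used by any particular system.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Entity:
    text: str
    etype: str  # e.g. "Protein"

@dataclass
class Event:
    trigger: str
    etype: str  # e.g. "Gene_expression", "Positive_regulation"
    theme: Optional[Union[Entity, "Event"]] = None
    cause: Optional[Union[Entity, "Event"]] = None

# "Expression of Aurora B enhances phosphorylation of S6K1"
aurora = Entity("Aurora B", "Protein")
s6k1 = Entity("S6K1", "Protein")
expr = Event("expression", "Gene_expression", theme=aurora)
phos = Event("phosphorylation", "Phosphorylation", theme=s6k1)
# The regulation event takes the other two events as its arguments.
reg = Event("enhances", "Positive_regulation", theme=phos, cause=expr)
```

Because theme and cause can be events, the "positive regulation includes gene expression and phosphorylation" structure from the sentence falls out naturally.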
To do that, we have created systems, and this is something the natural language processing community working in biomedicine is currently pursuing very actively: methods and systems that automatically extract events. Automatic annotation of events in text normally consists of extracting the triggers, basically the words and phrases that characterise an event type, typically verbs or nominalisations like "inhibition"; then what we call typed arguments, for instance the cause or the site, so you have several arguments that form the context of this event; and then the attributes: is this negated, is it speculative, and so on. Now I will talk about the tools we have built to extract events, and briefly mention some of the challenges the community is currently working on. Most successful systems right now use a pipeline of methods for extracting events, and most are really machine learning systems. I don't want to go through too many details, because perhaps it is too computationally linguistic for this audience, but typically we need to use parsing results, where parsing finds the syntactic structures, together with different dictionaries, and we can combine different methods to achieve that. Currently EventMine, the system we are using, is producing some of the best results across several event extraction tasks. Our community is also creating shared challenges which allow us to compare and evaluate our tools, so we are creating annotated corpora, gold standards, which allow us to compare and improve our performance. The EventMine workflow basically solves different classification problems in a pipeline: normally you have a trigger and entity detector, and then you have to find the relations between those entities, the arguments, the semantic roles.
Normally an event is not binary, and that is actually the difficulty: among the different arguments you have the location, you have sites which have to be identified, and then what we call hedging, the modality, whether it is negated or speculated. Now, the problems we have in event extraction are really two. The first is: if we are training our tools on one corpus, in one domain, how can we make them work equally well in another domain? We use domain adaptation techniques for that. The other challenge is what we call coreference, which I will introduce very briefly. What we want to do is use information from the different resources of the community and combine it as much as possible. To give you an example of domain adaptation: if we have a corpus, say, on infectious diseases and another corpus on neuroscience, and we want to train our model on epigenetics, what we do is transfer information from the external corpora to the new domain, epigenetics, using domain adaptation methods. I am not going to discuss stacking here, but these are the methods this community is currently working on. Coreference is another tricky problem. If you look at the sentence at the bottom, we have different gene expression events. It mentions gene B, and then says "the expression of this gene was prolonged compared to that of gene C". If we do not link "this gene" to gene B, then we will not be able to extract the event. What I am trying to show here is that the information we want to find in text is not simply expressed. We need to uncover it, and we need to be able to link coreferring expressions: "that gene" refers to gene B, and so on. By combining domain adaptation and coreference we can then improve the systems we are creating.
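A toy version of the coreference step in the "this gene" example can be written as a single rule: link the anaphor to the most recent preceding gene mention. Real coreference resolvers are learned from annotated data; the gene list here is a hypothetical stand-in for a named entity recogniser's output.

```python
import re

# Gene mentions that a named entity recogniser would have found.
GENES = ["B gene", "C gene"]

def resolve_this_gene(text):
    """Link each 'this gene' anaphor to the nearest preceding gene mention."""
    resolved = []
    for m in re.finditer(r"this gene", text):
        # Position of each gene's last mention before the anaphor (-1 if none).
        antecedents = [(text.rfind(g, 0, m.start()), g) for g in GENES]
        pos, gene = max(antecedents)
        resolved.append((m.group(), gene if pos >= 0 else None))
    return resolved
```

On the example sentence this links "this gene" back to the B gene, which is exactly the link needed before the expression event can be extracted.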
Normally we evaluate this with the F-score, which is the harmonic mean of precision and recall. Another problem we have, and I might go a bit quickly here given the time, is that domain adaptation works extremely well in one domain, where you have a specific target corpus, but you still have a limited scope. So one of the challenges now is to build models which have wide coverage of event extraction, so we can recognise many of these events in different domains using as many resources as possible, going beyond domain adaptation. If we wanted to build a single training corpus, as we do for our systems, with really wide coverage, it would be impractical, because it is absolutely impossible to deal with so many types of semantic information in one annotation effort. So what we have done is to make combined use of different corpora. I'll give you an example here. Most resources have a partially overlapping scope: we have events in different corpora, in different semantically annotated texts, with different scope. The problem is that if we just try to merge these corpora, we run into trouble. Here is an example. Suppose a hypothetical corpus X contains "human P53 and NFKB" and has been annotated with organism and protein, while the other corpus annotates P53 as protein and NFKB as protein complex. If we just merge these corpora as they are, we can get what we call falsely created negative examples. So we need methods to alleviate this problem by filtering the annotated examples: we generate negative examples only from mentions which appear marked up with positive semantic types in that corpus. For instance, NFKB will never appear as a positive example of that semantic type in this corpus, so we can filter it out. We have done the same thing for events.
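Two small sketches of the ideas in this section. The first is the F-score, the harmonic mean of precision and recall used to evaluate these systems. The second is the corpus-merging filter: a corpus can only supply negative training examples for a semantic type that it actually annotates, which prevents the falsely created negatives in the NFKB example. Both functions and the example data are illustrative.

```python
def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def filter_negatives(mentions, corpus_types, target_type):
    """Negative examples for target_type, but only if this corpus annotates it.

    mentions: (text, annotated_type) pairs from one corpus.
    corpus_types: the set of semantic types this corpus annotates.
    """
    if target_type not in corpus_types:
        # This corpus says nothing about the type, so its unmarked
        # mentions must not be treated as negatives for it.
        return []
    return [m for m, t in mentions if t != target_type]
```

So a corpus that only annotates organism and protein contributes no negatives at all for "protein complex", and NFKB is not falsely learned as a non-complex.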
Without going through the details here, those are basically the problems: we are trying to extract and recognise events automatically by using partially overlapping semantic information from different resources that the community is building. I'll skip the comparison for lack of time, and I just want to spend two minutes on the meta-knowledge annotation of events. The last of my layers of annotation was discourse. Although it looks very linguistic, it is actually extremely important, because once we have extracted statements and assertions, we want to know whether a statement is certain or negated, and what it actually means: what was the perception, the impression, the intention of the author. If we go back to extracting information and events from sentences, we start looking at different linguistic cues. Here we have "showed", and "showed" may indicate a type of analysis; but we can also have an investigation, or talk about methods. All these types of information are marked up in the text. We also have something like "were unable": this is a negation. So clearly this statement tells us it is not just a triple of information; it means something else. And then we have "significantly", so something is being said with high confidence. To express what we are doing automatically: we take a sentence like the one you see above, and we mark it up with different types of information. This one talks about an analysis; the certainty is quite high; the polarity is negative; the manner in which it is expressed is high; and the source is current, that is, the author is talking about research done in this paper, not in previous work. By combining this type of information we can actually derive new knowledge, including knowledge that is negated.
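The meta-knowledge dimensions attached to an event can be sketched as a small record plus a cue-based classifier. The cue words below ("showed", "unable", "may") come from the examples in the talk; a real system learns its classifiers from annotated cues rather than using hand-written rules like these.

```python
from dataclasses import dataclass

@dataclass
class MetaKnowledge:
    knowledge_type: str = "Observation"  # e.g. Analysis, Investigation, Method
    polarity: str = "Positive"           # negated or not
    certainty: str = "High"              # confidence level
    source: str = "Current"              # this paper vs. cited work

def annotate_meta(sentence):
    """Assign meta-knowledge dimensions from simple lexical cues."""
    mk = MetaKnowledge()
    words = sentence.lower().split()
    if "showed" in words or "demonstrated" in words:
        mk.knowledge_type = "Analysis"
    if "unable" in words or "not" in words:
        mk.polarity = "Negative"
    if "may" in words or "might" in words:
        mk.certainty = "Low"
    return mk
```

This is what lets a search system distinguish a confidently negated analysis from a tentative hypothesis, even when both contain the same entities.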
How does this work? I'll go very quickly. We have built search systems which take the sentences, automatically mark them up, and classify them: this statement has a specific certainty level, or it is negated, or it is an analysis. So we have search systems that do exactly that. Why is this important? Because not all triples are the same. If something is merely asserted in a sentence in a document, you need to be able to surface the type of information, the provenance, and the intention of the author. I am going to stop here about events, and talk very briefly about infrastructure before I come to the search systems we have created. Infrastructure is extremely important in the community, and one of our preoccupations from the beginning of the centre was to create interoperable platforms that allow the community to combine and share resources and tools. In our case we have built two infrastructures, very much based on IBM's UIMA, and this is again something our community is very actively working on. The two systems are U-Compare and Argo. U-Compare is more of a standalone Java application and can be installed on a local computer. Argo is a web-based application, has lots of interactive components, and has been and is currently used for biocuration. What we are trying to do is to bring different tools together, allow the community to create workflows automatically from the text mining tools we have developed, and actually see the annotations and manipulate them. The idea is more or less that of scientific workflows, which people know about, but these are text mining workflows, which are exactly the processing pipelines I talked about in the beginning.
So we put the different components together, and clearly one of the problems we have always had in the community is how you translate from one type system to another. To avoid this, either we wrap those tools into a common type system, or we have tools that automatically do the translation for you. What we have created is a comprehensive biomedical text mining toolkit which has a number of corpus readers, that is, readers for the annotated corpora that allow you to evaluate different text mining tools, all the annotated corpora I mentioned before. You have different format readers, you have writers, and you have syntactic tools and semantic tools; it is also multilingual. So we have about 200 text mining components, not only ours, available to the community. Why do we need that? As I said before, most of the tools, most of the analysis we are doing, is task dependent. A named entity recogniser can work extremely well on a specific corpus, but then it doesn't work for your task. And most people then say, well, text mining doesn't work. Well, it doesn't work for that specific corpus, because you need to customise and adapt it. The way we do this is to create evaluation workflows, as you see here, to evaluate the performance of different systems. You have two named entity recognisers, and normally you evaluate them against the gold standard. That is one way for the community to see what works: for instance, in this example you have a text that has been marked up, and you see the gold standard in red and how the different analysers perform against it.
But then we thought this was not sufficient, and we created another, much more general and bigger infrastructure called Argo, which has different processing components taken from U-Compare, but also allows you to create workflows automatically; once you retrieve the documents you can do the processing remotely, and lastly you can manipulate the manual annotations. It can also be combined with different types of applications. How does this work? The main view of Argo lets you manage your workflows. In this case you might, for instance, want to disambiguate species, or retrieve documents, or find specific named entities. The second part is the processes view: once you run a workflow, it shows you the progress of the different workflows you are running. Normally Argo automatically switches to this view from the workflows view whenever you start a new run. And here are the different documents that have been retrieved, letting you manage your file space. Let me show you how it works. Basically, the middle panel lets you start manipulating, dragging components and creating workflows. Say you want to search, or to use a service, as I'll tell you afterwards about KLEIO; you can drag and drop different tools to create this kind of canvas. The workflow has automatic processing and also manual correction, with different search engines and different types of multi-format serialisation, so you can export to RDF or XMI or whatever you need. An important part of this is also SPARQL, which allows us to query and translate between different components. I am going to be quick now because I am a bit worried about the time.
Once the system has automatically annotated the document, what the curators do is filter and convert the different annotations. For instance, as you can see here, you can select or deselect annotations, and you can also link to different ontologies, as we have done here with the knowledge base, which lets you select a specific entry and see the syntactic structure. Now, I am running a bit late; in the last 10 minutes I will talk about some of the services into which all these text mining tools and the infrastructure have been combined, for two things: pathways and search systems. For the search systems, we have created a number of components. The first does search based on named entities rather than strings: you can search on genes or proteins or diseases. The first system I'll show you is called KLEIO; it is a faceted semantic search. For instance, if my first query is ATP, you get about 11,000 or 17,000 results, which is a lot. But if you specify here that you want the gene, you have substantially fewer documents. Then you can start manipulating the facets, adding further constraints: you can say, for instance, I want this gene with this disease. Here we chose Alzheimer's disease, and you have 11 documents; and at the same time you can specify even more. You might say, I want to disambiguate the species; you choose human, and you have seven documents from the 13,000. This simple example shows how an analysis of semantic types allows you to create much more interesting and pertinent search systems. The next service we have provided is a semantic search based on facts. This again uses named entities, but you can also pose a query.
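The progressive narrowing in the KLEIO example can be sketched as faceted filtering over entity-annotated documents: a string query first, then an entity-type facet, then a species facet. The tiny corpus and the facet names here are invented for illustration.

```python
# Each document carries (mention, entity-type) annotations plus metadata.
DOCS = [
    {"id": 1, "entities": {("ATP", "gene"), ("Alzheimer's", "disease")},
     "species": "human"},
    {"id": 2, "entities": {("ATP", "chemical")}, "species": "mouse"},
    {"id": 3, "entities": {("ATP", "gene")}, "species": "mouse"},
]

def faceted_search(docs, term, etype=None, species=None):
    """String match first, then narrow by entity type and species facets."""
    hits = [d for d in docs if any(e[0] == term for e in d["entities"])]
    if etype:
        hits = [d for d in hits if (term, etype) in d["entities"]]
    if species:
        hits = [d for d in hits if d["species"] == species]
    return [d["id"] for d in hits]
```

Each added facet strictly shrinks the result set, which is the behaviour described for the ATP query above.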
For instance, if my query here is "what activates p53", the relevant sentences are automatically extracted from the documents, and you can see here that you get the snippets of information from the whole literature which are pertinent to your query. I mentioned events, so you can do the same search going not through triples like "activates" but through event information, like localisation or different types of biological events. In this case the user can specify a higher level of analysis, localisation, and specify TNF-alpha, and again the sentences are automatically extracted from the text, showing you exactly the types of information you need. These search systems let you really manipulate the information and find it much more pertinently. We are also working with Europe PMC on the Evidence Finder. The Evidence Finder is based on full parsing and all the text mining processing tools I have described before, to allow you to search, and this is over full papers, about two and a half million of them. In this case the system generates questions: you might give a very simple query, and questions are automatically generated. If you click on one, the snippets of information are extracted automatically, and you can see how they are highlighted in the text. I'll go very quickly now to show you another type of association. All these systems find what is known in the text. But a step further is to find what is unknown: the direct and the indirect associations. The system FACTA+ does that. If I ask about E-cadherin, through indirect associations via a gene or another disease, what you find here is the element of surprise: you don't expect to find these concepts co-occurring in the same document.
They co-occur in different documents, but because A occurs with B, and B with C, we hypothesise that A is related to C. In this case your query, E-cadherin, is associated with Parkinson's disease via these specific genes. This type of information is quite important, and it pays to visualise it, because it can tell us what is hidden. Now, for the last five minutes of my talk: why did I tell you all that? How did we put all those things together, and why? One extremely important application is linking pathways and models with the evidence from the literature. We all know that pathways are extremely difficult to construct manually, and we know that they have to be maintained and are curated manually. So all of the systems and services and techniques I told you about before, the events, are put into practice to do that. Currently PathText, the system we have developed, supports SBML models. What we do is: if you look at an SBML model and take a reaction, with its reactants and modifiers, we automatically translate this reaction into queries. These queries call the different text mining systems I just briefly mentioned: FACTA, KLEIO and MEDIE. But the core, fundamental part of this work is the notion of event; that is why I spent so much time talking about events. In order to link the evidence from the literature to the model, you need to be able to extract events and, in effect, map the reactions in the pathways to events which are scattered in the text. Even more so: in pathways you have very specific information about reactants and modifiers, and these do not necessarily map to the same thing in text. For a reactant, you will not necessarily find exactly the same information in the text.
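The A-B-C reasoning behind the indirect associations can be sketched over document-level co-occurrence sets: B is a linking concept when it shares documents with both the query and the target, while the query and target never share a document themselves. The document sets below are invented for illustration.

```python
from collections import defaultdict

def occurrence_index(docs):
    """Map each concept to the set of document ids it occurs in."""
    occ = defaultdict(set)
    for doc_id, concepts in docs.items():
        for c in concepts:
            occ[c].add(doc_id)
    return occ

def indirect_associations(docs, query, target):
    """Concepts B such that A-B and B-C co-occur but A-C never do."""
    occ = occurrence_index(docs)
    q_docs, t_docs = occ.get(query, set()), occ.get(target, set())
    if q_docs & t_docs:
        return []  # direct co-occurrence: nothing "hidden" to surface
    return [b for b in occ
            if b not in (query, target)
            and occ[b] & q_docs and occ[b] & t_docs]
```

The empty direct overlap is exactly the "element of surprise": the link only exists through the intermediate concept.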
By using the representations I briefly tackled before, we can extract this information automatically and infer, more or less, what the context is going to be. So by looking at the models, we can not only retrieve documents based on their relevance to a specific reaction in a pathway; at the same time we rank the results, and we have developed different ranking mechanisms to combine the information from these services and provide one unified interface. I don't have much time, but the query generation is basically, as I said, based on the three different types of systems: each system demands a different type of query, for example semantic types for KLEIO and events for MEDIE. Again, the ranking of the documents is based on the types of the reactions that we have in the model. To do that, if you have an SBML model, and currently we support only CellDesigner, you have different types of reactions and reactants, and we have created rules to map these reactions into event representations; this is how the queries are generated. Just to finish up on the ranking: we combine the different search systems we have developed, and again this could be augmented, because if you have the infrastructure, as I mentioned before, you can bring in other technologies. Each service combines different tools, but it is interoperability that lets us use the different tools together to achieve this. You can use different ranking mechanisms; we used rule-based and machine learning ones, but that is a detail.
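The reaction-to-query mapping can be sketched as a rule that turns one reaction, with its reactants, products and modifiers, into the three query styles mentioned: a keyword query, a semantic-type query, and an event-style query. The rule, the field names and the example reaction are illustrative; they are not the actual PathText rules.

```python
def reaction_to_queries(reaction):
    """Map one pathway reaction into queries for different search back ends."""
    participants = reaction["reactants"] + reaction["products"]
    return {
        # Plain keyword query over all participant and modifier names.
        "keyword": " ".join(participants + reaction["modifiers"]),
        # Semantic-type query: each participant constrained to a type.
        "semantic": [(p, "protein") for p in participants],
        # Event-style query: modifiers as cause, reaction type as event type.
        "event": {"type": reaction["type"],
                  "cause": reaction["modifiers"],
                  "theme": reaction["products"]},
    }

# A hypothetical CellDesigner-style reaction, flattened to a dict.
rxn = {"type": "phosphorylation", "reactants": ["S6K1"],
       "products": ["S6K1-P"], "modifiers": ["Aurora B"]}
```

Each back end then answers its own query form, and the per-system results are merged and ranked, which is the role of the ranking mechanisms described above.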
But the important thing to remember is that there are real possibilities for finding and ranking documents based on the reactions, and for using this information to support not only the curation of pathways, which we are currently doing, but also to go a step further and make suggestions for maintaining and augmenting the pathways. There is a web interface and also an API, so you have both a web-based user interface and programmatic access. As you can see here, if you go to our website, you can select one of the pathways we have included, or choose your own, and you can query based on a reaction. You then get the ranked documents, with highlighting in blue that reflects the level of confidence of our results. I'm going to skip through this, because I don't have much time.

So I'd like to conclude now. My very simple conclusion is that I hope I have demonstrated to you that text mining is an enabling technology for evidence-based knowledge discovery. Event extraction, although it sounds pretty challenging, is exactly what the community is doing right now, and it is very important for the types of applications that are needed in your field. And again, infrastructures are an extremely important and integral part of all bioinformatics and neuroinformatics applications. With acknowledgments to our funders and the team at NaCTeM, thank you.

Okay, we have time for one or two questions.

In your introduction, you mentioned that one of the applications of this technology is clustering of documents. As a neuroscientist, I go to a conference with 30,000 attendees and at least 10,000 abstracts. When we send in those abstracts, we have to specify what area to put them in, and I feel like we are using concepts from the 1970s to do this. Right.
And how close are your technologies to allowing us to really cluster those abstracts in more interesting ways?

Well, we have several systems, and we do it in real time, which I can show you. So, give it to the society, then. Yeah, absolutely. We have created clustering methods that allow you to do it in real time: given a query, the system automatically generates semantically labelled clusters. These are based on TerMine, on technical terms. You have to extract the terms and then use, of course, machine learning, but you can automatically cluster the documents based on semantic labels. And you can choose: the system presents the semantic labels to you, and you can say, now I want to zoom into the documents in this specific cluster. Yes, absolutely, we have it on our website. We've done this for clinical trials, for food security, and even for newswire.

Great, thanks. I've been working with a lot of documents that contain information about bioactivity, and there you find many sentences that are ambiguous. An example is a sentence like "X shows bioactivity at A, which is also referred to as B", where you don't know whether B refers to X or to the bioactivity at A. So I was wondering whether you, with your technology, could create a service that we can run a publication through, preferably before it is published, to see if the publication is mineable, by a human or by an algorithm.

Well, that's very interesting, because it touches on what you said about how we interact with text mining tools for authoring and scholarly communication. This is exactly what text mining can do. We are talking with different people who create authoring tools, which can point out the different types of ambiguity to you and suggest disambiguations as well.
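The real-time, term-labelled clustering described in this answer can be illustrated with a deliberately tiny sketch. The fixed term list stands in for a proper term extractor such as TerMine, and assigning each document to its single most frequent term is a simplification of the actual machine-learning approach.

```python
# Toy sketch: cluster documents by their most salient technical term,
# and use that term as the cluster's semantic label.
from collections import Counter, defaultdict

# Pretend these multi-word terms came from an automatic term extractor.
TERMS = ["gene expression", "clinical trial", "deep brain stimulation"]

def cluster_by_term(docs):
    clusters = defaultdict(list)
    for doc in docs:
        # Count occurrences of each candidate term in the document.
        counts = Counter({t: doc.lower().count(t) for t in TERMS})
        label, n = counts.most_common(1)[0]
        clusters[label if n > 0 else "other"].append(doc)
    return clusters

docs = [
    "Gene expression profiles in the striatum of parkinsonian mice",
    "A randomised clinical trial of levodopa dosing",
    "Deep brain stimulation outcomes after deep brain stimulation surgery",
]
for label, members in cluster_by_term(docs).items():
    print(label, len(members))
```

A production system would score terms statistically (e.g. C-value) and allow documents to belong to several clusters, with the user drilling down into any labelled cluster.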
And all those tools are quite interactive, because text mining makes suggestions, the same as with the associations, but then you can really interact and choose among them. For example, we can link with ontologies to allow you to disambiguate, as I showed before with the infrastructure: we link with ChEBI, and then it's up to the researcher to say, I want this specific entry. So absolutely, the technology is there to be integrated into a kind of authoring environment, to allow people to do that, and also to link with other authors and other types of statements. For instance, one of the things I mentioned before was facts which are negated. Say you're writing your paper, and it's not only about disambiguation: you write something in the paper and you say, well, I'm pretty certain about this. But the system can automatically say: hang on, this specific assertion has been speculated about by X, Y, and Z, and has been negated; what are you going to do about this? So this is exactly the type of information that is of interest, yes.

One last question. You and your colleagues seem to be making impressive progress, but I would say you are kind of agnostic about quality. As you're getting at scientific evidence, one of the concerns is that different articles have, sometimes, vastly different quality, and I'm wondering whether you've been interested in assessing quality as well.

Yes, that's a very interesting question. There are different ways of measuring quality. First of all, there is the quality of the text mining tools: to a large extent, our results are actually close to what a human can do. The performance of our systems, for the machine-learning-based methods, depends very much on the types of annotations; they have to be as close as possible to human annotation, so that we can ascertain to what extent our systems perform well. But the second part of your question is: how do we know the source?
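The negation and speculation checking described here can be caricatured with a simple cue-list detector. Real systems learn these distinctions from annotated corpora; the cue lists and the rule order below are illustrative assumptions only.

```python
# Toy cue-based classifier for the status of an assertion in a sentence.
import re

NEGATION_CUES = ["not", "no", "fails to", "lack of"]
SPECULATION_CUES = ["may", "might", "suggests", "possibly", "putative"]

def assertion_status(sentence):
    s = sentence.lower()
    # Negation is checked first; a negated statement outranks a hedged one.
    if any(re.search(r"\b" + re.escape(c) + r"\b", s) for c in NEGATION_CUES):
        return "negated"
    if any(re.search(r"\b" + re.escape(c) + r"\b", s) for c in SPECULATION_CUES):
        return "speculated"
    return "asserted"

print(assertion_status("MEK phosphorylates ERK."))                  # asserted
print(assertion_status("MEK may phosphorylate ERK."))               # speculated
print(assertion_status("MEK does not phosphorylate ERK in vivo."))  # negated
```

An authoring tool could run such a check over each claim a writer makes and surface prior literature where the same event was hedged or contradicted.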
How do you know that this specific piece of information comes from a credible source? That second part is actually quite easy to address, by introducing a ranking based on, for instance, whether the paper is from a good journal or a bad journal. That is very easy to integrate in the infrastructure. So the different types of ranking and evaluation, and the types and levels of certainty of assertions, we can incorporate by looking at parameters such as: journal X has a good impact factor, or author X has a good reputation. Those are different types of parameters one could introduce into a system when doing ranking and classification. Absolutely.

Okay, let's thank Sophia again. Thank you.