First, some background on the company. Thomson Reuters is a large multinational corporation, and its data assets roughly fall under five umbrella divisions: Finance and Risk; Legal; Tax and Accounting; IP and Science; and Reuters News. Because of that umbrella organisation, the data assets sit directly underneath those divisions, and for good reason. Some of this is down to the growth-by-acquisition nature of the company, which means some of the data is siloed. But it also means the content is segregated by business domain, and the benefit of that is that every data asset the company has is designed, content-controlled, edited and published by each business independently, in a federated manner. What that means for our customers is that they may buy licences to data sets from different divisions of the company. They want to use those with their own data, and also with open data, things like DBpedia and OpenStreetMap, and of course to query across all of them in one easy way. And that's where we get to the knowledge graph. A knowledge graph is essentially a way of representing data like this: there are objects, for example a company. The company has some attributes: a name, the date it was incorporated, a website. And it links to another object, in this case a quote, which has attributes of its own: its RIC, its ticker, its exchange and so on. You see similar things in other parts of the company. In IP and Science, for example, you see the same company, but without that wealth of organisational data; the focus is on something else. It has a product, say, and that product uses a patent with a particular application number, filed and published on particular dates.
This patent is granted to the company. So this kind of data lives across these domains. What you really want to do, of course, is bring those together and link the companies where they're the same entity. Doing that requires concordance of entities across the different business domains, which is a challenge. But once you manage it, you can end up with a system like this: a representation of almost the whole company's data across the different business areas. There's corporate information about companies, their directors and the products they make; information from Finance and Risk about instruments, quotes and fundamentals; legal information about cases, dockets and citations; tax information from Tax and Accounting; and of course IP and Science information, since products from those companies have patents and trademarks. And all of these things also get mentioned in the news from Reuters. Now, identity itself is a large challenge. If you just take organisations, an organisation can have a different ID in every area: within Finance and Risk, Tax and Accounting, Legal, IP and Science and News, they have different identifiers. And it gets more difficult than that: even within one division like Finance and Risk there may be more than 20 identifiers for a company, depending on jurisdiction, business area and what kinds of products and services are involved. Internally, Thomson Reuters solved this by creating the Content Marketplace. This addressed the problem by introducing an information governance framework: it created authorities within the company to maintain common reference points for the major objects, things like organisations, people, instruments and quotes. Earlier, my colleague Bob mentioned that we call these PermIDs, and they are the common method for identity across the company.
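The concordance idea just described amounts to a lookup from each division's local identifier to a shared PermID. A minimal sketch in plain Python, with all identifiers invented for illustration (they are not real Thomson Reuters IDs):

```python
# Illustrative sketch of entity concordance: each division keys the same
# organisation under its own local identifier, and a common PermID ties
# them together. All identifiers below are invented.

permid_index = {
    ("finance_risk", "FR-000123"): "1-4295900000",
    ("ip_science", "IPS-778"): "1-4295900000",
    ("legal", "LGL-42"): "1-4295900000",
}

def to_permid(division, local_id):
    """Resolve a division-local identifier to the shared PermID, if known."""
    return permid_index.get((division, local_id))
```

In practice this mapping is maintained editorially by the data authorities rather than held in a single table, but the effect is the same: any division-local reference resolves to one common identifier.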
And because our customers have the same problem, and so does the open data community, we can't impose the rules we use inside our own company on other people; that's not practical, and it's not going to happen. So instead we offer an open data solution: an open PermID system, where we give away a big chunk of our data for you to use and reference, and then grow a network effect and an ecosystem around it. PermIDs are useful as a common reference point for a few reasons. They are specifically maintained for identity reference; this isn't a side effect of some other data-gathering process. They are well described and maintained relative to the real world, as part of what our business actually does. The coverage and granularity reflect the needs of the community and of our actual data products. And if you use PermIDs yourself, you know that everyone else knows what they are, that they can access them under an open licence, and that they can get the additional data and links as well. In terms of how we've done this, the Content Marketplace itself revolves around a data dictionary and registry, a fairly standard ISO/IEC 11179-style enterprise metadata system, the kind of thing that's typical with XML interchange. For the knowledge graph itself, we've invested heavily in Semantic Web and big data technologies. The whole graph is RDF, and we use OWL to specify the ontologies for the different business areas. Querying is done with SPARQL, and we use SPIN for inference rules. SPIN is essentially a Semantic Web rule system in which SPARQL CONSTRUCT queries define rules over the graph. Internally we use Jena and Sesame for some of the actual programming.
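The RDF model and the SPIN rule idea can be sketched in plain Python. RDF represents everything as (subject, predicate, object) triples, and a SPIN rule is a SPARQL CONSTRUCT query run over the graph to add inferred triples. All URIs and property names below are invented for illustration; they are not Thomson Reuters' actual ontology:

```python
# A plain-Python sketch of the RDF triple model and a SPIN-style rule.
# Everything is a (subject, predicate, object) triple; the rule function
# plays the role of a SPARQL CONSTRUCT query that derives new triples.

graph = {
    ("ex:AcmeCorp", "ex:name", "Acme Corp"),
    ("ex:AcmeCorp", "ex:hasQuote", "ex:AcmeQuote"),
    ("ex:AcmeQuote", "ex:ric", "ACME.N"),
    ("ex:AcmeQuote", "ex:exchange", "NYSE"),
}

def objects(subject, predicate):
    """All objects reachable from `subject` via `predicate` (a SPARQL-like lookup)."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def infer_inverse(triples, predicate, inverse):
    """One pass of an inverse-property rule: wherever X p Y holds, infer Y inverse X."""
    return {(o, inverse, s) for s, p, o in triples if p == predicate}

# Infer ex:quoteOf edges from ex:hasQuote, the way a SPIN rule might.
graph |= infer_inverse(graph, "ex:hasQuote", "ex:quoteOf")
```

A real deployment would express the same rule declaratively in SPARQL CONSTRUCT and let the store apply it, but the mechanics are the same: match a pattern, emit new triples.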
To deal with the large volumes of data we have across the business, we use the Apache big data ecosystem: things anyone working in this space will recognise, such as Hadoop MapReduce, Spark, Kafka, Oozie, Cassandra and Elasticsearch. The benefit is that these are proven technologies with strong support, and they give us a solid platform to build on. The Content Marketplace work gave us the linkage to all the PermIDs, so we took those technologies and that system and scaled them up to query and manipulate the data at scale. The knowledge graph then allows us to provide lots of data, and also lots of perspectives on that data. In terms of process, we built a minimal set of tools to get data into and out of the graph, and determined the minimum viable set of data needed to bootstrap it, so that very early on we could show benefit to the people who had invested in us doing this. As part of this we decided to maintain the federation of data internally, putting as few restrictions as possible on what the business areas do themselves, so that the data authorities retain all the editorial and publishing control they have always had, while we build a connective tissue over the authorities that already existed in the company. What that approach means is that if we can prove it out internally, we can use the same approach to link to customers and to open data as well, since those are likewise federated, independently controlled data sources. You may be interested in some statistics. Behind the PermID system are a few of our data sets, covering metadata, organisations and people; those represent 2.27 billion triples.
And on top of that we have rules in SPIN to do things like reversing predicates and connecting relationships together, which add a further 78 million triples. To put that in perspective: Wikidata, for example, is 367 million triples, DBpedia 474 million, and Freebase 2 billion. So our data on organisations and people alone is roughly equivalent to something like Freebase. Then there are things like UniProt, the very large protein database in the bioinformatics sphere, at around 7 billion. The point is that even just the data backing the PermIDs is very large, and that gives you an idea of the scale you'll be working at if you go down this route and do any work with knowledge graphs. One of the things you get with a knowledge graph is the ability to provide many views, so you can answer many different questions; what you're aiming for is to provide many different lenses over the graph. One way of looking at it is that you want to query first for absolute facts: patents issued by a company, litigation history, market capitalisation history, things that are solid numbers. But you also want to make inferred and abstract connections, things relevant to the research you're doing or to what you actually want out of the graph, and to be able to sort by them: sort by litigation history, say, but only within an industry sector, and then weight by market capitalisation. You combine those absolute facts with the inferred, abstract data you've created, and then iterate, building layers of these queries on top of each other.
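The "lenses" idea, combining hard facts with a filter and a weighting in one query, can be sketched as follows. In practice this would be a SPARQL query over the graph; here it is plain Python over invented data:

```python
# Sketch of one "lens" over the graph: restrict to an industry sector,
# rank by litigation history weighted by market capitalisation.
# All companies and figures below are invented.

companies = [
    {"name": "Acme",   "sector": "Aerospace", "lawsuits": 12, "mcap": 50e9},
    {"name": "Biotek", "sector": "Pharma",    "lawsuits": 4,  "mcap": 8e9},
    {"name": "Aeron",  "sector": "Aerospace", "lawsuits": 7,  "mcap": 120e9},
]

ranked = sorted(
    (c for c in companies if c["sector"] == "Aerospace"),  # filter: one sector
    key=lambda c: c["lawsuits"] * c["mcap"],               # weight: lawsuits x market cap
    reverse=True,
)
```

Layering then means feeding `ranked` into the next query in the chain, exactly as described above, with each layer adding sophistication.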
So once you've generated those queries, you generate more on top of them, increasing in sophistication and complexity as you go about whatever research, investigation or analysis task you're trying to do. You also need to handle the relative truth of those facts: once you've built up layers of complexity, you need to know where the numbers came from and how they were inferred, so you can trust that the data comes from somewhere you can actually use. And finally, a quick use case. This is a screenshot from the Open PermID website, showing some of the details you can get about Lockheed Martin, for example; you can see its PermID there, the internal PermID itself. This is something you can sign up to right now at permid.org. There are other tools Thomson Reuters has around this. There's something called Calais, which you'll know about if you've been in the Semantic Web world; it's also known as Thomson Reuters Intelligent Tagging. You put free text in, and it links anything it thinks is an entity and gives you the PermID. So if you have a piece of prose, like a news article, that immediately links you straight into the knowledge graph with exactly the same ID. And once you've got that ID, you can go straight into the graph (this is an example visualisation tool) and say: now I've got that news article and I've found the entity I want, I can go and find deals, find other companies, find patents, and walk through the graph to get what I want from whichever source material I actually need. Okay, so thanks. If you'd like more information about this, there is a white paper published by Thomson Reuters and the ODI, available at these two URIs.
And that's my email address; you can email me there if you want any more information. Thank you. APPLAUSE So we have time for one super-quick question. Okay. Oh, it's a dead heat; I'm going to let Mike take it. Yes, one question, in two parts if I could. Firstly, you said to sign up to the site, and I just wanted to clarify the extent to which it's open versus sign-in only. But my main question was how this sits alongside other public IDs, like the OpenCorporates ID or Companies House. Do they link to one another? Is there a sameAs facility whereby people may register links? How does it sit in the open data community alongside people who are potentially commercial rivals? Well, I think the benefit of the Semantic Web and linked data is that anybody can publish links between things as they want, right? By providing this, we're opening it up; you can download the whole data set if you like, things like that. It is genuinely open. And so, no, we're not doing that linking at the moment. We may in the future; I can't say. Okay, thanks very much, Dan. APPLAUSE