I think everybody's heard of Deutsche Bank, if only from the positive and negative press, but let's move past that. I've been there for about six and a half years. In January of 2015 we started the Chief Data Office, which pulled together some disparate groups, enterprise data architecture, the data lab, the data science team, into one team under JP Rangaswami, whom we shamelessly stole from Salesforce. He has a TED talk, by the way, if you're interested, about how information is food and how that works in an ecosystem. I know it sounds weird, but go with me on this; it is interesting.

As David mentioned, when JP Rangaswami joined us, his mission was to create a semantic bank. That builds off something someone mentioned earlier, Tim Berners-Lee's concept of a semantic web. Most data today sits in text documents or web pages rather than in searchable containers, so it's hard to get at; it may be human-readable but not machine-readable. We're trying to get to a scenario where any content on the web, or any content in our ecosystem, is both human-readable and machine-readable.

A little about us: I have the same problem that any bank has, and I'm sure you're all aware of it. I could bore you with how we got here, but I'll skip that. We have the same fragmented architecture that a lot of banks have, because we've thrown money at different things, some succeeded, some failed, and they weren't well integrated. You've all heard that story before. We want to move to a better, more integrated infrastructure, and doing that means we have to integrate our data, give it a common language, or at least access it in a common manner. Hence this idea of semantic layers, the semantic web, the semantic bank. We heard the phrase digital enterprise earlier; one of Deutsche Bank's goals this year is to create a digital bank. I don't know exactly what that means, but there were three things we wanted to achieve as part of the data architecture team, within the Chief Data Office, to get us started on the path towards a semantic bank.

First, we do a lot of reconciliations, as you can imagine, way too many, and we wanted to get rid of them. The new thinking is a hardened, transparent, immutable ledger, e.g. a blockchain. To do that you need consistent transactions and consistent language, and we don't have that now. The second problem, which branches out from reconciliation, is the constant need to map things: I want to use my pet phrase and I don't want to use yours, and it's all very difficult. If we have consistent language as a baseline, we don't need any of that mapping; the time wasted mapping things, translating things, misunderstanding things, misinterpreting things, is gone. And third, as I've heard people say during the past few days, getting anybody to change anything in a data architecture or a database is a very long and bureaucratic process. Newer platforms like Salesforce let you build an app, put it on their app store, and deploy it really quickly; we want to do that inside our own infrastructure. To do that, we have to make data something that doesn't take a year to change. You can't have a hard, intractable database.
You need to have some flow, and you need to allow for different versions of things, for things that go back and change over time, for time series, all of that. To do that, we needed a much more flexible data infrastructure.

All right, that's all very interesting, but what did we actually do? We were a group of data architects, as David mentioned, trying to figure out how to get started. We thought about doing the usual canonical model exercise, but everybody in this room knows that doesn't work, and we didn't have the emotional stamina for it, so we decided not to. If we were going to standardize on a language, we had some groups trying to standardize against FpML or some of the other transaction languages outside ISDA, which is again all very interesting, but it only covers a small subset of the content we need to cover. So that wasn't a choice either.

Then a few of us were talking, and our market data architect told us he had a problem he thought we should all focus on, and that was the thing we decided to do. So what was it? We're spending tens of millions of euros, dollars, various currencies on market data; we're one of Bloomberg's favorite customers, I'm sure, as well as Reuters and everybody else. Part of his problem was that people had coded directly against the providers' proprietary languages, and the market data providers got smart to that: they wanted to charge us more money because they now had to preserve versions for us. You've heard this story. So we needed to move to something technology agnostic, something we could use to decouple the provider language from our own internal structures (there's a small sketch of that idea below).

We decided FIBO was going to be good for that, because it's technology agnostic, system agnostic, and industry standard-ish; it needs more coverage, but it's a standard. Dennis isn't listening, so I'll keep talking. Oops, now he's listening. I know, I know, that's why I said it. And because FIBO is an ontology, it gave us the jump from third normal form, ETL, 1990s-style data warehouses to something new: linked data, semantics, triple stores, RDF, all the nice buzzwords everybody's heard all week. We also wanted to use it as a common language to search through a portal. What our market data architect had done was catalog all his different sources and build a portal over the top of them, but you had to choose the source you wanted to search before you could search it. So if you just typed in American Airlines, or the ticker, AA or whatever, it really wouldn't get you anything, because it would ask you which source, and that means you have to know what's in each one, and there are hundreds of these sources. We needed to abstract that a level further. So, as you can imagine, we thought we'd get a business-friendly language out of this, something we could search with, and we'd be able to determine what was duplicated, in some cases 10 or 20 times over.
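To make that decoupling idea concrete, here's a minimal sketch, assuming Python and made-up field names; the vendor fields and the common terms are illustrative, not our actual mappings. Internal code asks for a concept in the common language, and a thin per-vendor adapter deals with whatever the provider happens to call it.

```python
# Illustrative sketch only: the field names and common terms are assumptions,
# not the bank's real mappings. The point is that internal code is written
# against the common language and never sees the vendor's proprietary names.

# One small adapter per market data provider, mapping a common term to
# whatever that vendor calls the field in its feed.
VENDOR_FIELD_MAP = {
    "vendor_a": {"hasTicker": "TICKER_SYM", "hasLastPrice": "PX_LAST"},
    "vendor_b": {"hasTicker": "ric",        "hasLastPrice": "last_trade_px"},
}

def get_value(vendor: str, record: dict, common_term: str):
    """Look up a value by its common-language term, not the vendor's name."""
    vendor_field = VENDOR_FIELD_MAP[vendor][common_term]
    return record[vendor_field]

# Internal code is written once, against the common terms:
record_from_vendor_a = {"TICKER_SYM": "AAL", "PX_LAST": 12.34}
print(get_value("vendor_a", record_from_vendor_a, "hasLastPrice"))  # 12.34
```

If a vendor renames a field, or you swap vendors entirely, only the adapter changes; nothing downstream has to be recoded.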
We could also track who's accessing what through the portal, because if you type in gilt bonds, Europe, we'll know who you were and what you accessed. That sounds a bit Orwellian, but it needs to be done, especially if you're paying license fees to Bloomberg or someone like that.

This is just a scary picture of a subset of our market data landscape. You can't see it from the audience; don't strain your eyes. The idea is to get rid of these loops and the need for people to know what's in each source before they can get any value out of it, and also to eliminate the problem where one desk is using one set of market data and another desk is using another, and then the risk and finance people can't reconcile because they're pricing things differently. You've all heard that story. We needed to get past that.

Well, I have some animation here, I didn't realize that. Okay. Fun. What we decided to do was test out this idea of an ontology architecture, an ontology-driven language, something more than classifications, taxonomies, and data quality rules. We didn't want just a canonical model; we wanted something that would describe the content we were trying to work with. This slide, and my slides are on the portal by the way, just describes the steps we took to test that out. The first step was that we all had to figure out what a FIBO ontology was, because this was a couple of years ago and we didn't know. We had to refine and choose a scope that fit. Then we mapped our data and resolved the gaps where we couldn't find things in FIBO. Then we created URIs for that data, which Jacobus referred to earlier. And then we had to, what did he say, triplify? We created triple stores out of that information and put it behind the market data portal.

So what is an ontology exactly? Everyone in this room probably knows by now, but being data architects coming from database backgrounds, we didn't, so we did some education. We met with Mike Bennett and Jürgen Seimer, another one from the team, to have them tell us what an ontology is and why we should care. And it made sense: it takes us away from traditional relational structures to something much more flexible, something we can expand and scale really rapidly. Why FIBO? Because it was a technology agnostic ontology for the financial industry, and that's what we were, so we all thought, let's start here.

Our next question was: all right, now that we've decided on FIBO, where is it, where can we get it, how can we use it? That required a little research and searching. We wanted to start with the released version, I forget which color, sorry, I know the team knows which color it is, and we downloaded the RDF files from the OMG for the sections we wanted. Great, so we had all that information. But unfortunately, the tooling we were using was PowerDesigner, which doesn't read RDF, so we had to find another way. We ended up reverse engineering and messing around with it until we could get it into PowerDesigner, because that's where we had some of our other sources modeled. In retrospect, I probably wouldn't do that again.
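If you just want to poke at those downloaded files, an RDF library is a more natural fit than a relational modelling tool. Here's a minimal sketch, assuming Python's rdflib and a placeholder file name standing in for whichever FIBO module you pulled down; it's illustrative, not what we actually ran.

```python
# Generic sketch: load a downloaded FIBO module with rdflib and list its classes.
# The file name is a placeholder for whichever RDF/XML module was downloaded.
from rdflib import Graph
from rdflib.namespace import RDF, RDFS, OWL

g = Graph()
g.parse("fibo_module.rdf", format="xml")   # assuming the module is RDF/XML

# Walk the OWL classes and print each label and IRI, which is already more
# useful than forcing the ontology through a relational modelling tool.
for cls in g.subjects(RDF.type, OWL.Class):
    label = g.value(cls, RDFS.label)
    print(label, cls)
```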
Looking back, I think we would just forward engineer straight onto graph data, into a triple store or something, and not worry about the modeling tool. But because we didn't know how it was going to look or how the data was going to map, we pulled it into a modeling tool and then looked at the gaps in between. That's why. Also, at Deutsche Bank we didn't have any ontology tools; I don't think our enterprise tools team knew what an ontology was, but fair enough, right? And our market database was in kdb, a columnar tick database that just ticks over, so we couldn't access it from PowerDesigner or from the RDF tools either. That was a problem too. We got around all of that a few ways: some reverse and forward engineering, some learning, and then we got the market data ontology up in Protégé.

I think people have said this during the week: I am not here to argue for one tool or another. Because we were doing this as a side project with no money, we couldn't really buy anything; we couldn't dip into TopBraid or some of those tools at the beginning. Because the EDM Council team used Protégé, we just started there. In fact, one of my people, who works in Cary, went to the class the Stanford team gives on how to use Protégé and how to develop an ontology. If you have the resources or the time, I recommend it; he thought it was really good and he's now our Protégé expert.

So we've got it up, we've got it running, we know what it is. Now what? Now we need to know how the ontology matches the market data we have in our market data environment. We made some assumptions that we knew we'd either prove or disprove. One: FIBO probably isn't going to have everything, and that's okay. Two: it's probably going to have different names for things, because as I said before, people like their pet names, so the names and meanings might not align. Three: the content itself will drive the mapping gap. What I mean by that is we didn't want to spend a lot of time scratching our heads trying to fill in blanks for things we might never use. So we decided to look at the reverse engineered dataset, I think we had it in Oracle at the time, and see what content we would push into the ontology, rather than spending time filling out the ontology itself. That could have been a rabbit hole from which we would never have recovered.

These are just some screenshots of what it looks like in, I think this is Protégé. Or is this TopBraid? It's TopBraid. "But Shannon, you just said you didn't have TopBraid." I did not; one of our consultants did. Fully licensed, you know. We did eventually get a trial version and use that. So we pulled the data in and then mapped it to the ontology. We filled in the blanks, and now we had what our attributes meant in that other language, which is what we were going for. We did it for a bond feed that looks like global, something EVB, but I don't know what Merck means. This is an Oracle table with D2RQ used to connect things together (there's a rough sketch of what that bridging amounts to below). Once we did that, we're back in Protégé again, and we just filled in some of the blanks. In Protégé you have to be very careful about where you add things, because you don't want to put the wrong thing under the wrong heading. We had a tendency to whiteboard it out first, because we were a bit nervous, and we also weren't experts.
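As a rough illustration of what that relational-to-RDF bridging amounts to, here's a sketch in Python rather than D2RQ's actual mapping language; the column names, namespaces, class, and property names are invented for the example, and the real FIBO IRIs would be used in practice.

```python
# Rough illustration of what the relational-to-RDF bridge does. This is NOT
# D2RQ's mapping syntax; the column names, namespaces, and terms are invented.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/marketdata/")          # assumed internal namespace
FIBO = Namespace("http://example.org/fibo-placeholder/")  # stand-in for a real FIBO module IRI

row = {"ISIN": "DE0001102580", "ISSUE_DATE": "2022-02-15", "COUPON": 0.0}  # one bond row

g = Graph()
bond = EX["bond/" + row["ISIN"]]
g.add((bond, RDF.type, FIBO.Bond))                         # type the row against the ontology class
g.add((bond, FIBO.hasISIN, Literal(row["ISIN"])))
g.add((bond, FIBO.hasIssueDate, Literal(row["ISSUE_DATE"], datatype=XSD.date)))
g.add((bond, FIBO.hasCouponRate, Literal(row["COUPON"], datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```

Each relational row becomes a small set of triples typed against the ontology; D2RQ just does this declaratively and at scale.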
Being a bit nervous and not experts, we thought, well, let's test it and see how it goes. We would also run some of the tests in our Oracle instance to see: does this really work, does it walk like a duck and quack like a duck, all of that. That worked out pretty well. We did have some blanks to fill in, not an inordinate amount, but some. When you look at the slides, you'll see the subset of stuff we had going on in Protégé.

Okay, so now what? I've got a set of data mapped to a common language, with rules and the other ontology things, taxonomy, classification, et cetera. Now I need a unique instance for each one of those things. Why? Because, as Jacobus explained earlier, thank you very much, we needed uniform resource identifiers, URIs, so that we could resolve to each one of those pieces of data. If you were looking at it in a graph or in a linked database, you would want to go right to where that thing is. So we needed to create unique identifiers for all of those things, because we wanted this concept that everything is a resource, therefore everything needs an identifier, and each one is resolvable. And creating the URIs is a callable service, so you can use it from anywhere: anywhere data is coming in, you should be able to run the service and attach a URI to it, based on an algorithm or a template (there's a small sketch of that below).

So how did we do this? For some of the content we ran SPARQL queries: we wrote the output the way we wanted it, ran the query, and dumped it into the triple store. Or you can use a feature in TopBraid called SPINMap, which maps things, creates the template for the URI, and then assigns the content for you. It just depends on how you want to do it. I had two sets of people: one set who were very comfortable doing it in SPARQL, which is fine, and another set who said no, no, no, we have to use the tool. It didn't matter, because the end content was all the same, and at that point I really didn't know which approach was going to be better, so I let them both do it. In the end, because we didn't buy TopBraid, we do it with SPARQL queries, but that's just the current state.

Right. So now that we've mapped everything and minted our URIs, everything has a unique instance, and we need to pivot that and create triple stores out of it. Why would we want to do that? That's what I asked the team, because it sounds very counterintuitive: if you have everything, market data especially, in data sets and you can see it in a portal, why would you want to recreate it again? The answer is that you can't really find the commonality if it isn't all in the same format or the same language, and there wasn't really a way for us to do that across the hundreds of little data sources we had. So we created a mechanism: every time something landed, it would run through some brief routines and end up, in the right format, in what essentially becomes a data lake, if I'm honest. I was uncomfortable with the data duplication, but this was a trial and we weren't very sophisticated about it. I'm sure there's a better way to do it now, but at the time that was the way we chose.
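On that callable URI service, here's a minimal sketch of the kind of template it can be, assuming an example base namespace; the template shape is illustrative, not our production scheme.

```python
# Minimal sketch of template-based URI minting. The base namespace and the
# template shape are illustrative assumptions, not the production scheme.
from urllib.parse import quote

BASE = "http://example.org/resource/"   # assumed internal base namespace

def mint_uri(resource_type: str, source: str, local_id: str) -> str:
    """Deterministically build a URI from a template, so the same input
    always resolves to the same resource."""
    return BASE + "/".join(quote(part, safe="") for part in (resource_type, source, local_id))

print(mint_uri("instrument", "vendor_a", "US0378331005"))
# http://example.org/resource/instrument/vendor_a/US0378331005
```

Because the minting is deterministic, any system that ingests the same record ends up pointing at the same resource, which is what makes the linking work later on.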
I've seen a lot of presentations this week about how you could handle that kind of duplication differently or better; fine, but this is just the way we did it for this exercise. Okay, and here's another unreadable slide. It's really just artwork from the back; I could have anything up there. I could have terms and conditions up there, no one reads those either. But when you drill down into this, it shows the way it runs and how it works, and this, again, I think is in TopBraid. The people who did the loading and the manipulating were the ones using TopBraid; the people who did the formatting and the extracting were the ones using SPARQL. And here's a screenshot example of how we mapped to published price, which is a concept from FIBO. That was our tick database, all the price data coming in, and we mapped it, ran our queries, triplified it, and put it into our triple store. I'm not going to bore you with the detail.

All right, fantastic. Now we've got our market data, or at least a certain subset of it, in a triple store behind a portal. The portal was actually written with another open source tool called OpenMAMA. Never heard of it? But that's what the market data guys were using. The idea was to track which sources were the best matches, the most commonly accessed, the things people downloaded most often, because then we want to get rid of the other things. That was one angle. The other angle was: why do we have ten versions of this thing? Because now we can see, once we run the mappings and the queries, that it's exactly the same data set; the only difference is the date on which it was sent, or there are ten versions of the same thing on the same day, or whatever. We want to start turning those off, because they're getting more and more expensive (there's a sketch of that kind of duplicate-hunting query below). We're data hoarders. I'm sure a lot of you are data hoarders; you've seen that show where people get buried by the stuff in their garage. We're very much data hoarders, and the first step is admitting you have a problem.

So the market data architect started to use this information to turn things off. He was able to eliminate some really obvious things that we couldn't even map, because they didn't have any content in any language whatsoever; we don't even know why they were there. For other things, he had to show his business users hard facts. I said, embarrass them with facts: you're paying for this, no one ever uses it, and we know, because we're controlling it through the portal. Once that gains momentum, it's hard to stop, although you have to keep people from being too ruthless, because then they turn too many things off. So that was the end result, and it's still live today. He's still making changes to it; he's going to add some other reference data, but for the market data pool we have it up and running.

So what did we achieve? We created a semantic layer for market data that abstracts our language, which is mostly FIBO in this case plus some additions, away from the proprietary vendor languages.
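Here's the kind of duplicate-hunting query I mean, as a hedged sketch with an invented catalogue vocabulary (ex:hasISIN, ex:fromFeed); the real graph uses its own terms, but the point is that "why do we have ten versions of this?" becomes a query.

```python
# Hedged sketch: find instruments that appear in more than one feed, using
# made-up catalogue predicates. The real store and vocabulary differ; the
# point is that duplication detection becomes a query over the triples.
from rdflib import Graph

g = Graph()
g.parse("market_data_catalogue.ttl", format="turtle")   # assumed export of the triple store

DUPLICATES = """
PREFIX ex: <http://example.org/marketdata/>
SELECT ?isin (COUNT(DISTINCT ?feed) AS ?feedCount)
WHERE {
  ?instrument ex:hasISIN ?isin ;
              ex:fromFeed ?feed .
}
GROUP BY ?isin
HAVING (COUNT(DISTINCT ?feed) > 1)
"""

for isin, feed_count in g.query(DUPLICATES):
    print(isin, "appears in", feed_count, "feeds")
```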
So now we can start turning things off, and we can have our internal systems code against that technology and vendor agnostic language, using URIs, using services, using any of the APIs we can build around that same language. That insulates us from change, compared with coding directly against the vendor sources. We designed a portal to use that common language; it has things like your last search terms, and we put in links to our glossary, which has FIBO in it. We don't have all of FIBO in there yet, but we're getting there; I think we'll just supersede the glossary when that's ready. But there are links so people can find out what a term means. We removed redundant market data feeds. And we automated the part where things come in, get URIs, get triplified, and get dumped into the data lake, so we don't have people doing that. To be honest, we never had people doing that; we never hired a group of people to do all this by hand and then got rid of them, because that wasn't the case. It was always something small, and it didn't take us long. The thing that took the longest was fiddling around with the tools to get them to read everything, because we didn't have any ontology tools or RDF tools or any of that, and then going through the agreement on definitions and terminology. Once that was done, it's bang, bang, bang: you run your queries, you set up your structures, you load the data, and you can rerun it as many times as you want. The hard part, at the beginning, is the tedious messing around with tools and getting people to agree on terminology.

So what did we learn? Well, we realized we had to change our thinking from storage in containers, third normal form, tables, files, to what does this mean and where is it linked. This was our first step towards linked data. We had looked at the technologies and didn't quite know what to do with them, but once we saw how we could link things together using this common language and the triple store and RDF machinery, it clicked, and we've done a lot more with other data sets since. So, first: get away from decanting data into containers and move towards linking it together in an open manner. Second: get your tools sorted out, because you can't really do this if none of your tools read RDF or support ontology development. On interoperability constraints, the first database we chose was one that none of our tools could read, which probably wasn't a brilliant idea, but with a little reverse engineering and moving things around we got past that one. Third: if you don't have a URI, don't bother, because you definitely need that unique instance to track everything. Everything is a resource and everything changes over time, so you need that abstraction between the unique instance of a thing, whatever that thing is, and how you're referencing it, separate from all the other fields you're using to link things together. And best of all, make it a callable service in one place, because it's just a template, just an algorithm, that everybody else can call; then everyone mints identifiers in the same manner and all your tools should be able to handle it.
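Just to show how thin that one central service can be, here's a sketch of wrapping the minting template behind an HTTP endpoint; Flask and the endpoint path are illustrative choices, not what we actually run.

```python
# Sketch of exposing the URI-minting template as one callable service, so every
# ingesting system builds identifiers the same way. Flask and the endpoint path
# are illustrative choices, not the actual implementation.
from urllib.parse import quote
from flask import Flask, jsonify, request

app = Flask(__name__)
BASE = "http://example.org/resource/"   # assumed internal base namespace

@app.route("/mint")
def mint():
    # e.g. GET /mint?type=instrument&source=vendor_a&id=US0378331005
    parts = (request.args["type"], request.args["source"], request.args["id"])
    uri = BASE + "/".join(quote(p, safe="") for p in parts)
    return jsonify({"uri": uri})

if __name__ == "__main__":
    app.run(port=8080)
```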
And I didn't like that we had to duplicate the data. In an ideal world, we would just have SPARQL endpoints that access the sources and bring them together, like the graph Jacobus showed, but we didn't have time to get there, and it's hard for us to do because there are a lot of concerns about security. Our bank is very afraid of things like cloud, and people don't like to share data because they're afraid of breaching some restriction or other, so they get a little twitchy about that.

I just have a few conclusions, which you can read at your leisure, because I'm about out of time. Dennis, can I go to questions? Okay, questions? This gentleman here.

So the question was: where do we go from here, and in terms of saving money, what is the net benefit? We've already used this as a node in something we call the data observatory, which is a set of these kinds of implementations that use linked data. It's very much the same concept as the knowledge graph you showed: different sets of data that all use a common language and are linked together. We've done that for market data and we've done it for risk data; our team doing BCBS 239 uses our internal glossary and has created some ontology for that, and we need to talk to the gentleman who presented earlier about that. As for saving money, we've turned off market data feeds, and we've got rid of consultants, which is always nice. The next branch is to go into our data source catalog, our system of record catalog, because we can associate that with applications. We have a functional taxonomy and a data taxonomy, and because we mapped FIBO to our data taxonomy, which is already mapped to the functional taxonomy, it creates this big web of information that we can use. We're not there yet. The challenge we have is the fear of sharing, as I mentioned; because we're very fragmented, it's hard to get things mapped to the right thing in order to have a common language. Fortunately, there was a big push a couple of years ago to map everything to our functional taxonomy, so our application catalog is actually mapped to the functional taxonomy, and we can use that. We did a pilot for our location strategy team, which asked: which people are doing which jobs in which locations, and which applications do they depend on? That rolls up into that knowledge graph, where you've got those things linked together and you can tell that you've got 100 people in Sydney, Australia who are dependent on a very critical application that's hosted in Indonesia for some reason, and it's like, what? You don't want too many of those things popping up.
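To give a flavour of that location-strategy style of question, here's a hedged sketch of such a query with an invented vocabulary (ex:worksIn, ex:dependsOn, ex:hostedIn); the real observatory graph uses its own terms.

```python
# Hedged sketch of a knowledge-graph query in the location-strategy style.
# The vocabulary (ex:worksIn, ex:dependsOn, ex:hostedIn) is invented for the
# example; the real observatory graph uses its own terms.
from rdflib import Graph

g = Graph()
g.parse("observatory.ttl", format="turtle")   # assumed export of the linked data sets

QUERY = """
PREFIX ex: <http://example.org/org/>
SELECT ?location (COUNT(?person) AS ?headcount) ?application ?hostingLocation
WHERE {
  ?person       ex:worksIn   ?location ;
                ex:dependsOn ?application .
  ?application  ex:hostedIn  ?hostingLocation .
  FILTER (?location != ?hostingLocation)
}
GROUP BY ?location ?application ?hostingLocation
"""

for location, headcount, application, hosted in g.query(QUERY):
    print(headcount, "people in", location, "depend on", application, "hosted in", hosted)
```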
Let me answer the Bloomberg question first, because that's a quick answer. I don't know if everybody in the room knows that Bloomberg has their own ontology. We didn't use it, and I think that's because our market data architect wanted to try this to see if he could, and because it wasn't just Bloomberg: we had dozens of vendors, some of it pretty old, and he wanted to embarrass people with facts and get rid of stuff. So we didn't do that. It doesn't mean we couldn't, because Bloomberg did come to see me and bring me their ontology, and it's very interesting, but I couldn't do anything with it because we were trying to do something else.

And then your other question was how many people? The people using the market data portal number a couple of hundred, and they're all risk data analysts and market data analysts who pull data down daily to do analytics, that kind of thing. For the risk observatory, which is another one of these and is the new thing, it's probably another couple of hundred people. And then we've got dozens using the location observatory, which is who does what in what place. Every time we add a node we get dozens more, until we had to move it to bigger servers because it was getting hit through different portals. There's no confidential information in there at this point; it's really just information about your organization and the content you need to know about to do your job. We want to be able to segregate parts of it, but we haven't really tried that yet. We'll probably do a clients observatory or something like that, but that will require segregation and entitlements, and entitlements is always a thorny problem that we just haven't got to yet.

Okay. Let's give Shannon a round of applause. Thank you. If I didn't get to your question, just try and catch me later; I'm happy to chat about it. It's one of my pet projects, and it was a learning experience, if nothing else. Thank you, Dennis.