So first, what is Nexen? That's the marketing part; then I'll get to the challenge I'm working on: smart data versus dumb data. Okay, Nexen. The bank basically wants to transform itself into what they now call a digital enterprise, a technology company. The world is digitizing at a very rapid pace, and the bank becomes basically one giant computer, with some bankers as well, but mostly IT people. So the focus goes to IT. And all the bigger banks say that now: I've heard Goldman Sachs say it, ING in Holland says it, Deutsche Bank; they all say they want to become technology companies. So that's a big change that's going on, and it has to do with disruption. The reason I started working for this bank is that I had heard a speech by the CIO, Suresh Kumar, and he was basically saying he wants to become the Amazon of finance: before we are disrupted by small startups, we had better disrupt ourselves. You see that in all industries. In retail, Amazon and eBay have changed the world dramatically. You see it in travel, in communications, you see it everywhere, and now also in finance. Finance is one of the last markets that is about to be disrupted, and you see blockchain and also the knowledge graph playing a role there. Okay, let's skip this one; that's all marketing.

So you see that we currently have 185,000 online users in many different roles and industries. The current situation is that we have 15 client platforms and many systems; basically thousands of systems, half a million tables. It's one giant... I won't say mess, sorry, that's the wrong word, but it's siloed. There are many silos, hundreds. The Nexen digital ecosystem that we are trying to create is basically trying to unify all those silos on all kinds of different levels. Basically, we want our customers to have one interface, just like Amazon AWS is creating all these interfaces as one consistent set. Rather than connecting to all kinds of different systems with different technologies, we would like to have one core technology, which we call the Nexen APIs; it's all REST based. So all the bigger systems in the bank, all the back-office systems, are now translating their interfaces into what we call the Nexen APIs.

But if you would just do that, then you would still have all those different silos and all those different data models. So if you would like to know the position of the bank against IBM as a customer and a partner, and there are so many different relationships with a company like that, then you would probably still have to call dozens of APIs, dealing with different primary keys, and the results would be returned in all kinds of different data sets and data models. You would still have to do all the hard work to produce one semantically consistent result. So we cannot just unify the APIs; we also have to unify the data, and that's where my team comes in. This is the Nexen data team, where we try to do things, so far mostly research, in a fundamentally different way using semantic technology. It basically started with my boss, Raj Patel, who is the chief data officer of the bank. He used to work for UBS as chief data officer and then for Google on the Google Knowledge Graph, where, like myself, he worked for years on semantic technology.
And he got the opportunity to convince the CIO to go for semantic technology as a potential solution, I'm not saying the solution, for a whole number of problems. So let's talk about those problems. Our current data landscape consists of, like I said, half a million tables, and that's just Oracle, not counting all the other technologies. Thousands of systems, more than a hundred different technologies. Then we have to deal with all kinds of regulatory requirements. BNY Mellon is a bank too big to fail, so we have to create a living will and meet all the regulatory requirements that come with that. We have, of course, BCBS 239; that's, I think, the most used term today. And many other regulatory requirements all across the globe. But the funny thing is that those regulatory requirements actually make sense. They are just common sense questions. Like: if you provide me, as a regulator, some data, can you please explain where the data comes from? Who says so? Does it come from a spreadsheet or does it come from systems? How many systems were involved? Tell me exactly where it came from and how the lineage works. Explain to me every step along the way, all the way to the authoritative source. Explain to me, in a machine readable way, which versions of the programs were used to calculate or aggregate that data. So we need to do a lot more work to make that happen, to do the fundamentally right thing.

Then another part of the challenge is technology disruption, where we have to deal with, of course, big data. It seems to be growing almost exponentially, so it's massive, and the question is how to make sense of all this data, how to connect the dots. Then you have blockchain. We've actually been discussing this already this morning, but I think the combination of blockchain and knowledge graph is a killer combination, and I will try to explain why. Then you have machine learning and all these new paradigms that you need to be able to use in a proper way. For example, machine learning: we currently have a team of data scientists doing machine learning, but 90% of their time is spent harvesting data across the organization and trying to make sense of it. They shouldn't do that; they should work on machine learning, not on harvesting data. So one of the use cases for the knowledge graph is to give the machine learning people the data they need to do their work.

And then there's another massive change: banks usually work with files, and those files are generated at night, ready for the next day. We need to move away from this file oriented world to a real time, event driven world. That's a massive change that will maybe take decades, but it will happen. Then we have all those silos: all those different technologies, different companies that have been acquired with their own systems, different jurisdictions, different ways of defining a silo. You can see a silo as a line of business or a department. You can see it from a legal perspective, the jurisdictions you have to deal with. But you can also look at silos in terms of technologies: the Java world is completely different from the COBOL mainframe world; they have different ways of doing things. Then we have old school system integration, where we are copying data all over the place.
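To make the lineage requirement concrete, here is a minimal sketch of what a machine readable lineage record could look like. I'm writing it in Scala since that's the stack mentioned later in the talk; all the names (`LineageRecord`, `programVersion`, the example systems) are my own illustration, not the actual Nexen model.

```scala
// Hypothetical sketch of a machine-readable lineage record: every derived
// data point should be traceable, step by step, back to its authoritative
// source, including the exact program versions used along the way.
final case class DataSetRef(system: String, table: String)

final case class LineageStep(
  output: DataSetRef,        // the data set this step produced
  inputs: List[DataSetRef],  // the data sets it was derived from
  program: String,           // program that calculated/aggregated the data
  programVersion: String     // exact version, for the regulator
)

final case class LineageRecord(steps: List[LineageStep]) {
  // Walk back to the data sets that were never themselves derived:
  // those are the authoritative sources.
  def authoritativeSources: Set[DataSetRef] = {
    val derived = steps.map(_.output).toSet
    steps.flatMap(_.inputs).filterNot(derived.contains).toSet
  }
}

object LineageExample extends App {
  val record = LineageRecord(List(
    LineageStep(DataSetRef("risk-mart", "exposure_agg"),
                List(DataSetRef("trades-db", "positions")),
                program = "exposure-aggregator", programVersion = "2.4.1")
  ))
  println(record.authoritativeSources) // Set(DataSetRef(trades-db,positions))
}
```

The point of the sketch is that "explain every step to the authoritative source" becomes a mechanical traversal once lineage is stored as data rather than documentation.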
So if you look at all those data sets and all those databases that we deal with, then I think a very conservative estimate is that 80% of that data is not authoritative; it's a copy of some other data. Data gets copied around all the time and can get stale, etc. So if you make decisions based on stale data, or you don't know exactly which version you have, or whether you have the right version of the data, is that then the right decision? You don't know, and at the very least you cannot prove it. That's my main point of this slide. I would call all that data dumb data. And what is dumb data? Data that is not smart. So: data that has no machine readable definition of its meaning. Data that has no universally unique, resolvable HTTP URL identifier; not only an identifier but also a locator, I'll talk about that later. Data that has no standards based formatting, etc. Data that has no concept of bitemporality. Most databases you look at are current state databases; they assume the current state. But what if you want to see what happened over time? Of course, in a relational database you can create your own tables and build your own solutions for supporting bitemporality, but it's not built into the standard models, and it takes a lot of work to get a complete bitemporal picture of what happened in your bank. And last but not least, all the data that we have is not properly normalized, so we're copying it all the time. So one point of the knowledge graph is basically to have extreme normalization.

So what do we do? We call this triplification: we need to turn the data into smart data by making triples out of it, so we call it triplification in our team. Let's say you have a matrix, a CSV file or a table or whatever; then you have a row, a column and a data point. How do we translate that? I can barely read my own slides here. First, the columns have to be identified as attributes, and those attributes have to be mapped to a model, an ontology. On that axis we are doing attribute resolution, and that's actually where most of the work goes: finding the meaning across the different data sets. It's not just saying this column with zip code is the same as the zip code there; that's the simple case. You have all kinds of complex constructs in one database that are basically the same in another database. For example, we're now mapping JIRA time registration with our HP system called PPM, which is also used for time registration. We're getting the data from those two systems. They both do time registration, but they have completely different ways of storing it: in PPM you store it per day, in one record covering a whole week, and in JIRA you store it against all the issues you have in JIRA. So it means the same thing, but the underlying model is completely different. How do you map that? That's the attribute resolution. Then we also have identity resolution, where one person, for example an employee, can be stored in many different systems, dozens of systems usually. We have to identify that. That's usually pretty easy because we have an internal employee ID, but even then things can go wrong. So how do you do identity resolution? That's a whole field in itself.
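Since the slide only names bitemporality, here is a minimal sketch of what it means in practice: every fact carries two independent time axes, valid time (when it was true in the world) and transaction time (when the system learned it). The types below are my own illustration, not the Nexen model.

```scala
import java.time.Instant

// Bitemporal fact: two independent time axes.
// validFrom/validTo:      when the fact was true in the real world
// recordedAt/retractedAt: when the system knew (and stopped knowing) it
final case class BitemporalFact[A](
  value: A,
  validFrom: Instant, validTo: Option[Instant],
  recordedAt: Instant, retractedAt: Option[Instant]
)

object BitemporalQuery {
  // "What did we believe at time `asOf` about the state of the world
  //  at time `validAt`?" -- the question a current-state database cannot
  //  answer without bespoke extra tables.
  def lookup[A](facts: List[BitemporalFact[A]],
                validAt: Instant, asOf: Instant): Option[A] =
    facts.find { f =>
      !f.validFrom.isAfter(validAt) && f.validTo.forall(_.isAfter(validAt)) &&
      !f.recordedAt.isAfter(asOf)   && f.retractedAt.forall(_.isAfter(asOf))
    }.map(_.value)
}
```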
And what we are doing is creating a URL for every object: basically giving a new primary key to every object in our universe. When you have done that, you have the values, and there is value resolution there as well, because think about dates or numbers. If you say this is the price and you give me the number 100, what does it mean? Is it $100 or €100? You have to resolve that as well. And many values are actually pointers to other objects, to reference data, which is the enrichment part; we have enrichment stages in our data flow pipelines. So eventually, and this is a simplification of course, you get the subject-predicate-object triples out of this. And then, I think, you can call it smart data, because if you give me a triple instead of just a cell in a CSV file, and you say this is the closing price, I can say: okay, give me the URL to the ontology, and my program can read and interpret that ontology and knows exactly what we're dealing with. Then we can compare apples to apples instead of apples to oranges.

So let's look into the triple. The subject is the identity; we call that the Nexen IRI, because it's not just an identifier, it's also a locator. That's two functions for this primary key, which is different from normal primary keys. A normal primary key is usually just an opaque code, or worse, it has meaning. We try to create meaningless URLs by just using 128 bit UUIDs. But those URLs have the Nexen data domain in them, which means that if a system would like to say, give me all the data, give me the 360 degree picture around this identity, then we can provide that. But also, if a user clicks on that link, we would like to show an engaging webpage; in our internal systems we call that the semantic landing page, or SLaP. And I'm glad the previous speaker just showed you an application of how that would look, because I cannot show my own application due to legal constraints. Actually, I worked on some of this in my former job at Bloomberg, where I also did the Bloomberg Knowledge Graph project, and we started working out this semantic landing page concept there. It basically means that if you have billions, and eventually maybe even trillions, of objects in your knowledge graph, you cannot hand code web pages for each of those objects. You have to generate them, and you have to do that in a smart way. But luckily we now have the ontologies, etc.; all the knowledge about what the data means is in the same knowledge graph. So you can generate quite an interesting webpage, and people can start using it. It can potentially become a whole new way of working: rather than an action-object interface like, for example, the Bloomberg terminal, where you first type in the function code for what you want to do and then find the object you work with, this will be an object-action interface, where you search or navigate to the object you want to work with.
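Putting the triplification steps together, here is a minimal sketch of turning one CSV cell into triples with a Nexen-style IRI. The namespaces, mapping tables and helper names are hypothetical; a real pipeline would do attribute, identity and value resolution against ontologies and master data, not in-memory maps.

```scala
import java.util.UUID

object Triplification extends App {
  // Hypothetical namespaces; the real Nexen domain isn't shown in the talk.
  val data = "https://data.example-bank.com/id/"
  val onto = "https://example.org/ontology/"

  // Identity resolution: one meaningless, resolvable IRI per real-world
  // object, minted from a 128-bit UUID as described in the talk.
  def mintIri(): String = s"$data${UUID.randomUUID()}"

  // Attribute resolution: source column names mapped to ontology properties.
  val attributeMap = Map("CLOSE_PX" -> s"${onto}closingPrice")

  // Value resolution: a bare "100" is ambiguous, so emit a typed literal
  // plus a second triple linking the currency as reference data (enrichment).
  val security  = mintIri()
  val priceNode = mintIri()
  val triples = List(
    (security,  attributeMap("CLOSE_PX"), s"<$priceNode>"),
    (priceNode, s"${onto}amount",
      "\"100\"^^<http://www.w3.org/2001/XMLSchema#decimal>"),
    (priceNode, s"${onto}currency", s"<${onto}currency/USD>")
  )
  triples.foreach { case (s, p, o) => println(s"<$s> <$p> $o .") }
}
```

Running this prints three N-Triples lines: the point is that the cell's meaning (closing price, decimal, USD) travels with the data instead of living in someone's head.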
And there you find all the actions that are applicable to you in your context, with your entitlements and contracts and everything that is relevant to you; then you can see all the actions you can perform on that object.

Then the predicate side of things; that's where FIBO comes in. The predicate defines the meaning: what does this triple mean? There you see a similar IRI, an ontology axiom IRI, and it points to one particular axiom in an ontology. There's one error on the slide that I didn't fix: the hash at the end should be a slash; we had a discussion about this just this week. But this is probably going to be the kind of URI we are going to work with, where you can specify which maturity level of FIBO you want to use and which particular version; that's the color coding. The pink is, let's say, the bleeding edge version of FIBO, and there you can say give me the latest version, or give me version 3.2, or whatever. If you go for production or regulatory purposes, then you want to use the green version, which is the OMG-approved, ratified version. And a knowledge graph like this can never go down; we have 24/7 operations, etc. So one line of business might use FIBO version X and another one FIBO version Y. We need to be able to support multiple versions in the same knowledge graph at the same time, and that should even work with reasoning. So these versions are critical for knowledge graph operations. Okay, more about that later, perhaps.

Then the object part, which can be either another Nexen IRI or a literal value. And then we have the context, or the named graph; we use named graphs to represent the data sets. So we process data from multiple sources, clean it, and map it from the source ontologies up to higher level ontologies like FIBO. And FIBO is not the only ontology. I would like to make that point here: FIBO is not supposed to be one new canonical model for the whole bank, replacing everything else. FIBO is just one ontology that you can use next to many others in the same database. We can have multiple versions of FIBO, but you can also have other, overlapping ontologies, and they can all work together in one system.

I think that actually gets to the core of semantic technology. Semantic technology is based on the open world assumption, as opposed to the closed world assumption. The dumb data that we currently work with is all based on the closed world assumption, where parties have to agree about the meaning of things. In the open world assumption, and that's the core of the semantic technology world, you don't have to agree; you can agree to disagree. You can store your object with your ontology, with your version of FIBO. As long as you use the same identifier, I can read it, I can map it to my ontology, and we can work with the same object. We don't have to copy anything. You use your version of the schema or ontology, and I use mine. That is an extreme change in the way of thinking compared to other technologies, and I think it may actually be the most fundamental driver here. Because so far, I don't know how many people in the room have seen a bank work with one canonical model across the firm. I mean, can someone please raise their hand? It doesn't work. I've never heard of it.
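Here is a small sketch of what such maturity- and version-aware axiom IRIs, and the named-graph quads that carry them, might look like. The URI scheme is my guess at the pattern being described, not the actual one on the slide.

```scala
// Hypothetical versioned ontology IRIs: maturity level ("pink" bleeding
// edge vs "green" OMG-ratified) and version are part of the identifier,
// so two lines of business can use different FIBO versions side by side
// in one 24/7 knowledge graph.
sealed trait Maturity { def tag: String }
case object Pink  extends Maturity { def tag = "pink" }   // bleeding edge
case object Green extends Maturity { def tag = "green" }  // OMG-ratified

// A quad: the fourth element names the graph, i.e. the data set the
// triple came from, so provenance survives loading multiple sources.
final case class Quad(s: String, p: String, o: String, graph: String)

object VersionedIris extends App {
  // Note the trailing slash, per the correction made in the talk.
  def axiomIri(m: Maturity, version: String, axiom: String): String =
    s"https://spec.example.org/fibo/${m.tag}/$version/$axiom/"

  val q = Quad(
    s = "https://data.example-bank.com/id/123e4567-e89b-12d3-a456-426614174000",
    p = axiomIri(Green, "3.2", "hasClosingPrice"),
    o = "\"100\"^^<http://www.w3.org/2001/XMLSchema#decimal>",
    graph = "https://data.example-bank.com/graph/pricing-feed-2016-06-01"
  )
  println(q)
}
```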
I've worked in many banks, and I've always seen people saying, okay, we're building the canonical model, but the canonical model of today is something different from the canonical model of tomorrow. So you already have multiple versions to deal with, let alone that all those different lines of business will never agree on using one canonical model. So the whole concept of a canonical model, in my eyes, is just a dream. I think a knowledge graph comes much closer to that dream than the current way of doing things.

Okay, what's in the Nexen knowledge graph? This is an example of what kind of data we would put in there. I would say all data, but that's a little far down the road, maybe, so let's start with the core objects of what makes a bank a bank. You have the customers and the contracts, especially the contracts. You have all the products, etc. Let's get them from all the authoritative sources. Actually, let's try to make a model of what the authoritative sources are. That's another question many people carry around: it's in people's heads, but there's no system out there that says exactly, this system is owned by this person, that person owns this particular attribute, and that is the authoritative source for this particular thing. I mean, many people work on it, but I've not heard of a large corporation like BNY Mellon that actually has that full picture of what data is out there; it's mainly in the heads of people. So one use case is to create those models and basically create a picture of our data landscape. Where are all those data sets? How do we get the data from those data sets? How do we map it, and how do we do that in an agile way? You can never do it in one go; don't do it waterfall, etc. Just get the data in there, then massage it and make it better and better, change the ontologies and stitch things together, because sometimes you don't even see that you can connect the dots across two different data sets until you've actually worked on it and one person looks into both data models and says: hey, this is actually the same as that. Why do we call it differently? Let's give it one name and make it one object. A sketch of such an authoritative-sources model follows below.

So why the Nexen knowledge graph? Let me zoom in, because I can't read it, sorry. Okay. One reason is to go to a higher level of data management maturity by creating this knowledge graph. This sounds very presumptuous, and we first have to be successful with this project, but the goal is to reach a high level of data management maturity, where we can basically prove to the regulators, in a machine readable way, whatever they want to know: that the data has the right quality, comes from the right sources, and all that. Another big reason is of course to create a holistic view across the whole financial industry. The goal is to create the largest knowledge graph of the whole financial industry, not just for BNY Mellon but for what we call the Nexen universe, where you have all kinds of parties, like other banks and financial service boutiques, that would consume this data, and whose data can be added as well. Let's skip this. So we're talking not just about enterprise data unification but about industry data unification. And that's why FIBO matters: it is the Financial Industry Business Ontology. We use open standards.
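To make the "model of authoritative sources" idea concrete, a small sketch of what such a registry could look like as data; again, every name here is illustrative, not the actual Nexen model.

```scala
// Hypothetical registry making explicit what today lives in people's heads:
// which system is authoritative for which attribute, and who owns it.
final case class Ownership(
  system: String,    // e.g. the HR master system
  attribute: String, // ontology property this system is authoritative for
  owner: String      // accountable person or role
)

object AuthoritativeSources extends App {
  val registry = List(
    Ownership("hr-master", "https://example.org/ontology/employeeId",
              owner = "HR Data Steward"),
    Ownership("crm", "https://example.org/ontology/customerName",
              owner = "Client Data Office")
  )

  // "Who is authoritative for this attribute?" becomes a query, not a meeting.
  def authoritativeFor(attribute: String): Option[Ownership] =
    registry.find(_.attribute == attribute)

  println(authoritativeFor("https://example.org/ontology/employeeId"))
}
```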
And I would like to add one standard that I didn't see on most slides yet. We have RDF, OWL and SPARQL, of course; those are the three pillars under the semantic technology world. But there's a fourth one that's also very important, and that is the Linked Data principles. Tim Berners-Lee came up with those himself, and I translated them into these Nexen IRIs I discussed earlier. So the IRI, the URI, the URL, whatever you call it, is not just an identifier; it's also a locator. That's a core principle of Linked Data: you have to be able to click on it, and that link has to be served for potentially decades. Especially in the combination with blockchain, because you would put that link in the blockchain. You cannot store all your data in the blockchain; you can store a contract or a transaction in the blockchain, but that contract or transaction has to link to other data, and that other data needs to be immutable, bitemporal data. Say you make a transaction on a bond and it expires after five years, and then you get your money: all the links in that contract still have to work and should point to the same data you had when you made the deal. So that's one reason why I think the combination of Linked Data, the knowledge graph, and blockchain is very good; a sketch of that follows below.

Furthermore, the data that we provide through Nexen data has to be resilient and secure; that's an important slogan in our company at the moment. It has to be elastic, on-demand and reactive. We use a reactive microservice architecture. Reactive is a whole different way of programming; it's a complete paradigm change in how you build systems. Look up the Reactive Manifesto on the internet and you'll see what I mean. And then, of course, it has to be regulatory compliant, which I translate as: apply common sense and do the right thing, so that you know what your data means and that it has the right quality.

Okay, very quickly: Nexen data itself is the underlying platform that powers the knowledge graph. Let's skip this one. We are using this reactive microservice architecture with Scala and Akka; that's the same technology that underpins systems like Spark, Kafka and Storm. We want low latency and high throughput, and we're thinking about creating a real time knowledge graph where everything is in memory. Okay, you can download this presentation on the website. Any questions? He will be here for the next two weeks, so you can ask him as many questions as you want. Let's give him a hand.
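As a sketch of why stable, resolvable IRIs matter for the blockchain combination: the chain entry stores only the link plus a hash of the data it pointed to, so years later anyone can re-resolve the IRI and verify they are seeing exactly the data the deal referred to. Everything here, names and data alike, is illustrative.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object ChainLink extends App {
  // Hash the off-chain data so the on-chain record can later prove the
  // resolved IRI still serves exactly the data the deal referenced.
  def sha256(data: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(data.getBytes(StandardCharsets.UTF_8))
      .map("%02x".format(_)).mkString

  // What would be stored on-chain: the resolvable IRI plus the digest.
  final case class ChainEntry(dataIri: String, digest: String)

  val bondTerms = "ISIN=XS0000000000;maturity=2021-06-01;coupon=2.5"
  val entry = ChainEntry(
    dataIri = "https://data.example-bank.com/id/123e4567-e89b-12d3-a456-426614174000",
    digest  = sha256(bondTerms)
  )

  // Five years later: re-resolve the IRI (simulated here) and verify that
  // the immutable, bitemporal store still serves the original data.
  val resolved = bondTerms // in reality: an HTTP GET on entry.dataIri
  assert(sha256(resolved) == entry.digest, "data behind the link changed!")
  println(s"verified ${entry.dataIri}")
}
```

This is also why those links have to keep working for decades: the chain only proves integrity; the knowledge graph has to keep serving the data.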