Let's give these guys a warm introduction; they're going to talk about how FIBO can be used to support BCBS 239.

So to start with the conclusion, the takeaway: if you want to leave now, and I don't want you to leave, but if you do, this is the one thing you need to remember. Banks can demonstrate BCBS 239-compliant risk reporting, without disruption, by using FIBO together with semantic data virtualization technologies, which we'll be presenting. That's the takeaway; don't forget it. Now I'm going to pass it on to Mike, because he's the one with the finance background and I'm the technology guy, so you wouldn't believe the finance material coming from me.

Thanks, Juan. According to the clock we have half an hour, so thank you for your patience; we did get off to a slightly slower start. Let's go back in time to 2008. In the previous talk you heard about these things called counterparty exposures. Each bank, represented here by a little Doric column, has various positions with various other banks: swap trades, loans, repos, all sorts of things. At one famous point in 2008, one of them goes, and I'm not going to make the noise, pop. That leaves everybody wondering what their position is, and you'll notice a couple of the other banks are looking a little shaky as well. It suddenly becomes rather important to know where you stand with all these other institutions. And here's the interesting thing: nobody was short of data, yet it took people weeks to find out what their exposure was. Not having enough data wasn't the problem. The problem was turning that data into something like knowledge, and that means having meaning, some common way of understanding all that data.

As you heard earlier, BCBS 239, as those of us who remember the numbers of things call it, is the risk data aggregation and reporting regime introduced by the Basel Committee on Banking Supervision. And it's an interesting regulation. It's not like every other report, where the regulator says: here's a list of things we'd like you to report on, please send this report. It's a kind of meta-regulation. It says: whatever report you're producing, and here are one or two things we'd like to see, we'd like a different one tomorrow. We'd like you to be able to produce different reports for the next crisis than the ones we asked for in the last crisis. We need you to be flexible, and that means showing you have a solid data governance regime, a solid IT architecture, and proper business management of IT. You can't have all this data sitting in little silos and then scramble around when we suddenly ask for a report. So this is really a requirement for how you generate reports, rather than yet another data model or yet another report template.

BCBS 239 sets out 14 principles that essentially cover the basics of what kinds of reports the regulators want, how you govern the data, and how they ensure you're actually looking at the right things for risk. And as Elisa mentioned earlier, it identifies 30 banks known as globally systemically important banks, or to put it in plain English, banks that are too big to fail. They were all due to have something in place by January of this year, which, as you heard, they didn't.
I've been working with a lot of these banks for the last year and a half or two years, and they knew they wouldn't quite have it, because they were still trying to work out what "it" was. Meanwhile the domestic systemically important banks, the ones that are too big to fail from the point of view of one country rather than the whole global system, need to do the same kinds of things in the very near future, and these are banks that are highly interconnected, with a lot of capital exposures.

I'm not going to walk through all 14 principles, but they give you the idea: things like timeliness, adaptability, accuracy. It's all about being very flexible in how you produce information in reports. In terms of the overarching themes, it means having the right governance, having the right infrastructure, being able to aggregate different risk data at different levels of detail, and being able to show oversight of all of it. So it's not a deliverable; it's an objective, a way of going about things. Firms therefore need to revisit their technology infrastructure and ensure they have the right infrastructure, the right processes, and the right architecture so they can respond adequately. As one person put it: we don't want to respond to the next crisis by sending the report we sent in for the last crisis.

So there are a couple of important concepts. You have to have the right capabilities; you have to demonstrate control, good old-fashioned quality assurance, essentially, over how you're running your data; and you have to have a controlled environment to do that in. What we're going to show brings all of that together, because this is very much a business concern: having a controlled environment, having business oversight of information, and that separates business from IT. And what we'll come back to with this diagram a bit later is that the ideal way to meet this is to have something like an ontology, a knowledge base where you can ask meaningful questions using semantic queries, without having to create a whole extra layer of semantic data until you need it for particular reasoning applications, and instead be able to interrogate your existing data sources. You heard how many different silos there are. Some of those may be master data sources or golden copies, but the point is being able to create adapters that let you query that material from a business-level view of the meanings of things.

And this is really the next point I want to make. When we start looking at the governance side, a lot of people say: we need a glossary, we need a dictionary, we need a thesaurus. What's interesting is that when the business people start saying this is what they're after, some kind of dictionary, they're asking the right question. They're saying: we, the business, are taking control of the meaning of things; it's not an IT problem; we know we need to understand what's in our data before we figure out how to create reports from it. What they don't yet know is what we know: that there is such a thing as an ontology, and that you can actually represent meaning in a formal, structured way.
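To make that last point concrete, here is a minimal sketch, in Python with the rdflib library, of what representing meaning formally looks like. The namespace and terms are illustrative, not actual FIBO definitions: the concept gets an identifier and formal relationships first, and the words are attached afterwards, as labels.

```python
# A minimal sketch of representing a concept formally in RDF/OWL.
# The namespace and terms are illustrative, not actual FIBO definitions.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.com/ontology/")  # hypothetical namespace

g = Graph()
# The concept is identified by an IRI and its formal relationships,
# independently of any word for it...
g.add((EX.Counterparty, RDF.type, OWL.Class))
g.add((EX.Counterparty, RDFS.subClassOf, EX.LegalEntity))
# ...and the words come later, attached as labels.
g.add((EX.Counterparty, RDFS.label, Literal("counterparty", lang="en")))
g.add((EX.Counterparty, RDFS.label, Literal("Gegenpartei", lang="de")))

print(g.serialize(format="turtle"))
```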
So when people start looking for glossaries and dictionaries and terminologies, you're talking to the right people and they're asking the right question, but words are not the answer. Language is a slippery thing; people have tried for centuries to standardize terms. I saw a wonderful example the other day from the 1920s, where people were discussing the meaning of "firewall", which is ironic enough, because in those days it meant just one thing. Even then, nobody could agree on what the one word meant. Words are not the answer: you've got to have a formal, principled language that talks about the concepts, and then you can add the words, and the rules, and all the other stuff, later.

In FIBO we had what I call the FIBO moment. A lot of us had been involved in plenty of other standards efforts, in messaging and so on, and there were lots of people arguing about terms. What are our 30 critical data elements, what should we call them, what's a counterparty? And then Mike Atkin stood up and said: we're not going to get any agreement on these words, but can we agree on what the concepts are? Park the words off to one side, forget them, agree on the concepts we're interested in, and then come back to the words. And everybody could. Everybody in the room knew they understood the concepts they were working with, and once they were told they didn't have to care what to call them, the problem went away. That was the birth of FIBO and of the need to build a business resource which is an ontology. We didn't use the word "ontology" at first, but eventually we said: right, it's an ontology, business meanings that are formally framed but business-facing, and that's what we built out.

These are all the different parts of FIBO, and you've seen some of them in action; just now you saw a lot of interesting terms from the derivatives space that we don't see very often. All of those are in there, in a resource built for review by business subject matter experts, which we're now bringing forward into more semantic-web-friendly formats. Beyond the instruments themselves, we've got market data such as pricing, yields, and analytics, securities issuance processes, corporate actions, all sorts of things. So that's FIBO. Now, what do we do with it? We have this business-facing resource, but we can also make it work directly with technology, and to explain how, I'll hand you back to Juan.

Thank you, Mike. So let's talk about the technology, what we're calling semantic data virtualization. You have your underlying sources; these are your legacy relational sources, and in probably every conversation we've had here today someone has said "all my data is in Oracle" or the like. You have internal data, and you also have external data you want to connect to. And then you have FIBO, or more generally your target view of the world: the homogeneous view you want over all your heterogeneous sources. A long time ago I saw Dave McComb describe this as your lingua franca: this is how I want my organization to talk, and it doesn't matter where my data is, or how it's organized or structured.
I just want to be able to view it through one single homogeneous view that we all agree on, and then we can have different mappings to different sources. Each organization can have its own view. FIBO is essentially the upper ontology, the upper model, and every organization can customize it for its particular use cases. And I know we've been using the word "ontology" here, but in my experience I've mostly stopped using it, because I say "ontology" and people think I'm talking about cancer and oncology. So now we use the term enterprise knowledge graph, and graphs are very sexy right now. We call it an EKG, so the FIBO EKG is the heartbeat of your organization.

What you want is FIBO as your ontology, as your EKG, and the ability to create mappings to your different sources. Then I have one target view against which I can pose all my questions, reports, APIs, search, dashboards, everything. All my applications and people no longer have to worry; they just have to speak that single language, which here in finance is FIBO. In other scenarios, say life sciences, you'd have different kinds of EKGs.

So in a nutshell, with semantic data virtualization we want an EKG that is your homogeneous view over all your heterogeneous sources, and we want to map all your data sources to that EKG. Then comes the question: should we translate the data to fit this particular data model, the graph, which is the ETL approach, or take the no-ETL approach? To ETL or not to ETL, that is my question.

A bit about what goes on under the hood. You have your standard relational model and relational databases, and because we're using the semantic web standards, RDF, OWL, SPARQL, which are all built on a graph data model, getting your legacy sources to work with this technology essentially means mapping relational data into a graph, into RDF. Now, how do we do that? This question is actually pretty old; RDF itself is almost 20 years old now, I think. Around 2007 was the first time people got together and said: all right, we have relational databases and we have RDF; how do we put the two together? The result of the years of work that followed that first 2007 meeting was two standards: one called the Direct Mapping, and one called R2RML, the relational-database-to-RDF mapping language. These are the two W3C semantic web standards for connecting relational databases to RDF. As Dave was saying, I started in this process in 2007, from day one; I've been part of the whole standards process, my PhD was all about this, and I'm one of the editors of the standard, so this has been part of my life.

So the first step is to create these mappings, and once we've created them, we can use them. We need to create mappings from the source to the target EKG, and the mappings are represented in this standardized language called R2RML.
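As a concrete illustration, here is a minimal sketch of an R2RML mapping, parsed with Python and rdflib. The table, columns, and target class are hypothetical, not real FIBO terms: it maps each row of a CUSTOMER table to a counterparty node in the graph.

```python
# A minimal sketch of an R2RML mapping; table, columns, and class
# are hypothetical. Each CUSTOMER row becomes a counterparty node.
from rdflib import Graph

r2rml_mapping = """
@base   <http://example.com/mappings/> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ontology/> .

<#CounterpartyMap>
    rr:logicalTable [ rr:tableName "CUSTOMER" ] ;
    rr:subjectMap [
        rr:template "http://example.com/id/counterparty/{CUST_ID}" ;
        rr:class ex:Counterparty
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:hasLegalName ;
        rr:objectMap [ rr:column "LEGAL_NAME" ]
    ] .
"""

# R2RML is itself RDF, so the mapping document is data too:
g = Graph()
g.parse(data=r2rml_mapping, format="turtle")
print(f"{len(g)} triples in the mapping document")
```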
Now, R2RML is itself, very meta, RDF, which means it's all declarative, which I think is very cool, because the mappings and the tools are presented in terms of the target model, the target EKG; everything is in terms of FIBO, for example. In the no-ETL approach, your data stays where it is, but you can still access and query it in terms of that target EKG, and wrapper systems do the translation from SPARQL, posed against the target, down to the source, so you don't have to worry about moving the data.

In the mapping step, as you can imagine, you have a source, you have your enterprise knowledge graph, which here could be FIBO, and you have a user who needs knowledge of the source and knowledge of the business, of finance and FIBO, to bring the two together; I'll go into a bit more detail shortly about the kinds of thinking that go into that. The thing is, you can reuse a lot of the expertise you already have in your organization: use your existing people who know SQL and who write your existing SQL reports, because there's already a lot of business logic in those, and that's exactly what you want to map to the target model.

Once you've done the mappings, the result is literally the physical mapping, in R2RML in this case. This is what that looks like, and it's horrible; I don't want you to have to learn it, which is why we create tools, so these mappings can be produced automatically. What's interesting is that we can reuse a lot of automatic schema-matching techniques, so when we do these mappings we can generate them automatically, and the user just has to go in and confirm that what was generated is correct.

Once the mappings are created, you use them. In the ETL approach, I take a mapping and run an ETL job with it; we're a vendor of tools for this, and people also write scripts for this stuff. You generate the RDF, that RDF gets loaded into a triple store (there are triple store vendors here; Stardog is one, for example), and then you can run your SPARQL queries directly against that triple store. In this case you have a copy of the data, but that copy is now in terms of the target, which is FIBO.

The other approach, no ETL, says: I keep my relational data, I have the mapping I've created, and what stands up is a virtual triple store. You think it's a triple store, you can query it like a triple store, but it isn't really a triple store; it's still your relational data, your relational database. You take a SPARQL query, written in terms of the target EKG, in terms of FIBO, and internally the system uses the mapping to translate it into SQL queries against the database, and you get your results back. You think in terms of the target for your query and everything else, but your data never had to move.

So what does it look like when you put it all together? You have your different relational databases, all your different sources, and once you've created the mappings you can stand up, for example, these no-ETL engines, which means each source is now, virtually, a triple store, and everything is a graph.
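Here is a hedged sketch of what that looks like from the user's side, with illustrative FIBO-like names rather than actual FIBO IRIs: the query is written purely against the target model, and a wrapper engine would compile it to SQL over the source tables. The SQL shown in the comment is a hypothetical translation, using the made-up mapping from earlier.

```python
# A sketch of the no-ETL view from the user's side: the query mentions
# only the target model (illustrative FIBO-like names, not actual FIBO
# IRIs); a wrapper engine would compile it to SQL against the sources.
from rdflib.plugins.sparql import prepareQuery

sparql = """
PREFIX ex: <http://example.com/ontology/>
SELECT ?name ?amount WHERE {
    ?cp  a ex:Counterparty ;
         ex:hasLegalName ?name ;
         ex:hasExposure  ?exp .
    ?exp ex:hasNotionalAmount ?amount .
}
"""
prepareQuery(sparql)  # validates the query; an engine would translate it

# Against the hypothetical mapping above, a wrapper might emit roughly:
#   SELECT c.LEGAL_NAME, e.NOTIONAL
#   FROM CUSTOMER c JOIN EXPOSURE e ON e.CUST_ID = c.CUST_ID
# The business user never sees the tables; they see only the EKG.
```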
The cool thing about graphs is this: if I have two different graphs representing two different data sources, how do I get the two graphs into one? I just start adding edges between nodes. In our experience, that is why data integration goes faster this way: by changing the representation from relational to graph, we reduce the problem of data integration to finding edges between nodes, which is the easier problem, and we've built tools that make doing exactly that much simpler.

So when I have different sources and they're all graphs, I can query each one of them, or view them as if they were one giant graph and query that all together. In that case we're federating the queries and pushing the work down to where the sources are, and we've done the engineering to make sure performance is not an issue, so you get real-time access to your data without having to move it, and then you can do all your reporting.

My point, though, is not that you have to do it this way. You can be flexible and nimble, probably with a hybrid approach. Some data sources I really need to access in real time, or I can't move them for security or privacy reasons. Other data I can move: external open data, or information that's fairly stale and doesn't get updated often, so it makes sense to physically have it all in the same place, or data I want together for analytical purposes. That data you can ETL and put into a triple store, and because the no-ETL portion virtualizes your relational databases as if they were triple stores, everything is talking graphs, whether it sits in a relational database or physically in an RDF database, and your reports carry on as before.

One thing that does come up about these mappings is that there is real work to be done, but what we've realized is that you don't have to boil the ocean to integrate your data. Traditionally you'd say: I have to figure out where all my sources are, design the whole enterprise schema, define all the ETL jobs, move all the data, and six months and a million dollars later I can finally query it, and then I discover data is missing, or it's the wrong data, and I can't get results, and everybody's asking what's going on. Because of this graph approach we can be nimble: start small and grow. What we usually do with our clients is start with a small set of questions they want answered, the hardest and most expensive ones, the ones they can't do today, and work on those. If you already have an EKG, we know which part of it to use; if you don't, we know which part to start building. Then we look at the particular sources, and within those sources the tables and attributes needed to answer exactly those questions, and you create the mappings only for that. Then you can write and run those queries and get answers back literally from day one, so you start getting value from the very beginning instead of waiting six months.
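A minimal sketch of that reduction, with two hypothetical sources: merging two graphs is trivial, and the integration work collapses into asserting one linking edge between nodes.

```python
# A minimal sketch: two graphs from two hypothetical sources, merged,
# then integrated by adding a single edge between their nodes.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL

EX  = Namespace("http://example.com/ontology/")
TRD = Namespace("http://example.com/trading/")  # hypothetical source A
CRM = Namespace("http://example.com/crm/")      # hypothetical source B

g_trading, g_crm = Graph(), Graph()
g_trading.add((TRD.cpty42, EX.hasExposure, TRD.swap7))
g_crm.add((CRM.customer99, EX.hasLegalName, Literal("Acme Bank")))

merged = g_trading + g_crm  # a union of graphs is just a bigger graph
# The integration step itself: one edge asserting that the two nodes
# denote the same real-world counterparty.
merged.add((TRD.cpty42, OWL.sameAs, CRM.customer99))

print(len(merged))  # 3 triples: two source facts plus the linking edge
```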
That's why mappings need a lot of business knowledge: people from the business who understand where the data is. Obviously the easiest case is when things map one to one: FIRST_NAME here is first name there, and so forth. Those are the simple cases. But you'll also find that one thing in the database can mean several different things in your target, or many things in the database can mean one thing in the target. Then you start having discussions where someone says, wait, no, for me that's something different, and for me it's yet another thing. And that's good; that's exactly the discussion you need to have, and it's how you start extending your EKG. In an e-commerce setting we work in, we ask: what is an order? Depending on whom you ask, an order is when somebody clicks checkout on the website, or when the credit card is actually processed, or when the goods are delivered to your home. Different people mean different concepts by the same word, and again, it's not about the words, it's about the concepts. So you work out where each concept lives in your data, and you can have mappings that say, for example, the join of these tables maps to this particular concept or attribute in the target. These are the things you have to think about from the conceptual point of view while looking at the actual data to create the mappings, and you can express the mappings as SQL queries; it's all declarative, and the mapping language supports it.

At the same time, you may have things that show up as data values in the database but are concepts in the ontology, in the EKG. That's data on one side and metadata on the other, and you need to be able to reconcile the two; in the graph, everything, data and metadata alike, is just data. These are the kinds of mappings that occur in practice, the conversations that need to happen, and the standards support all of these mapping patterns. This is the process we go through with our clients when they say: I'm hearing about FIBO, my data is in Oracle, how do I put these two things together? We've built up a lot of experience with it.
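One of those patterns can be sketched concretely. Assuming a hypothetical TRADE table whose PRODUCT_TYPE column holds codes like 'SWAP' or 'REPO', this R2RML fragment turns a data value in the source into a concept, an rdf:type, in the graph.

```python
# A sketch of the "data value in the source, concept in the EKG"
# pattern. Hypothetical table TRADE(ID, PRODUCT_TYPE), where
# PRODUCT_TYPE holds codes such as 'SWAP' or 'REPO'.
from rdflib import Graph

mapping = """
@base   <http://example.com/mappings/> .
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<#TradeMap>
    rr:logicalTable [ rr:tableName "TRADE" ] ;
    rr:subjectMap [ rr:template "http://example.com/id/trade/{ID}" ] ;
    rr:predicateObjectMap [
        rr:predicate rdf:type ;
        # the column value becomes part of a class IRI: metadata from data
        rr:objectMap [
            rr:template "http://example.com/ontology/{PRODUCT_TYPE}" ;
            rr:termType rr:IRI
        ]
    ] .
"""
Graph().parse(data=mapping, format="turtle")  # the mapping is valid RDF
```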
So I'm going to pass it back to Mike: once you've actually mapped things together, what kinds of reporting and applications go on top? What do you do, Mike?

Thanks. You can see why we're excited about this stuff. We created a common language because the industry said: we need a common language; we don't need another bunch of messaging standards, there are lots of those; we need a common language. We said, great, we've built one, we've focused on meaning; now what do we do with it? Well, we need to integrate, so how do we integrate? Seeing a practical solution like this, which takes a business-facing model where the business owns the meaning (remember, that's one of the key requirements in Basel: the different business units are responsible for the meaning of the data, they own the reports and the management dashboards) and puts it directly to work in technology, without any impedance mismatch with IT as it were, is extremely exciting. So let's close the loop and show how we pull this together.

The Basel requirement I mentioned, BCBS 239, requires you to report very flexibly. It also requires you to prove that the things management looks at when they're dealing with risk are timely and up to date, and are indeed the same things you showed the regulator, not something else you made up. Being able to pose meaningful semantic queries against a meaningful model of business meanings, the ontology, where you virtualize the knowledge graph, means you can meet quite a large part of BCBS 239, because you can demonstrate that what management sees is the real thing. By not standing up triple stores except for the particular applications that need them, you're reporting on the very data that has been managed, that has gone through all the processes you built for BCBS compliance, all your data governance. All of that wouldn't be much good if you then said: ah, but we actually reported on this other data over here, in this triple store, and it's different data. Instead you can prove you're reporting on exactly the data you've put your data governance measures around, and that what management sees in their dashboards is a true reflection of the data you've put all this new effort into looking after, all the data quality and governance work. And if there's a new query, a new risk factor that comes up, a new thing you want to see, it's just a new SPARQL query in technical terms, a new semantic query as far as the business people are concerned, and there are lots of tools and techniques for writing those. Similarly for reporting: you can show your reporting is timely and accurate, that it reflects the real data right there. If you want to draw inferences from the data, you ETL it into a triple store to draw those new inferences, as we saw with counterparty exposures and transitive exposures across networks of entities, and you can create new reports without building a whole lot of new technology; again, it's just an update to the SPARQL query.
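As an illustration of that point, here is a hedged sketch of such a "new report": a query that totals exposure to each counterparty group, direct or via subsidiaries, using illustrative names rather than actual FIBO terms. Standing up the new report amounts to writing this text; nothing else in the stack changes.

```python
# A sketch of "a new report is just a new SPARQL query": total exposure
# to each counterparty group, direct or through subsidiaries. Names are
# illustrative, and ex:isSubsidiaryOf* is a SPARQL 1.1 property path
# that follows ownership links transitively.
from rdflib.plugins.sparql import prepareQuery

new_report = prepareQuery("""
PREFIX ex: <http://example.com/ontology/>
SELECT ?group (SUM(?amount) AS ?totalExposure) WHERE {
    ?cp ex:isSubsidiaryOf* ?group .
    ?cp ex:hasExposure/ex:hasNotionalAmount ?amount .
}
GROUP BY ?group
""")
# Changing the report means changing this text, not the pipeline.
```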
So in a sense it takes IT out of the loop. IT still has a function, looking after all that infrastructure, but getting the information out in a timely way becomes a business problem with a business solution, thanks to the kind of technology we're seeing now, which lets you virtualize your data and, with minimal application development, extend and vary the reports you need to produce. And it's non-disruptive: you're using existing systems of record and existing data quality measures. So, very exciting; as we say, I'm very happy to be working with these guys. The takeaway, then, as you heard at the beginning: banks can demonstrate BCBS 239-compliant risk reporting, without disruption, by using FIBO and semantic data virtualization. Thank you. Any questions?

Yes? OK, first, the slides: I sent them a couple of hours ago, so they should be online soon. And the question is: what is the performance of this type of solution with large amounts of data and large numbers of mappings? The answer, for our tool, which is called Ultrawrap, is that we scale as well as your underlying relational database scales. The science and technology came out of, and Capsenta was spun out from, the computer science department at the University of Texas; I started the company after I did my PhD there. What we studied, and what we figured out, was this: take a semantic query over this type of wrapper system, then take that same exact English definition of the query written in SQL against your database, and the execution speeds are practically the same; more often than not, if you look at the physical query plans, they are literally identical. We crafted a system that takes advantage of 30-plus years of optimizations in the database by pushing everything down to the database. So the answer is: we scale as well as your underlying relational database does.

Yes, the question in the back. So the question is: there are other tools out there like ours, including an open-source one, and if you don't have referential integrity constraints, can you still do the mapping? The answer is yes. Under the hood, we extract an ontology from your relational database; you have your target ontology; and with those two ontologies we use existing ontology-matching techniques to help automate the mapping and get you closer to the end result. Referential integrity constraints are essentially semantics in your database: if you have them, great, we'll take advantage of them; if you don't, don't worry, there's probably extra work to be done, but it can still be done.

OK, the last question. Yes, we do support federated queries, and we have reasoning coming out of our lab: we already support subclass reasoning. The question is really how much reasoning you need. In OWL 2 you have the QL, RL, and EL profiles, and what a lot of our scientific work has shown is that you can push down to the database what's called RDFS-Plus: subproperty and subclass, plus equivalent class, equivalent property, transitivity, and symmetry. That's in our pipeline to implement, and it can depend on the type of query whether you can push it down; if the axioms don't conflict with each other, you can. But if you have to do heavy, expressive reasoning, that's a scenario where you want the hybrid approach: ETL some part of the data and use your reasoning tools to do what you need. That way, building a semantic web application becomes: here's a use case for a reasoning application, we'll stand up a triple store to do exactly this. And not all use cases need reasoning; if you're doing plain data integration, you may not need reasoning at all. It really depends on your use case. OK, let's give these guys a hand.