[The opening speaker introduction is garbled in the transcription and unintelligible.] ...data engineering department in Goldman Sachs.
I co-manage our global data models and governance team, so there we're really looking at modeling our transactional data for the global markets division, making sure that we can process their data for all of the trades that are going through from front to back. And prior to that role, I was actually in operations, so I was part of the credit middle office supporting our credit derivatives trading desk, and really there I felt firsthand the complexities and the difficulty in managing such large amounts of data in the financial industry. And that's really set me up for my role now, both within the firm and in the industry. And so from an industry perspective, I also co-lead the FINOS Financial Objects project, really working on that industry collaboration with many of you here as well. So Bika, do you want to introduce yourself? Yeah, thanks so much, Fii. Thanks everyone for joining us today. We are super excited to be here. So I'm Bika. I'm a vice-president in data engineering at Goldman Sachs. And I've been part of the Legend open source journey right from the beginning. I drove various aspects of the Legend project and continue to do so in my role as product manager for the Legend stack. I also work with internal and external clients and the FINOS community to help them understand the Legend tool set as well as the functionality, and also support them in their journey to evaluate and adopt Legend. So, fun fact about me: I'm originally from Germany, I've lived in five different countries, and I may be one of the very few Germans who doesn't like eating potatoes. So, handing it back over to Fii. Thank you. So data governance can cover a vast array of concepts, and depending on your use case and your business and your organisation, different things will really drive it for you. But it's said that data is the source of knowledge. And even for us as individuals, I think that's definitely becoming more relevant.
You may have travelled here today on public transport, whether you were querying train times or using the status boards on the subway. Perhaps you ordered an Uber. In all of those scenarios, you were relying on the data that was put in front of you. For those of you who bucked that trend and went for the healthy option, well done. But why is data governance important in the business use case? Really, for the financial industry, there are external and internal factors that drive this. Externally, being able to deliver exceptional client service has to be top of the list: being able to exchange large amounts of data and keep up with the moving markets, as well as being able to transform data for downstream processing, such as perhaps settlements and confirmations. Another key post-trade processing aspect is regulation. We've already heard about it today in some of the keynote speeches, and it's something that's going to continue to drive the industry as we go forward. Regulation in itself is a form of data governance on us as a financial industry. But equally, it requires us to then have our own internal data governance to be able to ensure that we are sending accurate, timely, complete data to our regulators. From an internal perspective, really the baseline is driving business decisions. Now, this can be on the revenue side if we're trying to drive those business decisions in terms of trading, but it equally covers other aspects of the organisation: driving business decisions in terms of human resources, or perhaps something that's a little bit more in the news at the moment around COP26, some of the environmental, social and governance aspects that organisations are having to comply with. So all of those hopefully show you why data governance is important. That's just a touch on a few, and I'm sure many others will be coming to mind for you. But what do we mean by data governance?
So data governance: as I said, we'll touch on a couple here, and I'm sure many of you out there will have others in mind. Starting with ownership, privacy and security: really thinking about who has access to my data, and is it being stored in a secure manner? Really important for us as individuals as well. We want to make sure that companies that own and hold our data are keeping it secure and that it's not getting into the wrong hands. Regulatory compliance, touched on already, in itself is a form of data governance, but equally means that we have to drive our internal data governance to meet it. Data lineage and data contracts: really thinking about where does my data come from? Do I have any SLAs in place if I'm a producer or a consumer of data? And really being able to document those in data contracts to make sure that lineage and traceability exist across organisations. Data lifecycle and milestoning, or perhaps versioning: really making sure that you know which version of your data you're looking at. This is something we'll come back and talk about a little bit later in the context of the industry collaboration efforts as well, making sure that we know which version of the data we're looking at to avoid duplicating efforts. And finally, data quality. This is the one that we're going to spend a little bit more time on today and really put into the context of how Legend can help you. So some of the concepts you'll hear us coming back to are accuracy, timeliness, completeness and also traceability. You'll hear us bring those words back in as we go through how Legend can help us meet that side of things. So, Bika, I'll hand over to you. Yep. Thanks so much, Fii, for telling us more about data governance and also bringing up the importance of data quality in that context.
So what I'd like to do now is dive a little bit deeper into the subject, specifically how Legend may be able to help increase data quality in each individual organisation. But first, let's take a step back and imagine we work in a firm that is interested in getting data about the vaccination status of its employees as part of its return-to-office strategy. That may sound simple, but in reality, getting high-quality data that is relevant for your use case and actionable can be quite tricky. So let's take a closer look at what a typical scenario to retrieve raw data for a business-driven report may look like. You can see here at the top what kind of report we have in mind. It should list the legal entity, the office location, the employees' first and last names, as well as, of course, their vaccination status. But how do we get the data? In our hypothetical firm, there are three databases: a firm database, a person database, and a vaccine database. In our example, the names of the databases clearly indicate what kind of data we would expect in there, but that often is not the case. And what you can also notice here are the fairly cryptic column names of the data sets in these databases. So a business user who may be less familiar with where the data is stored, or what the overall database schema looks like, may have difficulty knowing which databases and columns are of interest for their particular use case. To now actually retrieve the data, we may have to use SQL or some other query method to extract the data from the different data sources. And then we may even have to use some Excel magic to merge all these different data sets into one coherent report. So, long story short, this is likely going to be an error-prone, multi-step manual process that can only be performed by a quite tech-savvy person. So, bringing this into the context of data quality, you may notice the following problems.
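As a rough illustration of that manual flow, here is a small Python sketch. The table and column names (FIRM_T, LGL_NM, VAX_ST, and so on) are hypothetical stand-ins for the cryptic schemas on the slide, and an in-memory SQLite database stands in for the firm's real data stores; the point is only how much schema knowledge the hand-written joins demand.

```python
import sqlite3

# Three databases with cryptic, hypothetical column names,
# loaded here into one in-memory SQLite store for the sketch.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE FIRM_T (F_ID INT, LGL_NM TEXT, LOC_CD TEXT);
    CREATE TABLE PRSN_T (P_ID INT, F_ID INT, FN TEXT, LN TEXT);
    CREATE TABLE VACC_T (P_ID INT, VAX_ST TEXT);
    INSERT INTO FIRM_T VALUES (1, 'Acme Corp', 'NYC');
    INSERT INTO PRSN_T VALUES (10, 1, 'Ada', 'Lovelace');
    INSERT INTO VACC_T VALUES (10, 'FULLY_VACCINATED');
""")

# The "Excel magic" step: the user must already know how the
# tables relate and hand-write the joins to stitch the report.
report = con.execute("""
    SELECT f.LGL_NM, f.LOC_CD, p.FN, p.LN, v.VAX_ST
    FROM FIRM_T f
    JOIN PRSN_T p ON p.F_ID = f.F_ID
    JOIN VACC_T v ON v.P_ID = p.P_ID
""").fetchall()
print(report)
```

Every step here assumes the user can read the schema, write SQL, and know that P_ID, not FN, is the join key, which is exactly the tech-savvy burden described above.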
It may be tricky to ensure our report's data is complete, as the business user may find it difficult to verify that all the relevant databases and columns have been queried. This would also require them to understand how the data sets are related to each other. It may also be complicated to ensure that the data is accurate, as the process of data extraction and creation is quite manual. The business consumer may also have to rely solely on the accuracy of the query written by the developer, with little opportunity to validate the data by herself. And lastly, there is little transparency about the origins of the data and how it got transformed along the way. So, if our data is being moved or altered, our queries may break, and we as the business user have no idea what happened. So, might there be a better way? As you may be guessing, there is, with Legend. So, what is Legend? Legend is free, open source data management and modeling software. We work together with FINOS to make the code available on GitHub for everyone to install on their own premises. FINOS is also hosting a public version of Legend on their servers, and we will talk a little bit more about the industry collaboration happening with that version of Legend later on in the presentation. At GS, we developed the platform because we've seen firsthand the struggle with data silos, duplication and quality as the complexity of data accelerated dramatically. With Legend, there is now a free solution on the market that aims at providing efficient and reliable access to timely, secure, and safe data. The heart of the platform is Legend Studio, our data modeling environment. Studio allows users to define business-friendly data concepts, connect disparate data sets and also visualize model data for easier collaboration. And most importantly, it allows developers and non-developers to work together on the same platform through its intuitive and flexible interface.
So let's take a look at Legend more closely in the context of the use case that we brought up earlier. Instead of querying raw data, we can build a data model with Legend Studio and query model data. You can see a data model that would work for our use case displayed here, and I'll walk you through it in just a second. But maybe let's first define what a data model actually is. Simply put, a data model allows you to build a better understanding of your data by creating business-friendly concepts for your data and defining data relationships. By doing this, data models add a layer of abstraction on top of your raw data to organize it and make it more usable in a variety of use cases. Concretely, where we had cryptic column names before, we can now define business-friendly concepts that are relevant and meaningful to us: specifically, a firm class, an employee class and an office location class. And we can further define attributes, such as the first name and last name in the employee class, to describe the introduced concepts in more detail. We can also specify how the different concepts are related to each other. So for example, our firm class can have one or many office locations and zero or many employees. And really, really important to note here is that this entire data model can be created in Legend without writing a single line of code. In the context of data quality, this is fairly important, as the data can now be described and agreed upon across different teams, independent of their technical knowledge. This tackles the historic divide between business and tech teams, or, differently said, the consumers and producers of your data. Lastly, you can see here that the relational databases that store the data we are interested in are mapped to our data model. These databases can be of different types and scattered across the organization. Legend brings all of them together in one coherent data model.
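In Legend itself this model would be built in Studio (or written in its text mode), not in a general-purpose language. Purely as an illustration of the shape being described, the same classes, attributes, and multiplicities can be sketched with Python dataclasses; the class and attribute names mirror the slide, but the code below is an analogy, not Legend's own notation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Employee:
    first_name: str  # mandatory, like a [1] multiplicity
    last_name: str   # mandatory

@dataclass
class OfficeLocation:
    city: str

@dataclass
class Firm:
    legal_name: str
    # "one or many" office locations, "zero or many" employees,
    # in the spirit of the multiplicities on the slide
    offices: List[OfficeLocation] = field(default_factory=list)
    employees: List[Employee] = field(default_factory=list)

firm = Firm(
    "Acme Corp",
    offices=[OfficeLocation("NYC")],
    employees=[Employee("Ada", "Lovelace")],
)
```

The business-friendly names (Firm, Employee, OfficeLocation) replace the cryptic column names, and the relationships are explicit in the types rather than hidden in join keys.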
A quick side note here: Legend currently supports connecting to Snowflake and BigQuery databases as well as an in-memory H2 test database. We also recently added the capability to connect flat data sources, such as a CSV file, to your data model. And at GS, we continuously work on adding new connectors to the platform and would encourage the community to do the same. So, putting our business hat on again for a second: by enabling us to map these relational databases to our data model, Legend allows us to create and execute queries against model data without the need to know any technical details about the firm's databases. This is enabled by the platform's powerful execution engine that runs the queries. Thanks, Bika. And do you mind if I jump in with a couple of questions as you go through this next bit? Yeah, sure. So you've made it really clear why we need a data model. Perhaps as someone who's a little less familiar with this use case, could you share some features that allow me to get familiar with the model and navigate it a little bit more easily? Very good question. So what I could do to help you navigate the data model a little bit better is add descriptions to my classes via tagged values. You can see an example here on the slide where I added an alias for firm, namely organisation. So if you search for key concepts using different terms, such as organisation, it really helps you to find the information that you're interested in. Making it easier for people to search across the data model can definitely help reduce data duplication, as it facilitates reusing existing concepts and their mappings to data stores. It's especially easy to look out for these descriptions in the actual data model code in text mode. And what's the text mode that you just mentioned? Yep, so...
Users can, in Legend Studio, actually easily switch back and forth between a user-friendly user interface, which we call form mode, and the actual data model code in text mode. Changes made in either view are seamlessly translated. One more reason, right, why it's super easy for developers and non-developers to work together on the same platform. That's great, and I'd like to come back to that point you made on data duplication. Are there any other features that allow us to avoid data duplication? Yep, so... can you see the arrow pointing to this legal entity class here? That adds a hierarchical layer to your data model in which the firm class inherits all the attributes from the legal entity class. In Legend, we call this a supertype. This reduces the need to recreate attributes and allows users to leverage existing defined relationships and mappings to data stores. So just to check my understanding: we're saying that through the idea of tagged values and inheritance, and also being able to map your data models to the data sources, we can really help remove that data duplication as we guide people to the authoritative data sources for them to build upon. Yep, correct. And this will likely lead to more accurate and complete data. Great, and further looking at your model, I assume vaccination status is going to be quite a sensitive piece of information for people. Is there a way that we can make that clear in our data model? Yeah, excellent point you bring up. That's definitely something we have to keep in mind. Legend allows you to add labels to your classes and attributes via stereotypes. So in this case, we can add a sensitive label to both the employee class and the vaccination status attribute. This feature is quite helpful to draw attention to data that may need specific entitlements. The owners of the data can then make sure it's properly entitled, so that only Legend users who are allowed to can access it.
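The supertype idea can be sketched in the same illustrative Python style: Firm inherits the attributes already defined on LegalEntity, so they are declared once and reused. The attribute names here (entity_id, headcount) are hypothetical additions for the sketch, not taken from the slide.

```python
from dataclasses import dataclass

@dataclass
class LegalEntity:
    # Attributes defined once on the supertype...
    legal_name: str
    entity_id: str

@dataclass
class Firm(LegalEntity):
    # ...are inherited by Firm rather than recreated,
    # mirroring Legend's supertype relationship.
    headcount: int = 0

firm = Firm(legal_name="Acme Corp", entity_id="LE-001", headcount=250)
```

Because Firm is a LegalEntity, anything already written against the supertype (queries, mappings, relationships) can be reused, which is exactly the duplication-reducing effect described above.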
That's great, I love that Legend can really help us make sure that data is not getting into the wrong hands. On a different note, can Legend help when I'm querying data to actually bring it back in the format that I'm interested in? Yep, there are actually quite a few different ways that you can accomplish this. For every attribute that we define in our data model, we can specify the data type. So for example, if you only expect numeric or date entries, you can specify this via the respective data type. And you can also specify which fields are mandatory versus optional, or how many values you expect to retrieve, via the cardinality. And what if I wanted to actually restrict the values that get returned to me? So if you know your data quite well, you can specify the valid entries via enumerations. For example, I have predefined the valid entries for vaccination status, more specifically fully vaccinated, first shot only and not vaccinated. That's really great. It really helps to make sure that we understand the data and get it in the format that we need. Yep, so those were all fantastic questions, Fii. And we did touch upon this already a little bit, but I'd like to spend a few more minutes on data accessibility through Legend. An important factor in data quality is to make data safely accessible to end consumers. And ideally, the end consumer should be able to easily understand the data they are looking at. Even better, data consumers should be empowered to build data queries themselves without the need to know any coding language. That way, they can really be sure that they get the data they want. All of this is possible through Legend. Data consumers can use business-friendly terms from the data model and build their queries drag-and-drop style.
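To make the enumeration idea concrete, here is the vaccination-status enumeration sketched in Python. In Legend the enumeration would be declared in the model itself; this sketch just shows the effect of restricting valid entries at the boundary.

```python
from enum import Enum

class VaccinationStatus(Enum):
    # The three predefined valid entries from the talk
    FULLY_VACCINATED = "fully vaccinated"
    FIRST_SHOT_ONLY = "first shot only"
    NOT_VACCINATED = "not vaccinated"

def parse_status(raw: str) -> VaccinationStatus:
    # Anything outside the predefined valid entries is rejected
    # (raises ValueError) rather than silently entering the report.
    return VaccinationStatus(raw)
```

Combined with data types and cardinality, this is what lets a consumer trust the shape of what comes back: a status field can only ever hold one of the three agreed values.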
And you can see the results I expected to see in my report on employees' vaccination status created here in Legend, where I just dragged the key concepts that were relevant to me, such as office location, into the execution panel. Then all I had to do was hit the execute button and the data was returned to me. For enhanced transparency, Legend also makes it easy for consumers to understand how the physical data sources have been mapped to the data model attributes. You can see the mapping details from our use case here on the slide, further towards the bottom. Making it clear to our consumers of data where the information is coming from is a key ingredient for high-quality data. And lastly, I'd like to mention that Legend also allows for programmatic consumption of model data, not only via ad hoc queries as you can see here. Users can create APIs with a simple click of a button and then consume the model data via executable Java files, which they can deploy in their Java applications. And we are also working on making consumption via REST APIs available. Hence, high-quality model data can be used systematically in production processes. And this is really helpful. I have a couple more questions. What if I'm interested in a slightly different query? Say I want to know if employees are vaccinated or not vaccinated, and I guess this might become more important to us as we go through the next season with booster jabs: what do we really mean by fully vaccinated? But do I have to go and create my own data model to be able to do that? Yeah, good question. You actually don't. What we can do in this case is a transformation from the model that I built to the one that you want to see. We do a model-to-model mapping in this case, mapping the enumeration values I specified to the ones that you have in mind.
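The enumeration part of that model-to-model mapping can be sketched as a simple value translation from the source model's three statuses to the consumer's two. How "first shot only" should map is a business decision; the choice below is an assumption made purely for the sketch.

```python
# Source enumeration (three values) -> target enumeration (two values).
# Mapping "first shot only" to "not vaccinated" is an assumed choice
# for illustration; the real mapping is agreed between the two models.
SOURCE_TO_TARGET = {
    "fully vaccinated": "vaccinated",
    "first shot only": "not vaccinated",
    "not vaccinated": "not vaccinated",
}

def transform(status: str) -> str:
    # Translate one source enumeration value into the target model's value.
    return SOURCE_TO_TARGET[status]
```

The consumer queries their own two-value model, while the transformation and the original values remain visible, which is the lineage point picked up next.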
And in the context of data quality, you'd still see the original shape of the data and how it got transformed along the way, creating lineage for your data. You can see the model-to-model mapping and the slightly different query results here on the slide. An interesting point to bring up here is that you can see how the accuracy of the data is very subjective to the individual use case. Your expectation of what counts as accurate data was different from the one that I had originally. So Legend is providing the flexibility for end users to bring the data into the shape and quality they want to see. Yeah, that's really true. And how about if I wanted to restrict the values? So perhaps I'm just interested in legal entities, or a firm, with more than 100 employees. Yeah, so what you can do is create a constraint, in this case on our firm class, which adds a validation rule to your data model. If you're executing your query, Legend would return a defect if the firm does not have at least 100 employees. And these constraints can be set on both the source and target class, which may enable some interesting and complex cross-divisional consumer and producer constraints. Great. So, in an attempt to summarise everything that you've just seen: Legend is a toolkit that offers a way to instill data quality into your data models and into your querying of data sets. The main aspects that we have been through are completeness, through cardinality and enumerations; accuracy, through data types and constraints; timeliness, through reducing data duplication with tagged values and inheritance; and finally, traceability, with the model-to-model mappings and also being able to identify the sensitive data points. Hopefully, through the use case we've taken you through today, things are coming to mind for you in your business or your organisation where this could be helpful.
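The constraint idea can be sketched as a validation rule attached to the firm class that yields a defect when the data violates it; the function name and the dict-based firm record below are hypothetical stand-ins for what Legend expresses declaratively on the class.

```python
from typing import List

def check_min_headcount(firm: dict, minimum: int = 100) -> List[str]:
    # Validation rule in the spirit of the constraint described above:
    # a firm must have at least `minimum` employees, otherwise a
    # defect message is produced instead of silently passing the data.
    defects = []
    if len(firm.get("employees", [])) < minimum:
        defects.append(
            f"Firm '{firm['legal_name']}' has fewer than {minimum} employees"
        )
    return defects

small_firm = {"legal_name": "Acme Corp", "employees": ["Ada"]}
print(check_min_headcount(small_firm))
```

A firm with 100 or more employees would produce an empty defect list and pass through; the small firm above produces one defect, which is the signal a consumer can act on.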
Perhaps, if you're a producer of data, you want to go out there now and make models for your data to really help your consumers engage with it. Perhaps you're a consumer of data and you want to go and build your data model to be able to map it back up to your producers. Or perhaps you're the engineer who right now feels like you're sat in the middle between producers and consumers, and actually you could go and connect them directly using Legend and step yourself out of the process. These are all ways that we have done it at GS, and so hopefully that's an encouragement to you that you could go and do the same. As we've said, Legend is available for you to download on-premises if that's the way you wanted to go, and feel free to come and speak to us about that if you wanted to. We also mentioned at the beginning that FINOS hosts the shared instance, and that's where we do a lot of the industry collaboration. So, just to close out the session, I want to have a look at that piece and how data governance is being instilled there as well. On the industry side, we've been collaborating for a number of years already. Many of you will be familiar with industry standards that are already out there, but as we launched Legend, we built some pilots around the ISDA Common Domain Model. This is a model looking at transactional data, specifically in derivatives, but actually looking to expand to broader products as we go through. And really, Legend helps to bring those business people into the conversation, as you've seen, and we've really seen that push forward over the last year or so. Over the last year we've had, I think, about four contributions from the community back into the ISDA Common Domain Model. That's a much quicker pace than we've seen the ISDA Common Domain Model being developed at in the years prior. And we've also seen two new models being created using Legend.
So some of you might be familiar with the currency reference data model, or have seen that one, and also the product control model, looking at price dissemination and how we actually do that alongside banks and vendors. What we've really seen as an encouragement is that we can get much broader participation by doing this on FINOS. We can bring our clients into the conversation. We can collaborate with vendors. It's a safe space for us to be able to do that together, and Legend really provides that change management aspect, to ensure that we're working on the right version and that we've got the latest version. It allows people to jump in and get started at the time that they're ready. And Legend allows you to do all of that through the GUI, so there's no need to understand GitLab. But it is all happening in the background, so it has that SDLC and change management behind it. You can do it all at the touch of a button within the Legend GUI: making sure that your data is being tested, sending it for review, and finally having it merged back into the final model. A quick note on the industry bodies: you will have seen or heard this morning that ISLA joined, and ISDA are also members. Having those industry bodies as part of the FINOS community now really helps us have that oversight as we're building industry standards, such that we can really build the confidence of the industry together. If you would like to get involved in any of the industry work, please do come and find me today. I'll be at the Legend stand, or around. But please do come and have a chat. It would be great to get you involved and start getting some more industry standards out there. Thanks so much for joining, everyone, and find us at the Legend table.