Thanks, everyone, for your time, and thanks to FINOS for the awesome organization, and for the opportunity to speak about open source technologies. Obviously, otherwise this would probably be the wrong forum. Open source technologies in the context of accelerating trade cycles, which is especially relevant as we are moving towards T+1 settlement.

With me today (the clicker doesn't work), with me today is Ashley Trainer, senior solution architect for financial services at Databricks. I am Antoine Amend, technical director for financial services at Databricks, and Ashley and I are absolutely thrilled to co-host this session with Stephen Goldbaum and Ephraim Stanley, respectively a distinguished engineer at Morgan Stanley and a technology fellow at Goldman Sachs, to talk about Legend and Morphir: how those different open source initiatives can coexist, be integrated, and be harmonized when coupled with a Databricks back end.

We have a packed agenda, so let's get started right away with that quote, or probably more of a statement than a quote, but something that we saw on Wall Street back in 2015, when the industry was moving from T+3 to T+2: two days to close and settle all those trades, as a way to minimize the risk of those trades failing and therefore minimize all those margin calls. Time equals risk; reducing that time makes sense. But that was back in 2015, built specifically for that period of market volatility.

What have we seen since 2015? We've seen Brexit. We've seen geopolitical tensions. We've seen a global pandemic, and with it a frenzy for retail investment, from meme stocks to Reddit posts. We've seen an explosion of data, and that explosion of data really pushed our trading systems to the edge and exposed clear limitations in what our back offices are able to cope with, to scale, to adapt to those changing conditions.

Volatile markets mean that we need to bring that time even further down, to one day. Initially suggested by the DTCC, T+1 is now en route to being enforced by mid-2024, and that gives us practitioners just one year to design, implement, and test not one or two or three systems, but an entire ecosystem shared with all market participants: the buy side, the sell side, the custodians, the clearing houses. Somehow all those different services and systems will need to work in harmony. The challenge we see is that back offices are plagued with legacy technologies, plagued with silos (data silos, people silos, across asset classes, across regions), and plagued with manual processes.
I wouldn't be surprised if today I heard that some analysts are physically running from one mainframe to another with a floppy disk, or sending that via FedEx, or keying an entire Excel spreadsheet into a database every single day. So the way we're going to achieve T+1, as an ecosystem, as an environment, is through automation. And it would be naive to think everybody will be using the same technology and the same proprietary model. Automation will need to be about standards, and open source is key. It is aligned with the key values of open source technologies: creating new standards that enable interoperability across different systems, systems that may be using different technologies but are harmonized around those standards, and driving collaboration.

Legend and Morphir are great examples that enable that interoperability: data as a contract, models as a contract, logic as a contract, that can be interpreted across different systems. Before I hand over to Stephen to talk about Morphir, I just want to stress the importance of having the right data culture. We've seen that over and over, whether it's about being data driven, about cloud, or about open source: organizations that went through that kind of behavioral shift, that transition in terms of data culture, win. Why? Because they can adapt to all those changing conditions much faster than anyone else. To put that in the perspective of T+1: the DTCC estimates that margin calls can be reduced by 40% by moving to T plus one day. That's billions of dollars that could be reinvested somewhere else, assuming we can connect the dots, cut the middleman, kill those manual processes, and streamline that entire process throughout the trade lifecycle.

So let me hand over to Stephen to talk about Morphir. We will be looking at an example that is not necessarily about T+1 settlement; we will be looking at this through the context of regulatory reporting. But the principle is that all those technologies are generic and can be adapted: what we use for the LCR calculation today can be used for trade settlement tomorrow, and all those different technologies can interoperate across different systems. So without further ado, over to you, Stephen.

Thanks. Okay, so yes, we're here to talk about how we can bring the FINOS ecosystem together to do new and interesting things. And what is more interesting than regulations, and what better regulation is there than the liquidity coverage ratio, right? The liquidity coverage ratio, otherwise known as the LCR because it's a mouthful, is really about categorizing cash flows. You take all these cash flows and categorize them into groups; once you've categorized them, you apply haircuts to those groups; then you aggregate all that up; and then you apply some math on top of that to eventually come to a single number. That's what the whole thing is. So there's a lot of different kinds of processing there. Excellent, excellent. Thank you.
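To make the shape of that computation concrete, here is a minimal sketch in Scala: categorize each cash flow, apply a haircut per group, aggregate, and reduce to a single ratio. The group names and haircut percentages are illustrative assumptions, not the actual regulatory rule set.

```scala
// Minimal sketch of the LCR processing shape: categorize cash flows,
// apply a haircut per group, aggregate, then compute a single ratio.
// Groups and haircut percentages are illustrative only.
case class CashFlow(amount: Double, assetType: String, rating: String)

object LcrSketch {

  // Step 1: assign each cash flow to a group based on its attributes
  def categorize(cf: CashFlow): String = (cf.assetType, cf.rating) match {
    case ("sovereign", _)     => "Level1"
    case ("corporate", "AAA") => "Level2A"
    case _                    => "Level2B"
  }

  // Step 2: haircut per group (illustrative values)
  val haircuts: Map[String, Double] =
    Map("Level1" -> 0.0, "Level2A" -> 0.15, "Level2B" -> 0.50)

  // Steps 3 and 4: aggregate weighted amounts and reduce to one number
  def lcr(assets: Seq[CashFlow], netOutflows30d: Double): Double = {
    val weightedHqla = assets
      .groupBy(categorize)
      .map { case (group, flows) =>
        flows.map(_.amount).sum * (1.0 - haircuts(group))
      }
      .sum
    weightedHqla / netOutflows30d // the single number the LCR boils down to
  }
}
```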
One of the things you might ask is, well, how can we implement that? That's where Morphir comes in. What we've done is coded the LCR in Morphir, and just to highlight some of the things you can do with that: with Morphir you can interact with the logic in a non-technical way. What we're looking at here is a subset of the regulation, specifically about assets, and in particular the section that takes those assets and, based on their attributes, figures out what group each one should be put into.

One of the questions that you often get as a developer is when the business comes to you and says, why did this particular asset get categorized that way? Instead of having to go dig through code, we can give them this tool and say, you know what, you can go and interact with it and find out. They can click on this thing (and this is all generated from the business logic) and see exactly why we got to certain results the way we did.

On top of that, we can unit test it. We can plug values into this inputs area and put those in a unit test. All of that is just a way to make sure that everybody's confident that we got the rules correct, and we do that by having an interactive session with the people who know the rules, so that they can understand what we've implemented and then put in their own test cases.

The next thing you might ask is, okay, that's great, but then how do we execute it? Well, we can use Morphir's tools to generate into different execution contexts. In this example, we're going to generate Spark so that we can execute on the Databricks Lakehouse platform, and by doing that we're handing off execution to a managed platform.

The other thing you might ask is, well, how do we integrate that? The answer is that we can take the information about the structure of the assets that needs to be handed to this calculation in order for it to run, and hand that to an integration technology like Legend. This, by the way, is the structure of the asset. We can take that information, hand it to Legend, and then let Legend pull that data in from on-prem and put it into the Lakehouse for execution. That way we've got the entire ecosystem, all open source technologies, working together in a way that provides value to everybody. And with that, I will hand it to Ephraim.
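To give a feel for what that generated code might look like, here is a hypothetical sketch of Morphir's Spark output: the decision logic authored and unit-tested in the Morphir model becomes an ordinary DataFrame transformation. The object name, column names, and rules below are stand-ins, not the generator's actual output.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical shape of Morphir-generated Spark code: the modeled rules
// become a pure DataFrame-to-DataFrame transformation that any Spark
// runtime (here, Databricks) can execute.
object LcrRulesSpark {

  // Bucket each asset based on its attributes, mirroring the decision
  // logic defined in the Morphir model (names and rules are stand-ins)
  def categorizeAssets(assets: DataFrame): DataFrame =
    assets.withColumn(
      "bucket",
      when(col("assetType") === "sovereign", lit("Level1"))
        .when(col("rating") === "AAA", lit("Level2A"))
        .otherwise(lit("Level2B"))
    )
}
```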
Sorry. Thanks. Hi, everyone. Thanks for having me. My name is Ephraim Stanley. I work for Goldman Sachs, and I'm also a contributor on the Legend project. Today I want to give you an overview of the Legend data platform and walk you through a real demo of building a data access API that we call a Legend service.

Legend is a data modeling and data management platform. A data model is a systematic way by which you describe the logical shape of data. Data models are super important because they allow multiple parties, whether they are users within the same organization or even across multiple organizations, to have the same consistent view of data. What Legend lets us do is access the data that's in different physical databases through the lens of a logical model. The way we do that is by creating a mapping between the logical shape of data and the physical shape. Using this mapping, Legend can take a logical query, convert it into a physical query, run it against the data, and get you the data back. This is, again, super important because no matter where your data is, irrespective of how it's laid out on disk, you get a consistent view. And today, thanks to our partnership and collaboration, and Antoine's personal contribution, Legend now supports data in Databricks using a JDBC connector.

With that, I'm going to have Antoine help me with a live demo. What you're seeing right now is an application that we call Legend Studio, which is a very interactive, rich data modeling environment. You can do many things with Studio: you can define a data model, you can manage the lifecycle of the data model through its integration with a Git back end like GitLab, and you can also execute data access queries. Let's start by looking at a data model. Here we have a very simple data model of an asset, which is modeled as a class with a bunch of attributes.

Now we move on to a mapping. On the left side we have the logical shape of the data, and on the right side we have the physical model, which in this case happens to be a relational model with a database table. In the middle you have all the mapping rules. In the simplest case you can take an attribute like business line and map it one-to-one to a database table column; in a more complex case, for example maturity date, you can map it to a SQL join.

With this mapping in place, we move on to what we call a connection, or a runtime. A connection is where data lives; in this case we're saying our data is in a Databricks cluster, and you can authenticate with one or more supported authentication schemes.

With all of this in place, we can then define a service. So let's look at the definition of a service. The job of a service is to define the data that you want; in this case we're saying we want to project certain attributes of the asset class. But to execute this query, we need to pair it with the mapping that we just described, and we also need to point it to where the data is, which in this case is a Databricks cluster. Now, with all this set up, we'll see the magic happen when Antoine executes the query. We'll just blame the Wi-Fi here. Yeah. What Legend is doing here is taking the logical query, converting it into a SQL query, running it against the database, and giving you the data back. Maybe the Wi-Fi, but...
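Conceptually, what Legend does for a Databricks runtime can be sketched as follows: translate the logical projection over the Asset class into SQL and execute it over JDBC. The connection URL format, credentials, table, and generated SQL below are illustrative assumptions, not output captured from Legend.

```scala
import java.sql.DriverManager

// Conceptual sketch only: Legend translates the logical query over the
// Asset class into SQL and runs it against Databricks over JDBC.
// The URL format, credentials, table, and SQL are illustrative assumptions.
object LegendJdbcSketch {
  def main(args: Array[String]): Unit = {
    val url  = "jdbc:databricks://<workspace-host>:443;httpPath=<http-path>"
    val conn = DriverManager.getConnection(url, "token", sys.env("DATABRICKS_TOKEN"))
    try {
      // A logical projection such as Asset.all()->project(...) becomes a
      // physical query against the mapped table and columns
      val rs = conn.createStatement().executeQuery(
        "SELECT business_line, maturity_date FROM lcr.assets")
      while (rs.next()) println(s"${rs.getString(1)}\t${rs.getDate(2)}")
    } finally conn.close()
  }
}
```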
So what we're seeing is that Stephen created almost an input data model for his job, right? He was expecting data to come in a specific shape, a specific structure, a specific schema. You defined that schema, Ephraim, as a target class, and we've mapped that physical model, that physical table, onto it. That means that if we can pass that data, that information, to a Morphir application, the rest becomes seamless. I think that really leads to what we wanted to get at: what could be the glue in between? How do we pass and integrate all those different open source initiatives and make sure that the schema of one can be fed into the protocol expected by a different service, a different system? And with that, Ashley, can you elaborate a little bit on the role of Delta Lake and the importance of Delta Lake as an open source technology, as we are seeing in financial services?

Yeah, okay. So, everyone (oh, I'm too short for this), my name is Ashley Trainer. I'm a senior solution architect here at Databricks, and before I get into Delta I first want to give a little background on Databricks. Databricks is a data and AI company. We were founded by the creators of the open source tools Spark, Delta, and MLflow, and the Databricks Lakehouse platform really aims to bring the best of a data warehouse together with the best of a data lake: the reliability, the performance, the strong governance of a warehouse, but with the flexibility, the openness, and the machine learning support of a data lake. Today we have over a thousand customers in financial services who are using the Databricks Lakehouse to power their data use cases.

The foundation of the Databricks Lakehouse is Delta Lake. Delta Lake is an open source table format that brings reliability and performance to data that's directly in your data lake, instead of putting it into a data warehouse. There are over a hundred and ninety contributors from 70 different organizations, and as of this summer Databricks announced that all of the Delta Lake APIs are going to be open sourced, to make sure that no matter who's using Delta and where you're using it, you're getting all of the functionality and performance that comes with Delta. That includes things like Z-ordering, change data capture, dynamic partition pruning, all that kind of stuff. And one other thing: Databricks also ensured that all things going forward will be open source in Delta Lake. It's that openness, the assurance that everything is going to be open in Delta, that really makes Delta an integral part of your strategic decisioning when you think about your tech stack and data moving forward. It's not "let's replace our old technology with Delta, let's bring in this open source tech"; it's actually harmonizing the technology that you've decided on by underpinning it with these open source technologies like Delta.

So what we're going to demo today, what we're hopefully going to show in a second if Antoine can switch the thing one more time, is bringing Legend modeling from Goldman Sachs together with Morphir's transformation of business logic into Spark as an engine to tie these two technologies together, to actually do the LCR. So let's see here. You scroll the other way than me, right? All right, so here we go. The first thing we want to show: the model that we saw earlier in Legend has been, okay, created into a JAR file, which has been added as a dependency to our Databricks Spark cluster.
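In the notebook, that JAR makes the Legend model directly queryable from Spark. The sketch below assumes the FINOS legend-delta Scala API; the package, loader, and method names are a best-effort reading of that project and may differ across versions, and the service name is hypothetical.

```scala
// Assumes the FINOS legend-delta Scala API; package, loader, and method
// names are best-effort readings of the project and may differ by version.
import org.finos.legend.spark.LegendClasspathLoader

// Load the Legend model packaged into the JAR attached to the cluster
val legend = LegendClasspathLoader.loadResources()

// Inspect the entities and services shipped with the model
legend.getEntityNames.foreach(println)

// Execute the Legend service: the logical query is translated into Spark
// SQL over Delta tables and returned as a DataFrame shaped exactly as the
// Morphir transformation expects its input (service name is hypothetical)
val inflows = legend.query("lcr::services::InflowsWithBuckets")
inflows.show()
```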
Oh, wonderful. What you can actually see here is that we have all of the entities of this Legend model available to us in this Databricks notebook, which has full access to a Databricks Spark cluster. We see here the entities, the services, the functions that were predefined, and these are all going to be powered by Spark and Delta on the back end. This allows us to do things like create data models as Delta tables and execute queries, all powered by Spark.

We'll see here, this is the actual service that we were looking at earlier. Really simply and programmatically, we can create this Spark DataFrame called inflows by executing this Legend query to get inflows with buckets. Under the hood, Spark jobs are kicking off, we're reading data from those Delta tables, and we're bringing back a DataFrame that complies with the input that's required for our Morphir transformations. But we haven't actually had to write any of that as a user; that's all been abstracted away in this Legend model.

Now we have that DataFrame, and we're going to pass it off to Morphir, so Morphir can take that business logic that we saw earlier and transform it into Scala Spark code that we can run on a Spark engine. That's exactly what's happening here: we've compiled that Spark code into a second JAR that we've added to this cluster as well. So with a simple line of code we're able to transform that initial DataFrame into our final value calculations here, again with no joins, no data sourcing, no understanding of the logic, simply applying this model to the DataFrame.

Now that we have our results, we want to persist this report somewhere, so we go ahead and persist it into a Delta table, and that gives us a lot of additional functionality. One of the things that Delta does really well is give us full DML and DDL support on top of what are really compressed Parquet files. We're able to do things like evolve this report schema over time, and add access controls on top of this report. One other thing that's really interesting is that Delta has this concept we call time travel, which is basically the ability, every time we update or overwrite or delete this table, to record that transaction as a separate version of the Delta table. What we see here when we describe this table is that it actually has three versions. The most up-to-date version is whatever we wrote last, but if for some reason we need to audit this report as of a different point in time, either because it was updated or maybe deleted, we can go ahead and query it really simply using either a timestamp or a version number, just like this, in SQL. So we can see this report here as of December 2nd, and we can also query it by the version number itself.
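For reference, the time travel queries look roughly like this in standard Delta Lake SQL, run here through Scala; the table name, timestamp, and version number are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Full version history recorded in the Delta transaction log
spark.sql("DESCRIBE HISTORY lcr.report").show()

// The report exactly as it stood on December 2nd (illustrative date)
spark.sql("SELECT * FROM lcr.report TIMESTAMP AS OF '2022-12-02'").show()

// Or pinned to an explicit version number
spark.sql("SELECT * FROM lcr.report VERSION AS OF 1").show()
```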
The last thing I'll say about putting this report into Delta is that it also gives us the ability to use another open source tool, Delta Sharing, to provide this report to other users who may need access to it, without having to copy the report to another location or maintain data pipelines to deliver it, with the potential for data staleness and all the other bad things that come with data delivery. Delta Sharing is an open source tool that Databricks open sourced in 2021. It is an open protocol for data sharing that allows for real-time sharing of even large data sets without ever having to copy them or move them to another location. The really powerful thing about Delta Sharing being open source is that it can be integrated into all of the tools you see in your toolkit today: tools like Spark, pandas, Power BI, Java, and of course Databricks all have integrations with Delta Sharing. So without copying this report at all, we're able to create a recipient, create a share, put this table right in that Delta share, and then grant them access to this data without ever taking it, in this case, out of our S3 bucket.

Yeah, so this is really, really great, because first what we've seen is that full reproducibility aspect of Delta, enforcing the reproducibility of your data. And because we version controlled all the Morphir logic, unit tested that logic, and can compile and automatically deploy that logic into your underlying environment, and the same with Legend, we have all the evidence for the reusability, the reproducibility, and the audit requirements from a code, a data, and a data model perspective. So we are in a fully comfortable position to know exactly what that report was at that specific point in time. What Ashley just showed is really critical to enable that last mile, that delivery, whether it's from the back office to the front office, or from one organization to other organizations elsewhere, regardless of the underlying technology. You mentioned Excel, which is a good example. When people are using Excel on a day-to-day basis, to use that information we would usually be sending that data over: someone would be responsible for bringing that data, loading it into a database, and then exporting it to Excel. Here we're not even transferring that information; we're pointing Excel directly at where the data sits, with that specific version that we agreed on.

So I think that concludes the demo that we wanted to do. Again, coming back to T+1 settlement, it's really about automation, and automation means protocols, standards, and interoperability of code, of data, of models. Those capabilities are generic enough to adapt to all those different environments and different systems that you may have, and this is what will drive that automation.

We want to thank you all for your time today. If you have any questions, please feel free to visit us downstairs at the booth; we will answer any question that you may have. If you want more information about all those great initiatives, I think the best bet is to go to the FINOS website and the FINOS GitHub organization, where you'll find Legend, you will find Morphir, and you will find that legend-delta contribution that we announced last year as a labs project and that is now a top-level FINOS project, and you will see all those different announcements and partnerships moving forward on those models and calculations. Thank you very much for your time today.

Fantastic panel.