Hello, thank you all for coming. The title of this talk is Data Platforms Made Easy: an introduction to the TRAC data and analytics platform, and it falls into those two sections. We'll start by looking theoretically at how to build an analytic process in a controlled environment like a bank, and a landscape of those processes, while eliminating a lot of the complexity and challenges we typically see in that space. Then we'll have a look at TRAC, which is an analytic platform built on that pattern.

First, a quick bit of background about TRAC and how it came to be. It started as a project for a client who wanted a new risk platform, one of the UK's major retail banks, and for their first use case they put about 300 billion of assets onto the platform. We very quickly realised that the platform would have a much broader applicability, both within the bank and across the industry, potentially other industries as well. So we decided to make an open source version of the platform, which we did, releasing it on GitHub under Accenture's GitHub organisation. Last year at the FINOS conference we started talking to FINOS about coming into the foundation; we were very happy to become a contributor at the start of this year, and here we are today. So that's the history.

Now let's look at the theory of how we're going to build a platform. I'd like to quickly run through what an analytic process looks like, so we know what we're trying to build. On the left here you can see a bank. Maybe they've got some retail and investment products, and they're collecting data from the operation of all those products into a number of collated data stores. There might also be market data, scenario data, reference data. We don't really care where all that's coming from; we just need to know we've got some data we can use, and we want to produce some outputs that go to various people: regulators, the board, analysts, whoever. What we really want to do is fill in that gap in the middle in a very general way, where we just say we've got some inputs, we do some analytics, and we produce some outputs. For all the complexity we see in that space, at a fundamental level that's really all we're doing.

So let's start filling it in. We begin with some models. We're not saying what these are, just that we've got some models, each with inputs, outputs and parameters, and we're connecting them together in some kind of flow or data pipeline. Again, we don't say what that looks like, just that models feed into each other until we get the result we want. Then we've got some data. This could be results that are produced, or intermediate data; we're not saying what it is or what shape it is, just that we've got a number of datasets, each with a schema, and we can have as many of them as we want, in different versions. We're really free to use whatever we want. Then we've got data preparation. This is sometimes thought of separately, and it might be operated separately, but essentially it's just another set of models and flows connected together, with the difference that the inputs are the collated data sources coming from the bank's operational systems.
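To make that model idea concrete before moving on, here is a minimal sketch in Python of a model described purely by its inputs, outputs and parameters. The names (ModelSpec, FieldSchema, the mortgage example) are hypothetical illustrations of the concept, not TRAC's actual API.

```python
from dataclasses import dataclass, field

# Illustrative only: a model described purely by metadata, as in the talk.
# Names like ModelSpec and FieldSchema are hypothetical, not the TRAC API.

@dataclass(frozen=True)
class FieldSchema:
    name: str
    type: str          # e.g. "string", "decimal", "date"

@dataclass(frozen=True)
class ModelSpec:
    name: str
    version: str
    inputs: dict[str, list[FieldSchema]] = field(default_factory=dict)
    outputs: dict[str, list[FieldSchema]] = field(default_factory=dict)
    parameters: dict[str, str] = field(default_factory=dict)  # name -> type

# A hypothetical impairment model: the platform needs nothing more than
# this description to wire the model into a flow.
pd_model = ModelSpec(
    name="mortgage_pd_model",
    version="1.2.0",
    inputs={"mortgage_book": [FieldSchema("account_id", "string"),
                              FieldSchema("balance", "decimal")]},
    outputs={"pd_estimates": [FieldSchema("account_id", "string"),
                              FieldSchema("pd", "decimal")]},
    parameters={"reporting_date": "date", "scenario": "string"},
)
```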
Then we've got a human element: people looking at this process. Maybe they're looking at what's coming out of the models and deciding if it makes sense; maybe they want to try a different model, apply an adjustment, change a parameter set, or run a different scenario. Those adjustments get fed back into the pipeline, so they become part of the overall process. We've got a governance aspect: we need to know what we're running, we need to know it's compliant with any bank policies, and we need to be able to attest to the regulator that it's been run. If there are any boxes we need to check, we make sure those are checked, that all our models are in governance and have been reviewed, and so on; then we can get a policy seal of approval to say this is all good. Finally, we need some sort of instruction that says: of all these elements, this is the set I want to run to produce the final numbers. If we take those elements, we've got an analytic business process. The key thing here is that at no point have we said what the shape of the process is, in terms of the shape of the flow, the shape of the data, or the particular models being used. It's all built from general components that could be anything.

So now that we know what we're trying to build, can we break it down into pieces and put it back together in a way that lets us build different processes without introducing a whole load of complexity? Let's start with what the pieces are. Those are all the elements, essentially, that went into the process, and we want users of the system to be able to add them whenever they want. If they want to add a new process they can say: well, I've got some of the data, I want to add some more, put in some new models, connect it up in a flow, maybe define a UI, and there you go. The problem with letting people do that, of course, is that you end up with a big mess very quickly. So we want to contain these elements to stop that mess from occurring, and for that we introduce the concept of a box. A box has two key properties: each box is self-contained, and whatever we put into the box is what we get back out later on, so the box is immutable in that sense.

We need a way of boxing each of these individual elements, and the way we do that is with a combination of storage and metadata. Let's run through them quickly. For data, we have some immutable storage where we store the data, probably in something like Parquet files, and we also have a schema, so for each dataset we know what its schema is and where to find it. We know we put it there and it's not going to change. The same goes for models. Models generally live in a repository, which could be something like GitHub if it's source code or a Nexus if it's built binaries. Both are very helpful because they're immutable and carry their whole version history. So we've got the immutability, and then all we need is some metadata that says, for each model, what its inputs, outputs and parameters are, and that allows us to use the model. We need that metadata as well.
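As a rough sketch of the boxing idea, the snippet below pairs an immutable storage reference with a schema and a version. All the names and the storage path are invented for illustration; the point is only the storage-plus-metadata combination described above.

```python
from dataclasses import dataclass

# Hypothetical sketch: a "boxed" dataset is an immutable storage location
# plus the metadata needed to use it. Once written, neither the file nor
# the metadata record ever changes; new data means a new box.

@dataclass(frozen=True)            # frozen mirrors the immutability property
class BoxedDataset:
    object_id: str                 # unique, never reused
    version: int
    storage_path: str              # e.g. a Parquet file in an object store
    schema: dict[str, str]         # column name -> type

mortgage_book_v1 = BoxedDataset(
    object_id="ds-mortgage-book",
    version=1,
    storage_path="s3://data/mortgage_book/v1/part-0.parquet",  # illustrative
    schema={"account_id": "string", "balance": "decimal"},
)

# Reading the box back later returns exactly what was put in,
# because nothing is ever updated in place.
```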
From there it gets a bit easier. A flow is really just a description of a data pipeline, and a description can be expressed as metadata. We don't need physical connections; we just have some metadata that says what the shape of the flow is. A job is just an instruction to run a flow with some models and data, so again that's just a piece of metadata. An overlay can be a change to a piece of data, or it can be a small model if you're doing something like an apportionment. We already know how to put both data and models into a box, so we can put overlays into a box, and we just add a piece of metadata that says how the overlay is applied. A policy is just metadata as well: rules, conditions, requirements for attestations and so on, together with information about how those are fulfilled. For the UI, we don't really box up the UI itself, but we can use metadata to describe what a UI needs to display and drive individual screens off that metadata. All of that allows us to put each of these pieces into a box in such a way that we can stack them up, index them, use them later when we need them, and plug them together to create various processes.

To make it work, though, we need a little bit of technology support, so we introduce some capabilities. The first is a metadata index. This kind of metadata might live in various different places, such as model repositories, but we really want it all in one place where we can easily search across it and bring together all the metadata we need for a particular operation, without going out to lots of different services, which can take some time. So we put it all in one place and collect up the metadata from all of these items; some of them are actually just metadata, so for those we need nothing else at all. We need a storage capability: some way of abstracting data storage so we can get data in the format we need, when we need it. The model repositories probably already exist, so we can just use those, but we need storage for both data and models. We need some method of execution: something that actually takes all the pieces for a particular process and runs the whole process to produce a result. We need orchestration, which says what to run and when. And we need some form of policing: the application of policies where we say they're needed. Those are the capabilities we need to take these boxes, put them together into a process, and produce results in a controlled way.

What you'll notice is that we've now broken the problem down into a technology component and a business component. All of the services along the top are pure technology components that can operate on any models and data, so we don't need to change them when we want to extend a process. The things on the left can all be business owned and don't require any new technology capabilities when you create them. That creates a very clear separation of responsibility: technology looks after general capabilities with no business content, as it were, while business components can be created, maintained and added to the platform by modelling teams or business users without really having a technology impact. And that is very helpful in terms of delivering things quickly.
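Returning to the flow and job objects: both can be plain metadata records. The sketch below shows one hypothetical shape for them, in which a flow describes only the pipeline's nodes and edges, while a job pins every node to a specific boxed object version. None of these structures is TRAC's real schema.

```python
# Hypothetical metadata shapes for a flow and a job, as plain dictionaries.
# A flow only describes the pipeline; a job pins the flow's abstract nodes
# to specific immutable object versions for the execution service to fetch.

flow = {
    "object_type": "FLOW",
    "nodes": {
        "mortgage_book": {"type": "INPUT"},
        "pd_model":      {"type": "MODEL"},
        "lgd_model":     {"type": "MODEL"},
        "impairment":    {"type": "OUTPUT"},
    },
    "edges": [
        ("mortgage_book", "pd_model"),
        ("mortgage_book", "lgd_model"),
        ("pd_model", "impairment"),
        ("lgd_model", "impairment"),
    ],
}

job = {
    "object_type": "JOB",
    "flow": "flow-mortgage-impairment:v3",
    "models": {                       # node name -> boxed model version
        "pd_model":  "model-pd:v7",
        "lgd_model": "model-lgd:v2",
    },
    "inputs": {"mortgage_book": "ds-mortgage-book:v42"},
    "parameters": {"reporting_date": "2023-03-31", "scenario": "base"},
}
```

Because the job records exact versions rather than "latest", the same metadata record is enough to reproduce the run later.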
The last thing I wanted to run through is what happens when we actually execute something in the platform. We start by sending an instruction saying what we want to do, and we record that instruction so we've got a record of it: we can recreate it later and refer back to it. Then we go and get all of the metadata we need for that particular operation, and we check it against the policy service to make sure everything is compliant, if it's an in-governance operation. Then we come to execution. What the execution service does is fetch all the boxes it needs from storage and put them together into one process that gets executed. We're not running a series of individual processes or containers for each individual step; we bring those models all into one process and run the whole thing end to end, according to the instruction we were given. That produces some data, so we put it into our storage service, record some information about it, and then it's available for people to use. And that is kind of it, so I'm going to hand over to Greg to show you what that looks like in practice.

Thank you very much, Martin. Hopefully I'm going to smoothly transition over to a demo of what TRAC would actually look like for an end user. You can't see my screen yet, so let's see if that's going to come up. Before that happens: Martin has covered the core concepts of TRAC. You're running five microservices, and TRAC has seven different object types; fundamentally, once those objects are loaded up into TRAC they become immutable, and what we're able to do then is use those microservices to build applications for various organisations. This is the example which I'm going to show you now. Hopefully show you now; that's on a different screen, this will be novel. Okay, so I've got an example of a web application which I'm running from my local machine, TRAC is running in the background, and I'm going to use the remaining time to show you a little example of what TRAC is capable of doing.

One of the things you're obviously going to be able to do is run a process, and here we're talking about the highly regulated models and reporting that you would find in a typical bank or financial institution. Organisations like that have to have data lineage for all of the datasets they report on; they have to be BCBS 239 compliant; they have to be able to demonstrate that they used the right models for the calculation and applied the right overlays to them; and they need to understand not just the model execution piece but the overall end-to-end process. Typically banks find that very difficult to do. What they end up with is a long chain of handoffs between different teams in the organisation, each owning their own separate EUCs, with a set of manual controls imposed on top of that process, making it very disjointed. Because TRAC has immutable objects, things that once loaded don't change, and because all those things have metadata associated with them, it can build all of those types of controls out of the box for end users running models end to end.

Typically a bank will want to categorise each of its datasets, models and processes with business language. Say I'm trying to run a mortgage model and I want to calculate the impairment on that mortgage: we can use the TRAC metadata model to catalogue all of the different objects that we load up.
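As a rough illustration of that cataloguing idea, the sketch below attaches business-language tags to objects and searches across them. The tag names and the search function are stand-ins for TRAC's metadata and search services, not the real interfaces.

```python
# Hypothetical sketch of a business-language catalogue: every object carries
# free-form tags, and a search is just a match over those tags.

catalogue = [
    {"object_id": "flow-impairment:v3", "object_type": "FLOW",
     "tags": {"division": "personal_customers", "portfolio": "mortgages",
              "model_type": "impairment"}},
    {"object_id": "model-pd:v7", "object_type": "MODEL",
     "tags": {"portfolio": "mortgages", "model_type": "impairment"}},
    {"object_id": "ds-mortgage-book:v42", "object_type": "DATA",
     "tags": {"division": "personal_customers", "portfolio": "mortgages"}},
]

def search(catalogue, **criteria):
    """Return every object whose tags match all of the given criteria."""
    return [obj for obj in catalogue
            if all(obj["tags"].get(k) == v for k, v in criteria.items())]

# Find everything associated with calculating mortgage impairments.
for obj in search(catalogue, portfolio="mortgages", model_type="impairment"):
    print(obj["object_id"])
```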
Because it all sits behind a search API, that means we can search for things associated with the mortgage model and with calculating impairment. So up here we've got a simple example where a bank has different divisions, and we can pick Personal Customers; within that it has a range of different portfolios, so I'll pick Mortgages; and there are different model types, so you might have impairment, capital, whatever regulatory models you've got. I can then go and pick an individual flow, which is one of the object types Martin talked about, and say I want to run this process.

Now, unlike traditional banking systems, where there is a production environment and one version of the model to run, all the peripheral processes a bank has to run, like model calibration, model reporting and model monitoring, can be put together into one single TRAC environment for different uses. So I've got an example where I'm saying I want to run this calculation process. Because TRAC has all the metadata it requires to understand exactly what this process is, we can look at the metadata associated with this flow. Some of that metadata is added automatically by TRAC, the who, where and when, and some of it is added by the user when they created the object. You can see here I've got a business segment saying this is a flow for calculating mortgage impairments; users can add their own descriptions and give titles. The metadata model is open-ended, so we can add anything we like into it.

But we can also use the metadata to map out the process itself. The information about this flow says what datasets it needs, what parameters it has and what models there are to run, so we can actually map it automatically. I used to work in a retail bank, and when audit came along, or the model governance team said demonstrate to us that you understand how your model has been implemented in our regulatory return, you would have to send off an analyst or a senior analyst and say: please can you go and look at what we actually do, put it all together for us, and create a document that says these are all the data dictionaries involved in this process, this is where they sit, these are all the calculations we get out, and this is where we use them. With one click of a button, TRAC can provide you with all of that information automatically, and here's an example of that, where we're mapping out the process flow that we're going to run. I won't go into too much detail due to time, but you can see in here, if you click on individual models, the inputs, the outputs and the parameters needed by each model.

So I picked this flow, and I can go and look at information about it if I want to understand more about what I'm running. You saw in that chart there were four separate models; those were the black boxes. From that flow, the user interface has said: I'm going to go and search for all the datasets I know about that can be used as inputs into this process, and I'm going to search TRAC for all the models that are possible candidates to run when you execute. It has built me a little dashboard that allows me to select exactly what I would like to use. Here it's got these four boxes for the four models; I can click on one of them and look at the metadata for that model.
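That automatic mapping can be pictured as a simple walk over the flow metadata. The sketch below derives a lineage summary from the hypothetical flow structure used in the earlier job example; it is only meant to show that flow metadata alone is enough to reconstruct the end-to-end picture that analysts used to assemble by hand.

```python
# Hypothetical sketch: derive a lineage summary by walking flow metadata.
# Expects the illustrative "flow" dictionary from the earlier sketch.

def lineage_report(flow):
    """List each dataset and model in the flow and how they connect."""
    inputs = [n for n, s in flow["nodes"].items() if s["type"] == "INPUT"]
    models = [n for n, s in flow["nodes"].items() if s["type"] == "MODEL"]
    outputs = [n for n, s in flow["nodes"].items() if s["type"] == "OUTPUT"]

    print("Input datasets:", ", ".join(inputs))
    print("Models:", ", ".join(models))
    print("Outputs:", ", ".join(outputs))
    for src, dst in flow["edges"]:
        print(f"  {src} -> {dst}")

# lineage_report(flow)   # using the flow dict from the earlier sketch
```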
If I want to, I can also swap between model versions. On a traditional production system there is one version of the truth, one production model, and one consequence of that is that making a model change can be incredibly slow. But what organisations want to be able to do is load up champion and challenger models; they want to do what-if analysis; they want to understand waterfalls of how their financial position will change as they make data changes or code changes. So what we do in TRAC is load candidate or alternative models into the production environment, using the same data sources as the production versions, and I can configure my process to run using the in-production, in-governance models, or I can swap in something I'm working on to see exactly what the impact would be.

Similarly, below that it has gone through all the models I've selected, looked at what parameters they have, and built a little UI that says: if you want to set all these parameters, go ahead. It has also found a range of datasets for me to use. Some datasets, say your mortgage accounts at the end of the month, might come across as a batch process loaded by a support team; but there are also smaller datasets that the modellers tend to be in charge of, where they're setting model parameters or economic scenarios, for example, and we have the ability to load those up through the user interface. Once you've done that, your model parameters become available for you to select.

Once you've done all of that, you can go through and look at a summary of everything you've selected, just to check you're happy. Here's an example where it takes all the different model parameters and confirms what value has been set; I can go back and look at the individual models I've selected and all the information about them, and so on, and I can check the datasets before I click go. Now, you might notice there's an object ID here. That's the core currency, if you like, of the metadata model: every model that's loaded up, every object that's loaded up, is given a unique ID, and that ID will only ever refer to one immutable object. So if you want to be absolutely certain exactly what this model is, I can click through to find out more information about it and see the metadata for this individual model, but I can also go and look at the actual commit of the code that the model corresponds to, and that object ID will always point to this exact commit of this exact piece of code.

I'll just show you the bottom of the screen: I can look at the datasets that are going in, and then you can allow the user to set their own metadata before they click go on the job. If you want to say, this is a test run, this is me checking what happens if my new parameters are used, the user can just enter that; it's attached to the job that's run, and if you view that job in the user interface later you'll see all the metadata, who ran it, and what the reason for running it was. And if you click run, off it goes. So that's a brief overview of how you would actually use TRAC and what some of the benefits are of using it. But I wanted to use the last few minutes, if I can, to show you some of the tricks, some of the deeper possibilities.
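The object ID mechanism can be sketched as a tiny append-only store: an ID plus version resolves to exactly one record, forever, which is also what makes the historical re-runs shown next possible. The store and its records here are illustrative, not TRAC's implementation.

```python
# Hypothetical sketch: an object ID + version resolves to exactly one
# immutable record, forever. New versions append; nothing is overwritten.

store = {}   # (object_id, version) -> record

def save(object_id, version, record):
    key = (object_id, version)
    if key in store:
        raise ValueError("immutable: this id/version already exists")
    store[key] = record

def resolve(object_id, version):
    """Always returns the same record for the same id and version."""
    return store[(object_id, version)]

# A model object pinned to an exact source commit (all values illustrative):
save("model-pd", 7, {"repo": "github.com/bank/models",
                     "commit": "3f9c2a1",
                     "entry_point": "pd_model.py"})

# Years later, resolving ("model-pd", 7) still yields that exact commit,
# which is why a job that recorded these IDs can be re-run identically.
print(resolve("model-pd", 7)["commit"])
```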
One of the things you can do, given that TRAC has your entire model history loaded up and that history can never change, and that all the datasets used in runs are always available and never change, is go and load up any job that has ever been run, look at its results, and re-run it at any point in the future, knowing you will get exactly the same result out. Here in the UK there's a regulatory requirement on banks that do stress testing to go back and re-run last year's stress test, then waterfall through the changes in book positions, the changes in model calibrations and the changes in the scenario. Banks find that requirement extremely onerous: being able to rewind your systems to what production looked like last year, and then step through the changes to demonstrate you meet the requirement, is extremely difficult. But in TRAC you can pick a job to view, go down to the bottom of the screen here, and click re-run job, and that has now selected the exact commits used for that job and the exact versions of the datasets, and set it up with exactly the same parameters. If I click go, I get the same result, which on its own may not be that useful; but if you've got a new scenario to step in, I can pick that dataset instead, and if you've got a new model that's now in governance, you can pick that and run it. So something that is traditionally quite onerous is actually very simple in TRAC, because of the immutability of all the different objects it holds.

Similarly, if you want to, you can also go back in time. TRAC has every version of everything that's ever been loaded up, so you can reset the system to look as it would have done at any point in history. If a bank has an operational event and wants to understand the root cause of why that event happened, one of the things you can do is go and look at what the user saw at the particular point they made a mistake, or where a particular model wasn't loaded up in the right way. That is perfectly possible within TRAC today.

The final thing I wanted to show is that we can integrate the user interface with the GitHub repository we're using here for model development and as the model repository. I can allow users, the front-line risk modellers, to access their GitHub repositories, pick particular branches, and pick particular commits within those branches if necessary, to select the individual files or pieces of code they want to run. I can set the metadata associated with that model right here and load it up into my production environment, and as you saw earlier, I can load that up as an alternative version of the model to run. So there's an example of where the boundary between IT and model developers in a bank can fundamentally change: you can allow modellers to do something that traditionally sits with a back-end support team, loading up new models, and shorten the time it takes to see what their impact will be for a financial institution. I'm out of time, but hopefully that was a whistle-stop tour of the TRAC back end and the TRAC front end. Thank you very much.