Thanks everyone for coming. I'm Richard, the head of data engineering at Dojo, so I'm responsible for setting the technical and strategic direction of how we process and manage data at scale.
Hi, I'm Sandeep, a data platform engineering lead. I'm responsible for architecting and executing the data mesh architecture by building self-serve data platforms at Dojo.
So, you probably haven't heard of Dojo, but we're one of the largest fintechs in Europe by net revenue. We power around 1 in 10 face-to-face transactions in the UK every day, mainly for the experience economy. These are bars, pubs, restaurants, maybe even your corner shop. We also have a consumer-facing app that allows you to virtually join the queue for high-street restaurants, things like that, from up to two kilometres away. Dojo was born around two years ago. It marked the move as a business away from being an independent sales organisation. [unintelligible] ... because it will help frame some of the challenges that we've had to overcome on the data platform side. So the first thing that happens when you tap your card on the card machine is that it takes details from the card and sends these to the authorization gateway. This is something that we have running in the public cloud. The details are then decrypted by a hardware security module using keys provided by the card schemes.
[unintelligible] ... this complexity and transformed these files into a consistent file format. We decided on Avro for various reasons that we're going to go into in a minute. So Sandy is now going to go through the platform offering that we got to and various other things that we had to consider.
Thanks, Richard. So, platform offering: platform as a service was the best way forward to solve this problem. So we built a file processing platform which is self-serve, fault-tolerant, and highly scalable. We divided the platform into four components. The first one is connectors, which are responsible for getting the data from multiple different external data points. Then we have the PCI platform, which is responsible for masking the credit card information and then sending that data to the non-PCI platform, which is responsible for validating that data, doing the transformation, and generating the chunked Avro files. Then we have destinations, which stream the data into a data warehouse like Snowflake or Google BigQuery, or any other external cloud storage. Let's jump right into connectors. The main piece of software we are running in connectors is Rclone. It's open source software which is really, really good at syncing files between a source and a destination. Most of our workload is files, and that's why it works really well for us. We also built a custom webhook platform, which receives data posted via webhooks. This is also deployed on Kubernetes. The third one is serverless functions. This part gets the data from APIs. Sometimes we have to download reports from an external vendor's API and process them multiple times a day, so these functions do that. We also have a vendor which sends the data as Gmail attachments.
So we actually have to crawl that Gmail account, go to the attachment, download it, and send it to the object storage. We use these connectors to send the data to the PCI platform and the non-PCI platform. But before jumping into the PCI platform, let's talk a bit about PCI compliance. PCI compliance is a set of standards that exist to protect a cardholder's credit card information. When you're building a PCI platform, PCI compliance brings its own challenges. There were three key points that we had to look into really carefully while we were designing the PCI platform. The first was that the credit card information has to be transmitted over a secure channel. That means it has to be encrypted in transit and the channel must not be public. The second was strong encryption. You cannot store clear PAN (primary account number) information just like that. It has to be encrypted or masked. And the third was that the platform has to be audited every year for security reasons, and you have to comply with those standards and make sure that your platform keeps running in line with them. So we created an application which processes the input file and masks the credit card information, and the application workflow goes something like this. You get a file, you decrypt the file, and then you encrypt the file with your own keys and put it into archives, which is for auditing purposes. Next, files can also arrive in a zip. So we can have 10,000 files in a zip file, or we can have just plain files. A zip file might contain both PCI and non-PCI files together. So we open the zip file and check whether each file contains PANs or not. If it contains PANs, we do the masking. If it doesn't contain PANs, we send it straight to non-PCI. And at the end of the day, the file is output into a non-PCI bucket. Now, this is how it looks end to end.
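As an illustration of the "does this file contain PANs, and if so mask them" step, here is a minimal Python sketch. It is not the masker service from the talk: the digit-run pattern, the Luhn checksum used to cut down false positives, and the "first six, last four" masking style are all assumptions chosen for the example, and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumption: a candidate PAN is a run of 13 to 19 digits with no separators.
PAN_PATTERN = re.compile(r"\b\d{13,19}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def mask_pan(pan: str) -> str:
    """Keep the first six and last four digits and replace the rest with '*'."""
    return pan[:6] + "*" * (len(pan) - 10) + pan[-4:]


def mask_line(line: str) -> Tuple[str, bool]:
    """Mask every Luhn-valid candidate in a line and report whether any was found."""
    found = False

    def replace(match: "re.Match[str]") -> str:
        nonlocal found
        candidate = match.group(0)
        if luhn_valid(candidate):
            found = True
            return mask_pan(candidate)
        return candidate  # digit run that is not a plausible PAN: leave untouched

    return PAN_PATTERN.sub(replace, line), found
```

A file for which `mask_line` never reports a match could be routed straight to the non-PCI side, as the talk describes; a real implementation would also need to handle PANs written with spaces or dashes.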
So we have deployed all of our applications and workloads on Kubernetes. On the connector side, we only use Rclone in this PCI scope, because most of our data comes from files and all of these files are in PCI scope. This runs as Rclone Kubernetes jobs, CronJobs, sorry. They run every five minutes or every two minutes, based on the need, and they sync these external data points, buckets, or SFTP servers to this object storage. The moment a file gets created here, a file creation event goes into the queue. In our case, we are deployed on GCP, Google Cloud, so we use Pub/Sub as the queue, and all the file creation events get appended to it. Then we have deployed our masker service, which is the PCI service we just talked about. It is subscribed to this queue and reads one event at a time. The event is a file creation event carrying metadata such as the file name and where the file is. The service then opens the file, performs the masking, and sends the masked file to the object storage. It also archives the file. As you can see, we also use HPA here because we have very strict SLAs. Sometimes we have something like 250,000 files at a given point in time, and we have to process them as soon as possible. So for that, we use horizontal pod autoscaling and cluster autoscaling hand in hand to scale the platform. And scaling comes with its own challenges. One of the biggest challenges, and it sounds like a very small thing, was that it was very, very difficult to get to a point where we could decide: what should the requested amount of a resource like CPU and memory be? What is the maximum we should give to a pod? How many pods do we have to run to process, say, 250,000 files? And what should the cluster capacity be?
How many nodes do we have to run? We cannot just leave everything running all the time. So we had to do a lot of trial and error here to settle on the autoscaling configuration. A bit more about autoscaling: there are two types. One is vertical, and one is horizontal. Vertical autoscaling is when you just add more resources to a pod, give more juice to the pod, saying: okay, if you are being throttled, I'll give you more CPU. If you need more memory, I'll give you more memory. That's vertical scaling. Horizontal scaling is adding more workers, more pods, within the same deployment. So if you want to increase the throughput, you can increase the number of pods based on the traffic, and they will all share the workload. We're using HPA. HPA can be triggered by three different types of metrics. The first is resource usage. For example, your CPU and memory are going high, the pod is using a lot of CPU and a lot of memory, and you say: above 90%, I want another pod in my deployment. So you can trigger it like that. You can have custom metrics from within Kubernetes, which can be anything. And you can have external custom metrics, such as a Pub/Sub queue. We're using the Pub/Sub queue to scale our platform. And this is how it looks end to end and how we have configured it. As you can see, this is again object storage. You're getting your events here, and then you have HPA configured. The way we have configured HPA is: the number of undelivered messages, that is, messages not yet consumed by the service or the deployment, divided by four equals the pods required. So it's easy, but we sometimes have 250,000 files to process and we cannot have thousands and thousands of pods running. So we also use a feature here to set a hard limit. For a particular deployment, you can have, let's say, 700 pods maximum.
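The scaling rule just described (undelivered messages divided by four, capped at a hard limit) can be written out as a small Python sketch. This only illustrates the arithmetic, not the actual HPA configuration; the function name is hypothetical, rounding up and the minimum of one pod are assumptions, and 700 is the example limit mentioned in the talk.

```python
import math


def desired_pods(undelivered_messages: int,
                 messages_per_pod: int = 4,
                 min_pods: int = 1,      # assumption: keep at least one pod running
                 max_pods: int = 700) -> int:
    """Pods wanted for the current backlog, clamped between min_pods and max_pods."""
    wanted = math.ceil(undelivered_messages / messages_per_pod)
    return max(min_pods, min(max_pods, wanted))


# A backlog of 250,000 files would ask for 62,500 pods, so the cap applies.
print(desired_pods(10))       # 3
print(desired_pods(250_000))  # 700
```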
So it can scale up to 700, but following this formula. So we are scaling the pods. We are asking Kubernetes to run more and more pods for us, but we also need to scale the cluster. We cannot keep a cluster running with 200 nodes, so the cluster also has to autoscale. We use the cluster autoscaler for that. The cluster autoscaler is a tool that automatically adjusts the size of a Kubernetes cluster, scaling up or down by adding or removing nodes based on two conditions. One is if there are pods in the pending state. As you can see, if it doesn't find a place for this pod on any node, it will spawn a new node and put the pod there. The other is if a node is being underutilized. Let's say there is only one pod running on a node: after some time it will evict that pod, place it on some other node, and delete the underutilized node. One very important thing here: if you have started your pod with very few resources, your cluster is suddenly at full capacity, and your running pod wants more memory or more CPU, the cluster autoscaler is not going to help you at that point. That's why you have to carefully configure the initial amount of memory and CPU when you're running your pods. We have talked about scaling and about the challenges of scaling the PCI platform, and now it's the non-PCI platform. The non-PCI platform uses almost all the connectors because it can process literally anything, any type of file. It is deployed on the same principles: object storage, file events, the same kind of deployment, the same HPA. The main job of the non-PCI platform is to translate these files into chunked Avro files with the help of the schema registry and configuration. Now Richard is going to explain in a bit more detail how we have set up that configuration to process and translate these files into Avro files.
Thanks, Sandy.
So as you were saying, we have to process a lot of files, and therefore we need a source of truth about the state of a file and exactly what it means. So the schema registry is a critical component of this entire architecture. In the PCI environment, all we were doing was purely masking the file in its native file format without any further transformation. That all changes when you get to the non-PCI environment, where there's a need to convert the file from its raw format into something that's consistent. As we said before, that's Avro. So at a high level, the schema registry contains metadata for the specific version of the file that's received by the platform and the instructions on how to process it. You could therefore say we're using the term schema registry in quite a loose way. It's also the config and the instructions, along with the schema for the output of that file. So it describes not only the file but the specific version of the file and where it sits in the lifecycle: how the schema evolves over time, any transformation that's required in flight, and, more importantly, when we expect to receive it. So it's our central source of truth on the end-to-end lifecycle, and it's used by multiple components of the platform. It gets exported out to Data Hub, it's used by the monitoring service, and it's also used by the destination connectors. So how does this work in practice? The first thing you have to appreciate is that, as Sandy was saying, we process over a quarter of a million files a day, but underneath that there are 450 to 500 different versions of those files in production at any one time. So we can't be sure that we're going to receive the same file every single day. As you can see here, we may receive version one, version two, version three on subsequent days, and that's fine.
But then we may receive version one the day after that, and that's a problem because it's a breaking change. Because we deal with multiple external providers, they all have differing maturity in regards to data engineering and the monitoring they have on their export processes. So it could be that they missed some data in a file, or it could be that it's been quite a while and they need to upsert some data, that type of thing. So we had to consider three things. Firstly, logic to determine the version of the schema, the version of the file that we're receiving, given that we have no control over the quality of the data that we're receiving. There's no consistent file naming taxonomy here, which means that the version is not specified in the file name. We need logic to look inside the file to work out exactly which version we are receiving. Second is the ability to do automatic schema migration and evolution. We need to ensure there's always backwards compatibility. This is important for our analytics engineering teams because it means that there are not 20 different versions of a table in the data warehouse. We keep it to as few as possible. Adding a column to a table, for example, doesn't result in a brand new table being generated that then has to be unioned with all previous versions in the data warehouse. That gets very expensive if it's not managed properly. And the third thing is very much the self-service nature. If you go back to the data mesh principles that we started off the session with, it really is critical to scaling this type of operation. We don't want data engineering expertise to be required just to make a simple change to an existing schema, or even to import a very simple new Excel file that's been received.
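The first two considerations, working out the version from the file contents and keeping schema changes backwards compatible, can be illustrated with a short Python sketch. Everything here is hypothetical: the talk does not say how versions are detected, so matching on the CSV header row is an assumption, as are the column names and the rule that a compatible change may only append columns.

```python
import csv
import io
from typing import Dict, List, Optional

# Hypothetical registry content: schema version -> expected header columns.
REGISTERED_HEADERS: Dict[int, List[str]] = {
    1: ["merchant_id", "amount"],
    2: ["merchant_id", "amount", "currency"],
    3: ["merchant_id", "amount", "currency", "settled_at"],
}


def detect_version(file_text: str) -> Optional[int]:
    """Look inside the file: return the version whose header matches, else None."""
    header = next(csv.reader(io.StringIO(file_text)), [])
    for version, columns in REGISTERED_HEADERS.items():
        if header == columns:
            return version
    return None


def is_additive_change(old_columns: List[str], new_columns: List[str]) -> bool:
    """True when the new schema keeps every old column in place and only appends."""
    return new_columns[:len(old_columns)] == old_columns
```

With a check like `is_additive_change`, version three can be loaded into the same warehouse table as versions one and two, with the missing columns left empty for older files, instead of producing a new table per version.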
So the schema registry is, in essence, just a collection of JSON and YAML files with a FastAPI front end. The fact that the schema registry is completely isolated from the core platform code base allows new versions of files to be processed without having to modify that core code base. And it means that we can open it up a bit wider. Going back to the data mesh principles, this also touches on the federated computational governance concerns, and it shows that we're providing that capability to teams across the entire business. So, jumping into the file, what do we have here? The first section is very much metadata. You have the schema version, the schema name, and a description: what is the file actually used for as a business? The second section is where we are going to receive this file. It could be in multiple buckets, as Sandy touched on before. And then we've got the parser that we want to use to actually process that file. It could be your standard CSV parser or your fixed-width parser. But what we've got on the screen here is a custom parser that we've developed for a file type that's specific to Visa, the card scheme. We've then got the capability to conditionally process a file. This is really important because we have some very complex edge cases. We have one particular file that we receive that has millions and millions of rows, and the first three characters of every single row denote the schema that we need to use to parse that row. We then basically embark on a one-to-many relationship. We take all of the rows of a specific schema and put them into one data set. That means our downstream data consumers have a nice clean data set to work with, and they don't have to worry about any of the complexity of the upstream raw files that have been received.
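The one-to-many split described above, where the first three characters of each row select the schema, could look roughly like the following Python sketch. The prefixes, schema names, and function name are invented for illustration; the real file layout is not given in the talk.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical mapping from the three-character row prefix to a schema name.
PREFIX_TO_SCHEMA = {"HDR": "header", "TXN": "transaction", "FEE": "fee"}


def split_by_prefix(rows: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw rows into one data set per schema, keyed by the row prefix."""
    datasets: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        # Rows with an unknown prefix are kept aside rather than silently dropped.
        schema = PREFIX_TO_SCHEMA.get(row[:3], "unrecognised")
        datasets[schema].append(row)
    return dict(datasets)
```

Each resulting data set can then be parsed with its own schema and written out separately, so downstream consumers never see the mixed raw file.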
Then we go on to the processing side. As a rule of thumb, we try to keep the processing, or the transformation, as minimal as possible. So this could range from simply renaming columns, say you receive them in snake case and convert them to camel case, to a requirement to add new columns. We refer to these as dynamic columns. We have certain providers that send us dates and times as separate columns without a time zone. So what we can do here is take these two columns, add the time zone, and output a time-zone-aware timestamp, which means that by the time it hits the data lake and the data warehouse, we don't have to think about those types of things anymore. We then come to where we want to output the files after we've done this processing. What you can see here is the output bucket with dynamic file names based on input parameters from the raw file. And on the data warehouse side, there's the ability to have a different configuration for each data warehouse. So here we can export to a specific Snowflake data set and table, and we can also export to BigQuery with a different set of parameters. But then you've got the schema side of the schema registry, and this is probably one of the most important components. Here you have a full Avro schema defined. This basically decouples the schema registry from the rest of the platform, and it means that in the future we can do things completely independently. We can replay all data from its raw bucket, and we can deploy additional workloads directly on top of the data lake, because the schema is bundled directly with the file. Finally, we have the governance side of it. We have compliance flags here. So did the file originate from a PCI environment?
Obviously when it gets to this stage, it won't have card details. But it's important that we can show complete lineage throughout this entire process. Does the file contain PII, that is, personally identifiable information that will need to be masked? And then on the monitoring side, there's a crontab-style declaration that says when we expect to receive this file. It could be by 5 a.m. every single day, or it could be on the 15th of every month at two o'clock in the afternoon. This is really important to keep on top of, given the number of files that we're processing. Attached to this, we've got the minimum and maximum number of files that we expect to receive. It could be a file for every single customer, which is a very large number, or it could be that we just expect to receive one file. That informs the monitoring service, which allows us to highlight any gaps in the data before they hit the downstream teams. There's also a breach time attached to this, which is triggered off the schedule, and the SLA. It could be a critical file that's related to our safeguarding procedures, and we need to wake up a member of the team at two o'clock in the morning to investigate further. Or it could be that we haven't received fulfilment files from a warehouse that's dispatching card machines. It's important, but it's something that can be dealt with in business hours. This gives the end user the ability to specify all of this in a central place without having to get data engineering involved. There's also a final component we haven't listed on the slide, which is the data validation component. This allows us to validate files as far upstream as possible. So, for example, checking that a column is a primary key, or that the values of a column fall within an expected range, that type of thing.
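The two validation examples mentioned, a primary-key check and an expected-range check, can be sketched in a few lines of Python. This is only an illustration of the idea: the function names, the row representation (a list of dictionaries), and the error message format are assumptions, not the platform's validation component.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def check_primary_key(rows: List[Row], column: str) -> List[str]:
    """Report rows whose key is missing or has already been seen."""
    errors: List[str] = []
    seen = set()
    for index, row in enumerate(rows):
        value = row.get(column)
        if value is None or value == "":
            errors.append(f"row {index}: {column} is missing")
        elif value in seen:
            errors.append(f"row {index}: duplicate {column} {value!r}")
        else:
            seen.add(value)
    return errors


def check_range(rows: List[Row], column: str, minimum: float, maximum: float) -> List[str]:
    """Report rows whose value falls outside the inclusive [minimum, maximum] range."""
    errors: List[str] = []
    for index, row in enumerate(rows):
        value = row[column]
        if not minimum <= value <= maximum:
            errors.append(f"row {index}: {column}={value} outside [{minimum}, {maximum}]")
    return errors
```

A file that returns any errors from checks like these would be stopped before it is written to the data lake, which is the "as far upstream as possible" point made in the talk.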
Because we're operating in a data mesh architecture, it's really important that we catch this as far upstream as possible and prevent dirty data from entering the data mesh. So you could be thinking: well, you can do some of this in cloud functions, why didn't you go down that route? It's for two main reasons. We started off with cloud functions, and some components still run on cloud functions, but it doesn't scale, and we realised very early on that we needed to change our approach. Firstly, from the cloud-agnostic point of view: we needed to ensure that we had a cloud-agnostic solution with minimal vendor lock-in. This is also important for business continuity reasons. Deploying a Kubernetes-based solution gives us the flexibility to deploy in any cloud without the constraints that may apply to vendor-provided solutions. Secondly, there's the complexity. As we said before, we're dealing with a lot of different versions of a file in production at any one time, and therefore we would have to deploy 450 to 500 different cloud functions. That's not scalable. It's something you can manage in Terraform, but then you have additional constraints around the different resources required for each file type. And it's something that we really don't want the downstream consumers to have to think about. We want an agnostic solution. It would also limit our self-serve capability. We don't want people to have to open PRs against Terraform or that kind of thing. We want a really simple solution with a web front end, which we do have, to allow people to ingest a new Excel file without having to get the data engineering team involved. That being said, we are exploring CNCF projects like Fission to allow us to deploy cloud functions on top of Kubernetes, giving us that flexibility where it's appropriate.
So Sandy's going to cover how we manage state throughout the entire process.
Thanks, Richard. So as you can see, the file goes through a lot of stages, a lot of journeys, like connectors, PCI, and non-PCI, and state management is very important when you're designing a data pipeline. It helps you to be fault-tolerant. It helps you handle errors, it supports live monitoring of where the file is at any point, and it also prevents duplicate processing, because queues are prone to delivering duplicate data. That depends on the queue; we're using Pub/Sub. If we don't process an event within 10 minutes, Pub/Sub will republish the event to the queue, and we would have to process it again. So state management was very important to us. Now how does that work? When the file gets created, it fires an event, and once the event is published and consumed, the file goes to to-do. Then, when a pod picks up that event, it checks whether that particular file is in progress or was already processed. If it was already processed, the pod moves on to the next event. If it is in progress, it also moves on to the next event. If processing is successful, the state is success. If it fails, the state is set to error and the event goes to the DLQ. We then have error playbooks to deal with the messages in the DLQ, or the files which are in a failed state. There is also an edge case where processing fails but we want the file to go back to to-do, and that is because of Kubernetes. Kubernetes pods are ephemeral. They live for a short period of time, and they can be killed and restarted at any point. Because we are processing files, we want the application to reopen the file and finish the processing. So how does that work? Kubernetes will send a signal to the pod saying: I'm about to kill you. You have 10 minutes, do whatever is needed.
We call those 10 minutes a grace period, and in that grace period we quickly do any cleanup we have to do, and we write a warning to our logs so we know a graceful termination is happening. Then we put the status of the file back to to-do and send the message back to the queue so that another pod can pick it up and process the file again, and then the pod dies. Now, monitoring the file events, or the journey of the file, is also crucial, because we are processing a lot of files. As Richard said, we have a data mesh architecture, which means domain owners can actually see the processing of a file: where the files are, whether they have failed, whether they have been processed, things like that. To do that, all these events are published into a metric store, and that metric store is used by a file monitoring service. The file monitoring service uses the monitoring configuration in the schema registry that Richard just explained. Based on that, it performs the aggregations and decides whether it needs to send an alert as a Slack message or as a PagerDuty alert, or send the aggregated information to Grafana Cloud, where we have dashboards for heavier debugging if issues happen. And this is how end-to-end file monitoring looks. You can see that there is a file type called SVXP. At stage one, the file was received in PCI; stage two, the file was processed in PCI; stage three, the file was received in MDM (MDM is non-PCI); and then stage four, schema validation and transformation, failed. So here it failed at this stage, and the person who is on support, the domain owner, or the data platform team knows exactly where the failure happened. We have error playbooks written for every single stage.
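The file states described earlier in the talk (to-do, in progress, success, error, and the return to to-do when a pod is terminated gracefully) can be sketched as a small state machine in Python. This is a simplified, in-memory illustration only: the `State` and `Terminated` names, the dictionary used as a state store, and the lists standing in for the queue and the DLQ are all assumptions. A real service would keep state in a shared store and would raise something like `Terminated` from a SIGTERM handler, which is not shown here.

```python
from enum import Enum
from typing import Callable, Dict, List


class State(Enum):
    TODO = "to-do"
    IN_PROGRESS = "in-progress"
    SUCCESS = "success"
    ERROR = "error"


class Terminated(Exception):
    """Hypothetical signal that the pod was asked to shut down mid-file."""


def handle_event(file_name: str,
                 states: Dict[str, State],
                 process: Callable[[str], None],
                 queue: List[str],
                 dead_letters: List[str]) -> State:
    """Process one file event, skipping duplicates and recording the outcome."""
    current = states.get(file_name, State.TODO)
    if current in (State.IN_PROGRESS, State.SUCCESS):
        return current  # duplicate delivery: move on to the next event
    states[file_name] = State.IN_PROGRESS
    try:
        process(file_name)
    except Terminated:
        states[file_name] = State.TODO   # graceful shutdown: hand the file back
        queue.append(file_name)
    except Exception:
        states[file_name] = State.ERROR  # real failure: park it for the playbooks
        dead_letters.append(file_name)
    else:
        states[file_name] = State.SUCCESS
    return states[file_name]
```

The duplicate check at the top is what makes redelivery by the queue harmless: a second event for a file that is already in progress or finished is simply skipped.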
Generally 90% of the failures can be solved with those error playbooks. If not, we can just open the file, see what the issue is, and work from there. Handling replays is also very important for the platform, because there are so many use cases where we want to reprocess a file. Say we skipped a column when processing a file, and now we need it. So we have to reprocess, let's say, the last six months of data to get that column. Or errors have happened, and we have to reprocess the file to make sure that it is processed successfully. So we have created a command line utility which is divided into three parts. The first part is initialization, in which you set which bucket to target, which file name to target, which BigQuery table, which environment, et cetera. Then you have the cleanup, where you list everything and delete the data. So you delete the output Avro files, you delete the BigQuery data if you need to, and you also need to delete the state. And then you say: replay the file. Replaying the file is not actually moving the data again from the source. You're just sending a mocked event to the queue. When the mocked event arrives in the queue, the platform thinks it's a new file and just processes the file again. We're also thinking of putting a UI on top of it, so it becomes a nice feature for the self-serve capability of the data platform. The last bit of the platform side of things is infrastructure observability. As we have everything deployed on Kubernetes, with a lot of pods, workloads, and services running, we need to make sure that we monitor all of that from the infrastructure side. So we take the metrics from Kubernetes through Prometheus and put them into Grafana Cloud.
Then we have AlertOps, which is an alert manager, where we have configured some alerts: if a pod is not healthy, if a node is not healthy, if something is wrong, send us a PagerDuty or a Slack alert. And that's pretty much it from the platform side. Now Richard is going to go through a summary of the features we provide in the data platform.
Cool, thanks Sandeep. So let's quickly go over what we've covered today. The platform as a whole gives you the ability to process data from all your common sources, so this is your object storage, or even that one provider that can only send you data by email, even in the body of the email. We have the ability to parse those different file formats with ease, and we've abstracted away all that complexity. We have the ability to dynamically manipulate the file as required, so converting your dates and times to a time-zone-aware timestamp, for example. The schema registry is completely isolated from the rest of the platform offering, to open up those self-serve capabilities to as wide an audience as possible. The whole platform is fully event-based, so the only scheduling we've mentioned is for the monitoring components. It doesn't matter if your files arrive one or two minutes past the deadline, they're going to be processed as soon as possible. There's also the high-concurrency side: we elastically scale with the business requirements in terms of transaction volume. More transactions mean more data that we're going to receive, and that's all abstracted away. No one should have to think about it.
On the governance side, we've got full monitoring and alerting, and complete data lineage from source all the way through the data lake and then into the data warehouse. That's really important, especially when you're running a fintech. We need to ensure that everything is auditable.
We've got data validation, so we can check that the data is correct before it even hits our data lake. That's really important when you're running a data mesh architecture, as we are here. We've also got full regression testing and test coverage across the entire platform. This is important to ensure that we're confident in the changes we're pushing into production, and that any change to a transformation is not going to result in bad data coming out the other end that is then copied across the entire data mesh. Of course, as Sandy mentioned, it's PCI DSS Level 1 compliant, so we can process card details as well.
So, as you can probably see from what we've covered, we were built upon the contributions of the open source community. Therefore we're constantly exploring ways that we could give back to the community, and it's ultimately why we're standing here today. One of the things we are exploring is the possibility of open sourcing, in the future, the platform that we've just talked through. But first we want to get expressions of interest, to see if this is a viable option and whether there would be a market for it if we did embark on it. So you can scan the QR code at the top left of the screen there. It'll take you to a quick form. If it is something of interest, please do fill it out. Alternatively, do reach out to us on LinkedIn. We're happy to take any questions. Thank you.