Hello everyone, and welcome back. Our next EDW session is called Data-Driven Agile: Embedding Data Quality in Software Development. We're going to be joined by two representatives of JPMorgan Chase for the presentation: Victor Bosco is the Data Quality Manager and Jennifer Epoliti is the Legal Chief Data Officer. I'll remind you that all audience members are muted during the sessions, but the Q&A section on the right-hand side is a good place to leave your questions. So let's jump right into it. Victor, Jennifer, please take it away.

All right, thanks, Tony, and thanks for joining, everyone. I'm Jennifer Epoliti; I'll let my bio speak for itself. I'm joined today by Victor, my Data Quality Lead, and we're going to talk to you about how we have embedded data quality in the software development lifecycle of the JPMorgan Chase legal department, and how we're hoping to make everyone a data steward, so to speak. Victor will walk us through the agenda and then we'll get on with it.

All right, hello, my name is Victor. I work as a Data Quality Lead at JPMorgan Chase. Essentially, my work consists of running different projects and initiatives to identify poor data quality and then working with the different teams to remediate it. Moving to the agenda, we're going to cover four main sections today. We will start with an overview of software development frameworks. We're not going to spend a lot of time on what the frameworks are or how they work; we will focus more on the challenges that come with the nature of these agile frameworks, like XP or Scrum: because of the shortened development cycles, it often happens that data quality is left behind. So we will work through those challenges. Next, we will explain how we are dealing with that, and we'll introduce our data champion role and our data-driven agile model. After that, we will cover the concepts we train our data champions on: How do we prepare them for running this model? What data governance concepts do we explain? What design principles do we train them on, so they can produce better user stories and acceptance criteria and, in general, improve data quality from the beginning? And lastly, what are the benefits of running this data-driven agile model? Now, Jennifer is going to give an overview of the model.

Okay, so I want to tell a quick story that goes back five years, to Enterprise Data World — I think it was in Boston. Of course, back then, before COVID, we were in person, and a bunch of us were talking in the hallway about the challenges of agile: how everyone had gone agile and data just got left by the wayside. The agile process moved so quickly; it was all about quickly turning out a widget this week and a widget two weeks later and another widget, and nobody had time to think about data quality, or even data management at all. And I felt that data quality was beginning to suffer as a result, in the industry in general. We were all moaning and complaining about it: what do we do about this, and how do we make agile data-driven? Well, it took me five years and the right team, but I think we finally cracked it at JPMorgan Chase in Legal. We've been working in this mode with two of our applications very successfully, and we are now in the process of rolling this model out to the entire legal department — legal technology and our product owners that are working in the agile environment. So the key to this method is the data champion.
We're going to tell you a lot more about the data champions: who they are, how we picked them, what they do. But the key is that they're not part of my team. These are people out there in the application teams who have been selected by their teams to be the data advocate, if you will, for their respective applications. So they're the ones looking after the data quality and the data governance of their respective applications, and Victor and his team at the center are more advisory and coaching, stepping in when there are issues. It's absolutely vital for this model to work to have that data champion role. And then finally, on top of that, we stop data quality issues from happening in the first place with preventative design principles. We've developed training classes on those, mandatory for all of our developers, that really speak to things like data modeling and UI design — even something as simple as picking a slider versus a radio button. These are things we train our teams in that can keep data quality issues from creeping into applications, and we're going to share some of that with you today as well. Back to you, Victor.

Right. So, software development frameworks. We have the two best known, Scrum and Extreme Programming (XP), and I'm not going to cover all of that. The question here is: we have different roles in those teams, like Scrum Master, product owners, developers, et cetera, but that brings us to the question, who is responsible for the data quality within the application? And even though the simple answer is "we all are," something was missing there — at least it was for us. We need one key contact, one person who is our partner, another set of eyes, on every team, because essentially we cannot be everywhere. We need to train those partners to have a data mindset and look for the things that we, as data practitioners, would be looking at. And that's how we introduced the data champion role, which Jennifer will cover now.

So first and foremost, we think of the data champion as almost a mini chief data officer for an application, which means anything that I'm worried about, anything that keeps me up at night, should be keeping them up at night for their application. Data sourcing, data use, data classification, privacy, quality — all of that is the responsibility of the data champion, but the primary focus is data quality. Before I get into what these people do, let me talk a little bit about where they came from. We're fortunate in Legal in that we have a legal technology team and their parallel universe in the Office of the General Counsel, who are all basically product owners. They're the ones who face off with technology and the business, bring the business requirements to technology, and work on prioritization and user experience and so on. So we have business and technology representation for every application. What we did, in classic agile fashion — they are a self-formed team — was let them decide who their data champion should be. That person doesn't need to have the ability to do things like write a SQL query and pull data themselves, but if they can't do that themselves, they need to be able to ask someone to do it for them, and to know enough about data, data modeling, and data structures to know what to ask for.
So we let them choose: do you want your data champion to be a technologist, a developer, or somebody from the product team? By the time that person was identified, we had already trained the entire team — the whole group of product owners and the technology teams — on embedding data quality into the application lifecycle. That was a training that Victor developed and delivered to everyone. Once we had the data champions identified, that's where you start to look at these broader responsibilities. The first one is just to know what is a data project or a data story and what is not. Everything that the team is doing, that the developers are doing, needs to be assessed: is it data or not? Is it data-heavy or not? If data is being created, consumed, supplied, or modified in a story, that's a data story. We also ask them to be aware of any data privacy concerns or data use concerns that we might need to take to our forums or councils. And then we want to make sure that the requirements coming from the business are clear. In some cases, we ask them to fill out a request template — a simple Excel spreadsheet — which asks questions about any new data that's being requested to be added to the application, or modifications. Things like: is there a set of valid values we can use for it? Are there data quality checks we should be doing on it? Where should we be getting it? What should the label over the field be, and so on? We then ask them to be responsible for what we call enhanced data governance in the application — that's for our higher-risk or higher-criticality data. We look at where they are getting the data. If they're creating data, is that necessary, or is it already available somewhere else? Again, are they following those preventative data quality measures in the application design? Is the data sourced from the right place? If the data is new, do we know what the data quality requirements are for it? And generally, they are responsible for elevating the level of data quality in the application: identifying those data stories, making sure the data stories get an extra layer of testing and data validation beyond what would normally be done in UAT, and making sure those data stories have acceptance criteria specifically around data. The acceptance criteria can't just be "the field is populated"; it has to be "the field is populated with one of the valid values," or what have you. (There's a small sketch right after this of what that kind of data-focused acceptance check can look like.)
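To make that last point concrete, here is a minimal sketch, in Python, of an acceptance check that goes beyond "the field is populated." The field name and valid values are hypothetical, chosen just for illustration — this is not JPMorgan Chase's actual tooling.

```python
# Hypothetical acceptance check for a data story that added a "vendor_status" field.
# Criterion: every row is populated with one of the governed valid values.

VALID_VENDOR_STATUSES = {"ACTIVE", "SUSPENDED", "TERMINATED"}  # from reference data

def check_vendor_status(rows):
    """Return (row_index, message) findings; an empty list means the criterion passed."""
    findings = []
    for i, row in enumerate(rows):
        value = (row.get("vendor_status") or "").strip()
        if not value:
            findings.append((i, "vendor_status is blank"))
        elif value not in VALID_VENDOR_STATUSES:
            findings.append((i, f"vendor_status {value!r} is not a valid value"))
    return findings

sample = [{"vendor_status": "ACTIVE"}, {"vendor_status": "Actve"}, {"vendor_status": ""}]
for row_index, message in check_vendor_status(sample):
    print(f"row {row_index}: {message}")
# row 1: vendor_status 'Actve' is not a valid value
# row 2: vendor_status is blank
```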
So that's the approach that we took. We're now going to get into a sample use case of how we put this into practice.

All right, what we can see here is a sample process flow that can help you understand how the data champion interacts with the different groups and different parts of the process. It always starts with sprint planning. As the feature team is working out what should be included in the upcoming sprint, part of the data champion's role is to review those stories and make sure the right questions are being asked — to challenge the requirements so that nothing about data impacts is forgotten, nothing that could lead to data problems in the future. So essentially the data champion is reviewing the stories, challenging what is there, and then ensuring that there is a data source evaluation: making sure the data is coming from the right place and there is an operating model for it as well, because it's very easy for the end user to request data — okay, we need to add this new field or this other thing — but how is that going to be updated? Who is going to be accountable for it and take the responsibility to keep the data updated and make sure it stays good? All of that is part of the data champion's job: make sure the data management and governance aspects of the requirement are covered, and make sure the data consumers are engaged — who is going to use this data in the future? How is it going to be used? What kinds of validations are done? And lastly, make sure our enterprise data catalogs and documentation are updated. We'll cover a bit more later about the tools we have and train people on for data lineage and all those kinds of things, but that's part of the data champion's role.

Once we get to the sprint review — let's say the requirements are ready to be tested in QA or UAT — the data champion needs to make sure that specific validations are done on the data, like data profiling and validating data loads. This is not replacing unit testing or normal QA; it's adding an extra layer of validation. And you will see that it's helpful, because sometimes in UAT you don't have good enough data, or the data isn't production-like, or it's not covering all the edge cases, and some things get missed. Part of the data champion's job is to make sure those things are covered. Moving to findings: if things are identified in those phases, part of the data champion's role is to work with the product team to understand their impact and evaluate whether they need to be fixed right away or can wait for the next sprint. We consider this very important, because in addition to the functional sign-off of UAT before production, we also have a data sign-off: making sure the release will not have any negative impact on the existing data or on the data we are adding. Once we are at the production release, right after it, the data champion needs to make sure that the applicable validations are also executed in production, so there are no surprises or findings there. And of course, if there are, it's again part of the data champion's responsibility to bring that up to the product team and the stakeholders — this is what we found — and evaluate whether it's critical or not, whether it can wait or should be fixed right away. That's essential for us, because we have visibility of data quality issues, and the stakeholders know about them before they find them in production.

And lastly, we have ongoing data quality monitoring, if applicable. Of course, it's always ideal to prevent data quality issues in the first place, in the user interface, by adding checks or controls, but sometimes that's not possible, or it will be added in a later sprint — a more complex validation, a check against an API in another service, et cetera. So it's key to create some kind of mechanism or tooling to monitor on an ongoing basis: identify some data quality rules, report findings, and make sure this also feeds the application backlog for future enhancements. If we find something in data quality monitoring, let's see how we can prevent it, work with the product team to retrofit things, and continuously improve the application's data quality. All right, that's a quick overview of the process we follow on every sprint. (A small sketch of what that rule-based monitoring can look like follows below.)
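As a rough illustration of the ongoing-monitoring idea, here is a minimal Python sketch of rule-based checks whose findings could feed the backlog. The rules and field names are invented for the example; real monitoring would run on a schedule against production data.

```python
from datetime import date, timedelta

# Invented rules over invented fields; each rule returns the offending rows.
def rule_missing_email(rows):
    return [r for r in rows if not r.get("email")]

def rule_stale_record(rows, max_age_days=365):
    cutoff = date.today() - timedelta(days=max_age_days)
    return [r for r in rows if r["last_updated"] < cutoff]

RULES = {"missing email": rule_missing_email, "stale record": rule_stale_record}

def run_monitoring(rows):
    """Run every rule and report findings that could become backlog items."""
    for name, rule in RULES.items():
        bad = rule(rows)
        if bad:
            print(f"[DQ finding] {name}: {len(bad)} of {len(rows)} rows -> candidate backlog item")

run_monitoring([
    {"email": "a@example.com", "last_updated": date.today()},
    {"email": "",              "last_updated": date(2019, 1, 1)},
])
# [DQ finding] missing email: 1 of 2 rows -> candidate backlog item
# [DQ finding] stale record: 1 of 2 rows -> candidate backlog item
```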
Now, transitioning to how we train the data champions and what concepts we train them on — these are the key concepts we have training for. For the data champions, product owners, and application leads to follow this model, it's key that they know about data governance, data use, and data lineage. So we train them on all of those concepts, including reference data, and lastly on design principles to prevent data quality issues — everything Jennifer was mentioning, even small things like selecting a dropdown versus a radio button. We train them on all of that and have them think about it. It's part of the data champion's job to challenge decisions: maybe a user interface looks really nice and modern, but it's not preventing issues, and it will cause problems in the data later. We're going to cover those design principles in more detail shortly; some of them are basic, but we consider it very important to refresh them every so often.

All right, I just want to mention that we are starting to get some great questions in the chat. I'm going to knock off one question because it has a really quick, easy answer — we're on a two-week sprint cycle — and we'll cover the rest toward the end, in the Q&A period.

So, fit for purpose. This is one of the really important concepts we train our teams on: what does it mean for data to be fit for purpose? To put that into context, imagine you're pulling data from some transactional reporting system — say, the number of widgets sold — and you're really interested in how many widgets we have sold and whether the pace of widget sales is going up. There happens to be a sale price and a sale currency, but that's not really important to you; you're looking at the volume of sales. Well, if that transactional system then sends data to a reporting warehouse that runs a report on how much money we are making selling widgets in yen, and there's been an error with the currency codes and you've got three different spellings of "yen," you're going to have a really inaccurate, maybe even catastrophic, report. That's what I mean by fit for purpose: the currency data that was perfectly fit for the one purpose of looking at your sales volume is complete garbage when you're looking at your profit. So we ask our data champions to really think about data consumers. If your application is producing data, then somebody is consuming it, and when you're designing new features or modifications, you need to be thinking about where the data is going — not just on your screen, but after it leaves your screen. Who's using it? What are they going to do with it? Are they likely to have different purposes, with different fitness and data quality requirements, than your original set of users? That's one really important concept that we share with the teams. (The toy example below shows how that currency problem plays out.)
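Here's a tiny worked example of that yen scenario, with made-up numbers: the same records are perfectly fine for counting widgets sold but useless for a per-currency revenue report.

```python
# Three spellings of the same currency wreck a per-currency report,
# even though the data was fit for the purpose of counting widgets.

sales = [
    {"widget": "A", "price": 100, "currency": "JPY"},
    {"widget": "B", "price": 120, "currency": "Yen"},
    {"widget": "C", "price": 90,  "currency": "YEN"},
]

print("widgets sold:", len(sales))  # fine for the sales-volume use case

revenue_by_currency = {}
for sale in sales:
    revenue_by_currency[sale["currency"]] = (
        revenue_by_currency.get(sale["currency"], 0) + sale["price"]
    )
print(revenue_by_currency)
# {'JPY': 100, 'Yen': 120, 'YEN': 90} -- garbage for the profit-reporting use case
```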
Another thing I wanted to share here is helpful tools for good governance. We're very fortunate at JPMorgan Chase to have firm-wide records of our systems of record and our authoritative data sources. We've also spent a lot of time documenting not only what each application is, but what type and kind of data it is the system of record or the ADS for. And we've created a set of responsibilities around being an ADS or an SOR, including data quality requirements. That may be something you can view in your data lineage tool, if you have one — but either way, it's a really, really valuable tool to have in your arsenal.

Another is an enterprise data catalog, which is a single source of knowledge for at least your data definitions, but maybe also additional classifications like privacy, confidentiality, and other data-related questions you may want answered. We use our data catalog when we're not sure what something is, what it means, which field to pull, or how it needs to be protected — we can get that information from the data catalog. If you don't have a data catalog, a lot of that information can also live in a data dictionary. Let's go to the next slide.

The dreaded default value. So what is a default value, first of all? A default value is when you have something like a dropdown or a radio button and it has defaulted to a value for you — somebody decided in the design process that that was the value people were most likely to pick. That can be mildly annoying: if you're going to a travel website, you might be spontaneous and hoping the date of departure defaults to today or this weekend, or you might be somebody who plans ahead and thinks, why doesn't it start a month out? Now I have to change it. That's not a terribly critical example of a default value. Something important might be whether a vendor is approved for business. If somebody is filling out new vendors and "approved for business" defaults to Yes, and they get sloppy and forget to click that radio button, then they might be saying a vendor is approved for business that isn't. Another place where this gets really hazardous is web design, where if you refresh the page, the values reset to their defaults; people might not realize that and leave them at the default values. So there are two risks around default values. One is the laziness factor: if you give someone a default value, they may not look at it, they may not bother to set it to the correct value, and you're going to end up with incorrect data — which can even lead to legal risks and all sorts of downstream consequences. The other is when a default value has a dependency: if one value, when selected, leads to additional follow-up questions, you can miss those. Imagine, for example, a date-of-death field that defaults to today's date, or imagine a diversity field that defaults to "white" — which, by the way, has potentially inherent racism to it — and then imagine there was a whole series of follow-up questions around diverse status that you never got to, because you never clicked off the default. Those are some of the things we ask people to think about in the default value space. (A small sketch of the safer "no default" pattern follows.)
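As a rough sketch of that design principle, in Python form: model the answer as unset until the user explicitly chooses, so sloppiness surfaces as a validation error instead of silently recording the designer's guess. The field name is hypothetical.

```python
# Hypothetical "approved for business" flag from the vendor example.

RISKY_FORM = {"approved_for_business": True}   # default masquerades as a real answer

SAFER_FORM = {"approved_for_business": None}   # None = the user never chose

def validate(form):
    """Force an explicit selection rather than trusting a designer's guess."""
    errors = []
    if form["approved_for_business"] is None:
        errors.append("approved_for_business must be explicitly selected")
    return errors

print(validate(SAFER_FORM))  # ['approved_for_business must be explicitly selected']
print(validate(RISKY_FORM))  # [] -- the sloppy default sails straight through
```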
All right, I'm going to cover free text fields. We know that free text fields are one of the biggest enemies of good data quality. With a free text input, you can type anything: copy and paste, add leading or trailing white spaces — or, for the more technical audience on this call, it can even take a SQL injection and become a security risk. So how can we prevent this type of problem? First, always include input sanitization: make sure we don't save the data exactly as it comes in, that it gets some kind of cleaning. And, going back to the data champion: challenge the need for free text fields, and avoid them when there is a risk that they will end up embedding key information. Because sometimes free text fields get used as a substitute for requesting new fields. To give you one example: we have information about our vendors, and there is a comment field, and we did an analysis using text mining and similar techniques to identify what users were using the comment field for. To our surprise, there were a lot of really good insights in there. For instance, if a vendor was suspended, users just typed the suspension reason into the comments, because there was no "suspend reason" attribute — they changed the vendor status but had nowhere to put the why, so they added it in the comments. Another example: when they wanted to record who the relationship manager for a vendor is, they could only indicate one relationship manager; if they had more than one, or other roles — who is the chief data officer, who is the billing contact, et cetera — they put all of that in the comments instead of having the information in the proper place, where it could be used for other things. So that's why free text is dangerous. Of course, sometimes it's needed and there is no other way, but challenge it so there can be another way if possible, and try to keep the field size to a minimum, because the bigger the field, the easier it is to overload. Suppose we have a zip code and the field size is 100 characters — that's unnecessary, and it can introduce a lot of risk. Also evaluate what users are actually going to enter in the free text: maybe there is just a list of possible values, so it can be a dropdown or something else that doesn't need free input. And lastly, if we really do need free text, evaluate whether there is any mechanism to validate it — maybe regular expressions, or APIs and services that can evaluate more complex information. Like an address on Amazon, for instance, which auto-corrects the address. We have that internally at JPMorgan Chase too: we can use our own services to validate an address, make sure it's the right city, et cetera. So it's always good to have central tools that provide that for all the applications — we have reference data for geography, and we can leverage it across all the applications.

Moving on to data validations. Here, like I said before, always try to put as much control in the UI as possible to avoid issues. This is similar to the free text point, but also consider a range of values — like age, for example. To give one recent example, with COVID vaccines: there was a man who applied for the COVID vaccine and was selected, and he didn't understand why, because he didn't match the criteria. One of the parameters for eligibility was body mass index, and what happened is that when his information was entered, the wrong unit was used, so the calculation produced a different score — and that's basically why he came out eligible. So: use range validations for known values, and make sure units are expressed in the UI so people know what kind of data they are entering. And lastly, try to leverage mandatory fields where practical. If we are going to rely on some input — we will use it for reports or a group-by, and we really need it — make sure it's mandatory. (The sketch below pulls these field-level validations together.)
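Here's a hedged sketch, with made-up rules, of what those field-level validations can look like in one place: trim the input, shape-check it with a regular expression, range-check known values, put the unit in the field name, and make the field mandatory.

```python
import re

# Illustrative field validations; the rules and field names are invented.
ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")  # keep the zip field small and shaped

def validate_profile(form):
    errors = []
    zip_code = (form.get("zip_code") or "").strip()      # sanitize: trim whitespace
    if not ZIP_RE.fullmatch(zip_code):
        errors.append("zip_code must look like 12345 or 12345-6789")
    age = form.get("age")
    if age is None:
        errors.append("age is mandatory")                # mandatory where practical
    elif not (0 < age < 120):                            # range check for known values
        errors.append("age out of plausible range")
    height_cm = form.get("height_cm")                    # unit lives in the name/label
    if height_cm is not None and not (50 <= height_cm <= 250):
        errors.append("height_cm out of range -- wrong unit?")
    return errors

print(validate_profile({"zip_code": " 10017 ", "age": 34, "height_cm": 1.75}))
# ['height_cm out of range -- wrong unit?']  (metres entered where centimetres were expected)
```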
We are now moving to dropdowns. Here we have one simple example: "Select a preferred fruit," with different options, and then a chart that says Apple is number one. But that raises the question: are users just defaulting to the first option? Is it really true? So here are some recommendations for dropdowns. The first one — and this is a really, really bad idea — is allowing users to edit the dropdown options. Say I come here, don't find my fruit, and add it myself. That's a really bad idea, because someone can miss the existing option and enter "Apple" without the "s," or in some other form, and that ends up being a nightmare for the values. So it's always important to have some kind of governance and control over the dropdown options — and, as always, source them from the right data store. In addition, if the dropdown options are numerous or unclear, it's useful to use type-ahead, conditional questions, or some kind of cascading mechanism to guide the user to the right selection, so they don't just pick anything — and, like I said, source from authoritative data sources. And if the dropdown has too few options, or too many, consider alternatives — maybe sliders, radio buttons, or cascading selections. There are really nice UI designs that can replace a dropdown and lead the user to select the right data. (There's a small sketch of governed dropdown options right after this part.) Since we're talking about authoritative data sources, Jennifer is going to cover ADS now.

Right. So first and foremost, the obvious use for an authoritative data source is to source your authoritative data from it. But you can also use it to validate: you can use it as a lookup, or run data quality validations against the set of valid values in your authoritative data source. Another thing we ask our data champions to do is really think, when you're sourcing data for a new build, about the exact data set you want. For example, if your business users tell you they want a list of workers or employees, ask them why they want it. What are they going to do with it? If they're using it to organize a party and want to know how many cupcakes to buy, that's one thing. But if they're using it to determine salary raises for employees and don't need to see the contractors, then you're probably going to give them the wrong data set. Or maybe they're using that data set to restack your cubes after COVID and need to know who's got a permanent work-from-home assignment going forward and who doesn't, right? Making sure you've thought through exactly which filters need to be applied, or not applied, when you're pulling your data is really important. And that's something we see all the time — I'm sure those of you who run data quality teams have as well — where someone says the data is no good, and actually, no, you asked for the wrong data.
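Picking up Victor's dropdown point: here is a minimal sketch, with an invented fruit reference table, of keeping the option list governed. The UI renders its choices from reference data and refuses values that aren't in it, rather than letting users edit the list.

```python
# Invented reference table; a real one would come from the authoritative data source.
FRUIT_REFERENCE = {"APPLE": "Apple", "BANANA": "Banana", "PEAR": "Pear"}

def dropdown_options():
    """The UI builds its choices from reference data, so every app shows the same values."""
    return sorted(FRUIT_REFERENCE.values())

def save_selection(code):
    """Persist the governed code, never whatever string the user typed."""
    if code not in FRUIT_REFERENCE:
        raise ValueError(f"{code!r} is not in the reference data -- "
                         "request an addition through governance, don't free-type it")
    return code

print(dropdown_options())       # ['Apple', 'Banana', 'Pear']
print(save_selection("PEAR"))   # PEAR
try:
    save_selection("Aple")
except ValueError as err:
    print(err)                  # 'Aple' is not in the reference data -- ...
```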
Right — another consistency point is trying to use consistent data labels. That isn't always practical, because different user groups call the same thing different things, and you want them to work in the vocabulary they're comfortable with. But to the extent possible, your data dictionary should say what the right label for a term is, or maybe the alternate acceptable labels, and then try to keep those labels to a minimum, so that when people look across applications and see the same data, they recognize that it's the same data. We're in the process right now of taking some data that our third-party oversight team primarily uses and making it available to all the attorneys in Legal, and what we're learning is that there's a lot of third-party oversight jargon that is absolute bread and butter to that team, but the attorneys have no idea what it means. So you need to either translate it into their language, or add a tooltip or something to make the label meaningful so people don't get confused. And the same is true on the back end: when you're coding, or designing a database or a data model, strike that balance — use the standards and be consistent wherever possible, but don't go so far that you lose the meaning of the data. All right, we have a couple more design principles that Victor's going to cover, and then we're going to share a success story.

All right, I'll close this section on design principles with the rest of our recommendations. The first one: do not assume the user will do things correctly, or that they will know how to enter data, or that they even know about data. If a process is seen as a checkbox, they will just check the box. So if we are collecting data, we need to guide them through the right process and make sure they don't make mistakes, whether intentionally or honestly. Part of that comes with the design of the application, and part with ensuring there is negative testing — seeing how the application reacts to unwanted inputs or unexpected user behavior. Another recommendation is about bulk loads. In general, they are a bad idea. Of course, sometimes they are needed, and bulk loads exist for a reason, but here we are talking about bulk loads done by the users — allowing end users to do bulk uploads on their own, maybe loading a spreadsheet into the application or database. We recommend avoiding that unless there is a very solid process, because opening that door to populating databases from spreadsheets and CSVs can lead to a lot of errors and data quality issues. If there is no other option, make sure the process is solid, do some kind of validation on the file, and prepare for the unexpected, because we have had all sorts of problems when we allow end users to do bulk uploads. (A sketch of that kind of file validation follows this section.) And lastly, make sure that as part of the application there is some kind of test automation for data, and test data management: having all the different edge cases, even with synthetic data, and being able to reproduce them automatically, so that testing catches regressions — problems introduced by new features or by changing something in the database or the application. All of this is just to give you an idea of the kinds of things we ask the data champion to think about, to challenge users' requests on, and to work through with the application team. For us it has been very helpful, and we have prevented a lot of problems by doing this.
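Here's a minimal sketch of validating a user-supplied bulk file before it gets anywhere near the database. The column names are hypothetical; the point is to reject a bad file rather than silently repair it.

```python
import csv, io

# Hypothetical required columns for a vendor upload.
REQUIRED_COLUMNS = {"vendor_id", "vendor_name", "status"}

def validate_upload(file_text):
    """Return a list of errors; load the file only if the list is empty."""
    reader = csv.DictReader(io.StringIO(file_text))
    if not REQUIRED_COLUMNS.issubset(reader.fieldnames or []):
        return [f"missing columns: {REQUIRED_COLUMNS - set(reader.fieldnames or [])}"]
    errors, seen_ids = [], set()
    for line_no, row in enumerate(reader, start=2):   # the header is line 1
        vendor_id = row["vendor_id"].strip()
        if not vendor_id:
            errors.append(f"line {line_no}: blank vendor_id")
        elif vendor_id in seen_ids:
            errors.append(f"line {line_no}: duplicate vendor_id {vendor_id}")
        seen_ids.add(vendor_id)
    return errors

sample = "vendor_id,vendor_name,status\nV1,Acme,ACTIVE\nV1,Acme Corp,ACTIVE\n"
print(validate_upload(sample))   # ['line 3: duplicate vendor_id V1']
```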
So now I'm going to cover another section, which is a success story with some numbers. This is one of the applications where we introduced this model. Essentially, what we have here is the number of releases; how many stories; how many stories required an additional data quality review — again, on top of unit testing and normal QA, additional validations; how many data quality checks were executed; the number of findings in UAT; and the number of findings in production. You can see that there were findings in UAT, but the numbers in production decreased substantially. What happened is that almost 80% of the data quality findings from UAT — from running this whole model with the data champion — did not impact production. The rest were considered low impact, but they gained visibility and were fixed later. The good thing is that the stakeholders know about those issues because we are the ones informing them, not because they found them in production. So this is just a sample of how we track all of this. We track other things as well, but in general this is how we've been keeping score. As you can see, it has worked out very well for us, and we are working on expanding it to all the other applications in the department.

Okay, moving to a summary of the benefits of doing this. One of the key things for us is avoiding costly data cleanups. The idea is that most data quality issues can be prevented at the early stages of application design; if all these things are considered in advance, you avoid a lot of problems. And keep in mind that cleaning the data once the application is live, in production, is very expensive — time-consuming in every way. People are typically reluctant to change something while it's in production: no, better not touch that, be careful. So something that could have been fixed in a couple of hours during development takes months before it's finally decided to go into production and clean it up — defining a plan, a lot of meetings to assess the impact, et cetera. It's very expensive to clean up after the fact, and all those UI validations on the application screens can prevent it. There's also user experience and reputation, especially when we are launching new applications — and in our department we've been introducing a lot of new apps. If a new application already has data quality issues in place, that matters: for us, it's key that users have a good experience, because as soon as they see bad data, they start losing trust in our apps. Second, we have data-centric development. Having this mindset in the data champion and in the teams helps us prevent all of this by thinking about how data is going to be used, what data we have, what we need — going back to fitness for purpose. That's often a hard thing, because people say, okay, how do I make sure the data is fit for purpose if I don't know what other purposes there will be? That's why it's key to work with data architects who can take a broader view and design a database in a way that is extensible, not just for one specific purpose. And lastly, all this data-centric work helps us make sure issues are visible and acknowledged, so we know how to prioritize them, we know what's there, and we have full visibility of our data problems.

All right, some additional considerations: the benefits of high-quality data coming from your authoritative data sources or your systems of record. Now, obviously not every single data quality issue can be prevented by sourcing from an ADS, because of the fit-for-purpose issue, right?
But in an organization like ours, where there are expectations on the application teams that own authoritative data sources — and one of those expectations is an elevated level of data quality — you should be able to expect that, out of the box, you're going to have fewer data quality issues than if you were building that data set yourself from scratch. Another consideration: when you don't have authoritative data sources, you end up with data stored in multiple places that drifts slightly apart over time, creating synchronization issues and confusion for users. Even something as simple as a timing issue. Years ago I had a data quality complaint from the prime brokerage that some of our trade prices were wrong, and it turned out to be as simple as them using an overnight batch while everybody they were comparing against was on a real-time update. Of course the data was out of sync — it was different data. And then lastly, using the trusted reference data you get from the ADS instead of creating your own means that updates flow in automatically, which actually saves time for your developers in the long run. I know that's a point of contention: "Oh, we only need two countries, why don't we just make our own table instead of building connectivity to the system of record for the country codes?" Right? But down the line, when things change, when you get that new consumer with different data requirements, that's when your own table becomes a problem. Or God forbid you're the person who made their own table right before the Euro came along, and you had to rip everything out and start over. So those are some considerations on the ADS side.

And then, the benefits of having this additional layer of validations that we've added to the agile process. First of all, putting that additional layer of data testing on top, you find other things — things your users wouldn't have found. You have people sitting and thinking about edge cases that maybe you never would have come up with if you were just staring at a screen during UAT saying, oh, the data looks all right to me. For example, you're unlikely to find duplicates unless you happen to be sorting in, let's say, alphabetical order. But if you have somebody thinking about all the possible kinds of duplicates and testing the data for, say, first and last name reversed, you're much more likely to find them. (There's a small sketch of that kind of check below.) Another benefit is that this data-driven agile model really does make data quality everybody's job, because every team is going to have that data champion saying, hang on, we need to put a data quality story in this sprint, because we need to fix that low-priority issue that made it into production last time — we don't want it to linger and get worse, right? Scripting and test automation: the benefit here, especially with larger data sets, is that it's not a human trying to test all of that. It's a script, it's automated; you can really go through everything instead of doing spot checks and hoping you catch a problem before it goes live. Another benefit is that your business users will start to think about data quality, because in agile they're involved in UAT — they're right alongside the data quality team, seeing those errors and issues being uncovered and resolved.
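A minimal sketch of that duplicate check, with made-up records: an order-insensitive key makes "John Smith" and "Smith, John" collide, which a human scrolling through a UAT screen would almost certainly miss.

```python
# Hypothetical records; a real check would add more normalization (accents, initials, etc.).
people = [
    {"id": 1, "first_name": "John",  "last_name": "Smith"},
    {"id": 2, "first_name": "Smith", "last_name": "John"},   # names reversed on entry
    {"id": 3, "first_name": "Ana",   "last_name": "Lee"},
]

def name_key(person):
    """Order-insensitive key so reversed first/last names collide."""
    return frozenset(part.strip().lower() for part in
                     (person["first_name"], person["last_name"]))

seen = {}
for person in people:
    key = name_key(person)
    if key in seen:
        print(f"possible duplicate: ids {seen[key]} and {person['id']}")
    else:
        seen[key] = person["id"]
# possible duplicate: ids 1 and 2
```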
Just one thought on data-oriented testing: we talked earlier about bulk uploads, and we were dealing with a really hairy one as recently as this morning — one of these edge cases you never would have guessed, right? We let a manual upload happen for expediency, and some of the users overwrote some IDs — they copied and pasted — and we didn't catch it. Then the next time the data came in from the authoritative data source, it took the correct ID and overwrote the good data from the bulk upload. So — and that's a rush example, right? It wouldn't have happened if we weren't rushing — but having people sitting and thinking, with each story, about what can go wrong with the data is really invaluable. I'm excited to get to the questions, because we have some really good ones. I'll just wrap up by saying that we have this model, like I said, up and running in two applications. It's been fantastically successful, and we're really excited to roll it out to the rest of the legal department. I hope some of you will take this on as well, because I'm confident it will improve your data quality and increase the number of people who are data stewards in your organization. So let's go to the questions.

All right, maybe I can read some of them. There is one that says: is this per sprint and per sprint team, or at the project level? Well, we are doing this on every sprint, and the thing is that some sprints have fewer stories that are data-intensive, so it really depends on the sprint. Sometimes there are sprints that are mostly UI or other kinds of changes that don't require all of this. So it really comes down to what's included in the sprint. I hope that answers it.

Right, I'll take another one. This is a great question: what kind of resistance did you get from the product and engineering teams to assigning data champions, and how did you overcome that resistance and demonstrate value? That's such a great question. We definitely did get some resistance up front, because this just feels like more work, right? We're just dumping more work on you. So you really do have to sell them on the benefits. I think the training helped with that, and showing the success stories. To the extent that you can get end users involved in your sales process, that really helps as well — when the end users are the ones saying this is a worthwhile process and we're glad we're doing it. There's also the carrot-and-stick approach that says, look, this is something we're rolling out firm-wide, so do you want to be part of the solution or part of the problem? And I think it's also an opportunity to sell people on the value of learning a new skill set. Data quality is a very in-demand skill set, high-quality applications are in demand, so positioning it as "you're going to learn a really valuable skill and get recognition for it" helps as well. It's critical that this be part of people's goals and objectives, so that when they're successful at it, it becomes part of their overall success. I don't know if you want to add anything, Victor. The other thing is that we typically work with them when they're doing it for the first time, maybe for a few months. We help them; we are there with them. It's not that we say, okay, you need to do this, and then leave them alone — we embed ourselves in their team and slowly start doing it with them.
Another question says: is this focused on application development only? Is there any connection with data engineering teams? Well, the same concepts can be applied there. In fact, one of the application teams we have is essentially a data warehouse team — a team that is building our department warehouse. So yes, it can be done there. Of course, some recommendations that apply to UIs may not apply to a data pipeline, but the concepts carry over just as well. Great — and I would add that we also work very closely with our chief information architecture team and make sure that data modeling is part of major sprints as well. Data modeling is one of the best ways to get people thinking about fit for purpose: the data maybe needs to be a certain way now, but what does it need to be like in six months? What could it be like in the future? Data modelers are great at saying, well, what about this? Think about Victor's example with the comments, where there were multiple relationship managers and the extra ones got stuck in the comments. A data modeler, when that application was being designed, probably would have said: are you sure you only want one relationship manager? What about other roles? What about a one-to-many relationship here, where you assign a role and then put a person in the role? If you'd modeled the application like that from the beginning, you would never have needed people sticking their additional relationship managers for other regions in the comments.

Here's another good one: what process do you have for agreeing or approving the use of an ADS, to avoid unexpected performance impact? It's basically mandatory for us — it's part of our firm-wide data management policy that you must use authoritative data sources for critical data sets. That being said, we do give people a pretty long window, so we're not saying that if you're not compliant you have to fix it overnight and cause a performance crisis. But we do ask that application teams provide a timeline and a project plan to become compliant with the use of the ADS.

I think we have a couple of minutes left. Is there anything you would do differently next time? Well, that's a great question. I guess we must be perfect. No — I will say, I wish I had engaged some of the product owners a little earlier in the process, to make it a bit more collaborative, so it didn't feel like my way or the highway. We're doing that now, but I definitely thought of this as a problem we needed to just figure out and solve, so we went into a room, figured it out, and came back and said, here's the solution. I think we could have called on some of our colleagues to help us with that process.

Pete Rivett had a follow-up: "My question was related to the impact on the ADS of increased access." Ah, okay. I don't know that that's been an issue. A lot of our ADSs, especially for reference data, are fairly robust, so they'll have a messaging layer or something that protects them from too many calls; some of them use APIs. I'm not really the right person to answer, but I would say it hasn't been an issue — being fairly robust, I believe, is part of the requirements to be an ADS in the first place.

Okay, we will need to wrap it up there, but Victor, Jennifer, thank you so much for sharing your story.
This has been a really wonderful presentation, and I can tell from everybody's responses here that they got a lot out of it. So thank you for that. For everybody else, we appreciate you joining us. We will be starting up again in 10 minutes with our next round of sessions. Please stop by and connect with some folks from this room, or the speakers, or the sponsors. We've got two more conference rounds this afternoon, so we hope to see you again. Thanks very much, everybody. Bye-bye. Thanks, everyone.