Hi, I'm Judy Lewis at Vanderbilt University Medical Center, and I'd like to tell you about the Harmonist Data Toolkit, a Shiny application we developed to streamline data harmonization in a global research network. First, let's talk about the need we were trying to address with this application. In a large research network, you have hundreds of clinics and sites around the world collecting observational data on patients in a wide variety of formats, but it all needs to be mapped to a common data model, a consistent format, before it can be combined for research. This is a challenging process that can introduce errors, and it's important to ensure data quality and transparency before conducting any kind of analysis. So we saw a need for a reproducible data workflow that minimizes the burden on these users.

Our solution is the Harmonist Data Toolkit, a web-based Shiny application created for a global HIV research network, but designed so that it's generalizable to other research consortia. We built it with the open-source tools R, Shiny, and REDCap, the electronic data capture system, and designed it to evolve with changes in the common data model, because there are always new medication codes and new variables of interest to investigators. The toolkit checks data sets for conformance to the common data model and other data quality issues, generates reproducible reports, and can also transfer a data set to secure cloud storage for the investigator to retrieve. The common data model, which I just mentioned, defines the table names, variable names, variable definitions, and code lists for data sharing in a research network, and it changes over time.
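To make the idea of a machine-readable common data model concrete, here is a minimal sketch of what such a model might look like once it lands in R. All table names, variable names, and attributes below are hypothetical, not the network's actual model.

```r
# Hypothetical machine-readable common data model: one row per variable,
# with the attributes generic checking software would need.
data_model <- data.frame(
  table           = c("tblBAS", "tblBAS", "tblLAB", "tblLAB"),
  variable        = c("PATIENT", "BIRTH_D", "LAB_D", "LAB_V"),
  type            = c("character", "date", "date", "numeric"),
  required        = c(TRUE, TRUE, TRUE, FALSE),
  unique_in_table = c(TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Software can now look up the expected type of any variable by name,
# rather than hard-coding it:
expected_type <- data_model$type[data_model$variable == "BIRTH_D"]  # "date"
```

Because the rules live in data rather than in code, adding a variable to the model is just adding a row, and every downstream check picks it up automatically.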
The key to the Harmonist Data Toolkit is defining all of that information in REDCap, because once you've done that, it's in a machine-readable format, and that enables software tool creation. Here's a snapshot of the template you would use in REDCap to enter details about each variable in the common data model: what format is expected for this variable? Is it a coded variable? Is it a variable that should be unique within each table? All of that is exported into R using the REDCap API, as JSON that can then inform the data quality checks in the software.

Rather than talk about the software, let's watch it in action. If I were a data manager with a data set to check, I would come to our project management portal and choose a data request; this is a test one that I created. I would say I'd like to upload my data, and it automatically passes me along to the Shiny application. So here we land in the Shiny application. It knows which data request I was responding to, and all of this information is stored in REDCap and pulled into R using the REDCap API. As the data manager, I could go here, click browse, and choose the files that contain my data set. That set of uploaded files is compared with the common data model from REDCap, and we'll see a summary of what was found: it found all of these tables and all of these columns that match variables in the common data model. You can see right here that I included a few variables that have been deprecated, and those were marked as deprecated in REDCap. I could correct my data set and upload it again, because obviously I'm missing some of the information that was requested. But if this is all I have, maybe because my region doesn't collect certain kinds of data, I can go on to the next step, which is checking data quality.
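The comparison step just described can be sketched in a few lines of plain R. The variable names below are hypothetical; in the real application the model and its deprecated-variable flags come from REDCap via the API.

```r
# Hypothetical common-data-model variables and one uploaded table's columns.
model_vars      <- c("PATIENT", "BIRTH_D", "SEX")
deprecated_vars <- c("CENTER_OLD")
uploaded_vars   <- c("PATIENT", "BIRTH_D", "CENTER_OLD", "EXTRA_COL")

matched    <- intersect(uploaded_vars, model_vars)       # recognized variables
deprecated <- intersect(uploaded_vars, deprecated_vars)  # flagged as deprecated
unknown    <- setdiff(uploaded_vars, c(model_vars, deprecated_vars))
missing    <- setdiff(model_vars, uploaded_vars)         # requested but absent
```

The summary screen in the demo is essentially these four sets rendered for the data manager: what matched, what is deprecated, what was not recognized, and what was requested but not supplied.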
Again, this is all informed by the common data model: the data quality checks are written in a very generalized way, where the software goes through the model in REDCap for each variable. You'll see the data quality checks here. Is it a numeric variable? If it is, is it within the plausible limits indicated in REDCap? Is it a required variable? If so, are there any missing entries? Is it a date, and are all the dates in the right format? Are they in the correct order, with all end dates after start dates? Were there any duplicate records in the table? One thing I want to point out is that if I had just added a new variable or a new code to the common data model, it would automatically have been included in those data quality checks.

Now the data quality checks are complete, and we can see this interactive table where we can browse the results. I could also download a spreadsheet to review the results offline. If I wanted to see specific details about a particular error, here's the loss-to-follow-up table: I can see that I had four invalid codes for the reason for dropping a patient from a cohort. I might think I didn't have anything wrong, and looking here, I had the code 99, which I thought was the code for unknown. But there's a link here in this Shiny modal to take me to the common data model to see the valid codes for that variable, and the valid code for "other" is actually 9, not 99. So I can go back to my application here and correct it. As a data manager, I should go back and correct any of these errors that are correctable; certainly some are difficult to track down and correct.
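Here is a simplified sketch of one such metadata-driven check, a plausible-range check for numeric variables. The rule comes from the data model rather than from hard-coded logic, which is why a newly added variable is checked automatically. All names and limits are hypothetical.

```r
# Metadata-driven range check: flag rows whose numeric values fall outside
# the plausible limits recorded in the (hypothetical) data model.
check_range <- function(data, model) {
  errors <- list()
  for (i in seq_len(nrow(model))) {
    v <- model$variable[i]
    if (model$type[i] == "numeric" && v %in% names(data)) {
      bad <- which(data[[v]] < model$min[i] | data[[v]] > model$max[i])
      if (length(bad) > 0) {
        errors[[v]] <- bad  # row numbers with out-of-range values
      }
    }
  }
  errors
}

model <- data.frame(variable = "CD4_V", type = "numeric",
                    min = 0, max = 5000, stringsAsFactors = FALSE)
lab    <- data.frame(CD4_V = c(350, 9999, 420))
result <- check_range(lab, model)  # flags row 2 of CD4_V
```

Adding a new numeric variable to `model` would extend the check with no change to `check_range` itself, mirroring the design point made above.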
I can continue on to step three and create reproducible reports that summarize the data set and its quality, and include useful information that can help data managers spot data quality issues to be addressed. Here you can see the report, generated with R Markdown, summarizing the data set that was shared and the patient characteristics. You can see that I've changed all the clinic group names to names from the Harry Potter series. There's a series of histograms that can help spot something like this: why were there almost no viral load measurements in a quarter where there were a lot of visits? These are the kinds of data quality issues that should be addressed before any analysis is conducted using this data. Here's a summary of all the types of errors that were found, and a heat map that helps spot gaps in reporting; there was absolutely no visit data from this particular site, so that's something that should be addressed. We can also generate a quality metrics report, which gives a visual summary of the quality of the data set. And if I'm satisfied that I've corrected everything I can, I can go on to the final step of submitting the data set to secure cloud storage, an AWS S3 bucket, to be retrieved by the investigator.

So what kind of impact has the use of this tool had in our HIV research consortium? In the first 18 months, there were 507 uploads and checks of data sets, and 41 of those data sets were eventually transferred to the investigator for these international studies. We surveyed our users and found that 91% of the data managers reported that they revised their data sets based on the toolkit results, and we can see that in the numbers: there was a 61% average decrease in data set errors between the initial upload and the final submission.
And 100% of the data managers and investigators in our network said they felt that use of the toolkit has improved the quality of data sets. So why does this matter? Of course, high-quality data is essential for meaningful research. We think that using a web-based application is important because it does not require any local installation or maintenance of software by the users. The generalized design we implemented using REDCap and R simplifies software maintenance and allows the toolkit to be easily adapted to other research domains. And increasing research quality and throughput is just a really important goal. I appreciate all my team members, our collaborators, and funders. You can find my code on GitHub at github.com/idea/harmonist. Thank you so much.