And so just to give you a little bit of a framework for what I'm going to focus on during our short time together: first, I want to convince you that administrative data can and should be managed earlier in its lifecycle. And second, which I'll spend a little bit less time on, I want to present some ideas about how doing so requires both what I call top-down and bottom-up approaches. The examples I'm going to use today are specific to administrative data in the United States, where I am from, though I'm really interested to hear what parallels you see in your own national contexts. And I hope you find me around the conference over the next day or two to let me know what you see there. So to start, I want to give a definition of what I mean when I say administrative data. As distinct from sources of research data, administrative data is collected in the context of ordinary operational processes and procedures, in order to do a job of some sort. This is often within a government context, though certainly in industry and other contexts as well. And although it is not collected with a particular research question or use in mind, it can be and often is used downstream for research purposes. A great example of this is something like hospital charge codes, which are, of course, collected within the hospital context in order to carry out hospital operations and to do customer billing. But they can be a very rich source of research data, because they allow for the analysis of what sorts of treatment interventions were done within a hospital and can be used for research in many contexts, like public health. Administrative data is a great source of research data because it is often very large in scale and is also collected across long periods of time, so it can be a very good longitudinal source of data.
It's often less resource intensive to collect because it is already being collected, though making it useful for research purposes may be resource intensive in its own right. It is also less obtrusive to participants to collect, something that can be particularly important in situations in which the research involves very vulnerable or sensitive populations, where the act of going in to collect research data in the traditional way can be very obtrusive. However, there are some well-documented challenges with administrative data, and particularly with its use within research contexts. One is gaining access. Open government data portals are improving this, but there are lots of sources of administrative data that are still not in portals and that require knowing someone; the process of whom you need to contact, and what data is and is not allowed to be accessed and under what conditions, is often very opaque. This information is also distributed across a number of different systems, and often individuals are not identified uniquely in order to link them across these systems. There are often irregular or missing definitions of variables, as well as quite a lack of documentation in many cases. But above all, when people talk about the use of administrative data for research purposes, they are very concerned about data quality. We'll come back to this quote a little bit later, but this is from a research study looking at human services organizations across the entire United States, conducting a large number of qualitative interviews with them, and finding, as we all know, that poor quality data collection processes cannot be readily corrected after the fact. There are many challenges with working with administrative data that can be overcome with creative research design, but poor quality is one of those that really cannot be accounted for. So as I promised, I'm going to be talking about how we can extend some ideas of research data management into this space.
And so these are existing practices that have been developed over roughly the last decade and a half in order to improve the secondary use of research data. Just for a quick definition, research data management is usually applied to this lifecycle concept of data. I don't have time, unfortunately, to go through all of the stages of the research data lifecycle and how each improves the secondary use of research data, but I do want to highlight two key phases. The first is this beginning one here, plan and design. This is an important phase because, within research data management, the way that these processes have developed is that the research team themselves are responsible for developing a document called a data management plan that explains how they are going to actually manage, organize, and document their data over the course of the lifecycle, such that it has enough context to be useful in a secondary context. And this supplements more formal data governance practices, because it focuses on the on-the-ground realities of what the actual research team needs to do in order to make sure that these steps happen. And then second, I want to draw your attention to the far end of the lifecycle, after a scientific project has gone on, the data has been analyzed, and collaborations have happened. The point here is that the research data management lifecycle starts with the idea of sharing and reuse in mind. This is significant because the reuse value of data is greatly enhanced by management practices that begin as early as possible in the lifecycle of a project. You manage data somewhat differently if you're contextualizing it for subsequent reuse than you do just to facilitate its use within the original context. So to try and make this a little bit more concrete, I'm going to dive into a specific case study that's very close to my heart.
This is the context in which I originally started working with data, which is child welfare in the United States. With this case study, I hope to demonstrate a little of what an administrative data lifecycle might look like. To begin with, child welfare is the term used for the systems that investigate allegations of child maltreatment and neglect and work to prevent them. This includes things like foster care, though also supportive services for families where the children still remain in the home. As such, there is a wide array of connected service systems, because it is not just the caseworkers that exist within the child welfare system itself, but also an array of supportive services. As I mentioned, this is the context in which I first began managing data. I moved on to work with earth sciences data in a federal context, and now I work with all sorts of researchers within a university. But I've continued to think about how the skill set that I have developed could have applied to the work that I did at that time, and how to still reach back out to those communities and start to see that change. So I'm going to start with a bird's-eye view of this child welfare data, because the national data is what is most visible to the public, and by far the easiest to access. This national data takes the form of statistics, which can be found in numerous governmental reports. And since 1988, so quite a ways back, you've also been able to access the underlying data in a national archive called the National Data Archive on Child Abuse and Neglect. The data that makes up the federal statistics and those underlying federal data sets is collected by the federal government from all of the 50 US states, and the states are incentivized to participate because they receive federal funding if they contribute this information.
And the archive that collects this data does a really good job of normalizing across very different practices in the different states, because they do not all define child abuse the same way, and they certainly don't have a service system that works in the same way across all of the 50 US states. But it's very well documented. There are great codebooks and a lot of professional service staff; they have staff statisticians that provide support, as well as robust communities of practice. So this is an example of administrative data working really well. It's clear how you access it. It has the documentation necessary to reuse it. And it has well-established communities to help you work with it. However, it represents a strategy that is sometimes referred to in the literature as research-ready administrative data, in that it's an ad hoc process: it takes data that's already been collected, cleans it up, curates it, and presents it for research use. But as I mentioned, the federal government is not collecting the information itself. So that is very much an end-of-lifecycle, after-the-fact process, whereas the data actually originates in the 50 states. And to support the individual states and their different child welfare systems in collecting this information, the federal government in the US also, at the end of the 90s with implementation in about 2000, developed statewide automated systems for collecting this information. So they have some technical infrastructure. Moreover, they've also updated those statewide information systems recently with a new model that makes it easier for states to get information back out after they've entered it in. So they incentivize the use of it by actually making it more functional for those frontline workers, as well as by allowing the systems to be a linked system of modular information systems, which also makes it more practical for a distributed service system.
And this reflects a general trend in the last decade of federal investment in both data governance as well as local data capacity building initiatives. Nevertheless, the access conditions for the state data are very opaque. There's a lot of documentation in the literature about how hard access is, and there are also some shortcomings with these capacity building programs, in that they don't often provide a very clear definition of what capacity they're trying to build when they say data capacity: whether that means the ability to analyze data, the ability to use or create technical solutions, or something else altogether, like what I'm talking about, data management, which is a third data skill separate from those two. And finally, drilling down even further in our map: that state data is not actually the original point of collection, because this child welfare service system is highly distributed, across not just smaller geographic subdivisions but individual offices where people work and collect information. Sometimes that program data comes from a state-run office, but this is also a highly privatized service system, such that a lot of these organizations are actually small not-for-profits. And much as the state information is aggregated into national data sets, the state data itself is aggregated from all of these small program offices. The expectation when you read about the service system in the literature is that you can just drill down to the state level and that that's enough; you don't need to look at all of the offices where the information is collected, because they just use those statewide information systems that I was talking about on the previous slide. But this is the context in which I worked, and I know that not to be true, because often the actual operational needs of these program offices are different from what was designed into those information systems.
And so shadow systems are created: alternate information systems that represent the actual on-the-ground needs of the people working in these offices. How that information is controlled or organized or managed is something that there is basically no documentation on at all. No one is looking at it, and no one knows its quality. But overlooking that program data ultimately means that efforts to improve child welfare data quality don't start at the beginning of the lifecycle. So this is basically just a different visualization of a research data lifecycle like the one I showed on the slide where I was defining research data management. What I've done, however, is place where the different types of child welfare data that I was talking about fall on it. And the program data is really where the information is collected. Another way of putting this is that there's a long tail, so to speak, of data management need in child welfare. Taken on their own, the program offices may each collect small amounts of data. Cumulatively, however, the effects of not building data management capacity within all of those program offices affect the quality of both state and national data. In other words, poor quality data collection processes cannot be corrected after the fact. And to close, because I only have a couple of minutes here, I wanted to offer a few reflections about how this need that I've hopefully identified can be addressed through earlier intervention. I think that there are some lessons to be drawn from research data management here as well. So on the top-down side, data management in the United States really originated from a memorandum from the White House Office of Science and Technology Policy in 2013 that required federal agencies that provide research funding to develop policies for data management and sharing that researchers would comply with. This memorandum was actually just updated in August of 2022.
And this is what caused a wave of data management development in U.S. research. But the more meaningful change that came was the acceleration of shifting cultures in how research data is treated. For one, this spurred the professionalization of data management as a distinct competency: a skill set that is distinct from what is needed just to analyze data, and also distinct from what is needed to develop software infrastructure or technology. It also spurred the development of disciplinary communities who have worked together to rise to meet the needs of managing different sorts of research data. And so what I propose is that this development of top-down and bottom-up approaches is its own sort of lifecycle: top-down mandates that prioritize the development of these sorts of processes also spur the development of communities, who go on to inform the next wave of policy changes, as we saw with the recent update to this memorandum that was first issued in 2013. So to summarize: administrative data can and should be managed earlier in its lifecycle. We saw that within research data, and this is a concept that can be applied in other contexts. There is no reason to single out administrative data as being inherently lower quality. And doing so will very likely require both top-down and bottom-up approaches. So I'm happy now, for the last couple of minutes, to take any questions. My email address is also here, and I would be happy to talk to you later at the conference. Thank you.

Thanks very much. Do we have questions in the room?

Thanks for the talk. There you go. I want to ask you about two things that might be different between administrative data and research data. First, how do you manage consent when you're talking about administrative data? And also costs: I bet that these offices of child welfare have tight budgets, perhaps.
And so going to them and telling them, okay, you need to plan and administer your data differently, in a way that is not focused on what they do but on the downstream effects of the data: how do you manage that?

Yes, those are great and, I think, largely unanswered questions. There was so much I wanted to fit into this presentation, and I settled on trying to outline a problem. I think that there are many other issues with enacting more management of administrative data at the source, and funding and consent are two big ones. The thing that concerns me is that no one seems to be asking these questions. People seem to be starting so late in the process that there are many issues that in fact deserve a lot of attention and research, and that deserve debate within the community, that just isn't happening at all. And so I don't have a very pat answer to either of those issues right now, so much as I've just been speaking about this issue everywhere I can, trying to find like-minded people in order to start advancing some of these conversations.

We might have time for one quick question if anyone has one.

Thank you. To your answer, I have a comment, because I also work for the statistics bureau here in the city of Buenos Aires, and we do manage a lot of administrative data. We have some protocols for how to treat data when you want to make it anonymous; we have a law and a regulation that covers all that, so we can talk later. But my question is about what you do with the data, because you talked about your research, and I'm really interested in what type of conclusions you are making based on the data that you are managing.

So that's an interesting question. I don't directly analyze this child welfare data myself.
So, my career trajectory: I worked in a program administration support role, helping to manage, organize, and provide documentation for this data, and to work with researchers we might be sharing the data with in order to provide context, and I largely didn't know what I was doing at that point. I went back to grad school, transitioned to the management of research data, and I now work as a librarian. I work across disciplines, helping researchers figure out how to manage and contextualize their data for open data sharing, largely under the federal policies that I was talking about within the US. And so my research is a step removed from the direct analysis of the administrative data itself; I'm interested in this particular question here, which is, how can we manage this data? And so that is the topic that I'm researching, rather than analyzing the data itself.

I think we need to change over. So a big applause for Kelsey Badger for a great talk.