efficient and productive economic institution. And they made arguments about the interior life of the enslaved based on statistical calculations. And that's the primary problem with the text. And nobody pushed back more vociferously on that text than one of their seeming friends, Herbert Gutman, who wrote Slavery and the Numbers Game. And Gutman was no stranger to quantitative methods in social history. But he wanted to argue, and rightfully so, returning to all of their data, that the efficiency that was achieved in enslavement in the United States was achieved through the pervasive presence and threat of violence, not through collaboration. And so this analysis of the economic systems surrounding slavery couldn't really yield knowledge about the inner thoughts, feelings, or motivations of the enslaved people as they performed their labor, regardless of how productive they were. So in the wake of the widespread reaction to cliometrics, historians have generally been a little bit more private about their work with data, presenting only end products in narratives and summaries. Work that is data driven has tended to be kept pretty close to the vest. And for many people, it's a small part of a larger interpretive process. And so they do minor work with data, and they don't really share it with the world. This tendency masks the role that data collection and analysis plays in contemporary scholarship. So public or private, large or small, scholarly engagement with data demands that the same kinds of critical interrogation that we bring to all other kinds of sources come to our data. Scholars can never assume that the meaning of their data is self-evident. Rather than surrendering to the easy assumption that quantification, or even simply the collection of data, results in a simple reflection of the world in which it was made, historians would do better to focus on the constructedness of historical sources at every stage of their existence, including their existence as data.
So in doing so, they might embrace Johanna Drucker's call to reconceive of all data as capta. Drucker explains that the differences in the etymological roots of the terms data and capta make the distinction between constructivist and realist approaches clear. Capta is taken, while data is assumed to be a given, able to be recorded and observed. From this distinction, a world of differences arises. Humanistic inquiry acknowledges the situated and constructive character of knowledge production, and the recognition that knowledge is constructed, taken, not simply given as a natural representation of pre-existing fact. So in resisting the naturalization of data, historians can create a situation where they can be much more careful about the claims and conclusions that rest on that data. The history of the enslavement, transportation, and forced labor of African-descended peoples is one infused with violence and destruction and dehumanization. As that history played out around the globe over several centuries, the parties involved created a massive trove of historical documentation of the institution and its impact: families ripped apart, human beings bought and sold, cruel corporal punishment, financial systems developed and proliferated, legal justifications codified. That documentation can serve as a basis for far-reaching data-driven scholarship on the history of slavery. But these methodological approaches are not without their ethical pitfalls. And historians of slavery have a duty to pause and consider their obligations in creating, publishing, and drawing conclusions from the data that arises from these sources. So treating data as capta places it within an important trajectory in the lifecycle of historical evidence.
That trajectory includes the initial creation of the record, its elevation to the status of a piece of information that should be preserved, its preservation, its preparation for research access, its review by historians, its transformation into structured data, and perhaps its publication in a digitally accessible form. So in scrutinizing this lifecycle, historians can come to a renewed awareness of the constructedness of the data that they work with and of the individuals who help shape access to that evidence about the past, including the record creators, the archivists, the historians, and the technologists in the process. So let's start with the events. Oops, wrong way. Down. Come on. You can do it. Down. There we go. This is the peril of using slides as your slide platform. So this is Thomas Mulledy. He is a Jesuit in Maryland in the early 1800s. I believe that this portrait is from the time he makes it from what was Georgetown College to Holy Cross a little bit later in the century. So Mulledy ran Georgetown in the 1830s. And he is the individual who is responsible for signing the Articles of Agreement selling the enslaved people who lived on the farms that were owned by the Maryland Province of the Jesuits. These farms had been held by the Jesuits since their arrival in Maryland in 1634; they had acquired them over the centuries. And at the point of this massive sale in 1838, there were 275 people who were owned collectively by the Maryland Province. They mostly didn't live on campus at Georgetown, but were arrayed across these sites in southern Maryland. And the ones with the red boxes are spaces where enslaved people lived by 1838. Through the 125 years or so before the sale, they were scattered across other plantations.
And so Mulledy and his friends are responsible for the creation of an awful lot of documentary evidence of the sale, which has been part of a national conversation about universities and their relationship to slavery and the profits of slavery and their responsibility to descendants. My work has been to try to figure out what the lives and experiences of those enslaved people had been prior to the sale, while they were living in Maryland. There's a whole bunch of research also going on on the receiving side: the massive sale sends them all to Louisiana, and most of the work of the descendant community has been to trace their genealogical connections to those sites in Louisiana. This is a massive inventory that was made in 1837 that lists all of the people who were sold to Louisiana, their family relationships, and things like that. So we've got a lot of different kinds of records across about 200 boxes of material in the archives of the Maryland Province of the Jesuits. And I have been trying to construct some ways to extract some capta from that, to help us better understand the larger experiences and interactions of these folks. And so if we're to begin with the historical events themselves, we begin with a problem of perspective: the events' participants each come to their recording of those events with particular perspectives. This is historical method 101: there are a whole set of cultural conditions that shape the kinds of observable evidence that they create. Degrees of literacy, the strength of the oral tradition, access to materials for recording and making documents, storage, relationships with the people who participated in the actual events.
And so comparatively, the surviving evidence about slavery, in this case and in others, and about the conditions that perpetuated it, has been overwhelmingly created by those in relative positions of power and dominance, not by those who were enslaved. The records they create are often imbued with the sense that enslaved people were first and foremost subjects of commerce and control, people to be bought, sold, and used instrumentally in the service of the interests of their owners, even as they record personal relationships, between owner and enslaved, but also amongst enslaved people themselves. And so historians are left to assemble the fragments of not entirely trustworthy sources to develop a narrative interpretation. And additional questions and complications arise when historians begin to represent those sources as data that can be categorized, quantified, and perhaps visualized. Referring to the documentation of the transatlantic slave trade, Jessica Marie Johnson has explained that, quote, in slaving conventions across the African coast, compilers of slave ship manifests participated in the transmutation of black flesh into integers and fractions, end quote. This suggests the gross dehumanization through quantification that happens in the process of slavery. But it can also happen in the process of doing work about slavery. And those integers and fractions, it turns out, are not necessarily any more trustworthy than any kind of narrative account. Though enslavers might dutifully record their enslaved persons as property for tax purposes or for inheritance, their attention to the individual characteristics and details of these people belies their dismissal of their human integrity.
Sharon Block's recent book, Colonial Complexions: Race and Bodies in Eighteenth-Century America, is overflowing with examples of the ways that Anglo-American colonists described people's physical appearances in the period just before skin color becomes the overwhelming indicator of racial status. Block's work documents all sorts of unstable meanings in the description of physical markers in thousands of missing persons' advertisements in colonial newspapers. Furthermore, this instability is not limited to descriptions of complexion, not limited to narrative descriptions. Block explains, quote, age might seem like one of the most obvious features of bodily description, but it was not necessarily an exact count of a person's years on Earth. In a phenomenon known as age heaping, runaways were much more likely to be listed at an age that was a multiple of five; for example, twice as many runaways were indicated as being 25 as 24 or 26. So this evidence of semantic variation is just one facet of a larger need for researchers to question the seemingly self-evident elements of historical records, even when they're numbers. And then we've got to move on to the archives themselves. These records, partial and subjective, created at the point of the original events, resided for decades in a variety of sites that predated what we would commonly recognize as archival repositories, tucked away in church basements and office storerooms and courthouses. The ongoing threat of destruction and decay due to natural disaster or climate or neglect was substantial. The records' care and stewardship was often not in the hands of someone committed to record retention and preservation for the majority of their existence. Nonetheless, historians would be remiss if they discounted the ways that archival professionals have shaped the historical record and our access to it, certainly in the context of the history of slavery.
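Age heaping is easy to see once the ages are tabulated. Here's a minimal sketch, in Python, of one crude way to measure it: the share of recorded ages that fall on multiples of five (demographers use more refined measures, such as Whipple's index). The ages below are hypothetical stand-ins, not Block's actual data.

```python
from collections import Counter

def heaping_index(ages):
    """Share of recorded ages that are multiples of five.

    Absent heaping, roughly 20% of ages should end in 0 or 5;
    a much larger share suggests ages were being rounded.
    """
    counts = Counter(ages)
    total = sum(counts.values())
    on_fives = sum(n for age, n in counts.items() if age % 5 == 0)
    return on_fives / total

# Hypothetical ages transcribed from runaway advertisements
ages = [25, 25, 30, 24, 25, 20, 35, 26, 25, 30, 40, 25, 30, 22, 25]
print(round(heaping_index(ages), 2))  # → 0.8, far above the ~0.2 expected
```

A value near 0.8, as here, would be strong evidence that the recorded ages reflect rounding conventions rather than exact counts of years.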
So much has been written over the last 50 years about archives as sites of power and tools of state and institutional definition. With Foucault and Derrida, this work stands as a theoretical interrogation of the systems of power and desire that arise around the creation of knowledge. With Steedman, Stoler, Trouillot, and many other perspectives collected in edited volumes, the work centers much more closely on the place that the development of record-keeping bureaucracies has played in supporting imperial projects in colonial contexts. The power of these accounts in the thinking of historians can hardly be overstated. We even see, in contemporary discussions of digitization projects in South Africa, Keith Breckenridge arguing that the narrative of the imperial archive is standing in the way of digital preservation efforts in former colonial spaces. So while the archive, in quotes, continues, because of the archival turn, to loom large in the way that historians envision themselves and the process of writing history, an increasing number of archivists have noted that the practices of actual archival professionals rarely enter into the conversation. Records residing in archival repositories have been subjected to a whole range of processing and shaping and framing practices at the hands of archivists themselves. The core duties of an archivist, if you're not familiar with what archivists do, include appraisal and selection, accessioning and arrangement, description, and preservation and access. That's about as baseline as we can get about what archivists do in the world. But modern archival practice in the Anglo-American world has been heavily shaped, up until the 1990s, let's say, by the influential work of Sir Hilary Jenkinson in the wake of World War I and T.R. Schellenberg in the wake of World War II here in the United States.
And Jenkinson champions an approach to the work of archival professionals that centers on the importance of neutrality and objectivity, this imagined neutrality in the archives. And so this approach to record collection and creation gives us the impression that there's a passive stance on the part of the archivist, which turns out not to be true. On the other hand, we have Schellenberg, who was at the National Archives and in a variety of other important posts, and he writes what is really a guiding textbook in the wake of World War II. And his real contribution to the theory of the field is a set of guidelines for the appraisal and selection of archives based on their perceived secondary value, which is an argument on the part of the archivist about what their use will be once they're preserved. And so this is the ultimate act of judgment here: selection. What gets kept and what does not. Selection gives the archivist the purview to assess the materials and decide whether or not they're worth keeping and providing resources to keep. So the ongoing practice of winnowing and selection and appraisal is really important to thinking about how archivists have shaped what we have the possibility to encounter and make data from. But once those materials are accessioned into a repository, the processing archivists spend some time arranging them. And we can't suggest that that arrangement doesn't also influence the way that we understand the whole trajectory of these materials. They may organize records into themes or order them chronologically in ways that can predispose a historian to see some things and really not see others. But the other primary piece that we see from archivists is the creation of the finding aid, where the archivist offers a narrative description that surfaces some aspects of the archives and submerges others.
For example, archives of religious orders, like the papers of the Maryland Province Jesuits, who owned enslaved people, might be primarily described in terms of their organizational structure, their temporal affairs, and their religious duties. And as a result, the material about slavery is utterly subsumed in the finding aid. And really, the material needs to be re-described to surface these other voices. The argument for that re-description comes from our recent engagement with a whole body of activist archivists who are engaged in the work of critical archival studies, whose mission is to draw our attention to the power and implications of archival work. And so they define critical archival studies as those approaches that, one, explain what is unjust with the current state of archival research and practice; two, posit practical goals for how such research and practice can and should be changed; and three, provide norms for such critiques. In the words of Michelle Caswell, T-Kay Sangwand, and Ricardo Punzalan, critical archival studies broadens the field's scope beyond an inward, practice-centered orientation and builds a critical stance regarding the role of archives in the production of knowledge and different types of narratives, as well as identity construction. So the result has been an increased attention to archival work with underrepresented communities and to post-colonial possibilities for preservation and access. And some of this work involves the process of what Anthony Dunbar calls creating counter-stories in creating and recreating finding aids. He argues that the first counter-story approach within the archives is to develop a counter-narrative that surfaces those issues that we can't see originally.
But in an era where lots of archival work is dominated by the very popular answer to few resources and less time, the notion of Greene and Meissner's approach of More Product, Less Process, just get the stuff out, the resources for re-describing archival material are slim. Let's put it that way. And so that leaves us stuck with having to do that work of resurfacing that material ourselves as historians. So we move on to the process of creating data sets. Just as the cumulative labor of archival professionals actively shapes the historical record, the training and the theoretical and methodological predispositions of historians shape the way that they engage with those records. Much of our graduate training, as it exists now, centers on the narrative choices that historians make to frame and deliver their interpretive work. Much less attention is paid to the everyday research practices and activities that move the scholar from research question to interpretation. These practices, however, represent the key decision points in the process of transforming historical records into data sets. Regardless of the source or the content of a record, scholarly engagement with new sources is an interpretive process of reading, questioning, contextualizing, and comparing. And that deep, meditative focus on a source or set of sources is ideal, but it's often at odds with the way that contemporary historians conduct their research. We know from an Ithaka S+R report on the research practices of historians, this is from 2012, that limited time and budgets mean that most historians go into an archive and take as many pictures as they possibly can without processing those materials in situ. So they lose to some extent the arrangement that's there. Maybe that's a good thing. Maybe that's a bad thing. But they don't necessarily have the time to get an overarching sense of the materials that they're working with.
This is really detrimental to the process of creating a data set. Because given a significant body of material, a historian's grasp of the full scope of the material needs to develop slowly. And eventually, they have to step back and achieve some bird's-eye view of a partial depiction of the historical events and circumstances described in those sources. And that survey is really hard to do when things are mass chaos in a digitized facsimile realm. Some of that summative view can take shape in a note-taking process, an outlining process, and those sorts of things. But for data-driven historical work, the summative view comes in the creation of the data set. And so we move from a record that looks like this, to a table, to the act of actually capturing the data. And that can happen in research notes, digital imaging, transcription, or the creation of structured rectangular data. So in my own work, the derived data that I've been working on has stemmed from hand-generating a set of document transcriptions, then extracting the individual relationships amongst people who are named in these sources, and then identifying those people in relation to the events that they participated in. So births, baptisms, marriages, deaths, inventories, sales, punishments, legal records, things like that. And that research data ends up being, in some way, shape, or form, in the work of almost all historians, some sort of spreadsheet. It ends up as rectangular research data. But eventually, that can get transformed into some summative view, which provides me with the fact that I've got somewhere between 1,100 and 1,200 people over this period of time. I've got birth years for a lot of them. I've got a sense of who the class of owners is, their relationships, and how interactions with the free Black community were playing out. And then we get to the more complicated relationships in the records.
And so we have these mostly inferred partnerships, which we'll talk a little bit more about later, 13 sacramental marriages, 400 identified parental relationships, some baptisms, lots of births, some deaths, very few deaths with specific dates. So in this sort of overview, as for any researcher working with medium to large numbers of people and events, the creation of this structured data represents very selected entities that are present within the sources. And unfortunately, the skills necessary to create well-formed structured data are often not a part of historians' methodological training. As a result, folks find themselves constructing a data model more by chance than by intention. However, the logical structure for a data model is sometimes no more self-evident than the meaning of the information being represented. And without a thorough initial survey of the sources, it's very difficult to create a model that is sufficient to represent the material. And so researchers may find themselves beginning to address a set of records with a preconceived notion of what their data model should be, and not realizing until they've invested a significant amount of time that their data model is not sufficient to represent the material in their sources. And so then they have to go back and revise their data model and remediate the initial records and all of those sorts of things. For most historians, the only formal exposure to structured and standardized data might be bibliographic and collections metadata, through MARC records and the Dublin Core metadata that might be associated with aggregated collections like the Digital Public Library of America or Europeana. And creating a data model to represent historical information that is gleaned from primary sources is much different than the work that a metadata librarian or a collections professional does to describe collections as data.
Cataloging metadata refers mostly to the process and the context of the creation of the source. In developing a research data set, a historian is called upon to read a set of varied primary sources and model the data about historical people, places, and events. So we're modeling the things described in the sources; we're not describing the sources themselves. And this is a set of meso-level derived data that's very different than collections data. We're not capturing data about the record itself, but data about the history that it represents. And the process of designing a data model requires a kind of structural thinking that is more akin to systems design. And the methods training for contemporary historians often doesn't involve a lot of specific discussion of what those practices might be. If there's anything under conversation at all at the moment, with some smiles in the crowd, it is this particular formulation from Hadley Wickham, a statistician and the creator of a number of widely used R packages for statistical work. And he describes the creation of tidy data sets. Tidy data sets are easy to manipulate, model, and visualize, and they have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. So recording data using this very simple rectangular structure dramatically increases the possibility that it can be reused as a data set. But these principles really only begin to touch on the very basic factors that historians have to consider when they create a data model. They offer instructions about the form, but not about how to select the variables to capture. And because the data model sets up a rigid structure for the data, it fixes a host of representational choices, not only in the selection of the variables, but also in the formation of the observations in the individual cells themselves.
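Wickham's three rules can be made concrete with a tiny sketch. This is not the talk's actual data model; the column names and values are hypothetical placeholders, and the point is just the shape: one table per type of observation (here, baptisms), one row per event, one column per variable.

```python
import csv
import io

# Hypothetical baptism events in tidy form: each row is one observation,
# each key is one variable, and the whole structure is one table for one
# type of observational unit.
baptisms = [
    {"person_id": "P001", "date": "1831-05-02", "place": "White Marsh"},
    {"person_id": "P002", "date": "1833-11-17", "place": "St. Inigoes"},
]

# Writing the table out as CSV makes the rectangular structure explicit.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["person_id", "date", "place"])
writer.writeheader()
writer.writerows(baptisms)
print(buf.getvalue())
```

Marriages, sales, or inventory listings would each get their own table with their own variables, rather than being crammed into extra columns of this one.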
And the most difficult choice, in fact, lies in what to include and how to represent it semantically. For historians, often the easiest way to collect the data from an archival source is to copy the information verbatim from the document. The risk in this approach is that scholars reproduce the ontological assumptions of the record creators, and in the process perpetuate the oppressive regimes of power and control that were imposed to subjugate enslaved peoples in the first place. If a historian chooses not to transcribe their records exactly, we've got a whole host of other choices. As digital humanities librarians Katie Rawson and Trevor Muñoz suggest in their essay "Against Cleaning," there's no underlying order to be uncovered in the work of data. Rather, each effort to shape the data for use results in the creation of a new data set. Selecting a controlled vocabulary to describe a person's racial identity cannot offer the degree of complexity necessary to fully render that aspect of identity. Indicating relationship status using contemporary heteronormative nuclear-family representations fails to capture the contingency and complexity of family life and fictive kin under enslavement. There are lots of other examples here. But for data to be computationally actionable, it requires a level of normalization that was not common practice amongst record creators. And so the scholar sometimes needs to normalize spellings of names and places, create fixed dates, and concatenate fields. They might undertake the process of extending data beyond that of the record creators, beyond that which is clearly rendered within the sources themselves. So they might augment their data with external information. And they might impute fields by calculating them. Say I've got an age in one document that I can impute a birth date back from. And all of these choices bring us to a level of abstraction above the layer of the source that is really important.
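That last kind of move, imputing a birth year back from a recorded age, can be sketched in a few lines. The function name and sample values below are illustrative, not from the talk's actual data set; the one substantive point is that the imputation is ambiguous by a year (the birthday may or may not have passed by the document's date), so an honest data model records a range, not a single value.

```python
def impute_birth_year(doc_year, recorded_age):
    """Impute an approximate birth year from an age recorded in a dated document.

    Returns (earliest, latest) possible birth years, since whether the
    person's birthday had passed by the document date is unknown, and the
    recorded age itself may be rounded (see age heaping).
    """
    latest = doc_year - recorded_age
    earliest = latest - 1
    return (earliest, latest)

# A hypothetical 1838 inventory listing a person as age 7:
print(impute_birth_year(1838, 7))  # → (1830, 1831)
```

Documenting that a birth year is imputed, and from which document, is exactly the kind of decision-point record the next section argues these data sets require.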
So these data sets can't stand on their own under any circumstances without clear and thorough documentation that accounts for every decision point along the way. This kind of documentation needs to be more than a provenance statement; you can't just say, here's the link to the document I got it from. It's not a situation where footnotes are the appropriate answer. As Anthony Grafton noted in his effort to trace the origin of the footnote as a documentation technique amongst historians, quote, some of the new forms of history, this is 1997, rest on evidence that footnotes cannot accommodate, like the massive analysis of statistical data by historical demographers, which can be verified only when they agree to let colleagues use their computer files. Which is absolutely true, though they're not local files anymore. So the documentation for this work needs to be significantly more thorough than the data that is contained in a traditional footnote. So this brings us to the issue of linked data. For decades, historians have done this work to construct systems and capture information about their sources in the service of developing fuller and more insightful interpretations. And now we have the possibility to share that material on the internet. And that has fundamentally changed the way that we think about the creation of these data sets. So, big picture of Tim Berners-Lee coming. Sorry about that. I couldn't find a better graphic for linked data. In 2006, Tim Berners-Lee, who's the British architect, basically, of the World Wide Web, articulated a vision for a modern web made up of a vast mesh of truly linked data, connecting information across domains using a simple set of principles. Those principles include giving each entity a stable URI, a Uniform Resource Identifier, essentially a stable address that can be served over the web so that users can locate those entities.
And when they did, they'd find useful information that is served and structured using standardized principles. And wherever possible, that information should link to other URIs rather than plain-text entities. This vision for the semantic web has been slow to develop. But it portends a web that is both human readable and machine readable. More importantly, it holds an enormous degree of promise for bringing together scholarly work that was once siloed and disparate. The creation of linked data lets authors be explicit about the relationships between the sources they're representing on the web and a growing set of connections, elaborating a knowledge base link by link. Berners-Lee was not thinking about historical data when he set forth the principles of the semantic web; they're designed to be general enough to represent any type of, quote unquote, thing. But of course, for historians, the thing matters quite a lot. Having the ability to describe faithfully the people, places, and events of the past and how they relate to one another based on historical evidence is basically what historians are trying to do. And for historians who are new to working with data, the principles of linked data create the possibility of a functional and clear data model that lets them draw on a larger set of patterns and existing vocabularies to represent the data they're trying to capture. So RDF, the Resource Description Framework, is one of those standards. RDF specifies that linked data is set up in a sentence-like semantic form: subject, predicate, and object. And thinking about what Berners-Lee wanted to tell us about linked data, that's a URI for a thing, a linked open data property connecting it to a URI for another thing.
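The subject-predicate-object sentence can be shown without any special tooling. The sketch below assembles one triple in plain N-Triples syntax. The example.org URIs are hypothetical placeholders, and the spouseOf predicate stands in for a property drawn from a published vocabulary, of the kind discussed next.

```python
# One RDF triple in N-Triples form: subject, predicate, object,
# each a URI in angle brackets, terminated by " ."
subject = "<http://example.org/person/isaac-hawkins-ii>"      # a thing
predicate = "<http://purl.org/vocab/relationship/spouseOf>"   # a property
obj = "<http://example.org/person/katherine-harrison>"        # another thing

triple = f"{subject} {predicate} {obj} ."
print(triple)
```

Because both ends of the sentence are URIs rather than plain text, another data set that uses the same URIs is automatically connected to this one, which is the whole promise of the mesh.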
The beautiful thing is that out there in the world, and right here on this site, in fact, the Linked Open Vocabularies site has over 700 different vocabularies to describe relationships amongst entities in the world. So in some cases, the choice of the right vocabularies and properties to describe the historical system is the most complicated part. But the standards for expressing this data are constraining: the sentence form produces a level of simplicity and fixity that doesn't always align with the messiness and uncertainty of historical knowledge. The linked data model itself is descriptive. The linkages are semantic; they represent not just a relationship, but a particular kind of relationship. And the choice of predicates is paramount. So if we look, that's going to take a little while to load. So this is the project that I'm working on now, and it is the ongoing building site for the work. You can see that we've got a relationship between Isaac Hawkins II and Katherine Harrison: he is her spouse. I actually happen to know that he is the spouse of Katherine Harrison because I have a documented marriage. But for the other partner relationships in the group, it's a much dicier thing. Using the notion of spouse-of from the Relationship vocabulary to describe those partner relationships is way too simplistic. It's what we've got, but it's way too simplistic. And it fixes things in the world, suggesting to a visitor, a reader, a user of that data who is not steeped in the literature that all sorts of things are happening that are not. But you can see, once this loads, that this is the record for Isaac Hawkins II. He is the first son of Isaac Hawkins I. Isaac Hawkins I is at the top of the inventory; he is the first person. Isaac Hawkins II is born in 1831. And we see here his parent, Isaac Hawkins I. We don't know yet who his mother is. But he is married to Katherine "Kitty" Harrison Hawkins. And he lives on White Marsh.
But because of linked data, we can trace out, even in this rectangular form, all of the relationships that we have material for related to Isaac II and the events that he's a participant in. So this is not a dynamic-looking demonstration of those kinds of relationships. But it's semantic and it's machine readable. And it represents that mezzo-level data in a way that lets us start to do other kinds of interesting things with it. And so while I may not feel comfortable doing a network visualization of the relationships based on that spouseOf predicate, the parent-child relationships I feel much more comfortable about. And so we've got some quick visualizations of the parent-child relationships across all of the farms, across the evidence, for 125 years. There'll certainly be some narrative interpretation to go along with that in the long run. But we get to start to see a macro view while being able to drill down to the individual event: the birth, the baptism, the marriage, or even the inventory listing. And some of these relationships are imputed from the inventory listing, so that we don't have a birth and we don't have a marriage, but we have an inventory that says these people are family. And to be able to document that is all super important. But we have another problem here, and that is the problem of the replicants. If we are all making data sets, and we're sharing data sets in the world, and we're sharing them on Figshare, or a Dataverse, or a GitHub repository, or even as CSV files on our own sites, then unless you're lucky enough to be using a RESTful API, you will have the problem that I'm suggesting. We're anticipating this thing that Anthony Grafton couldn't really imagine in 1997. And it's that those freely shared data sets start to travel and multiply and replicate. And so for each historian who shares their material, there's another historian who may try to use it. 
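The reasoning above, trusting the parent-child predicate enough to visualize it but not the spouse predicate, amounts to filtering the triples down to one predicate and treating the result as an edge list. A minimal sketch in Python, with invented short identifiers standing in for full person URIs:

```python
# Sketch: reduce a mixed bag of triples to a parent-child edge list that a
# network-visualization library could draw. The predicates and the short
# person identifiers are illustrative stand-ins, not real project data.

CHILD_OF = "http://purl.org/vocab/relationship/childOf"
SPOUSE_OF = "http://purl.org/vocab/relationship/spouseOf"

triples = [
    ("person/2", CHILD_OF, "person/1"),
    ("person/4", CHILD_OF, "person/2"),
    ("person/2", SPOUSE_OF, "person/3"),  # excluded below: too uncertain to draw
]

# Keep only the relationship we trust enough to visualize.
edges = [(s, o) for (s, p, o) in triples if p == CHILD_OF]

def descendants(person, edges):
    """Walk (child, parent) edges to list everyone descended from `person`."""
    children = [c for (c, parent) in edges if parent == person]
    found = list(children)
    for child in children:
        found.extend(descendants(child, edges))
    return found
```

The filtering step is where the interpretive judgment lives: the macro view only ever contains the predicates the historian has decided are well-enough evidenced to chart.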
And so they grab your data set, and it comes disconnected from any documentation that you might have created about those imputed values, about the way that you constructed your data model, about the way that you have made your choices. And so that historian might take that data set and then combine it with another one. And as Ross and Muñoz suggest, we have yet a new data set that has been manipulated, augmented, republished, and it gets further and further away from the source material that is the basis of its reality in the world, or not reality, as the case may be. So these data sets lose the context of the individual sources from which they've been derived, and absent clear documentation about their formation, they have the potential to create a clear break in the methodological agreement among professional historians to be transparent about the materials on which their interpretive work is based. So we move on to think about the world of digital preservation. And one of the strongest foundations for how we think about digital preservation has its roots in a 1999 project out of Stanford called LOCKSS. LOCKSS is based on this idea: Lots Of Copies Keep Stuff Safe. That's true, right? Everyone tells us that if you're working on a paper and research, you want a local copy and you want a remote copy and you want a copy in the cloud, because your hard drive might blow up. Well, that's the principle here with LOCKSS: we get at least four copies of redundancy of the digital materials to keep them safe. But when we have this proliferation of copies in this freeform way, what we really have is a proliferation of variants, right? And the provenance is not embedded in the data. And so we need some sort of syncing system, hopefully, to keep these things safe. Because there's a question mark here. These copies are not actually copies. They're new entities in the world. 
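One small, hedged sketch of how such a syncing or stamping system could tell a true copy from a new entity: publish a cryptographic checksum alongside the data set, so anyone holding a downstream copy can check whether it is byte-identical to the stamped original. The CSV contents here are invented for illustration:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 fingerprint of a data set's bytes; faithful copies match."""
    return hashlib.sha256(data).hexdigest()

# Invented CSV contents standing in for a shared data set.
original       = b"person_id,parent_id\n2,1\n4,2\n"
faithful_copy  = b"person_id,parent_id\n2,1\n4,2\n"
edited_variant = b"person_id,parent_id\n2,1\n4,2\n5,2\n"  # a row was added

assert fingerprint(original) == fingerprint(faithful_copy)   # a replicant
assert fingerprint(original) != fingerprint(edited_variant)  # a new variant
```

A checksum only answers the narrow question of byte identity; it cannot carry the documentation or provenance, which is why it complements rather than replaces the stamping-and-versioning discussed below.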
And our friends in the world of bibliographic work have a system for this: the Functional Requirements for Bibliographic Records, FRBR. FRBR is the way that you know that an author has written a book, and there's a first edition and a translation and a second edition, and all of those things are all part of the same work. There are expressions and manifestations and items. So if we've got lots of copies of the same data set, we've got to be able to find a way to stamp and version and mark them, along with their documentation, so that we can see them as replicants and not new variants. So aggregation turns out to be a problem. Which is scary, right? Because we've got lots of big aggregation projects in the world. Many of those aggregation projects are predicated on the notion that the data itself, a copy of the data, is being ingested into a new system and served mixed in with everybody else's data, divorced from the data set's documentation. There can be a link back to the original data set, but that's helpful only if the users follow it. So this trail of provenance is kind of a tenuous one. And so what I would like to argue is that linked data is actually a much better answer here. It allows us to integrate material while it stays at home. The data set gets served as machine-readable, human-readable linked data. And you and I can link to each other's; we can reference each other's, because that's exactly how linked data is supposed to work. My stable URI is supposed to be connected to your stable URI with a particular predicate. And so I would argue that one way to do that is to have more tools that make it easy for people who don't have high programmatic skills to publish linked data. And my pitch for our next session is that we should all use Omeka S to do that. It was all just a big advertisement. That's not actually true at all. 
But one of the things that makes Omeka S different from Omeka Classic, and different from some of the other possible ways of publishing linked data in the world, is that it is built to be all linked data underneath and to produce linked data. And so the idea is that it's got a regular, form-driven user interface, where you don't have to write the JSON yourself or anything like that. You don't have to encode the linked data by hand; you can do it in a regular user interface. And part and parcel of working with your data and your collections and publishing them on the web, you can participate in the effort to grow the possible inferencing and linking of the universe of linked data. And we get to fight the separation of the documentation from the data and fight the proliferation of the variants. So with that, we should have some conversation. So we have questions. You have a question. Is it going to be about you being a witch? I know you're going to get me back. I just figured I'd remind you. I just keep the list. Try it when I least expect it. So I'm going to talk. I think one of the things that's really interesting here, knowing a little bit about your teaching responsibilities at MSU, is there seems to be a weight here in terms of graduate education and how we're approaching educating humanists. And I think about this in the context of a discussion I had yesterday with a colleague. We were talking about digital history. And we're approaching the moment, because there are a lot of tools available, where everyone's a digital humanist because everyone can do something that's digital. But the real question here is: what real changes are you making in terms of the digital thing that you're making, the digital part that you're doing? And it seems that if I think about how graduate students are educated around DH, what you just described, what you just talked about, I don't believe that that's really happening in graduate schools. 
And that seems to be the place that it probably has to happen the most. So what would you say, how are we going to get to the place where this kind of discussion and this kind of training happen? Because I don't think that, wow, how are we going to get there? Yeah, I think that we're sort of in, for transparency's sake, Julie and I taught the intro to digital history and digital humanities together in the spring last year. And one of the temptations of that course is to try to cover it as a buffet. And that has worked up till now. And now I'm convinced that it doesn't work. It has worked because the buffet was small enough, frankly. It was small enough for a while that we could do a range of approaches. And I can never think of an approach that doesn't involve adding more courses, which nobody wants to do, because that increases time to degree and all of those sorts of things. But the idea is that there needs to be much clearer attention to critical approaches to engaging with digital tools and methods and research questions. And then you can go off and take your special topics in geospatial work that then brings those critical questions to that, or to text corpus analysis, or those sorts of things, or digital public history. It's just too much for one course. And we're all trying to say, this is our one shot to prepare everybody to have at least a baseline of knowledge. And that is an indicator of how isolated the field still is in the larger departments: here's your one deposit of digital methods and those sorts of things. And so it's got to be, I think, kind of a dual thing: more attention to critical, methodological, field-driven work on one hand, and more integration of the tools, methods, and approaches, on the other hand, into the everyday work of our colleagues, who are also having those conversations about research projects and approaches and methods. But that would require all of us turning the ship at the same time. 
But so does linked data. It's going to be slow for another decade before this vision of inferencing comes into the world. It's been slow for a decade and a half now. It's going to take a while. So no magic answer. Can you do that? Yeah? Yeah? You've given me a whole lot of stuff to try to unpack in this. Well, for one thing, there are aspects. On the issue of what graduate education is, we've been struggling with this for at least four years, because when I was in grad school, we were still dealing with a lot of variations of the same sorts of things. And what you've struck on here is kind of a challenge to a very longstanding process of attribution, citation, contextualization, all those kinds of things. And one thing that strikes me is there's a lot about what you've shared that's a function of our own scholarly discipline. And there I don't mean discipline as in the area of study. I mean discipline as in: this is the methodological process I follow, because I can document this and because I can be transparent about this, et cetera, et cetera. So on the one hand, we've got a whole lot more gadgets or tools to use. Some of them provide structures that might keep us staying in our lanes, so to speak. And others don't. It's not that they aren't useful things, but even if we are magically going to have linked data in place, there's nothing inherent in that that addresses some of the concerns that you've raised in terms of methods and behaviors and processes. Yeah, the documentation stuff is technology agnostic. I mean, I've been thinking about what the implications are for the data sets once they get out in the world, but if the data sets don't get out in the world, they still need to be documented. I mean, Fogel and Engerman published an entire second book on their method, right? So maybe it's an appendix, if we've got a traditional monograph that comes out of this kind of work. But for digital projects, the question is: is there any documentation? 
And how easy is it to sever the connection? And that scares the bejesus out of me. But yeah, so it is in part a really traditional concern about how we do the work that we do, regardless of what our tools are or are not. And on the other hand, what's the amplifying effect of the web on making that a dangerous proposition? This is not usually my approach in the world. I am, like, to a fault kind of techno-utopian. And so I am not usually the doomsayer in the world. But that's how this piece has turned out. And so maybe that's a good thing. So I'm just wondering about the problem of link rot, right? Especially with digital projects that, as technological change continues, it's easy to have in formats that are no longer particularly accessible. And the problem can come back in. Yeah, it's a huge problem. And one of the things that's implied in linked open data is that those URIs are stable. Stable for a value of what, right? OK. And so are there handle-style persistent URIs that you can sort of use in place of a regular URL, attached to the spot where the thing lives in the world, so that if something moves, the URI stays the same and it's a pointer? So there are those kinds of things. The folks at Harvard's Library Innovation Lab have made something that I think is super important, though we don't do it now, and I'm all anxious about this too. They've got something called Perma.cc, which is a tool to basically grab your links and grab a copy of them. Or WebRecorder, to capture an entire site and put it together as a WARC file, which is basically the undergirding technology for the Internet Archive. Those are all WARC files. So if you go look up a site, those crawls are all WARC files. And the point is that those of us who are doing this kind of work also have a shared responsibility for the preservation, for the LOCKSS, of that work. 
So I'm working on something about doing digital public history, and I'm starting to feel like I need to be able to serve the WARC files of the public history sites in the project, as opposed to pointing to them, because they're going away, and some of them are captured, at a base level, in the Internet Archive, but I need the full architecture. So now I've convinced myself I've got to go crawl all these projects. And do the project producers want me serving an archive copy of their project? Probably not, unless theirs goes away. So it's a hard question. Yeah. Layers and layers of hardness. I have a question about perspective, and where you started with Time on the Cross, some of the critiques now with Voyages, where people become numbers, but also the practices, how I do data sets in my work on African-American history versus the other. I saw it in your thing: the enslaved person comes first. Never mind, yes. So even in the placement of the data sets, where those categories come in, how is the perspective on, and response to, the archive shaped by the placement of your columns? So it was a functional way for me to put that data together, but I was also intellectually concerned with it. One of the reasons linked data is important for doing this kind of work is that each human being gets a URL of their very own to accumulate all of the knowledge that we can possibly accumulate about them. So it was really important to me intellectually that the people come first, because that's embedded in the methodological approach and the theoretical approach to the materials. That is also true of the big aggregator in the field that is underway right now, the Enslaved project, which is supposed to cover the people of the global slave trade. And so in fact, the people are the dominant set of things in their data model. And events are much less richly described. 
And I'm starting to wonder about whether we need more descriptive depth on the events to go along with the descriptive depth on the people. But the people turn out to be synthetic items, in the way that the material comes together from lots of different sources. And even in my own instance, if there's not a particular event that a particular piece of information is tied to, so that it's basically not in there twice, there's a break there. Like, I have determined that this person was a blacksmith because I saw it in this letter over here. And so unless that gets into an event that is about labor, the connection's lost. So yeah, no, absolutely, but people have, I think, helpfully tended, particularly in the work on enslavement, to lead with people. And I think that's the right choice. Not economics. Which is basically the way Fogel and Engerman worked: they were interested in economics and intuited back from the economics, as opposed to being interested in the people first. Yeah, history is about people. Soylent Green, it's made of people. So, good. There's an image. Yeah, no, that's probably not the right note to strike with this particular conversation, but it came out, so. Yeah. So to return to that sort of original relationship of accessing information, or access to the data sets: the researcher and the archivist. So, given the scenario that you're describing, what do you envision as the ideal relationship between the researcher and the archivist as gatekeeper and organizer? Yeah, that's a really good question. As somebody who's spent a lot of years working on collaborative GLAM projects, I'm always a little bit shocked and surprised at the degree to which historians don't talk to archivists. In some cases, they're the people who know the records best. And in some cases, they're not, but they certainly know who does, right? 
I would love for all of them to have the time to do the work that Michelle Caswell suggests, or that Anthony Dunbar suggests, and to sort of refit those finding aids for different topics and those sorts of things. My hope is sometimes that where there are people like me doing the kind of work that reads against the grain of the finding aid, the work itself can act as a new kind of finding aid for the material. Because in this case, this project sits on top of a digitization project. So each of the events is linked to a digitized source out of that larger collection of Maryland Province Archives. So I think that there are lots of ways that, in fact, they can start to be in conversation with each other. I think the SNAC project, which is built on Encoded Archival Context for Corporate Bodies, Persons, and Families, I always get that in the wrong order. But it's an archival description protocol. It's sort of linked data, but it's not using linked data technologies. But that is really about bringing people to the fore out of our archival description. And I think those kinds of efforts, thinking about how we can bring these technologies and missions together, will be a fruitful thing. But that again means we all have to commit to it. The whole issue with the finding aid is that we're still very much, I think, stuck in the print environment, in the sense that there can only be one finding aid. That's right. And rather than, I'm not sure of the right words to use here, manipulating what exists, let's add to it. Because, kind of an extension of what you were saying earlier about the archival process piece in this, fundamentally many of those methods developed in the context of physical items. So you have them ordered in a certain way because the object can only exist in one place at one time. But the finding aid is an entry-point space, or something like that. Right, right. 
But even that still is very much based on the physical. Based on the physical, and this notion that there is one finding aid, when in fact we're in an environment now, particularly with linked data, where you could have a hundred different finding aids. Right, right. They can be topically driven, and they should be layered on top of each other, and I think all of that is true. But there has to be one sort of key to where that thing lives. Right, but the ways that it can be accessed are sort of a different thing. When I think about this, I think of my friend Trevor Owens, who, when he was first at the Library of Congress, was working on a set of born-digital archival papers, and he and Ed Summers undertook the mission to use topic modeling to create the first finding aid. I mean, there are lots of ways that we can do this without hand-processing the materials, just to give us a sense of what's in them. So there can and should be lots of different ways. Well, thank you. Thank you for your time. Thank you.