Thanks very much, that was wonderful, and I'm really pleased to be here, although I have to say I am speaking today with a certain amount of trepidation, primarily because of the audience. As a scholar and as a historian, I look back on some 40 years of professional work and realize, not just in retrospect but in general, how naive and how stupid most historians are when it comes to understanding quite how libraries work and how they shape the systems within which historians work. I hope you'll excuse me for talking to professionals about libraries and systems for ordering information from that rather disadvantaged perspective. Now, my generation of scholars inherited what I would think of as a mature system for ordering the products of the human intellect, created by librarians and manifest in libraries: a system for selecting and preserving what was felt to be valuable and turning it to account in the present as history, as literary criticism, as philosophy and all the other subjects. The image of what we inherited that always comes to my mind is the old round reading room at the British Library in London, an architect-designed model of the sum of human knowledge, with all the contemporary categories of knowing written in gold letters around its enclosing dome. To consult one of the 450 volumes of the manuscript catalog was to be silently reminded both of the vastness of human knowledge and of your own tiny, puny place among all the rest. The topic of my doctoral thesis took up some three pages of the catalog from among approximately 180,000. For me, the irony of that inheritance, that library, its associated archives and all of its international fellows, is that it was so perfect, so seemingly natural to me as a young PhD student, that it almost disappeared from consciousness.
We became so accustomed to its ways that we let it shape our thoughts without interrogating its purpose, and for all the jeremiads of scholars such as Michel Foucault, warning of the workings of the archive as a system of power, we seldom described our own historical work, or scholarship more generally, as a product of that power system. The archivists and the librarians occasionally received a thank-you in the acknowledgments of this book or that, and arguably much of postmodernism formed an inchoate critique of the power of the library. But our blinkered eyes and directed thoughts, shaped by that machine for knowing created by librarians and archivists, remained and remain, it seems to me, the dirty little secret of humanities research. What I want to do this afternoon is describe my response, my attempt to expose the workings of that dirty little secret: to describe the work that a colleague of mine, Ben Jackson, a postdoctoral research fellow at the library at the University of Sussex, and I have been doing recently in an attempt to address that relationship, both to reform historical research practice, which desperately needs it, and to expose to scrutiny that direct relationship between the library and the scholar. Now, I don't know how many of you read grant applications; I've read too many of them, and I've written a few too. When humanists are confronted by a research grant application and asked to describe their methodology, they are frequently flummoxed. Most historians simply want to write, "I'll sit in a library and then an archive and read some stuff." Most literary scholars want to say, "I'll sit in a library and read some stuff and then I'll think about it for a while." And the philosophers, of course, want to say, "I'll read some stuff very, very closely and then I'll think about it for a very, very long time."
And the reason they respond in this way, the reason they're flummoxed by that question set by every funding council, is not a lack of methodology, but precisely that their methods have been built into the very structures of human record keeping: libraries and archives. The shape and limits of their every project are predetermined by the system of archives and libraries upon which they rely. I suspect that observation is blindingly obvious to this particular audience, but it has only recently entered my way of thinking to see humanists and scholars as cyborgs, half library and half human, their thoughts both controlled and acted out through the library, that machine for knowing. And the important thing, it seems to me, is that that machine has fundamentally changed. From a desk in a humanities department that is not so obvious, but anybody in the field of libraries knows the revolutions that have overtaken this sector over the last two to three generations. In the old-school physical library, at least the systems of power were on display, even as scholars chose not to acknowledge them. You could see the relative strengths and weaknesses of a collection simply by counting the aisles of stacks dedicated to each topic. But in the last 30, 40 or 50 years, depending on how far you want to go back, as librarians and academic publishers have charged ahead with a new system of power, scholars have both been confronted with a rapidly evolving library interface and increasingly denied the cues and physical evidence of how that system is actually ordered. Arguably, the decline of postmodernism as a critique of orders of knowledge over the last 20 years is a simple reflection of the extent to which the controlling structures of libraries are increasingly hidden.
For my cyborg scholars, their machine has changed dramatically and in very distinctive ways, while the wet half of that equation has been increasingly distanced from the process. It is not only Google that has adopted that box in space, that single, seemingly universal search box in the middle of a blank screen. Almost all major libraries have moved in precisely that direction. It is the new rabbit hole, or perhaps simply the newest wardrobe entrance to a fantasy land beyond, but it hides ever more completely the shape of the data being searched. When you read an article on the screen in your office, instead of getting it down from the shelves, you lose all knowledge of where it sits in the system and how it relates to the millions of other articles, or perhaps even to the article next door. The problems with Google are well known and well attested. Google lies endlessly about the volume of data out there. It misrepresents the world of human knowledge with every click, but libraries are increasingly doing exactly the same thing. And this, by the way, is finally getting to the subject of the talk. It is because of those changes, that hiding of the shape, the physical layout, the structure of knowledge that libraries hold, that change in the nature of the machine side of the library-human partnership, that the wet side looks to me increasingly foolish. Our research methodologies have not kept pace, and our acknowledgement of the systems of power has become ever more muted. Most humanist scholars, for example, lie in every footnote about the digital nature of the sources they consult, how they consulted them, and how you would go about finding them. Now, to illustrate that problem, I could point to the 12,300 results Google claims for my most recent search on the phrase "18th century crime", compared to the actual 89 results it then delivers. It is lying with every page of search results.
But more interesting, it seems to me, are instances such as the British Museum's website. A colleague of mine, James Baker, is leading a project to map the distribution of text associated with different types of images in the British Museum's print collection. And it turns out that one variety of print, 18th-century satirical political prints, was much more fully described than any other. There are simply more words associated with each of these prints than with any other type of museum object. This resulted from the work of one person, Dorothy George, a founding historian of 18th-century London, and her descriptions are beautiful; I would recommend them to anyone. They were groundbreaking when she wrote them in the 1930s, 40s and 50s. But when those catalogs were digitized, added to all the others in the British Museum and made accessible through that single keyword box on a blank page, something rather weird happened. For every search on every word, whether it was relevant or not, 18th-century satirical prints started showing up more and more in the results, regardless of what you were actually looking for, simply because more individual words were associated with each print than with any other sort of object. Now, in combination with the new technologies of image production, that silently affected the variety of scholarship undertaken on all subjects. You can literally observe the impact of that transition from a catalog to keyword search in the kinds of works that came out over the succeeding decade. In other words, it feels to me as if humanist scholars and scholarship have become ever less aware of the nature of what they are searching and how. There is just no good representation of the ecosystem of knowledge in which we operate. And this, in turn, challenged Ben and me to look for new ways to represent that context at both the smallest scale and the largest.
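The mechanism behind the satirical prints crowding the results can be shown with a toy model. This is a minimal sketch under stated assumptions: the records and queries are invented, and the search is a plain boolean "does the description contain the word" lookup, not the British Museum's actual search engine. Real engines rank and normalise in more complex ways; the point is only that more words give a record more chances to match.

```python
"""Toy model: why verbosely described records crowd a single keyword box.

Records and queries are invented for illustration; this is a plain boolean
"record contains the query word" lookup, not any real museum search engine.
"""
from collections import Counter

# Hypothetical catalogue of (object type, description) pairs. The two
# satirical prints carry long descriptions; the other objects short ones.
RECORDS = [
    ("satirical print",
     "a minister with a purse and a bishop in a tavern by the river "
     "watched by a merchant and a crowd"),
    ("satirical print",
     "a king and a widow quarrel over a map while a child and a merchant "
     "gather coins in the street"),
    ("portrait print", "portrait of a merchant"),
    ("portrait print", "portrait of a bishop"),
    ("topographical print", "view of the river"),
    ("topographical print", "view of a tavern"),
]


def search(word: str) -> list[str]:
    """Return the object type of every record whose description contains word."""
    w = word.lower()
    return [kind for kind, desc in RECORDS if w in desc.lower().split()]


def hit_share(queries: list[str]) -> Counter:
    """Tally how often each object type appears across a batch of searches."""
    tally = Counter()
    for q in queries:
        tally.update(search(q))
    return tally


if __name__ == "__main__":
    queries = ["merchant", "bishop", "river", "tavern", "child"]
    # Two records per type, but the verbose type collects most of the hits.
    print(hit_share(queries))
```

With those five queries the two satirical prints are returned six times, against two hits each for the portraits and the topographical views, even though every type holds the same number of records.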
What I want to spend the rest of my time doing is describing the progress we've made so far. You'll have to excuse me for using a whole bunch of slides with scatter plots on them. It's a terrible thing; there should be joyous images of some sort, but never mind. What we've been working on is an attempt to create a macroscope, a term and idea proposed by Katy Börner: essentially a computer facility that allows you to see an object at all scales at once, from the most distant to the most granular. I think of it as a form of radical contextualization, a way of surfacing the environment in which I as a scholar work. In pursuit of this idea, Ben Jackson and I have spent the last few years trying to create something in that spirit. It's not quite right. It is not particularly pretty. And it is not yet finished; in fact, some of what I'm going to talk about today was finished last night at 8:30. But it is an attempt to reconfigure the tools to match humanist methods and, at the same time, to reconfigure our representation of the library to help us understand the knowledge systems with which we are working. I'm going to demonstrate it by way of a journey into a specific piece of historical material: a journey from a couple of library and archival catalogs, encompassing tens of millions of items, down to a single word, and from that single word back outwards to the fullest context I can find. We started with two different sorts of catalog: first, the Discovery catalog, created over the last 30 years or so by the National Archives in the UK; and second, the catalog of the University of Sussex Library. Thank you, Jane Harvell, for permission to do this. We are eventually aiming to encompass all MARC records and all of WorldCat, as a way of standing back and asking what that body of material actually looks like. But we had to start somewhere. This is the Discovery catalog.
For that, we are creating a simple interactive scatter plot in which each dot, and this is just 35 million or so, reflects the number of words in each entry along the y-axis, while the collection and archive information is recorded along the x-axis. At a glance, this allows you to see how the records of the British past are distributed between archives and collections, mapping record type against repository. It shows, for instance, that two-thirds of the records of the National Archives are military records, and that this material is less fully cataloged than, for example, the Home Office material. It begins to give you a lens and a filter that allow you to see a kind of context. It also shows that the National Archives holds a substantial majority of all archive records in Britain, and that there is a large disparity in cataloging detail from archive to archive. You can trace the boundaries between archives simply by the number of words allocated to each catalog entry: some are made up of brief notes and others are endlessly verbose. So, starting with a vision of the whole, we can use the graphic itself as part of a search procedure. We can zoom in to a specific collection and again see how the records are distributed, both by collection and by the number of words in each catalog entry. More than this, we can start to select against a series of facets to drill down ever further into that material while maintaining a sense of where it sits in the whole. In the first instance, I'm interested in a record series that has already been mentioned, and that I have spent much of the last 25 years digitizing and making available for historical analysis: the Old Bailey Proceedings, which I'll talk about a little more in a minute. But before I do, I want to shift catalogs and look at the very different sort of catalog that you find in a library, and in particular the University of Sussex Library.
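The scatter plot described above boils down to two numbers per catalog entry: a position for its collection on the x-axis and a word count on the y-axis. Here is a minimal sketch, assuming each entry arrives as a (collection code, description) pair from some catalog export; the "WO" and "HO" codes and all descriptions are invented examples, not real Discovery records, and the function names are hypothetical.

```python
"""Sketch: turning catalogue entries into scatter-plot points.

Assumes each entry is a (collection code, description) pair from some
catalogue export. The "WO"/"HO" codes and descriptions below are invented
examples, not real Discovery records.
"""
from collections import defaultdict
from statistics import median


def word_count(description: str) -> int:
    """Number of whitespace-separated words in a catalogue description."""
    return len(description.split())


def scatter_points(entries):
    """Map entries to (x, y): x = collection's index in sorted order, y = words."""
    collections = sorted({coll for coll, _ in entries})
    x_of = {coll: i for i, coll in enumerate(collections)}
    points = [(x_of[coll], word_count(desc)) for coll, desc in entries]
    return points, collections


def collection_summary(entries):
    """Per collection: (number of entries, median words per entry)."""
    counts = defaultdict(list)
    for coll, desc in entries:
        counts[coll].append(word_count(desc))
    return {coll: (len(ws), median(ws)) for coll, ws in sorted(counts.items())}


entries = [
    ("WO", "Muster roll, 1st Foot"),
    ("WO", "Pay list"),
    ("WO", "Muster roll, 2nd Foot"),
    ("HO", "Letter from the magistrates of Manchester concerning the state "
           "of the town after the riots"),
    ("HO", "Petition of a convicted prisoner for mercy"),
]
points, labels = scatter_points(entries)
summary = collection_summary(entries)
```

The resulting points can be handed to any plotting library; the per-collection median is one simple way to see the "brief notes versus endlessly verbose" contrast in numbers. At the scale of tens of millions of entries, a density-based renderer would be more practical than drawing one marker per record.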
Because library and archival catalogs are so fundamentally different, though, we need to take a slightly different approach. That was one version, set within an archival frame; this is where we get to with something like a library catalog. Our first discovery was, of course, that all major libraries are complex amalgams of collections and sets, supposedly cataloged in beautiful consistency but in fact likely to contain a whole series of varied subcollections, cataloged to different standards, in different ways, with different keywords and arguably with five or six different cataloging systems. They are built over generations, and for all the aspirations to universal consistency, they aren't consistent. My favorite element of this graph is the small collection of inherited materials on education that remains cataloged in Dewey, despite the fact that the catalog as a whole has been in Library of Congress, or rather an adapted Library of Congress system, almost forever. But dealing with the Library of Congress material alone, you get down to another mapping of information, which lets us begin to represent this collection in new ways. This is by the number of words associated with each catalog entry, or each MARC record. This is by year of publication, and it reflects the evolution of both the library and its collections, as well as, more faintly, traces of historical events such as the First and Second World Wars. Along the way, it also represents the development of each individual discipline: when things were published, how they were published, and how they were assigned to one category or another are all there if you look hard. This is then the subset of history, organized by the number of words in each catalog entry, and then the same information organized by year of publication. The blue represents British history, class DA, in case you're interested. For myself, I'm overwhelmed by that.
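Spotting a Dewey enclave inside a Library of Congress catalog, and then grouping one LC class such as DA by publication date, can be approximated from call numbers alone. This is a minimal sketch under stated assumptions: the regular expressions are simplified heuristics (three leading digits for Dewey; one to three capital letters followed by a digit for LC), not a full call-number grammar, and the sample call numbers and years are invented.

```python
"""Sketch: telling Dewey from Library of Congress call numbers.

These regular expressions are simplified heuristics (three leading digits
for Dewey; one to three capital letters followed by digits for LC), not a
full call-number grammar. The sample call numbers and years are invented.
"""
import re
from collections import Counter

DEWEY = re.compile(r"^\d{3}(\.\d+)?\b")
LC = re.compile(r"^([A-Z]{1,3})\s?\d")


def scheme(call_number: str) -> str:
    """Classify a call number as 'Dewey', 'LC' or 'other'."""
    cn = call_number.strip().upper()
    if DEWEY.match(cn):
        return "Dewey"
    if LC.match(cn):
        return "LC"
    return "other"


def lc_class(call_number: str):
    """Return the LC class letters (e.g. 'DA'), or None if not LC-shaped."""
    m = LC.match(call_number.strip().upper())
    return m.group(1) if m else None


def by_decade(records, cls: str) -> Counter:
    """Count records in one LC class by decade of publication."""
    return Counter(year // 10 * 10 for cn, year in records if lc_class(cn) == cls)


records = [
    ("DA470 .H57 2004", 2004),
    ("DA485 .P6 1998", 1998),
    ("HV6950.L7 B4", 1971),
    ("370.942 SIM", 1965),
    ("PR4623 .A1", 1990),
]
schemes = Counter(scheme(cn) for cn, _ in records)
```

Counting `scheme` values across a whole catalog would surface a Dewey-shelved subcollection as a small island of "Dewey" results, and `by_decade` gives the year-of-publication view for a single class. In practice the call number would come from the relevant MARC fields rather than free text.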
I'm overwhelmed by the patterns it reveals, and by the extent to which you can actually stare at it and hope to identify specific patterns. It's not about simply applying an algorithm or indeed a methodology. It is about staring hard at data, and making library catalogs into data so that you can stare hard at them. But as far as I'm concerned, this just gets you through the library door. You can see the shape of the building; you now have to find a seat and get to work. And this is where I get back to the Old Bailey Proceedings. I've now spent too long on them, but they form an important source for the history of crime, poverty and the law. They contain detailed records of 197,745 trials held at the Old Bailey between 1674 and 1913, and include some 127 million words of text. My claim has always been that they represent the largest body of text detailing the lives of ordinary people ever published. But the important thing for the moment is that they can be accessed either from the Discovery catalog or from the catalog of the University of Sussex Library, and from either of those models of knowledge you can get to here. This is really just an interactive scatter plot of each one of those 197,745 trials, ordered both by the date of the trial and by its length in words, on a log scale on the y-axis. From here, you can interactively drill down, facet and select categories of trial by offense, gender or verdict. And the important thing is that, as with the graphs for the Discovery catalog and the Sussex Library catalog, you're continually confronted with the context, with either the typicality or the unusualness of your object of research. Does it sit in the middle of a giant blob of dots? Is it an outlier at the top or bottom? In this instance, we're simply going to drill down to a single period, the 1870s; to a gender, female defendants; to a verdict, guilty; to a crime, theft; until we get to a single trial.
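The drill-down just described (decade, gender, verdict, offence) and the "is it in the blob or an outlier?" question can be sketched as a filter plus a percentile. This is a minimal sketch: the `Trial` record, its field names and the sample trials are hypothetical and mirror the facets named in the talk, not any real Old Bailey Online data format. The y-coordinate uses a base-10 log of the word count, matching the log-scale axis.

```python
"""Sketch: faceted drill-down over trials, plus a simple typicality check.

The Trial fields, facet names and sample data are hypothetical; they mirror
the facets named in the talk (period, gender, verdict, offence) rather than
any real Old Bailey Online data format.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    trial_id: str
    year: int
    gender: str
    verdict: str
    offence: str
    words: int


def drill_down(trials, decade=None, **facets):
    """Keep trials in the given decade whose attributes match every facet."""
    return [
        t for t in trials
        if (decade is None or t.year // 10 * 10 == decade)
        and all(getattr(t, name) == value for name, value in facets.items())
    ]


def plot_point(t: Trial):
    """(x, y) for the scatter plot: year, and length on a log10 scale."""
    return t.year, math.log10(t.words)


def length_percentile(target: Trial, population) -> float:
    """Percentage of the population strictly shorter than the target trial."""
    shorter = sum(1 for t in population if t.words < target.words)
    return 100 * shorter / len(population)


trials = [
    Trial("t1", 1871, "female", "guilty", "theft", 1000),
    Trial("t2", 1872, "male", "guilty", "theft", 300),
    Trial("t3", 1875, "female", "not guilty", "theft", 500),
    Trial("t4", 1871, "female", "guilty", "theft", 250),
    Trial("t5", 1765, "female", "guilty", "theft", 400),
]
selected = drill_down(trials, decade=1870, gender="female",
                      verdict="guilty", offence="theft")
```

Here the four facets narrow five trials to two, and `length_percentile` reports that trial "t1" is longer than 80% of the sample, one crude numerical answer to whether a dot sits in the middle or out at the edge.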
The trial we arrive at is that of Sarah Durant, a 61-year-old widow who was charged in 1871 with stealing and then handling two banknotes worth some 2,000 pounds. That's her there. And now we can get to work. We can read the trial line by line. Sarah appears at the Old Bailey once, on the 9th of January 1871, charged with pickpocketing and receiving. We can go from that dot on a graph to the full text of the trial, and we can read it in just the way one has always read a trial, as a form of close reading, imposing a narrative on your reading as you go through. She claimed to have found two banknotes on the floor of a coffeehouse she ran in the London Road in 1870, at which point she did what most of us would do: she pocketed them. I would, anyway. We could contextualize this trial in any number of ways; I know what the street was like, who Sarah shared her prison cell with, and a great deal more. But just as importantly, I can now begin to subject the trial itself to a precise close reading. In the first instance, we can simply view the text's dialogue: the graphic at the bottom of the screen takes the back and forth of the trial and represents it as a series of conversational exchanges. Or we can listen to it as a computer-generated version. This particular example is an earlier trial, of a woman named Mary Dyer, and what we have created is a graphic representation of how people converse back and forth, what in linguistics would be thought of as pragmatics. Or you can view this trial in the context of 40,000 others: once you have a graphic of how the back and forth works, it is easy enough to move to a situation where you're looking at 40,000 trials all on a single wall, again to stare at and to identify patterns that are validated not simply through statistics, but through the pattern they create in a visualization.
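A graphic of a trial's back and forth needs, at minimum, the sequence of speaking turns and their lengths. This is a minimal sketch of that underlying data structure, not the visualization shown in the talk. It assumes the trial text has already been split into (speaker, utterance) pairs, and the speaker labels and exchange are invented, not taken from Sarah Durant's or Mary Dyer's trial.

```python
"""Sketch: representing a trial's back-and-forth as a sequence of turns.

Assumes the trial text has already been split into (speaker, utterance)
pairs; the speaker labels and the exchange below are invented, not taken
from Sarah Durant's or Mary Dyer's trial.
"""
from collections import Counter
from itertools import groupby


def turns(utterances):
    """Merge consecutive utterances by one speaker into (speaker, words) turns."""
    result = []
    for speaker, group in groupby(utterances, key=lambda u: u[0]):
        words = sum(len(text.split()) for _, text in group)
        result.append((speaker, words))
    return result


def handovers(turn_list) -> Counter:
    """Count how often the floor passes from one speaker to the next."""
    return Counter((a, b) for (a, _), (b, _) in zip(turn_list, turn_list[1:]))


exchange = [
    ("counsel", "Where did you find the notes?"),
    ("witness", "On the floor."),
    ("witness", "Near the counter."),
    ("counsel", "Did you tell anyone?"),
    ("witness", "No."),
]
sequence = turns(exchange)
```

Each trial becomes a short list of (speaker, words) pairs, which can be drawn as alternating bars; stacking many such lists row by row is one way to put thousands of trials on a single "wall".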
Or, if we wanted to treat the text of that trial simply as text, to use the tools of corpus linguistics to analyze it, we can automatically transfer it into something like Voyant Tools, which gives us access to a series of linguistic approaches: word clouds, trends, contexts. This is Sarah Durant's trial, automatically exported to Voyant Tools and exposed as, if you like, a collection of words, with both context and frequency indicators. Out of this comes a sense of what the trial is about, but also of where the significant words might be. For me, the word that springs out is "detective". It appears a fair amount, it's a relatively common word, and you wouldn't necessarily expect it. It seems normal enough, given that this was the decade in which the first Sherlock Holmes stories were set, but it nevertheless seems unusual. We can then highlight any word or phrase, and it will automatically form the basis for a series of contextual analyses. This is where you get down to a different form of computer-assisted close reading. First, we can put it into an Ngram Viewer (I know all the critiques of the Ngram Viewer you have ever heard), and then into the Oxford Historical Thesaurus, both of which begin to reflect the extent to which "detective" was a neologism, a new word in 1871, and that Sarah's trial involved a relatively new series of systems and organizations, police detectives among other things. Suddenly our close reading becomes something slightly different. We have drilled down from the highest context to a single word. It feels to me like that traditional process made better, more powerful: the journey downwards, if you like, down to that single word, and then outwards into a series of tools that are rapidly emerging as standard within the digital humanities. We can then move in the opposite direction.
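The two views that make "detective" jump out, word frequencies and keyword-in-context lines, are simple to compute. This is a minimal sketch of those two techniques as a small stand-in for the frequency and context views that a tool such as Voyant Tools provides; it does not use Voyant's API, and the sample text is invented rather than quoted from the trial.

```python
"""Sketch: word frequencies and keyword-in-context (KWIC) lines.

This is a small stand-in for the frequency and context views a tool such
as Voyant Tools provides; it does not use Voyant's API. The sample text is
invented.
"""
import re
from collections import Counter

WORD = re.compile(r"[a-z]+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep runs of letters as tokens."""
    return WORD.findall(text.lower())


def frequencies(text: str, stopwords=frozenset()) -> Counter:
    """Token counts, ignoring any words in stopwords."""
    return Counter(t for t in tokenize(text) if t not in stopwords)


def kwic(text: str, keyword: str, width: int = 2) -> list[str]:
    """Each occurrence of keyword with up to width tokens on either side."""
    toks = tokenize(text)
    key = keyword.lower()
    lines = []
    for i, tok in enumerate(toks):
        if tok == key:
            left = " ".join(toks[max(0, i - width):i])
            right = " ".join(toks[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


sample = ("The detective followed her to the shop. I am a detective officer "
          "of the City police. The notes were found by the detective in her box.")
```

On the sample, `kwic(sample, "detective")` yields three lines, including "am a [detective] officer of", and `frequencies` counts three occurrences of the word. The letters-only tokenizer is a deliberate simplification; it splits contractions and drops digits.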
And that's actually what excites me more. We can go upwards. We can go back to where the word "detective" appears in those 127 million words of text in the Old Bailey Proceedings, and upwards from that to where it appears in either the Discovery catalog or the university library catalog. You'll notice that all the red dots, all the trials mentioning "detective", are, and you'll have to take my word for it, essentially post-1860, although the first detective force was, of course, established in 1842. This is where "detective" appears in the university catalog: mainly in literature associated with detective fiction, but also in history, law and sociology. And with a resource like a hierarchy of synonyms, we can move upwards again. Where would you want to put "detective" in that set of synonyms? Crime is what occurs to me. This is, of course, crime in that same catalog, reflecting a wider distribution of works that might be relevant. And perhaps up again, to the area of law. This is law as it appears in the MARC records of that research library, mapping, it seems to me, into an ever larger context, across an ever growing number of disciplines: from a single, close, computer-assisted reading of a single trial to a context that should actually make more sense of how you read it, to a point where a reading of the trial is both close and distant at the same time. For me, that satisfies the ultimate meaning of a macroscope. And when you do so, the typicality of that trial is put in doubt. It turns out that in almost all respects Sarah Durant was an unusual criminal and an unusual convict, and her experience was simply atypical. By the 1870s, only approximately 15% of defendants at the Old Bailey were women, down from over 50% at the beginning of the 18th century, and of these, over half were between the ages of 18 and 30. Sarah was one of only three women over the age of 60 tried at the Old Bailey in 1871.
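Moving upwards from a single word involves two steps: seeing where the word falls across the whole corpus over time, and climbing a hierarchy to broader terms. This is a minimal sketch of both, under stated assumptions. The (year, text) trials are invented, and `BROADER` is a hand-made chain that follows the talk's detective, crime, law progression; it is not the structure of the Historical Thesaurus or any real synonym resource.

```python
"""Sketch: moving upwards from a word, by decade and through broader terms.

The (year, text) trials are invented, and BROADER is a hand-made chain
following the talk's detective -> crime -> law; it is not the structure of
the Historical Thesaurus or any real synonym resource.
"""
from collections import Counter

BROADER = {"detective": "crime", "crime": "law"}


def decade_share(trials, word: str) -> dict:
    """For each decade, the fraction of trials whose text contains word."""
    totals, hits = Counter(), Counter()
    for year, text in trials:
        decade = year // 10 * 10
        totals[decade] += 1
        # Simple whitespace tokenisation; punctuation is not stripped.
        if word.lower() in text.lower().split():
            hits[decade] += 1
    return {d: hits[d] / totals[d] for d in sorted(totals)}


def widen(term: str) -> list[str]:
    """Follow BROADER links from term upwards; assumes the links form no cycle."""
    chain = [term]
    while chain[-1] in BROADER:
        chain.append(BROADER[chain[-1]])
    return chain


trials = [
    (1845, "stolen from the shop"),
    (1848, "the constable took him"),
    (1862, "the detective searched the room"),
    (1865, "a watch was taken"),
    (1871, "the detective officer found the notes"),
    (1874, "detective sergeant deposed"),
]
```

Each term returned by `widen("detective")` could then be run against catalog subject headings in the same way, giving the successively wider views of crime and then law.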
While the language of detective is itself clearly new, it's also clear from where it appears in the catalog data that it entered public consciousness via literature, if it entered that consciousness at all. And of course, we now know so much more. The context grows ever larger, and exposing it as part of the process of research and close reading gives new meaning to every word. I suppose what all of this is about, for me, is trying to figure out how I as a scholar, as a historian, as a researcher, navigate that endlessly changing body of material which you as librarians and archivists are responsible for organizing. I don't have an answer. I don't know what to do next about that, or how to do it with my students. But what I do know is that the hidden systems, the systems that are going ever further down below the surface, behind that single search box, make it more difficult for me as a historian, for my students and for users of your systems to understand exactly what they're doing. Most people over the age of 35 or 40 were introduced to old-school library systems and forms of catalog. People under 35 don't know what any of that is. And without some form of context, some way of representing the structures that we are presenting and preserving for them, it becomes impossible to make real sense of it. We haven't got there yet, but having started with a close reading, the context for every word is emerging as an extension of a given text: from word, to trial, to the Old Bailey Proceedings, to the university catalog, and hopefully eventually to WorldCat, to a model of what a complete catalog might contain, even though it will never exist. We are attempting to find ways of actually visualizing that broader context.
For me, what it feels like is sitting in the round reading room of the British Library with all the tools of old-fashioned scholarship available at the stretch of a hand, but with the power of the archive and the catalog laid bare and made explicit. The worlds of research and libraries are intellectually set on a journey towards new forms of search and analysis. And I guess where I want to end is with a plea to remember that context is all: that unless we expose the limits of our collections, the structures of authority they reflect and the limits of our own ability to know, unless we stop lying, as Google does, about the infinite archive, we will fail to serve the purposes of effective scholarship. Scholars can't and shouldn't seek to escape their cyborg state, but we should seek to know the machinery in which we're encased and use it a shed load better than we have up till now. Thanks.