OK, good morning. I think we're going to go ahead and get started. So this is the session on digital forensics and cultural heritage. My name is Matt Kirschenbaum. I'm from the University of Maryland; I'm an associate professor of English there, and I'm also the associate director of the Maryland Institute for Technology in the Humanities, or MITH. And with me today is Rachel Donahue, who is an archives doctoral student at the iSchool at Maryland and also a research assistant at MITH. Can people hear me in the back? OK, is the sound good? OK, good.

So the project briefing for this morning is really, in some sense, about a report that is due to be published today. How's that for timing? This will be out sometime during the day from CLIR, the Council on Library and Information Resources. I'll say more about it in just a moment. But in addition to the report proper, I think the more important context for the session is to open up a wider discussion about the convergence of these two topics, digital forensics and cultural heritage. I'm hoping we should have ample time for that towards the end.

So the report itself is co-authored; I'll get to my co-authors in a moment. I should say we're funded by the Mellon Foundation. The report will be available electronically from the CLIR website, and it will also, in another couple of weeks, just in time for the holidays, maybe, be available in hard copy, so you can order it in the usual way from CLIR if you would like one.

So the authors: in addition to myself and Rachel, we are joined by Richard Ovenden, who is the Associate Director of the Bodleian Libraries and the Keeper of Special Collections there, and also Gabriela Redwine, who is an electronic archivist and records manager at the Ransom Center at UT Austin. So we are your four co-conspirators. In addition to the authoring group, there were six consultants who advised us on the report, who read drafts, and who also authored a series of sidebars on more specialized topics that appear throughout. You can see them there. We've tried to be as interdisciplinary as possible in our representation, both in the authoring group and amongst the consultants. So we have digital forensics specialists, we have information specialists, we have archives people, and we have people from the scholarly world.

I'm going to move through these first few slides very quickly, but just to give you a little bit of the backdrop behind the project: this was initially proposed to Mellon back in early 2009. They funded us for a year to conduct the actual research and writing. This past May, there was a two-day symposium, also sponsored by the Mellon Foundation, at the University of Maryland. We convened this symposium both as a way of garnering feedback on an earlier draft of the report and as a way of providing some focus for the community. I'll say more about the symposium in a moment, too. Then we moved forward through the revision process, and here we are, December 2010. At the end of this presentation, balloons will come down from the ceiling and the report should be live on the web.

The audience for this certainly includes specialists in cultural heritage, archivists in particular, especially those working with manuscript collections. We also hope to reach some people in the technical forensics world.
I think, importantly, the audience also includes scholars: humanities scholars, textual scholars, really anyone with an interest in the transmission of documents and textual artifacts. The audience includes funders. And I should also add that it includes donors, the originators of born-digital materials, who might find it to be a useful reference.

The purpose, again, is largely to introduce the field of computer forensics, or digital forensics, to the cultural heritage community. It's to identify some points of convergence. And finally, it's to create a basis for further contact and collaboration between people in our world and people in legal settings, computer security, and the commercial forensics industry.

I won't linger over the table of contents either, because later today you'll be able to peruse it for yourselves. I'll note that the heart of the report is the long middle section, which we've entitled Challenges. We have a section on ethics. We have conclusions and recommendations, which I'll touch on in the presentation. And we also, I think crucially, have two very data-rich appendices compiled by Rachel, which survey the hardware and software currently available for various kinds of forensic analysis. So it's a useful holiday shopping guide for that archivist in your life.

The Maryland meeting: these are some of the people who spoke. We had about 60 people this past May. One of the key reasons for holding the meeting at the University of Maryland, just up the road in College Park, is the concentration of government and industry expertise in this area. So in addition to people from cultural heritage, we had representation from NIST, we had the Department of Defense there, we had the National Archives there, we had several commercial vendors. The location proved to be enabling.

So I'm going to move from the overview of the report proper into a discussion of digital forensics itself. I suspect there are probably varying levels of familiarity in the room, but to at least give everybody a common basis for discussion, I'll start with this definition, which is a few years old by now, but it's one I still keep coming back to. I like it: computer forensics involves the preservation, identification, extraction, documentation, and interpretation of computer data. I particularly like the inclusion of interpretation there. This is CNI, it's not CSI.

This is something I found online on a blog by a professional forensics analyst. It's actually a fairly personal and moving account of what it means to live one's life on a daily basis peering into the murky depths of people's hard drives and the kinds of material one finds there. He has this very bracing injunction: think about everything that you might have stored on your computer, everything that's there. Seriously, think about it. I'll give you a moment.

I think one thing that we tried to emphasize, both in the report itself and in our thinking about the digital forensics space, is that it does not exist in a vacuum, and I think a humanities perspective in particular furnishes us with some important contexts. There's the field of diplomatics, which originates in the 17th century as a kind of science of document authentication, much better known in Europe and Canada than it is here in the States.
There is questioned document examination, which is one of the forensic sciences and revolves around paper, analog materials, authorship attribution, detection of forgeries, the sort of thing one might expect. And I think crucially there's analytic and descriptive bibliography, from the world of textual scholarship. Over and over again in our work on this, we were struck by the parallels, the analogs, that exist between thinking about books in particular as physical artifacts and thinking about born-digital objects in the overtly material ways that forensics asks us to.

There is a branch of forensic science known as trace evidence. It's a field that was really pioneered by this gentleman, Edmond Locard, around the turn of the twentieth century. This is the person who pioneered the techniques behind looking at things like paint chips, the inevitable physical material remnants of actions in the physical world, and his core dictum, which every student of the forensic sciences knows, is that every contact leaves a trace. We've certainly found that this holds true in the digital world as well as the analog world.

There's some interesting work around treating the computer as a crime scene. That's not necessarily the sort of framework we might want in an archival setting, but I think what is useful here is the way in which the perception of the computer as a kind of complete environment is reinforced. So potentially everything about the machine is of interest, not only to a forensic investigator but also to an archivist and to a future scholar, certainly features of the hardware itself. Simple questions like: did the user work with two displays in tandem? If you're interested in an author's composition and revision process, it actually might make a big difference to know that there were two large-screen displays she was working with, as opposed to, if you think back to the original Mac Classic, that really tiny screen with only a small portion of text visible at once. It's the complete environment in the sense of the operating system, too, and potentially everything that is on the hard drive.

There are several different sub-branches, sub-domains, of computer forensics or digital forensics. We focus in the report very heavily on file system forensics, which is essentially the analysis of file systems on storage media, be it a hard drive or some other kind of media. There are other areas of forensics, though. A lot of them revolve around operating on live systems and detecting and fending off hostile intrusions; not, I think, of such immediate relevance for us. The last two, however, I think are going to become increasingly important: the space of web forensics and mobile forensics. Both of these, I think, are of obvious significance for this community.

So a little bit of technical theory, if you will, just to present a few baselines. One of the key terms that a forensics specialist would be acquainted with is the idea of remanence, of data remanence. And you can see a definition there: data remanence is the residual physical representation of data that has been in some way erased. What you're looking at on the screen, at the top, is imagery from a technique known as magnetic force microscopy. You're looking at bits on the surface of a hard drive, or really, more precisely, you're looking at a series of magnetic flux reversals which collectively constitute one or more bits.
But the key thing to notice is that kind of blurry area along the top. That's what's known in the trade as an erase band, and what you're seeing there is the result of the phenomenon that the read-write head of the hard drive never writes in exactly the same physical space on the drive twice. We're talking about tolerances that are measured at the nanoscale here, but nonetheless, with sophisticated instrumentation, you can detect those discrepancies and, theoretically, the keyword here being theoretically, use imagery like this to reconstruct a file. There are no known cases of that actually having been done in the literature, but it's a kind of exotic and exciting claim that one often sees proffered.

And on the bottom, that is, I believe, shredded hard drive platters. I like to include that image because the ultimate DoD standard for sanitizing magnetic media is to destroy, disintegrate, or smelt it. What's notable about that is really a worldview in which the only true safeguard against the remanence phenomenon is to physically destroy the media.

So we have the physics of storage media on the one hand. We also have file systems. I'm not going to linger over explaining what a file system is for this audience. Suffice to say that it is that which translates between the experience of a document, or some other electronic object that you work with on your screen, and the actual stored representation of that document or object on some piece of media, where what happens down at the level of storage is not nearly as coherent and homogeneous as what is presented to you as the end user on the screen. I've listed some common file systems here; a skilled forensic analyst, and increasingly digital archivists, will need a rich familiarity with many of these.

I've written about this at more length in a book called Mechanisms, where I introduce a distinction between forensic and formal materiality, which essentially corresponds to these twin areas of interest: the physicality of the media itself on the one hand, and the workings of the file system on the other, where what's happening at the level of the disk is ultimately very different from what an end user experiences. I discuss these in terms of forensic and formal materiality as one way of making that distinction.

In the published literature on forensics you can find scary cautions like this one: that it's essentially all but impossible to erase a file system in its entirety, and that's a function of the way in which data is propagated through the system by the operating system.

Another key concept from the forensics literature that I think is especially useful in an archives setting is the idea of an order of volatility, and this is something you'll find presented in one form or another in most any forensics textbook.
Essentially what we're doing here is looking at the different states that data can be in, either on a network or within a file system, and so the order of volatility makes distinctions between data that is stored, say, in RAM, data that is in swap space, and data that is on some kind of external media. I think this is useful as a way of underscoring that when we're talking about digital data it's hard to generalize: the kinds of analysis one might do with something that is on a floppy diskette are going to be different from trying to reconstruct traces of something that was happening in RAM, maybe by looking at the machine's registry.

Let's see. So this is one example of the kinds of findings that forensic analysis can reveal. What we have here is a disk image of an old Apple II game, it's called Mystery House, and it's being presented here in an Apple II emulator. If we use a hex editor to examine that same disk image, we find what appear to be instructions for some other game, and from this we can begin to make inferences about the history of this particular individual piece of storage media, this individual floppy diskette. We can, for example, go ahead and find out things about the residual traces of this other game and when it was published compared to when Mystery House was published, and again, in ways that I think are ultimately not unlike what an analytical bibliographer would do, we can begin to reconstruct the histories of individual electronic artifacts.

Another central technique in the forensics world is what is known as data carving. This is an outgrowth of the way in which data is stored on your drive: data is stored in clusters that are of fixed size, so if you've ever wondered why a very tiny text file is still taking up 4,096 bytes on your disk, it's because of the way in which the file system is allocating clusters. As a result, what often happens is that you have unused space, what's called slack space, at the end of a cluster, because the data you're storing in that cluster is not exactly the same size as the cluster. Data carving is a technique that a lot of the forensic software packages out there will offer as a way of leveraging that phenomenon, so it's essentially a way of discovering data lingering at the end of a particular cluster on the disk. It's particularly useful for allowing us to reconstruct files. Often what we find is that a file may be deleted in part but not completely, so data carving allows us to reconstruct the file by finding a piece of its header, here in the slack space of one cluster, and that gives you the file type, and once you have the file type you can begin to take steps to reconstruct it.

We seem to have dropped an image, but this was a slide about the registry, and particularly, on Windows systems, the sorts of things one can learn from looking at the registry.

Checksums. I've lingered over data recovery, but I think the most important way forensics is currently being used in archival settings is not for CSI-style data recovery but for more mundane practices around documentation and authentication. Checksums are a familiar concept, and any of the standard forensics packages will provide functionality for them: this is essentially algorithmic hashing for purposes of verifying that a file has not been altered or tampered with.
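To make the data carving idea from a moment ago concrete, here is a minimal sketch in Python, assuming the simplest header/footer case for a single file type (JPEG) and a small disk image that fits in memory. Real carvers, such as the carving modules in the commercial suites, handle fragmentation, cluster alignment, and dozens of file signatures; the output file names here are just illustrative.

```python
import sys

# Header/footer carving sketch: scan a raw disk image for JPEG
# start-of-image and end-of-image markers and write out whatever
# lies between them as candidate files.
JPEG_HEADER = b"\xff\xd8\xff"  # JPEG start-of-image signature
JPEG_FOOTER = b"\xff\xd9"      # JPEG end-of-image marker

def carve_jpegs(image_path):
    data = open(image_path, "rb").read()  # fine for floppy-sized images
    count, pos = 0, 0
    while True:
        start = data.find(JPEG_HEADER, pos)
        if start == -1:
            break
        end = data.find(JPEG_FOOTER, start)
        if end == -1:
            break
        with open(f"carved_{count}.jpg", "wb") as out:
            out.write(data[start:end + 2])  # include the footer bytes
        count += 1
        pos = end + 2
    return count

if __name__ == "__main__":
    print(carve_jpegs(sys.argv[1]), "candidate JPEGs carved")
```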
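And for the checksums just mentioned, the core operation is only a few lines in any language. This sketch reads a file or disk image in chunks, which matters once you're hashing multi-gigabyte images, and reports both an MD5 and a SHA-256 digest; as comes up again later in the discussion, MD5 collisions are practical, so SHA-256 is the better choice for new workflows.

```python
import hashlib

# Chunked fixity check: hash a file or disk image without loading
# it all into memory, the operation every forensics package automates.
def fixity(path, chunk_size=1 << 20):
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Recording these digests at accession and recomputing them later is what lets a repository demonstrate that an object has not been altered in its custody.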
One of the fun things about working in this space is that there are great toys. This is FRED, the Forensic Recovery of Evidence Device. It comes in both desktop and mobile incarnations. Essentially what you have here is a tower loaded up with lots of different drive bays for imaging multiple hard drives, or various other forms of storage media, simultaneously. I think those are currently retailing for around 15K, which is certainly expensive, but it's not necessarily out of scope for a larger institution. Stanford has a couple of these; Emory, I believe, has just purchased one. You also have portable drive capture devices, and these are designed for field investigation. In the archives world they would be useful for visiting a donor and being able to image a file system on site, as opposed to taking possession of the computer itself.

Moving into software, again just trying to give you a quick overview. What you see here is a screenshot from EnCase, a product of Guidance Software. This is currently the Cadillac suite of forensic software: very high-end, commercial. At the other end of the spectrum we have SleuthKit, which is open source. It's a set of UNIX command-line tools for various kinds of disk image analysis and data extraction from disk images. It also has a somewhat more user-friendly HTML front end called Autopsy. So actually what we're looking at here is SleuthKit with Autopsy. Again, this is open source software, and it comes with a much better price tag than EnCase does, but you really need to be comfortable messing around at the command line in order to install and use it.

So in the archives world, as I was saying a moment ago, I think the core uses for forensics are really in the areas of authenticity and integrity. Certainly discovery: a software package like EnCase is very good at allowing you to essentially bookmark individual data clusters, and you can bookmark them, you can annotate them. You can see why this would be useful for a forensics investigator trying to build a case presentation, but it's also potentially, I think, a powerful scholarly environment for the analysis of born-digital materials. Redaction: commercial forensic software is very good at finding things like pornography, credit card numbers, social security numbers, personally identifiable information, and it turns out that that's often the sort of thing donors want archivists to redact before their material goes public. And then, finally, there are the data recovery techniques.

Some places in our world where this software and hardware is currently being used: the British Library, where Jeremy John has been a real pioneer, and he has some discussion of forensics in the Digital Lives report that they released about six months ago; the Bodleian, where Richard Ovenden, again one of the co-authors on our report, is; Stanford and Emory, which I both mentioned; UT Austin, where Gabby Redwine is; and also MITH at Maryland. No doubt there are other places as well that I've not included. And I see boxes and arrows, so that must mean it's time for Rachel.

Not sure what to think about being known as the one with the boxes and arrows, but at least it's easily identifiable. So this is kind of fun, because usually when I go to conferences I'm the one that gets up here and says, so, everybody else up here has really practical things to say, but I live in research la-la land and I'm going to tell you about my dreams. But this time I actually get to talk about things that are practical, and the way you would really use these in the archives world.
So I have up here what, if there is a typical archival workflow, could be considered the typical archival workflow, except that really I could have put appraisal down at the bottom pointing at everything, like I have preservation pointing at everything. Basically: you identify stuff that you might want; you get the stuff in; you look at the stuff and identify what's worth keeping, again; you process it, which is the sort of getting intellectual and physical control, describing it and arranging it in such a way that doesn't disrupt the way it might have been originally ordered, and while you're doing that you also determine what you want to keep and what you don't; and then you do reference for it, and reference is basically search and access: you're finding things that people want and you're giving them to them.

So the question here is, where can forensic methodologies and tools come in handy? Everywhere. Everywhere. But before I get into how specifically they can be useful, I should point out that this is not a panacea. I might describe some things that sound like magic, and indeed they look like magic the first time you do them, but they only look like magic when they actually work, and they often don't.

So the first place this will usually come in handy is when you are actually getting the collection. In archives we call this accession; in digital forensics you might find it referred to as acquisition or evidence collection. You might also hear acquisition in the archives world, but the difference is that acquisition just means the stuff came in, whereas accession means we have legal and intellectual control over it. At accession you'll have both hardware and software tools that will come in handy. Up there we have a couple of devices that do two special things. One is that they can take hard drives and attach them to your computer whether or not you have a bay for them. The other is that they do physical write blocking. Back in the days of floppy disks we could flip a switch and make sure that no matter what we did to the disk on the computer, it would not affect the original data. But now there's really no way to make sure that what we do to a hard drive we have attached doesn't affect it, except to use a physical write blocker, or hardware write blocker, which will prevent anything from being written back to whatever you're looking at while you're gathering your data.

And then, of course, you don't just push a button and have the data get acquired; there are ways you have to do that, and the methods for imaging vary greatly. At the bottom we have the dd command in Unix, which might be a little scary. It goes up from there to a sort of pseudo-graphical interface: this is an image of a live CD called Clonezilla, which is actually really, really useful. It walks you through how to use the commands but doesn't actually give you anything to click on with your mouse.

And then you get into appraisal. This was actually almost the hardest slide for me to put together; it's like, well, how does forensics really help you with appraisal? In forensics this would be sort of the examination and the analysis phase. And I got to thinking about looking at big collections. If you're appraising them, you don't look at every item; there is no way you can do that. You might do some sampling, you might look for specific kinds of records, and forensics packages give you a variety of ways to really automate that.
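To step back to the imaging methods for a second: underneath dd and the graphical imagers, the core operation is the same careful raw copy. A minimal sketch in Python of that idea, assuming an illustrative Unix device path and a source sitting behind a hardware write blocker:

```python
import hashlib

# Bare-bones disk imaging: stream a raw device into an image file,
# hashing along the way so the capture can be verified afterwards.
# "/dev/sdb" and "capture.dd" are illustrative names; on a real
# acquisition the source would sit behind a hardware write blocker.
def image_device(device="/dev/sdb", image="capture.dd", block=1 << 20):
    sha256 = hashlib.sha256()
    with open(device, "rb") as src, open(image, "wb") as dst:
        for chunk in iter(lambda: src.read(block), b""):
            dst.write(chunk)
            sha256.update(chunk)
    return sha256.hexdigest()
```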
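And on the appraisal side, that automation can start as simply as a survey pass over a mounted or extracted image: tally what file types are there and how much of each, then sample from that, rather than opening items one by one. A toy sketch, assuming the image has already been mounted or extracted to a directory:

```python
import os
from collections import Counter

# Appraisal triage sketch: walk a directory tree and tally file
# extensions and total size, the kind of overview you'd want before
# sampling a large accession. Real packages add signature-based file
# type identification, date filters, and known-file hash exclusion.
def survey(root):
    types, total_bytes = Counter(), 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower() or "(no extension)"
            types[ext] += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return types.most_common(20), total_bytes
```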
The real packages make this useful for finding the things that are really worth keeping, or, in their home context, the things worth flagging as evidence in a trial. Up here we have screenshots from the Autopsy forensic browser, the Forensic Toolkit, and EnCase, and you can see that although they are really technical, they have pretty friendly interfaces.

Processing, which in the forensics world might be called case management or reporting: this actually is a place where, in addition to the imaging and the capture of the data, forensics really shines. Doing digital forensics as a criminal investigator is all about creating the dossier, figuring out what evidence you have, and arranging it in a way that's useful to prosecute, whereas processing in archives is all about identifying what you have so that people can find it, and putting it in your archives in such a way that people can also find it. So case management gives you a variety of tools. Matt was talking earlier about annotation and bookmarking. You can collect things by individual, which is maintaining the provenance of the fonds, like we would in archives, and it gives you a lot of opportunity to add metadata for description. It's not exactly EAD-compliant, but it is all there, put together for you, and would do in a pinch. And some of them export in XML, so if you wanted to, you could do an XSLT conversion to EAD.

And then there is reference, which I think is sometimes an under-examined aspect of archives, but is really important, because kind of what's the point if nobody's going to use them? In forensics this would sort of be the discovery aspect, which would be used as much by the prosecuting lawyer as by the criminal investigator. I have here a screenshot from the Windows 7 search box. I don't know if you've used Windows 7, but strangely it seems to me that the Windows search box has gotten less complex as it goes on, which maybe is easier to use, but it also means you have a lot fewer options for how you search, which means you get back a lot of irrelevant results. And when you're dealing with big data sets, or entire computers full of information, and you're trying to find something for a researcher, or for yourself, or for an exhibit, that can be really difficult. But if you look at the screenshots we have up here, from P2P Marshal, Paraben's P2 Commander, and I think another one from FTK, you'll see that they give you many, many more complex options that you can search by, which makes it much easier to find something specific that you're digging for. These vary from your ordinary things, like keywords in the file name and file type, to keywords in the document itself, to, because a lot of digital forensics is about identifying pornography, searching by how much color of a specific sort is in an image.

And then there is preservation. I have in the background there the MD5 hash checksum of this presentation as of 10:25 a.m., which can only be generated from a copy of this presentation as it existed at 10:25 a.m. So if we were to run it again, we would know whether Matt tampered with this presentation right before everybody came in and he introduced us. So that goes to show authenticity, although in reality, if you were going to do this, you should really use something like SHA-256, because people can engineer collisions against MD5. And the other aspect of this is packaging and storage. There's a lot more to preservation than that, but forensics is really good at packaging and storage.
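One concrete example of the kind of targeted search this enables, and of the redaction sweeps mentioned earlier: flagging strings shaped like U.S. Social Security numbers. A rough sketch in Python (a Perl script, as comes up later in the discussion, would do the same job); the pattern deliberately over-matches, which is the safe direction when a human is reviewing before release:

```python
import os
import re

# Rough PII sweep: flag files containing strings shaped like U.S.
# Social Security numbers. Deliberately over-matches (phone-like
# strings, serial numbers, etc.) so a human reviews the hits; missing
# a real SSN is the costly failure, not a false positive.
SSN_PATTERN = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")

def flag_ssn_candidates(root):
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if SSN_PATTERN.search(f.read()):
                    hits.append(path)
    return hits
```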
If you look at the forensic hardware, they have really nice setups where they keep the database on a separate hard drive from the software, which is on a faster hard drive, to maintain the integrity of the database and also just make things run much, much faster. It's a nice setup, really, and makes it easier to access and find things. And then they also do packaging: if you're familiar with the idea of metadata wrappers, they do that, and they also do that other kind of archiving that I don't like to call archiving, we'll call it file serialization. If you want to put all of your files in one sort of uncompressed zip or something like that, they automate that very easily. And I think that's it.

So these are the things you can do, but I guess the question is, do you want to? Cliff was at our May meeting at Maryland, and in his way he often gets to the heart of an issue. Here's what he said at our closing discussion back in May: how many of you would want to be the subject of a forensic investigation? I think that speaks not only to a kind of public relations problem for digital forensics in an archival setting, but also, more importantly, it really does capture what's at stake when a donor entrusts, particularly, a hard drive to a collecting institution. Hard drives really are sort of the reflecting pools of our digital lives, ambient environments that capture all manner of information: both things that we overtly save to the drive and all manner of ambient events that we may not even be aware are taking place within the operating system.

So there are some cautions I think we should present with respect to using this technology in a cultural heritage setting, including around terminology. First of all, there are issues around expense, and around training, which is also in and of itself expensive. Not every institution that wants to begin utilizing forensics in its workflows needs to go out and buy one of the FRED machines I showed earlier, but there's still going to be some investment in hardware, software, and training. The key observation we might make is something Rachel touched on a few moments ago, what we think of as the smoking gun fallacy. That is of course what legal forensics is all about: finding that one really incriminating piece of data, that one file, whatever it might be. I think in the scholarly world we're going to want something different; we're going to want more flexible ways of discovering information that is potentially of interest to us. We're probably going to embrace ambiguity rather than seek to resolve it. Again, the smoking gun modality from forensics is not necessarily one that will translate well to our world. And finally, there are all manner of ethical questions, which we spend some time on in the report.
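To circle back for a moment to the file serialization Rachel mentioned: the core move is just an uncompressed container plus a fixity manifest. A minimal sketch, BagIt-like in spirit though not an actual BagIt implementation, with illustrative file names:

```python
import hashlib
import os
import zipfile

# File serialization sketch: store files uncompressed in a zip
# (ZIP_STORED, so the container applies no transformation to the bits)
# and record an MD5 manifest alongside them, roughly what the
# packaging features in the forensic suites automate.
def package(root, out_path="package.zip"):
    manifest_lines = []
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_STORED) as z:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                arcname = os.path.relpath(path, root)
                z.write(path, arcname)
                digest = hashlib.md5(open(path, "rb").read()).hexdigest()
                manifest_lines.append(f"{digest}  {arcname}")
        z.writestr("manifest-md5.txt", "\n".join(manifest_lines))
```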
This is, I think, really the takeaway slide, and I'll leave it up as we open discussion. In terms of next steps, to summarize what we present in the report: I think there was a lot of energy and momentum in the room at Maryland back in May, and people felt strongly that that momentum should not be lost, so we've tried in the report to very briefly outline some tangible steps forward.

These include policy frameworks at the administrative level amongst different collecting institutions, with regard to things like donor agreements, best practices, and the legal implications of forensics work. If you were, for example, to discover child pornography on some piece of media that a donor had turned over to you, you actually have a legal obligation to report that, and that's something I think repositories are going to have to contend with. So our first recommendation is for coordination at the administrative level amongst different collecting institutions on various sorts of policy frameworks.

Coupled with that, we had a strong sense of the value of regional networks of collaboration, more at the level of the individual archivist who is processing a collection. It's not the case that every individual institution needs to have a high-end forensic setup. There are possibilities for sharing resources and sharing expertise, and these kinds of regional collaborative networks, we think, are also going to be very important to complement what's happening at the administrative level.

Requirements for tools, something I touched on a moment ago: I think as our usage of the available forensics packages becomes deeper and more widespread, we'll begin to see points of friction, things that we want to do that those packages do not support, and things that they do support that seem out of kilter with our world. I think the way forward here is going to be open source and sponsored research, as opposed to the commercial vendors providing what archivists need to do their jobs, but we will need to scope out requirements for additional tools in this space.

I think, crucially, there's a need to articulate a research agenda. We have some fuzzy ideas of what scholars may want to do with born-digital material, particularly when coupled with forensic analysis. I think we can predict that data mining, social network visualization, these sorts of activities will become much more important. If you're a historian or a literary scholar looking at gigabytes of data on somebody's hard drive, it's not going to be possible for you to manually examine all of that material, so you'll need to rely on automated tools. But I think we still have only very sketchy ideas about what scholars might really want to do with this material, and that's itself related to the next bullet point, which is a real need to collect, as Seamus Ross was fond of putting it, better stories. Where are the really compelling case studies? Where are the stories about things that researchers are actually doing with born-digital material and the kinds of tools available to them?

Training, I think, is self-evident. A lot of the corporate training programs are going to be beyond the financial reach of people in our world, so I think we'll need to provide alternative venues for training in forensics methodologies.
Cross-publication of research: there's a real gap, a real divide, between the professional literature in legal forensics, which tends, I think, to be under-theorized, and what's happening in the archives world, and so even things at the level of special journal issues could help with cross-fertilization of the research space. And then, finally, terminology mapping: it was observed numerous times that the legal world and our world share concepts like chain of custody, and so providing a kind of common mapping for people to talk back and forth to one another was seen as essential.

That's what we have for you today. I think we're happy to take questions, and as I was looking at my email on my iPad while Rachel was speaking, I did get a message from CLIR that the report is now up. So how's that for timing? Thank you. And so I think we're both happy to take some questions and comments.

I am not an archivist, but it seems that when you're accessioning paper, once you've created the box and put it on the shelf, the archivist's role is mostly done. I mean, at that point the researcher is going to examine the documents and do what they want. It seems clear that perhaps making the full disk image available for researchers is more questionable. Is it going to require much more continuous intervention by the archivist, essentially facilitating access for the researcher going forward?

Clearly I'm technologically challenged and can't turn on a microphone. There you go. Oh, is it on? Yeah. Oh, exciting. So, will there be more intervention required than with paper? The answer is yes, but maybe not for the reasons that you think. Sure, giving somebody a full disk image might be a bad idea, and I think it's pretty rare that they're going to ask for a full disk image, at least at this point. But the real issues of resources and personal intervention come up just because of the fact that these are digital materials. You can put paper in a closet and it'll kind of do its thing if you keep it at good temperatures, low humidity, no bugs. Whereas with digital files, Matt showed Mystery House in an emulator because you can't take an Apple IIe disk today, put it into your PC or your MacBook Air, and have it play. The same is true of most digital files, in that you need to do something active in order to keep them accessible. And that is where your biggest intervention comes in.

I do think that there will be more redaction necessary than there has been with physical work. But I also think that redaction will be a lot easier, because unlike with paper, where I would have to sit there and scan through looking for social security numbers if I were trying to clear things of personally identifiable information, I could write a script in Perl and have that automate it, and be pretty confident that it would get most of them. In fact, it would be more likely to flag things that are not social security numbers than to miss any of the social security numbers, so I could feel confident that the material was safe to give to a researcher.

A lot of what you talked about was dealing with locally stored content, content stored on local hard drives or local media. It seems to me that there's, I wonder if there is work being done on the forensics of content, or evidence, coming from the cloud.
This is really important because a lot of human rights organizations, like Human Rights Watch, Amnesty International, and some of the citizen human rights organizations like Witness, are capturing evidence from the web and from various social media, and that will come into their archives eventually. And I'm just wondering if there's application of this to that kind of material.

So again, we made a very deliberate decision to focus on file system forensics, largely because of the sheer volume of legacy material that's currently coming into archives. But you're absolutely right that obviously the cloud, Web 2.0, this is where a lot of current and future activity will be concentrated. I think the key thing to say there is that the barriers are as much legal and societal, if you will, as they are technological. As soon as you start talking about Facebook or Twitter or any of the big online services, you're also talking about EULAs and terms of service, and these things are often not constructed in such a way that, say, next of kin have easy access to personal data, let alone an archivist. The best paper I know of as an overview of the issues here is by Simson Garfinkel; the phrase "internet footprint" is in the title. That would be the place to start. One of the points he makes is that data on a local file system, such as a stored password, is often what somebody will need to unlock somebody's online life, and so there is, I think, a kind of permeability between the local file system and what's happening in the cloud that's important to acknowledge.

I think the only thing I would add is that another good resource, which isn't even remotely official but is one of the best that I've seen, actually comes from the blog Lifehacker. They have a really good guide to archiving your social media self, and also to making a sort of cloud-based will in terms of what you want done with your data should you pass away, and ways to securely ensure that if you have an untimely death, whatever you want to happen with your data will happen to it, and somebody will be able to access it.

Kind of a follow-up to Dean's question about the researcher. It seems to me that one of the things we are losing today, say for example in one branch of literary research, is the ability to see the different drafts and the editing of the different drafts, because people tend to write on top of the same file. So it would seem to me that giving the tools and the disk itself to a researcher would be one way that they could try to mine that for the changes that went into various iterations of a draft of something, right?

Howard, did we plant that question with you? Yeah, The New York Times Book Review published a piece a few years ago that included this heartbreaking comment from Zadie Smith, the novelist, that she doesn't have any of her digital drafts because she just Control-S's and saves over. And I think the solution there is actually very straightforward. It's not to use magnetic force microscopy to recover the erase bands of her manuscript drafts; it's to educate. Providing people with, I don't know, archivally responsible computing practices, that's simply an essential thing that needs to happen.
I do know, from archivists who I speak with, that there are legitimate concerns and debates about where it is appropriate to intervene in somebody's creative process, and at what point you are crossing that line. Yes.

But you do have things like, once a year, the Library of Congress throws a personal digital archiving workshop, which is about educating people in how to do just that. You have software vendors becoming more aware of that issue; Office 2010 has built-in version control, should you choose to use it. But you do also have to think about the ethical implications. If somebody hasn't said in their donor agreement that you can go through and use steganographic analysis and other things to find all of the hidden data on their hard drive, is it okay to do that?

And no one's waiting, so, a separate question. This gets to the smoking gun kind of issue. It seems to me there's another reason for us to stay away from the smoking gun, in that there are anomalies that happen that are not necessarily due to what looks like the most obvious reason. So, for example, the example you gave, that Matt didn't change his talk, or did change his talk, because of the date stamp. Well, maybe Matt can't control the PowerPoint, and every time PowerPoint opens, this happens to me at least, it rewrites over the file and there's a different date stamp on it. I mean, there are all kinds of anomalies that happen, and I would caution us against thinking that something has necessarily been messed with. Things happen in the digital world that are a little bit different from the backdating of documents that happens in the analog world.

That's true, although it wasn't just a date stamp; it was something generated from the actual bits of the document. And we kind of like to say in the archives world that when we're talking about authenticity, we're saying that the document or the record is what it claimed to be when we got it, and that it hasn't changed since we got it. And there are ways to make things stable, to make files write-proof, so that whenever you open them, and whatever you open them with, they're not going to get rewritten by autosave. So the algorithms and things like that are there to ensure that the integrity of the file has been maintained since its arrival at the repository. So you are absolutely correct that that happens day to day.

But ultimately, also, once it's in your chain of custody, there are other things that come up. In the same way that there have been historical attempts to doctor paper documents already in the custody of an archive, someone who's very savvy about computer forensic techniques probably could doctor those as well, could simulate some kind of key to be the right key to put there. It's the chicken-and-egg thing that you always have: as you develop new security methods, you develop people who know how to defeat those security methods, and so you develop still newer ones. Which is why I suggest using SHA instead of the MD5 that I actually used, because people have hacked that.
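In code, the stabilization being described here can be as simple as recording a strong digest at accession, write-protecting the file, and recomputing the digest whenever you need to demonstrate integrity. A minimal sketch:

```python
import hashlib
import os

# Stabilization sketch: record a SHA-256 at accession, make the file
# read-only so casual opens and autosaves can't rewrite it, then
# recompute later to show it hasn't changed while in custody.
def stabilize(path):
    digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
    os.chmod(path, 0o444)  # read-only; a blunt stand-in for write blocking
    return digest

def verify(path, recorded_digest):
    current = hashlib.sha256(open(path, "rb").read()).hexdigest()
    return current == recorded_digest
```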
I think also, as larger context for your remarks: there is a kind of aura around digital forensics, with all of its CSI connotations, and I think researchers will ultimately need to be educated and encouraged to retain their critical skepticism. At the end of the day, in our world, you're still going to have a human being looking at a set of documents or artifacts and drawing conclusions. Forensics is a very important tool, but I don't think it will ever subsume individual critical judgment.

If you want some really fascinating reading, with math that I don't understand, there are some really great articles out there on detecting images that have been altered since they were taken. It's really quite amazing what they can do.

Hi, Jonathan LeBreton from Temple University. Can you talk a little bit about the prospects for training, in terms of developing a community of practitioners? Where is that going to be happening, in your view, and what, if anything, gives you optimism that we'll have a cadre of practitioners in this pretty cool area?

So, two specific things I could mention. I know that UNC Chapel Hill is the recent recipient of some funding from Mellon to begin a forensics training module in their archives program, and I think this is a kind of pilot study, so the results from that should be watched. And I'll also say, a little bit of a personal plug, that the Rare Book School at the University of Virginia now offers a course in born-digital materials that I co-teach with Naomi Nelson at Duke, and we do include forensics in the curriculum there. I've also seen it pop up in pre-conference workshops and tutorials, in a more ad hoc way.

All right, well, I think it's break time, so thank you very much.