The title of the talk is quite long. I wanted to have the longest talk title of the entire conference. But unfortunately, there was somebody else who had a talk title that was five lines instead of four. So here we are. It's anti-forensics and anti-anti-forensics. There is no uncle forensics. All right, this talk's going to cover techniques that can complicate digital investigations and things that investigators can do to mitigate these techniques so that they can retain the upper hand in the investigation. And just digital complications in general. So if you're interested in that stuff, you've come to the right place. My name's Michael Perklin. In my day job, I'm a corporate investigator. I'm a digital forensic examiner and a computer programmer. In the past, I've been an e-discovery consultant. Basically, I'm a computer geek and legal support hybrid, together at last. So that's me. The stuff I'm going to be talking about today is not that difficult. Actually, the majority of these techniques are not sophisticated at all. A couple of them are a little bit sophisticated, but they're fairly easy to understand. Every single one of them can be defeated by an investigator who has either experience or training or knowledge in computers and understands what's going on. So if you're here looking for something that's brand new to break an investigation or to really beat an investigator, you're in the wrong spot. But all the stuff here definitely does add a lot of man hours and a lot of money, which will make the investigation take a lot longer and go over budget. And this will increase the chances that the case will be thrown out, and maybe a settlement will be reached instead. So everything that I'll be talking about today is not designed to break the investigation process. It's designed to lengthen the investigation process and to make it cost a lot more money.
There are three typical methodologies that are employed by a variety of different forensic labs. One is the copy first and ask questions later methodology. This is typically employed by law enforcement personnel. They go in there, they grab every single piece of digital evidence that the guy has, or the girl, I'm sorry, and they can take it all back with impunity. The next one is to assess relevance first. This photograph you see here on screen is a Logicube Talon. It's a forensic duplicator that has the ability to do a keyword search. So what some investigators will do is draft a list of keywords that they know will appear somewhere in the files that they're looking for, and they'll search the entire hard drive for any of those keywords. If they're found, then that hard drive looks like it's relevant to the case and at that point, they'll start to acquire it. So instead of grabbing everything, they only grab the stuff where certain keywords appear. This can be used by any type of investigator, whether it's law enforcement personnel, a private consultancy firm, or an enterprise firm. The third methodology is remote analysis of a live system. Basically, this is an administrator user who can connect over a network to another machine, scan the hard drive for certain files, read the log files, and whenever they find something that's relevant, they'll copy it back. This is deployed in large organizations, such as Intel or AT&T, if they've got 10,000 or more employees, because it's a lot easier to go at it this way than it is to image every single computer in such a large organization. Private firms can use this methodology as well, if they have assistance from the firm that they're dealing with. In a lot of cases, a large firm, let's say Intel, may contract out a forensic consultancy firm to come in and do all the imaging.
They'll share the administrator passwords so that the investigators can still go in and grab all the stuff they need. Now regardless of which type of forensic lab you're running in, law enforcement, or it's a private consultancy firm, or it's an enterprise firm, you're gonna do essentially the same workflow. There are six stages that all forensic investigators run through. Number one is creating a working copy. This is either imaging a file or creating some kind of a forensic container that has digital hashes to preserve the evidence as it existed on the target system. Once it's preserved, you can process it for analysis. When you're processing, you're doing things like calculating MD5 hashes or SHA-1 hashes or SHA-256 hashes, and then you're also identifying the file types, making sure all the extensions on each of the files match the format of the file itself, and doing some other processing, getting it ready for your analysis. Next, you separate the wheat from the chaff. I came up with that term. It's essentially sifting through the data to narrow down what you need to look at when you start your analysis, because you're not gonna want to analyze every single file in the hard drive. You're gonna only want to analyze the stuff that you think may be relevant to your case. Once you've got that stuff narrowed down, you can analyze it to see if it's relevant to your case at all. This can be done with a variety of tools that you're comfortable with as an investigator. Once you're finished your analysis, you then write a report on your findings. This report, if you're a private consultancy firm, you can write this report for your client. If you're an enterprise firm, you can write it for your boss, or if you're a law enforcement, you could write the report as an affidavit to go to court. Either way, you prepare some kind of report of your finding, of what you found. And finally, you archive the data for the future. 
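As a rough illustration of the hashing done in the processing stage, here is a minimal Python sketch that computes MD5, SHA-1, and SHA-256 for a file in a single pass. The function name `hash_file` and the chunk size are assumptions made for this example; real forensic suites do this internally and record the values alongside each item.

```python
import hashlib

def hash_file(path, chunk_size=1 << 20):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file, reading it once."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large evidence files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Reading the file once and feeding every hash from the same chunk keeps the I/O cost the same no matter how many algorithms are recorded.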
You never know if this case is gonna be settled today or tomorrow, or if it's gonna go on for years. And once a determination is made by a judge or an arbiter, it may be challenged and it may be overturned. It could go for another 10 years or 20 years. Who knows? As an interesting point, in Canada, at least, sorry, I'm Canadian, I don't know if the law is the same in the United States, but in Canada, if there's a case involving a murder, the data needs to be persisted forever. So, archiving data for the future is a pretty important step in the process. So that's a typical workflow that any investigator will have to run through, regardless of which class of investigator they are. Now, there are three classical anti-forensic techniques that I'm sure all of you are familiar with or have at least heard of. One is HDD scrubbing or file wiping. This is essentially going over a certain area of the hard drive with either zeros or random data, one time, seven times, 15 times. There's a variety of different standards for how many times you go over it, but it destroys the evidence permanently. The second classical anti-forensic technique is encryption, using something such as TrueCrypt or PGP. This makes the data unreadable to anybody except for you, who has the key. And the third one is physical destruction. Basically, smash the crap out of something. Destroy a USB key, destroy a hard drive, make it so that nobody can read anything off that thing. Now, I'm not gonna be covering any of these three classical techniques today. The reason is, in the case of HDD scrubbing or physical destruction, once the data is destroyed, as an investigator, you can't do anything about it. So, there's no reason for me to even talk about it.
In the case of encryption, yes, you can still get the data back, and you can get it pretty easily either by coercing the target to give you the encryption key or by getting a court order, maybe a subpoena or something like that, to compel him to disclose the key. There's ways you can get it. And of course, there's brute force, but that doesn't necessarily work that well because it takes so much time. But still, of these three, there are two that you don't care about as an investigator. And for the third one, there's ways around it. Coercing is very effective, especially if you plan to either get them fired if they're an enterprise employee, or get them in jail, or you can dangle quite a lot of carrots in front of them to get them to divulge the key. So, because of that, I'm not gonna be talking about any of these three. On each of the slides, you're gonna see running tallies at the bottom. Excuse me. The total number of hours wasted by the investigator and the total cost for the investigation. I'm gonna assume that the digital investigator is running at a rate of $300 an hour. That's a fairly average rate. Sure, there's investigators that are cheaper and sure, there's definitely investigators that charge more per hour, but I'm gonna adopt the rough average of about $300 an hour for the purposes of my calculations. The tallies are gonna be green for running tallies, and for each individual technique that I'm gonna show you, the tallies will be red, so you can see that this one technique cost this much, and on the next slide, the green tallies will show you the overall cost. So, without further ado, let's get started. The first stage of the process is creating a working copy. Now remember, this is where you copy the entire device, either by forensic imaging or by creating some kind of a forensic container of the logical files or some other form of persisting the data for your analysis. The first technique is quite simple. Just own a lot of media.
As a suspect, you can keep every single piece of digital media that you've ever had, every cell phone, every USB key, every burned CD and DVD, every hard drive, every laptop, anything that you've ever had. As a bonus, if you can use these frequently, then the investigator won't know if the data he's looking for is on this hard drive or that hard drive because as far as he can tell, all of them have been used within the last few months. So he will have to look through everything and this will add quite a lot of time and quite a lot of money to the investigation. I estimated about eight hours, that's one work day's worth of work. It could actually be quite a lot more depending on how many devices you have kept. So as a dead simple technique, keep absolutely everything. In order to mitigate this, there's two techniques you can use. Number one, you can parallelize the acquisition process. A lot of labs use forensic duplicators like the Talon that I showed in an earlier slide. These devices take the source drive of your suspect and a blank drive of your own, and they just copy byte for byte from one to the other. The more drive duplicators you have, the more of these hard drives you can grab at the same time. So the limit here is essentially your budget. The more of these forensic duplicators you can afford and keep running at the same time, the more hard drives you can go through in an hour. The next one is using their hardware against them. If you were to use their laptop or their desktop and boot from your own Linux Live CD or your own operating system, once it's booted, you plug in your external hard drive by USB and once it's plugged in, you can copy all the data directly to your USB hard drive and you've got all the data there using their own hardware. The limit here is not cost. The limit here is how many computers your suspect has. So this could actually be the same limit that you have in your investigation.
You know you need to copy 100 machines. They've got 100 machines. One for one, you're good. Here's a slide showing nine machines. Each of them has a USB hard drive attached. They're all booted with a Linux Live CD and they're all copying to the USB hard drive. I have to thank my friend Joel who's also a forensic investigator. He took this photo. This is him at his work and it very beautifully illustrates this technique. Excuse me. Priorities. Okay. The second technique is using non-standard RAID. For those of you who aren't necessarily techies, RAID is a way of taking two or more disks and spanning them together to hold more than one disk will hold. Now a lot of these RAID controllers use common settings such as block size, order and things like that. If you use a controller that has different settings and different parameters, the investigator may not be able to combine the RAID array properly back at his lab. Also, with a lot of controllers, when you update the firmware, it will include backwards compatibility for more standardized types of RAID. So simply don't update the BIOS of your RAID controller. By doing this, you'll only be running the controller with the stock firmware, the stuff that the manufacturer put there, which only works the way that they wrote it for their first crack at it, and there will be no backwards compatibility. Some more information, I guess, about these RAID parameters. One is the disk order. If you've got four disks or five disks inside of a machine, which order were they put in? Also, was the stripe ordering set up as left synchronous or right synchronous? Or is it left asynchronous or right asynchronous? Are the blocks stored in big-endian or little-endian format? Also, what is the block size? If any of you saw the DEF CON 17 talk by Scott Moulton on using porn to fix RAID, it explains the problem beautifully. I really recommend you guys go to YouTube and look up Scott Moulton's DEF CON 17 talk.
Actually, is Scott Moulton in the audience? Nice, a round of applause. That was one of the best talks that I've ever seen at DEF CON. It explained things so well, and kudos. I've used that presentation in my investigations and I know that some of my forensic friends have used it as well, and it really helped cut through the chaff to recombine things. Now in order to mitigate it, the problem is recombining the RAID. Which way did they stripe it? Which type did they stripe it? So an easy way to mitigate it is to simply de-RAID the volume on their own hardware. Again, you're using the suspect's hardware for your own purposes. If you use a boot disk on their server to boot it up, this is software that you control, you can use their RAID controller, which has all the parameters to read their RAID array properly, and you can then copy everything to an external USB hard drive, or you can even slap your own two terabyte or four terabyte SATA drive right into the server and dump things directly to it. You don't have to worry about recombining the RAID because their hardware's doing it for you. I estimated, oh, I don't remember how many hours it was for this technique. I should probably drink for that, sorry. But I do know that the running total is about 16 hours and we're at about $4,800 so far added to the investigation with really not doing much. The second stage of the workflow is processing data for analysis. Now again, this is where you take hashes, this is where you do file signature analysis to see if the extension matches the file format, and other processes similar to this. This here, now I realize it probably isn't showing that well to the audience, but this is the internals of a JPEG file. I've highlighted the first four bytes of the JPEG file. The byte values are FF, D8, FF, E0. It looks like yo-ya, but that's not a Y or an O or a Y or an A, they're different characters. But these four bytes denote a JPEG file.
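To make the header check concrete, here is a minimal Python sketch of file signature analysis: it reads the first few bytes of a file and compares the detected type with the type the extension claims. The signature table is a tiny, illustrative subset, and the names `SIGNATURES`, `EXTENSION_TYPES`, `sniff_type`, and `extension_mismatch` are assumptions made for this example, not part of any forensic tool.

```python
from pathlib import Path

# A few well-known header signatures; real tools carry far larger tables.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",  # JPEG files start with FF D8 FF
    b"MZ": "exe",             # DOS/Windows executables and DLLs
    b"PK\x03\x04": "zip",
    b"%PDF": "pdf",
}

# What each extension claims the file is (illustrative subset).
EXTENSION_TYPES = {
    ".jpg": "jpeg", ".jpeg": "jpeg",
    ".exe": "exe", ".dll": "exe",
    ".zip": "zip", ".pdf": "pdf",
}

def sniff_type(header: bytes):
    """Return the type implied by the leading bytes, or None if unknown."""
    for magic, kind in SIGNATURES.items():
        if header.startswith(magic):
            return kind
    return None

def extension_mismatch(path):
    """Return (mismatch, detected, claimed) for a file on disk."""
    path = Path(path)
    with path.open("rb") as fh:
        detected = sniff_type(fh.read(8))
    claimed = EXTENSION_TYPES.get(path.suffix.lower())
    return detected != claimed, detected, claimed
```

Note that a check like this looks only at the first few bytes, so it classifies a file by its header alone and says nothing about what the rest of the file contains.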
Whenever a picture viewer tries to open a file, it'll make sure that these four bytes are there so it knows you are trying to open up a JPEG file. So file signature analysis relies on these header bytes to figure out whether this is actually a JPEG, or an executable file, or a zip file, or a PDF file, or any other file format. So the third anti-forensic technique is, again, pretty simple. We'll get to the tougher ones later. File signature masking. Basically you hollow out the middle of the file and put whatever data you want in there, maybe it's an encrypted drive of your own or it's a simple text file, and whenever the file processor goes over that file, it'll read the first few bytes, it'll see, oh, this is a JPEG file, and it'll classify it as a JPEG file without realizing that embedded inside it, you've got a bunch of text. There's a tool in the Metasploit anti-forensic framework, the MAFIA framework, called Transmogrify. This does this for you. You can point Transmogrify at a file that you've created and it'll modify the headers of it so that it looks like any file type that you want. Excuse me. Here's an example of that. This is a standard Notepad window. You can see the first two characters in Notepad are a capital M and a capital Z. Those two characters, MZ, are the file header for executable files, EXE files. So the text that's written in that window is in the file, but it's recognized by a lot of these forensic tools as an executable. Here's a slide showing EnCase doing just that. I don't know how well it's showing up for you guys, but the red arrow shows that after file signature analysis, EnCase has identified this text file as an executable, even though it's clearly not executable because I just wrote this in Notepad. So it's damn easy to make one file look like another. In order to mitigate this, there's a couple of things you can do. One of them is using fuzzy hashing.
Fuzzy hashing is similar to doing a cryptographic hash, but it's a little bit looser. It can compare different parts of files to different parts of other files, so you can see how similar files are, rather than just whether they're equal or not equal like with a traditional hash. So the chances are, if your attacker has used this technique, they have copied another file from somewhere else on their file system and they've modified that copy, they've hollowed out the center of it to put their stuff inside. That means that the header and the footer and a lot of parts of the file will match the original file that they copied. A simple question you can ask is, why does this file have a 90% match with notepad.exe or some other file? Another way you can mitigate this is by analyzing all the recent entries in common applications. Excuse me again. I've enjoyed DEF CON a lot, so excuse my coughing. Let's say WordPad. If WordPad was used to open up a DLL file, well, that doesn't make sense. WordPad opens up RTFs and TXTs and DOCs, so why is the recent list showing a DLL file as being recently opened? These types of questions would help lead the investigator to zero in on some of these files to analyze them in more depth. All right, so that's two steps of the workflow. We've got four to go. We're now on the third, separating the wheat from the chaff. Again, this is the process where the data is whittled down to a manageable volume so that the investigator only has to look at a couple of things. He's not gonna look at everything. Some techniques involve date filtering. All files created within this date range. Or maybe it's by custodian. All files created by this user ID. There's a variety of other techniques. One of them is using the NSRL, or the National Software Reference Library. This is a library of hashes that's published by the National Institute of Standards and Technology, or NIST. It's a huge database of every single file that's installed on a machine by commercial installers.
Every file in Microsoft Word, every EXE file, every DLL file, every help file, every text file, every everything is hashed, and these values are stored in this database so that investigators can use it to see, is this something that's very common that's installed by millions of people around the world, or is this something that a user has created on their own? The process is sometimes called de-NISTing. Basically, the investigator will take this hash library and apply it to all of the files on your hard drive. It'll suppress the known files so that the only things he's looking at are things that don't match the NIST-listed files. Things that the user has created, things that the user has modified from the original. So this next technique is basically using this against the investigator, NSRL scrubbing. Basically, if you were to modify every single file on your file system to be subtly different, even just one byte off, if you were to take a string that's in the file and change it, the hash of that file will not match anything that's in the NSRL. So when your investigator goes to try to suppress all of these common files so he's only looking at your stuff, all of the files for Office and Windows and everything else will still be there. Now, this is easy for text files and Word documents, but if you were to try to modify EXEs or DLLs or other executable files, they will break and they will not work. That's because inside of these DLLs and executables, there's a CRC value that matches the contents of the file, and now that you've changed it, that CRC won't match, so it simply will not run. That's easy to overcome for a sophisticated attacker, like I'm sure most of you guys at DEF CON are. Recalculate the CRCs, update the CRC value in that file to match the new contents that you've changed, and now the program will run just fine. If you're gonna do this with Windows files, like rundll and stuff like that, you'll have to turn off Data Execution Prevention, or DEP.
This is a feature in Windows that makes sure that Windows itself hasn't been modified. It's designed to stop viruses and malicious code from running, but this will also stop your modified versions of EXEs and DLLs from running at all. It's fairly easy to turn off. You can see in the screenshot here, it's grayed out or disabled, and that's because in the boot.ini, I've set the policy level of NoExecute to be AlwaysOff. It's a fairly simple change you can make in your boot.ini file. The next time you boot, DEP will be off and Windows will not try to verify any of the files that it's running. So your modified files will work. Wow, we've wasted 28 hours of the investigator's time, $8,400, and again, I haven't mentioned it on each of the slides. Deserves another drink, I'm sorry. I keep ignoring the green text. Okay, so in order to mitigate this, there's really only one strategy. Rather than adopting a whitelist approach, adopt a blacklist approach. Don't suppress all the stuff that you don't wanna look at, leaving only the stuff you wanna look at in your view. Instead, try to target the stuff that you do wanna look at. You can use strategies like keyword filtering or keyword searching, date filtering, and file signature analysis, basically anything to zero in on the files that you do wanna look at. It's a slightly different approach, but it's just as effective. For the next technique, I'm gonna talk a little bit about histograms. I had meant to get a photograph of a histogram on the screen here, but unfortunately, I got to drinking a little bit too much. Histogram, I'm sorry. How was that unfortunate? I guess it was fortunate for me. It may be a little bit unfortunate for you guys, but if any of you remember grade nine or grade 10, or I guess for you Americans, ninth grade or 10th grade, you'll remember a histogram, and it's basically a bar graph showing how many occurrences of something have happened in each date range.
You'll see something like September 1st is a very short bar. September 2nd, very tall bar, because there's 10 things that occurred on that day. Then September 3rd will be another short bar, because only two things occurred on that day. It's a bar graph by day or by month or by some kind of date range that allows you to see which specific days or which specific time spans have had a lot of stuff happening. This is useful when reviewing VPN logs, how many times the user has logged in. If you see a big spike on a certain day, you know you should zero in on that day. Firewall alerts, or even file created times. How many files were created on this day? That could be indicative that an entire folder of 1,000 files was copied on a specific day, because that's when they were created. The next technique is, again, a little bit old hat. I promise we're gonna get to some of the more technical ones later on, but it's scrambling MACE times. MACE, M-A-C-E, stands for modified, accessed, created, and entry modified. Each of these four timestamps exists on an NTFS volume for every single file that's there. And if you use a tool like Timestomp, this is another tool in the Metasploit anti-forensic framework, the MAFIA framework, it will allow you to set the time value on any file for any of these four values. You can scrub your entire hard drive of every single file time by modifying every single modified, accessed, created, and entry modified date so that the histogram will not show anything. The histogram will be essentially uniform because all files will have randomized values. In order to, oh, sorry. Two other things you can add onto this. One of them is to also randomize the BIOS time. If you have a system service or a daemon running in the background, it can set the BIOS time every 10 minutes, every half an hour, or at some random interval so that the current time of the system is completely different than the real time.
And also you can disable the last access update in the Windows registry. This will prevent Windows from recording the time that the file was accessed whenever you open it or whenever you read it. You can disable the last access time in two ways. One of them is a registry hack. If you go to this key and you set the value of NtfsDisableLastAccessUpdate to one, that will prevent Windows from modifying these access times. The other way is to open the command prompt as an administrator and run this command: fsutil behavior set disablelastaccess 1. This does the exact same thing. Windows will no longer record the last time that you opened the file or the last time you touched it in Explorer. To mitigate against an attacker that's doing this. Wow, hold on. This will take 16 hours of an investigator's time because he'll need to figure out what's going on, and this will cost $4,800. Nice, I remembered it. To mitigate this, first of all, you should ignore all the times that you see there. You know that they're all scrambled by the attacker, so you know you shouldn't read into them. Instead, look for log files where time values are stored as strings or as text. These log files are written sequentially. Every time an event occurs, a log entry is put in with the current date of the system. This will allow you to see a set of times that are roughly similar. And if the attacker has this system service running, constantly changing the BIOS time, a sequential log will show which time values came before which. This slide here explains it in a little bit more detail. You can see that in the first block of timestamps, it's a year of 2026. In the second block, it's different, and in the third block, it's different still. You can now infer that all the values of 2026 have occurred sequentially. And right after that, the BIOS time was changed to 1983. Excuse me, and after that it's changed again.
So now that you know which block of time comes before and after which other block of time, you can use these time values that you've learned from the sequential log file to scan for other files on the system that have similar dates. It'll give you a rough timeline of what's happened. Now you're still wasting a lot of time doing this, so in essence the attacker is still winning, but it'll at least allow the investigator to get through it and figure out what's going on. Also I should mention that the attacker doesn't need to change the timestamps on every single file. He can do it for only one file. And if he does this, I'm sorry investigators, you're a little bit screwed. Whenever a report is written by an investigator, he can't say that this occurred at this time. The reason being, as we all know from Timestomp, you can change the time values on any file. So we don't know whether this really did occur at this time or whether the attacker has updated the time on that file to make it appear like it occurred at that time. The best we can do as investigators in our reports is write that the time in this log is consistent with the MAC times in this file and it's also consistent with that other log file. Consistency is the key. So if an attacker updates the time values to make them consistent, as an investigator you really have no clue whether he's done that or not. All right, we're on step four of six now of this typical workflow. We're gonna look at techniques that can confound the analyzing data portion. So far we've racked up 44 hours of the investigator's time. That's $13,200. Already we're probably at the point where somebody who's running the budget may say maybe we should settle instead of continuing with this investigation, but we're gonna move on.
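The sequential-log mitigation described a moment ago can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log format (a `YYYY-MM-DD HH:MM:SS` timestamp at the start of each line), the one-day gap threshold, and the names `parse_timestamp` and `split_into_blocks` are all hypothetical. The idea is to walk the log in the order it was written and start a new block whenever the clock jumps backwards or leaps forward implausibly far.

```python
from datetime import datetime, timedelta

def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' text of a log line (assumed format)."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")

def split_into_blocks(timestamps, max_gap=timedelta(days=1)):
    """Group timestamps, given in log order, into runs written under one clock setting."""
    blocks = []
    for ts in timestamps:
        # Stay in the current block only if time moved forward by a plausible amount.
        if blocks and timedelta(0) <= ts - blocks[-1][-1] <= max_gap:
            blocks[-1].append(ts)
        else:
            # The clock went backwards or leapt ahead: treat it as a clock change.
            blocks.append([ts])
    return blocks

# Hypothetical log lines, in the order they appear in the file.
sample_log = [
    "2026-03-01 10:00:00 service started",
    "2026-03-01 10:05:00 user logged in",
    "1983-07-14 02:00:00 file copied",
    "1983-07-14 02:03:00 usb attached",
    "2004-11-30 23:59:00 user logged out",
]
blocks = split_into_blocks([parse_timestamp(line) for line in sample_log])
```

With the sample lines above, `blocks` holds three runs (2026, then 1983, then 2004), and their position in the list gives the real-world order in which those clock settings were in effect.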
Now, confounding the file analysis. When you're doing data analysis, sure, there's a lot of forensic suites like EnCase or FTK that allow you to see inside the tool what the file is like, but that doesn't show you the whole story. A lot of times you'll need to export the file from these tools to your own analysis machine and run it with the native application. Open up the DOC file in Word itself or open up the PDF file in Acrobat itself. Now when you do this, there can be some problems. Sure, there's viruses and things like that, but putting those obvious ones aside, one of them is restricted file names. A lot of people don't know this, but Windows 7 still has a lot of holdovers from the old DOS days. File names like CON, PRN, COM1, COM2, LPT1, I'm sure the older members of the audience will remember these as system devices for printers and modems and other things. If you attempt to name a file with one of these restricted file names, Windows will bark at you and say, no, can't do that. Now you can still hack the file system to have a file named with one of these names. And if you do that, if you try to read the file, Windows again will bark at you, no, you can't do that. It'll give you error messages because it's trying to access a system device rather than the file itself. So as a suspect, use these file names anywhere that you can. Now I estimate that this is only gonna take about one hour of the investigator's time because he'll realize, oh, well, it's probably just the file name. He could change it or export it with a different name. It's really not gonna cause too many problems for the investigator, but it will still hinder him a little bit. In order to create some of these restricted file names on an NTFS volume, there's three techniques you can use. One is to access it using UNC paths. If you go to double backslash, your computer name, backslash, C dollar sign to access your C drive, you can navigate down to the folder and rename the file there.
Windows will let you do this. Another way is to do it programmatically if you're a programmer. In the kernel32 library, there's a function called MoveFile. This allows you to move the file programmatically rather than relying on Windows itself. And the third way is to basically boot it off of Linux. As long as your Linux distribution has support for NTFS, you'll be able to rename the file no problem, and now you've got a file with one of these restricted names. To mitigate against this, as an investigator, never use their file names. You own your own analysis machine. You are the one who should dictate which file names you get. If you're exporting something from a forensic image, export it using a name that you choose. Maybe 1.jpeg or something like that. FTK 4 has a feature that allows you to automatically name a file. It's something that will always be guaranteed to be okay. I recommend using this. The next technique is kind of a fun one. I did some research on this and I really like this technique. Basically, using circular references. If you're a Linux user or a Unix user, you're familiar with symlinks. But if you're a Windows user, this may be a little bit foreign to you. Folder names typically have a limit of 255 characters. You can't have a path that is longer than that. You can use one of two features in the Windows operating systems. One is called junctions, which was introduced in Windows 2000, and one is called symbolic links, which was introduced in Windows XP. Either of these will allow you to link a folder to another folder so that as you click on that folder, you're really jumping across the file system to somewhere else. Now if you use this feature to link to a parent, you've now created a circular reference, and you go parent to child, to parent to child, to parent to child. This will break anything that recurses through a file system.
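The loop described above is easy to reproduce, and a tool that walks a live file system can guard against it. As a minimal sketch, assuming the platform reports a stable device and inode (or file index) pair for each directory, the Python walker below follows links but refuses to enter any directory it has already visited. The function name `walk_without_loops` is made up for this example.

```python
import os

def walk_without_loops(root):
    """Yield file paths under root, following links but never re-entering a directory."""
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            # os.stat follows links, so a link and its target report the same identity.
            info = os.stat(current)
        except OSError:
            continue
        identity = (info.st_dev, info.st_ino)
        if identity in seen:
            # Already visited via another path: this is where a circular link ends.
            continue
        seen.add(identity)
        try:
            with os.scandir(current) as entries:
                children = list(entries)
        except OSError:
            continue
        for entry in children:
            if entry.is_dir(follow_symlinks=True):
                stack.append(entry.path)
            else:
                yield entry.path
```

A walker without the `seen` check would keep descending parent to child to parent until it hit a path-length or recursion limit; with it, each real directory is listed exactly once.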
So as an attacker, create some of these symlinks and junctions so that if the investigator tries to scan through all these files, their system gets bogged down because it's now recursing. Eventually one of two things will happen. Either it'll run out of memory because of a recursion error, or it'll get a path too long error message, because 255 is the maximum length. And once you've got too many parents and children, parents and children, you're gonna run over that length. This will take about four hours of the investigator's time, I estimate, which equates to about $1,200. Now again, the investigator can mitigate this pretty easily, but it's gonna take some of his time to figure out why his processing engine is not stopping. Why is this going forever? And why is three megabytes worth of files taking an hour to export? What the hell's going on? This technique will definitely hinder anything that recurses through the file system, but it won't touch anything where a forensic image was used. If you create a forensic image of the drive and you're analyzing it in EnCase or FTK, these will recognize that it's a symlink, they won't bat an eye, and there's no problem for the investigator at all. This means that in enterprise areas, if the administrator connects to a machine remotely and is scanning for a specific file name or searching through the files, live access to the system will definitely be hindered by this, but again, forensic images won't. So mitigating this is fairly simple: always work from an image, and be mindful of this attack, simply knowing that it's possible to create these circular references. When you run into something that's taking a lot longer than it really should, you'll be able to recognize it fairly easily.
Next technique is breaking log files. As I mentioned earlier, investigators process logs to figure out timelines and for a variety of other reasons.
If you use weird ASCII characters in these logs, you can sometimes break the parser, depending on the program that's being used to analyze these logs. Another thing you can do is to add delimiters within the text of the error messages. If you can customize some of these event messages, you can add commas, you can add quotes, you can add pipes, and parsing will be very difficult. If the parser is trying to split each record on a comma, or on any of these other delimiters, it won't work. As a bonus, if you're using Windows event logs and you're parsing Windows event logs, if you, as an attacker, can add a Windows event record with the text ELFL right in the middle of the record, these four bytes denote the start of an event record. Whenever a parser goes through each of the records, it will look for these four bytes and it knows, okay, this is the beginning of a record. If you start throwing these start record bytes right in the middle of a record, you could fool some parsers into thinking that a new record has started when they're not even done parsing the first record. I estimate this will take about six hours of the investigator's time, or $1,800. To mitigate against this, ask yourself, do you even need that log? Maybe you can figure out a way to prove your case without this log file. If you do need the log, do you need the whole log or do you only need a couple of records? If you can zero in with manual analysis of the log file and copy and paste just the few records that you need, you don't need to worry about parsing programmatically at all. Excuse me. Again, priorities. Okay. At worst, the investigator will need to write a quick script or a quick program to go through everything and account for all the weird characters that the attacker has thrown in. But even there, the attacker is still winning.
The next one, I, whoops, sorry.
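As an illustration of the quick script mentioned for the log files, here is a minimal sketch that keeps injected delimiters from shifting the fields. The record layout (timestamp, level, source, then free-text message) is a made-up example, not a format from the talk. The approach only works under the assumption that the attacker-controlled text is the last field.

```python
def parse_record(line, field_count=4, delimiter=","):
    """Split a record into field_count fields, keeping the final text field whole."""
    # Split at most field_count - 1 times, so any delimiters inside the
    # trailing message stay part of the message instead of creating fields.
    parts = line.rstrip("\n").split(delimiter, field_count - 1)
    if len(parts) != field_count:
        raise ValueError(f"expected {field_count} fields, got {len(parts)}")
    return parts


# Hypothetical record whose message contains a comma, quotes and a pipe.
line = '2012-07-28 10:15:02,ERROR,app,disk "C:" failed, retrying | attempt 2'
naive_fields = line.split(",")      # the injected comma adds an extra field
timestamp, level, source, message = parse_record(line)
```

Here the naive split produces five fields instead of four, while `parse_record` returns the full message text as the fourth field.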
The next technique, I didn't necessarily want to throw in, but I realized, you know what, every time I've run into this thing, it has caused so many problems. It has added so much time to the investigation. This technique is dead simple. If you are writing email, use Lotus Notes. Lotus Notes has gotta be one of the worst programs I've ever used, and if you're an investigator, you will agree Lotus Notes will add so many hours to your investigation. They use NSF files instead of PST files, and these NSF files can be encrypted with an ID file that has the user's ID and an encryption key in it, and that's locked with a password. It's such a convoluted system. Sure, it may be a little bit more secure than PSTs, but you're gonna add hours to your investigation. There are a lot of tools that can convert an NSF into a PST and convert every one of the emails inside, but every one of these has its problems. The main reason is that all these tools, or the majority of these tools I should say, are built on IBM's own Lotus Notes API. The Lotus Notes API, sure, it works, but it's kind of convoluted. These third-party developers, when they're trying to use this API, they don't necessarily do it properly. This password dialog that you see here comes up every time you try to open up an NSF that's encrypted. It always comes up in the interactive context, so the e-discovery operator or the investigator needs to type in the password manually for every single access of this NSF. It is quite a pain in the ass. I like the title of this slide, mitigating Lotus Notes. Train yourself on Lotus Notes itself. Don't try to convert it to something else. Just use Lotus Notes as it is. It has a search feature, so you can find the specific emails that you're looking for, and when you find one, print it to PDF, attach it to your report, and you don't need to worry about any conversion. You'll save so much time by using Lotus Notes itself rather than trying to convert it into something else.
I don't know why investigators don't do this to begin with.
Two more stages and then we're done. Reporting on your findings. I didn't think that there were any hacks that you could do on writing a report, but then I realized, well, there sort of is one, and that is hash collisions. When an investigator writes a report and he lists a file that is notable, he'll put down the hash value of that file. Well, if you have a good file and a bad file which both calculate out to the same hash value, it could confound some of the less technical people who are reading the report. Also, if you're searching by a hash value, you're not only gonna find the files that you're looking for, you're gonna find other files that you weren't looking for that happen to have the same hash value. This will probably add about two hours to the investigation. Now, this will only really be useful in a select few cases. Let's say an employee took data from company A as he quit his job, and he took it to company B and put it on their file server. An investigator may search for those files by the hash value on company B's file server, and if he finds them, well, hey, there they are, I got them. But if there are other files that also come up in that search, he'll have to include them in his report or he'll have to omit them. Either way, the defense can say, well, why did you not include them in your report, or if you did include them, why are two files hashing to the same value? I thought you said it was a unique fingerprint. You'll have to explain to a judge or to an arbiter or to some non-technical lawyer what a hash value is, what a hash collision is, and all the technical stuff around that. And this confusion in a non-technical person may add just the right amount of reasonable doubt so that they acquit, or that they let the guy off scot-free. So hash collisions really do work. Now, a lot of work has been done on this already.
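For the hash-search scenario, here is a minimal sketch of how a hit could be classified when the investigator still has a known copy of the file to compare against. It is an illustration only. The function names and the three result labels are made up, and the second digest used here is SHA-256.

```python
import filecmp
import hashlib


def file_digest(path, algorithm):
    """Hash a file in chunks with the named hashlib algorithm."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def classify_hit(known_path, candidate_path):
    """Return 'identical', 'md5-collision' or 'different' for a search hit."""
    same_md5 = file_digest(known_path, "md5") == file_digest(candidate_path, "md5")
    same_sha = file_digest(known_path, "sha256") == file_digest(candidate_path, "sha256")
    # Byte-for-byte comparison, not just size and timestamps.
    same_bytes = filecmp.cmp(known_path, candidate_path, shallow=False)
    if same_sha and same_bytes:
        return "identical"
    if same_md5:
        # MD5 agrees but the content does not: the confusing case in a report.
        return "md5-collision"
    return "different"
```

A result of `md5-collision` is exactly the case that would need explaining to a judge, so it is worth catching before the report is written.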
Marc Stevens wrote a thesis on this in 2008, which sort of kick-started the whole "MD5 is broken" thing. I recommend you do a Google search if you're interested in creating files that have the same hash, because there are definitely tools out there that can do this. Mitigating this is, again, fairly simple. Don't use a broken hash algorithm. It's been broken since 2008, people. There's no need to use one of these broken algorithms. Use SHA-256 or Whirlpool or something else that doesn't have any known collisions. Also, double-check your work. If you found something by hash, open it up. Don't just include it in your report, because, boy, would your face be red if you included a file that wasn't actually the file you were looking for.
And the final technique in this talk is using a dummy hard drive. A dummy hard drive is essentially a hard drive that's not used by the operator of the machine. If an attacker has a hard drive that's in the machine but he doesn't use it, and he instead boots off of a USB key, he can use the machine without touching that hard drive. He can also mimic regular usage of that hard drive by making writes to the hard drive from a daemon or a system service. This daemon or system service can retrieve news articles. It can sync email with an account that is guaranteed to have legitimate email. Basically, mimic regular usage of the hard drive. As an investigator, when you pop the hard drive out of the machine and you start analyzing it, oh, yeah, look, it was used recently. This is buddy guy's hard drive. I got his machine. And you'll be clueless to the fact that all the real data is stored either on that USB key, or in the cloud somewhere, or on some remote server where he was accessing everything remotely. Mitigating this, again, is fairly simple. Always look for USB keys. They can be super small these days. Check the back of the machine in all the USB ports. Also check the motherboard.
There are risers on the motherboard where you can plug in a USB port that goes to the front of your case. So what if the guy took those four pins on this USB key that you see here on the screen and soldered them right onto the motherboard, so there's a small little chip on the motherboard? An untrained investigator will have no idea that there's a whole other device right there. If you're booting from a Linux live CD, you may see the device come up in the device listing. But if the investigator just pops the hard drive out to plug it into his forensic duplicator, he'll have no idea that there is a whole other device there. Also, to detect network traffic, use Wireshark or something to see which IP addresses that machine is talking to. That may disclose the location of his remote server or the cloud that he's storing all of his real data on.
Now for the final stage of the workflow, archiving the data for the future. I mentioned earlier that investigators always keep the data that they've worked with, just in case the case comes back. Maybe somebody's challenging the results of the original decision, things like that. You'll need to keep the data for sure. This technique is the same as the first technique, data saturation. The more data that the investigator needs to keep, the more money he'll need to spend. I estimate that this is gonna cost about $20 per month per hard drive. So if you've got five hard drives worth of data, that's a hell of a lot of money. I'm too drunk to even calculate that right now. So we've come full circle. Keep as much data as you can and you'll make the cost go up.
We've taken up roughly 63 hours. I didn't update that slide, that deserves a drink. We've taken up 63 hours of an investigator's time. That's more than eight work days without overtime. I'm sure maybe they'll do overtime, but now you're paying time and a half and you're costing even more money on the investigation.
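The arithmetic left undone above can be worked through in a few lines. This is a rough sketch that assumes a flat rate of $300 per hour, which is the rate implied by the earlier estimates of four hours for about $1,200 and six hours for $1,800. It ignores the time-and-a-half overtime, and the function names are made up.

```python
HOURLY_RATE = 300            # dollars per hour, implied by 4 h = $1,200 and 6 h = $1,800
STORAGE_PER_DRIVE = 20       # dollars per month per hard drive, from the talk


def labour_cost(hours, rate=HOURLY_RATE):
    """Cost of investigator time at a flat hourly rate (no overtime)."""
    return hours * rate


def storage_cost(drives, months, per_drive=STORAGE_PER_DRIVE):
    """Cost of archiving a number of drives for a number of months."""
    return drives * months * per_drive


wasted_labour = labour_cost(63)          # 63 hours of menial work: 18,900 dollars
five_drives_month = storage_cost(5, 1)   # five drives for one month: 100 dollars
five_drives_year = storage_cost(5, 12)   # five drives for a year: 1,200 dollars
```

Under these assumptions the 63 hours come to $18,900 before overtime, and five archived drives add $100 a month for as long as the data has to be kept.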
All of this extra time wasn't spent analyzing the data or looking at the stuff to see whether the guy did it or not. All this time was spent on menial tasks, things like copying files, trying to image drives and reading email. These things should not take a lot of time. The investigator still has to do all of his regular work. This will increase the likelihood that the opposing counsel will decide to settle, or that your client will say, you know what, forget the investigation. Let's not investigate anymore, because this is costing a hell of a lot more than we budgeted for, so just stop it right now.
And that's my talk. If you have run into these problems or any problems in your investigations, I would love to hear from you in the Q&A room. I'll be in the Track One room. How did you deal with it? I would love to hear about it. Or if you have any questions about investigations in general, I'll be there as long as people are there. And thanks for letting me speak. The slides on your CD are really outdated. I submitted them maybe three weeks ago and I've updated this talk a lot since then. So go to this website. I'll be uploading it later today. If you try to go to it now, it won't be there, but perklin.ca, that's my name, slash tilde defcon20 slash perklin underscore anti forensics. Copy this down. It'll be up for the next few minutes. You can get the updated slides there.