I don't want to waste your time at all. We've got about 60 slides, and if we can get through those, I've got some additional materials, so I'm going to fill up this time. I'm very comfortable with this material, so please feel free to ask short questions at any point. It may take some effort to get my attention, so if you stand up and wave your arms, I'd be happy to take your comments. And actually, I'd probably be more interested in those than in these materials, because I've already seen this, so I really would value your feedback.

What we want to get down to is the internals of event logging on Vista. There are some radical changes in both the component architecture and the encodings, and that's really what we want to talk about, along with the process; we want to put things in perspective. All of this relates not just to forensics. One of the central issues of doing forensics, and this is mostly static analysis, is interpreting what has gone on in the system from artifacts that you can recover, and that depends in large part on your ability to interpret the data. Understanding how the system works and what the encodings mean is something that's shared with a lot of other disciplines, so this relates to all the other areas of security, and in the same way it borrows from all the knowledge you might use in those other areas. It's all tied together. So that's a little bit of perspective.

I'm going to back up and give you an introduction now. This is my sponsor, who paid my travel costs. They are a forensics and data recovery service company in Houston, and they also do technology consulting. And just so you know, this is not legal advice. We're not going to talk that much about the law here, but in case we do mention something about how the law applies: we do do expert testimony, but this is not that.

I want to recognize a couple of people who've helped us over the years. Cesar and MB5 of the Ghetto Hackers. All the guys from Houston, HTMapathy, who are here and have been going to DEF CON for years saying, man, when we get back to Houston, we're going to get together a team, and they're beginning to do that. Dark Tangent, for making this conference, which is the best among all the others, happen, and Fednaughty, who I really should be over at Capture the Flag with right now. Thanks to Geraldine Martis at ACS, who's been an excellent technical editor; Josh Pinnell at IOActive; and Matthew Geiger at CERT, who helped me decide to begin some of this research. And one of the guys with HTMapathy isn't here this year because he's over in the Middle East, so we miss him and he misses us. This talk is dedicated to him.

This stuff may be skewed slightly because of my perspective, and I tend to look at everything as research and development. My background is a PhD in EE. I was on the faculty of a medical school for a while, so I did research, then went into industry during the dot-com boom with a security management company. I still want to keep my hand in the game in terms of understanding the field, and the way you do that is through services, so I'm doing forensics with Applied Cognitive Solutions. Along the way, I've also written some free software: GNU Graphics; I'm a contributing author to Asterisk; I was on the FreeBSD core team and the XFree86 core team. So those are the kinds of things I do.
So I look at a lot of these problems as a question of understanding the mechanisms and writing new tools to automate the processes. I want to talk for just a minute about how to catch up. We're going to talk about things that are not in print yet, but if you want to get a handle on these kinds of processes and tasks, this is the body of literature you might look at to get a handle on forensics, and log forensics in particular.

One of the leading certifications in the field is the EnCase certification, and they have a study guide. It's probably the best book out there, although maybe 10% of it is errors, but it's the best book to teach you a survey of all the kinds of tasks you might do for static analysis of a hard drive. There are some newer books. The one on the bottom left is by Bunting as well, Mastering Windows Network Forensics and Investigation, and it talks more about Windows artifacts, all the different kinds of file formats that you might analyze. A still newer book is Windows Forensic Analysis by Harlan Carvey, and it goes beyond these other tools because it uses Perl to add new capabilities for extracting data and automating things. Those two books are probably the most recent you can find on this kind of work. Along with that, there's some more specific material on Windows event logging, but it's for older versions of the operating system; still, it will tell you about things that are backward compatible and still valid on Vista. And finally, if you want to find out where the field is going, there's a really good journal, three or four years old now, called Digital Investigation. So those are the materials you might look at to catch up on what's going on in digital investigation and get ahead.

Steve Bunting has a website where he's published some materials about recovering event logs. There are various levels at which you might recover event log information: intact files in the file system, data carving whole files, recovering fragments. And this is some of the most recent information on manual methods. I mentioned Harlan Carvey's books; he covers a lot of artifacts that the other books don't, and how to write scripts, mostly Perl, to parse the data and analyze it. And there's probably more about event logging in Bunting's more recent book. But when they get to the question of log repair, so if you recover logs and they're not intact, they say there are no methods for doing the repair. That's what this talk is about: understanding the encodings and figuring out how to go farther with extracting the data and analyzing it. So we're at the edge of what is published.

There's some new work coming out in Digital Investigation. In fact, I have a paper coming out in two weeks that is on Windows XP, and that's one of the reasons I'm not going to talk about XP today: I can't talk about it for another two weeks. I'm not going to talk about automation either. So if you want to know about automated methods to do what we're going to talk about, that paper is available; you can give me your email address or send me an email and I can send it to you.

Okay, so back to our outline. We're going to talk about event log analysis from the perspective of a case study, and this is representative of the kinds of work our company does for intellectual property disputes.
And so this will provide us some motivation for what kinds of questions we're trying to answer and what we're trying to get out of the analysis. As part of that, it's going to motivate us to extend the capabilities, to go beyond the limitations of the current tools, and then at the end we'll put it together and see how that impacts the kinds of answers we can come up with.

So let's look at a typical process for an engagement. We can break it down into three phases in terms of how we interact with the customer. We do civil work only. We're mostly dealing with corporations, who demand a certain amount of control and estimates for work, so very often we'll begin the engagement with an estimate of work that tells them what kinds of things we know we can do without having any information about what's in the data in advance. We'll provide a preliminary scope of work that tells them what's feasible, what we know we can do with the tools, what we can do quickly, and generally it will lead to a preliminary report. Almost always there are some surprises in the preliminary report. Because there are so many levels of recovery methods, with varying amounts of effort that go with them, how much effort it takes to tie all the pieces together depends on what you find in the data. So there are almost always surprises, and the trick is to come up with ways to go from step two to a final report in a way that's going to be feasible.

The final report should provide some kind of in-depth coverage. You want to be able to say there was one indication that X happened, but you also want to be able to say something about whether the rest of the 100 gigabytes or so of data for, say, an individual machine could be explained in another way. You want enough coverage that a single item indicates something clearly or not, and you may need to adapt the methods to do that.

Typically you're going to be contacted by a corporate officer. Something bad has happened: business interference, possible contract violation. Very often what we deal with is proprietary information going out and being used by someone in violation of agreements, and very often it's former employees who had used that information as part of their job, part of their role at the company. So it's not enough to show that they had the information; you want to be able to distinguish whether someone took the information out in a specific way at a specific time, and discriminate between things that were part of their job and things that weren't.

So the first task is to define a scope of work. What kinds of things can we identify where we could show outgoing file transfer? We can examine hard drives; that's the most typical thing we do. We can look for email attachments, email going out, uploads, file transfers, and various other ways the information could be transferred out.

So let's consider a typical preliminary report. We look at the hard drive, and the client may have told us things to look for. They may have told us keywords that would appear in their proprietary documents or information, names of products, names of services. So you look at a hard drive image, and by golly, you see a name for a document that is exactly what they were looking for. There's proprietary information there, and the path has a D: drive letter in it.
It looks like an external drive, not the system partition, but it's in unallocated space. This means that at first glance we may not know what this piece of data means, other than that it seems to refer to an external drive. Furthermore, there's some bad news. The reason it's in unallocated space, we find out, is that IT deleted the user profile after the employee left, and then they gave the laptop to a new employee, which complicates the forensics in terms of attributing this to the right person, differentiating it from the next person who was in possession of the laptop. And this was six months ago, so the new employee has had it for six months. And then on top of that, they had reformatted, and then they reinstalled. Okay? So this makes it really interesting.

So what do we know? In our preliminary report, we know we've got a document name in unallocated space. So we look at the surrounding data, just as ASCII text or hex, and it looks like a shortcut. Shortcuts have a certain structure and a certain signature at the beginning of the file that we can recognize. So it looks sort of like a shortcut. And shortcuts are useful because they contain a snapshot of the thing they refer to. They're kind of like a soft link: they point to something, but on Windows they also take a snapshot of the timestamps, a snapshot of the volume label of the device the target is on, and a snapshot of the volume ID, the serial number for the volume. So they've got a bunch of stuff that could be unique to whatever device held that document of interest. So here we know we've got something interesting, a lead on an indication that there may be something here that would be of use to the client.

We want to identify outgoing file transfer. How do we do it, when we know the original file system has been overwritten? Some of the things we can do: we can data carve for various things that hold file paths and times, like shortcuts, and then we can look for things with timestamps that might correlate with those. There are a number of artifacts that contain a wealth of timestamps: event logs, which timestamp events; internet history, which is generated both by the browser and by all kinds of applications; and, as mentioned, shortcuts.

Okay, so now we know what we need to do. We need to recover all these kinds of information, and we know they're not available through the file system; we're going to have to go carve them out. This gives us the motivation to understand the encodings and how the system works. Understanding the encodings allows you to extract the data; understanding how the system works allows you to interpret what you've extracted. So we want to know something about how the system stores log events, how the logging system works, and what the events mean.

There are a bunch of process models for this in the literature, and this is one view of them: you extract the data, extract records, extract fields from records, analyze them, and then reconstruct and interpret what those records mean. We're going to look principally at the first part of this, where we recover logs and events. The recovered logs may or may not be valid in terms of the tools being willing to read them, so we may have to deal with repairing or reconstituting the log files so that the tools will accept them, and finally with correlating all the information. That's what we want to focus on.

So we do that for the shortcut, and what do we get? This is actually just the first section of the shortcut, and it holds some standard pieces of information, including what we saw before, the path to this file. It stores the kind of media it was on (CD-ROM), a volume label that looks like a date, a serial number, the file size, and the creation and last write dates, which are a snapshot of those values on the media. So we know more specifically here that it actually is a CD-ROM. Whatever was going on, this file was being looked at on a CD-ROM, and that's a little further support for investigating this further if the client is interested in transfer of intellectual property.
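As an aside, the fixed header of a shortcut is simple enough to parse by hand. Here's a minimal sketch in Python that pulls the snapshot timestamps out of a .lnk header, using the offsets from the published shell link header layout. The filename is just a placeholder, and a full parser would also walk the volume label and serial number structures that follow the header.

```
# Minimal sketch: pull the three FILETIME stamps out of a Windows
# shortcut (.lnk) header. Offsets follow the published shell link
# header layout; the volume/label structures that follow the header
# are ignored here, though a full parser would walk those too.
import struct
from datetime import datetime, timedelta

def filetime_to_datetime(ft):
    # FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC.
    return datetime(1601, 1, 1) + timedelta(microseconds=ft // 10)

def lnk_times(path):
    with open(path, 'rb') as f:
        header = f.read(76)               # fixed-size shell link header
    if struct.unpack_from('<I', header, 0)[0] != 0x4C:
        raise ValueError('not a shortcut: bad header size field')
    # Creation, access, and write times are consecutive QWORDs at 0x1C.
    created, accessed, written = struct.unpack_from('<QQQ', header, 0x1C)
    return {'created': filetime_to_datetime(created),
            'accessed': filetime_to_datetime(accessed),
            'written': filetime_to_datetime(written)}

print(lnk_times('suspect.lnk'))           # 'suspect.lnk' is hypothetical
```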
One of the ways you might recover whole files is using data carving tools, and there are a number of tools out there. The most popular are commercial tools, because the majority of people doing forensics are in law enforcement and they prefer commercial tools; that's why I'm presenting a commercial tool. This is DataLifter, and it has a couple or three hundred signatures for standard formats in there. It does not have event logs. So if you want to carve event logs, you've got to go get some event logs and figure out what the header is, put it in the dialog here where it says edit file signature, and give it a length.

One way to figure out the signature for event logs is to look at a couple of different logs in WinHex. You pull them into WinHex and use Synchronize and Compare; that's the menu option. What WinHex does is take two files, show you the first one, and highlight all the bytes that differ between the first and the second. So all the black boxes you see here are the sections that are different between the two. And you can see that on the first line there are no highlighted bytes, so that section of the file is the same, and we can use that as a signature. So this is the method you can use to figure out the signature for a given file type: grab a couple of examples, load them into WinHex, and compare.

For XP log signatures, you see the byte sequence shown here. Things change on Vista, and in fact the new signature is a little easier to remember: Vista event logs start with the ASCII string ElfFile, for "event log file", padded with nulls. So you put that into DataLifter and tell it a file size. On Vista, the logs are going to be either 64K plus a 4K header (68K) or one megabyte plus 4K (1,028K), so you use the latter and do the data carving. And you may get a whole bunch of logs back from that. If this machine had been in service for a number of years, say three years... obviously Vista hasn't been out that long, but when we do this for XP, you may recover 100 logs. Because of the way the system behaves when you clear logs, create a new log, or defragmentation occurs, there are lots of ways in which the system will move log files around and leave an old copy. So when you have a large disk that's underutilized in terms of capacity, it's easy to find lots of old log data.
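To give a feel for what that carving step amounts to, here's a minimal sketch that scans a raw image for the ElfFile signature (the string followed by a null) and dumps fixed-size candidates. The image filename is a placeholder, and the 1,028K carve size is the default log size just mentioned; a dedicated carver like DataLifter does the same thing with more care.

```
# Minimal sketch of the carving step: scan a raw image for the EVTX
# file signature and dump fixed-size candidates. 'image.dd' is a
# placeholder; the carve size of 1,028 KB matches the default size.
SIG = b'ElfFile\x00'
CARVE = 1028 * 1024

with open('image.dd', 'rb') as img:
    data = img.read()        # fine for modest images; mmap for big ones

pos, n = 0, 0
while True:
    pos = data.find(SIG, pos)
    if pos < 0:
        break
    with open('carved_%02d.evtx' % n, 'wb') as out:
        out.write(data[pos:pos + CARVE])
    n += 1
    pos += 1
print('%d candidate logs carved' % n)
```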
So you might recover 100 logs, and it's very common for only one or two of them to be intact to the extent that the tools will open and read them. The question then becomes: what do you do now that you have 100 of them? What do you do with the ones the tools refuse to open or view? And that's one of our motivations for looking at the internals.

Vista changes everything. Now, one of the things we can leverage in understanding how Vista works is backward compatibility. With each new revision, they're going to keep some things from the old, and that's going to happen up at the top layer, where those green dots are, between the DLLs, the libraries, and the application. There are interfaces there that continue to exist in the new version of the operating system, so there are things about the way XP worked that we can use to reason about how Vista works, and one source for understanding that is this book on event logging.

Now, Vista changes a whole lot of things. Instead of three or four logs, depending on whether you've got IE7 installed, you've got more than 50 by default, and there are logs for individual services, so there's a whole wealth of logs there. They're in a different location, System32\winevt\Logs, and they have a new extension, .evtx. We're going to talk a little bit about why they did this.

The new system is intended to solve a lot of performance issues. They took the old Event Tracing for Windows (ETW) components from 2003, and those are the engine for event logging on Vista. They buffer events in the kernel, using circular buffers, to get very high performance, and it unifies logging for the apps, the kernel, and the drivers. There's a richer component architecture here. Where before you had an event log service, an application, and a single log, like the application log, now you have providers that log to channels (and they can create new channels dedicated to their service or application) and collectors that control the flow of events from the channels to log files or to remote collection. So you can centralize logging with Vista as well.

And there are other advantages. Because events are buffered in the kernel, the performance impact is much lower. You can dynamically disable and enable logging without rebooting, which is not possible on XP, and you can filter the events. So you've got lots of new capabilities. In terms of speed, whereas a couple hundred events per second might bring XP to its knees, on Vista you can log 20,000, or with transactions 200,000, events per second and use only 5% of the CPU. And this is a big deal for forensics as well, because it means that all of a sudden applications and services can use the logs for new things; because they can do so with low performance impact, they may begin to log lots of things that they haven't in the past. The framework is much more flexible because all the interfaces for drivers, the kernel, and applications are layers on top of a single new component, Event Tracing for Windows, which unifies all those interfaces. So from a programming perspective, we want to understand how things work so that we can interpret the events, interpret how the system behaves, and be able to say, given an event and a few pieces of information, what they really mean and what they don't.
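As a quick aside at the file level, a few lines of Python are enough to see that new layout for yourself on a live Vista machine; the path assumes a default install.

```
# Quick look at the new layout: count the .evtx logs in the default
# Vista location. The path assumes a standard install.
import os

log_dir = r'C:\Windows\System32\winevt\Logs'
logs = [f for f in os.listdir(log_dir) if f.lower().endswith('.evtx')]
print('%d logs' % len(logs))
for name in sorted(logs):
    print(' ', name, os.path.getsize(os.path.join(log_dir, name)), 'bytes')
```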
So we want to understand how the logging service works, and this is the new API for logging events. If you have an application and you want to send events to the log, the application developer writes a schema that describes the events and compiles it to be bundled with the application. At runtime, the application registers as a source, creates a session, and sends events per the schema it has registered.

There are very few tools for looking at these things, but the tools that are there have a bunch of new capabilities. The new Event Viewer can filter and do a kind of SQL-like querying across events, and it can do so both locally and remotely, for live events as they're occurring or across multiple log files.

Now, if you look at the MSDN documentation, it's going to describe everything as XML, and that's a wonderful thing because XML is neutral in a lot of ways. This is the kind of structure for an individual event. At the beginning it's going to have a bunch of standard properties for things like the name of the service, event IDs, level of severity, the kinds of things that are common with XP, and then a bunch of other stuff that is potentially an entire XML document specific to the application. And that's where things get interesting: the application can define a document structure, and the log viewer or analysis software can filter it using XPath to walk the tree and look for individual items.

So how does the application define that? The application provides a manifest, which provides type information and the structure of the document it's going to log events as. Providers of events also have a description that contains a unique GUID and a provider name, and it specifies the DLLs that contain some of this information. Just as you have message catalogs on XP, you have resource names and parameter file names for this additional type information on Vista, and a similar document to define the channel. So an application can define a dedicated channel for itself to log events to, and then a controller can say: I want everything for that channel sent to a specific log file, or selected parts of it subscribed to and sent remotely or centrally for collection.

Templates define the shape of the documents that the log records are encoded as. So you have an XML payload, and the templates define the structure of that payload. The manifest defines all the event attributes, such as ID, version, keywords, and tasks; it references a template, as we talked about on the last slide; and it names channels. One of the reasons for doing this is both that it can be used for all kinds of services and that you can use it in new ways. Event delivery is guaranteed now; because the performance impact is so low, you can guarantee delivery of events. So a wireless driver can depend on event logging to send messages to a service that manages networks or VPNs, things like that. And that will have a forensics impact in terms of having records of those kinds of things to use for event reconstruction. So we've got high performance tracing. Events can be forwarded to a collector service or stored locally; they're buffered in the kernel; and they can be delivered as they arrive or pushed to a remote machine.

So let's look a little bit at the encodings, the internals. The documentation talks about this all as XML.
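For reference, here's roughly the shape of a single event as the new tools render it, with the common System properties up front and the application-specific payload under EventData. The provider name and values are invented for illustration, and Python's ElementTree, with its limited XPath support, stands in for the richer filtering the Event Viewer offers.

```
# Rendered form of an event, roughly as MSDN and the new Event Viewer
# present it. Provider name and values are invented for illustration;
# ElementTree's limited XPath stands in for the viewer's filtering.
import xml.etree.ElementTree as ET

event_xml = '''
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="ExampleService"/>
    <EventID>7036</EventID>
    <Level>4</Level>
    <TimeCreated SystemTime="2007-01-15T10:31:00.000Z"/>
    <Channel>Application</Channel>
  </System>
  <EventData>
    <Data Name="param1">The Example Service entered the running state.</Data>
  </EventData>
</Event>'''

ns = {'e': 'http://schemas.microsoft.com/win/2004/08/events/event'}
root = ET.fromstring(event_xml)
# Walk the tree the way an XPath filter on System/Provider would.
prov = root.find('./e:System/e:Provider', ns)
print(prov.get('Name'), root.findtext('./e:System/e:EventID', namespaces=ns))
```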
So we've got this record format for the specific events. The events are held in sections: in front of a section body you'll have a section header that says what size it is and what manifest you would use to decode it, then descriptors for each of those sections, and then a record header that holds standard attributes. So this is a new encoding, and it's very different from XP, where you have basically an array of strings, a single array of types. In the record header, you have the common attributes such as timestamp and severity. Again, the section descriptors tell you the source, the provider of the events, and the offsets and lengths of the bodies, and then the header for each body tells you the encoding of the body.

One of the other advantages of this new architecture is that log size limits are removed as well. On XP, you're limited to about 300 megabytes by the system, and at that size you have a severe performance impact, and that's for all of the logs combined. On Vista, the logging service memory-maps only 64K of the log at a time, which basically removes the limitation that comes from mapping the whole file on XP.

If you want to recover logs and records on Vista, because it maps only 64K at a time, you've got both the structure you see in the signature at the beginning of the file and an identical structure for each of what they call chunks, the 64K sections that are memory-mapped. If you do the same thing with WinHex that you did for the whole file, take two logs and load them in, then beyond the first 4K header, looking at 64K increments past that, you'll see a signature at the beginning of each 64K block. And you can potentially use this to recover individual fragments of log records. So even if you can't get the whole file, because these chunks are potentially standalone (events are not going to cross a chunk boundary), you can recover fragments of logs using this signature.

As I said before, the minimum size for the log files is 64K plus 4K for the header, so 68K is the minimum size. You're going to see a dozen or so, maybe 20, that are that size, and the rest of the logs are one megabyte plus 4K, or 1,028K. So if you're trying to data carve for them, the easiest thing to do is just specify a size of 1,028K.
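As a sketch of that chunk-level recovery: each 64K chunk carries its own signature (the ASCII string ElfChnk, which is what the WinHex comparison shows at 64K intervals past the header), so something like the following can pull standalone fragments out of a raw image even when no intact file header survives. The image name is a placeholder.

```
# Chunk-level recovery sketch: each 64K chunk in a Vista log starts
# with its own signature, so fragments can be pulled out of a raw
# image even when no intact file header survives. 'image.dd' is a
# placeholder for the image or an unallocated-space dump.
CHUNK_SIG = b'ElfChnk\x00'
CHUNK_SIZE = 64 * 1024

with open('image.dd', 'rb') as img:
    data = img.read()

frags, pos = [], 0
while True:
    pos = data.find(CHUNK_SIG, pos)
    if pos < 0:
        break
    frags.append(data[pos:pos + CHUNK_SIZE])
    pos += CHUNK_SIZE
print('%d chunk fragments recovered' % len(frags))
```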
Now, one of the surprising things about the documentation for this is that if you look at MSDN, you will get the impression that everything is XML. But if you go look at the encodings of the events, it's not XML. What they're using is a thing called binary XML, which is popular for cell phones, web browsers, and geographical databases. It's a way to encode XML that is very lightweight, and you can put the binary XML in memory and walk it in memory. So it's a kind of serialization with lower overhead in both space and time. It's very compact, in part because it uses string tables to store all the element names, tag names, and attribute names. And it can be 10 to 100 times faster to parse, in part because it can store binary data: rather than having numbers represented as ASCII, you can represent them in binary. So if you've got heavy numeric data, you'll often see parsing run 100 times faster. And that's important for doing the analysis, because if you're looking at historical logs over a long period of time, the parsing will be the limiting performance factor.

Okay, so how are they serialized? In the binary XML scheme, type values precede individual numbers or strings, so a particular data item is going to be preceded by a type code, and if it's variable length, it'll have a length following the type code. There's an enum that gives us all the codes for the types; an integer number, for example, would be represented by F4 for the type and then 4 bytes for the value. Strings are preceded by a length and then a series of characters, and a string table is preceded by a type code that indicates it's a table. These are the kinds of things that, if you're recovering data from within an event, if you've got a fragment, you would use to interpret the data. And finally, for the XML structure, each element in the tree structure of the document is enclosed by a beginning tag, which has a type code for the element and a reference into the string table that tells you the name of the tag, and a closing type code that tells you where the element ends. There are no tools as it stands to do this in an automated fashion. These are things you can use manually, but there are not yet tools to recover fragments or repair files, and that's the area I'm interested in.

So if you do recover these files, what kind of analysis can you do with them? These new XML events have rich information. Instead of a flat structure with an array of strings, you have a whole document that can be represented as a tree, and you can do XPath filtering on the elements. You're going to have a lot of new kinds of information stored in the events, and a lot more detail. And the tools will now do queries across multiple logs, so it's now easier with the native tools of the platform to analyze across a whole set of log files. Just about the only tool to do that at the moment is the native tool, the Event Viewer, and it has these new items to create a custom view or a filter that allow you to do queries on the logs. And you do so by specifying XPath, which is basically like a directory path: you're walking the tree of the document rather than the tree of a file system. So here we've got System/Provider, that element of the tree, equals the CD burning service.
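To run that kind of Provider filter against a recovered log outside the Event Viewer, Vista's wevtutil can read a standalone log file and apply an event query; in the Windows event query dialect, the System/Provider test is written in the bracketed XPath form below. The log filename and provider name are placeholders for whatever you've carved and whatever service you're after.

```
# Apply an XPath-style provider filter to a carved log with Vista's
# wevtutil: /lf:true treats the path as a log file rather than a
# channel name, /q: supplies the query, /f:text renders the output.
# The filename and provider name here are placeholders.
import subprocess

query = "*[System[Provider[@Name='CD Burning Service']]]"
output = subprocess.check_output(
    ['wevtutil', 'qe', 'carved_00.evtx', '/lf:true',
     '/q:' + query, '/f:text'])
print(output.decode('utf-8', 'replace'))
```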
So let's say that along with that shortcut we've recovered a bunch of these files, and we filter them for that kind of event. What might we find? Here we've got a table with just a few items from the records: the timestamp and the message associated with each event. And this is the kind of thing you would see if the native tools are used to burn a CD. We've got a time, a message that says the burning service started, messages at basically one-minute intervals saying the burning service is running, and then a final message that says it entered the stopped state. You'll see the service start when the system boots, and you'll see messages about it starting for other reasons, but the messages saying that it's running at one-minute intervals distinguish this pattern from the one you see when the system just starts up and starts services. And we can combine that with the information in the timestamps. We've got timestamps in the event log and timestamps in the shortcut, and we can correlate them, and we want to combine that with our understanding of how the system works to say something about what was going on.

So we do that. We can look at the correlations and determine that a CD-ROM was burned. We've got a SID in the application log associated with the events, and the timestamps on the events correlate with the timestamps in the shortcut. So we know the burning was occurring, and the shortcut refers to the CD that was in the drive at the time it was being burned. We've got a SID in the event log; the shortcut holds a label for the CD that looks like a date; and we've got a volume serial number. So we can use this to distinguish mere possession of materials, which may have been part of someone's responsibilities in using the documents, from transfer, and we can establish that it was CD burning that was occurring. And we can tell exactly what was burned: a document of interest. We know what size it was, we've got dates, and if we had another source for this document, we might be able to correlate those dates and the size and name to figure out if it was the same document. We use the information in the shortcut to do that.

Now, there's one thing about this particular set of timestamps that's somewhat unique to Windows: when you see a creation time that's later than the last write time, it means something different than on other platforms. So we've got a created time that's newer than the last write. How can that happen? Well, that happens when you transfer files from one medium to another. The file gets created on the new medium, but the last time it was written is the last time it was modified. So Windows updates the created time with the time the file lands on the new medium, and the last-modified time is preserved. We can use that: because one of the timestamps was preserved from the old medium, we can go look back at various source media to see if we can identify that specific file, and if we can find it, we can say this file was transferred from this particular source.

So by this we can do a couple of things. We can establish whether or not it was transferred in terms of CD burning, and if we can find that file on another medium, a file server, some external hard drive, we can tell where it came from relatively uniquely. In addition to those timestamps, we've got a volume serial number, which is unique to each volume, and the timestamp and size to correlate with the individual file. So if all those things match for the file and the volume, we can say something about where it was transferred from.
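The comparison itself is trivial; the value is in knowing what it means on Windows. A sketch, with illustrative dates rather than case data:

```
# Sketch of the copied-file heuristic described above: on Windows a
# copy gets a fresh creation time on the destination media while the
# last-write time carries over, so created > written flags a transfer.
from datetime import datetime

def looks_copied(created, written):
    # True when the creation stamp postdates the last write,
    # the signature of a file that landed on new media.
    return created > written

created = datetime(2007, 1, 15, 10, 30)   # values from a shortcut snapshot
written = datetime(2006, 11, 2, 16, 45)   # (illustrative, not case data)
print(looks_copied(created, written))     # True -> candidate transfer
```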
Okay, so that's basically it. If you want to know more about doing this on Windows XP, there's an article coming out in Digital Investigation in a couple of weeks. I can't make it available until then, but I'd be happy to send it to anyone who's interested. The journal allows authors to distribute electronic copies; otherwise it's 30 bucks from their website, or you can go to the library to get the journal.

I'll be talking about these kinds of things at a couple of other conferences, and I would love to get a chance to chat with you if any of you will be at the George Mason University Forensics Training Symposium next week or at the Digital Forensics Research Workshop in two weeks, and lastly at the High Technology Crime Investigation Association (HTCIA) International Conference at the end of the month in San Diego. So thanks very much for your patience. I appreciate it, and I hope you found this valuable.