Welcome back everyone. Today in what I'm reading, I'm going to be talking about a functional reference model of passive systems for tracing network traffic, by Thomas Daniels from Iowa State University. This paper was published in Digital Investigation; it was accepted in December 2003 and published in 2004. So in terms of digital forensics and digital investigation, this is almost a lifetime ago, really. Lots of things have changed since then. I went into it thinking it's an older paper and, let's say, we've advanced, or at least we know a lot more since it was published. But it was still interesting to read it and see what people were talking about in 2004 when it comes to passive systems for tracing network traffic.

This paper is specifically about building a reference model; if you read it, you'll see they talk about a reference model throughout. The overall paper is, in my opinion, quite general. They were trying to generalize their model, and I didn't particularly care for that generalization. I didn't like the fact that they were trying to make a reference model; I would have preferred they just describe exactly how they actually applied what they were trying to do. They do give a formal model that describes their overall system, but I don't really see it as a reference model anyway.

Overall, it's about passive network analysis. Their whole idea is just like other monitoring systems: you have what they call observers in the system, monitors that passively capture data without modifying it. Well, that's not so new, right? Passive monitors are placed throughout the network, or throughout different segments of the network, just capturing data without changing it. They describe an observation from these observers as some subset of the data with the monitor ID appended. So whatever node actually observes some traffic going through it sends that traffic to either its own storage or a central storage repository, and it tags the data with its monitor ID (I'll sketch what I picture one of these observation records looking like in a moment). Once you have the monitor ID and all the data in a centralized storage system, you can know essentially where the data is coming from.

It doesn't really say so in the title, but as far as I understand it, what they're essentially trying to do is look for origins from a digital forensics perspective. Where is the origin of traffic coming from? We can't necessarily trust, for example, the IP address or even the MAC address, because those can be spoofed. So we're looking for the origins of network traffic. The idea is that you have these sensors all around the network, and whenever traffic flows from one segment to another, different monitors see it, and then a centralized analysis system can correlate the traffic based on which monitor ID saw it, when it was seen, and what the traffic actually is. We have data, or observations, that are stored, and an observation is defined as a subset of all of the data. I'm not really sure how they pull out that subset, but an observation is some subset of the data that you're actually interested in. Remember, this is all pre-incident. This is a monitoring system, which means you actually control the system.
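To make that a bit more concrete, here's roughly how I picture one of these observation records. This is just my own sketch; the field names and the `observe` function are mine, not the paper's.

```python
from dataclasses import dataclass
import time

@dataclass
class Observation:
    # My own rough sketch of an "observation" as the paper describes it:
    # some subset of the captured data, tagged with the ID of the monitor
    # that saw it. Field names are mine, not the paper's.
    monitor_id: str   # which passive monitor captured this
    timestamp: float  # when the traffic was seen
    data: bytes       # the subset of the captured traffic we care about

def observe(monitor_id: str, captured_subset: bytes) -> Observation:
    # The monitor captures passively (it never modifies the traffic),
    # appends its own ID, and the record then goes to local or central storage.
    return Observation(monitor_id=monitor_id, timestamp=time.time(), data=captured_subset)

# e.g. a monitor sitting on one network segment tagging what it sees:
obs = observe("monitor-segment-A", b"...captured packet bytes...")
```

With records like that sitting in central storage, the analysis side can group them by monitor ID, time, and content, which is basically the correlation the paper is after.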
For forensics, if we're talking about digital forensic investigators, we would have had to actually know that some event is going on to be able to use this system. Analysis programs do node correlation or relationship building. So they have these sensors around the network that they call monitors or observers, making observations, basically collecting data throughout the network, and then they have analysis programs that do correlation or relationship building: within the data itself, within the monitor ID information that's appended to the data, or between the data that's sent from one segment to another.

Really, the most interesting thing I saw in here was the part on sufficient functionality for passive origin identification. They introduce five conditions that together are sufficient for passive origin identification: (1) network separation by trusted monitors, (2) enough storage per monitor to accommodate the analysis, (3) an analysis program to collect and process observations from the monitors, (4) a trusted communication path between the analysis programs and the monitors, and (5) correlation of an input to a given output across every relay. The first two they describe reasonably well; the rest they don't really describe very well.

Network separation by trusted monitors: you have different segments of a network, and the idea is that if you have sensors on both sides of a gateway, for example, and one sensor sees data arriving at a node on its segment, apparently through the gateway, but the other sensor never saw that data pass the gateway, then you know the data must have originated from within the segment rather than actually coming across the gateway. That's the idea: network separation, either into segments or by placing the sensors somewhere they can actually tell something about the flows of data (there's a toy sketch of that kind of correlation at the end of this bit). They don't really describe flows, though; again, I think that's mostly because the terminology wasn't defined yet in 2004.

Enough storage per monitor to accommodate the analysis: in 2004, if you were trying to collect raw packets especially, you would have needed huge amounts of storage space, which could have been a problem back then, not so much now. They are collecting these observations, and I'm not really sure at what level, whether it's all packets or particular protocols or whatever, but obviously if you have a big network, collecting raw packets for the whole thing might require a lot of storage even today.

The analysis program to collect and process observations from the monitors: once you collect all the data moving across these different segments, mostly flows here, you need some analysis program that can analyze the data quickly enough that you can discard the raw data, keep the observations or the analysis results, and not worry so much about storage.

A trusted communication path between the analysis programs and the monitors: I didn't really see them talk about that much, but they did allude to the fact that their monitors could be compromised, and presumably if you're sending data from a monitor to a storage area, that path could be compromised as well. So I think the condition really refers to the sensors themselves and the storage locations both potentially being compromised.
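Going back to the network-separation condition for a second, here's a toy sketch of the kind of correlation I think they have in mind when they talk about sensors on both sides of a gateway. None of this is from the paper; the representation and function names are just mine.

```python
# Toy illustration of "network separation by trusted monitors": a sensor on
# each side of a gateway. Each observation here is just (monitor_id, flow_key),
# where flow_key stands in for whatever you correlate on -- addresses, ports,
# payload hash, timing, and so on.

def saw(observations, monitor_id, flow_key):
    return any(m == monitor_id and f == flow_key for (m, f) in observations)

def originated_inside_segment(observations, inside_monitor, outside_monitor, flow_key):
    # The inside sensor saw the flow reach a node on its segment, but the
    # sensor on the far side of the gateway never saw it cross. If both
    # monitors are trusted, the flow must have started inside the segment.
    return saw(observations, inside_monitor, flow_key) and not saw(
        observations, outside_monitor, flow_key
    )

# Example: only the inside sensor recorded flow "F1", both saw "F2".
obs = [("mon-inside", "F1"), ("mon-inside", "F2"), ("mon-outside", "F2")]
print(originated_inside_segment(obs, "mon-inside", "mon-outside", "F1"))  # True
print(originated_inside_segment(obs, "mon-inside", "mon-outside", "F2"))  # False
```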
The last condition, correlation of an input to a given output across every relay, they also didn't really describe too much. Network separation is what allows the monitors to determine the origin: whenever data moves across network segments, the sensors are basically correlating that information, and that's how they're figuring out, or trying to determine, the actual origin. Storage, like I said, was an issue in 2004 and not so much now, but it still could be if you're trying to do raw packet analysis; then again, you would hopefully process it fast enough that you wouldn't have to store too much.

They do describe forensic implications; there's a section on it. With this section, I can't really tell if it's just not very focused or not very well written, or if it's simply that the terminology wasn't well established yet. I think terminology is a bit of an issue here. So keep in mind, if you read this, some things don't really seem to make sense, or they seem very vague, and it's probably because they weren't well defined at the time, whereas now we have definitions for them. Again, think of this as kind of an archive paper, something a little bit older that's still relevant, even if some parts don't make a lot of sense anymore.

For forensic implications, I have: "Current techniques are only useful for investigation." That's a direct quote, and there's no real explanation of what they mean by it. I think it's the fact that it's just passive, and, for example, at the time maybe they couldn't respond to incidents as they were going on. I'm not really sure what they mean, and they don't elaborate on that quote, so I thought that was interesting. They're saying passive analysis is only useful for investigation; I don't think I would agree with that even at the time, but again, I don't know the context they're saying it in.

Limited to a single type of traffic: I think here they're talking about filtering traffic so you don't use as much space, or again, I don't really know the context, but they're talking about limiting analysis to a single type of traffic.

Data reduction removes important information: this is related to the amount of storage space that's available. They say that data reduction methods remove some of the redundancy that is potentially interesting for digital forensic investigators, and that the transformation doesn't leave the investigator with enough information. Something like the toy example below is how I picture it.
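Here's a made-up illustration of that kind of information loss; the records and the segment-level aggregation are entirely my own, just to show the flavor of the problem.

```python
from collections import Counter

# Per-host records as a monitor might originally capture them (made-up data).
raw_records = [
    {"segment": "lab-net", "src_host": "10.0.5.23", "dst": "198.51.100.7", "bytes": 1200},
    {"segment": "lab-net", "src_host": "10.0.5.41", "dst": "198.51.100.7", "bytes": 80},
    {"segment": "dmz",     "src_host": "10.0.9.2",  "dst": "198.51.100.7", "bytes": 300},
]

# A typical data-reduction step: aggregate per segment to save storage.
bytes_per_segment = Counter()
for r in raw_records:
    bytes_per_segment[r["segment"]] += r["bytes"]

print(bytes_per_segment)  # Counter({'lab-net': 1280, 'dmz': 300})

# The reduced view still tells you which segment the traffic to 198.51.100.7
# came from, but which individual machine sent it (10.0.5.23 vs 10.0.5.41)
# has been thrown away.
```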
So apparently there was some sort of abstraction in the techniques at the time, and that abstraction was cutting out a bit too much of the information that's useful for digital investigators. The result is that investigators using methods like this could figure out essentially where an event occurred, what the origin was, but once they figured out which segment the attack took place on, they wouldn't be able to figure out exactly which machine did it. The context is a little bit fuzzy, but I think that's pretty much what they're talking about when they say these transformations don't give the investigator enough information. So the investigator would then have to go to that segment and do a manual investigation from that point, as far as I understand it.

This is proposed as a reference model, and because of that, I think they tried to be quite general. They do give formal definitions of their method: a formal definition of an observation, of internal and external monitors, a kind of state machine model of networks, and of passive network origin identification. (I'll jot down my own rough impression of what that kind of formalization looks like a bit further down.) Really, this should have just been called passive network origin identification; I don't think it should have been a reference model. The formalization, the model they actually give, is interesting. Looking at it today, I don't think it holds up very well, but it's still interesting to read from the perspective of 2004.

Some of their conclusions I don't necessarily agree with; I don't think they've argued them well enough. They did describe several conditions that are sufficient for determining the origin of network traffic with these monitoring systems, basically, and I think those conditions were probably their biggest contribution: network separation by trusted monitors, enough storage per monitor to accommodate the analysis, an analysis program to collect and process observations, a trusted communication path between the analysis programs and the monitors, and correlation of an input to a given output. I don't think the analysis program to collect and process observations from the monitors is really a necessary condition; as long as you have the data, well, you would have to analyze it anyway. So the most interesting part, I think, is the separation by trusted monitors to actually be able to identify origins, and everything else is, I won't say extra, but not so interesting.

So overall, considering the paper is 12, almost 13 years old at this stage, it's pretty well written; I understand most of the English in it. Sometimes they were a bit vague, or I think they had too many ideas at the same time, which made it a little bit difficult to understand. For example, in the forensic implications they have a lot of different ideas, and some of them they just kind of throw out without really describing what the implication actually is for forensics. They say another major problem facing forensics is that current techniques focus on single types of transformations or are host-based. They actually talk about host-based transformations quite a bit, but I think a lot of the ideas they put forward, they don't back up very well.
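For what it's worth, here's roughly the flavor of formalization I have in mind when I read their definitions. This is my own back-of-the-envelope notation, not the paper's; it just captures what I described above: monitors producing observations (data subsets tagged with a monitor ID) and an analysis program mapping a set of observations to a suspected origin.

```latex
% My own rough notation, not the paper's.
\begin{align*}
  M &\;=\; \text{set of monitors}, \qquad
     D \;=\; \text{all data visible on the network}, \qquad
     N \;=\; \text{set of nodes / segments} \\
  o &\;=\; (m, d) \quad \text{an observation: monitor } m \in M
     \text{ together with a data subset } d \subseteq D \\
  O &\;=\; \text{the set of all stored observations} \\
  A &\;:\; \mathcal{P}(O) \to N \quad \text{an analysis program mapping a set of
     observations to the node (or segment) it identifies as the origin}
\end{align*}
```

Again, that's just my shorthand for the idea; the paper's actual formal model is more involved, since it also covers the state machine view of the network and the internal versus external monitors.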
They didn't explain some things very well, and the organization of the paper could be quite a bit better. Again, this was in the first issue of Digital Investigation, so you can't really hold them to today's standards. I think it's overall quite well done and, yeah, still quite formal. So if you're interested in passive monitoring and looking for the origins of traffic within a network, then this could be an interesting read for you. Passive origin identification, I think that's probably what the paper should have been called, passive network traffic origin identification, something like that. Anyway, that's what I'm reading today. Not a bad read, a little bit old, but still interesting. So that's it for today. Thank you very much. If you liked this video, please subscribe for more.