Welcome back to the last session of today: intrusion detection and prevention. Let's see what we've studied so far. We looked at the basics of cryptography, then at security protocols, software security, malware, network security, and now finally intrusion detection and prevention.

So, intrusion basics. An intrusion is the act of gaining unauthorized access to a system or resource so as to cause loss, harm, theft, etc. What are examples of intrusions? Unauthorized logins, installation of spyware on a victim's computer, or infection of a machine with a worm or a virus. How do you deal with intrusions? They are usually dealt with by a combination of preventive measures and detection followed by an appropriate response.

But what distinguishes prevention from detection? Any examples here of prevention? Closing all the open ports. Yes. Closing all the open ports, or do you leave at least some open? Some, apart from the services that you host. So which ones do you want to keep open? HTTP, for example. 25. Just check what Google keeps open. I think you all did the experiment. I think it was HTTP and FTP. We did a scan on the Google server and they kept open at least these two ports. Were there more, or just two? HTTPS and HTTP, not FTP. So one way to prevent attacks is to keep as few ports open as possible: keep open only the few that you need and keep everything else closed. That's an example of prevention. Any other examples of prevention? Firewalls. To prevent intrusions you write rules so that only certain kinds of packets can enter and not others. It would be a nice exercise to go through the different examples that you could list and see which ones are prevention and which ones are detection.
And are there any other handling techniques that are neither prevention nor detection, such as recovery, preceded by forensics, et cetera?

Prevention anticipates various kinds of attacks and takes steps to forestall their occurrence. Examples include the use of encryption to prevent eavesdropping: I know that somebody is going to eavesdrop on this line, so before that happens I encrypt all my messages. Another is code auditing to minimize the chance of a buffer overflow or an SQL injection vulnerability: I thoroughly audit my code using some good tools so that such vulnerabilities can be eliminated.

Intrusion detection system. We talked a little bit about prevention; now what about detection? The key verbs here are monitor, analyze, and alert. If somebody asks you what an IDS is and what it does, there are three things: monitoring, analyzing, and then sounding an alarm or an alert.

First, an IDS monitors events of interest occurring in the target system or in the network. An event of interest may be the opening of a file containing sensitive data. The problem is that if you start monitoring every single thing that goes on, you have huge amounts of data, and that is not a very good idea because you'll have a lot of false alarms, for example. So you have to be very careful about what your events of interest are.

An IDS generates a large amount of data, which it then analyzes and converts into valuable information to be used by system administrators. So, from data to information. That's where you can use all your wonderful machine learning and AI techniques to analyze this data and make sense of it: based on all this data, do I see that there might be a DoS attack or some other attack? And an IDS raises an alert each time it observes anomalous or suspicious behavior.
The IDS should be capable of learning (that's where AI and machine learning come into the picture): learning what is normal behavior, detecting anomalous events when they occur, and flagging such events. So what exactly is meant by normal? Is this the normal number of logins per day? Is this the normal number of TCP packets that have come in? I have to establish a baseline, and whenever I find that something goes very high or very low compared to that baseline, I take action.

So the first thing is monitor: monitor various system or network variables. Then analyze the recent values or behavior of those variables, so that you can find the norm. Then ask whether any particular value or behavior matches an attack pattern. What exactly is this attack pattern that you're trying to match with? There is a database of these patterns and you're doing some kind of signature matching. Any guesses? Matching what? Yes, you said the behavior of a virus. So you're trying to find, for example, whether a virus has been unleashed into your system. The most basic thing is to search for a sequence of bits, because as we said before most of these things actually contain assembly code. You search for this pattern of bits, and if there is a match you conclude that the value or behavior matches an attack pattern. Have you found this? If yes, raise the alarm or alert.

And if no, there is the other kind of intrusion detection (there are two of these, anomaly detection and signature detection): is there an unusually high number of TCP packets coming in, for example, or an unusually high number of ARP response packets, and so on? If so, raise an alarm. This is exactly what one of your IDSs would do.
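The two checks described above, matching a payload against a signature database and comparing a monitored variable with a learned baseline, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature names and byte patterns are made up, and the "more than k standard deviations from the mean" rule is one simple choice of anomaly test, not something prescribed in the lecture.

```python
from statistics import mean, stdev

# Hypothetical signature database: name -> byte pattern (made-up values).
SIGNATURES = {
    "example-worm-a": b"EXAMPLE-WORM-A-MARKER",
    "example-worm-b": b"EXAMPLE-WORM-B-MARKER",
}

def match_signature(payload, signatures=SIGNATURES):
    """Signature detection: return the name of the first pattern found, else None."""
    for name, pattern in signatures.items():
        if pattern in payload:
            return name
    return None

def is_anomalous(value, history, k=3.0):
    """Anomaly detection: flag a value far above or below the baseline.

    `history` holds past observations of one variable (for example TCP
    packets per minute); `k` is an assumed tuning parameter.
    """
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma

def should_alert(payload, value, history):
    """Raise an alert if either check fires."""
    return match_signature(payload) is not None or is_anomalous(value, history)
```

For example, with a history of roughly 100 packets per minute, a sudden reading of 500 is flagged even when no signature matches; a payload containing one of the stored patterns is flagged regardless of traffic volume.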
So, for example, to distinguish between prevention and detection in the context of passwords: passwords should be strong, stored securely, not written on sticky notes, and should not be communicated to friends, relatives and co-workers. This is prevention of password abuse. Store it securely, don't reveal it to anybody, don't let anybody see you type your password, and so on. This will prevent password abuse.

Now detection of password abuse. An employee has for 10 years never logged in outside of office hours, say between 9am and 5pm. Today, however, this user logs in at 4am. This should cause the IDS to raise an alert. So this is detection: you find something very strange happening and then you try to take remedial action. So we have a preventive measure and a measure involving detection.

Another example of the two, which we've just seen this morning, is buffer overflow. To prevent it, make the stack non-executable. This prevents at least one class of buffer overflow problem. Just one class. Will making the stack non-executable prevent all buffer overflow problems? Is there a buffer overflow problem which doesn't involve executing stuff that is written on the stack? There is, and it's called the return-into-libc buffer overflow attack. libc is the C library. What you do is arrange a call to the system library function and pass the appropriate parameters. The parameters are on the stack but the code is not on the stack. You use the code of the library function itself, and you can still launch a buffer overflow attack. So making the stack non-executable prevents buffer overflow to some extent, but not in every case.

On the other hand, detection. Again, we saw this morning an example of detecting a buffer overflow, the so-called stack smashing attack, using the canary. Using the canary value on the stack detects a buffer overflow and thus helps thwart its possible exploitation.
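The off-hours login example above can be written as a tiny detector. This is a minimal sketch, assuming the IDS has a list of a user's past login timestamps; the function names and the rule "alert on any login hour outside the range seen so far" are illustrative assumptions, not a description of a real product.

```python
from datetime import datetime

def learn_login_hours(past_logins):
    """Return (earliest_hour, latest_hour) observed in the user's history."""
    hours = [t.hour for t in past_logins]
    return min(hours), max(hours)

def is_suspicious_login(login_time, usual_hours):
    """True if the login falls outside the hours the user normally logs in."""
    earliest, latest = usual_hours
    return not (earliest <= login_time.hour <= latest)

# Made-up history: the employee has only ever logged in during office hours.
history = [
    datetime(2023, 3, 1, 9, 5),
    datetime(2023, 3, 2, 11, 30),
    datetime(2023, 3, 3, 16, 45),
]
usual = learn_login_hours(history)

print(is_suspicious_login(datetime(2023, 3, 6, 4, 0), usual))   # 4am login
print(is_suspicious_login(datetime(2023, 3, 6, 10, 0), usual))  # 10am login
```

The first call flags the 4am login and the second does not. A real system would need a longer history and some tolerance, otherwise the first legitimate late evening would also raise an alert.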
So here I'm detecting. I'm not preventing it, but I'm detecting: I see that the canary value has been changed just before I exit from that subroutine, and I conclude that there was a stack smashing attack.

Okay, some of the key questions. What are the different types of IDSs? What are the variables that the IDS should monitor? When should an alert be raised, when should an alarm be sounded? Where in the organization should the IDS be placed?

Now when you think of designing an IDS, or any system for that matter, you ask yourself how well it performs. So what do you think are the figures of merit for an IDS? Say I design an IDS: how am I going to evaluate my design? When I design a microprocessor or a piece of software I look at functionality, I look at reliability, I look at security perhaps, and so on. When I design an IDS and I'm trying to sell this IDS to the world, how is it going to be evaluated? How well it performs, for example. So what is the metric? Yes, exactly, that's what I wanted: the accuracy of the IDS. This is a key issue. So there is the false negative rate and the false positive rate. What's the difference between the two?

A false negative is when there is an attack and the IDS never said anything about it: the attack took place, I was not warned, and I've been harmed. If there are too many false negatives, we say that the false negative rate is very high, and this is very bad. The not-so-bad thing is false positives, but it's very irritating to the system administrator to have too many false positives. What are false positives?
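Both accuracy measures named here can be computed from simple counts once an IDS has been run against labelled traffic. The sketch below uses the standard definitions (missed attacks divided by all attacks, and false alarms divided by all benign events); the function name and the sample counts are made up for illustration.

```python
def ids_error_rates(true_pos, false_pos, true_neg, false_neg):
    """Return the false negative rate and false positive rate of an IDS.

    true_pos:  attacks that were correctly flagged
    false_neg: attacks that went undetected
    true_neg:  benign events correctly left alone
    false_pos: benign events that raised an alarm
    """
    attacks = true_pos + false_neg
    benign = true_neg + false_pos
    fnr = false_neg / attacks if attacks else 0.0
    fpr = false_pos / benign if benign else 0.0
    return {"false_negative_rate": fnr, "false_positive_rate": fpr}

# Made-up evaluation: 100 attacks (10 missed), 1000 benign events (50 false alarms).
rates = ids_error_rates(true_pos=90, false_pos=50, true_neg=950, false_neg=10)
print(rates)
```

With these made-up counts the IDS misses one attack in ten and raises a false alarm on one benign event in twenty.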
The attack is not taking place, and every time I'm being told there's an attack, there's an attack, to the point where I'm just bored and endlessly interrupted from my work, because I'm being told there's an attack but the attack doesn't take place. This is called a false alarm or a false positive. So the false positive rate and the false negative rate are two measures of the accuracy of the IDS.

Of course, another measure is speed. As I mentioned before, there was this internet scanning worm, Code Red, and the faster I'm warned about it the better. As soon as you warn me, at least I can patch the machine or switch the IIS server off. So the IDS should have enough speed besides accuracy.

Here are some sample variables. First, login frequency to a particular account. I've got support to monitor this: there are different kinds of logs in the system, firewall logs, operating system logs, application logs, database logs, etc. So I observe the login frequency to a particular account, and if it's unusually high I suspect there's an attempted break-in into the system. This guy tries to log in too many times, let's say 5 times, and they're all unsuccessful attempts. Typically the system should stop him, and it does, but then after 10 minutes he tries again, and so on. So there's something suspicious, because I see in my operating system logs that there are so many attempts to log in to the same account.

Then, the percentage of half-open TCP connections. Half-open means the SYN packet has arrived and you've sent the SYN+ACK packet but you've not received the third packet. If there are many of these half-open TCP connections, I'm suspicious that it's possibly a DoS or a DDoS attack.

Then, TCP header flags. If I see all kinds of funny combinations of TCP header flags set, I suspect that the attacker is trying to do operating system fingerprinting. Does anybody know what this means?
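As a small illustration of the half-open connection variable mentioned above, the following Python sketch computes the fraction of tracked connections that are still waiting for the third handshake message and compares it with a threshold. The state labels and the 0.5 threshold are assumptions chosen for the example; a real IDS would read connection state from the operating system or from captured packets.

```python
from collections import Counter

def half_open_fraction(connection_states):
    """Fraction of connections stuck after SYN and SYN+ACK (no final ACK yet)."""
    counts = Counter(connection_states)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["SYN_RECEIVED"] / total

def possible_syn_flood(connection_states, threshold=0.5):
    """Flag a possible DoS/DDoS when half-open connections exceed the threshold."""
    return half_open_fraction(connection_states) > threshold

# Made-up snapshots of a server's connection table.
quiet = ["ESTABLISHED"] * 9 + ["SYN_RECEIVED"]
under_attack = ["ESTABLISHED"] * 2 + ["SYN_RECEIVED"] * 8

print(possible_syn_flood(quiet))         # False
print(possible_syn_flood(under_attack))  # True
```

Choosing the threshold is the hard part: set it too low and ordinary slow clients trigger false positives, set it too high and the onset of an attack goes unnoticed.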
How do I look at TCP flags and suspect that there's operating system fingerprinting? Ritu Bala had mentioned, when she was talking about the Nmap demo, that besides figuring out the hosts and which ports are open, you can also figure out which operating system and which version is running. The trick is that you send a TCP packet with some weird combination of flags. There are six flags in the TCP header, like SYN, FIN, RESET, etc. You set some illegal combination of flags and send it to this system. Now TCP is implemented as part of the operating system, and different operating systems, and possibly different versions, will react in different ways. Some might send you a RESET message, some might send you a FIN message, some might send you something else, some might not respond at all. So based on the response you can figure it out: this operating system sent a packet with the RESET flag on, this other one sent a packet with the FIN flag on, this one didn't do anything, and thereby you can conclude that it was probably this operating system on the host with this version. So that's operating system fingerprinting, where you send a lot of these TCP packets with different combinations of header flags set.

Then, TCP connection establishment to an unused destination port. Why is this guy looking at port 48 and port 79 and so on? Does he really want to do something? This is port scanning, an attempt to find which services are open.

Then, the payload of the incoming packet. I find a specific sequence of bytes in the packet and that tallies with what I've got in my database of worm signatures, so I suspect it's a specific worm attack.

Then I look and see that there's a sequence of operating system calls that this foreign software has executed. I suspect that this looks like the signature of a particular worm, or a mutant of a particular worm, a certain worm family. That worm family, you know, executes this
operating system call followed by this one, followed by this one, etc. So that's another signature for worms: not just a bunch of bits but also a bunch of operating system calls. Actually the new worm signature is a graph of operating system calls. You have a full graph, maybe of 200 nodes, where there's a link connecting operating system call A to operating system call B if there's a dependency between the two, that is, A produces something, say a handle, that B uses. This new kind of signature for worms is called behavior-based malware detection.

There are two dimensions for differentiating these things: anomaly-based versus signature-based, and network-based versus host-based. So what's the difference between anomaly-based and signature-based?

Anomaly-based intrusion detection involves making a determination whether the behavior of the system is a statistically significant departure from normal. So the first thing that's done, and it's a difficult thing to do in general, is to set up the IDS and train it to understand what is normal behavior. The IDS will have to learn over time what constitutes normal activity, usage and behavior. Unfortunately the definition of what is normal may vary as a function of the time of day or the day of the week. What is normal for 10 o'clock at night may not be normal for 10 o'clock in the morning, and what is normal on Sunday may not be normal on Tuesday. What is normal may also vary from one host to another.

Signature-based IDS. Signature-based intrusion detection works by identifying a specific pattern of events or behavior that portends or accompanies an attack. Such a pattern is called a signature. So what is a worm signature? For example, a specific sequence of bytes. But you can't rely on that completely. You can't just say that every instance of the Code Red worm will have this sequence of bytes, because worms could be polymorphic and they could be metamorphic; attackers try to evade anti-virus products by creating polymorphic and metamorphic worms.
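To see why a fixed byte signature is brittle, here is a small Python sketch. It uses a made-up marker string in place of a real worm signature and a single-byte XOR as a stand-in for the re-encoding a polymorphic worm might apply; both are assumptions chosen only to keep the example short, and real polymorphic engines are considerably more elaborate.

```python
SIGNATURE = b"EXAMPLE-WORM-BODY"  # made-up marker, not a real worm signature

def xor_encode(data, key):
    """Re-encode every byte with a one-byte key (applying it twice undoes it)."""
    return bytes(b ^ key for b in data)

def contains_signature(payload, signature=SIGNATURE):
    """Naive signature check: look for the exact byte sequence."""
    return signature in payload

original = b"header" + SIGNATURE + b"trailer"
variant = xor_encode(original, 0x5A)  # same content, different bytes on the wire

print(contains_signature(original))            # True: the signature matches
print(contains_signature(variant))             # False: the re-encoded copy slips past
print(xor_encode(variant, 0x5A) == original)   # True: the content is fully recoverable
```

The variant carries exactly the same content, yet the byte-level check no longer fires, which is one motivation for the behavior-based signatures built from operating system calls described earlier.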
A signature-based IDS maintains a database of known signatures. It attempts to obtain a match between the currently observed behavior of the system and an entry in this database. So what is the sequence of operating system calls? Does it match some sequence that I've got stored in my database? If so, then it must be this particular virus. A real-world signature-based IDS will have thousands of attack signatures against which to compare.

A network-based IDS. An IDS that captures information about packets flowing through the network is referred to as network-based. Examples of the information captured are the number of half-open TCP connections, the ratio of ARP request to ARP response packets, the percentage of HTTP packets, etc. Deciding what exactly to capture needs a lot of intelligence, and using that information needs even more intelligence, so an IDS is really something that is designed by experts.

A host-based IDS is typically implemented in software and resides on top of the host's operating system. Its main job is to monitor the internal behavior of the host, such as the sequence of system calls, the files accessed, etc.: very much system variables rather than network variables. It makes use of system logs and operating system audit trails to identify events related to an intrusion.

We just talked about this: an undetected intrusion is referred to as a false negative. An IDS generates a false positive if it raises an alarm even though there is no intrusion currently occurring or about to occur. These are two aspects of IDS accuracy. Some papers talk about the sensitivity and the selectivity of an IDS. As you can guess, high sensitivity implies a low false negative rate: the IDS is very sensitive, it has detected all these intrusions, so the false negative rate is low. High selectivity implies a low false positive rate: I'm very selective about what I report, so I don't raise an alarm when there is actually nothing to be alarmed about. That's high selectivity.

One nice kind of IDS-like device
is a honeypot. A honeypot is a closely monitored network decoy. What is this word decoy? A decoy is like a trap that you lay. Say, for example, you're trying to trap a tiger. There's a tiger roaming around in this land and you want to trap it, so you put out a decoy: you tie a goat or a dog or whatever to a tree, so that the sound or the smell of the goat will attract the tiger in the night, and then when you see the tiger you can immediately trap it. That's a decoy. You're doing the same thing with the honeypot.

A honeypot is a closely monitored network decoy that you have planted somewhere in your system. It distracts adversaries from more valuable machines on a network. So when somebody is trying to attack your network, hopefully they will see the honeypot (that's why it's called honey, because it's very sweet), they will be very attracted to it, and they will try to attack the honeypot instead of attacking some of the sensitive machines on your network. Not only that: the honeypot is a phony kind of thing, so when the attacker attacks the honeypot you get all the viruses and other things that have attacked it, and then you analyze those things. You analyze who sent this, what was inside it, which IP address it came from, etc.

So it distracts the adversary from more valuable machines on a network, provides an early warning about new attacks and exploitation trends, and allows in-depth examination of adversaries during and after exploitation of the honeypot. You deflect the attack from sensitive machines to this honeypot, and then of course you study what has happened to the honeypot, who has come there, and so on. Many new worms will first enter the honeypot, so you can immediately capture them and see what the whole payload looks like.

Another person says it's a security resource whose value lies in being probed, attacked or compromised. So it's like the opposite: you don't
want something to be attacked, but in the case of the honeypot you're most eager that somebody attacks it so that you can find out more about the attacker.

Now, some measures to handle DDoS. There are different kinds of measures in different places: it could be in the host, it could be in the network, etc. Packet discarding is one; then SYN cache, SYN cookies, egress filtering, distributed route filtering, various detection techniques, and then finally handling it through IP traceback.

The first and very simple approach is to categorize the source IP addresses. You're receiving packets: which packet do you allow and which do you discard? Categorize the IP addresses. If an address is almost certainly genuine, you allow this guy to come in. If it is probably spoofed, you are a little careful, and so on. So you just give them different grades: the guy you really trust, the guy you don't trust at all, the guys who are halfway in between, and so on.

Under moderate load conditions, allow all incoming SYN requests to be entertained. When there's very little load, fine, no problem, everybody can come in; even if they perform a DDoS it's not going to be very serious. When the load increases you have to be much more cautious. Under rapidly increasing load, packets with unfamiliar source addresses are discarded with high probability.

The disadvantage is that collateral damage is possible. Can you think of an example where this might happen? You have an e-commerce site, for example, and there's a very high load. Should I discard all packets with unfamiliar source IP addresses? There has just been an advertisement about my site on television, so everybody wants to visit my site. You've just heard that I've got a new product and everybody wants to buy it. Now many of these are first-time buyers at my site. Should I discard them?
No. What I'm doing is actually shooting myself in the foot. This is called collateral damage. Collateral damage is a military term for unintended harm to those who were not the target, such as when your own soldiers fire on your own side by mistake. So unfortunately I'm hurting myself, because potential customers to my site have been prevented from proceeding. You're discarding all the packets from new customers, which is not a good thing at all. So that is one approach.

Another is handling DDoS via SYN cookies. How many of you have seen SYN cookies? What is the idea of a SYN cookie? Let's see; it's a simple idea and it's implemented in many operating systems today. The responding machine places a SYN cookie in the sequence number field of the second handshake message. The cookie is not a constant number: it's computed as a hash function of the source address, destination address, source port, destination port and a secret. The initiator of the connection returns the cookie it just received in its ACK message. Upon receiving the ACK, the responder recalculates the cookie and verifies that it matches the value enclosed in the received ACK. Only then does it reserve buffer space for the connection.

So you choose a random value for the sequence number in the second message of the TCP handshake, and as you very well know, the sequence number has a direct relationship with the acknowledgement number in the subsequent packet. You make that number random and you see whether the initiator sends it back to you at a certain point in time. If he does, then he's a genuine guy; otherwise the possibility is that it's a spoofed packet. So this is a SYN cookie, and you can enable it only when you sense there is a lot of traffic and there is the onset of a DDoS attack.

Is this clear, or should I explain it again? The idea for the SYN cookie: you choose a random number which is actually a pseudo-random function of the source address, destination address, source
port, destination port and some secret (that secret will also change over time), and you put that value inside the sequence number field in the TCP header. The initiator sent a message and the responder, that is the server, is responding. So you put that value inside the sequence number field and you wait for some time to get that value back. Now what happens is that the attacker will use a spoofed address. He will not be able to see that cookie because it goes somewhere else altogether. So you wait for the third message, and then you time out and just reclaim that memory space, etc. So this is the idea of a SYN cookie. As I said, it is implemented on all modern operating systems.

The other simple idea is that of a SYN cache. While a connection is in the half-open state, minimal information about that connection is stored in a hash table called a SYN cache. Typically you would store something like 4 kilobytes or so, rather about 300 bytes, not 4 kilobytes, for each connection request; that is the buffer size. But right now, because you have not completed the whole connection, do not allocate so much space; just keep about 40 bytes or so, so that you do not waste too much memory space.

The minimal information that you reserve includes the TCP sequence numbers. Sequence numbers and ACK numbers are 32 bits, so reserve 32 bits for the sequence number and 32 bits for the ACK number. Source port: how many bits for that? 16. So source port, destination port, IP address: just reserve that minimal information, which will take how many bytes? Maybe around 20 to 30 bytes or so. Reserve only that much space for an incoming connection, and only when you get the third message do you reserve the entire 300 bytes. So a full buffer of about 300 bytes for a given TCP connection request is allocated only upon completion of the 3-way handshake. So this is the idea of a SYN cache: you just
cache the connection parameters and then you allocate the full 300 bytes when the handshake is completed.

Egress filtering is another measure. This is not at the host; the other two things, SYN cache and SYN cookies, were at the host. This is somewhere in the network, your own network. DDoS attack packets typically contain spoofed IP addresses. The egress router is the last router encountered by any packet generated inside an organization's network before it enters the internet. If the source IP address of an outgoing packet does not match any address in the organization's network, the egress router drops the packet. By thus detecting and filtering spoofed packets, it helps prevent DDoS attacks.

So the idea is very simple. This is the network of your organization, and this is the egress router, the last hop before a packet enters the internet. So this is the subnet and then this is the internet. Everything that goes to the internet will go through this egress router. For every single packet that comes into this router on its way out, you look at the source IP address, and if you see that it's an address that doesn't belong inside this domain then you discard that packet. It's probably a spoofed packet from somebody inside this network, so it gets discarded, because as I said before most of these DDoS attacks use spoofed IP addresses. So this is one way of preventing any kind of spoofing of IP addresses.

These words are interesting: detecting and preventing in the same sentence. "By thus detecting and filtering spoofed packets, it helps prevent DDoS attacks." You are detecting spoofed packets and thereby preventing DDoS attacks.

Okay, distributed route filtering. This is another
research idea that has come up. I don't know whether it's actually implemented, but somebody from, I think, Purdue University came up with this idea some years ago. DRF is an extension of egress filtering to routers in the core of the internet. Notice where we have gone: we started with handling strategies in the host, then we went to your own network, and now we are going to the core of the internet.

A DRF-enabled router maintains, for each of its interfaces, the set of all source addresses from which packets arrive en route to some destination. Now as you very well know, routing in IP looks at the destination address. Here you are looking at the source address, and you are saying: this is the source address, therefore the packet should have come in from either this, this or that interface. If it doesn't come from there, you discard it. The filtering decision is straightforward: if a packet with source IP address s arrives by an interface that it should not have, that packet is assumed to be spoofed and is hence discarded.

Let's try and understand what this means. Probably a good example would be to take some cities everybody is familiar with. There is something in the center; let's take Agra. Anything to the south of it? Gwalior. Is that correct, more or less? Okay, we just take different directions: New Delhi, Jaipur to the west, and I guess Kanpur would be southeast. And something here, southeast? Kota. Kota.

So let's imagine this is a router and it's getting packets from different directions. Say I am looking at the source address. What would this guy's reaction be if he sees the source address being Mumbai and the packet coming in on the link from New Delhi? Something weird, right? What is going on over here? Mumbai is somewhere there. Wherever you are going, say you are going to Kanpur, why would you be coming from Mumbai via this link? In general this
doesn't make sense. If you were coming from Mumbai you'd probably be coming in on this link, or this link perhaps, or maybe even this link, but probably not on this link or this link. So that's the idea of distributed route filtering; it's an extension of egress filtering. If the source address is Mumbai then you must have come from here, most probably from Kota. If there was congestion on some link over there, maybe you took a little detour and came through Jaipur, or you took a detour on this side and came through Gwalior. But I don't expect that you came from Kanpur or went through New Delhi. So I just look at the source, and what I say is: if it is Mumbai it should be either this or this or this; otherwise it's most probably spoofed and I discard it. That's the general idea. The filtering decision is straightforward: if a packet with source IP address s arrives by an interface that it should not have (there's no reason for it to come from New Delhi), that packet is assumed to be spoofed and is hence discarded.

Okay, so that much about some handling strategies. Now what about detection? Here's where you can use a lot of statistical and forecasting techniques. Monitor the number of half-open TCP connections. A half-open connection is one for which the first message of the TCP handshake has been received by the server and the second message has been sent, but the third message has not yet arrived. So you monitor the number of half-open connections, and if this number or this percentage exceeds a certain threshold, you suspect that this is the onset of a DDoS attack. Statistically you can come up with some very nice algorithms to do this.

Here is another little algorithm. Everybody knows that when you establish a connection you use a SYN packet, and when you terminate the connection you use a FIN packet. I mean normal termination; there
are terminations with RESET also, but I'm talking about a normal termination. So just as a SYN packet is used to establish a TCP connection, a FIN packet is used to terminate it. If in a given time interval (whatever the time interval is, you have to design that) the number of SYN packets greatly outnumbers the number of FIN packets, then a DDoS attack may be underway. Instead of looking only at the current interval, which may be too small, look at the last few intervals to determine whether there has been a build-up of SYN attack packets. How many intervals you look at, what the size of the interval is, what the thresholds are: all of those things have to be very cleverly and intelligently figured out.

Then another aspect of handling this, from the law enforcement point of view, sort of like forensics, is IP traceback. So first I'm trying to prevent it, then I'm trying to detect it, and I'm trying to handle it at different levels: at the host, within the network at the egress router, and at the routers in the core of the internet. It's a very rich area.

Now what about IP traceback? I want to know who is behind sending those spoofed packets. The source IP addresses in DDoS attack packets are typically spoofed, hence we cannot rely on their source IP addresses to determine the subnet from where they came. Instead, we attempt to identify the path or paths traversed by the attack packets. This idea is known as IP traceback.

There are two principal approaches to IP traceback. Either the packet keeps track of the routers it has visited, which is termed packet marking, or each router keeps track of the packets passing through it, which is termed packet logging. Can you think of a way: suppose you have to design a scheme for this packet marking, how would you do it? The idea is that you start at some point, this is the place which is bombarding you, and you receive these attack packets
and you're trying to figure out where this thing came from, what the path is. Here is one suggestion in the literature: can you somehow put a stamp on the IP packet, in the IP header, which might be a good place to write something? The average hop count on the internet is about 10 hops. Now, say this is the source of the packet and this is the victim. One thing I clearly cannot do is have each of these routers put a 32-bit quantity into the packet. How many bytes is the IP header, typically? Five? Sorry, five 32-bit words, very good, so 20 bytes typically, and each IP address is 4 bytes. So if I'm going to put 4 bytes for this router, 4 bytes for the next, and so on, the header is going to get very large. I would like each router to be designed in such a way that it puts its stamp on the packet, so that when I finally get the packet and look at who has stamped it, I see this router's address, that router's address, and so on. But that's not very good, because the header gets large and you're just wasting bandwidth. Can there be a smarter idea? There are many suggestions, but let's look at the simplest.

Basically, what is the problem? This is the attacker. Let's assume for the time being it's not even DDoS, it's just DoS. He's attacking and he's got some spoofed IP address. So this is the source of the attack and I'm the victim, and my suggestion is to use packet marking: I mark on these packets the identity of the routers along the way. My problem is that I can't put in 4 bytes for each of their addresses, because the header is already 20 bytes, and if the average hop count is 10, then I'm adding something like 4 times 10 or 4 times 9, about 40 bytes, to the header. That's not a very good idea, and I've got to change the
entire IP protocol for that. So the first question is, if I want to put some kind of stamp, where do I put it inside this header? It turns out that there is one field that is not used too much. Can you think of any field in the IP header that is not used too much? It's a 16-bit field used in the context of fragmentation: the ID field. If you fragment an IP packet, then all those fragments bear the same ID number, so that the receiver can put them together. That is not used too much because these days there is less need for fragmentation, so very often that ID field is left unused, and it's a 16-bit field.

But now I've got another problem: how do I accommodate all these IP addresses in the ID field? I can't even accommodate one of them. Anybody have a solution? Think about the good old hash function. Suppose I take an 8-bit hash of the IP address, and hopefully it doesn't collide too much. With 8-bit hashes, maybe I can put in two of these addresses. And that's exactly what they do: they put in only some of the addresses. So I can put one 16-bit hash or two 8-bit hashes in that ID field.

Suppose I put one 16-bit hash in the ID field. Then each router puts its stamp on that field probabilistically: with probability p it writes it, and with probability 1 minus p it doesn't. So as the packet goes along, this router may stamp it, this one may not, and so on. It's entirely possible that this attack packet has gone through here, and this router has overwritten the stamp, so I see this router's name on it. In the next packet, it may be that this router stamped it and then this one, but nobody else. So ultimately, if I collect enough attack packets, I will get a good idea of the path this is coming along, assuming of course that the path is fixed and doesn't keep varying
all the time. So that's the idea of packet marking: the packet keeps track of the routers it is visiting. The other approach, which is the opposite, is that each router keeps track of the packets passing through it. This is termed packet logging. Again, this looks rather fantastic. You're seeing gigabit links, so many packets per second; how can I keep track? What sort of data structure should I design so that I keep track of every single packet that has come to me? Again, there are some very nice solutions to this.

So first, packet marking, to summarize what I just said. The IP header has no provision for keeping track of the routers a packet has encountered. However, the 16-bit ID field in the header is employed for this. A router's IP address is 32 bits, hence the IP address itself is not used to identify a router. Instead, a global fingerprint of the router is used: a 16-bit hash of the router's IP address. To complete the story, in probabilistic packet marking an intermediate router writes its fingerprint, the 16-bit hash value, into the ID field of a packet's IP header with probability p. Now p is again a design parameter. Note that the router could overwrite a previously written fingerprint of a router closer to the source of the attack. Each ingress router has a map of all the routers upstream of it. To identify the perpetrator of the attack, the ingress router at the victim end needs to collect a sufficient number of attack packets that are part of the same flooding attack. Based on the values of the ID field in the collected packets, the victim can construct the most probable attack path. So this is one way in which law enforcement can figure out who the perpetrator of the attack is and where it is all emanating from.

The other approach is called packet logging. With packet logging, each router attempts to keep track of every packet that passes through it in, say, the last 5 minutes. I cannot keep track of too much, but at least let's see if I
can keep track of the packets that have come in to me during the last 5-minute span. The average size of an IP datagram is 500 bytes, so the amount of space required in each router is absolutely ridiculous: it is 1 terabyte. Who is going to spend that much money to put 1 terabyte on each router? The space requirement can be greatly brought down (the details are in the textbook) by the use of a nice, interesting data structure called a Bloom filter. You can bring it down drastically, from 500 bytes per datagram to something close to 10 to 12 bits per packet. You only need that amount of storage per incoming packet, and again, I, the router, am only storing packets that have come in to me in the last 5 minutes.

So what happens once that information is stored? Here I am, the victim. I ask every router connected to me: have you seen this particular packet? The correct one will say yes, it came through me. What they actually compare is a hash: they compute a hash of the packet content and ask, have you seen this? Then that router interrogates the routers further upstream: have you seen a packet with this hash value? One of them will say yes, and so on and so forth, and finally they will be able to trace it back: this is the source of the attack. So this is through packet logging rather than probabilistic packet marking. These are two approaches that have been studied a great deal in the literature, and there are also hybrid approaches that combine the two.

We've talked a lot about DoS and DDoS, but what about the challenges in worm detection? One of the big things is that I've got a database of worm signatures, but what happens
if there is a zero-day or zero-hour worm? Everybody knows what a zero-day worm is: I've never seen this thing before. It was unleashed just yesterday, and my antivirus product doesn't have it in its signature database. Now what do I do? How can I detect this thing? So that is one challenge: how does one detect a new worm that has never been seen before? A database of worm signatures will almost certainly not contain the signature of a newly unleashed worm. This is a serious problem because new versions are being created all the time.

The next thing is speed. Efforts by humans to detect internet scanning worms could take hours, as we saw before. This is clearly unacceptable for worms such as Slammer, which spread across much of the internet in just 15 minutes.

And then the other thing that I mentioned is non-monomorphic worms, that is, polymorphic or metamorphic worms. Detecting such worms is not straightforward, since multiple instances of the same worm may not contain the same substring. What is done in these worms, as I mentioned, is that the payload is encrypted. So when the worm is going through the internet, the routers see different sequences of bits in different instances of the same worm.

So what is the solution? One solution, and this is one of the modern things being done, is what's called behavior-based malware detection, or dynamic behavior-based malware detection. You capture the entity: you suspect that something coming into your organization is a worm, but you can't tell because it's polymorphic and so on. You put it in a secluded, isolated environment, a sandbox, because you don't want it to create trouble. Then you execute it, and while it's executing, guess what you look for: its behavior. What does behavior mean? How do I capture behavior? Basically, the worm has to make operating system calls. For example, if it's a virus, it has got to write itself
into another file when it wants to spread, or set up a TCP connection to infect somebody else, and so on. So I monitor the system calls. It's not even the sequence of system calls, it's a graph of system calls, and from that graph I'm able to deduce which species the sample belongs to. There might be a total of 10,000 different worms, but many of them are mutants of the same worm; for example, there might be 100 mutants of the same worm. All those mutants will have the same system call graph. This graph is a graph of dependencies, that is, A has to take place before B takes place, and so on. So if there are, say, 100 mutants for each species, my database of signatures will have only about 100 entries, and each entry will correspond to 100 variants. That is the idea: based on its behavior, I can very easily recognize yet another variant. So that's basically it on IDS and IPS.
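
To make the idea of matching a sample against a database of system call graphs a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method any particular product uses: each species' signature is assumed to be stored as a set of "A before B" dependency edges, a sample is matched by Jaccard overlap of edge sets, and the species names, system call names, and the 0.6 threshold are all hypothetical. Real systems would use much richer graph matching.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# An edge (a, b) means "system call a has to take place before b".
Edge = Tuple[str, str]

# Hypothetical signature database: one dependency graph per worm species.
SIGNATURES: Dict[str, FrozenSet[Edge]] = {
    "file_infector": frozenset(
        {("open", "read"), ("read", "write"), ("write", "close")}
    ),
    "net_scanner": frozenset(
        {("socket", "connect"), ("connect", "send"), ("send", "close")}
    ),
}


def jaccard(a: Set[Edge], b: Set[Edge]) -> float:
    """Overlap of two edge sets: |a & b| / |a | b| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(observed: Set[Edge], threshold: float = 0.6) -> Optional[str]:
    """Return the best-matching species, or None if nothing is close enough.

    `observed` is the dependency graph recorded while the sample ran in the
    sandbox. The threshold is an assumed design parameter.
    """
    best_species: Optional[str] = None
    best_score = 0.0
    for species, signature in SIGNATURES.items():
        score = jaccard(observed, set(signature))
        if score > best_score:
            best_species, best_score = species, score
    return best_species if best_score >= threshold else None


if __name__ == "__main__":
    # A mutant of the scanner that also happens to read one file.
    mutant = {
        ("socket", "connect"),
        ("connect", "send"),
        ("send", "close"),
        ("open", "read"),
    }
    print(classify(mutant))              # matches the scanner species
    print(classify({("open", "read")}))  # too little overlap: no match
```

The point of the sketch is that the encrypted payload never enters the comparison: two mutants with completely different bytes still land on the same entry, because only the observed behavior is matched.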