Good afternoon, I am Virender Singh from the Department of Electrical Engineering. Today I will briefly discuss the network intrusion detection system: what its components are and how it is implemented. Perhaps you already have a brief idea of how a network intrusion detection system is implemented; perhaps there was one lecture on it before this one. So this is the brief outline: what a network intrusion detection system is, how it can be implemented, and then some of our earlier work, wherein we tried to implement a network intrusion detection system that can run at almost line speed to match the network traffic. So let me go through the brief introductory material. Perhaps by now you have gone through this multiple times; most of the speakers might have covered it. So what is a computer network intrusion? As per the definition, it is obtaining unauthorized access to a computer, a file or a program in order to control the state of your computer. This may be through malware, and malware can be Trojans, spyware, viruses, rogueware and many other things, or through social engineering attacks. It can be attempted by insiders in an organization, through abuse of the privileges extended to some members of the organization, or it can be an attack on weak security practices such as password guessing. All of us have some password that is personal to us. In the morning lecture I think there was some discussion about how to choose a password: it should be easy for you to remember and difficult for others to guess. Many times we use very weak passwords, like a date of birth, a name, a surname or a spouse's name, which are easy to guess. So through all these means one can intrude into the system.
I think I do not need to explain much; all of these things must have been said earlier. The internet has seen tremendous growth in the past, almost 400,000% growth in the number of internet users in the last decade, and these are slightly older data, so it might be more now. According to the 2011 ITU databases, the number of internet users was almost 2 billion. So almost one third of the world population uses the internet, and therefore it is very important. If you look at malware, its growth has again been significant in the last couple of years. As per the PandaLabs record, they recorded about 60 million malware instances in 2010, compared to 2005, when they had only 92,000. So there is tremendous growth. Cisco also reported about 139% growth in malware in 2010 compared to 2009. As I said, these are slightly older data, but the trends are more or less the same. Now, if you look at this malware, there are two questions: is all of it a new family of malware, or is it a replica or some kind of obfuscated version of earlier-generation malware? If you look, the number of malware families grows almost linearly, whereas the total number of malware instances grows exponentially, and that is a big concern: how to deal with this kind of malware. There were many attacks in the past; you must have seen them, and many of them must have been discussed already, like intrusions on internet giants such as Yahoo. On and off, I at least do get messages saying that someone is stuck in London or Spain, that their purse or passport has been stolen, and asking whether I can help. All such things happen due to these intrusions.
So if you look at layered network security, the network is secured in many ways, and firewalls are the primary equipment that secures your network. A firewall can be implemented in hardware or it can be a software firewall. So you have your internal network, there is an IDS, and then you have firewalls, and you want to keep some machines, like the web server, outside the firewall. They are in the demilitarized zone (DMZ). In this architecture the firewall does one job: mostly it inspects the header of each packet coming to your network and, based on that, classifies whether it could be a security threat or not. Sometimes it gives you a warning that the packet might be malicious, or, if you have properly configured your firewall, it can stop it. An IDS comes after your firewall, and it has to do something more than what the firewall is doing. When you are receiving packets, malicious code can be embedded anywhere in the payload; it is not necessarily detectable through the header alone. Therefore you have to inspect the entire packet. So an IDS generally looks at the payload, wherein it does deep packet inspection. Now you can see the difference between a firewall and an IDS: a firewall typically looks at the header, so the amount of processing you need to do is much less than what you need for the deep packet inspection performed in an IDS.
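To make the contrast between header inspection and deep packet inspection concrete, here is a minimal Python sketch. It is not taken from the lecture slides: the Packet fields, the blocked port and the two byte signatures are assumptions chosen only for illustration. The firewall check touches only a header field, while the IDS check scans the whole payload.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical, heavily simplified packet: two header fields plus payload.
    src_ip: str
    dst_port: int
    payload: bytes


BLOCKED_PORTS = {23}                   # assumed firewall rule: drop telnet
SIGNATURES = [b"/bin/sh", b"cmd.exe"]  # assumed payload signatures


def firewall_allows(pkt: Packet) -> bool:
    """Header-only check: constant work per packet."""
    return pkt.dst_port not in BLOCKED_PORTS


def ids_alerts(pkt: Packet) -> list:
    """Deep packet inspection: every signature is searched in the payload."""
    return [sig for sig in SIGNATURES if sig in pkt.payload]


pkt = Packet("10.0.0.5", 80, b"GET /?q=/bin/sh HTTP/1.1")
print(firewall_allows(pkt), ids_alerts(pkt))
```

The packet above passes the header rule, since port 80 is allowed, but still triggers a payload signature, which is why the IDS sits behind the firewall.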
Now one thing is very clear: we have to do deep packet inspection of the data or packets that we are receiving. At the network interface we are receiving data at a very high rate, maybe a billion bits per second or so, and therefore the IDS should operate at that line speed; otherwise it will significantly slow down your traffic, and traffic will start to choke at your network interface. That puts a stringent requirement on the implementation of an IDS, as opposed to the firewall, wherein you have simple rules and you are looking only at the header of the packet. Now there are two approaches to this check: one is called anomaly-based detection and the other is called misuse detection. Anomaly-based detection is a more statistical approach wherein you try to model the behavior of the applications that you are running. You then want to see, when you are getting any traffic or data from outside, whether you are violating that behavior or following exactly the behavior expected from the application. Take, for example, the behavior of a browser. Whether you use Internet Explorer, Google Chrome or Mozilla Firefox, when you invoke and run it, it has to go through a sequence of system calls, and based on that sequence of system calls you can model the behavior of the browser. When you are deviating from that behavior, you would like to warn: this is not the behavior that I know, therefore this might be malicious data. Then perhaps you can do a deeper inspection of that data and figure out whether it could be malware or not.
So that is based totally on statistical approaches, as you have to model the behavior and then see how far you are from it: are you deviating by, say, 10 percent, 15 percent or 20 percent, or are you within the boundary, maybe 2 or 3 percent? Based on that you make a decision. The big task here is to build a model that can capture the behavior of the system, which is not an easy job. Typically, in the research community, people try to build a model based on the system calls and, moreover, on the statistics of the system calls, and based on that use a graph-theoretic formulation to see whether you are violating that behavior or not. The good thing about anomaly-based detection is that you can protect against zero-day attacks. The moment you deviate from the expected behavior it will warn you, so on the very day the malware or virus is launched it is likely that you can capture it. However, this approach gives a very high false positive rate, because it is not easy to model the exact behavior of the system; whenever there is a small deviation it will say that this may be a suspicious packet and you will start to inspect it, so it is used as a kind of warning. It is also vulnerable to statistical poisoning: if there is some mismatch in the statistics, somebody can make use of statistical manipulation to get past it. So it has some positives and some negatives, and therefore it is not widely practiced, although it is now becoming more popular. As I said, the malware families are not exploding that much, their growth is more or less linear, whereas the total number of viruses and malware around us is growing exponentially, and therefore anomaly-based detection attracts the attention of researchers in the security community nowadays.
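As a rough illustration of the anomaly-based idea, the following Python sketch builds a profile from sliding windows over system call traces and reports how far a new trace deviates from it. This is only one simple way to model behavior, not the graph-theoretic formulation mentioned in the lecture; the call names, the window length and the 10 percent threshold are all assumptions made for the example.

```python
def ngrams(trace, n=3):
    """Return all length-n windows of a system call trace as tuples."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def build_profile(normal_traces, n=3):
    """Collect every window seen in traces of known-good runs."""
    profile = set()
    for trace in normal_traces:
        profile.update(ngrams(trace, n))
    return profile


def deviation(profile, trace, n=3):
    """Fraction of windows in the trace that the profile has never seen."""
    windows = ngrams(trace, n)
    if not windows:
        return 0.0
    unseen = sum(1 for w in windows if w not in profile)
    return unseen / len(windows)


THRESHOLD = 0.10  # assumed decision boundary: warn above 10 percent deviation

# Hypothetical traces of an application behaving normally.
normal = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "write", "close"],
]
profile = build_profile(normal)

clean = ["open", "read", "write", "close"]
odd = ["open", "read", "exec", "socket", "write", "close"]

for name, trace in (("clean", clean), ("odd", odd)):
    score = deviation(profile, trace)
    print(name, score, "WARN" if score > THRESHOLD else "ok")
```

A small profile like this also shows why false positives are common: any legitimate trace that was not in the training data contributes unseen windows and pushes the score toward the threshold.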
The other approach is called misuse detection, or the signature-based approach. In misuse detection you have a repository of signatures which you have seen earlier. That means the virus or malware must have been detected at least once; once it is detected you can put it in the repository, and after that you can check against this repository whether you have that kind of malware in the incoming data traffic or not. So you must know the attack patterns and the vulnerabilities that you have. Once they are known, you can protect against previously known attacks. But this is vulnerable to zero-day attacks, because you have no clue about new malware; it will treat that malware as regular traffic and allow it to pass, so a zero-day attack can do damage. Anomaly detection is the more statistical approach, whereas misuse detection is the signature-based approach. Once an attack is detected, you make a signature of it and put it in the repository, and from then on you check your traffic against those signatures. If there is a match you raise an alarm and say that this is malicious data; otherwise you let it go. So the attack, or the attack patterns, should be known to you. It protects against previously known attacks, and it gives you a low false positive rate, because you are just matching against the earlier known attacks. It is vulnerable to obfuscation attacks, and that is the real challenge. For example, if you have a malware code and you add a few additional instructions, say a store and a load instruction, it will produce a totally different signature. Therefore for every obfuscated malware you need one more signature, and since the number of malware instances is growing exponentially, the size of the signature repository is increasing exponentially and it is becoming more and more difficult to do the signature matching
in real time. Here the big issue is doing this matching in real time. Perhaps you have seen or used antivirus software on your computer: the moment you start to run it, it consumes a lot of resources and runs for hours. Now you have to do this check against the incoming traffic, and the traffic arrival rate is not in Mbps but in billions of bits per second. Checking gigabit-per-second traffic is really a difficult process, so how do you deal with that? If you look at the NIDS architecture, you get the input data stream and you look at the TCP header; after looking at the header you get the payload, put it in a buffer, and then you start to match it against the signature repository. You have to match the entire payload against the signatures; if there is a match it should raise a warning, otherwise it will let the packet go as it is. This first step is the most time-consuming process, wherein you have to match the entire signature set against the incoming traffic or packet. So what are the ways to do this? There are two ways. One is string matching: you have the signatures of the attacks and you use any known string matching software to match them. The problem with string matching is, first, that it is again vulnerable to obfuscation: if there is a small change, you have to have another string, and then the total number of signatures explodes. It does have a low false positive rate, though, because it raises the alarm only when there is an exact match of the string, so there is a very low probability that it will raise the alarm falsely. In computer science we are in the habit of using regular expressions for many other purposes, and therefore regular expressions are also of interest for network intrusion detection, wherein you have a pattern and you
want to match against the pattern. Regular expressions are more expressive; however, they can raise more alarms than strings. Earlier works in IDS tried to use only string matching algorithms; however, it is very difficult to match the strings in real time. Current research is more focused on regular expression matching, which can deliver high throughput with a reasonably low false positive rate. So, regular expression based signature matching: you now have regular expressions, so how can you match them? These are a couple of regular expressions which were extracted from the Snort database, so you can get such regular expressions in the traffic that you are receiving. To match a regular expression there are typically two ways: one is the use of a non-deterministic finite automaton, the other is the use of a deterministic finite automaton. Both of them have their own advantages and disadvantages. Say, for example, you want to match the regular expression ".*ab.*cd". First, if you just have a string, say "abcd", then the matching is very simple: on the arrival of a it goes from state s0 to state s1, on the arrival of b it goes to state s2, on c it goes to state s3, and on d it goes to state s4, which is the accepting state. In the accepting state you accept the input. This is string matching. Now, this cannot handle another kind of string such as "caabccd". For that you can use automata and regular expressions. The regular expression ".*ab.*cd" says that you can get any number of characters before a; once you get a and then b it goes to the next state; then, after any number of further characters, you get c and d and it goes to the accepting state. So now the state transition will look
like this. It goes to some state p, and from p it goes to state q on the arrival of b. If you do not get a, or you do not get b, it goes back again to state p. From q it goes to a state, say r, on the arrival of c, and from there on the arrival of d it goes to state s, which is the accepting state; otherwise it stays in state r. So this is the way we can use a regular expression to match the signature. As I said, there are two ways: you can make this automaton deterministic or you can make it non-deterministic. I believe most of you are from a computer science background, so you know how to construct an automaton for a given expression. If it is a DFA, a deterministic finite automaton, the problem is that as the number of characters or expressions increases, the number of states increases exponentially. However, the matching time per character remains constant because it is deterministic: on getting one character of the alphabet it always transits to exactly one state, not to multiple states, and therefore it gives you constant matching time. On that front it is very good. However, as I said, the number of states explodes, and therefore you need an enormous amount of memory. As we discussed earlier, in current malware signature repositories the number of malware signatures is increasing exponentially, and the state growth is also exponential, so it is a kind of double exponential growth. Therefore it is very difficult to accommodate such a DFA in a limited-memory system. But it is suitable for high-speed applications, wherein you can match in almost constant time. That means, no matter how many expressions you have,
although the automaton will explode, it can match in constant time. On that front it is extremely good. On the contrary, the non-deterministic finite automaton based approach does not explode exponentially, because the transitions are not always deterministic. NFAs are linear in space complexity: if the number of expressions or signatures is n, the growth is linear with respect to n, which is affordable for any practical system. However, the matching time is polynomial, not constant. As the number of signatures increases, the matching time increases, and therefore it is very difficult to use this in a practical system, because, as I said earlier, we have to implement this at every network interface and it has to match the line speed. Therefore, if we want to go for a software implementation, perhaps the DFA is the better approach, as the matching time is constant. If you want to go for hardware, we are more concerned about resources and we have only limited memory; therefore the NFA is more suitable for hardware solutions and the DFA is more suitable for software solutions. It is easy to say that the DFA is more suitable for software solutions, but in software too we do not have unlimited high-speed memory. Therefore some solutions proposed in the literature try to combine both kinds of automata, as a hybrid version of DFA and NFA. They try to take advantage of both NFA and DFA and therefore offer better scalability. If I can say briefly how the NFA-DFA hybrid method works: you try to build the DFA, and when it starts to explode you go to the NFA, so that you can control the explosion. Here I have listed the options if you want to implement an efficient regular expression matching system
in hardware or in software. If you have a generic solution in software, you would like to share the states and share the transitions, and there are approaches which try to compress the alphabet set. If you are going for a hardware solution, you can use a logic-based FPGA design or a memory-based FPGA design. That means you construct a system wherein you build logic that can match your signatures in parallel; the good thing about hardware is that it can exploit the maximum parallelism available in your algorithms. For example, if I want to match one character against multiple strings or multiple regular expressions at the same time, I can build different logic circuits which match in parallel, and after matching you can combine the outcomes of the matchers and say whether there is a match somewhere or not. If there is a match, that is good enough to raise the alarm that this particular payload or packet is malicious. Another approach which is now becoming popular, although it is more expensive, is the CAM-based design. CAM is content addressable memory, a memory which is addressable by its content. You get an input string or input bit stream, and it is matched in parallel against all the content stored in the CAM; therefore it can match quickly and hence you can meet the line speed. This is useful for NIDS, and at the same time such approaches are useful for many other applications such as genomics. Or you can develop architecture-specific solutions: for a given repository of signatures you develop an application-specific solution that can match exactly those signatures, and it should be possible to upgrade it. So now here
I will tell you about some of the work that has been pursued in this direction; what we mostly addressed is the hybrid version of NFA and DFA. A couple of works have already been proposed in the literature, and you can go through them; these are some of the pointers. One is the hierarchical DFA, presented by Lien and Schaub at the GLSVLSI 2009 conference, wherein they have a master and a slave DFA. The DFA is split in two parts: the master DFA generates some symbols and those symbols are consumed by the slave DFA, and if there is a match in both it raises an alarm. But it targets only some very simple regular expressions. Another approach is the bit-parallel NFA, a solution presented to implement an IDS in hardware. It uses bit-parallel simulation of the NFA, implemented using registers, masks and logic elements on the FPGA so that it can be computed efficiently, and they have shown that you can achieve speeds of almost 10 gigabits per second, which is quite encouraging, so that you can meet the current-day requirement. Another is the hybrid FA, which is hybrid in nature: you divide the automaton into a DFA and an NFA. You first compile the head into a DFA, and then, as I said, at the point where it starts to explode you start to construct an NFA. It is a hybrid approach that gives you the advantages of both DFA and NFA: the DFA restricts the time required to match your signature, while the NFA restricts the memory requirement, so it caters pretty much to both requirements.
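The bit-parallel NFA idea can be sketched in software for the example expression ".*ab.*cd" used earlier in the lecture. The sketch below is only an illustration under stated assumptions, not the FPGA design from the cited work: one bit of an integer stands for one NFA state, a per-character mask plays the role of the hardware masks, and the names LABELS, LOOPS, STEP_MASK and matches are invented for this example.

```python
# NFA for ".*ab.*cd": states q0..q4, with q(i) --LABELS[i]--> q(i+1).
LABELS = "abcd"
# q0 and q2 carry the ".*" self-loops, so they stay active on any character.
LOOPS = (1 << 0) | (1 << 2)
ACCEPT = 1 << len(LABELS)  # bit for q4, the accepting state

# STEP_MASK[ch] has bit (i + 1) set when the edge q(i) -> q(i+1) is labeled ch.
STEP_MASK = {}
for i, ch in enumerate(LABELS):
    STEP_MASK[ch] = STEP_MASK.get(ch, 0) | (1 << (i + 1))


def matches(payload: str) -> bool:
    """Simulate all active NFA states at once, one shift/AND/OR per character."""
    active = 1  # only q0 is active at the start
    for ch in payload:
        advanced = (active << 1) & STEP_MASK.get(ch, 0)  # follow labeled edges
        looping = active & LOOPS                         # follow ".*" self-loops
        active = advanced | looping
        if active & ACCEPT:
            return True  # for an IDS, the first match is enough to raise an alarm
    return False


for s in ("abcd", "caabccd", "abdc", "acbd"):
    print(s, matches(s))
```

The per-character cost here does not depend on how many states are active, which is the property the hardware design exploits; the sketch handles only this linear chain of states, whereas a general regular expression needs a richer set of masks.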
There are other works which you can go through, like the lookahead finite automata given by Bando and colleagues at ANCS '09; they reorder the regular expression components to parallelize the matching. Then there are other approaches like the delayed input DFA, also referred to as D2FA, which is based on graph compression techniques. As I said, one problem in a DFA is state explosion: as the number of expressions increases, the number of unique states in the automaton starts to explode. So they came up with a concept called the default transition: if the input does not match any of the existing labeled transitions in a state, the automaton takes the default transition. Using this there is a dramatic reduction in the memory footprint of the DFA; therefore, to a large extent, you can still make use of a DFA and so restrict the time required for matching. As I mentioned multiple times, time is very important, as we have to match the line speed. Our group has also done some work in this direction, and we also explored the hybrid approach, which is basically a hierarchical approach called the hierarchical DFA or word-based DFA. In the interest of time I will skip the basic definitions of signature and expression. The expression in a signature, if sigma is the alphabet set, is a sequence of components: a component can be a word, a wildcard, a class of characters, an optional character, a character class with the Kleene star operator or with the plus operator, and so on. Now this proposed cascaded automata architecture is a two-stage approach, and the basic idea behind it is this: if I divide the automaton into two cascaded automata, first the entire pattern has to pass through the first automaton
and if there is no match we directly allow it to pass. If it matches at the first automaton, we have to do a deeper inspection and must pass it through the level 2, or second, automaton. The observation is that 98 percent of traffic is clean; only 2 percent of traffic is malicious. So the idea is that if we can allow, say, 90 to 95 percent of the traffic to pass through only the first-stage automaton, without going through the second stage, then matching can be much faster than with a single automaton. The first automaton does the string matching and the character class matching and generates some symbols. If some symbols are generated by the first-stage automaton, that means this traffic is a little suspicious and has to go through the second-stage automaton. So the symbols generated by the first stage must pass through the second-stage automaton, which is based on regular expression matching. So the first stage is a simple string matcher and the second stage is a regular expression matcher. By doing this we can reduce the memory footprint of the automata, and therefore we can accommodate the entire automata in memory, in hardware or in software. We can also use existing string matching solutions at the first stage. This is one illustrative example: this is the first stage, the string matcher, which is a DFA; if there is some match here, it generates symbols, and the second stage consumes the symbols; if there is a match there, it raises the alarm, otherwise it does not. It is called cascaded automata because the symbols are generated by the first stage and consumed by the second stage. We call the second stage a word-based NFA because it is consuming the words
or symbols which are generated by the first stage; those are not single characters but words. So the first stage generates a new symbol for each substring or character class, and then the word-based NFA is generated using the simple Thompson's construction. The second automaton, the word-based NFA, accepts the original regular expression and does the matching. The word-based NFA looks like this for some expression; I am sorry, I forgot to write here which expression it is for, but this gives you an idea of how the word-based NFA looks. The approach is simple, and it is simple to construct such an NFA. As I said, some symbols are generated by the first stage and you concatenate them, so you have to rewrite the regular expression using the symbols generated by the first stage. In this example, if S1, S2 and S3 are the symbols generated by the first stage and they are concatenated with the other characters, you can rewrite the expression, and this is the expression which you implement in the second-stage NFA. Now the size of the expression is small, and therefore the NFA does not explode. There are a couple of challenges for such an NFA. I am highlighting some of them; in the interest of time I will not go into the details of all. One of the challenges is that the first stage generates multiple symbols, and multiple symbols can be active at the same time, so you have the requirement of matching against multiple symbols. If you are getting this string, there is a match here; for this string you have another match here; so this will have two matchings. Now \x3 will result in this XA matching. So now due to this there are multiple
matches. Another challenge is that when you do the matching in the second-stage NFA you are losing the timing information. For example, if you are matching URL, you get some string here, and when you get all of it you will match this URL. What is required is that, since this is a string, the characters of URL should arrive in successive cycles; that means once it starts to match, it has to match in successive cycles. The fourth problem is multiple instances. When you are matching URL and URL has already been matched, this state is an active state, and then once it is matched again, multiple instances may start to become active. Another is the non-overlapping issue. Say the input is ABCD: the first-stage automaton generates one symbol for ABC and another symbol for BCD. But if the expression you want to match is ABC.*BCD, then this input should not match. So you have to deal with this overlapping matching, because BC is matched here and BC is also matched here. All these are the problems with the word-based automata. We have provided solutions to a couple of these, and maybe I can give you a brief overview of how they can be handled. The major problem is handling the time, that is, the length of the transition. For example, if you have ABC.*CDE, then ABC should be consecutive, so you are expecting something within some time, and you have to bound the length of the transition. We use a counter-based approach: a counter, or an array of counters, is associated with every transition, and once you are in that state you start to match against the counter, that is, after how many cycles or symbols you want to check. So counter-based timing
information is used to validate the word transition: if the counter does not match, the state is made inactive; otherwise it remains active. When you get any symbol, multiple states are active, all of these states have an array of counters, and the counter must match for a symbol to be accepted. By doing this we can restrict the explosion of the automaton and also restrict the time required to match the signature. Now, how effective is this technique? We used the Snort rule set to perform experiments, and we took input traces of 4 MB size with roughly 5 percent malicious content. These experiments were conducted on a Core 2 Duo kind of machine with 4 GB RAM. You can see here the number of regular expressions we had, and this tells you the number of states we have in the word-based DFA; the number of states is dramatically reduced. The counter bits you need are something like 126 bits, in addition to the states that you are using. If you look at the memory requirement: as I said, in Snort 1 we have 37 regular expressions, in Snort 2 16, in Snort 3 40 and in Snort 4 22. With the DFA the memory requirement is about 15.9 MB, whereas we need only 2.3 MB, so there is a dramatic reduction. If you compare with other approaches like the hybrid finite automata or the delayed DFA, we are a little higher in memory consumption than the delayed DFA but much better than the hybrid FA. This slide again represents similar results in graphical form.
If you look at the run time, the DFA should of course be the fastest, but as you know a DFA consumes a lot of memory, so it is not a very practical solution. You can see, however, that our run time is pretty much similar to what you can obtain with a single DFA, and much better than delayed DFA. Delayed DFA needs less memory than our approach, but its run time is much higher.
As a conclusion, what we can say is that a network intrusion detection system is now mandatory for almost all networks. There are two approaches: one is signature based and the other is anomaly based. Anomaly-based detection is gaining more attention now, because the number of malware families is still increasing linearly while the number of signatures is increasing exponentially. To deal with this exponential explosion, a signature-based NIDS needs a very efficient, fast signature-matching system, and regular expressions are gaining more attention for this. Of the various approaches, we have discussed the use of DFA and NFA. A DFA is very good in terms of timing requirements, whereas an NFA is better in terms of memory requirements. So if you want to implement in software, DFA-style approaches are perhaps better than NFA; for hardware, however, NFA is better than DFA. Here we have proposed cascaded stages, which can easily be pipelined in a hardware implementation. In a software implementation, the run time is pretty close to that of a DFA, while the memory requirement is pretty close to that of an NFA. So this is a kind of balance between the two, and the lower memory footprint also allows a processor-based implementation in software. This is orthogonal to the other graph compression techniques that have been proposed in the literature, so you can also make use of those.
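The time versus memory trade-off between NFA and DFA can be made concrete with a small Python sketch. The pattern `(a|b)*a(a|b){n}` is a standard textbook example chosen here for illustration, not one of the signatures from the talk, and the function names are hypothetical. The NFA needs only n + 2 states but must track a set of active states per input symbol; the DFA built offline by subset construction does one table lookup per symbol but needs 2^(n+1) states.

```python
from typing import Dict, FrozenSet, Set, Tuple

NFA = Dict[Tuple[int, str], Set[int]]
DFA = Dict[Tuple[FrozenSet[int], str], FrozenSet[int]]


def build_nfa(n: int) -> Tuple[NFA, int]:
    """NFA for (a|b)*a(a|b){n}: states 0..n+1, accepting state n + 1."""
    delta: NFA = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for state in range(1, n + 1):
        for ch in "ab":
            delta[(state, ch)] = {state + 1}
    return delta, n + 1


def nfa_match(delta: NFA, accept: int, text: str) -> bool:
    """Online NFA matching: small memory, but many states may be active per symbol."""
    active = {0}
    for ch in text:
        active = set().union(*(delta.get((s, ch), set()) for s in active))
    return accept in active


def subset_construction(delta: NFA, alphabet: str = "ab") -> Tuple[DFA, FrozenSet[int]]:
    """Offline NFA to DFA conversion; each DFA state is a set of NFA states."""
    start = frozenset({0})
    table: DFA = {}
    seen = {start}
    todo = [start]
    while todo:
        current = todo.pop()
        for ch in alphabet:
            nxt = frozenset(t for s in current for t in delta.get((s, ch), ()))
            table[(current, ch)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return table, start


def dfa_match(table: DFA, start: FrozenSet[int], accept: int, text: str) -> bool:
    """Online DFA matching: exactly one state is active, one lookup per symbol."""
    state = start
    for ch in text:
        state = table[(state, ch)]
    return accept in state


delta, accept = build_nfa(3)
table, start = subset_construction(delta)
nfa_states = 3 + 2
dfa_states = len({state for state, _ in table})
print(nfa_states, dfa_states)
```

Both matchers accept the same inputs over the alphabet {a, b}; only the cost profile differs, which is the balance the cascaded approach in the talk is aiming for.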
So with this I conclude my lecture. I would be happy to answer any queries or questions that you have.
What is the role of the master-slave DFA?
Oh, the master-slave DFA. The master-slave DFA is like partitioning a DFA into two DFAs: one is the master and the other is the slave. Again, the approach is a kind of cascaded DFA, in which one stage generates symbols and the other consumes them. The master is the first-stage DFA: if it gets some match, it passes the traffic on to the second stage; otherwise we straight away say that the traffic is clean and allow it to pass. Yes. Good question.
Good afternoon sir. My question is about regular expression to DFA construction. We have two methods, the direct method and Thompson's construction. With Thompson's construction we go through more epsilon-closure and elimination steps, so in comparison with the direct method it takes much more time. Which one gives more efficient pattern matching?
Thompson's. See, it will take longer in construction, but that is an offline process. What is important is the online part: how fast you can match. For that you need less memory and a faster matching time.
One more question. You mentioned that timing information is one challenge for the word-based NFA, and multiple instances is another. In what way do multiple instances become a challenge for the word-based NFA? Can you make that clear, sir?
Multiple instances are a challenge because you have to keep all of those instances active all the time. So it puts more pressure on the processing.
If only one state is active at a time, you can quickly transit to the next state and quickly check whether or not you are reaching the final state.
R.C. Patel.
Hello, good afternoon sir. How do we collect network traffic data at the server end?
At the server end, the signatures are Perl-compatible regular expressions, so that data is available in PCRE format. For the traffic, you just need to capture it with any traffic capture tool that can capture at the MAC layer level.
Sir, I have two questions. First, can you suggest some evaluation methods for intrusion detection? Second, can you differentiate between anti-malware and antivirus software?
Second question first. See, malware is the bigger family: viruses are part of it, Trojans are part of it. As for the first question, how to evaluate: for evaluation we were using traces given by MIT. Those are 1999 traces, in which they have specified how much of the traffic is malicious and how much is clean, and you can make use of that. The repository is from 1999, but for evaluation these traces are still valid. Maybe I can pull up my slide.
Sir, so we can use only the traffic parameters for evaluation?
These are the regular expressions given by them, and the traffic traces. You can get the traffic traces from there to test with, because if you capture your own traffic you do not know how much of it is malicious. That is only for benchmarking purposes; otherwise you capture the traffic from your own network interface and pass it through.
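The point about labelled traces can be sketched in Python: once every record in a trace is marked malicious or clean, the detector can be scored by its detection rate and false positive rate. The trace records and the single signature below are made up for illustration; they are not taken from the MIT traces or from any real rule set, and `evaluate` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical labelled trace: (payload, is_malicious). Real benchmark traces
# supply this ground truth; these records are invented for the example.
TRACES: List[Tuple[bytes, bool]] = [
    (b"GET /index.html", False),
    (b"GET /scripts/cmd.exe?/c+dir", True),
    (b"id=1' OR '1'='1", True),
    (b"POST /login user=bob", False),
    (b"download cmd.exe manual", False),
]

# Hypothetical signature set with a single regular expression.
SIGNATURES: List[re.Pattern] = [re.compile(rb"cmd\.exe")]


def evaluate(traces: List[Tuple[bytes, bool]],
             signatures: List[re.Pattern]) -> Dict[str, float]:
    """Score a signature-based detector against labelled traffic."""
    tp = fp = tn = fn = 0
    for payload, is_malicious in traces:
        flagged = any(sig.search(payload) for sig in signatures)
        if flagged and is_malicious:
            tp += 1
        elif flagged:
            fp += 1
        elif is_malicious:
            fn += 1
        else:
            tn += 1
    return {
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


print(evaluate(TRACES, SIGNATURES))
```

Without the labels, as with traffic captured from your own interface, neither ratio can be computed, which is why a labelled benchmark is needed for evaluation.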
My question is, sir: is it okay if we use only a software firewall? There are a lot of advanced software firewalls, and we are not going to use hardware firewalls.
You are not going to use which one?
Hardware firewalls. I mean we are going to rely only on software firewalls instead of hardware ones. Is that going to provide enough security?
Nowadays it is possible, perhaps; maybe I can comment on that. I guess software firewalls now come with advanced features. However, firewalls only check the headers and are not really capable of deep packet inspection.
So Professor Bernard mentioned that there are various firewalls, and if you have a deep inspection firewall, those are as good as the hardware firewalls and NIDS.
Good afternoon sir. My question is: what is the difference between a firewall and a network intrusion detection system?
Generally, firewalls are meant to do a brief inspection at the packet header level and see whether the data coming in the session is valid or not. Maybe Professor Bernard can give a better answer about firewalls.
That is a very important question. What is the difference between an intrusion detection system and a firewall? When you think of a firewall, the first thing you think about is filtering: filtering traffic based on certain predicates, like this port number, or this bad source IP address coming from a phisher, or this or that. That is the primary goal of a firewall. Firewalls have evolved over time: for example, a proxy firewall might even cache certain things, or perform some authentication. Then you have deep inspection firewalls that can look deep inside the packet payload to search for certain worm signatures. That is the kind of thing you associate with a firewall, but the starting point is filtering. Now go back to the IDS. What is the IDS doing? Professor Virendra Singh put up that slide very nicely.
I thought it was a very nice slide on anomaly detection and signature detection. I hope all the participants can recall some of the differences between the two. For example, anomaly detection can detect zero-day worms and zero-day malware, while signature detection might not be able to, because it does not yet have a signature for that virus or malware. Now just imagine what is going on in the IDS, what kinds of algorithms are used there: all sorts of machine learning algorithms and statistical algorithms. On the anomaly side he put "statistical"; on the other side, the right-hand side, was signature-based detection, and he mentioned "knowledge base" there, that is, knowledge of the kinds of signatures. Just imagine what is going on in both of those types of IDS: something very different from a firewall. In the case of anomaly-based IDSs, the primary goal is to use some kind of machine learning technique to see when the behavior departs from normal behavior. So what goes on inside an IDS is very different from what goes on inside a firewall. In an IDS you are trying to detect that this is probably what is happening: there is a worm, or there is the onset of a denial-of-service attack, et cetera. In the case of the firewall you are just like a policeman or a security guard, saying you can go and you should not go: this packet can enter, this other packet is discarded, that sort of thing. So that's the main difference.
My question is about the DVWA application: how can we use it further? In the course we implemented SQL injection at the high, low and medium levels. How can students use this application further to design a project like that?
So I think the DVWA question is related to the Damn Vulnerable Web Application.
It's not related to Professor Virendra Singh's talk just now, so let me take it. DVWA is basically a teaching tool. The application already exists, and there are different levels: level zero has very little defense in it against SQL injection, level one has more of a defense, and level two has even more. So it's like a security teaching tool. There are three different versions, if you will, of the same application, and the goal of the whole thing is for you and your students to come up with attack vectors: can you attack this application and get valuable information out of the database? That's the purpose of DVWA. Now, can you use it in an application? Well, the application already exists. What you can do is come up with your own application with multiple security levels; instead of just three levels you can have five, and you can release it as open source so that other people can try it. But can you use it in an existing application? The answer is probably not. I guess the question then would be whether an existing web application is vulnerable to SQL injection. For that there are a whole bunch of tools that can be used on any application. DVWA is only a specific application that a bunch of people have created with three levels of security, that is, three variants of the same application. You can't use it to test an arbitrary application that you or somebody else might have built.
Yuvipatil.
What are the new features introduced in firewalls recently?
Coming to that, there has been a lot of churning in the security industry. We started off, as I said, with the most basic kind of firewall that just does filtering. Then what is the next thing after that? Stateful packet inspection. Then what's the next thing after that?
Proxy firewalls. What's the next thing after that? Let us look at all of these, starting with the simplest, the filtering firewall. You filter based on what? On IP addresses and port numbers: source port, destination port, source IP address, destination IP address and so on. You can also filter based on MAC address, as you might have seen in the presentation on iptables. That is very, very basic. Now you go one level up and make it a stateful inspection firewall: you keep track of previous packets and the flags inside them, the SYN flag, the ACK flag, etc. So you're making it a little more sophisticated. That's the next level. Again, firewall design is pretty ancient now. It's been around for 20 years or more, not 20, you know, 40. Okay, so now you want to make it a little more sophisticated. What's the next thing that you do? Now you start introducing proxy firewalls that do some kind of authentication as well, so they understand passwords and login names, etc. Now you want to go a little deeper, make it a little more sophisticated and sell this new firewall product. So what's the next thing you do? You make it a deep inspection firewall, beyond the statefulness of the earlier version. What does deep inspection mean? It doesn't look only at the headers, the TCP/IP headers and perhaps the application layer headers. It also looks inside the payload, possibly to detect the existence of worm signatures. So that's the next level. Now can we do something beyond that? It's left to your imagination. Maybe put some IDS kind of functionality inside the firewall itself, just as in some routers you put some firewalling functionality or some IDS functionality, so that you can get a better clientele for your new product. So it's up to your imagination.
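The step from a header-only filtering firewall to a deep inspection firewall can be sketched in Python. This is a toy model under stated assumptions, not how iptables or any product works: the `Packet` class, the rule tables and the single worm signature are hypothetical, and the addresses come from the ranges reserved for documentation.

```python
import re
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payload: bytes = b""


# Hypothetical policy: block one source network, allow only web ports.
BLOCKED_SOURCES = [ip_network("203.0.113.0/24")]
ALLOWED_DST_PORTS = {80, 443}
# Hypothetical payload signature, used only by the deep inspection step.
WORM_SIGNATURES = [re.compile(rb"cmd\.exe")]


def header_filter(pkt: Packet) -> bool:
    """Basic filtering firewall: decide from addresses and ports only."""
    if any(ip_address(pkt.src_ip) in net for net in BLOCKED_SOURCES):
        return False
    return pkt.dst_port in ALLOWED_DST_PORTS


def deep_inspect(pkt: Packet) -> bool:
    """Deep inspection: the header checks plus a signature search in the payload."""
    if not header_filter(pkt):
        return False
    return not any(sig.search(pkt.payload) for sig in WORM_SIGNATURES)


clean = Packet("198.51.100.7", "192.0.2.10", 40000, 80, b"GET /index.html")
worm = Packet("198.51.100.7", "192.0.2.10", 40000, 80, b"GET /scripts/cmd.exe")
print(header_filter(worm), deep_inspect(worm))
```

The second packet passes the header filter, since its addresses and ports are acceptable, and is discarded only when the payload is examined. A stateful firewall would additionally keep a table of connections and TCP flags between packets, which this sketch leaves out.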
You can design a firewall which has more. You've seen that iptables has a traffic-limiting feature, for example, which can possibly guard against some sort of DoS attack. So some defensive measures can be put inside a firewall as well, to defend against a DDoS attack. You've seen the limiting feature that Professor Shiva probably mentioned. So it's up to your imagination. It's up to how much space you have, how much real estate on that particular piece of hardware. You can definitely put in more and more features, and as you very well know, that's the trick, right? When you sell an automobile, you put in more and more features. For each feature that you put in, you raise the price by another 15%, so very soon you have doubled the price. It's the same thing here. You create some security box. Now what are the main security boxes? A firewall, an IDS, a VPN and the SIEM tools, that is, security information and event management tools. You put more and more functionality inside the same box and sell it for a much higher price. So I could definitely see you putting some IDS/IPS functionality inside a firewall and making it a much better sort of device. That's clear.
There are two types, right sir: misuse and anomaly detection?
Yeah.
In that, you mentioned four approaches. Which category do these four approaches belong to, misuse or anomaly?
I mentioned signature-based detection, so these belong to signature-based detection: misuse, not anomaly. Anomaly detection is like constructing a graph out of system calls or something like that, and then analyzing that behavior.
Thank you very much. Thank you.