So, now I will call Someshwar, who is going to talk to you about a different aspect: not the bigger framework of coordination and response, but just the last part, called forensic log analysis. You have the logs of interesting things that happened; for instance, you have the email logs for the last few days, weeks or months, and similarly, for the same period, you have the web proxy logs. Somesh is going to tell you what tools are available to analyze these logs, from which you can detect and find out the good and bad things that happened in the network. These exercises will be done by you on Friday, but we are using the opportunity today to guide you through some of that. So, I will ask Somesh to take over for the next 15-20 minutes and talk to you about log analysis tools and the lab for Friday.

Good morning everyone, I am Someshwar, and I will be explaining different log analysis tools and a bit about proxy servers and Squid ACLs. A server log is a log file that is created and maintained by a server, recording the activities performed by it. A typical example is that of a web server, which maintains a history of all the page requests that have been made. A log entry is made for each page request, and each entry contains different fields such as the client IP address, the page requested, the time at which it was requested, and several others. This information can be used to improve your services as well as to provide security. So, first I will explain Webalizer. Webalizer is a web server log analysis tool: it analyzes the logs and produces a statistical report in HTML format, which can be viewed in a browser.
First, you move to the folder in which you have stored your logs, which in this case is the Webalizer Apache log folder. What I have done is, I have opened a log file, and the portion I have selected is a single log entry. It contains different fields; I will copy this entry and explain the fields present in it. Here, the first field is the client IP address, the address of the client that made a request to the server for some page; here it begins with 157.55. The next field is the ident field, which is used to identify the user via the ident protocol; whenever such data is missing, a hyphen is used. The third field is the authenticated-user field, which identifies the user name as determined by HTTP authentication. The next field is the timestamp, which specifies the date and time at which the server finished processing the request. Next comes the request itself: the GET method that was used, followed by the URL of the page that was requested and the HTTP protocol version. Then comes the HTTP status code; here 404 is an error status. The next field is the size, 280 bytes in this case. Then comes the referrer field, and since it is missing here, a hyphen is used, followed by the user agent, which in our case is Mozilla, and then a miscellaneous field.

Now I will explain the tool Webalizer. First you have to move to the folder in which you have stored your logs. Running ls, you can see that there are many logs, the access logs from 2014-05-08 through 2014-05-12. The command goes like this: webalizer, followed by -p. The -p option is used for an incremental run; the advantage of an incremental run is that you can preserve information between runs.
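To make the field positions concrete, here is a small sketch; the log line below is a made-up example in combined log format, not an actual entry from the IITB logs, and it shows how awk numbers these fields:

```shell
# A made-up combined-log-format entry (hypothetical values, for illustration)
line='157.55.1.2 - frank [08/May/2014:06:25:12 +0530] "GET /nptel/electrical HTTP/1.1" 404 280 "-" "Mozilla/5.0"'

# awk splits on whitespace: $1 = client IP, $9 = HTTP status, $10 = size in bytes
echo "$line" | awk '{print "client:", $1, "status:", $9, "bytes:", $10}'
# prints: client: 157.55.1.2 status: 404 bytes: 280
```

The same field numbers are what the scripts later in this session rely on.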
The information obtained from a previous run can be used along with the next run. This is followed by -F, which specifies what type of log file you are using; since we will be using an Apache log, we specify clf, the combined log format. Then -n is used to specify the domain name, and -o specifies the directory in which we want to produce the output, which here is /var/www/webalizer, and then comes the path to the log file. Here you can see that "history file not found" is printed; since we are running this tool for the first time, there was no previous history, so it shows that the history file was not found. You can also see that there are 98946 records. Now we run the same command on another log file, the one for the 9th, and this time you can see "reading history file" printed, so the -p option allows us to reuse the information that we obtained in the previous run. Once you have run this command, you can view your output in a browser: simply type localhost followed by /webalizer. It generates usage statistics for the iitb.ac.in domain; you can see the usage summary for iitb.ac.in, with the months on the x-axis and visits, files and pages on the y-axis, and alongside it there are other graphs as well. There is one problem with this tool: there is a bug because of which the totaling of the bytes received goes wrong, so it always shows 0. Since the log file contains only the logs for the month of May, we will see output only for the month of May. These are the monthly statistics for May; you can see the total as 191710. Sorry, this is the total number of hits, not files.
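Putting those options together, the invocation described above looks roughly like this; the directory paths and the log file name are illustrative, so check them against your lab machine before running:

```shell
cd /path/to/apache/logs            # folder holding the access logs (example path)
webalizer -p \                     # -p: incremental run; preserve state between runs
  -F clf \                         # -F clf: the input is in combined log format
  -n iitb.ac.in \                  # -n: domain name shown in the report titles
  -o /var/www/webalizer \          # -o: directory where the HTML report is written
  access_log_20140508              # path to the log file (illustrative name)
```

Running it again with the same -o directory on the next day's log is what triggers the "reading history file" message instead of "history file not found".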
Hits means the total number of requests that were made to the server during a given time period, which here is 191710. Files is the number of hits that resulted in some data being sent back to the user, and a page is a resource that was requested from the server. A visit is counted whenever a site makes a request to the server for the first time; there is a time window during which, if the same site makes multiple requests, they are all counted as a single visit. This is the daily usage for the month of May; there are statistics for every day, and since the log covers only the dates from the 7th to the 9th, we have just three days of output. You can see that the hits for day 7 are 91936, along with the other fields. So, you can simply go through this and get a statistical analysis of the logs. Below it is the hourly usage for the month of May; you can see peaks at different hours, and during the hour from 05 to 06 the hits are very high, 3794. Then there are the top 30 URLs that have been requested, of which /NPTEL/electrical is the one that received the maximum hits. Next is the top 10 of the 735 total entry pages; an entry page is the page that is requested first when you make a visit to a server. Then come the exit pages; an exit page is the last page requested during a visit. Next is the total number of sites; you can see the different sites from outside that have tried to access a resource inside the network, that is, inside IITB. Then there are the total referrers; a referrer is the URL that led a user to a site, or caused a browser to visit a server. And finally, this is the usage by location.
Here you can see that the maximum number of requests have come from India, followed by unresolved/unknown, then the United States and others. The mapping from IP address to location is done using the GeoIP database, which is developed by MaxMind, and these are the top 30 locations from which requests have been made. Now, we will move to some simple scripts showing how you can work with the log files directly. The first question is how to find the total number of sites in a log file using a script. First, we change to the directory where we have stored the logs and run the webalizer command as before: the file type clf, since we are using Apache logs, followed by the domain name iitb.ac.in, then the output directory /var/www/webalizer, followed by the path to the web server log. You can view the output in a browser: when you type localhost/webalizer, it shows the HTML reports, and here you can see the total number of sites as 2029. Now we have to find the same total using a script. The command goes like this: cat followed by the file name outputs the contents of the file to standard output, and this is piped as input to another command, awk, which is used to select a particular field. Since the first field represents the site, which is nothing but the client address, we print only the first field; then we sort it using the sort command, followed by the uniq command, which removes all matching adjacent lines, followed by wc -l, which gives us the count. Here you can see the output is 2029, which matches the output generated by Webalizer.
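As a runnable sketch of that pipeline, here is the same command on a tiny made-up log; the three entries below are hypothetical, with three requests coming from two distinct clients:

```shell
# Build a tiny hypothetical access log: three requests from two distinct clients
cat > /tmp/mini_access.log <<'EOF'
10.0.0.1 - - [07/May/2014:03:28:59 +0530] "GET /a HTTP/1.1" 200 120 "-" "-"
10.0.0.2 - - [07/May/2014:03:29:30 +0530] "GET /b HTTP/1.1" 200 80 "-" "-"
10.0.0.1 - - [07/May/2014:03:31:00 +0530] "GET /c HTTP/1.1" 404 280 "-" "-"
EOF

# $1 = client address; sort makes duplicates adjacent, uniq drops them,
# wc -l counts the distinct clients that remain
cat /tmp/mini_access.log | awk '{print $1}' | sort | uniq | wc -l
# prints 2
```

On the lab machine you would point the same pipeline at the real access log instead of the temporary file.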
The second script question is: print all the IP addresses that visited pages between timestamp T1, May 7, 2014 at 03:29:03, and timestamp T2, at 03:30:00, in the log file for 2014-05-08. You again follow the same pattern: cat prints the contents of the file, which you pipe as input to awk. Here the awk command puts a condition: only if the timestamp field is greater than the given T1 and less than the second timestamp T2 do we print the field. When you run this command, you get two outputs, and we can cross-verify them in the log file. Here, this is the log entry whose timestamp is just at or after T1, and this one is the entry whose timestamp is just at or before T2; all the log entries between these two are to be considered, and of these, the two client addresses are the ones beginning 128.211. These two IP addresses match the result that we obtained using the script.

Now I will explain what pflogsumm is. pflogsumm is a log analysis tool for mail servers; it gives detailed information about the number of mails received, the number of mails sent, and so on, but it produces plain-text output. The command is pflogsumm followed by the path to the mail server log, and then a redirection operator, so the output will be stored in another file, result.txt; you can specify any file name, and the output will be redirected to it. Once you have run the command, you can open result.txt and view the results.
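A minimal version of that time-window filter, again on a made-up three-line log, with T1 = 03:29:03 and T2 = 03:30:00 as in the exercise; only the middle request falls inside the window:

```shell
cat > /tmp/mini_access.log <<'EOF'
10.0.0.1 - - [07/May/2014:03:28:59 +0530] "GET /a HTTP/1.1" 200 120 "-" "-"
10.0.0.2 - - [07/May/2014:03:29:30 +0530] "GET /b HTTP/1.1" 200 80 "-" "-"
10.0.0.3 - - [07/May/2014:03:31:00 +0530] "GET /c HTTP/1.1" 200 64 "-" "-"
EOF

# $4 is "[dd/Mon/yyyy:HH:MM:SS"; within a single day the HH:MM:SS suffix
# is zero-padded, so a plain string comparison gives the right ordering
cat /tmp/mini_access.log | \
  awk '$4 >= "[07/May/2014:03:29:03" && $4 <= "[07/May/2014:03:30:00" {print $1}'
# prints 10.0.0.2
```

Note that this lexicographic comparison only works when all entries share the same day and month prefix; across month boundaries the month names do not sort chronologically as strings.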
Here you can see the total number of received emails, the total delivered, forwarded and deferred; the deferred emails are those that could not be delivered at the first attempt, and bounced emails are those which did not get delivered at all, perhaps because the destination was not available or there was some error. Next you can see the per-hour traffic summary: for example, from 0000 to 0100 there are 1404 received emails, around 1300 delivered, and then the deferred, bounced and rejected emails. So, you can view all of this in the text file. This is the host/domain summary of received messages; here you can see that the iitb.ac.in domain received a total of 202860 emails, of which 1833 went to one particular host under iitb.ac.in, and so on. Similarly you can see senders by message count; the mailman-bounces address under iitb.ac.in has sent a lot of messages. You can also see recipients by message count, where the same mailman-bounces address has received many bounced mails, and senders by message size.

Another tool is AWStats. The two tools that I have explained so far, Webalizer and pflogsumm, each handle only one kind of log: Webalizer analyzes web server log files and pflogsumm analyzes mail server log files. AWStats is a superset of these two: it can analyze FTP logs, streaming media logs, web server logs as well as mail server logs. For running this tool you first have to do some configuration, which will be explained during the lab session. The two important changes you have to make in the configuration file are: specify the log type as M, where M stands for mail server, since we will now be doing the analysis on a mail server log, and specify, inside the configuration file, the path to the mail server log.
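In the AWStats configuration file, those two changes look roughly like this; the directive names are real AWStats directives, but the file path is just an example and your lab setup may differ:

```
LogType=M                      # M = mail log (W = web, F = ftp, S = streaming)
LogFile="/var/log/mail.log"    # example path to the mail server log
```

The rest of the configuration, including the domain name, will be covered in the lab session.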
Since we have specified the domain as iitb.ac.in in the configuration file, these statistics are for the iitb.ac.in domain. You have to select the month for which you want to see the reports, and on the left-hand side you have several options: monthly history, days of month, and so on. When you select monthly history, you will see the number of unique visitors, the number of mails involved, and the total size of those mails. Since we are analyzing the logs of May only, the outputs are present only for the month of May. You can also view by days of month: on which date how many mails were sent, and of what size. Here the log covers only day one, so we have output only for that day; you can see, on a per-day basis, the number and total size of the mails involved. By clicking on hosts, you can see the different hosts that are involved, the unique visitors, and on the right-hand side the number and size of the mails each is involved with. You can also see senders by email count: the sender email address, the number of mails sent, the total and average size of the mails, and the date on which the last mail was sent. Similarly, you can see receiver emails with their mail counts, sizes and average sizes.

Now I will explain what proxy servers are. A proxy server acts as an intermediary for requests from clients seeking resources from other servers. Here you can see that Alice wants to access some resource on the server Bob; instead of requesting it from Bob directly, Alice requests it via a proxy server. What is the advantage of doing this? One advantage is privacy, which means that Bob does not know whom the response is ultimately going to.
When Alice sends a request, she specifies that she wants some service from the server Bob, but she sends this request to the proxy server rather than to Bob. The proxy server then makes the request to Bob, and Bob assumes the request is coming from the proxy server. So Bob sends the response to the proxy server, without knowing whom the response will go to from there. There are three types of proxies: the forwarding proxy, the open proxy and the reverse proxy. With a forwarding proxy, the user makes a request to the proxy, specifying the destination, that is, the specific server the request is to be made to; the proxy makes the request, and when the response comes back, it is given to the proxy, which passes it back to the user. Here you can see that X makes a request to Y specifying the destination Z; Y forwards it to Z, Z returns the response to Y, and Y returns the response back to X. Why do we need forward proxies? It may happen that the administrator of Z has blocked the user X; this may be because X was trying to hack the server of Z, or Z was a forum website and X was spamming it, so Z decided to block X. In that case X gets no access to Z, but if a proxy server is used, then X will still be able to reach it. An open proxy is accessible by any internet user. What attackers often do is hack into your system and turn it into an open proxy; in doing so, they can hide their identity behind the proxy. Since it is accessible to any internet user it is a bit risky, and sometimes this type of proxy simply results from a configuration error. The third one is the reverse proxy. With a reverse proxy, X makes a request to the proxy Y saying "I want some service", without specifying the server; Y knows where the server is and forwards the request to it.
The advantage of a reverse proxy is that X does not learn anything about the internal network of the organization it is requesting from; when the request is processed, the response is returned to the proxy, and the proxy returns it to the requesting client. Another reason to use a reverse proxy is that the organization Z may want all of its traffic to come via Y; when all the traffic comes via Y, Z can control several things: which clients get access to which resources, and Y can also do load balancing.

The next topic is Squid ACLs. Squid logs are similar to Apache logs; the formatting may differ somewhat, the way timestamps and other fields are specified may be different, but the logs are to a large extent the same. The advantage of using Squid is that we get to use access control lists. In an access control list we can specify lists of clients who can get access to particular services: for example, http_access controls which clients may access the HTTP port, log_access controls which requests are logged, and so on. A simple example is how to allow a set of clients to access the HTTP port. First you define an ACL element that corresponds to your clients' IP addresses: for example, acl my_clients src followed by an address, where my_clients is the ACL element name, src specifies that this is a source address, and then comes the IP address of the source. Next you have to allow those clients in the http_access list, so you write http_access allow my_clients; now my_clients will be able to access the HTTP port. If you specify deny in that place instead, then those specific addresses will not be able to access the HTTP port.

Thank you, Ashok and Someshwar. Like I was saying, you will be doing a lot more of this in the next two afternoons, Wednesday and Friday, when you work in the lab.
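For reference, the Squid ACL example Someshwar walked through looks roughly like this as a squid.conf fragment; the ACL name and the subnet are illustrative, not from the actual lab configuration:

```
# Hypothetical squid.conf fragment (name and subnet are examples)
acl my_clients src 192.168.1.0/24    # define an ACL element for our client subnet
http_access allow my_clients         # let those clients use the HTTP port
http_access deny all                 # refuse everyone else
```

Squid evaluates http_access rules in order, so the final deny acts as a catch-all for clients not matched above it.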
The reason they gave you this quick overview was to set the stage. Let me summarize what Someshwar said, in a minute or two. A lot of services for any organization cannot be directly provided to every individual end station. Using firewalls and NAT, that is, iptables, is one way to do it, but that way gives you less control. If instead you go through proxies, for instance for web access, both directions are covered: students in your organization who want to access data from outside go through a forwarding proxy, and Squid is a very good example of one. Similarly, to make the web servers of the various departments in your college accessible to the outside world, instead of directly exposing all the servers, you go through a reverse proxy. This gives you a fine level of control. He mentioned access control lists, and depending on the sophistication of the software you can do many things: you can allow or disallow based on time windows, allowing in the mornings and disallowing in the afternoons; you can allow or disallow based on load, allowing if the link is lightly loaded and refusing if the link is already half full; you can shape bandwidth; you can do many interesting things which give more security and more assured performance for your network. In the lab you are encouraged to explore all this, but the focus is not on how to achieve that functionality; it is on how to look for security incidents.
All these services write logs, and the log files have to be analyzed, so Someshwar started from the very simple command-line tools like pflogsumm and Webalizer, which do static log analysis. By static log analysis we mean that it is already known what you want: you want to count certain events and produce reports, nice graphical web-based reports, but all of it is fixed in advance, and the tools just statically analyze the data as it comes in. AWStats is slightly more sophisticated: it can be configured not only to analyze, but to look at the result and, based on the result, take different actions. So AWStats leads slightly towards the first lab that Ashok talked to you about, which is a system that reacts, a system that understands, and that is the final goal. These two labs will guide you through the basics and the more advanced level of what can and needs to be done for security assurance. With that view we have designed the lab, and the sessions today were only to help you get started. I hope you will have a reasonably interesting experience in the labs; by using these tools you will get much more insight, and you will be able to explore more of this when you actually teach the course and design your own labs and experiments for your students. So, with that I will conclude this session.