 to the 25th lecture in the course design and engineering of computer systems. So in this week, we have studied briefly how computer networks work. So we have not gone into a lot of detail because that is a full course by itself, but we have studied a few important concepts that will help us when we go towards this end to end design of computer systems. So in this lecture what we are going to do is similarly we are going to briefly touch upon some important concepts in network security. Then a full fledged treatment of this is a full course by itself, but in this one lecture we will just briefly touch upon some very important concepts. So let us get started. So what is network security? So whenever we access any internet service over the network, a lot of bad things can happen and network security is sort of just the mechanisms that are in place to prevent or to at least reduce the impact of these bad things. So what are some of the security problems that you can face in any computer system? So you can have various things go wrong. For example, you can have what is called the man in the middle attack. That is you think you are accessing some website, say your bank's website, but instead of going to that website, somebody in the middle pretends to be the bank, takes all your account details and can steal some sensitive information from you and so on. So that is the man in the middle attack, that is somebody impersonating somebody else on the internet. You can also have a denial of service attack, that is you know some malicious attackers want to bring down a website or a web service. So they send a large volume of traffic to the server and cause the server to crash. So they cause the server to deny service to legitimate clients. Then you have all sorts of malware on the internet, you know you have viruses and so on, malicious software that is somehow injected via the network into your computer system and it can damage different parts of the computer system. Then you have other more sort of specialized attacks like what is called port scanning. You know on your server, you can have many different processes, you know an email server, web server, different processes listening on different port numbers. So what you know some attackers can do is they can keep sending you know SIN packets, you know opening a connection to many different ports at random on your computer to see which ports are open. Then if they find some port is open, some server is running, then they will try to attack that server. You know you can just randomly scan ports which is not something that you know regular clients would do. So therefore this is you know the beginning of an attack in some sense. And you can have all sorts of other things like you know when you are sending your data over the network, somebody can tamper with your data packets or even if they do not tamper with them actively, they can do you know tampering is sort of more active or they can just passively eavesdrop on your data packets and you know learn some confidential information about the user or the computer system. So all of these are various things that can go wrong in a computer network today. And of course not a lot of bad things happen because we have mechanisms in place to protect us against all of these attacks. And this lecture we are going to briefly understand what some of these mechanisms are. And some of the properties that we want from these mechanisms from the network security solutions are we want a way to identify and you know stop all bad traffic. Any attack traffic, malware traffic, any bad thing that is coming into your network you kind of want to filter it out and send only the good traffic in. You need some mechanism for this. Then on the internet today you need what is called origin authentication. That is when you are accessing a website you know especially for trusted services like an online banking service or something. You need to know that the website is actually genuine you know it is actually who they say they are. You know you need some kind of authentication of the origin. Then we need integrity of data. What does it mean whatever data you send it reaches the other side without any tampering or anybody modifying it and you know injecting some bad content into it in any way. Then we also need confidentiality. We do not want anybody to snoop on our traffic and steal any private information. So all of these terms I would like you to understand origin authentication, data integrity, confidentiality right and these are all the properties that we will see how today's solutions provide all of these properties. And of course there are many solutions in use today for all of these things and this one lecture cannot cover all of them but we will briefly cover just two or three key ideas in the area of internet security. So the first thing we begin with are what are called firewalls or intrusion detection systems. So what is a firewall? A firewall is you know a piece of it can either be a software program or a hardware box that is usually put at the entry point to a network. You know all the traffic coming into your network it goes through this firewall either the software or hardware box and then only the traffic that clears the firewall is permitted into any network. So what does a firewall do? It will filter packets based on some simple rules. For example if you do not have any web servers then you should not have any traffic coming into port 80 into your system. If it is coming in maybe it is you know malicious traffic that is doing some port scanning. You know by using rules like that you know for certain secure servers you do not want anybody apart from authenticated some specific users to handle to access these secure servers. You do not want traffic from hosts inside the network to go to certain specific websites or IP addresses. Right? A firewall basically filters traffic that is both coming into your network and going out from your network based on some simple rules. And these rules who sets these rules these rules are configured by the network administrators you know whoever is managing this network will configure these rules in a firewall in order to block some ports or some traffic from some IP addresses whatever there are various formats of these rules. And in that way a firewall ensures some amount of security. And in addition to firewalls large networks today also deploy what are called intrusion detection systems or IDSs. So these IDSs do sort of one level deeper scrutiny than the simple rules of a firewall. They inspect all traffic to find various patterns. For example they try to do some anomaly detection. You know if at this time of the hour every day you only got an X amounts of traffic but suddenly 10 X traffic is coming you know to some specific host in your network. Then that is an anomaly right? It should not be happening. So such anomalies maybe that is a denial of service attack or you know too many SIN packets connection requests are coming into a particular host to random port numbers you know that could be a port scanning attack. In this way they will try to find some patterns you know abnormal patterns that is what anomalies are right. So these IDSs run some complicated algorithms to detect these abnormal patterns and maybe alert the sysad in the network or even block this anomalous traffic. And the other thing they do is they look for signatures. You know some viruses once they are well known that this virus is there you can actually inspect the incoming traffic and search for the signature in the traffic. So these IDSs also look for signatures of already known malwares and viruses and they can raise alerts or block the traffic you know this is raising an alert is called intrusion detection. If you block the traffic that is called intrusion prevention you are kind of taking a more active role here. So in this way firewalls and IDSs are deployed at the entry point of most large networks today in order to stop malicious traffic. And of course that does not mean all attack traffic is stopped there are still you know clever attackers that can subvert these firewalls and intrusion detection systems which is why you occasionally have these security attacks happening but for the most part these systems try to block a lot of the attack traffic. So the next idea we are going to study is we are going to understand how HTTP security you know secure HTTP works on the internet today. And before we go there we need to get some background about what is called a field of study that is called cryptography. So full treatment of this is again a complete course but in a couple of slides I am going to briefly introduce to you what is cryptography and with that limited knowledge we are going to understand how HTTPS or secure HTTP works. So cryptography is of two types either public key or symmetric key. So let us see a background of what is public key cryptography. So cryptography in general is basically a set of algorithms that provide some security properties and at the base at the core of these algorithms are what are called keys or you know strings of bits. So you use some large array of bits and using these keys you run various computations on these keys and you somehow guarantee some security properties. And there is a lot of theoretical underpinnings to cryptography you know there are certain hardness assumptions saying some computations are hard to do. So of course we would not go into all of these details here but just know that people have you know some very smart people have designed some very smart algorithms to provide certain security properties. And the proofs of why these algorithms work is beyond the scope of this course. So the one type of cryptography is what is called public key cryptography. In this cryptography between the various entities that are you know doing some computations and guaranteeing security between all of these entities two keys a pair of keys are used in all of these algorithms. One key is public is well known to everyone and the other key is private. So when you use two keys a public key and a private key that is called public key cryptography. And this public key can be distributed out in the open to anyone whereas the private key is usually kept secret and the secrecy of this private key is important for the security. So what is public key cryptography used for the two common uses are one is what are called cryptographic signatures you know for authentication. So that is suppose you have you know two people Alice and Bob these are usually the common names used in cryptography. You have two people Alice and Bob exchanging a message you know Alice is sending a message to Bob. Now when Bob gets this message how does Bob know that this message came from Alice only and not from some other random person we use signatures for this authentication. So Alice uses a signature to authenticate herself to Bob that is Alice basically if there is a message Alice will sign this message using its private key. Now this private key is known only to Alice therefore only Alice can generate this signature and it will send the signature along with the message. And when Bob gets this message Bob can run another cryptographic algorithm to verify the signature using the public key of Alice. So Bob knows the public key of Alice and with that public key and the signature Bob will run some algorithm and that algorithm will tell you if this signature was actually generated by whoever held the private key of Alice. Anybody else any random person any attacker say a prime cannot generate the signature that will pass this verification test that Bob is running with the public key of Alice only the person that had the private key of Alice could have generated this signature. In that way there is some amount of authentication that this message actually came from Alice. Of course it is not authenticated to Alice the person but it is authenticated to be more specific to the private key of Alice whoever held the private key of Alice has generated this message that can be verified by Bob using this signature. So the other use for public key cryptography is an encryption which is used for confidentiality. So suppose again Alice and Bob are exchanging a message but they do not want anybody along the path you know Alice is sending to Bob and any eavesdropper along the path should not be able to understand what is in this message. So the way this is done is Alice instead of sending the message Alice will encrypt the message using the public key of Bob and send only this cipher the cipher text or the secret text will be sent to Bob. Now Bob using his private key can decrypt this message and recover the original message that Alice wanted to send. So Alice knows the public key of Bob whoever Alice wants to talk secretly to Alice will know the public key and encrypt and Bob will decrypt the message. So in this way with public key cryptography you can do you can ensure some kind of confidentiality you can ensure some kind of authentication and there are many widely used systems today that depend on public key cryptography. The second broad class of cryptographic algorithms are what are called symmetric key cryptography that is the difference is that in public key cryptography there were two keys. There is a public key and a private key with symmetric key cryptography there is only a single key a single shared secret key is used. And why this difference why this two types of algorithms it is because these symmetric key algorithms are much faster than public key algorithms. But of course there is a trade off here you know which is you need to somehow agree on the secret key ok whereas public key cryptography can be used between any two people as long as you know the other persons public key which is publicly available you can you know send some encrypted message. But with symmetric key cryptography you need to agree on this shared secret key and how is that done there are various algorithms once again you start with you know Alice and Bob start with each others public keys and then somehow agree on a secret key and then shift to symmetric key cryptography. So a lot of systems begin with public key cryptography then move to the faster symmetric key cryptography once a secret key is agreed upon. And some uses of the symmetric key cryptography are what are called message authentication codes you know for once again for data integrity that is if suppose Alice is sending some message to Bob along with the message Alice will send a tag which is you know a message authentication code generated using the secret key that Alice and Bob are sharing ok along with the message there is the tag. Now Bob can verify that this message was not corrupted by regenerating the tag now Bob gets the message now Bob will once again using the secret key compute the tag ok. And then if this tag does not match the tag that came with then you know oh something is wrong with the message I received if somebody tampers with this message then when you recompute this tag it will not match you know because two messages cannot will not end up with the same tag these cryptographic algorithms guarantee all of these properties. So if anybody tampered with your message Bob will recompute the tag it will not match the received tag then you know that the integrity of the data was not preserved. But if this tag matches then you know that nobody has tampered with your message it is the exact same message that Alice had sent you. And the other use for shared key cryptography is with encryption ok once again you can do encryption with the public key cryptography also and you can do it with shared key cryptography also this is faster but the principle is the same Alice will instead of sending the message directly Alice will encrypt the message using a secret key send it to Bob and Bob will decrypt the message using the same secret key ok that is why it is symmetric key cryptography here and these all of these are in general much faster than public key cryptography algorithms. So with all of these concepts in place now with the basics of public and private key cryptography we are going to understand how secure HTTP or HTTPS works on the internet today. So any computer system that is dealing with any kind of private confidential you know important data today runs on HTTPS ok. So what is HTTPS so we have seen what is HTTP you know a client and a server connect each other sockets and then the client will send an HTTP request requesting some amount of some data and the server will send an HTTP response you know say the contents of a web page and the client will display that response on a browser we have seen how HTTP works and you use regular TCP sockets you know whatever data you write into a TCP socket is made into transport layer segments and IP datagram sent over the network and these are just plain IP datagrams you know if you have run Wireshark as suggested in one of the previous lectures you can just read through all the contents you know you can look at all the bits in the packet and anybody on the network can just loop all your packets and know all the content you are talking to that server. So this might be ok if you are just consuming public content like you know some publicly available news website that is ok there is no secret being exchanged here that is ok. But if you are doing secure transactions like you know getting your bank information and all of that you may not want to everybody on the internet to know what your bank balances for example. So therefore in such cases websites use HTTPS what is HTTPS is there are some special sockets, secure sockets which are called TLS stands for transport layer security or SSL secure sockets layer these are just the same names for the same technology two different names. When you open these secure sockets now the secure socket is just another software layer over the regular socket ok they still run over TCP but these secure sockets do not send whatever data is given to them directly they will transform the data run some extra logic over here before sending the data out over regular TCP sockets ok. So this is the TLS layer runs over the TCP layer and on top of this you will have HTTP and this HTTP that runs on top of TLS is called HTTPS. It is the same request response protocol you send the HTTP request get back responses but you will send these responses via secure sockets which themselves you know modify the message somehow you know encrypt it and all of that and send the message over regular TCP sockets. And these servers that are using these secure sockets listen on a separate port number not port 80 but port 443 for secure HTTP and you might have noticed your URL also will no longer say HTTP but it will say HTTPS indicating that you actually are talking to a secure HTTP server and what are the guarantees that HTTPS provides the following security guarantees are provided by HTTPS which is authentication that is if you are accessing a website your online banking website you know that it is actually the website of the bank when you see this HTTPS logo you know you usually see like a lock sign next to it indicating security. So, it is a secure website the website is authenticated there is integrity of data whatever data you are sending over HTTP messages nobody can tamper it and the data is encrypted so nobody can slope on your information so confidentiality is also guaranteed, okay HTTPS provides all of these guarantees of authentication data integrity and data confidentiality. And of course all of this is provided only inside the message application layer message that you are sending your HTTP request response that part is encrypted. Outside all the TCP IP headers that you are adding you know you will add the TCP header then the IP header, Ethernet header note that all of these headers are still visible it is only the data being written into the sockets that is encrypted. So note that HTTPS does not solve all possible security problems it only gives you what is called application layer security the messages exchanged at your application layer are secure but that is still of course a big step ahead in addition to this if you want security at the IP layer or you know at the link layer you have to use other different solutions that we are not going to cover in this lecture. So how does HTTPS guarantee all of these you know authentication integrity and confidentiality and all of that so here are the mechanisms you know when you access a website how will you know that the website the domain name and everything is all genuine this is guaranteed using what are called certificates, okay. So on the internet today you have trusted certificate authorities what these certificate authorities will do is they will verify information about the websites and give out a certificate what is the certificate suppose you are going to some website nptel.ac.in you know this certificate authority will sign the information that hey this is the website this is its domain name this is its public key all of this information about this website will be put in a message and this message will be signed using the private key of the certificate authority. Remember we have seen signatures before Alice signs a message with its private key then Bob can verify that the message came from Alice using Alice's public key similarly the certificate authority will sign all of this information saying hey this website is genuine this is the domain name this is the public key all the publicly available information about that website will be verified using the signature by the certificate authority. So these people we trust them to actually do some background checks and they will give these certificates only to websites whom they have verified they will not randomly give it out to anybody but they you know for a fee they will verify your information like your domain name actually verify that you know your online banking website belongs to the bank and not to some attacker they will do that verification and they will sign that information with their private key. Now when you as a user is accessing that website their website will share that certificate you know signed by the private key of the certificate authority that certificate that it is called the SSL certificate the website will give it to you know during the HTTPS protocol has some additional handshake you know some setup phase where all of this security information is exchanged. In that phase the website will say hey you know I am genuine look at this SSL certificate I got from the certificate authority then the browser can actually verify the certificate using the certificate authority's public key. Now these certificate authorities are a few of them and you know their public keys are well known we trust them and that public keys are usually also built into browsers so using that public key the browser can actually verify this SSL certificate that the website has produced therefore now we are authenticated we know that this website is secure can be trusted and once you know the website the domain name is authenticated you know the websites public key then the client and the website will then run some algorithms to generate a shared secret key. So by just open communication you can generate a shared secret key and nobody else whose eavesdropping knows that you know there are algorithms to arrive at this secret key securely and once you arrive at a secret key what will you do any HTTP request response that is being said you know any HTTP data that is being sent you will first encrypt it so that nobody else can snoop on it and then this encrypted data you will generate a message authentication code on this and all of this you will send to the other side. So the other side can decrypt your data using the shared secret key and can verify the integrity by regenerating this message authentication code together you have ensured the integrity and the confidentiality of your information that has been exchanged. So this is how HTTPS works by first the website will give you SSL certificate tell you that it is genuine prove to you that it is genuine then you will generate a secret key and you will use encryption and message authentication codes in order to guarantee confidentiality and integrity of the data. So the last concept that I want to discuss with respect to network security that is widely used in computer systems is this concept of what are called cryptographic hash functions. So a hash function is nothing but a function that maps a large message into some small fixed size value you take a large message you hash it you know there are many hash functions available like SHA1 SHA2 you might have heard of them or MD5 you take this large message and you hash it you get a smaller message. And what are the properties of this smaller message you cannot easily invert it that is you cannot guess what this message was by looking at this hash. So for example you cannot have things like you know look at the last few bits of a message and use that as the hash that is not a good hash function why because from that you know something about the message. But from actual hash functions you cannot guess anything about the message by looking at its hash they are you know there are theoretical guarantees that give you all of these guarantees that you cannot invert that is you cannot go from the hash to the message in any way. Then the other property that these hash functions have is you cannot generate any other message that hashes to the same value. So suppose the message m1 has hash to some value h you cannot easily find any other message m2 that also hashes to the same hash value h ok. Because of this hashes provide you a lot of security for example on all computers passwords actually use hash functions. On a computer for various users you do not store their passwords directly in a text file you know you have to remember the password somehow to verify that the password is correct when the user types it. But what if you store the password and some attacker will read the password file that is very dangerous you know you do not want to compromise the privacy of the users. So what computer systems will store is they will store the hash of passwords. So whenever the user sets a password the hash of that password is stored. And the next time the user logs in once again you will take the typed in password hash it and then verify with the stored hash. If it matches well and good that means the user has typed the correct password. So this will ensure that there is some privacy of the user is preserved you cannot guess what the password is by looking at its hash. At the same time you know just because you are storing a hash does not mean any random person can log in. It is very hard to generate another password that also has the same hash therefore it is still secure. And another common usage is you know websites what they do is they post this hash digest of large files. Suppose you are downloading some operating system image from a website. How do you know that the image has come correctly has been downloaded correctly nobody has tampered with it and all of that what you will do is once you download this large message you will take a hash of it. And the website will tell you oh this is the hash you should expect. So if your data was corrupted then the hash will not match but if your data is correct then the hash that you compute will actually match the hash that the website has already put up. In this way you can guarantee the integrity of large amounts of data that is downloaded over the internet as well. So with that I would like to wrap up this lecture I have told you a few brief concepts about network security like what are firewalls what are intrusion detection systems and what is HTTPS what are cryptographic hash functions. So this is only a very brief overview of network security but these basics will help you understand you know some fundamental concepts when you design computer systems. So to understand HTTPS better I will request you to once again we have discussed wire shark based practical assignments in the previous few lectures also. Now also once again you know go to a HTTPS website and browse and see what all information about the website can you actually capture you know you will see that the data itself you cannot see even though your browser is displaying the wire shark cannot see what is inside the packet. You can see TCP IP headers but you cannot see the application layer messages whereas if you go to a regular website you can see all the data that is being exchanged. So do this exercise to understand what security exactly does HTTPS provide to you. So thank you that is all I have for this lecture and this week wraps up our discussion of how computer networks work and in the next week we are going to move on to a new topic in the course. Thank you all.