lecture on networking and HTTP. This lecture is intended to cover the basics that are needed to understand how the internet works and how requests travel in the World Wide Web. That knowledge is necessary, at least on a basic level, to really understand what we're doing in web development. It is quite abstract, and you'll get much more in-depth knowledge when you take the networking course, but this is the essential part you need to understand what's going on. What we'll cover are networking basics: what are hosts, what are IP addresses, requests and responses in the web, and a little bit on the TCP/IP protocol. Then in the second part of this lecture I'll cover the HTTP protocol, which is the protocol that really runs the World Wide Web that you are using every day.
The learning outcomes are three that are official and then a couple of unofficial ones. You should be able to summarize what HTTP requests and responses contain and what they do. You should be able to list different methods and their purpose, and finally explain the features of different HTTP methods. We'll get back to that. In the official syllabus there are no learning outcomes on networking, so these are unofficial: you should be able to distinguish between the internet and the World Wide Web, explain IP address structures, and summarize TCP/IP. These are not really course learning outcomes, but as I said, they are somewhat necessary to understand what's going on.
Now, the book we're using in this course does not cover much on the basics of networking or HTTP, so there is no dedicated chapter there. There are some smaller explanations on client-server, on HTTP response codes, and on the HTTP methods in chapter six, the server chapter, but these are very brief. They are needed in the server-side part, which we'll get back to much later in this course, so you don't really need to read in the book for this lecture.
And then there are a number of resources. None of them are exam relevant, but they are the official documentation for the different things. RFC 1122 is the technical description of all the different layers of TCP/IP, so it's really a description of how the internet works. The HTTP pages on developer.mozilla.org give you a good overview of HTTP; it's just a basic overview, but much more digestible than the standards. Then RFC 7231 is among the official standards for HTTP, if you want to look into that. And then again, there is HTTP status.com, which gives you a brief description of the HTTP response codes, and that's something we'll go into later.
Now, as we discussed in the first lecture, the World Wide Web was invented by Tim Berners-Lee in 1989. It was public from 1991, and we said it contains hypertext, hypermedia, all of these things. But then, of course, the question really is what the difference is between the two things I wrote here, the World Wide Web and the internet, because nowadays we use those terms interchangeably, however they fit, but they are really different concepts. If you listen to the inventor himself, the internet is a network of networks. So this is really all about the hardware. It's computers, like your laptop or your mobile phone, and cables, back in the days. Nowadays it could be wireless, but it's all about how things are connected physically and how I get information from one computer to the next. It's the architecture underlying everything. The web, the World Wide Web, is the information space. It's all the documents, all the pictures, all the videos that you can access somehow, and that's really the difference. On the net you find computers, that's the hardware. On the web you find documents, websites, and so on. So it's two different things. And the first part that we talk about is the internet.
So the internet, a contraction of "interconnected networks", is really the global system of connected computers. It is what lets you access a picture that is lying on the other side of the world, on a computer in San Francisco or in South America or so. This all started with the so-called ARPANET, the predecessor of the internet, which was originally a defense and military thing. All of this was originally funded by the DOD in the US, and the intention was academic and military. So it was really all about exchanging academic information, for example data from experiments, or military information. And already in the ARPANET days TCP/IP came into use, which is something we'll go into in more depth. This is a combination of protocols that basically makes sure that your information travels from one computer to the next.
This one is a nice little map of the ARPANET network in 1973. It is essentially a map of the internet at that time. What you see here: the round circles are computers. So you see there's one in Hawaii, there is one in Stanford, there is one at the University of California in Los Angeles. And then these are the networking stations, in a way, the cable shacks where everything would be connected. So this is really a logical map of what the internet looked like. And as you see, it's pretty small. It's just a bit of the US, and nowadays, good luck trying to draw a map of the internet. But that's the background. It has evolved from a military and academic thing to a worldwide, free, open infrastructure.
Interesting is of course that nowadays we have a lot of countries, Iceland being number two here, where almost everybody is connected. The reason I show this figure is really to give you an impression of how this is in other places. If you look at 2017, you see that in Europe we are at 80%. So you already see that internet use in Europe is on average much lower than in Iceland, only 80%.
If you go to Africa, you're at 21%. So whenever you design your applications, you should consider that there are 20% in Europe and almost 80% in Africa who don't have access, and that of course has rather big implications. That's really the societal dimension of web development: it's something that is not accessible to everyone yet. In Africa the number has most likely increased dramatically since then, because an interesting thing here is the development: you have a tenfold increase over 12 years. That is quite a dramatic increase compared to other places. For example, Europe has gone from 46 to 79, so it's not even double. So this is also where you can expect a lot of development in the coming years. But the message of this slide is really: keep in mind who has access when you develop your applications.
Now, the internet, as we discussed, is a connection of computers, and the question is how it works. What it really runs on is a protocol stack, a number of protocols that are called TCP/IP, and their purpose is to address and route data. That means you have to have an address to send something over the internet. You have to know where it's going. You have to know how you route it, how you direct it to the right place. In many cases, like when you watch a video, this video for instance, you have to segment it. You cannot send the whole thing in one piece, but you have to break it down into smaller packets, and that involves a lot of work because you somehow have to make sure that you get those packets together again. You have to make sure that all of them arrive, and so on. Then, whenever you write your applications, you somehow have to use this, so it should be simple. Not everyone who creates a website can know all the details of TCP/IP. And we have lots of errors on the internet, so it has to be reliable.
For example, the most common thing nowadays is Wi-Fi, and a connection over wireless is always error-prone, so it's easy to lose information. You have to make sure that everything arrives; if something gets lost, you send it again, and so on. Finally, the internet has a structure, so not everything is in the same address space, but you segment the whole thing into subnetworks. For example, the university here has a subnetwork just for the university, and then you have everything outside.
A good thing about the internet is that all the standards are open. There is a task force called the IETF that discusses how to evolve the protocols and how to change them, and the results are published in so-called Requests for Comments. These are the RFCs, and that's where you can see all the standards. Many other standards are not open. For example, many of the ISO standards you might have heard of you have to pay for, but the IETF standards are all openly accessible. Maybe that's a good time to actually look at one of them, just to see what I mean. You can just go to this website and you'll see the requirements for internet hosts, communication layers. This is the specification for essentially how the internet works. The RFCs are always in this kind of format, so they look like an old-fashioned typewritten document, but it's really all you need. It tells you how the internet is structured, what hosts are, and so on, and then it links to all relevant things. That's good. It's not very good to study from, because those things are long, as you see. The standard alone has 112 pages. But of course, if you want to write an application that works for all of the internet, you have to know the details, or if you write something for the IP protocol, you have to make sure that you cover all the cases. So that's really when you want to look in there, because you get all the details. For example, how a machine is addressed, what the address is, and so on.
So those are the RFCs. We might use them now and then, but now we get to the question of how we actually do requests on the internet, because what I can do, and what all of you do many times per day, is request information from the internet. A typical thing I might be doing: I am on my computer, I open my browser, and I say go to Google. I want to google something, I press enter, and you all know what happens. Ideally I get the Google front page. The question now is what happens in the background. What happens is that my request goes through a number of machines all over the place and at some point it arrives at Google, and these machines are at different places. For example, here you see ru.is, so that's most likely a computer that is standing in the university. Here you have something that's called Nightholds Week, so that's probably an Icelandic thing, and again it's .is, so it's still in Iceland. Technicarder, same thing, a computer on the internet; I probably forgot the s here, but most likely it's .is. Then there can be many other computers in between, and at some point I have something that is really hard to read, .net. As far as I remember, this is already a Google computer, and then I'm at Google. So somehow my request from my machine in Iceland travels through a number of computers, through a number of cables or Wi-Fi, to the target host, the target computer, and that one then of course sends the information back.
Now we can look at this. In the command line you can look at a request by using the traceroute command; on Windows it's tracert, I think. I could say traceroute Google, so I want to know what the route is from my computer to Google, and when I press enter it does the whole process. You see that it says, okay, the first step is going to this machine here.
The second one is going to this, and here is something at the ring road, so it's all somewhere in Iceland, .is. Then here, nordu.net is maybe already some computer that is not in Iceland. It says "Ray", so maybe that's Reykjavík, but then UK is probably a machine in the UK. So you see that this request travels through different computers, and that's really what I've depicted here in the picture. What this is really doing is that I send the request to google.com, I press enter in my browser, and the request is routed, so it's directed via a number of servers, a number of computers that are in the middle, the intermediaries, and then the response from Google comes back to me. An important thing is that routes can differ over time. If I do the same request again tomorrow, it might go a different way, and the response from Google can also go a different way, so it doesn't have to be exactly the same. Which route I'm taking doesn't really matter as long as the request arrives, and the same goes for the response. How all of this works, how my computer knows that it has to go here and so on, is handled by TCP/IP, by the protocol stack, and that one also handles, for example, sending something again if it gets lost along the way.
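The idea described so far, that data is cut into packets, that packets can get lost or arrive in a different order, and that lost packets are simply sent again, can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how TCP is actually implemented: the function names `segment`, `send_reliably`, and `reassemble`, the 30% loss rate, and the fixed random seed are all made up for illustration, and no real network is involved.

```python
import random


def segment(message, size):
    """Split a message into (sequence number, chunk) packets."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]


def send_reliably(packets, loss_rate=0.3, seed=1, max_tries=50):
    """Simulate a lossy channel: resend each packet until it gets through.

    Returns the packets in the (shuffled) order they arrived and the
    total number of send attempts. The loss rate and seed are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    arrived = []
    attempts = 0
    for packet in packets:
        for _ in range(max_tries):
            attempts += 1
            if rng.random() >= loss_rate:  # the packet survived the trip
                arrived.append(packet)
                break
        else:
            raise RuntimeError("packet could not be delivered")
    # Different routes take different times, so arrival order is arbitrary.
    rng.shuffle(arrived)
    return arrived, attempts


def reassemble(arrived):
    """Put the chunks back in order using their sequence numbers."""
    return b"".join(chunk for _, chunk in sorted(arrived))


if __name__ == "__main__":
    message = b"Please send me the Google front page."
    packets = segment(message, 8)
    arrived, attempts = send_reliably(packets)
    print(len(packets), "packets,", attempts, "send attempts")
    print(reassemble(arrived) == message)
```

The sequence number is what makes both the resending and the reordering possible: the receiver can tell which piece is which, no matter when or by which route it arrives.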
How this is handled is actually quite complicated, and it is part of the networking course, so we just do a small excursion here. What you typically have are two so-called hosts. A host is basically a computer that has information. In my case, host A would be my computer, since I want to get to Google, and host B is the target, that's google.com. In between there are so-called routers, machines that direct my request. The data then flows in a pretty complicated way. You are in your application; I am in Firefox, for example, and I type google.com. This request is then broken down through different layers and sent over the internet, and then it's sent to a target application, and in this case the response goes back to my browser. But there are a lot of things happening in between, and this is something that can be discussed in much more detail.
What I'm doing here is really an application request, in this case an HTTP request: in my browser I type an address. Then this request gets passed down to the so-called transport layer. The role of that layer is really to manage the connection: it makes sure that there is a connection between the two computers, it makes sure that my packets get sent in the right way, and it makes sure that everything arrives. That's its role. Then we can go down a layer further, to the internet layer, and that's the addressing and the routing. It's really about addressing individual machines: which address does my computer have, which address does the computer here at the university have, which address does the Google computer have, and how do I find the way? And finally we have the so-called link layer, the lowest level, and that's all about the hardware. As you know, you can have a Wi-Fi connection, you could have a Bluetooth connection, you could have an Ethernet cable connection, you could have fiber or satellite connections. There are lots of different kinds of hardware in place, and the link layer has the role of handling these things so that it doesn't
matter whether your computer is connected with a cable or with Wi-Fi or with fiber. So that's the link layer, and we won't be talking much about this layer; again, that's something for the networking course.
So we start on the internet layer, addressing and routing, and that's what the IP protocol is all about. What you see here is exactly the same as I've shown you before; the only thing is that I've replaced the names of the servers with these numbers. If we go back to my console window, you see the same there: you have the name, and in brackets you have an address, and this is the so-called IP address. They usually look like this, four blocks separated by dots, and the role is really to identify machines. You have this address to identify a machine uniquely, and the host name, the name I've shown you before, is really just something that is used for human understanding. If it says google.com, that's because it's much easier for me to remember google.com than this address here. So host names are just for human understanding, but in practice, in the machine, they are translated into an IP address. This is done using DNS; that's not so important here, just so that you have seen it. For example, google.com has the IP address 216.58.211.110, and you can use these as well. If for some reason you prefer to remember this number instead of google.com, you could also go to Firefox and just type in this number, and it will also bring you to Google. As you see, the browser has replaced it automatically, so it's really interchangeable. It's just that we are much better at remembering names than these kinds of numbers.
There are two different formats; we just look at IP version four here. IP version four is four blocks of eight-bit numbers, so numbers between 0 and 255, separated by dots. That's an address, and if you do a bit of combinatorics on this, you get that you can
have about four billion possible addresses. That's all you can have on the internet. Some of them are reserved. For example, you will see a lot in this course the address 127.0.0.1. This is always the address that points to your own local machine, so whenever you enter this, you try to access your own machine.
Now, four billion sounds like a lot, but this is actually a problem, because with the whole IoT development, the Internet of Things, the expectation is to have about 50 billion devices connected to the internet by, I think, 2020 or so. So already now, or at least in the near future, these addresses are not enough. We need more, and that's why IPv6, version six, was introduced. It has an extended address space, so it supports many, many more addresses, I think enough to address every atom in the universe or something like that. It's a pretty crazy number, but it's basically to ensure that in the future we'll be able to grow the internet without running into problems, because every machine has to have a unique address, otherwise we cannot connect to it.
Now the interesting question is: if I have the address, so I know that this is for example Google, how does my computer or how does a router know where to send my packet, where to send the request? The answer to that is something that's called a netmask. That's an additional number that is required to identify whether a given IP address is in the same network or not. What you see here is the address 216.58.211.110/24, and this slash 24 is the network mask. It basically says that the first 24 bits in this address are the network, so they identify the network, and within that network the remaining bits, in this case just 8 bits, identify the machine. This way a computer can calculate: if I know 24, I can say that 216.58.211.0 is the network, so this is a subnetwork of the internet, and in this network I want to get to machine 110. And now the router can
basically say that if its own network address is different from the one in the request, then this is the wrong network: I have to send it outside, I have to send it to the internet. And that's exactly how routing works on a very basic level. A router always knows its own network, and it knows some other routers somewhere on the internet. Whenever it gets a request, it basically checks whether the target is in the same network, and if it's not, it sends it out to the next router. If it is in the same network, then the router has a list of all the computers that are within that network. So on a very basic level, that's how routing works: you check, am I in the same network, yes or no?
Then, just to mention this, there are some IP ranges that are reserved for private networks, and I mainly mention them because you might have seen them. For example, if you have a router at home for your internet, very common IP addresses start with 192.168.0, so that's the typical network that you have at home. Then there are two more that are usually used for larger private networks. For example, here in the university you see that we're using this 10 dot, and then the remainder identifies the machine. So those are private networks, and if you see those addresses, you always know that they are within the organization, at home, and so on. If you check your IP address at home, you'll see that most likely it's something like 192.168.0.
Okay, so what we've discussed so far is that I want to send a request from my computer to Google, and we have a bit of terminology here. The first thing is that my computer and the target are usually called hosts. That's something you should know; I mentioned it already. Those are the machines where the request comes from and where the response comes from. Then you have a so-called IP address; that's just the host name translated to a number, the real address that the computer uses. And you have a lot of servers
that are in between when you send the request, so it's routed via those intermediate servers. The routes are basically the connections between the computers, and as we discussed, they can change over time, so maybe the next time you send this request, the response goes directly over here.
Also, intermediate servers can have a lot of different roles. They can be routers, as we already discussed, computers that direct the traffic in a certain direction and make sure your packet, your request, reaches the target. They can be proxies, maybe you have heard of that. Those are machines that, for example, filter data, make sure that you're authenticated, or encapsulate, so they hide part of a network. They also cache, which is a very important concept: they try to reuse data so you don't have to send the same request all over again. A typical thing proxies do in companies is disallow certain connections. For example, the company does not want you to be on Facebook all day, so they block it. That would be the role of a proxy: whenever you send a request to Facebook, we instead redirect you to Google, or we send you to a page that says not allowed, or something like that. There are computers that are called firewalls, which are essentially security things. They make sure that there are no attacks, or that a response from a malicious place is blocked, and so on. There are machines that are load balancers. As the name suggests, they try to balance the load over multiple computers. Google again is a good example: they have lots and lots of computers that answer your search requests, and at some point there is a load balancer that says, okay, if the first computer has too much work to do, send the request to another one.
And then finally, in this picture I'm using the same symbol for a machine, for a computer, for a server, but an important thing to
understand is that on one computer, on one physical machine, there can be several server processes, so several applications running that each fulfill a certain purpose. The word server is usually used both for the physical machine and for the software running on it. For example, on my computer I can have a mail server and at the same time a web server; that's no problem.
Good. This was all about machines, computers. Now we'll make a very similar example in real life, in non-computer life, and that's about post. I do something very similar when I want to send post and get an answer to it. Imagine I'm sitting here at Reykjavík University and I want to order a new computer, and I want to order it from Apple in the US. What do I do? Typically I go to their website, but I could also send them an old-fashioned letter, so let's say that's what I'm doing. I'll send the letter to the internal post here; there's some kind of post system that makes sure that my letter gets sent. The internal post looks at it and says, okay, this goes to the US, it's not within our own company, it's not within the university, so we have to send it outside. The mail truck picks it up and brings it probably to some central post office in Reykjavík. They realize it's not within Iceland, so they have to send it via ship somewhere else. It probably arrives somewhere, I don't know where, and at some point it arrives in the US, is sent via truck to Cupertino, and is delivered to Apple directly. That's typically how my letter could arrive, and you directly see that routes can change. It could also happen that my letter gets sent by airmail instead, so then the route here is different, and most likely the point where it arrives is also different. So routes can change over time. And the response: Apple takes the parcel and sends it back to me, and again maybe the parcel is transported a different way; because it
should go much quicker, it's going by airplane instead of ship. So this is really exactly the same thing. What I have down here is similar to my host name: I have an address that identifies me uniquely, hopefully. I have these intermediaries, here they're basically post offices all over the world that process my mail. You have routes, different ways that your parcel or your letter can take.
And it's really exactly the same in that intermediate servers can have different functions. You could have a post office where you sort. You could have internal post, for example, which is similar to a proxy: here at Reykjavík University they separate mail that goes to other places from mail that stays within the university. You encapsulate: if someone sends something to Reykjavík University, to my address here, you see that my office number is not there, so they don't know where exactly I'm sitting, and the internal post actually has to do this sorting. So there's some kind of encapsulation going on. And similar to a proxy, they might filter stuff. Let's say I want to get a Playboy subscription at work; my boss might show up and say, well, you know, that's not the right place to do that. So it's similar to a proxy for the internet that says don't go to Facebook while you're here. And we discussed that routes can differ: most likely the parcel takes a different route compared to a letter, and there's a different speed if I'm using a ship or a truck or an airplane. So it's really pretty much the same; whether you're using the internet or the post, the mechanism is quite similar.
IP addresses and the netmask: well, the analogy here is really street name, zip code, city name. If you write a street name only, then you have no idea where to go, because the same street names are really common in many places, so it's not like your street is unique. It's not enough to just have the street name; you also have to check the city, the
zip code, maybe even the country, and that's the way to distinguish whether you have to deliver your mail in the same city or you have to send it somewhere else. That's quite comparable to the IP address and the netmask: you need two elements to really figure out where to send it.
Okay, so that was the internet layer, basically the IP protocol, and now we'll dive briefly into the transport layer, which as we discussed is about end-to-end connections: making sure that connections are established and closed, that packets are in the right order, and that the transfer is reliable. This is the role of TCP, the Transmission Control Protocol. What TCP does is make sure there is a connection while you're sending something. It makes sure that your request is cut into chunks, so if you send a video, it is put into a lot of small packets and sent that way. Talking about videos, it also makes sure that there is an order. We discussed that routes can take different times, so it might be that your video information at minute two arrives before the video information at minute one. TCP then makes sure that these things are reordered so that you can actually watch the video. TCP has a way of checking the correctness, because maybe the data has changed due to interference in the cable, or things might get lost because the Wi-Fi connection was bad. This is all handled by TCP, making sure that in the end you get something that is proper; the puzzle is essentially reassembled.
There is an alternative to this, just so that you have heard of it: something called the User Datagram Protocol, UDP. It's an alternative to TCP because, as you can imagine, all of this is expensive. It takes quite some computing power and it takes time to make sure that things are correct, that the order is correct, and so on. So instead people came up with the UDP protocol, which is unreliable, so it doesn't care that much about
order or correctness. If a packet gets lost, we don't care, we don't send it again. If things arrive in the wrong order, we also don't care. The main reason to have that is that it's quicker. In many cases, for example if you're live streaming video, it's more important to be quick than to be correct, and then you can say, okay, I don't care if the video is a bit messed up in between, but at least it's quick. So many times, for example if you use Skype, UDP is preferred because it's simply quicker.
Okay, so that was all we cover about how the internet works. It's pretty basic, but it gives you an understanding of what is happening, especially since I'll talk a lot about hosts and about the layers, and I'll use this terminology, so we should be aware of it.
Now we go from the internet to the World Wide Web. We said that the World Wide Web is the information space, so it's all about documents, pictures, videos, items of interest, something you would like to have. To be formal, we typically talk about them as resources, so whether you want to have a website or a picture or a video, the terminology says: I want to have a resource. These are identified using so-called Uniform Resource Identifiers, which are basically the addresses that say, for instance, I want to have the Wikipedia page on the World Wide Web. Resource can be a confusing term, so if you don't really know what it means, wait until we get to the back end, to the server-side lectures, and it will become clearer. For now, if we talk about a resource, you can think of a website, an image, or a video, and that's fine.
To make this a bit clearer, we abstract from all these intermediate machines and typically talk about the client-server model. My computer, the client, wants to have google.com and requests a resource, and the server, the target host, responds with the resource. So instead of looking at the whole picture, we talk about the client-server
model: I want to have the Wikipedia page on the World Wide Web, and the Wikipedia server returns the right website, or if something is wrong, it returns an error. As we have seen, this is of course an abstraction. Usually there are a lot of different servers involved: there might be the web server, there might be a database server, there might be a mail server, and so on. So it can be many machines, it can be different programs, but to make this simpler, we usually talk just about this client and server. And that's it for the network part. In the next part of this lecture we look at HTTP, which is the protocol for the World Wide Web. So that's it for now.
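As a small preview, the client-server model can be tried out on a single machine using the local address 127.0.0.1 that was mentioned earlier. The following is a minimal sketch with Python's standard library, not a production setup: the handler name `HelloHandler`, the helper `fetch`, the resource path `/hello`, and the response texts are all hypothetical and chosen just for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: respond with a resource, or with an error."""

    def do_GET(self):
        if self.path == "/hello":  # hypothetical resource
            body = b"Hello from the server"
            self.send_response(200)
        else:
            body = b"No such resource"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def fetch(path):
    """Client side: request one resource from a local server."""
    # Port 0 lets the operating system pick any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)
        response = connection.getresponse()
        result = (response.status, response.read().decode())
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch("/hello"))
    print(fetch("/missing"))
```

Here both roles run on the same physical machine, which also illustrates the point that "server" can mean a program as well as a computer. The numbers 200 and 404 are HTTP response codes, which belong to the HTTP part of the lecture.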