Hello, I'm here to talk about how web servers work. This is not a very formal thing; it's more like a train of thought. I have some notes, I'll talk, and then I'll take questions via the HackMD you see here. This was a spontaneous event we thought of this morning, when we realized that in order to do some stuff with web programming, it's good to know how the technology stack fits together. And then doing stuff teaches you how the technology stack works. So let's start somewhere by showing you some things.
What does it take to get something from a web server to your computer? I've put it into these different layers. There's the transport or TCP/IP level, which is basically making the raw network connection. Then there's the Hypertext Transfer Protocol, which is the language the web server speaks to the client. Then there's the server's processing of stuff, and then how your browser presents it. So let's talk about this some.
I'm not really going to spend much time on the network level, but it's how you make the actual connection. Let me give an example. Let's search for something: a simple Python web server. Yeah, here we go. If we run this Python command, we will start a web server. What we notice is: python3, we're running this module, and 8000 is the port number. So we see 0.0.0.0 port 8000, and if we go to the web browser and type localhost:8000, we see the web server just responded. So in order to connect to anything, the main things you need to know are the address and the port number. And here we see it's listing this empty directory. This is really no different from any other network traffic; there's nothing special to the web here. There is a translation from names to IP addresses; here we had localhost, and that's the same as 127.0.0.1.
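As a rough sketch of the demo above, the same kind of server can be started from Python code and queried by address and port. This assumes the module run in the talk was the standard library's http.server (the transcript does not name it); the function name fetch_listing and the temporary directory are only for illustration, and port 0 (any free port) stands in for 8000.

```python
import http.client
import http.server
import tempfile
import threading
from functools import partial


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """File-serving handler that does not print a log line per request."""

    def log_message(self, format, *args):
        pass


def fetch_listing():
    """Serve an empty temporary directory and fetch its listing over HTTP."""
    with tempfile.TemporaryDirectory() as root:
        handler = partial(QuietHandler, directory=root)
        # Port 0 asks the operating system for any free port (the talk used 8000).
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        host, port = server.server_address
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            # All a client needs to know is the address and the port number.
            conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", "/")
            resp = conn.getresponse()
            body = resp.read().decode()
            conn.close()
            return resp.status, body
        finally:
            server.shutdown()
            server.server_close()
```

Calling fetch_listing() returns the status code and the HTML of the directory listing, which is the same page the browser showed for the empty directory.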
And to do this mapping, well, at least on Linux there's the host command. Let's take scicomp.aalto.fi. It says scicomp.aalto.fi is an alias for Read the Docs, and Read the Docs has this IP address. By default, if you don't give a port number, it's port 80, or 443 for SSL. This address and port stuff is mainly relevant when you're debugging: if you can't connect to something, there might be something wrong with the firewall or the routing between your computer and the other end. In that case you look at the network and make sure everything is open and the connections can get in.
Another case that often happens is that you're developing on another server, there's no network access from outside, and you need to connect. For example, let's say I SSH to kosh.aalto.fi. Okay, so here I've started running the server on a remote machine and I want to connect to it. (Say hi to my cat.) If I come to my web browser and try to connect to kosh.aalto.fi, it does not work, because there's a firewall and you can't connect. So let's say you want to get the connection in there. SSH to our rescue. We SSH to kosh and say -L, which means make a local port forwarding. We want to connect port 8001 on our local computer to localhost:8000. This means: on my computer, listen for connections on this port and send them to the other side, which is kosh. From kosh, it will connect to localhost, which according to kosh is kosh itself, on port 8000. So let's try this. Now it's 8001 locally. It says cannot connect. Oh, because I haven't run this yet. And if you wonder what this "ssh no-mux" means, it's an alias I have set up to disable the multiplexing stuff. And what do you know? Here we are on kosh. If I touch a file in this directory, it appears there. There are other ways of doing this. I can add one, which you can look at yourself.
I don't think we need to spend a lot of time on that now, but when you're doing this kind of development, being able to get network connections going to and from anywhere is really useful. Okay, any questions so far? If so, write them in here. Or even comments: is this what you're looking for? Let's go to the next phase then. At any time, if you have questions, just write where I can see it in the HackMD here. Interrupt at any time.
Okay, next up is HTTP. This is the main language between the client and the server, and it has existed since, well, the web started. It's gone from something very simple, like HTTP/1, where you basically made a request and it sent a response, to all sorts of things like WebSockets and asynchronous communication and so on. But the idea has basically remained the same. The basic idea is that you make a request: on this connection, the client sends the server GET and then a path. There can also be headers like this; here it says a request header of something. And then the server sees a request for a certain file. How does it come back from the server to the client? Well, the server responds with a status line: 200, which is the status code, and then some message, which in this case is success. Then there can be response headers like this; an example response header would be all kinds of metadata. And then there's a blank line, and then there's the body.
So let's try to look at this in practice. One really nice thing we can use is the web developer tools. Actually, let's go to our localhost one, because who knows who else will be connecting to my thing on kosh. Firefox and other browsers have something called web developer tools. For Firefox, I can find it in the menu under More tools, Web Developer Tools. My cat is getting too interested in me right now. So please, if you're following along, find where this is in your browser. Okay, so let's go to the network panel here.
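The request and response format described above can also be seen without a browser, by writing the request text onto a plain TCP socket. This is a minimal sketch, not what was shown in the talk: HelloHandler and raw_exchange are hypothetical names, and the tiny server only exists so the example is self-contained.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy server side: answer every GET with a short plain-text body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)                       # status line: 200 OK
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                            # the blank line
        self.wfile.write(body)                        # the body

    def log_message(self, format, *args):
        pass  # keep the example quiet


def raw_exchange(path="/"):
    """Send a hand-written GET request and split the raw response apart."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Request line, one header, then a blank line to end the headers.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}:{port}\r\n\r\n"
        with socket.create_connection((host, port), timeout=5) as sock:
            sock.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closed the connection
                    break
                chunks.append(data)
    finally:
        server.shutdown()
        server.server_close()
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    return status_line, header_lines, body
```

The returned pieces line up with the description: a status line, then header lines, then (after the blank line) the body.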
The network panel shows all the connections that are happening. I will click refresh, and I see what's going over the network. We see there were two requests here. It went for the document root, which is, well, the file path /, and then the favicon, which is the icon that you would normally see up here in the tab.
If I click on something, I see headers. First, I look at the request headers. My browser sent all of these things. It sent Accept, Accept-Encoding. I'm not sure what the Sec-Fetch-Mode ones are. Or User-Agent, which says what will be rendering it. All of these things have been slowly developed over the decades that the web has existed: all kinds of standards like language negotiation and so on. I see the request line was GET and this path; the server status was 200 OK; the version, the amount transferred, and the response headers. It declares how long the content should be, that it should be HTML, and this kind of stuff. Let's look at raw. Okay, so I can turn on raw. The raw request is this here: GET, Host, all this stuff. The raw response is, well, it doesn't include the content here, but I guess it would be right below. And here's the content that got sent back. That's what everything is built upon, and using these web developer tools, you can really go in and see what's going on under the hood.
Types of queries. There are GET queries. If we give arguments like this, the convention is that there's a question mark, and then you have argument equals value, ampersand, argument equals value, and so on. Let's try this here. Here we see it says "Directory listing for /?argument=value". So clearly this simple Python web server doesn't do any of this processing. If we look at the request, yeah, it's just like this. So basically this web server doesn't parse this value and turn it into what the options are for the call itself.
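To make the question-mark convention concrete, here is a small sketch of how a program could split a request target into a path and a dictionary of arguments, using the standard library's urllib.parse. The function name split_target and the example path are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def split_target(target):
    """Split '/path?arg=value&arg2=value2' into the path and its arguments."""
    parts = urlsplit(target)
    # parse_qs maps each argument name to a list, since a name may repeat.
    return parts.path, parse_qs(parts.query)


path, args = split_target("/file1?argument=value&other=123")
# path is "/file1"; args is {"argument": ["value"], "other": ["123"]}
```

A server that only maps paths to files never takes this step, which is why the simple Python server treated the whole string as a directory name.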
But other programs that respond to this would split this off and handle it specially. So that was GET; then there's POST. You can POST to, again, some file path. This next thing is not specific to POST, but I included it here: you can say what the host is, that is, which domain name you're requesting. What does that mean? It's used if you have multiple servers running on the same IP address, so that it can select the right one to respond. And then there's a request body.
These things are HTTP verbs or methods. You can't see my camera, but if you're looking at the recording, you can see my cat trying to eat me here. GET and POST are the most common verbs, but there are more. And there's nothing intrinsic to the protocol here, so you can send whatever you want. They have different context-dependent meanings. So feel free to look at this and see how things are built. Many of these were added later. For example, there's a CONNECT verb, which I think says, okay, we're not speaking HTTP on this connection anymore, let's open a raw tunnel, and so on.
There are other conventions. Oh, did I mention the status code? Yeah, that's 200 success. There's a convention that a GET request should not modify the server-side state, while POST does. So if you refresh a page that was fetched with GET, the web browser will just refresh it. If you refresh a page that was POSTed, it will say, are you sure you want to resubmit this page? It might repeat your purchase or do something else again.
Okay, the status codes. Like we saw here, the response contains some status, and there's a huge list of what all these codes mean. Maybe one of the most common ones is 404 Not Found, meaning, well, the resource was not available. Let's open this and take a look at it: the status codes. We see there are very many of them here. And again, these are all sort of conventions.
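For looking up the conventional meaning of a status code without opening the list in a browser, Python's standard library ships the registered codes as http.HTTPStatus. A minimal sketch (describe is a hypothetical helper name):

```python
from http import HTTPStatus


def describe(code):
    """Return the conventional text for a status code, if it has one."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Nothing stops a server from sending such a code; it just has
        # no agreed-upon meaning.
        return f"{code} (no standard meaning)"
    return f"{status.value} {status.phrase}"
```

For example, describe(404) gives "404 Not Found", while an invented code such as 799 falls through to the "no standard meaning" branch.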
You could even go and make up a status code. The text of them is conventional but can be specified separately. So you could return anything to the web browser, and it may or may not know what's going on. We often see Internal Server Error, which means that the code that was generating the response itself failed.
Okay, so you can compose these into several things, like this Location here. Let's say you want to return a response that redirects to something else. You can look up the status codes for redirects, and there's, say, Moved Permanently, there's moved temporarily, See Other, and so on. If you read these, you can see what each one means. So let's say we want to return a Moved Permanently. With this response, the browser would look at the status code and say, okay, I need to go somewhere else, look at the Location header, and then redirect you there. For example, this is how it would redirect you from non-SSL to SSL. It would say, okay, the connection came in, this is not coming over SSL, so I'm going to redirect you to the HTTPS version of it.
Actually, let's try to do that. Let's make a new tab, open the developer tools on network, and look at scicomp.aalto.fi. Notice this is not SSL. Let's see what happens. Well, it's loading a whole bunch of resources. If I go up, I see, what do you know, status 302. 302 Found. And it says Location: the HTTPS version. For some fun, just go look and see what all these other headers may mean. There are things for caching, there are things for... Anything I'm forgetting about the HTTP part, or any questions you may have? And could someone give some feedback: are we doing remotely what you're interested in? We're at about half the time, so it seems okay. You can also comment by voice.
So now that we're done with the HTTP side, what happens on the server? Once the connection comes in, how does the server even decide what it's going to return? There could be... there's a question coming.
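The redirect mechanism described above (a redirect status code plus a Location header) can be sketched with a toy server and a client that does not follow redirects on its own. This is an illustration under assumptions, not the configuration of any real site: RedirectHandler and fetch_redirect are hypothetical names, and https://example.org is a placeholder target.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Toy server: tell every client to go to the HTTPS version instead."""

    def do_GET(self):
        self.send_response(301)  # Moved Permanently
        # Placeholder target; a real site would put its own HTTPS address here.
        self.send_header("Location", "https://example.org" + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_redirect(path="/docs"):
    """Request a path and report the status and where the server points us."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # http.client does not follow redirects, so we see the raw answer.
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        result = (resp.status, resp.reason, resp.getheader("Location"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

A browser receiving this response would read the Location header and immediately make a second request to that address, which is the extra entry that shows up in the network panel.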
So sometimes the window is too narrow. Is there some sort of... if you click something like here, it expands, and you can find it. I'm starting another browser, Chrome, so let's take a look. Here I see it. I basically always have the web developer tools on; I know the shortcut, because I come here so often to figure out why stuff is broken. Okay. If you don't see any data in the network tab, then you need to refresh the page. It doesn't remember past things: you have to open the developer tools or the network tab, then make a request, and then it will appear there.
Okay. So how does the server decide what to return? As in, how does it make this kind of response? Well, that's really up to the server; every server can do its own thing. One simple strategy would be to map to a file system path. That would be a web server a lot like this Python one here, saying, okay, let's look for file1; it tries to find file1 under the web root and then returns it with some minimal headers. But if we're here, we're probably doing something more fancy than that. We want to actually run some code to make the content. And for that, it's up to the API of whatever tool we're using.
An old-fashioned way of doing this was the Common Gateway Interface. It was basically a UNIX process-based interface. Any time a request comes in, if the web server is told to use the Common Gateway Interface to return the answer, it will run a UNIX process for that request. Environment variables convey all of the headers. For example, what was the header we saw? This Request-Header becomes an environment variable in the CGI process, HTTP_REQUEST_HEADER. The request body comes in via standard input. Standard output is the raw response, where, importantly, there are headers, then a blank line, and then the body content. And if there are no headers, it needs to start with a blank line.
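The CGI contract described above can be sketched as a small script: headers arrive as environment variables, the body arrives on standard input, and the response (headers, blank line, body) goes to standard output. This is a minimal illustration assuming a web server that launches the script per request; the function name cgi_response is made up, while REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH and the HTTP_ prefix are the standard CGI variable names.

```python
import os
import sys


def cgi_response(environ, body):
    """Build the raw CGI output: headers, a blank line, then the body."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    # A request header such as User-Agent shows up as HTTP_USER_AGENT.
    agent = environ.get("HTTP_USER_AGENT", "unknown")
    lines = [
        f"method: {method}",
        f"query: {query}",
        f"user agent: {agent}",
        f"body: {body}",
    ]
    # The blank line (\r\n\r\n) separates the headers from the content.
    return "Content-Type: text/plain\r\n\r\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The web server says how many bytes of request body are on stdin.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    sys.stdout.write(cgi_response(os.environ, sys.stdin.read(length)))
```

Because the web server starts this as a fresh process for every request, nothing stays in memory between requests, which is the inefficiency that later interfaces set out to avoid.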
And with this, you can basically map anything from the request to code, process stuff, and then return it. CGI is old and somewhat inefficient, because it has to start a new UNIX process every time there's a new request. Web frameworks these days have many more advanced ways of doing things. Basically every framework will have its own way of listening for the request. It might include its own web server, or it might have some efficient connection to an external web server. The request comes in, then it does processing and returns the result.
So here's an example request in Django. Let's look in. Well, we're not going into the details of how Django works, but the point is, once you have a view, a function will be called for every request that comes in. In there you have an HttpRequest object, and there's, for example, .method, which would be GET for example, or .headers, which is a dictionary of all the headers, or .body, which is the body text, and so on. If you look up the Django request object, you get documentation for it and can see all of these things. And every one of these somehow maps to the things I told you about HTTP. So path shows the path that was given.
Let's look at some fancy things. Here, HttpRequest.GET: it says it's a dictionary-like object containing all HTTP GET parameters. What that means is, let's find our GET request. Here we go. It was GET, the file, and then question mark, argument equals value, argument equals value. Django knows the HTTP convention and will automatically put this into a dictionary of argument to value, because that's a useful thing to do. But that still depends on the choices the framework makes, and you don't even have to use it. There probably is a way to get the raw request. Well, I can't see it here immediately, but I know there's some way to get all of this raw information from Django. And likewise, there's a way to make all of these things here in your framework.
So looking at Django, we see how to make an HTTP response: response equals HttpResponse with the body content, then response.headers equals whatever. Here I'm setting Content-Type to text/plain, which is how we would say that the web browser should parse it as a plain text file instead of an HTML file. But let's look at response objects here, and we can see even more. How do I get the table of contents? We see HttpResponse subclasses. Let's come here. We see there are built-in things like HttpResponseRedirect, which is a simple way to redirect. This would set the status code to 302 with some message and then set Location to whatever you're giving. And you see these other things here, or, you know, you can make them yourself. So this is the point where you need to understand your own web framework and what these things mean. But in the end it's going to map to the raw plain-text HTTP that's going over the wire.
There's plenty of other fancy stuff that can happen here, like WebSockets: instead of being a request-and-response kind of thing, there's a request, then it opens the connection, and then it's a continuous socket that stays open for long-term communication. I guess that's used for all kinds of things like dynamic web pages; probably this HackMD here uses WebSockets to give continual updates. There are plenty of modern standards, like including multiple requests in one request and then returning multiple responses at once to make things faster, and so on.
I like this idea: let's look at this HackMD in the developer tools and see what happens. Network is open, I will make the request. Now let's see what comes in here. There's GET, and we see several things that are blocked by an extension. GET, GET, GET, lots of GETs. Okay, here's a POST request. Let's go into edit mode.
I'm actually surprised I don't see a WebSocket opening here. Well, that just goes to show that I don't actually know that much about web stuff. Probably one of these opens it somehow. Anyway, let's continue. Any questions so far? Please write; I'll give a few minutes here to think. Okay, let's go.
So then, the client side. Once the web server is sending something back to the client, how does the client actually present things? There are some hints from the server. Actually, let's try this. I just made two files here. Let's try opening file one. File one is presented and looks like HTML. If we look in here, we see it says the content type is text/html. Let's look at the other one. This looks different; it looks like monospace text. If we look here, we see the content type was text/plain. So when viewing something alone, the server can send this content type back, which gives a hint about how to present the file.
But most stuff is HTML. How does HTML work? Let's again have the web developer tools open and use the inspector. Is the inspector good? So this is what's in the body here. I'm not really going to go into HTML. You've probably seen it before; if not, it's a much larger topic, and I think it's beside the point of what I'm talking about here. But the basic point is that the HTML is the structured content of what's in the web page. By hovering over here, we can select each element, and there's, like, a header. Here we see that within the HTML, it's also sending a content type. Okay.
Another common thing you see is CSS, or Cascading Style Sheets. Here's example CSS and example HTML. There's the body; we see some bold text, and we see a span with a certain class and some content. Here the bold elements will have color red applied to them, and then notice there's a period here. So class aaa gets color blue applied to it. Let's come back here. If I look into here, let's take this H1 object. We see styles here.
And here I can override styles. There, I just turned this heading blue. This is often useful when you have a page which already sort of works and you need to modify it and figure out how to change things. By clicking on any element, you can see where the CSS comes from and then adjust it dynamically. Let's go to Wikipedia here. I'm not sure why right click is doing that. I right click and go to Inspect Element, and now it opens right up in the inspector. I see there's this element here, "Multiple Choices". So what do we see in here? This is a DT element. If I look in here, I can see where the different styles come from. This DT style was defined in, well, whatever this resource was. And notice how I can turn it on and off. I said I wouldn't go too deep here, but just by coming and playing around with stuff like this, you can start learning a lot about how the web browser and all these things work. So yeah, have fun, break some web pages.
Okay. And then JavaScript runs in the browser. It's one of the types of resources that can be sent. Let's come to the network panel here and refresh. We have types: okay, there are images, JavaScript. And this is loaded through the HTML. The HTML says, in the header, script. Somewhere here you find the HTML saying, please load this JavaScript and run it in this context. The browser loads the JavaScript and then runs it, and it can do things like edit the content of the page, make other requests, serve dynamic web pages, and so on. At this point you need to study how JavaScript works, because that's a whole other topic of conversation, which I'm not really prepared to talk about. But still, JavaScript is nothing more than a different resource that's loaded via HTTP.
So overall, maybe we can summarize. I go to Wikipedia here.
My web browser looks up the IP address for en.wikipedia.org and sees I'm connecting over HTTPS. The S means SSL, so it knows it needs port 443. It makes a request to the web server and says, I would like this path on this host. The web server runs some dynamic content and returns one HTML page to me, which is the master document here. This document includes references in the HTML. It says load these images, load this CSS, load this JavaScript, which are all processed by the web browser and assembled into the web page itself, which is a layer I'm not getting to. And now it's been a little bit... okay, that's my cat, pushing buttons. Now it's been about an hour, so I'll stop here and just answer questions. If there are no questions in the notes, feel free to ask me by voice. It won't appear in the recording.
Yes, so I guess this is a question of what server resources you can use. You can request, for example, a virtual machine, where you basically install everything yourself: the web server, the web framework, and so on. Then you request the Aalto firewall people to open it to connections from the outside on a certain address and port. There's also shared hosting, where there's basically one web server that runs many different sites, and it uses this Host header to select the right site to serve. This works well for static files. For dynamic content, you get to the point of multiple people running code on the same server, and I guess containers and virtual machines have sort of taken this over. I guess another option is to use containers at CSC. Or you can start developing it on your own workstation or something. When you need to test it, you use these SSH tricks to get connections to the inside, and then you only open it to the world at the end, whenever you really need it to be open.
With clever programming, you can do most of this on your own computer, and then you simply put it somewhere permanent at the end. I don't know if this really answered your question. There are so many different things: there's where you host it, there's what you run on there, there's how you access it. Not exactly. I mean, for static files, there are things like users.aalto.fi, but that's not really dynamic code. Maybe I'm forgetting something simple, but if there was something, I guess people would probably be using it and I wouldn't know about it. I guess the fundamental problem here is that when you run dynamic code, you're running arbitrary code supplied by a user on a shared web server. Old web servers with CGI would do things like su to the user and then run the code as that user. That way you could have multiple people running dynamic code on the same web server. But there are all kinds of security-related things that can go wrong.
Exactly, yeah. So you see this giant cascade of stuff. But it all starts from here, and this somehow refers to these things. Well, for example, we see lots of images. That's sort of obvious. Let's try opening this file in the web browser. So file one: we see a broken image here. What does the network panel say? For file one, it tried to get this other image. It broke, and we see the server returned 404, the status Not Found. But the browser doesn't know it needs the image until it gets file one. And that's one of the reasons for things like HTTP/2. Before, the server would have to return the response and the web browser would process it, and then the web browser would say, okay, give me all this other stuff. But people try to do fancy things like packaging all the stuff together: the web server can predict what the client will be requesting and send it in advance. So what about dynamic sites?
Let's say you're uploading data to a web server. Well, then it runs this arbitrary code and would store the data in a database, or append something to a file, and so on. That's basically an extension of what we've seen here.
So the question was how specific the code is to a certain web server, which I interpret as meaning web framework. I guess that depends on how standardized your web framework is. For many of the web frameworks I've seen, the code is pretty specific to the framework itself. Translating from Django to, say, Tornado is not necessarily a simple thing to do. So I would just assume, unless you have reason to believe otherwise, that you would need to change it. Of course, you would only need to change the way it interacts with the web server. For Django, this API of request object and response object, you would need to adjust that, but the code that actually does whatever rendering or computation can stay the same. And I'll bet someone has made a framework of frameworks, where you can write your code with this framework and then connect it to Django or multiple other frameworks, but that's getting to some crazy levels of abstraction, and I don't know if it's any use or not.
But for the web server itself, let's see. For example, Django itself can run on multiple different web servers. Okay, that's a good question. In most of the Django sites I make, the thing that your browser is connecting directly to is not Django itself. First it connects to another web server, for example Apache or Nginx. Let's take this Python example. If I connect to here, localhost:8000/file1, it returns this file. And then if I access the web app path, it would take that and say, oh, this is a special thing. This isn't a file; this is connected to this web application. And then it sends it to the Django internal web server.
And then Django renders that and returns it to the outer web server, which then forwards it on. There can be multiple ways of doing this. I was looking here for, okay, using Django, how to install it. Let's see. This thing we see here, WSGI, stands for Web Server Gateway Interface. It's a standard Python protocol or API for communicating between web servers and web apps running in the server. So it basically is a replacement for the CGI that I explained before. Instead of the web server starting a new Django process for every request, it keeps some connections open and basically pipes requests into Django; Django stays running, serves them in sequence, and sends the responses back.
Let me find something. Oh, here's an example of configuration for Apache: WSGIScriptAlias. Here the whole root of the web server gets redirected to this Python file. And this is an example WSGI module, exactly what would be referred to here. We see at the bottom it says application equals get_wsgi_application(). That means this web server knows how to call methods on this from Python that say, here's a new request, please give a new response. This is the connection between the web server written in C and the application written in Python. I really don't know all the details; there are different options here, and this is just one of them. Another option is to use Django's built-in web server and have the outer web server act as a reverse proxy. So basically the connection comes in over HTTP, it sends it over the localhost network to Django, Django renders the response, and then it's sent back. Here's mod_wsgi daemon mode. This will start an external process and somehow be able to communicate with it over, well, something.
Here we see an interesting section, serving files. It says Django doesn't serve files itself.
It leaves that job to whichever web server you choose. They recommend a separate web server, one that is not running Django, for serving media. Here we see, for example, a Directory entry for static. It's granting access to things that are in there. And this would be combined with the script alias. If a request comes in and it matches /static, the web server serves it from this path. If it doesn't match that, then it will definitely match a single slash, and everything else is sent to WSGI. So, for example, images are served by the high-performance web server written in C or Rust or whatever cool people are using these days. The Python stuff is rendered otherwise. And that's all a matter of tracing the pipes and how things are connected.
More questions? Thanks, I hope this was helpful to someone. I guess I'll stop the recording and then we can continue discussing. So if you're watching this later, thanks, and I hope this was helpful.
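As a footnote to the WSGI discussion in the questions, here is a minimal sketch of what the interface looks like from the application side, using only the standard library's wsgiref helpers rather than Django. The names application and call_app, and the /webapp path, are illustrative assumptions; a real deployment would point Apache, Nginx or another server at the application object instead of calling it by hand.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A WSGI app: the server calls this once per request."""
    # environ plays the role CGI's environment variables did.
    body = f"You asked for {environ['PATH_INFO']}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Stand in for the web server: build a request and call the app."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a plausible fake request
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body
```

The application stays loaded in a long-running process and is simply called again for each request, which is the main difference from CGI's one-process-per-request model. For local experiments, wsgiref.simple_server.make_server can serve the same application object over HTTP.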