Welcome everybody to this talk, How does the internet work? Our speaker is Peter Stuge. I'm very happy that he's here to explain to all of us how the infrastructure of the internet really works. I'm pretty sure we will all learn a lot today. Please give a big and warm round of applause for Peter Stuge.
Thank you very much. Thank you for being here. This is an amazing translation into French. Wow. So I want to talk about how the internet works and try to shine some light on all the technologies that are involved when we use the internet every day. So why this talk: some motivation first, then a brief bit of background on how the internet got started. And then we get into the details. So what actually happens between the web browser and the website? That's the starting point. In the description I listed things from the bottom up, so from the very low-level packet stuff, through the various layers of the network stack, up into the applications. That's the building blocks part. But I inserted this overview first: what is actually going on between the browser and the website? Because that's what most people already know and use a lot. Then some details about the different protocols, and in the end some recommendations for further talks if you find these topics interesting.
So the reason I want to give this talk is to talk about how the internet works, right? The mechanisms that we use all the time but that aren't mentioned very much. They are sort of obscured or, well, I don't know if hidden is the right word, but we don't experience the network itself very much. We experience the various services that we use, and the services try their hardest to keep us interested, to tickle our imagination. And I think it's dangerous not to talk a little bit about the network every now and then, to think about the network, and to actually fight for a public network that is available to all, equal, and also neutral.
If we focus on the service providers alone, then they are going to be deciding what we can do with the network. But the great thing about the internet being neutral today is that we are all connected, or we could all connect to each other. We don't really have to use these service providers. We tend to. It is somehow human nature to go towards centralization and monopolization. But the internet is a tool that would allow us to try more variants or other kinds of structures. And we need to be aware of that and of the importance of net neutrality. If we don't talk a bit about the network, we might lose it.
So how did it all get started? In 1970, then-ARPA started the ARPANET. ARPA back then is now DARPA, the Defense Advanced Research Projects Agency. They develop technology for the US military, and they did back then as well. So the ARPANET was, as the quote from this very, very old document says, meant to get all their suppliers connected into a network together and able to exchange information so that they can, I guess, make progress more quickly, more efficiently. Right? Now it's something else. I think that's good.
So let's look at what happens between the browser and the website. We have a person using a laptop, and they have a browser, and they type in a web address, events.ccc.de for example, to read the latest blog post about Congress. The browser then really does two different things to get to show this page. First of all, it has to ask for the way to reach the website that we want to reach. Computers don't deal very well with names or text, at least not the network part of computers or systems.
So there's this translation, somehow like a phone book, I'll get back to that in a bit, called DNS. It has a few other uses as well, but it's used primarily to get from the name that we entered, events.ccc.de, which we can also somehow easily remember, to the network address, the IP address of this website. So that's part one. And it says system DNS because the browser doesn't do all of this phone book lookup itself. It can rely on the operating system to take care of this, fortunately. So that's the part in parentheses, that's what the operating system is doing. It's using a few protocols, UDP and IP, and that becomes a network packet. We'll get back to those in just a little bit.
Once the browser has the network address, the IP address of this website, it creates a connection. So it contacts the web server, and it uses this set of protocols: it first uses IP to reach the IP address of the server, and then in particular it uses TCP for this connection type. We'll get back to those in a little bit as well, what their properties are. And on top of that, the browser then uses the HTTP protocol, we'll see an example of that at the very end, to request the web page that we wanted to see. That's all happening on the laptop, part in the browser and part in the operating system that we're using, whatever that might be.
Then there's of course this long chain, or sometimes not so long, but usually several machines along the way: routers. We might have a wireless router at home, or in a coffee shop, or here at Congress, and beyond that there are certainly some more routers along the way between my laptop and the destination that I want to contact. All of these routers receive some packet, they look at the addresses, in particular where it's going, and then send it along its way. So they're just forwarding packets all day long. Finally, at the destination on the web server, there are also two parts.
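Before turning to the server side, the two client-side steps described so far (look up the name, then open a TCP connection to the resulting address) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `resolve_ipv4` and `connect_tcp` are made up for this sketch, and it relies on the operating system's resolver through the standard `socket` module, much like the "system DNS" step in the talk.

```python
import socket


def resolve_ipv4(name: str) -> list:
    """Step 1: ask the operating system's resolver for the IPv4 addresses of a name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(info[4][0] for info in infos))


def connect_tcp(name: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Step 2: open a TCP connection to the first address found for the name."""
    address = resolve_ipv4(name)[0]
    return socket.create_connection((address, port), timeout=timeout)
```

With internet access, `connect_tcp("events.ccc.de", 80)` would perform both steps against the site used as the example in the talk; the browser would then speak HTTP over the returned connection.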
So first of all, the request that was sent by the browser is received. It goes through these different layers, these different protocols, and the web server software, it looks at the request, and it sees, okay, somebody wanted the first blog post, then I'll send that right back the same way that I received the request, and that's part two, so returning the response to this request. And it goes all the way through the routers, the same path, but in the reverse direction to the laptop. So let's look at all these different building blocks. All right? So let's start with the smallest one, the network packet. I've talked about packets going back and forth. So the packet or A packet is sort of the atom on the network. It's the smallest useful unit that is sent or processed by the network. I think a good way to explain packets is with regular postcards that we can send with mail, because their size, their maximum allowed size is pretty much standardized. You can't send a postcard which is one meter, right? And that's the same with the network packets. You can't send arbitrarily large network packets. One pretty common maximum size is 1,500 bytes or roughly characters. So just to give an idea of how fairly small the packets are actually, and even that might, I don't know, do 1,500 characters fit on a postcard? No, I guess not. I think that's too much. So maybe the packets are a little bit larger than postcards, but still the analogy is pretty good because you send them out and there's a little bit of structure, like there's a stamp perhaps and a recipient address. But that's pretty much it. So what you write on the postcard on the other side is really up to you, and it's the same with the packets. They can contain anything, but if you write in a language that the receiver doesn't know, then they're going to receive the packet and then actually just drop it because they don't know what you're trying to tell them. 
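To give a feeling for how small a packet is, here is a rough sketch that cuts a larger piece of data into packet-sized pieces. The 1,500 byte limit is the one mentioned above; the 40 bytes set aside for headers (20 for IPv4, 20 for TCP, without options) and the 1 MB image are assumptions made only for this illustration.

```python
MAX_PACKET_SIZE = 1500  # bytes, the common maximum mentioned in the talk
HEADER_SIZE = 40        # assumption: 20 bytes IPv4 header + 20 bytes TCP header, no options


def split_into_payloads(data: bytes, max_payload: int = MAX_PACKET_SIZE - HEADER_SIZE) -> list:
    """Cut a message into pieces that each fit into a single packet."""
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]


image = bytes(1_000_000)        # a hypothetical image of one million bytes
pieces = split_into_payloads(image)
print(len(pieces))              # how many packets the image needs
print(len(pieces[-1]))          # size of the last, only partly filled payload
```

Even a modest image turns into several hundred postcards, which is why the speed of the network interfaces discussed next matters.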
So packets, they are sent and received through network interfaces. This is an Ethernet cable, LAN port, or a Wi-Fi antenna, or maybe a 3G modem if you're on the go, out and about, and your cell phone does this of course as well, right? The cell phone has Wi-Fi if you're in a coffee shop maybe, or it has 3G if you're in the subway or on the tram. And one interesting thing or where the comparison to the postcards doesn't really fit anymore is that network interfaces, they can easily pass millions of packets in a single second. So it can be quite a lot of information going through, especially if you have a good internet connection, like here at Congress. So then the next step or sort of if we start looking at, okay, what can we put on the information side of the postcard where we can put any message we want? For this talk, I'm only going to focus on IP version 4. I know it's old and legacy, and we really shouldn't be using it still, but it's dominant so far. It won't be forever, but so far it's quite common, and I think it's something that most of us have at least seen when setting up the Wi-Fi or the new internet connection, right? This IP address that I put up on the slide is maybe the most common IP address there is, right, for the new wireless router. These IP addresses, they consist of the four numbers, and the four numbers, they range from zero to 255, and then there's four of them, and with dots in between is just how we write them. This is an efficient way for machines to identify themselves, but the reason IP version 4 isn't so great anymore is that it's quite a small number of addresses. So it turns out that the internet is pretty popular, and worldwide, the addresses have run out or are running out. There aren't enough addresses for all the devices that are actually participating or somehow connected to the internet. IPv6 will solve this. Let's see. Maybe we'll live to experience that. So what is a network then? There are different kinds of networks. 
I've written physical networks and logical or abstract networks. A physical network is cabling, right? If you have some kind of connection from your internet service provider, it goes to your wireless router, and if you have a LAN setup like in the hack center, with a switch and lots of cabling, with one cable to each computer, that's a physical network. That's a tangible thing, right? That's something we can touch and modify with our hands and so on. That's certainly one network type, and another equally valid network type is the logical network or, as I also call it, the abstract network, which is defined only by the addresses used by some set of computers that are communicating together.
So here's an example of an IP network that might be used with the wireless router that has the IP address up on top, right? There's sort of a pattern: the first three numbers are the same, and that's the network address. The very last part is zero, with the slash 24 meaning the first 24 bits of the 32. So now it's technical maths and binary, sorry. But essentially the 24 means the first three numbers are always the same, and within this logical network, so within this group of computers or systems that can communicate with each other, only the very last number will change. As long as this is the case, we don't need a router yet. All these computers or systems can communicate directly with each other on the local network, on Wi-Fi or whatever. And the slash 24 and the 255.255.255.0, that's just two different ways to express exactly the same thing.
So where do these IP addresses come from, and who has them, and so on? If we get a wireless router, then we have some IP addresses, but me and my friend, we both perhaps have the same IP addresses, because we have a wireless router from the same supplier, right? This is a little bit of a special case. Those aren't internet IP addresses.
They're used only very locally, so only in one home network, only in one company network perhaps. The public IP addresses are the ones that are on the outside of this wireless router that I got, and the wireless router typically only has one. Some internet providers give you a few, but it's very easy to have a lot more devices in your home or in your office than public IP addresses that you get from your internet provider. So the IP addresses, they're assigned to the internet providers or the other way around. Internet providers, they apply for some range of some number of IP addresses, and here in Europe there's an organization called RIPE in charge of allocating a block of IP addresses to the internet companies that are actively connecting to other internet companies and maybe are also your internet providers and mine. And RIPE, they of course have colleagues in different parts of the world, so I think there are four or five, maybe even six of the RIPE organizations, the regional network centers. They assign IP address blocks to the internet companies, and by internet company I don't only mean internet providers that we use at home and at work, but also really any larger company that has a service available on the internet. So all the streaming sites that you can imagine, all the most, well, several large just websites that are used every day will also have their own IP address range and will be active in finding different ways to connect to the internet providers so that the end users can have as good an experience as possible when they're visiting there or using their services. 
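The address notation and the difference between local and public addresses can be checked with Python's standard `ipaddress` module. The concrete numbers below are assumptions chosen for illustration (192.168.1.0/24 as a typical home network, and made-up addresses for a laptop and a printer); the slide's actual address is not recoverable from the transcript.

```python
import ipaddress

# Assumption: 192.168.1.0/24 stands in for the home network shown on the slide.
network = ipaddress.ip_network("192.168.1.0/24")
same_network = ipaddress.ip_network("192.168.1.0/255.255.255.0")

print(network == same_network)   # /24 and 255.255.255.0 describe the same network
print(network.netmask)           # the /24 written as four numbers
print(network.num_addresses)     # addresses sharing the first three numbers

laptop = ipaddress.ip_address("192.168.1.23")    # hypothetical device
printer = ipaddress.ip_address("192.168.1.42")   # hypothetical device
print(laptop in network and printer in network)  # same network: no router needed

print(laptop.is_private)         # a local address, not a public internet address
print(2 ** 32)                   # size of the whole IPv4 address space (32 bits)
```

The last line shows why IPv4 addresses are running out: four numbers from 0 to 255 give only about 4.3 billion combinations in total.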
So I talked about the internet companies, they are trying to find good ways to connect to each other or to make it possible for users with one internet company to reach either users at another internet company or some service provided by some internet company, and that's the routing that's going on both in the wireless router at home, but just as well and even more so in all these routers on the internet that are handing packets back and forth. So starting with the wireless home router, it typically has one local network, at least it might have more, so I had a home router that had both the regular Wi-Fi network and I was also able to configure a guest network or a guest password. So that's actually two, it's Wi-Fi so it's not really so intuitive but those are two separate physical networks because if you're connected to one you can't communicate directly with the other network without a router. Now there's some chance that the wireless router will do this, will enable this communication but it's not for sure and it's not certain and in fact it's more likely that it won't work because this guest access, you're supposed to be able to give that to somebody who's just visiting and maybe you don't want them to access your printer or your storage cabinet or whatever so it's quite likely that this guest network doesn't get access to the main network. So two different networks, even though it's the same radio waves or the same air that's carrying the radio waves but the key property with a wireless or a home router is that it almost always only has a single internet connection so it has a single connection to some internet provider or in the direction of the internet. Typically that's the telco but in some cases there's even, especially in the US there's the situation where the telco or the internet provider is also a content service provider and that's a pretty bad situation in particular if you have no options, no choice. 
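The forwarding decision of such a home router can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the two address ranges, the interface names, and the policy that guests are blocked from the main network while other traffic passes. Real routers are configured in many different ways.

```python
import ipaddress

# Assumed address ranges for the two local networks of a hypothetical home router.
MAIN_NETWORK = ipaddress.ip_network("192.168.1.0/24")
GUEST_NETWORK = ipaddress.ip_network("192.168.2.0/24")


def forward(source: str, destination: str) -> str:
    """Return the interface a packet leaves on, or 'drop' if it is not forwarded."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    if src in GUEST_NETWORK and dst in MAIN_NETWORK:
        return "drop"      # assumed policy: guests cannot reach the printer or the storage
    if dst in MAIN_NETWORK:
        return "main"
    if dst in GUEST_NETWORK:
        return "guest"
    return "uplink"        # everything else goes out the single internet connection
```

For example, `forward("192.168.1.23", "203.0.113.10")` returns `"uplink"`: any destination that is not local leaves through the one connection to the internet provider.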
So we have the home router with a single connection towards the internet, to the internet provider. Let's compare that with the internet routers that are further out on the internet and operated by the many different internet companies. They will similarly have one or more local networks that belong to them, the same way that the wireless network belongs to the home router. An internet company, or an internet organization, let's say, like the CCC as well, has some equipment, some servers. The events.ccc.de server, for example, is part of the CCC slice of the internet, and the router that's responsible for all of CCC's networks is also responsible for this IP segment where the web server is.
Now the big difference here is that those internet routers that are further out on the internet than our home routers typically connect to at least two, but usually many more, other internet routers. Exactly how is different in every location. There are some norms and some common topologies. The connections that exist are determined by peering agreements between the internet companies. Internet organizations can of course have agreements with whomever, so it's not so easy to tell beforehand how a particular organization will do peering. This is an interesting topic, and there are some more talks on this as well that I'm referring to later.
At least one model is to have a site, some data center somewhere, where an internet exchange is running. This is an organization whose sole purpose is to enable many different internet companies or internet organizations to somehow make their way there, run cables to this data center, all connect together, and be able to exchange traffic between each other efficiently and maybe even at no cost. So that's an interesting topic, because there are so many different business models for the peering agreements.
So the internet exchange is one model. There's a handful of them in Germany, and that's about the scale of it. Private peering is of course possible too, where organizations just have a direct connection between each other. Okay, so these connections are then established somehow. How do the routers know where to send what? That's a good question. This is managed by routing protocols. BIRD is one application, and BGP is the protocol. There are some rules you can configure about what route to prefer, but you can also just say, I don't really care so much, and just use whatever is available. Of course this depends on how much you have to pay for traffic that you send which way. If you have a really good peering agreement with another internet organization and you're able to send a lot of traffic that way without having to pay very much extra, or maybe anything at all, then of course you're going to try to send as much traffic as possible that way.
All right, so now we've looked at IP addresses. All systems connected to the internet have some IP address, and if we know the IP address, we can try to reach that system. That's a bit unfortunate, the first bullet point is UDP. Now we're talking about, okay, so on the postcard, when we're writing stuff there, we put the IP address, because we know what system we want to reach, but we want to send it some kind of message as well. There are a few different ways to structure messages, and these are the most common ones, the ones that make up almost all of the traffic on the internet. The first one is UDP. It's quite like postcards, so it's just a single message. There's no context, there's no connection between two different messages, and there are also no guarantees about how this packet will perform on the network. So if you send out a UDP packet, it might arrive or it
might not, and you'll never know. That can seem a bit useless, but actually it's quite good in many cases. For example, if you're doing real-time audio or video streaming, UDP is a good choice, because it's real-time information. If something is missing, maybe there will be a glitch in the audio or in the video, but it's not so important to wait and delay the image to fix that glitch. It's better to get the next image and just replace the image, so just keep on going. For that, UDP is a really good fit. Just send it along; if it arrives, it arrives. Most of the time it does arrive, most of the time it works fine. So it's sometimes a good choice.
The next point there is TCP. Maybe you've heard the term TCP/IP. Specifically, it's the combination of this TCP that I'll get into in a second with the IP addressing. TCP and UDP have the concept of a port, so that's a second address. You could compare it like this: let's say the IP address is the street name and the port is the house number on that particular street. So it's a bit more precise. You know it's that system, but that system might offer many services, and you want one specific one. For each of the common services that we use, email and web and Jabber and whatever, there are typical port numbers that are allocated and always the same, so that I don't have to guess or look up what it is.
So with TCP, what are the properties? That's more like a stream of letters where you have to go to the post office and acknowledge that you've received them. The recipient of a TCP packet, or a network packet with IP and TCP inside of it, will always confirm reception to the sender. This allows the concept of a connection that I mentioned, where both sides talking to each other are synchronized and know where the other party is in this communication: what data has been received and what has not yet been received. TCP packets can of course also get lost.
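The postcard-like behaviour of UDP and the role of port numbers, both described above, can be tried out on a single machine. The sketch below sends one datagram over the loopback address; the message text and the two-second timeout are arbitrary choices for illustration.

```python
import socket

# The receiver binds to a port; port 0 asks the operating system for any free one.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
address = receiver.getsockname()  # (IP address, port): the street and the house number

# The sender needs no connection: it writes the address on the "postcard" and sends it.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello over UDP", address)

try:
    message, sender_address = receiver.recvfrom(1500)
    print(message, "from", sender_address)
except socket.timeout:
    # UDP gives no delivery guarantee, and the sender is never told about a loss.
    message = None
    print("nothing arrived")
finally:
    sender.close()
    receiver.close()
```

On the loopback interface the datagram practically always arrives, but nothing in the protocol promises it. With TCP, in contrast, the operating system would keep track of acknowledgements and resend on its own.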
There's no guarantee with any network that it will always function correctly. You can just pull the cable, and it will not be possible to send any packets. TCP will recognize that: oh, I sent some packets out, but they haven't been confirmed, they haven't been acknowledged. Okay, I'll try again, I'll send again a few times. It's usually adjustable how long TCP will keep retrying to communicate, and finally it will give up and say, sorry, it seems that this connection is broken, it's not possible to communicate anymore over this path. But if you're quick and you plug the cable back in, then maybe everything will heal, and the connection will just continue functioning as if there was never an interruption, because the network software is keeping track of what has been sent and what has been received, and can recover from this loss of communication.
The third one on the bottom is SCTP. This is not quite so widespread, but it's still a very powerful mix. It's a lot younger than the other two. UDP and TCP are from, I'd like to say, the 70s and 80s, so quite old, whereas for SCTP I think the first version of the standard came in 2000. So this protocol is quite a lot younger, but it's a powerful combination of properties from the older ones. With TCP you just have a constant stream of text, essentially, or image, or whatever content you're transferring. With UDP you had this message that's on the postcard; it's one postcard that you're sending, that's the fixed message. TCP doesn't have that concept. It's just information all the time until the connection closes. With SCTP you can have a connection concept, where both sides are aware of the communication status or the position in the communication, but you will still be able to send messages like the postcards. So you have a fixed-size piece of information that you want to transfer, and you can send that as a unit, whereas if you're only using TCP, like we do on the web all
the time, you have to build a lot of stuff around or on top of TCP in order to achieve the same thing. So when my browser wants to download an image, there's quite a lot of extra work that has to go into making that possible with the regular TCP protocol that is being used for now. So, advantage SCTP, certainly. It also has the reliable delivery if you want it, and you can also use multi-homing. That's not so common yet. As I said, the wireless home routers typically only have a single internet connection, but that might change. We might in the future see several different kinds of internet connections that we're using, and SCTP would be able to take advantage of that quite easily, whereas the other ones cannot. SCTP can send the same information over several different connections, and whatever arrives first at the destination is accepted. This is of course a bit wasteful, but in some cases maybe it's not a problem. So that's, I think, an exciting new feature. Let's see what the future brings.
It seems that TCP is going away, slowly but surely. Let's see what happens. Some companies are providing systems where they want to control much more of how the software is using the network, how the software is communicating on the network. And the way that these systems are built, cell phones typically, or smartphones, it's not so easy to do that with either TCP or SCTP, but it's quite easy to do if they're using UDP. So I think that's a big motivator for them to try to move away from TCP and use UDP even more. Let's see.
Sorry. So then we'll get into some applications. Now we've written on the postcard: we've written IP addresses, the system that we want to communicate with, and we've chosen either UDP or TCP depending on what is most suitable. Actually, it depends typically on the application. Some applications require one or the other, and a few applications can do either. The first thing
I'd like to mention here is DNS. I call it the phone book, the internet phone book, but there's one big difference. A phone book is something we get from one publisher, the phone company typically, or the POC here at Congress. They've just collected, or they know, all the phone numbers, and they send us the list with the names. DNS is different in that anybody can register a domain name in the domain name system, and anybody who does that can publish some information there. You can decide what you publish; actually, you can decide whether you publish at all. Let's say you have a thousand IP addresses. You can decide if you want to publish names for all of those thousand, or maybe you just publish a few of them that are going to be interesting for other people to use, and 90% of them are just internal systems. So everybody gets to choose what they publish, and everybody can also run the infrastructure storing this information on their own. It's not that you necessarily have to send this in somewhere and they publish it for you. You can actually do that on your own. So it's decentralized. Very good.
Still, it's a super old protocol, also from those early days of the internet. Nobody was thinking about security, and nobody had done a lot of attacks on these protocols, whether it be reliability attacks or just forgery attacks and so on. That wasn't a concern, because this was, remember, designed for companies working for the government. So everybody was interested in collaborating, and there were no bad actors. The internet now is again quite different, so some of these old protocols actually aren't so great anymore. The basic functionality of DNS, or the phone book, is to publish IP addresses, but you can publish other things as well. If you're interested in DNS, there's a good talk about that later on, which I'll mention in a bit. So the next application I want to talk about is SMTP, or simple mail
transfer protocol. This is what is used to deliver every single email in the world, all the time, all day long. Now, one thing that's quite interesting but also problematic, I'd say, about email, and perhaps not SMTP per se but the scope of SMTP, is that SMTP is used only to send email. SMTP doesn't have anything to do with receiving email. This means that there's a separate mechanism for receiving email, and the way these two different protocols or mechanisms work ends up putting the cost of email on the person receiving mail. So I have to pay, either with information or with money, to get an email address where I have some gigabytes of storage, whereas people sending email don't have to pay anything. They just need internet access, and then they can send all the email they want, all day long, to every single possible email address in the world. And that's why we have a spam problem on the internet. So that's a bug. Let's see if this can get fixed. Email is so tightly integrated into our everyday lives that I'm not sure, but let's see. That would be great.
The last application protocol I want to mention is HTTP, the hypertext transfer protocol. That's used for the web. You recognize it from the web browser URLs. Web pages used to be just hypertext, so text with some links in them. That's all they could do in the very beginning. And I'd like to show an example of SMTP, actually. This...
I have to do something about this, because it's not so easy to read. Let's see. I should have done that already, sorry about that. So this is an example of an email delivery. This is all it takes to send an email on the internet. On the edge there are arrows: the arrow pointing that way, to the left, is what is received from the email server, so from the SMTP server, and the arrow pointing this way is what we send to the email server when we want to send an email. So if we connect to an email server, for example mine, it will send us some text. By the way, this is TCP, and we're using port 25 for SMTP, so we get a stream of text going back and forth. The server tells us 220 and its name; that's some kind of welcome code. We say hello, my name is laptop, because I'm doing this from my laptop. The mail server says okay, good to meet you. And then we say, I want to send an email where the sender address is test at stillgib.se. And if you're paying attention here, the sender of the email gets to say what the sender address is. This is why it's super easy for anyone to forge email from any sender address. It's just part of the message. The server accepts the sender, even though the sender might not even exist. I tell it the recipient, this is for me, and the server says okay. Then I say, here's the data for this email, and the server says go on, start sending me the contents. And then I send an email where the sender is again just some fake sender address, whatever subject, and some text, and in the end I finish with a dot to say, okay, end of message. The server says okay, and then I say to the server, I want to quit now, I don't want to talk to you anymore. The server says closing, goodbye. This is email on the network.
Last example, I've got a little bit of access over HTTP. This is even simpler, and I've simplified it even a little bit more. If you want to try this yourself, please do. HTTP is also TCP, on port 80. I tried talking to the events.ccc.de
web server. Same thing here with the arrows pointing this way. So the connection opens, and I send GET, slash, and HTTP 1.0, because I want to get the main page, and I'm saying I'm speaking HTTP version 1.0. Then I tell it, okay, I want to access this start page on the host name events.ccc.de. Then I send it an empty line; that's to say the request is complete. And then the response comes back, with the arrows going in the other direction, where the web server says, actually, what you're asking for is not available here where you're asking for it, you have to go somewhere else. It's a redirect; 301 is an HTTP code for redirect. The page that you're asking for has been moved permanently; it is at https://events.ccc.de. So I was using an IP and TCP connection with no encryption, and that's why I can just type in the GET and the host line, but the web server tells me, sorry, I don't want to talk to you without encryption, so you have to go to this HTTPS address instead. Thank you, events.ccc.de. I like encryption, that's good.
And thank you also to all the angels that make Congress possible, because without them, and without you who are here who are angels, there wouldn't be any Congress. And also I want to say a huge thank you to you in the audience for being curious and wanting to learn something new.
Thank you very much, Peter. Now we have some time left for Q&A, so if you have questions, please do line up at the microphones that you find here if you want to ask anything. Do we have a question from the internet? No, the internet is out of questions. Do I see anybody standing at any microphone? Please make yourself known if I overlook you. Any questions? Oh, at microphone 5, please do ask your question.
You mentioned that you think SMTP has a kind of bug, in the sense that you can just send an email and the responsibility is on the side of the receiver. So if you call it a bug, it seems you have an easy solution.
I'm sorry, but no, I don't. So there was... I mean, I wish, that
would be great. It's not so easy to fix, because it is a property of SMTP, right, and of the email system that we're using. There was a proposal a long, long time ago, by somebody much smarter than me, called Internet Mail 2000, where actually the whole thing is switched around, so that the sender has to store the message and the receiver can go and pick it up. There the cost is placed on the sender, and I think that would go a long way to solving the spam problem. But it's not compatible with the email software that we have today, so it's not clear to me how we would be able to migrate in a good way, unfortunately.
Thank you. Do we have any other questions? That does not seem to be the case, so please give another warm round of applause to Peter Stuge. Thank you.
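For readers who want to retrace the two text exchanges shown at the end of the talk, the client side of each can be written down as plain strings. This is a minimal sketch, not the speaker's slides: the function names are invented for this illustration, the addresses used in the usage note are placeholders, and nothing is sent over the network.

```python
def smtp_dialogue(helo_name, sender, recipient, body):
    """Return the lines a client sends, in order, to deliver one message over SMTP."""
    return [
        f"HELO {helo_name}",        # "hello, my name is laptop"
        f"MAIL FROM:<{sender}>",    # the client simply claims a sender address
        f"RCPT TO:<{recipient}>",
        "DATA",
        body,                       # headers such as Subject, an empty line, then the text
        ".",                        # a single dot on its own line ends the message
        "QUIT",
    ]


def http_request(host, path="/"):
    """Build a plain-text HTTP/1.0 request; the final empty line ends the request."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 301 Moved Permanently' into its parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason
```

Calling `smtp_dialogue("laptop", "test@example.org", "me@example.org", "Subject: hi\r\n\r\nhello")` lists the client's half of a delivery, and it makes the weakness discussed in the Q&A visible: the sender address is just text supplied by the sender. Likewise, `parse_status_line("HTTP/1.1 301 Moved Permanently")` yields the code 301 that signals the redirect to the HTTPS address.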