 Surprisingly, it's about the crowd I expected. I mean, entry-level computer networking isn't the most exciting thing at this conference. So yeah, cool. But thank you all for coming anyways. But anyways, I am Jack Min Ong. I work at MoneyLion. MoneyLion's mission is to rewire the financial system to provide financial access and education to every consumer to help them build better financial habits. At MoneyLion, I work as an SRE, mainly on the AI DevOps side, and so mainly I work with data scientists, data analysts, ML engineers, and actually this talk actually comes from that. Because I'm not sure if you guys have worked with data scientists before, but a lot of the time these data scientists are not from computer science background. So sometimes they're from statistics, sometimes they're from civil engineering, and they can't find a job in engineering and they come to data science. So these people don't have foundational knowledge in networking, and I feel that this talk can benefit them with that. Okay, cool. Yeah. So this will be a talk on computer networking. I say computer networking because I told a friend before this that I was going to be giving a networking talk at an open-source conference, and he asked me how to talk to people, and I'm like, no, how to talk to computers. So yeah, this won't be about real networking, although maybe you could apply some of the things you learn here outside, but do that at your own risk. Okay, cool. So now we shall get started. Okay. So first, we'll start with reasons you might want to learn computer networking. There's a few reasons over there, hosting LAN parties, that's always really fun. One I get a lot, like being the computer science guy, usually when I'm hanging around with people who aren't from computer science, usually they always tell me like, oh, I want to be able to host a blog. 
So if you want to be able to host a blog, computer networking is something you will need to be aware of. There's some other fun stuff over there, and even really simple things, like being able to transfer files between devices. If you own a bunch of Apple devices, then I guess this is trivial to you because you can just AirDrop things around. But most people don't have a bunch of Apple devices, and I don't think any servers run Apple; I don't think Apple sells any servers. So if you want to be able to exchange files, you also need to know how to access files over the network. But these are everyday reasons you would want to know networking. This is the Open Source Summit, and I think most of you are developers, so why do developers care about networking? It's because of this. Most backend developers have probably seen some graphic like this before in some other talk. The basic idea is that in the old architectures, network complexity was relatively low because you would just have one monolithic application with all your logic. The most network interaction you had to manage was maybe talking to one database and maybe to an authentication server. So if you were a backend developer, you could get someone to set this up for you and you would be set for months; you wouldn't need to know how to do networking yourself. But for the modern stack, as most of you already know, architectures have mostly moved towards distributed designs: things like microservices and CQRS, I'm not sure if anyone uses that. And so you can see that the network complexity has vastly increased. Now you have services that have to manage multiple connections to multiple databases and multiple services.
Some of your databases suddenly became queues, and so for the modern developer, having a foundational knowledge of networking has become really important, and just knowing how to code is nowadays not really enough. Even if you're in the data science field, you're not exempt from needing to know about networking. For those of you who work at companies that do data science in production, you may have noticed that part of your data science team has suddenly started calling themselves MLOps. MLOps comes from this observation, made by people who have been using machine learning in production, that when you deploy machine learning to production, most of your code is actually not machine learning code. That's represented over there by this really small black box labeled ML code. What you end up spending most of your time on is the surrounding infrastructure. So things like: how do I serve my models? How do I monitor my models? How do I schedule things to make sure my features appear in my feature store, so I'm not taking two seconds to serve something and I can serve it faster? The common component of all this surrounding infrastructure is that all of it interacts over networks. So even if you're from data science, having a foundational knowledge of networks is going to be important to you, and I foresee that if you want to work in this field, it is probably going to become more and more important. In fact, I'm wearing this Kubernetes shirt. Someone gave it to me as a joke, because back when I was in MLOps, my manager and I had this joke that Kubernetes was a taboo buzzword that all the data scientists didn't like, and so if you showed up at the data science team wearing a Kubernetes shirt, they'd beat you up. So as my parting gift, he gave me this shirt. I guess he didn't like me very much. Okay.
And so yes, what this tutorial will cover: just six really simple things. We'll be going over ports, then IP addresses and how to address devices, then subnets and how to select certain parts of the network, then network address translation, then we'll briefly cover DNS, and then we'll go to firewalls. Despite these being the six topics that I want to introduce you to today, this is actually not the goal of my talk. The goal of my talk isn't to tell you a bunch of technical things. A lot of the time when people want to learn networking, especially these data scientists, they very much realize that they need to know about networking in their jobs, so they try to pick it up. They'll find a networking course or some blog, and these courses start at a very theoretical, low level. It starts off like, oh, this is the physical layer, this is the data link layer, this is what an Ethernet cable looks like, this is Cat 5, it does 10 gigabits per second, but you probably don't need it because the internet's not that fast. And then they get really bored, right? Because these are things that they will not use every day. So today I will be covering these topics, but I will try to cover them from a more applied point of view: things you can do every day. Hopefully this makes you feel that networking is easy and that you can do it too, and it inspires you to go and learn the stuff. Okay, cool, so let's go. Today the application we'll be deploying just shows people pictures of my cat. You'll always want to be able to host the cat website. And if any of you want to follow along, there's a GitHub link over there for the prerequisites. The prerequisites are really simple. You literally just need Python.
I think most data scientists already have Python, and if you know data scientists, I think you might have Python as well. That's literally all the requirements. If you actually go to that repository, there's no code. The repository is just pictures of my cat, because that's how simple our app is going to be today. So I already have the repository cloned over here. Just go into the cat folder, and then we'll start up the server with python3 -m http.server. Okay, and so now it says it's started the server on port 8000. And okay, cool, I still don't see cats. So how do I access this server? What exactly did it do, right? What's a port? This is sort of what happened. We started the simple HTTP server, and it made a system call to the kernel telling the kernel, hey, if anyone talks to you on port 8000, forward them to me. So now if we're in the browser and we try to go to port 8000, we go to localhost. In the browser, if you want to reference a port, you just use a colon, then 8000. And okay, cool. So we should be able to see pictures of cats now. Okay. Oh, this is the public one. Oh yeah, for the slides, if you open the repository, oh, I don't have the repository open. Let me copy it myself. So in the repository there are actually two slide links. One of them is static, where you have view only, and one of them has comments enabled. This is a thing I wanted to try out: what would happen if you enabled comments on your slides? Another reason I did this is that there will be parts of this talk where I will actually lie. The reason is that when you try to introduce people to a topic, you sometimes very much have to lie, right? For example, when you learn math in middle school, they tell you that for square root, if you have a square with this area, then the square root is the length of its side.
But then you get to college and they tell you that, oh, you can take the square root of negative numbers. And then you're like, I don't own any negative squares. What does that mean? And then they tell you, oh, the reason is that square roots are actually defined in terms of rotations. We have this 2D number system, and then you're just mind blown, right? Then you feel that your middle school teacher lied to you. And the reason they lied to you is that if they had told you about complex numbers in middle school, you would not have learned any math. So there will be portions of this talk where it's not that I'm telling you things that don't work, but I'm lying a little bit. It's a little bit more complex than that. But for the most part, for people who are doing applied work, like if you're just a data scientist or just a front-end developer, you don't actually care too much about the internals and the nuance around that. So if any of you are really experienced in networking and want to go comment and be more specific, go ahead, go to that slide. And anyone who wants to read that can go ahead. Okay, so let's continue. Where am I? I was over here, okay. So just now we called our service on localhost at port 8000. These ports are the way your services get assigned to the network interface. But when I call a server, I don't want to also have to ask it what port it's using, right? Because of this, most services have this concept of a default port, and that is illustrated here. These are some very common services. Oh, sorry, I dropped the mic. Okay, I'm not sure if people can hear me better or not, but hopefully it's fine. So if you're using HTTP, which is the standard protocol for visiting websites, then whenever you open a website, you make an HTTP call to it.
Just now when we loaded this, this was an HTTP call. For that, the default port is port 80. For HTTPS, which is the secure version of HTTP, the default port is 443. SSH, which I think some of you may be familiar with, is a way to remotely access a device. DNS, which we'll cover later, is on port 53. And I think most data scientists know what Postgres is; its default port is 5432. The point of these default ports is that when I call the service, I don't also want to ask for the port, so the tools will use this port by default. So if we want to be able to access our cats without having to specify 8000, what we can do is stop the server and try to put it on port 80. But if I try to execute this command now, you will see I get a permission denied. The reason is that ports under 1024 are privileged ports in Linux. Privileged just means that you have to be root in order to run a service that listens on these ports. This is a built-in security feature, because most of the well-known protocols live on ports 1023 and below. So if you're talking to this server and it replies to you, then you can be sure that whoever hosted that server is actually root. What could happen is that someone has been able to infiltrate the server, but they probably can't get root, and so if they try to host a server on any of these privileged ports, they will be denied. So if I want to be able to host on port 80 on this device, I will have to use sudo, "soo-doo", however you say it. Okay, so now we're serving the server on port 80, which means that if I go to localhost and don't specify any port, I will still be able to reach the server. So cool, let's see more pictures of cats. Yeah, very cute. Okay, and so you could specify the port if you really wanted to.
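As a rough illustration of how a client picks a port, here is a minimal Python sketch. It assumes a small lookup table built from the default ports mentioned above (SSH's port 22 is the standard value, not something stated in the talk), and the helper names `effective_port` and `needs_root` are made up for this example.

```python
from urllib.parse import urlsplit

# Default ports for a few services. 80, 443 and 5432 are the values given
# in the talk; 22 for SSH is the standard value and is added here as an assumption.
DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22, "postgresql": 5432}


def effective_port(url: str) -> int:
    """Return the port written in the URL, or the scheme's default if none is given."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


def needs_root(port: int) -> bool:
    """Ports below 1024 are the privileged range on Linux."""
    return port < 1024


if __name__ == "__main__":
    for url in ("http://localhost:8000", "http://localhost", "http://localhost:80"):
        port = effective_port(url)
        print(url, "->", port, "(privileged)" if needs_root(port) else "(unprivileged)")
```

With this table, `http://localhost` and `http://localhost:80` resolve to the same port, which is why writing the 80 explicitly is redundant.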
So if we go over here and we wanted to be really specific and go to port 80, yeah, it's the same. And actually, you see, it even corrects you, right? It takes out the 80 because it was redundant. You didn't really need it. Okay, cool. So you're able to host the server now, right? But let's say I want to be able to host multiple servers. Let's say, for whatever reason, the neighborhood sugar glider ends up in your house, your mom took pictures, and she wants you to host them. Okay, so let's try to host another server. I have a sugar glider folder as well, so let's go there. And now if I try to run python3 -m http.server on port 80, I get an OS error. I mean, this sort of makes sense, right? Because if I'm trying to host two services on the same port, when I go to that port, which one is it supposed to serve? This is called a port collision, and that's what we'll talk about next. Basically what happened is we tried to spawn the new HTTP server and it asked the kernel the same thing, made the same system call: hey, if anyone talks to you on port 8000, forward them to me. But then the kernel realizes that, oh, I'm already listening on this port for the previous service, so this port is not available. And so it replies, nope, another process has already claimed that port. The HTTP server panics, passes the panic on to us, and gives us an OS error. But this is actually the best-case scenario. If you get a port collision, at least you know. There are some services that do not tell you when you have a port collision, and then you start seeing very weird behavior. One of them is Docker. So if any of you are going to be using Docker, especially if you use Docker on Mac; for me, I use Rancher. And I believe that with Rancher, if you try to map a service to the same port, it actually does not error.
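Stepping back to the OS error from the demo for a moment: the collision can be reproduced with plain sockets. This is a minimal sketch, not what `http.server` does internally line for line; the function names are made up for the example, and it binds to a free port on the loopback address so it can run without root.

```python
import errno
import socket


def try_listen(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Ask the kernel for a TCP port; raises OSError if the request is refused."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock


def demo_collision() -> int:
    """Claim a port, then try to claim it again; return the errno of the second attempt."""
    first = try_listen(0)  # port 0 lets the kernel pick any free port
    port = first.getsockname()[1]
    try:
        second = try_listen(port)
    except OSError as exc:
        return exc.errno  # the kernel said no: the port is already claimed
    else:
        second.close()
        return 0
    finally:
        first.close()


if __name__ == "__main__":
    code = demo_collision()
    print("second bind failed with:", errno.errorcode.get(code, code))
```

The second `bind` fails because the first socket still holds the port, which is the same refusal the second `http.server` process ran into.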
And the reason is that Docker does not actually open ports on your kernel. Docker just modifies iptables to do forwarding, and iptables does not have a check for port collisions. Because of that, if you try to assign the same port to multiple containers, what will happen is that the first one will be served, and then once you kill the first one, the second one will start being served. So sometimes people get very weird behavior where they thought they updated the container, but they actually didn't. The reason is that they have a port collision and the operating system did not tell them about it. So if you want to serve the two servers without getting this OS error, it's pretty simple: just use another port, right? Let's say I choose port 1234. For this one, I don't need sudo anymore. Okay. And so now my cat one still works. Did my reload work? Okay, I guess the reload is too fast. It doesn't actually do buffering. Okay, so see more cats. And now when I go to localhost at 1234, let's open the first image. I believe it's a JPEG. Yeah, okay. So now we see sugar gliders. So you're able to serve these two services as long as you map them to different ports. Okay, cool. But right now all I'm doing is serving this server on my localhost, right? That's not very fun. I want to be able to see my cat remotely. Let's say my laptop's on the desk, and sometimes I'm in bed on the iPad and I want to see my cat as well. So how would we do that? How would we access the server from another device? Oh, by the way, if you have any questions, feel free to ask at any time. Just raise your hand and I'll try my best to answer. Okay, so screen sharing from an iPad to a Linux laptop is a little bit hard, so I'll be doing a weird Google Meet meeting between my iPad and all my devices.
So this will take some time to set up. Hopefully you're okay with that. Yeah, I probably should have set this up before the talk. Oh well, it's okay, we can take a little break. Okay, cool. So there are three versions of me now. I'll try to share the screen from the iPad. Hope this works. I mean, I tried it out just now, but you never know with these things. They'll suddenly not work. Okay, cool. So everyone can see my iPad now, okay? Now we're in the browser, so let's try the link that we used just now, right? Just now we saw the cats at, not this one. This one probably doesn't work anymore. So you can see that if you close the server, obviously you can't reach it anymore, because our server has moved to port 80. Yeah, cats. Okay, so now if we're on the iPad and we try to reach the same thing, let's say we try to go to localhost, obviously it's going to say it's not able to connect. Why is that? The reason is that localhost actually refers to the device itself. When we were on the laptop, this was fine because the browser just called port 80 on itself. But when we're on the iPad, it doesn't really work out, right? Because you call port 80 and there's no server there. So we need some way to address the laptop from the iPad, and to do that, we use an IP address. An IP address uniquely identifies a computer on the network. It consists of four 8-bit numbers separated by dots, for example 192.168.100.1. Because each number is eight bits, it is always between 0 and 255. So if someone tells you that their IP is 900.1.1.1, they're probably lying. There's also this binary representation of the IP address, where we represent the numbers as octets. This will become important later, but at the moment it may seem a little bit arbitrary.
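To make the octet idea concrete, here is a small Python sketch using the standard `ipaddress` module. The helper names `to_binary_octets` and `is_valid_ipv4` are made up for this example.

```python
import ipaddress


def to_binary_octets(ip: str) -> str:
    """Render a dotted IPv4 address as four 8-bit binary groups."""
    addr = ipaddress.IPv4Address(ip)  # raises if any number is outside 0..255
    return ".".join(f"{octet:08b}" for octet in addr.packed)


def is_valid_ipv4(ip: str) -> bool:
    """True if the string is a well-formed IPv4 address."""
    try:
        ipaddress.IPv4Address(ip)
    except ipaddress.AddressValueError:
        return False
    return True


if __name__ == "__main__":
    print(to_binary_octets("192.168.100.1"))  # 11000000.10101000.01100100.00000001
    print(is_valid_ipv4("900.1.1.1"))         # False: 900 does not fit in 8 bits
```

The dotted form and the binary form are the same 32 bits; only the way they are written differs.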
But it's important to remember that this representation of the numbers being separated by dots just makes it easier for humans to read. When the computers read it, they very much read it in binary because it's easier for them. Okay, and so, oh, I haven't reached that part yet. Okay, so now we need to be able to find the IP address of this computer. So what we do, I'll go the GUI way. There's a CLI way you guys may have seen me try to do when I was setting up. But I'll just try to try to do it using the GUI. So it's more friendly for beginners. So I think most of the time, Wi-Fi settings should have it. So if I'm on here, settings, IPv4. No, it didn't tell me anything. Oh, okay, so cool. So my IP address is this 10.119.26.25. And so I will try to reach there. Okay, can I get that back? Okay, so let's just move, oh no, it's stuck. Okay, so we're here. Okay, so let's try and reach myself, okay? So now I'll go to 10.119.26.25. Okay, cool. And so now I can reach my server cross-device. So I'm on the iPad, I'm able to reach a server that is on the laptop. And so let's look at more pictures of cats. No, we already saw this one, six more cats. Okay, so yeah, that's cool. And so now that I can access the cats cross-device, now everyone can see my cat, right? So any of you want to try and reach the server, I'll be able to see you guys make the request there. Oh, I do see that some people have tried already. Yeah, they'll be able to see the cats. Yeah, actually this demo, I was supposed to start off using my own mobile hotspot, but then what I ended up doing is my hotspot for some reason can't connect using this laptop. And so I was forced to use the conference wifi. And so because we're on the conference wifi, you guys would actually be able to reach this address. But if I had been using my hotspot, you guys actually would not have been able to reach the server. The reason is that this address is not a public IP address. 
And so you would not have been able to reach it because it was not on your network. But unfortunately I can't show that because of technical issues; I can't connect to my own device. But you can have fun looking at the cats. Also, I'm pretty sure that this is a security vulnerability. I think if you're connected to public wifi, you're not supposed to be able to call into a device like this. You probably don't want to support this use case, because then hackers can just enumerate the IPs and figure out what's on them. And if someone accidentally exposes a server, that's probably a security vulnerability. But yeah, that's cool. On this public wifi, you can actually call your own device, which doesn't happen very often. I guess sometimes people don't configure their subnets to be super secure. Okay, so originally what was going to happen is that you weren't supposed to be able to see the cats, so no cats, much sad. And the reason is that there is this concept of a subnet. Subnets are pretty simple. A subnet is just a contiguous range of IP addresses, and the subnet defines the computer's local network, so what it can reach locally. What local network means is just that devices on the local network are separated by hubs and switches, but no routers. So if the connection between your device and my device requires a router, then we are not on the same local network. Communication between devices on the same local network does not need a router. Hence, if I connect to the conference wifi, since all of us share the same router, you will actually be able to reach my laptop without having to go through a router. But for those of you who are following virtually, if you try to go to this address, you will probably see that you can't reach it because you're not connected to the conference wifi.
And so your connection to my laptop would require you to go through routers. So how do devices know about their subnet, right? There's this concept of a subnet mask. And now the idea I introduced before, of IP addresses being represented in binary, becomes important, because the subnet mask is basically always ones on the left side and then zeros on the right. You always go in that direction. Technically you can go in the other direction, but you almost never do. I think it's a legacy thing where they allowed you to subnet in weird ways. But for the most part, you always subnet from left to right. So it's a 32-bit mask that defines which part of the IP represents your network and which part represents your host. The parts that are ones are the network portion of your IP address. You can see here our network mask is 255.255.255.0, so the first 24 bits are ones, and this 192.168.100 is our network. If there's another device on 192.168.100.2 or .3, it is part of our network. But if you see an IP like 192.168.2.1, then that's not part of our network, because the network portion of the IP address is not the same. Packets whose destination address has the same network portion are delivered directly, so those devices can reach each other over the local network. If you aren't on the same local network, so the network portion is different, then you need a router. In that case, what your device will do is send the packet to the default gateway and just sort of hope that the default gateway knows how to reach that IP. We'll talk a little bit more about this later. Okay, there's also this concept of CIDR notation. Writing out all these subnet masks is nice for computers, but most of the time if I'm a human reading this, I want a shorthand to be able to write it, right?
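The mask comparison and the default-gateway fallback described above can be sketched in a few lines of Python. This is a simplified model of the decision, not how any real network stack is implemented; the function names are made up, and the gateway address 192.168.100.254 is a hypothetical value chosen for the example.

```python
import ipaddress


def same_network(a: str, b: str, mask: str) -> bool:
    """Two addresses share a network if their masked (network) portions are equal."""
    mask_bits = int(ipaddress.IPv4Address(mask))
    a_bits = int(ipaddress.IPv4Address(a))
    b_bits = int(ipaddress.IPv4Address(b))
    return (a_bits & mask_bits) == (b_bits & mask_bits)


def next_hop(src: str, dst: str, mask: str, gateway: str) -> str:
    """Deliver directly on the local network, otherwise hand off to the default gateway."""
    return dst if same_network(src, dst, mask) else gateway


if __name__ == "__main__":
    mask = "255.255.255.0"
    gateway = "192.168.100.254"  # hypothetical default gateway
    print(next_hop("192.168.100.1", "192.168.100.2", mask, gateway))  # direct
    print(next_hop("192.168.100.1", "192.168.2.1", mask, gateway))    # via gateway
```

The bitwise AND keeps only the bits where the mask is one, which is exactly the network portion.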
And so usually, if you've ever worked with the cloud, you'll notice that you never have to enter subnet masks. The reason is that there is this CIDR notation. CIDR notation is pretty simple: you just have your IP address, then a slash, and then the number of ones. You can see here we have 24 ones, so we would write this address as 192.168.100.1/24. There's also this concept of CIDR ranges. In this notation, the .1 is not the first address inside the subnet, because the first address would be .0. If you imagine the range of addresses, there are 256 of them: it's 192.168.100, and then for the last octet you can choose any number from 0 to 255. But you can't actually assign zero to any of your devices, because zero is reserved as your network address. Whenever devices want to talk about an entire subnet and not just a particular device, they always use the first address, so you're not allowed to assign the first address to any device. So for that device's subnet, if we wanted to talk about its network, we would use 192.168.100.0/24, because that's the first address. The last address also can't be assigned, because it's used for broadcast. So if you're using a /24, you have eight bits of space, and with eight bits you would imagine you had 256 IPs, but you actually only have 254 to assign to devices because two are reserved. If you end up subnetting things on the cloud, some cloud providers will actually steal more of your IPs. I believe if you're on AWS and you subnet, five of the IPs are reserved. So if any of you get confused by that, where you try to make a really small subnet and then suddenly realize that, oh, my IPs are being stolen, what happened to my five IPs?
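For the /24 arithmetic, Python's standard `ipaddress` module can do the bookkeeping. This is a small sketch with a made-up helper name, `describe`; note that counting 0 through 255 inclusive gives 256 addresses, of which 254 are assignable once the network and broadcast addresses are set aside.

```python
import ipaddress


def describe(cidr: str) -> dict:
    """Summarise the subnet that an address written in CIDR notation belongs to."""
    net = ipaddress.ip_interface(cidr).network
    return {
        "network": str(net),                       # first address plus prefix length
        "netmask": str(net.netmask),               # the same prefix written as a mask
        "broadcast": str(net.broadcast_address),   # last address, reserved
        "total": net.num_addresses,                # every address in the range
        "assignable": len(list(net.hosts())),      # excludes network and broadcast
    }


if __name__ == "__main__":
    for key, value in describe("192.168.100.1/24").items():
        print(f"{key:>10}: {value}")
```

A cloud provider that reserves extra addresses per subnet would leave fewer assignable addresses than `hosts()` reports; that adjustment is not modelled here.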
Yeah, that's what happened. That sometimes the protocols, they will reserve certain IPs that you're not allowed to assign to any devices. Okay, and so there's also this concept of private subnets. So private subnets are subnet ranges that we reserve for private addressing. And so these are defined in a specification and it just makes it easier for everyone to understand when an IP is private. So whenever you read an IP that starts with 10, then you know that this IP is private. So it's probably not accessible on the internet. Actually not probably, it's not accessible on the internet. And so there's also another range, 172.16 to 31. This is a little bit of a weirder range. And then there's 192.168. anything. So as you can see, they're all from different sizes. So usually if you're on a home network where you don't have that many devices so you don't need to assign a very big subnet, usually you'll get 192.168. And if you're on enterprise levels or companies, usually they will try to assign into the 10 range. And then sometimes for certain services that use networks like for example Docker, Docker will try to assign to that 172 range. A decent amount of VPNs also use that 172 range. And so these aren't things that you need to memorize because you can always look up the ranges, right? But over time as you do more networking, you actually kind of memorize these ranges just because whenever you see an IP you want to immediately be able to know, can I access this on the internet or will I need to be on the local network in order to access it? And okay, so for those of you who are online you won't be able to see it. So yeah, what are your solutions? What can you do to be able to see the cats? First you can host your own server if you're following along at home. But if you want to be able to reach my specific cat server, then yeah, you'll have to come to the conference, right? Or if you're doing this at home, you'll have to invite your friend to come to your house. 
Another solution is you can plug a really long Ethernet cable between you and your friend's computer. So any of you at home, if you have a really long Ethernet cable, feel free to give it to the CCD; they can plug it into my laptop, and then you'll be able to reach it. But actually it probably won't work. The reason is that with a really long Ethernet cable, I think the signal won't be able to travel that far, so you'd probably have to put switches in between. But I find the idea of getting a really long Ethernet cable to do this funnier. The other solution you have is to use a VPN. This is actually what VPNs do, right? You are actually not on the same local network, and the VPN pretends you are on the same local network using network address translation. I'll talk a little bit more about that later. The other solution you can try is to get a public IP address. If I have a public IP address that is routable on the internet, then people on the internet can reach me and I won't need to do all this weird stuff. And so for us today, we're going to be doing that: getting a public IP address. How do I do that? Pretty simple. Nowadays you don't really need to own any servers anymore, just use the cloud. I already have an instance over here. For any of you who want to be able to spin up instances, I think all of these cloud providers have tutorials on how to do this. For those of you who don't know how, I guess it's a little bit hard to follow along, but it's fine. You can just watch me do it, okay? So let's refresh the status; it should be up. Okay, so we have this instance running, and it has this IP address. AWS is nice and has already told us that this is public. But if you remember the private ranges, none of them include 54.172.187.247. So this will be perfectly addressable. I'll just copy it from here, okay? So we need to be able to access our server.
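The "is this address in one of the private ranges?" check can be written down directly. This sketch assumes the three ranges from the talk are the RFC 1918 blocks (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16); the helper name `is_private_range` is made up for the example.

```python
import ipaddress

# The three private ranges from the talk, written in CIDR notation.
# 172.16.0.0/12 covers 172.16.x.x through 172.31.x.x.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_range(ip: str) -> bool:
    """True if the address falls inside one of the private ranges above."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PRIVATE_RANGES)


if __name__ == "__main__":
    for ip in ("10.119.26.25", "192.168.100.1", "172.32.0.1", "54.172.187.247"):
        print(ip, "private" if is_private_range(ip) else "public")
```

The conference-wifi address from the demo lands in the 10 range, while the instance's address is outside all three, so it can be routed on the internet.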
So let's SSH there, as ec2-user. The address should work. I set up the keys, okay. So let's pretend this is not here. I will delete it, okay? Once you have access to your server, you need to clone the repository, so we'll quickly do that. I spelled HTTP wrong. Okay, github.com, slash my name, slash OSC, 2032. As you can see, the links for GitHub repositories are not that complicated, so a lot of the time you don't need to go to the browser to copy and paste them. I see that a lot: whenever people want to clone repos, they always go to the browser to copy and paste the link. The link's not that complicated. Literally, you just go to github.com, slash your user. If you're in an organization, it will be slash your organization, slash your repository. If you're authenticating using SSH, you use git@github.com, and then your user slash OSC, 2032. I don't have SSH authentication set up here, but if you do, you can see that the link is not very complicated. Also, you can tell if people actually know about this, because if you copy from the browser, it adds an extra .git. This is sort of cargo culting. You actually don't need the .git. It works without the .git, which is pretty cool. Okay, so anyways, I have my server over there, and I have my cat pictures. So let's go into the cats folder and serve it with http.server on port 80, and I need to use sudo. Okay, cool. So now we have the server running. We should be able to reach our server if we go to the public IP address. And yeah, okay, cool. So see more cats. Okay, let's continue. So those of you who are online and not on the conference Wi-Fi should be able to see the cats now. You just need to go to this public IP address, 54.172.187.247, and you should be able to see the cats. Okay, and so actually this was kind of weird, right?
Because I just said that public IP addresses can be accessed on the internet and private addresses can't. But then if you can't route to a private address, how did the server reply? I access the server on its public address, but then how does it reply to me if I'm on a private one? So let's explore that. First we'll cover the simple case, the LAN case. If you're on a local network, it's pretty simple. You start off by trying to send a packet. So let's say currently I'm on this computer, this really cute computer, 192.168.2; I'll just refer to it as .2 from now on. So let's say .2 wants to reach .3. What it will do is craft this packet that says, okay, it's from .2 and I want to reach .3. It sends out the packet, the switch receives it, and the switch says, okay, I know about these people, I'll forward it to them. Then the server receives it and basically just switches the from and to for its reply, so now it's from .3 and it wants to send it to .2. It puts that onto the switch, and then our computer is happy: it receives the packet, so whatever you requested, it's able to receive the response. But let's see the internet case. For the internet case, we start off from .2 and let's say we want to reach 1.1.1.1. For those of you who don't know, this is Cloudflare's DNS server, okay. So we do the same thing: we write out the packet and say that it's from .2 and I want to reach the quadruple ones, okay. It reaches the router, goes onto the internet and reaches 1.1.1.1. This is fine, right, because the destination address 1.1.1.1 is public, and so the internet is able to route it. And then the server will look at it and try to do the same thing and reverse it. So now it's from 1.1.1.1 to 192.168.2.
I put that face over there, with the server not being exactly satisfied, because sometimes the server will actually drop this packet: the server realizes that you're asking it to reply to a private IP address, and so it knows that there's probably something wrong. Either you're trying to make a malicious attack and make it call something on its local network, or you've made a mistake and you're asking it to route something that isn't routable on the internet. But let's say the server doesn't want to be bad and doesn't drop the packet. So it just flips the source and destination addresses and sends it out. But the internet will look at it and see that 192.168.2 is not on the internet. It's a private address. None of the routers will ever be able to find that laptop, and so the packet will just get dropped somewhere. It gets dropped and our computer is very sad. Okay. And so in order to solve this problem, your router does something called network address translation. Your router changes the source IP of the local packets to its own public IP. This makes sense, right? If it changes it to its own public IP, then now people can reply. And it remembers the port that it used, and when the reply comes back on that port, it changes the address back so that the packet can reach the local device. So let's go through an illustration of this. We start with the same thing: this computer sending out the exact same packet. But now you can see that I've been more explicit in defining the router. The router actually has two network addresses: a local one that exists inside the private subnet and one that exists on the public side. So let's say it's the quad twos, right? Okay, so the computer sends it out. It reaches the router, and the router realizes: this packet reached me. It's from the local network, but it's not intended for me.
But I know that if I send it out without modifying it, the reply probably won't come back. And so what it will do is change the from. Before, the from was .2, and now the router changes it to its own public IP address. Now it can send it onto the internet. The server sees it and says, okay, cool, I can probably reply to this, flips it around, and then it comes back. When it comes back, the router needs to translate it back, because now the destination address is not the .2 that originally sent the request. It remembers which port it used to make the request, so if someone replies on that port, it changes the address back. So we change the address back to .2, and then our computer is able to receive it. Our computer is really happy; it's able to see the content. And so we're able to sort of show this over here. Because I'm not on my mobile data, I can't actually show you that it's my public IP. But what you guys can do, if you came in a pair, is both try to access the public IP. I'll show it again. Where is it? Yeah, both of you, or you with two devices, try to access 54.172.187.247. And you will see on this log that I'm showing over here that it will be the exact same address. So even though you're on different devices, the server will see the same address. The reason is that when the router does the network address translation, it loses information about the local device. So let's say this local network has another device, right? Let's say .4 is on it. If it went through this whole process to the DNS server 1.1.1.1, the server would still see this coming from 2.2.2.2. And so because of this, there are some implications of doing network address translation. Usually a router does this. And one of the implications is that the server can't actually differentiate you from other users on your network.
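The translation walked through above can be sketched as a toy model in Python. This is a simplification under stated assumptions, not how a real router is implemented: the class names `Packet` and `NatRouter` are invented for illustration, the local addresses are assumed to be 192.168.0.2 and 192.168.0.4 (the talk only says ".2" and ".4"), and the router hands out a fresh public port for every outgoing packet.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


@dataclass
class NatRouter:
    public_ip: str
    next_port: int = 40000
    # Maps the public port the router used to the (local ip, local port) behind it.
    table: dict = field(default_factory=dict)

    def outbound(self, packet: Packet) -> Packet:
        """Rewrite the source to the router's public IP and remember the mapping."""
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = (packet.src_ip, packet.src_port)
        return Packet(self.public_ip, public_port, packet.dst_ip, packet.dst_port)

    def inbound(self, packet: Packet) -> Optional[Packet]:
        """Translate a reply back to the local device, or drop unsolicited traffic."""
        mapping = self.table.get(packet.dst_port)
        if mapping is None:
            return None  # nobody inside asked for this, so the router cannot deliver it
        local_ip, local_port = mapping
        return Packet(packet.src_ip, packet.src_port, local_ip, local_port)


def server_reply(packet: Packet) -> Packet:
    """The server simply swaps source and destination to answer."""
    return Packet(packet.dst_ip, packet.dst_port, packet.src_ip, packet.src_port)


if __name__ == "__main__":
    router = NatRouter(public_ip="2.2.2.2")
    from_dot2 = router.outbound(Packet("192.168.0.2", 5000, "1.1.1.1", 53))
    from_dot4 = router.outbound(Packet("192.168.0.4", 5000, "1.1.1.1", 53))
    # Both requests look like they come from the same address to the server.
    print(from_dot2.src_ip, from_dot4.src_ip)
    # The reply is mapped back to the device that sent the request.
    print(router.inbound(server_reply(from_dot2)))
    # A connection nobody inside initiated has no mapping and is dropped.
    print(router.inbound(Packet("9.9.9.9", 1234, "2.2.2.2", 8080)))
```

Both side effects fall out of the mapping table: the server only ever sees the router's public address, and an inbound packet with no table entry has nowhere to go.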
So if one of you who's connected to the conference Wi-Fi decides to spam Google, next time when we're connected to the conference Wi-Fi and we go on to Google, we will probably be challenged. Hello, can you hear me now? Okay, great. Yeah. Okay, actually all the technical details of how network address translation works, like what your router does, changing addresses and stuff, are not really important. What's important is just to know that network address translation happens and what the side effects of it happening are. And so this is the important slide. If you didn't really understand what I was talking about with the happy computers and all that, this is the thing you should remember: because of network address translation, the server can't actually differentiate you from other users on your network. So if one of you over here who's using the conference Wi-Fi decides to go spam Google, and Google blacklists us and starts challenging us, that will affect all of us, even though it was from your device. The reason is that all of us share the same router, all of us share the same public IP, and so to Google, we are all the same user. The other side effect is that devices with a private IP can make requests to the internet, as I've already shown, through network address translation, but they can't listen. So if someone on the internet wants to talk to your device, they can't actually reach you, because if they talk to the router, the router doesn't know to give it to you, right? And so this is one of the side effects.
Okay, so now we'll talk about DNS, the Domain Name System. It's pretty simple in practice, but it gets complicated if you try to look into it. We'll go the simple route today. Okay, so DNS is just this basic idea that machines like numbers, they like all these IP addresses and stuff, but humans, we like words, right?
We don't want to enter 192.168, blah, blah, blah; we don't remember numbers, right? And so what DNS does is keep a mapping of string to IP address. Basically we map domain names to certain IP addresses. It does that by storing a bunch of records. A records store the IPv4 address, and then there are also name servers. This I'll talk about. Yeah, I'll try to talk about it now. So, okay, let's see how DNS works, right? I think this works on Mac as well, any sort of Unix-based system: you can do a dig. So let's say I try to reach this domain, or let's not do this. Okay, let's just do google.com. Okay. And so you can see that it replies with a bunch of A records. All of these are IP addresses at which you can reach Google. So let's say I copy this one. Let me go there. And there, see, it even translates itself back to Google. Okay. And so that's what the A records are for. The name servers are there because DNS lookups are recursive, and because of that, a lookup has to pass through multiple name servers. Let's see if my trace gets blocked here. Oh, this one. Okay, yeah, it does. Oh, it works out, okay. Okay, cool. So the basic idea is that you have to go through multiple levels of the domain. I think most of you know how to reach websites, right? You do something, blah, blah, blah, dot something, dot something, okay? All those dot somethings are there because the DNS system is hierarchical. It's like a tree. First you start off at the root name servers. So let's say we were trying to reach google.com. It will go to the root name servers and ask them, oh, do you know about the com name servers? Then the com name servers will reply. Okay, then you ask the com name servers, do you know about Google? And then in this case, the Google name servers actually didn't know about Google.
They gave you even more name servers to ask. So they say, oh, you can go ask these other name servers, okay? You go down the hierarchy and eventually you get to a server that actually knows your answer. That server will reply, okay, here's the answer, and then your browser will go to it, okay? And so because of that, if you're ever configuring DNS yourself, it's also important for you to understand how name servers work, okay? There is some other stuff over here that I'll briefly go through. If you're ever using IPv6 addresses, then you will use the quadruple A record. So just now I showed you guys IPv4. IPv4 is 32-bit, but 32 bits isn't enough, right? 32 bits is about the length of an integer, and for those of you who know about overflows, you know that the maximum number for a signed 32-bit integer is about two billion, right? In this case, since it's not signed, it will be about four billion. So the maximum number of public IP addresses that you can ever assign is about four billion. And this isn't enough, right? Because most of us have more than one device these days, and there are more than four billion people in the world. So people realized that at some point we'll run out of IPv4 addresses, and they just made it longer. They said that, okay, addresses are now 128 bits. That's four times longer, and that's why the record has four times more A's. They're really funny with this. They started with A, and it's like, oh, it's four times longer, we'll make it four times longer on the DNS record as well. So yeah, that's the record for IPv6 addresses. MX is for the mail servers. So if someone's trying to mail you, or you send an email to someone on gmail.com, what the email server will do is ask the DNS, hey, where are the email servers for gmail.com? And it will look for that in the MX record. CNAME is used for redirections.
So sometimes you have multiple names for the same DNS record. Sometimes you do this to protect yourself from people spoofing your domain. They might buy a domain that's really close to yours to try to scam your users. What you could do is CNAME those to the correct domain, but then you have to buy all the domains. There's also this freestyle record called TXT, where you can literally just store anything. So you can use this as a key-value store. I think there's a joke blog I wrote before where you can do a global data store, just store everything on DNS. You should probably not do it. DNS has its downsides, which I'll talk about later. And then there's also the SOA record, I think it's Start of Authority, that basically just defines the zones. Okay, and so yeah, for this, I already sort of briefly talked about it, but I guess I can go over it. Name servers are the servers that reply to DNS queries. And yeah, domains are hierarchical. When you try to register a domain, you will usually go to a registrar. And so what does the registrar do, right? The registrar basically helps you to register your domain's name server as an NS record in the top-level domain server. Top-level domains are things like com. For most developers, you're probably not going to be assigning any name servers directly at the root level. Most likely you're going to do something under a top-level domain, like I want my domain and then .com. So I will need to be able to tell the .com name servers where my name server is. The registrars help you out with this. Basically, if you register your domain with the registrar, your registrar just tells the com name servers that, hey, this guy bought this domain, and its name server is over here. And then, since now you can point to your own name server, you just need to manage the answers on this name server.
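The delegation chain described above (root, then the top-level domain, then your own name server) can be sketched as a toy lookup in Python. Everything in it is hypothetical: the server names, the `example.com` zone and the 203.0.113.x addresses (a range reserved for documentation) are placeholders, and real DNS involves many details this ignores.

```python
# Each toy "name server" holds NS delegations (zone -> next server to ask)
# and/or A records (name -> list of IPv4 addresses). All values are made up.
NAME_SERVERS = {
    "root-ns": {"NS": {"com": "com-ns"}},
    "com-ns": {"NS": {"example.com": "example-ns"}},
    "example-ns": {
        "A": {
            "example.com": ["203.0.113.10"],
            "blog.example.com": ["203.0.113.10", "203.0.113.11"],
        }
    },
}


def resolve(name: str, start: str = "root-ns"):
    """Walk down the hierarchy until a server has an A record for the name.

    Returns the addresses plus the list of servers that were asked.
    """
    server = start
    asked = [server]
    while True:
        records = NAME_SERVERS[server]
        if name in records.get("A", {}):
            return records["A"][name], asked
        # Otherwise look for a delegation covering this name.
        zones = [
            zone
            for zone in records.get("NS", {})
            if name == zone or name.endswith("." + zone)
        ]
        if not zones:
            raise LookupError(f"NXDOMAIN: {name}")
        # Prefer the most specific zone the server knows about.
        server = records["NS"][max(zones, key=len)]
        asked.append(server)


if __name__ == "__main__":
    addresses, asked = resolve("blog.example.com")
    print("asked:", " -> ".join(asked))
    print("answer:", addresses)
```

In this picture, registering a domain amounts to the registrar adding the `"example.com": "example-ns"` entry on the `com` server; the A records themselves live on the name server you manage.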
Okay, I guess I can sort of show you. Yeah, I should have just started with this. Okay, so here I'm using Cloudflare. You can see I have some A records over here, and these point to the IP. So I have an unsafe one. These are subdomains. When people reach here, they will already be at openmail.tech, and over here I'm sending them to a subdomain. So if they want to reach that IP, they would have to go to unsafe.openmail.tech. Let's go to that, unsafe.openmail.tech. And yeah, then you're able to reach the server. Some DNS providers like Cloudflare also provide a proxy, and so I've also made a proxied version on safe. The difference is that if you go to the safe one, it looks exactly the same, but now you get this lock. Previously, when I was here, it had this "not secure" warning, which sometimes scares people away from your website. So if you want people to trust your website, you would want to have SSL. The actual difference here is that this one is being called using HTTP, and this one is being called using HTTPS. And so if you're too lazy to set up your own SSL certificates and stuff, sometimes you can do it through the DNS provider, because the DNS provider also does the CDN, and the CDN can help you sign the certificates. And so yeah, let's go. Okay, and so you could also use this to load balance, because you don't have to put only one A record. You can put multiple A records, and then the DNS server will reply with all of those IPs, and your client will choose a random one to query. Actually, this was sort of shown when I tried to query google.com, right? When google.com replied, it didn't just reply with one server. It replied with like six servers.
The reason is load balancing, because if everyone calls your site and all of them call the same server, then your server is going to be overloaded. So you can do load balancing if you have these multiple A records, and every client will basically choose a random one. Using this you can distribute your workload. But you should probably still have a gateway. The reason is that if you use DNS as your load balancer, there's no health check. So if one of these servers is down, out of these six, then one out of six of your users is not going to be able to load the site. If they were lucky and the random choice picked an IP address that was up, then they will get a reply. But if they were unlucky, they would end up on an IP address that isn't alive, and they will end up waiting. There's a timeout, and after that the client will try another IP. But that person will load the website really slowly, because the first time they query it, they actually query the server that is dead. If you use a load balancer service like NGINX or Kong or something, these usually have a health check built in, so if the server stops responding, it won't direct users to that server. Whereas if you did it in DNS and a server dies, the DNS doesn't know about it, and so you don't get the nice benefits of a health check. Okay, and there are a lot of problems with DNS. You can see that DNS is quite slow, right? Because if I want to know about safe.openmail.tech, I have to ask the root server first, and the root server tells me where .tech is. Then tech tells me where openmail is, and then openmail tells me where safe is, and then I finally get the answer. So you can see there's a lot of back and forth. And so in order to make DNS queries faster, DNS servers will usually cache the values.
And so then, for example, if I'm looking for google.com and my DNS server has already seen google.com, it won't start at the root server. It will just reply straight away. Caching makes DNS faster, but it also has problems, because the cache can be wrong, right? If you change, not your domain name, but your IP address, it will take a while for the change to propagate. At the start, when users query the DNS server, they will be getting the wrong response until that cache expires. And there's negative caching as well. Negative caching means that even negative results are cached. If I query a domain that has no IP assigned to it, I'll get the NXDOMAIN reply, meaning there's no IP address, and even this is cached. So afterwards, if I assign an IP address to it, I still have to wait for the cache to expire. So usually, if you're hosting a new server, you should not try to reach its domain until everything is set up. The reason is that if you don't query it, then it won't be cached. But if you've made it cache, then you'll have to wait for the cache to expire. Sometimes there are also security vulnerabilities where people are able to poison the cache. The idea here is that globally, if you asked the root name servers and went through everything, you would get the correct address for google.com. But sometimes people can find where your local DNS is and change that cache to their own malicious IP. So you think you're reaching google.com, but you're actually reaching a different IP address. This is called DNS poisoning. And because of all this weirdness around caching, whenever really big services are down, it's almost always DNS. And so yeah, it's become a joke that it's always DNS.
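The caching behaviour above, including negative caching, can be illustrated with a small Python sketch. It is a toy under stated assumptions: `CachingResolver` and `FakeClock` are invented names, the "authoritative" side is just a dictionary, one TTL is used for both positive and negative answers, and the 203.0.113.x addresses are documentation placeholders.

```python
class FakeClock:
    """A manually advanced clock so the example does not need to sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class CachingResolver:
    def __init__(self, authoritative, ttl_seconds, clock):
        self.authoritative = authoritative  # dict: name -> IP, stands in for the full lookup
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.cache = {}  # name -> (answer or None, expiry time)

    def resolve(self, name):
        cached = self.cache.get(name)
        if cached is not None and self.clock() < cached[1]:
            return cached[0]  # served from cache, even if it is stale or negative
        answer = self.authoritative.get(name)  # None models an NXDOMAIN reply
        self.cache[name] = (answer, self.clock() + self.ttl_seconds)
        return answer


if __name__ == "__main__":
    records = {}
    clock = FakeClock()
    resolver = CachingResolver(records, ttl_seconds=300, clock=clock)

    print(resolver.resolve("new.example.com"))  # queried too early: no record yet
    records["new.example.com"] = "203.0.113.10"
    print(resolver.resolve("new.example.com"))  # still the cached negative answer
    clock.now = 301
    print(resolver.resolve("new.example.com"))  # cache expired, record is visible

    records["new.example.com"] = "203.0.113.20"
    print(resolver.resolve("new.example.com"))  # old IP until the entry expires
```

The second and fourth lookups are the two failure modes from the talk: a name queried before its record existed, and an IP change that has not propagated yet.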
Yeah, and so back when I was working as an intern in MLOps, there was one time, I think it was before Christmas, I really wanted to deploy this service and the application load balancer kept not working. I asked one of the SREs, how do I debug this? Did I configure it wrong? And this was his answer: if your issue is DNS, just wait. And it was sort of true. I waited; one hour later, it was fine. So for all of you who have to deal with DNS and application load balancers, a lot of the time you don't get feedback that fast. When you change something, it can very much take an entire hour to get your feedback. The worst case I think is two days, because I think the highest time to live that can be set for DNS is two days. So yeah, DNS is very hard to debug.
Okay, so we've reached the last topic, and the last topic is firewall rules. This is the idea of least privilege: you want to expose as few ports on your server to as few subjects as possible. The reason is that if people have unnecessary access, this increases your threat surface. This was what I was talking about with the conference Wi-Fi, where we are able to access other devices like this. You probably don't want this for your use case, because most of us connecting to the conference Wi-Fi just want to reach the internet. By giving us this cross-device access, you expose an extra threat surface for everyone who's connected to the conference Wi-Fi, because now you can call into devices. And so you would want to set firewall rules that limit the access that you don't need, so users are only able to access the things they need for their daily work. If they try to do anything outside of that, either they've been hacked or they're trying to do something without the permission of the teams who own the resources. Okay. For Linux, the way that you set firewalls is through this thing called iptables.
If you're a normal developer, especially if your company just uses the cloud, you probably don't need to know this. The reason is that clouds implement firewalls on top of the servers. If you're on AWS, there's this concept of security groups. I can briefly show it here. So I can see my instance. There's the security tab; let's go to the security group. And yeah, the firewall is implemented over here. So if you're on the cloud, the firewall is something separate from the server, and you probably don't need to use iptables. But if you're self-hosting, or you just want to set two layers of firewalls or something, then on the Linux server you can do that using iptables. So I'll briefly cover iptables. You just need to remember four base concepts. I'll start off with chains. The idea is that when your kernel receives a packet, there are hooks inside its routing logic. It will go through all those hooks and make checks and then decide where the packet should go: should I give it to a process? Should I send it out? Should I forward it? Et cetera. These are chains. And these chains are collected into tables. Each of these tables is a collection of chains with similar concerns. There's a filter table that collects all the chains around filtering. There's a network address translation table that collects all the chains around network address translation. And then on top of these chains, you define a policy. The policy is the default behavior of the chain. So by default, when a packet comes in, what should happen? Should it be accepted? Should it be dropped? And then rules define specific behavior of the chain.
So for example, if your default behavior is that things get rejected, but you do want to whitelist servers that you trust so they can call you, then you put that in rules. Because rules are more specific, they have higher precedence than the policy. So if a rule allows it but the policy denies it, it will be allowed. But if a rule denies it and the policy allows it, it will be denied. So the rules always have higher power than the policy. Okay, and so for the most part, unless you're trying to configure your own NAT, the only table that you actually care about is the filter table. The filter table has three chains. You can see when a packet comes in, it goes into either the forward or the input chain. And when local processes are trying to call out, they go through the output chain. Also, inside the filter table, again, unless you're configuring a router, the only two chains you care about are input and output. This lines up really nicely with security groups on cloud providers, because the cloud providers literally only give you these two chains. You can only do input and output. Yeah, no routing. There is routing if you have a transit gateway, but that's another topic. Okay, so let's try that out. There are a few commands here. Oh yeah, but I'm not connected on my device. Yeah, it's a bit hard. So I guess you guys will have to sort of trust me. I'm unable to do this demo now. But iptables is actually really simple. Oh, actually all these commands are kind of wrong, because you have to use sudo in order to run iptables. So just prepend sudo to all of these. The other solution is to just give yourself root, and then you can run all the iptables commands.
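The relationship between a chain's policy and its rules can be modelled in a few lines of Python. This is only a sketch of the decision logic, not of iptables itself: the `Rule` and `Chain` classes are invented for illustration, rules match on source address only, and the sample addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    source: str  # source IP address the rule matches
    target: str  # what to do on a match: "ACCEPT" or "DROP"


@dataclass
class Chain:
    policy: str = "ACCEPT"  # default verdict when no rule matches
    rules: list = field(default_factory=list)

    def set_policy(self, target: str) -> None:
        """Change the default behaviour of the chain."""
        self.policy = target

    def append(self, rule: Rule) -> None:
        """Add a rule to the end of the chain."""
        self.rules.append(rule)

    def decide(self, source: str) -> str:
        """Rules are checked in order; the policy only applies if none match."""
        for rule in self.rules:
            if rule.source == source:
                return rule.target
        return self.policy


if __name__ == "__main__":
    input_chain = Chain()
    print(input_chain.decide("198.51.100.7"))  # no rules yet, default policy applies

    input_chain.set_policy("DROP")  # deny everything by default
    input_chain.append(Rule(source="192.168.10.1", target="ACCEPT"))  # trusted server

    print(input_chain.decide("192.168.10.1"))  # rule matches, so it wins over the policy
    print(input_chain.decide("198.51.100.7"))  # nothing matches, falls through to DROP
```

Because the policy is consulted only after every rule has failed to match, "default deny plus a short allow list" is the natural way to express least privilege in this model.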
Yeah, and so if you want to change the policy for a chain, for example so that by default I deny all inputs, then you would do iptables -P. The -P stands for policy; then you put the name of the chain and then what happens to packets on it. Here you can put DROP or you can put ACCEPT. As for -A, this appends a rule to your chain. Even though I can't show you that traffic gets denied, I guess I can still show you that it appends. Okay, so let's do that. I thought it's kind of boring to just be telling people things, and this is sort of arbitrary. Okay, so let's split it out. Okay, so in order to see the tables, you do, oh, I'll just give myself sudo here. Okay, what was it? Okay, yeah, I want to watch the iptables output. So let's say I do this: iptables -t filter -L, okay? And so now this is what my current iptables looks like for the filter table, and you can see the input, forward, and output chains. Obviously if you have other daemons, or "demons", I'm actually not sure how you pronounce that, sometimes they also write chains to your tables. You can see that if you have Docker, you also have the Docker chains over there. Like I said just now about the Docker port collision things, Docker actually uses iptables. So whenever you set up port forwards in Docker, you actually see them appear here. But anyways, for this demonstration, we're particularly interested in this input chain and also this output chain. So let's try the first one. Right now our input accepts all, right? Anyone who tries to call this server will be able to call it. But what we can do is run iptables -P on our input chain and make it drop now. Okay, so you can see right now it's ACCEPT. I run this again, like I told you, it needs to do the, okay?
And so now you can see that the input chain's policy has changed to DROP. If anyone tries to call this server, their packet will be dropped and the server will not reply. You can also append rules to chains. Let's say there's a specific server that you want to allow. What you do is run iptables -A, which probably stands for append, okay? So on the input chain, let's say I only accept from a source address. Let's say I choose 192.168.10.1, something like that. And for these packets, I will accept them. Okay, is it going to show up? Did I append it to the right one? I'll stop this. Is iptables broken? iptables -t filter, dash. Yeah. Okay, I've been having this problem. I really don't know how to fix this. It worked before, but yeah, oh, okay, it just replies really slowly. Something's wrong with the daemon, maybe. Let me give it more space. Okay, so now you can see the rule. I don't know why it suddenly decided to reply really slowly. But you can see that now you've added a rule over there. Previously there were no rules for the input chain, and now you have a rule, ACCEPT. So if this address, 192.168.10.1, calls you, it will be able to get a response because it won't get blocked. Because again, even though the policy is DROP, the rules have higher precedence, so this ACCEPT will be applied first. Okay, and let's say that we no longer like this IP address, so we want to deny it again. You can delete things from the chain by using -D, the name of the chain, and then the rule number. Okay, so this is the first one. So what we'll do is run iptables -D on our input chain and delete the first one. And so it should disappear. It might take a while to load again. Oh, okay, so deleting is fine. Just adding takes a while. Yeah, I don't know why. Okay, so you can see that now we no longer have rules.
So if anyone tries to call the server, all of them will be dropped. Okay, and if you try to delete indexes that don't exist, I think it complains. Yeah, it says the index of deletion is too big. And okay, cool. So yeah, we covered all six concepts. But like I said, even if you didn't really understand the technical details, it doesn't really matter. What I really wanted to do here is give you an overview of networking and show you that if you want to do it yourself, you can do it too. And so hopefully you guys learned something, and hopefully you guys had fun looking at cats.
So we'll go to a little bit of conclusion, okay? So yeah, like I said, I keep jumping ahead of myself because I don't remember what was in my slides. Maybe I should have practiced this. Okay, so most of the time you only need a high-level understanding of networks. Like I said just now, with network address translation, you pretty much never need to know the details of how it is implemented, but you need to know that it happens, because you need to know its implications. Sometimes you'll be like, I'm on my local network and I can call out to things just fine, so why can't people call me? It's because you're behind a NAT, and if you're behind a NAT, you can call out but people can't call in. And also, as in some of the demonstrations, networks break, and you'll notice that most of the time networks actually break because the requirements change. Not sure about you, but from my experience, whenever your networks break, it's usually not because people have infiltrated your network; it's usually because your requirements have changed. So 70% of networks are actually broken by product managers, maybe 20% by developers, and like 10% by actual hackers.
And so having a foundational understanding of networks is important, because then whenever the requirements change, you can easily see that something might break or that you might need to change the configuration to make sure that things don't start behaving weirdly. This demonstration is also meant to show that you don't need physical equipment to get started. A lot of the time when you go to those networking courses, they start off with switches, Ethernet cables and whatnot, but most of the time you just need a cloud instance and a little bit of money to spawn some of the stuff, spawn some VPCs, and you can try all the networking stuff. Even if you're on your local network like I was just now, if you have Wi-Fi, you can test out networking just by connecting to a local device. So most of the time you can very much do this without needing physical network equipment. And so these are some of the resources you guys can go to. The first one is a YouTube video, iptables Demystified. If you want to know more about iptables, how to do the filtering, how to secure your servers, you can go there. There's also a network address translation one by the same YouTuber. I think most people who are in back-end development already know who this is, Hussein Nasser. He makes a lot of those back-end tutorials. They're really long, but it's usually because he has a lot to say. There's also this magazine I quite like, Networking! ACK! by Julia Evans. That one is sort of a run-through of basic networking concepts, and it's in a magazine format, so maybe it's a little bit more fun. There's also this book, High Performance Browser Networking by Ilya Grigorik. This one is specific to front-end developers, because when you're a front-end developer, you have a very different set of concerns from a back-end developer.
And so most of the time you're thinking about, okay, making HTTP requests, how do I optimize for TCP? How do I make the site load fast? Because I think the metric most people quote is that if your site takes more than three seconds to load, your users are probably gone. So if you're a front-end developer who wants to optimize your website's loading, then this is the book for you. I think it's a free book as well: High Performance Browser Networking by Ilya Grigorik. I think most of the time when people are learning about networking, the hardest part for them is IP addresses, subnets, and CIDR notation. Once you know that, the other stuff sort of clicks into place and it all works out. So I found a blog post for you guys on DigitalOcean by Justin Ellingwood. Personally, I have not actually read it, but I just wanted to include an extra resource in case any of you want to know more about IP addresses, subnets, and CIDR notation. And then the last book is the Linux Bible by Christopher Negus. This is a comprehensive overview of Linux administration, so it has all sorts of things: how the graphics stack works, how to configure servers, and I think three or so chapters on networking. There's one about configuring networking and one about securing networks. And so that's the book, and that's the link. And yeah, that's the end of my talk. I hope you guys enjoyed it. See you guys around. Thank you. Hey, what do we do now? Yeah, I forgot to ask for questions, but I guess there's no question. Nothing happened. Everybody already understands this stuff, but that's good. If everyone knows this stuff, then that would be great. I wish everyone knew this stuff.