Welcome to the FOSDEM 2020 Distributions devroom. Our next talk is "Software distribution: new points of failure in a censored world" with Alexander Patrikov. Thank you. Let me start by talking a bit about myself. I am a freelancer. I work from home. Previously, I worked for a certain company as a software architect, and I'm giving this talk as a software architect. So the audience of this talk is people responsible for any kind of code ecosystem. It is no secret anymore that programming language modules, operating system packages, and all other sorts of code are now distributed from the internet, not from CD-ROMs. And people create new such ecosystems, not every day, but close to it. The purpose of this talk is to give some guidance to avoid a mistake that would result in such an ecosystem eventually becoming usable only in the US and Europe.

We also need to clear up some legal stuff: the technical opinions expressed in this talk are my own; the political opinions, maybe, maybe not. And I don't represent any of the projects mentioned in this presentation.

So let's start with a good interview question for a new developer. What happens if you try to clone a Git repository, install an operating system package, or install something from npm? What happens at the network level? The next question: what could go wrong? And finally: what actually went wrong during recorded history?

Let me also give the answer to the network part. First, the client sends a DNS request to the ISP's DNS server. The DNS server does the name resolution magic by sending more packets to the authoritative name servers. Once the client gets the reply, it initiates the TCP connection. Then higher-level protocols, such as TLS and HTTP, go on the wire. And finally, the package is installed. You see lots of moving parts, so obviously lots of places where something could go wrong. Let's first discuss why this is important. I'll bring an example from China.
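To make the interview answer concrete, here is a minimal Python sketch (not from the talk) that walks the same stages, DNS, TCP, TLS, then HTTP, and reports the first one that fails. The function name `diagnose` and the stage labels are hypothetical; it uses only the standard `socket` and `ssl` modules and is meant as an illustration of where a download can break, not as a complete diagnostic tool.

```python
import socket
import ssl


def diagnose(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Walk the stages of a download from host:port and return the first
    one that fails: "dns", "tcp", "tls" or "http" ("ok" if none fails)."""
    # Stage 1: name resolution; the ISP's resolver does the recursive work.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"

    # Stage 2: TCP handshake with the first resolved address that answers.
    sock = None
    for family, socktype, proto, _, sockaddr in infos:
        try:
            candidate = socket.socket(family, socktype, proto)
        except OSError:
            continue  # e.g. no IPv6 support on this machine
        candidate.settimeout(timeout)
        try:
            candidate.connect(sockaddr)
        except OSError:
            candidate.close()
            continue
        sock = candidate
        break
    if sock is None:
        return "tcp"

    # Stage 3: TLS handshake, including certificate and hostname checks.
    context = ssl.create_default_context()
    try:
        tls = context.wrap_socket(sock, server_hostname=host)
    except OSError:  # ssl.SSLError and timeouts are both OSError subclasses
        sock.close()
        return "tls"

    # Stage 4: a minimal HTTP request; any "HTTP/..." status line counts.
    with tls:
        try:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))
            status_line = tls.recv(64)
        except OSError:
            return "http"
    return "ok" if status_line.startswith(b"HTTP/") else "http"
```

For example, `diagnose("registry.npmjs.org")` on a working network should return `"ok"`, while a TCP reset injected by a middlebox would show up as `"tcp"` and a block page with an invalid certificate as `"tls"`.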
Five years ago, downloading Xcode, which is an Apple tool, was too slow in China, and in some places completely broken. This forced developers in China to get Xcode from unofficial sources. One of those unofficial sources replaced the original Xcode package with a modified one that injected malware into software built with it. So that's how a simple availability problem evolved into a bigger security issue.

Well, as I said, there are many moving parts, and there are many failure modes in the network: broken cables, overloaded networks, misconfigurations. The list is, of course, incomplete. But as we all know, the internet usually works, because resilience and redundancy are built into its infrastructure. And even more importantly, there are humans responsible for fixing whatever is broken.

Now there is a new kind of network failure. Governments do not want their citizens to be able to see certain information. So they pass laws saying that certain kinds of information, for example information about drug abuse, should not be accessible to citizens, and that websites containing such information should be blocked. They create centralized lists of sites to be blocked and distribute those lists to internet service providers, and the ISPs block those sites. The problem is that governments want to restrict such information at all costs. In Russia, this has been happening since 2012. Let's see which sites you cannot access there, or could not access at some point in the past. You see, it's not only sites that contain information on drug abuse. There are sites that distribute software. There are blogs. There are standards documents. There are bug trackers. No law prohibits citizens from seeing such information. Nevertheless, such sites are blocked. Well, on some ISPs, some of these sites are actually accessible.
That's because different ISPs use different blocking technologies. So the sites on the previous slide are not the targets of the censorship. They are victims of what I will call technical overblocking. Let me explain this phenomenon. Why is this site blocked? Because it is not technically possible to let it through without also letting that other site through, and the government explicitly tells ISPs to block that one. If an ISP doesn't block it, it gets its license revoked. So this site is, unfortunately, blocked too. The problem arises if this site is part of your infrastructure.

So how does this happen? ISPs typically block by IP address, because in our age, when everything is encrypted and TLS 1.3 even has encrypted SNI, they do not have much choice. And how do shared IP addresses come about? Mass hosting for static files. Some content delivery networks. DDoS protection services. There are many more examples where a shared IP address is given to a customer. Finally, there is a Telegram robot. Well, that's a subject for another talk, so I will not go into it. This is not specific to Russia and China. I can bring examples from Iran and from Egypt, and given how politicians work, this can only get worse.

So how to deal with this breakage? Often the advice is to use a VPN, Tor, or some other circumvention technology. However, I would not say that it is a politically acceptable answer, because there are people who simply cannot be convinced to use any censorship circumvention technology. Maybe because of propaganda that only bad guys use such tools. Maybe because it is actually illegal in some places. I would also say that it is not a technically good answer. If your servers are blocked, then you have a point of failure in your infrastructure, and in some cases it is actually easy to fix. For technical domains, mirrors help. Mirrors are used by many Linux distributions.
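Going back to the technical overblocking explained above, the mechanism can be modeled in a few lines of Python. This is a toy sketch: the domain names and addresses are made up (taken from the documentation ranges), the function name is hypothetical, and it assumes the ISP blocks every IP address the targeted domain resolves to, which is one common approach but not the only one.

```python
from typing import Dict, Set


def collateral_damage(dns: Dict[str, Set[str]], target: str) -> Dict[str, str]:
    """Given a domain -> IP-address map and the one domain an ISP is ordered
    to block, return every *other* domain hit when the ISP blocks by IP:
    "unreachable" if all its addresses are blocked, "degraded" if only some."""
    blocked_ips = dns[target]
    damage: Dict[str, str] = {}
    for domain, ips in dns.items():
        if domain == target:
            continue
        hit = ips & blocked_ips
        if hit:
            damage[domain] = "unreachable" if hit == ips else "degraded"
    return damage


if __name__ == "__main__":
    # Hypothetical hosting setup: three customers share 192.0.2.10.
    dns = {
        "banned.example": {"192.0.2.10"},
        "docs.example": {"192.0.2.10"},
        "pkgs.example": {"192.0.2.10", "198.51.100.7"},
        "other.example": {"203.0.113.5"},
    }
    print(collateral_damage(dns, "banned.example"))
    # {'docs.example': 'unreachable', 'pkgs.example': 'degraded'}
```

The "degraded" case is the one where redundancy pays off: a client that can fall back to another address still gets through.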
They were not designed for dealing with censorship. They were created to distribute the load, to move it away from the central server, and to make sure that users download packages from a mirror near them, which is usually faster. But they do provide the needed redundancy. What does it look like? In the installer of, for example, Debian, there is a screen where you choose the country where your mirror resides. Then you are presented with a list of mirrors in that country. There is also an option to enter the address of your own mirror, which can be unofficial. Fedora went even further: by default, it auto-detects the fastest mirror, which creates a really great user experience.

So what could go wrong in this setup with mirrors? Remember the slide where I listed the moving parts. They are still there, and everything can still go wrong with any of them. But a failure only affects the selected mirror, and that mirror is not the target of the censor. Actually, there is one official Debian mirror blocked in Russia right now, the Spanish one. Why? I don't know; I could look it up. Still, it is not a problem, because there are more than 300 other mirrors, and Debian is still installable in Russia. There is no single point of failure in the whole ecosystem. That's good; I would call it a perfect situation.

But recently, another solution to the original task of Debian mirrors became popular: making sure that the load is spread over multiple servers and that users download from a nearby server. That solution is content delivery networks. A CDN is a network of mirrors run by someone else. I will describe how this differs from the classical setup with mirrors, using the npm public registry as an example. Let me first describe the apparent CDN benefits. There is a single domain name behind the whole mirror network.
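A client-side take on Fedora-style automatic mirror selection could look roughly like this. It is a hypothetical sketch, not Fedora's actual implementation: it assumes that TCP handshake time is a good enough proxy for "fastest", and it treats any mirror whose probe fails, whether blocked by a censor or simply down, as absent, so one blocked mirror never stops the installation.

```python
import socket
import time
from typing import Callable, List


def tcp_connect_time(host: str, port: int = 443, timeout: float = 3.0) -> float:
    """Return the seconds taken to complete a TCP handshake; raise OSError on failure."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start


def rank_mirrors(mirrors: List[str],
                 probe: Callable[[str], float] = tcp_connect_time) -> List[str]:
    """Return the reachable mirrors, fastest first. A mirror whose probe
    raises OSError (reset, refused, timed out) is dropped from the list."""
    timings = []
    for mirror in mirrors:
        try:
            timings.append((probe(mirror), mirror))
        except OSError:
            continue
    return [mirror for _, mirror in sorted(timings)]
```

The `probe` parameter is there so the measurement can be swapped out, for instance for a small HTTP download, which reflects real throughput better than a handshake does.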
So there is no need for the user to select a mirror manually, which is a great boost in usability. There is also no need to design the security of your system with untrusted mirror operators in mind, because all the mirror servers are operated by a single legal entity. They can even share the same SSL certificate, which is also great from the operational viewpoint.

Let's see how it works. If I try to install an npm package, the npm client resolves registry.npmjs.org, which is the default registry. Then it downloads the package metadata over HTTPS. Then it downloads the package and installs it. Done. Let's see how it looks on the network. Last time I checked, registry.npmjs.org had 12 A records, which are for IPv4 addresses, and 12 AAAA records, which are for IPv6. The IP addresses belong to Cloudflare, which is a major CDN provider. Cloudflare uses anycast, so each of those 12 IP addresses is actually hosted on multiple, geographically distributed servers, and normal internet routing, that is, BGP, ensures that the user really reaches the nearest server and downloads the package from there.

So how does this survive censorship? npm is not blocked in Russia, so I had to simulate it by misconfiguring my router to return TCP reset packets for half of those IP addresses. The result: it was possible to install packages. It was slow because of the retries, or rather the delay between retries, but nothing broke. That's great, especially for a system that was not designed with this use case of circumventing censorship in mind. So why is it slow? Because, as I said, it was designed for a different use case: an overloaded server or an overloaded network, where adding a delay between retries does help. It also helped that I blocked the servers with a simple TCP reset. Not all censorship does that.
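The failover behavior observed with npm, trying other addresses when some of them reset the connection, can be sketched as follows. This is not npm's code: `resolve_all` and `connect_with_failover` are hypothetical helpers built on the standard `socket` module, and, unlike the npm client described in the talk, they do not pause between attempts, on the assumption that a censored address will not recover after a delay anyway.

```python
import socket
from typing import List


def resolve_all(host: str, port: int = 443) -> List[str]:
    """Return every address (from A and AAAA records) the resolver gives
    for host, deduplicated and in resolver order."""
    addresses: List[str] = []
    for *_, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_with_failover(addresses: List[str], port: int = 443,
                          timeout: float = 2.0) -> socket.socket:
    """Try each address in turn and return the first connected socket.
    A reset, refusal or timeout moves on to the next address immediately,
    instead of sleeping between retries."""
    errors = []
    for address in addresses:
        try:
            return socket.create_connection((address, port), timeout=timeout)
        except OSError as exc:
            errors.append(f"{address}: {exc}")
    raise ConnectionError("all addresses failed: " + "; ".join(errors))
```

Typical use would be `connect_with_failover(resolve_all("registry.npmjs.org"))`. Note that this only helps against blocking that makes the connection fail; a block page served over a working connection needs a check at the TLS level as well.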
There are also cases when the censors helpfully try to present a page saying that the site is blocked, and they present it using an invalid SSL certificate. If I simulate that with an invalid SSL certificate, then, of course, npm fails to download and install packages. This is, in theory, fixable by changing the npm code. I'm not asking the npm maintainers to do that, because, again, it is for a different use case. Still, this example demonstrates that client-side failover, as implemented in npm, does a great job of circumventing censorship.

But let's also highlight one more important difference between a traditional mirror setup and a CDN. Let's go to China. An inaccessible registry is actually a common problem in China. If you go to ping.pe, you can ping the registry.npmjs.org server from many places, including China, and you will see that in many cases lots of packets are lost. TCP does not cope well with such heavy packet loss, and so the download fails. So how do Chinese users use npm? The answer is that they don't use the official registry. There are alternative npm registries in China; the two on the slide claim to mirror the official one. However, they are not exact mirrors. In particular, they strip the integrity-checking JSON elements that are in the registry API, so packages installed from there cannot be trusted. Still, Chinese users use them, so it's an incident waiting to happen. I hope that somebody from Taobao or from CNPMJS is listening to this talk over the internet. Could you please fix it? Thank you.

Okay, so we have looked at npm. There is another service that uses a content delivery network, and that is Flatpak. I will use it to demonstrate that not all content delivery networks are equal, and that you should really evaluate the setup. I usually download Flatpaks from Flathub. Flathub uses Fastly as its CDN, and Fastly works via a CNAME, so dl.flathub.org is a CNAME for some shared DNS name in the fastly.net namespace.
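To see why stripping integrity data from registry metadata matters: for most package versions, npm metadata carries a `dist.integrity` field in Subresource Integrity (SRI) format, for example `sha512-` followed by a base64 digest. A minimal verifier, assuming only that format, might look like this. It is an illustration, not npm's own verification code, and it deliberately fails closed, so a mirror that removes the field gets no trust at all.

```python
import base64
import binascii
import hashlib
import hmac
from typing import Optional

# Relative strength of the hash algorithms allowed by the SRI format.
STRENGTH = {"sha256": 1, "sha384": 2, "sha512": 3}


def verify_sri(data: bytes, integrity: Optional[str]) -> bool:
    """Return True if data matches an SRI string such as "sha512-<base64>".

    A missing or unparseable value fails closed: if a mirror strips the
    integrity field, nothing it serves is accepted."""
    candidates = []
    for token in (integrity or "").split():
        algo, _, rest = token.partition("-")
        if algo in STRENGTH:
            candidates.append((algo, rest.split("?", 1)[0]))  # drop "?options"
    if not candidates:
        return False

    # Following the SRI rules, only the strongest algorithm present is checked.
    strongest = max(STRENGTH[algo] for algo, _ in candidates)
    for algo, digest_b64 in candidates:
        if STRENGTH[algo] != strongest:
            continue
        try:
            expected = base64.b64decode(digest_b64, validate=True)
        except binascii.Error:
            continue
        if hmac.compare_digest(hashlib.new(algo, data).digest(), expected):
            return True
    return False
```

The check only protects users if the integrity string comes from a source they trust; if both the metadata and the tarball come from the same altered mirror, the mirror controls both sides of the comparison.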
And that long name resolves to one IPv4 and one IPv6 address. Those addresses differ between clients; that's how Fastly does the geographical spreading. So they rely on DNS, not on any custom routing. For the original purpose of spreading the load, that's a valid solution. But when some of the infrastructure can fall victim to a censor, there are simply too many single points of failure here, and clients have no way to fail over. If the government by accident blocks dl.flathub.org, or that long name in the fastly.net namespace, or the single IPv4 address returned to my client, then I can no longer download packages from Flathub. So don't do that. Such a setup is too easy to block by accident. And this also applies to failures not caused by governments, so think about those too.

So the takeaway from my talk: if you want to implement countermeasures against accidental blocking in your software ecosystem, please add proper redundancy. Please implement client-side failover, because only the client sees the ultimate truth about whether a server works or not. It would also be great if you allowed unofficial mirrors in your ecosystem, because, well, that's what happened with npm anyway: CNPMJS is an unofficial mirror, even though npm does not want mirrors. And because you have to allow unofficial mirrors, you have to design the security model with them in mind. So that's all from me. Are there any questions?

Pepe, please.

As a service provider, how can I check whether I am blocked somewhere else? There is no way to do that; you have to rely on reports from users. How can they reach me? They can still email you, because, for example, the servers distributing packages and the email servers are usually not the same.

Other questions? Is there any difference between the npm packages served by the Chinese npm registry and by the main npm registry? Has anybody checked that? I haven't checked that. Chinese users use it.
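The takeaway, unofficial mirrors plus a security model designed for them, can be sketched like this: take the checksum only from metadata the client already trusts, accept the bytes from any mirror, and fail over on either a network error or a checksum mismatch. The function names, the choice of SHA-256, and the injected `fetch` callable are assumptions made for illustration, not the design of any particular distribution.

```python
import hashlib
import hmac
from typing import Callable, Iterable


class AllMirrorsFailed(Exception):
    """Raised when no mirror delivered bytes with the expected checksum."""


def fetch_from_mirrors(path: str, expected_sha256: str, mirrors: Iterable[str],
                       fetch: Callable[[str], bytes]) -> bytes:
    """Download path from the first mirror that both answers and serves the
    right bytes. expected_sha256 must come from metadata the client already
    trusts (for example, signed by the project), never from the mirror."""
    failures = []
    for base in mirrors:
        url = f"{base.rstrip('/')}/{path.lstrip('/')}"
        try:
            data = fetch(url)
        except OSError as exc:  # reset, timeout, bad certificate, HTTP error...
            failures.append(f"{url}: {exc}")
            continue
        digest = hashlib.sha256(data).hexdigest()
        if hmac.compare_digest(digest, expected_sha256.lower()):
            return data
        failures.append(f"{url}: checksum mismatch")  # altered or broken mirror
    raise AllMirrorsFailed("; ".join(failures))
```

In real use, `fetch` could be something like `lambda url: urllib.request.urlopen(url, timeout=10).read()`; its network, HTTP, and certificate errors surface as `OSError` subclasses, so a block page with an invalid certificate simply moves the client on to the next mirror.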
So I think it would be a good idea to test that, but because of the rather complex API, where each package has its own API endpoint, it would be quite a difficult task. Well, at one point, when I worked for a Chinese company, I told them explicitly not to do that, installed Tor on their server, and told them to use torsocks npm install. Other questions? No questions. So we finished five minutes early.