So good morning, everyone. My name is Anurag Bhatia and I work at Hurricane Electric. Today I'll be talking about routing security. First, a quick introduction to the talk and some background. Hurricane Electric operates a global IP backbone; as per the last count, we operate a network in around 40 countries. A large part of this talk comes from our experience of implementing large-scale route filtering, as well as the challenges associated with it. To start with, ask yourself a question: when you are communicating between machines, say with your server, can you be sure at this point of time that the IP you are communicating with actually belongs to your server? This talk will try to answer that as we proceed. I have divided the talk into four brief parts. The first is the fundamentals of routing, to give a brief understanding of how global routing works before we get to the challenge of filtering. Then comes IRR, the Internet Routing Registry, then some current statistics around it, and finally the future of routing security. So, the fundamentals of global routing. The internet, as we all know, is a network of networks. We are usually taught this in school and college, but not how exactly these networks are stitched together. The logic here is that all these networks are autonomous in themselves: each is an autonomous system with an autonomous system number (ASN), and each can have its own routing policy. As of now, there are around 65,000 autonomous networks in the world, known by their AS numbers. That 65,000 is the number in the IPv4 world; in the IPv6 world you have around 17,000 autonomous networks.
Most of the ISPs you see around you are autonomous networks, say Airtel, ACT or BSNL, and so are most of the content networks, including Google and Facebook. They control their own routing policy and can interconnect as per their own requirements. How these are stitched together, and how you make sure you can reach any location in the world, rests on something called the transit-free zone. That is roughly a list of around 15 ISPs, the tier one networks; I hate to use the term because it's quite abused by marketing guys. These networks peer with each other: each of the around 15 networks peers with the remaining 14 and maintains reasonable circuits with them, and all other networks in the world are essentially their customers, either direct or indirect. So when you say 65,000 autonomous networks, all of them are a direct or indirect customer of one or more of these networks, and that's how reachability across the internet is ensured. Wikipedia has a reasonably decent list of these networks, although what it really signifies is the networks that do not buy IP transit; there could be separate arrangements among them. Some of the networks you may be interested in: from India, it's Tata Communications; there's NTT from Japan; and you have AT&T and Verizon from the US. These are usually super large networks that own submarine assets and connect multiple continents. The tier ones do not play that significant a role in present-day traffic, because a large part of modern traffic flows from a limited set of ASNs, mostly the content networks, towards the eyeballs. So a large part of the traffic does not really go via a tier one, or if it does, it's often just a cache fill, which limits their effect on the traffic side.
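To make the "direct or indirect customer" idea concrete, here is a minimal Python sketch under stated assumptions: the customer-to-provider map, the tier one set and all ASNs (taken from the documentation range) are hypothetical, and real reachability also depends on peering and routing policy, which this ignores. It only checks whether following provider links from an AS eventually reaches a tier one network.

```python
# Hypothetical customer -> providers map (documentation-range ASNs).
PROVIDERS = {
    64500: [64496],         # regional ISP buying transit from a tier one
    64501: [64500],         # small ISP buying from the regional ISP
    64502: [64497, 64500],  # multihomed network
    64503: [],              # network with no upstream at all
}
# Hypothetical transit-free zone: these peer with each other and buy no transit.
TIER_ONE = {64496, 64497, 64498}


def reaches_tier_one(asn, providers=PROVIDERS, tier_one=TIER_ONE):
    """Return True if asn is a tier one or a direct/indirect customer of one."""
    seen = set()
    stack = [asn]
    while stack:
        current = stack.pop()
        if current in tier_one:
            return True
        if current in seen:
            continue  # guard against loops in the provider graph
        seen.add(current)
        stack.extend(providers.get(current, []))
    return False


for asn in sorted(PROVIDERS):
    print(asn, reaches_tier_one(asn))
```

A network like AS64503 in this toy map has no provider chain to the transit-free zone, which is exactly the case the stitching described above is meant to rule out.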
So they play an important role on the stitching side of the internet, not so much on the traffic side. As an example, you may have AS1, AS2 and AS3 connected, all of them speaking BGP, the Border Gateway Protocol, to communicate with each other. They exchange routing tables dynamically, and that's how it works. Another way to understand global routing: in this room, imagine we have a complete blackout and you want to map who is sitting where. A very simple and good approach is to pass around a paper, and everyone writes down their name and who is sitting on either side. You may write "my name is A and B is sitting on my right", and the same thing is done by B, by C and so on. Within a minute you'd have a table of who is sitting where. That's essentially how the internet works at this point of time, and it works well as long as no one is faking who they are. Beyond BGP, the other thing is of course DNS, which does the name-to-address translation. At the core, DNS relies on 13 root DNS servers. These are run by 12 organizations, because one of them runs two of the servers, and most of these organizations are there for historical reasons; this is where the DNS resolution chain starts. There are around 980-plus anycast instances of these 13 root servers, so 13 is just a logical number; the real number of servers is much higher. In India you'll find root servers hosted with NIXI in Delhi, Mumbai and Chennai: K-root in Delhi, F-root in Chennai and I-root in Mumbai. Plus there are a couple of other root instances as well: there's one in Kolkata, if I remember correctly, and another one in Chennai outside NIXI. So there is reasonable resilience in that infrastructure. Now, these 13 root server IPs are hard-coded in DNS resolver software, so you need to make sure that those IPs really are the ones you're speaking to.
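The seating-chart analogy maps onto BGP's path-vector behaviour, and a small simulation shows why it breaks when someone fakes who they are. This is a minimal sketch under stated assumptions: the four-AS line topology and ASNs are hypothetical, and real BGP applies local preference and several other tie-breakers before AS path length, which this toy ignores. Each AS prepends its own ASN when passing a route on, drops paths that already contain its ASN, and keeps the shortest path it hears.

```python
from collections import deque

# Hypothetical line topology: 64496 - 64497 - 64498 - 64499
NEIGHBORS = {
    64496: [64497],
    64497: [64496, 64498],
    64498: [64497, 64499],
    64499: [64498],
}


def propagate(origin, neighbors=NEIGHBORS):
    """Flood one announcement; each AS keeps the shortest AS path it hears."""
    tables = {origin: []}  # the originating AS has an empty AS path
    queue = deque([origin])
    while queue:
        sender = queue.popleft()
        advertised = [sender] + tables[sender]  # sender prepends its own ASN
        for receiver in neighbors.get(sender, []):
            if receiver in advertised:
                continue  # loop prevention: own ASN already in the path
            known = tables.get(receiver)
            if known is None or len(advertised) < len(known):
                tables[receiver] = advertised
                queue.append(receiver)
    return tables


def chosen_origin(asn, real, fake):
    """Origin ASN an AS ends up trusting when two ASes claim the same prefix."""
    candidates = [p for p in (real.get(asn), fake.get(asn)) if p is not None]
    path = min(candidates, key=len)  # shortest path wins; ties keep the real one
    return path[-1] if path else asn  # empty path means the AS originated it


real = propagate(64496)  # legitimate origin
fake = propagate(64499)  # AS64499 falsely claims the same prefix
for asn in sorted(NEIGHBORS):
    print(asn, "->", chosen_origin(asn, real, fake))
```

Nothing in the protocol itself stops AS64499 from making the claim; the ASes closer to it simply believe it, which is why filtering matters.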
If they are not, then you have a problem. So how does trust in BGP work? Let me go back to the example I used for this room. If everyone writes who they are and who is sitting next to them, it works really well, but as soon as any one person fakes it, it becomes a problem. In BGP, again on the fundamentals side, you have options around filtering: networks can filter what they learn and what they do not learn. Filtering can be based on the IP prefix, so you may say "I want to learn this specific prefix only from this one peer and nothing else", or "I do not want to learn this specific thing but will learn everything else". It can be based on the AS number, the autonomous system number I just mentioned, and it can also be based on the AS path. Filtering is much easier when it is implemented near the edge; it becomes a problem as soon as you move towards the core of the internet, and you'll get an idea of what I mean by moving beyond the edge as we proceed. Take this example. You have two networks: AS1, the provider, and the blue AS2, the customer. AS2 says "I have this /24, please route it and accept the announcement", sets up a BGP session with AS1 and starts announcing it. On the AS1 side, all you need is a simple policy that allows this prefix, and it works. It becomes a problem when AS2 adds a customer of its own, say AS3, the green one. Now AS1 also needs to add AS3's information, and on internet scale this quickly adds up into a much trickier problem. If I am filtering a small network operating, say, only in Bangalore with a couple of prefixes, it's very easy.
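The AS1/AS2/AS3 example can be sketched as a per-session prefix filter on the provider side. This is a minimal sketch, assuming hypothetical values: AS64500 stands in for AS2, and the prefixes are documentation ranges. It uses exact prefix matching only; real prefix lists often also permit more-specifics up to a set length.

```python
import ipaddress

# Hypothetical AS1-side policy: prefixes accepted on each customer's BGP session.
ALLOWED = {
    64500: {ipaddress.ip_network("192.0.2.0/24")},  # AS2 in the example
}


def accept(session_asn, prefix, allowed=ALLOWED):
    """Accept an announcement only if it exactly matches the session's list."""
    return ipaddress.ip_network(prefix) in allowed.get(session_asn, set())


print(accept(64500, "192.0.2.0/24"))     # AS2's own prefix
print(accept(64500, "198.51.100.0/24"))  # AS3's prefix, not yet on the list

# AS2 (64500) gains a customer, AS3, with 198.51.100.0/24. AS3's routes arrive
# over the AS2 session, so AS1 has to extend AS2's entry as well.
ALLOWED[64500].add(ipaddress.ip_network("198.51.100.0/24"))
print(accept(64500, "198.51.100.0/24"))
```

Every new downstream customer anywhere below AS2 means another edit on AS1's side, which is the scaling problem the talk describes.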
But what happens if I'm trying to filter someone like Airtel or Tata, who have many, many customers behind them? Here's some data on AS path length. Geoff Huston from APNIC did some research and found that the average AS path length across the internet is around 5.7. What that means is that, on average, a route you see carries an AS path of about 5.7 ASNs. And that's where the challenge lies: when you have AS1, 2, 3, 4 and 5 along the chain, it's very hard for AS1 at one end to know what to accept from AS5. How can you have a system where AS5 communicates reliably, in some digital fashion, to AS1 what it wants to announce, and make sure the filters get updated? That brings us to how to make route filtering work at internet scale, and to the current setup: IRR, the Internet Routing Registries. I'll first give a brief introduction to IRR, and then we'll discuss statistics on how the situation with IRR is at this point of time. So, a quick introduction. These are essentially registries, and this is something that surprises a lot of people: at the BGP layer, the internet is still in a way working like tin cans and string. What you have is these registries, where you say what you want to do, and then you do it. In most cases there is no way for these registries to authenticate you. Anyone can then look at the registries and verify whether what you are doing matches them or not. IRRs use RPSL, the Routing Policy Specification Language, in which you define route objects. A route object contains your prefix, your AS number and some other information, like a description.
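Route objects are plain text in RPSL's "attribute: value" layout, which is part of why anyone can consume them digitally. Below is a minimal, simplified parser sketch; the route object itself is hypothetical (documentation prefix, AS64500, made-up maintainer), and the parser ignores many details of the full RPSL specification. It handles comment lines and continuation lines that start with whitespace or "+".

```python
def parse_rpsl(text):
    """Parse one RPSL object into {attribute: [values]} (simplified)."""
    obj = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip() or line.startswith(("%", "#")):
            continue  # skip blank lines and whois comments
        if line[0] in " \t+" and last_key is not None:
            # continuation line: append to the previous attribute's value
            extra = line[1:].strip() if line[0] == "+" else line.strip()
            obj[last_key][-1] = (obj[last_key][-1] + " " + extra).strip()
            continue
        key, _, value = line.partition(":")
        last_key = key.strip().lower()
        obj.setdefault(last_key, []).append(value.strip())
    return obj


SAMPLE = """\
route:      192.0.2.0/24
descr:      Example customer route
            announced from Bangalore
origin:     AS64500
mnt-by:     MAINT-EXAMPLE
source:     RADB
"""

route = parse_rpsl(SAMPLE)
print(route["route"][0], route["origin"][0], route["descr"][0])
```

Splitting on the first colon only keeps IPv6 values such as `2001:db8::/32` in `route6:` objects intact.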
It also uses something called AS-sets, which, as the name describes, are sets of ASNs; you can use one to define which AS numbers are your customers. That's the brief idea of IRR. Let me show an example (I'm jumping a slide, sorry for that). Here we are querying one of the IRRs, RADB, for one of our prefixes. What you get in reply is the prefix, the origin AS, and a description, which can be just anything. There are certain other fields, including notify, changed, mnt-by (the maintainer) and so on. These route objects are public, for various networks to use digitally to generate filters. Here's an example of an AS-set, for ACT, a local provider here. We are again querying RADB, this time for AS-ACT, and you see the member ASNs: ACT says AS5577, AS131269 and so on are all its customers. You can also see that it's maintained by the maintainer Beam Telecom, which I think is the older name, or probably some acquisition happened in between. So that's how AS-sets look. Who runs these routing registries? Again, this comes from history. There are around 25 IRRs at this point of time, and a lot of them are already dead: the database is still there, but it hasn't changed in years. What stitches them together is just a loose syncing with RADB. RADB is run by Merit Network, a not-for-profit, and it mirrors the remaining 24 IRRs and makes that data available. That's how these IRRs come into the picture. You can also run an IRR on your own: the daemon is an open-source package called IRRd, which you can pick up from GitHub. If you can convince RADB to mirror your database, you practically have an IRR that most networks would be using. RADB is more popular precisely because it mirrors everyone.
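AS-sets can contain other AS-sets, so turning one into a customer list means expanding it recursively, and loops between sets do occur in practice. This is a minimal sketch with hypothetical AS-set names and members (not ACT's real data): members that look like `AS<number>` are collected, anything else is treated as a nested AS-set, and already-visited sets are skipped.

```python
import re

ASN_RE = re.compile(r"^AS(\d+)$", re.IGNORECASE)

# Hypothetical AS-sets, as they might look after parsing their "members:" lines.
AS_SETS = {
    "AS-EXAMPLE": ["AS64500", "AS64501", "AS-EXAMPLE-CUSTOMERS"],
    "AS-EXAMPLE-CUSTOMERS": ["AS64502", "AS-EXAMPLE"],  # loops back to the parent
}


def expand_as_set(name, as_sets=AS_SETS, seen=None):
    """Recursively expand an AS-set into the set of ASNs it contains."""
    seen = set() if seen is None else seen
    if name in seen:
        return set()  # already expanded: stop on loops
    seen.add(name)
    asns = set()
    for member in as_sets.get(name, []):
        match = ASN_RE.match(member)
        if match:
            asns.add(int(match.group(1)))
        else:
            asns |= expand_as_set(member, as_sets, seen)  # nested AS-set
    return asns


print(sorted(expand_as_set("AS-EXAMPLE")))
```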
So it's easier for networks to rely on RADB instead of querying all 25 of them. Besides RADB, NTT also mirrors them. There are IRRs from large networks like NTT, Level 3, Bell Canada and Telstra (which got its IRR through Reach), and so on; most of these are legacy assets inherited as the companies went through acquisitions. Right now it's more popular for networks to use the IRRs run by the RIRs, the Regional Internet Registries, like APNIC or ARIN. In this region, APNIC handles these resources: if you need an IP address or AS number, you go to APNIC. Besides running the registry, these operators also run IRRs, and it's common to use them because they know whom they allocated an IP to. APNIC knows that an IP belongs to you, so it can give you access to create route objects for it, and that's one of the reasons RIR-run IRRs are quite clean and secure. You can specify which IRR you are using with a source prefix: RADB or APNIC, then two colons, then the AS-set. Here's another example, querying RADB for one of the prefixes. At the bottom you'll see the source field, which I think should be visible all the way to the back: in this case, Airtel is using APNIC to maintain its route objects. So how do you generate filters from the IRR? There's a tool called bgpq3. There are multiple tools, because these databases are open and you can write a tool on your own, but bgpq3 is one of the popular open-source ones. You can use it to generate syntax for Cisco and Juniper, and it also supports custom output, like JSON, so you can pretty much define whatever syntax the hardware you use needs. It also supports adding the AS path to the filters, and it supports IPv6 as well. Here's bgpq3 in action: I queried the AS number I use for R&D purposes, for IPv6, and it just generates the filters.
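The formatting step a tool like bgpq3 performs can be sketched in a few lines. This is not bgpq3's code, and its real output format and options differ; it is a hedged illustration that renders a list of prefixes (hypothetical documentation prefixes and a made-up list name) as Cisco IOS-style `prefix-list` lines and a Junos-style `policy-options` stanza.

```python
import ipaddress


def cisco_prefix_list(name, prefixes):
    """Render IOS-style prefix-list lines (ip or ipv6 chosen per prefix)."""
    lines = []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        family = "ipv6" if net.version == 6 else "ip"
        lines.append(f"{family} prefix-list {name} permit {net}")
    return "\n".join(lines)


def juniper_prefix_list(name, prefixes):
    """Render a Junos-style policy-options prefix-list stanza."""
    body = "".join(f"        {ipaddress.ip_network(p)};\n" for p in prefixes)
    return f"policy-options {{\n    prefix-list {name} {{\n{body}    }}\n}}"


# Hypothetical prefixes, standing in for what an IRR query returned.
PREFIXES = ["2001:db8::/32", "2001:db8:1::/48"]
print(cisco_prefix_list("AS64500-IN", PREFIXES))
print(juniper_prefix_list("AS64500-IN", PREFIXES))
```

Passing every prefix through `ipaddress.ip_network` rejects malformed entries before they can reach a router config.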
It tells you that these four prefixes are registered in the IRR and generates Cisco- or Juniper-style syntax for them. What bgpq3 does in the back end is just formatting. You can query an IRR such as RADB directly with the same query, using a specific syntax: an exclamation mark and 6 to say this is IPv6, followed by AS58901. It replies with the prefixes. The whole system is quite old and in a way ugly, and we'll get to that in a few slides; that is also one of the reasons for the challenges. So how well is IRR-based filtering working? Now that we understand that IRR is in place, is the internet secure with respect to the fundamentals of BGP routing? Can you be sure that the IP you're communicating with actually reaches your server, that the IP is not being hijacked, and that the traffic is flowing through the expected path and not through some unexpected country? The answer is: not so well. What we did was look at the entire routing table and match it against the IRR to see how clean the routing table is. That brings us to the statistics for the current setup. There are around 7.5 lakh (750,000) prefixes visible in the routing table, including IPv4 and IPv6; it keeps growing, so the number will be outdated very soon. Of these, around 79.54% of routes have a valid route object. By a valid route object I mean, first, that a route object exists and, second, that it matches the ASN: the provider says "this is my prefix and this is the autonomous system number I want to use to announce it", and the announcement matches. Around 7.73%, almost 58,000 routes, do not have a route object at all, and you may ask how they are even visible in the routing table, which is exactly the challenge.
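The raw `!6AS58901` query mentioned in the talk goes to an IRRd-style server (for RADB, whois.radb.net on port 43), and the reply, as I understand the IRRd protocol, is a status line: `A<length>` followed by the data, then `C` for success, `D` for "key not found", or `F` with an error message. This sketch only parses a canned reply with hypothetical prefixes, so it runs without a network connection.

```python
def parse_irrd_response(raw):
    """Parse an IRRd-style reply ("A<len>", data, "C") into a list of prefixes."""
    first_line, _, rest = raw.partition("\n")
    status = first_line.strip()
    if status in ("C", "D"):
        return []  # success with no data, or key not found
    if status.startswith("F"):
        raise ValueError("query failed: " + status[1:].strip())
    if status.startswith("A"):
        length = int(status[1:])  # byte count of the data that follows
        return rest[:length].split()
    raise ValueError("unexpected response: " + status)


# Canned reply of the kind a "!6AS64500" query might return (hypothetical data).
RAW = "A30\n2001:db8::/32 2001:db8:1::/48\nC\n"
print(parse_irrd_response(RAW))
```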
It has become a bit like a slum: the problem is there, and there's only so much people can do about it at this point of time. Around 96,000 prefixes, 12.73% of the routing table, have a mismatching route object. Some of these are genuine cases where someone migrated a resource from one company to another, or someone runs several organizations. It's also common for large telcos to have different ASNs for different parts of their network, say one AS number for their 4G network and a separate one for their fixed-line network. They may migrate resources between those and miss updating the records. That again adds to the junk, plus there can be real hijacks where someone is doing it intentionally. This reminds me of an interesting story. A senior engineer from Yahoo once told me that their mail delivery was somehow getting affected: a large mail queue was building up for just five to 10 minutes every day, at random times. When they dug further, they eventually found that a couple of spammers were hijacking their IP addresses, briefly announcing their prefixes so they could get TCP sessions working and start transmitting spam. So right now, if you implement strict IRR-based filtering across the internet, you're going to blackhole almost 1.5 lakh routes, or 20% of the routing table, which is a problem. You don't want to do that, because the affected prefixes may belong to the operators directly responsible, but they may equally belong to their customers. You don't want to end up penalizing the customer by blackholing them, which is why you cannot fix this completely with a single click. Here are some statistics specifically for India.
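The three categories behind these numbers (valid, missing, mismatch) and the cost of strict filtering can be sketched as follows. With the talk's figures, strict filtering would drop 7.73% + 12.73% = 20.46% of routes, roughly 58,000 + 96,000 ≈ 1.5 lakh of the 7.5 lakh. The route objects and routing-table entries below are hypothetical, and the sketch only does exact prefix matches, whereas a real analysis has to decide how to treat covering and more-specific route objects.

```python
import ipaddress
from collections import Counter

# Hypothetical IRR data: prefix -> set of origin ASNs with a route object.
ROUTE_OBJECTS = {
    "192.0.2.0/24": {64500},
    "198.51.100.0/24": {64501},
    "2001:db8::/32": {64500},
}


def classify(prefix, origin, route_objects=ROUTE_OBJECTS):
    """'valid', 'mismatch' or 'missing', using exact prefix matches only."""
    registered = route_objects.get(str(ipaddress.ip_network(prefix)))
    if registered is None:
        return "missing"
    return "valid" if origin in registered else "mismatch"


def strict_filter_impact(routes):
    """Count each class and the share a strict IRR filter would drop."""
    counts = Counter(classify(prefix, origin) for prefix, origin in routes)
    dropped = counts["missing"] + counts["mismatch"]
    return counts, dropped / len(routes)


# Hypothetical (prefix, origin ASN) pairs as seen in a routing table.
ROUTES = [
    ("192.0.2.0/24", 64500),     # route object matches the origin
    ("2001:db8::/32", 64500),    # route object matches the origin
    ("198.51.100.0/24", 64502),  # route object exists, different origin
    ("203.0.113.0/24", 64503),   # no route object at all
]
counts, share = strict_filter_impact(ROUTES)
print(dict(counts), share)
```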
I'm not sure you'll be able to see the numbers on the screen, but you can download the slides later; I'll just quickly read them. For Airtel, around 1.7% of the routes have no route object whatsoever and 7.5% have a mismatch. That includes the routes Airtel originates on its own as well as its customers' routes, which cover a large number of Indian networks. With Sify it's around 1.4% mismatch, with Vodafone 4.2%, with Tata 7.8%, and with Jio 0.5%. Jio, I think, is relatively clean because they don't sell wholesale services yet, so they don't have customers running BGP behind their network. A historical network like Tata, which acquired VSNL, has more of this legacy problem: over time it has routed lots of customers who may have mismatching or missing route objects. So this brings us to the question: is there a real-world problem here, or are these just statistics? They show that almost one fifth of the routing table has problems with its route objects, but does it really matter? The answer is that it's actually really bad, and you may have been following the various route leaks and route hijacks that keep happening. Before I get to them, let me briefly introduce both concepts. A route hijack is when someone hijacks your prefix: you have an AS number and an IP prefix that you use, say, in your data center or ISP network, and someone else simply starts announcing it from their own AS number. Someone may be doing that far away from you, and it may not be easy to detect.
You may not have clean route objects, or even if you do, the hijacker's upstream provider may not be filtering, so the problem can go on. The other scenario is a route leak: routes someone learns from you, instead of being kept within their network or handled in the expected manner, get leaked to someone else. Say you operate a data center here and peer with another local network in Bangalore. You expect that network to keep your routes within Bangalore, but it starts leaking them to its upstream, from where they spread further, and you have this problem. Here are a few route leak cases; I can't go through them individually, as there have probably been a few hundred so far and it would take ages. One popular case that got good coverage was the 2008 YouTube hijack by PTCL in Pakistan. At that time PTCL had been ordered to block YouTube, and YouTube was originating a /22 for its prefixes. PTCL put a blackhole route for a /24 within it on their router, which is bad but still okay; they have to follow the regulation. But instead of keeping it restricted to their network, they started originating the /24 from their AS number to their upstream, and their upstream, PCCW Global at that time, wasn't filtering them. PCCW Global carried that /24 announcement far away, all the way to major exchange points and major transit points in Europe and the US, and the leak was heavily visible. So instead of YouTube traffic being dropped only in Pakistan, there was a global outage for YouTube for over an hour, until Google realized that their /22 had been hijacked and someone else was announcing a /24.
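The Bangalore example breaks a widely used export convention, often called the "valley-free" or Gao-Rexford rule, which is not spelled out in the talk: routes learned from customers may be exported to anyone, while routes learned from peers or providers should only be exported to customers. A minimal sketch of that rule, with the relationships from the example:

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor, from our network's point of view."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def may_export(learned_from, send_to):
    """Customer routes go to everyone; peer/provider routes only to customers."""
    if learned_from is Rel.CUSTOMER:
        return True
    return send_to is Rel.CUSTOMER


# The Bangalore example: a route learned from a peer must not go to an upstream.
print(may_export(Rel.PEER, Rel.PROVIDER))  # a leak if this were True
print(may_export(Rel.PEER, Rel.CUSTOMER))
```

Real routers only enforce this if the operator encodes the relationships in policy, for example by tagging routes on import; forgetting that on one session is enough to cause a leak.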
The effect was even worse because in routing you follow the more specific route. If a router learns two routes covering the same destination, a less specific one and a more specific /24, it will always follow the more specific one. PTCL was announcing the more specific, so as soon as anyone sent them traffic they simply dropped it. When Google realized this, they started originating the /24 as well, so that at least whoever peered directly with Google, or had a short AS path to Google, would get to Google. As per the documentation, the problem went on for around two hours in total. Then there was a Google route leak involving Airtel; actually it wasn't exactly Airtel, but Airtel was part of it. The original leak came from Hathway. Hathway peers with Google in India and was learning Google's routes. The arrangement in such peerings is often that Google is not buying from Hathway and Hathway is not buying from Google, so the routes are supposed to stay local: Hathway learns them just so that Google can reach its customers and its customers can reach Google over that link. Google does not expect Hathway to announce its routes globally. Hathway leaked those routes to Airtel, Airtel further leaked them to its upstream, and no one in the chain was filtering, so the announcement carried on and the impact was visible globally. For a large part of that time there was effectively an outage, even where traffic was still flowing, because it was taking a totally unplanned path without enough capacity, and it was choking a large part of these networks. In recent times, since Bitcoin, there have also been more targeted attacks on Bitcoin-related systems.
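Longest-prefix match is what made the hijacked /24 win. This sketch uses hypothetical private prefixes as stand-ins for the legitimate less-specific route and the hijacker's more-specific one; it picks, among all table entries covering a destination, the one with the longest prefix length.

```python
import ipaddress

# Hypothetical routing table: (prefix, where traffic for it ends up).
TABLE = [
    (ipaddress.ip_network("10.0.0.0/22"), "legitimate origin"),  # less specific
    (ipaddress.ip_network("10.0.1.0/24"), "hijacker"),           # more specific
]


def longest_match(destination, table=TABLE):
    """Return the next hop of the most specific prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0].prefixlen)[1]


print(longest_match("10.0.1.10"))  # inside the /24, so the hijacker wins
print(longest_match("10.0.2.10"))  # only the /22 covers it
```

Once the legitimate origin also announces the /24, prefix length no longer decides the contest and ordinary BGP path selection does, which is why only networks close to Google recovered first.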
There were notable cases where some of those systems were targeted, either by announcing the prefixes those systems are hosted on, or by targeting the provider's DNS, or the IPs of the provider's DNS servers. That's again a problem. And then, of course, there was another route leak a while back from China Telecom, which Oracle covered quite well; Oracle owns Dyn, which does this intelligence work, collecting the routing table and analyzing route leaks in quite some detail. All this gives you an idea that it is a real-world problem, not just statistics. At this point of time, whatever IPs you or your provider are using can be hijacked and the traffic diverted. So why is this happening? With IRR in place, why is the problem still there, and why does 20% of the routing table have missing or mismatching route objects, or missing AS-sets? Well, IRR is old, it started long back, and it's not easy to integrate with routers. I mentioned bgpq3; you have lots of similar tools, but they can only generate config. You still need a mechanism to push the config and keep your routers updated. So operators need a system in place that dynamically generates filters and keeps updating them on a regular basis: every 12 hours, or at most every 24 hours, or something of that sort. And there is no direct incentive for operators to do it. If a small operator starts by buying capacity from a provider, and the provider does not ask them to create route objects and just starts routing them, then from that moment they are part of the problem, and there's no incentive left for anyone to fix it because it's already being routed. So very few networks at this point of time do strict filtering beyond the edge.
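The "generate, compare, push, repeat" loop the talk calls for can be sketched as below. `fetch_desired` and `push_changes` are hypothetical hooks standing in for an IRR-based generator (such as a bgpq3 run) and whatever actually configures the router; the sketch only computes what to add and remove so that each cycle pushes a small diff rather than a whole list.

```python
import time

REFRESH_SECONDS = 12 * 60 * 60  # the talk suggests every 12 h, at most 24 h


def plan_update(current, desired):
    """Return (to_remove, to_add) between installed and freshly generated lists."""
    current, desired = set(current), set(desired)
    return sorted(current - desired), sorted(desired - current)


def refresh_once(fetch_desired, push_changes, current):
    """One refresh cycle; fetch_desired/push_changes are hypothetical hooks."""
    desired = set(fetch_desired())  # e.g. output of an IRR-based generator
    to_remove, to_add = plan_update(current, desired)
    if to_remove or to_add:
        push_changes(to_remove, to_add)  # e.g. render and push vendor config
    return desired


def run_forever(fetch_desired, push_changes):
    """Refresh on a fixed interval; the first cycle installs the full list."""
    installed = set()
    while True:
        installed = refresh_once(fetch_desired, push_changes, installed)
        time.sleep(REFRESH_SECONDS)


pushed = []
installed = {"192.0.2.0/24", "198.51.100.0/24"}
installed = refresh_once(
    lambda: ["192.0.2.0/24", "203.0.113.0/24"],
    lambda rm, add: pushed.append((rm, add)),
    installed,
)
print(pushed)
```

In practice an operator would probably also want a sanity check before pushing, since a broken IRR query that returns an empty list would otherwise remove every prefix.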
Filtering at the edge does happen: a large operator like Tata, Airtel or Vodafone would filter customers who have only a certain set of prefixes, five or 10 prefixes. But beyond that, if the customer is a regional operator, it's very hard to filter. Software that speaks to routers and pushes config is a very common thing in the DevOps world, but it's still not that common in the networking world, and the industry overall lags behind. There's still lots of older hardware, with people even using Telnet to access it and pushing config with expect scripts or something of that sort. So the overall ecosystem is still tricky, beyond the IRR problem itself. This gives a depressing picture, so what's the future? What will eventually ensure that the core is not held together by this tin-cans-and-string kind of arrangement? Some recent developments: there are some large networks filtering at this point of time. I missed mentioning NTT in the first list; NTT has IRR-based filtering at this point, I think, and at Hurricane Electric we implemented large-scale IRR-based filtering. It was tricky, and because of the challenges I mentioned, we did have to make exceptions. We have strict IRR-based filtering for most networks announcing fewer than 5,000 prefixes to us at this point of time. For any network announcing more than 5,000 prefixes it's hard for us to filter, because then we are talking about route servers or super large regional networks, and it's just not possible to filter them right now. So the chain goes on: you start small and filter from the edge all the way up to the core. One notable development that will help clean up this junk: Google has announced that it will start IRR-based filtering in September 2019.
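The exception policy described for Hurricane Electric can be expressed as a simple per-session rule. This is a toy sketch, not Hurricane Electric's implementation: it assumes a single threshold of 5,000 announced prefixes, below which a session is strictly filtered against its IRR-derived prefix list and at or above which the announcements are accepted as an exception. The prefixes are hypothetical.

```python
STRICT_FILTER_LIMIT = 5000  # prefix count quoted in the talk


def accepted_routes(announced, irr_prefixes, limit=STRICT_FILTER_LIMIT):
    """Strictly IRR-filter a session unless it is large enough to be an exception."""
    if len(announced) >= limit:
        return list(announced)  # exception: too large to filter strictly today
    allowed = set(irr_prefixes)
    return [prefix for prefix in announced if prefix in allowed]


small_peer = ["192.0.2.0/24", "198.51.100.0/24"]
print(accepted_routes(small_peer, irr_prefixes=["192.0.2.0/24"]))
```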
They announced that at NANOG a while back, and it will help, because while a lot of providers may not be connected to the large tier one operators, they do connect with content players. Content players send a lot of traffic towards these providers, and if content players tell these providers to fix their problem, it's going to work because there is a direct incentive. If Google stops sending them traffic, or stops sending it over the peering link, which is a free interconnection, and pushes it over their transit link instead, it's a problem for them. So they have a good incentive to fix things. It always comes as a surprise to many people that even large networks like Google or Microsoft have not been filtering at scale, again because of these IRR challenges. Then you have RPKI, which has come up over time. RPKI is the Resource Public Key Infrastructure: you're bringing cryptography into routing so that incoming routes can be checked against signed records, and RPKI is much easier to implement with routers than IRR. Here's a quick config example: what you're doing is setting a different local preference for the valid, invalid and not-found cases. So RPKI does give hope. At this point of time only one large network, AT&T, has deployed RPKI filtering; in 2019 they announced that they are rejecting anything that is RPKI-invalid. Many smaller regional networks have also implemented RPKI, and that's again helping to solve part of this problem.
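The valid, invalid and not-found states come from route origin validation as described in RFC 6811: a route is valid if some covering ROA authorizes its origin ASN and its length is within the ROA's maxLength, invalid if it is covered but no ROA matches, and not-found if nothing covers it. This sketch uses hypothetical ROA data, and the local-preference values are made-up placeholders in the spirit of the config on the slide; a network following AT&T's approach would drop invalids instead.

```python
import ipaddress

# Hypothetical validated ROA payloads: (prefix, maxLength, authorized origin ASN).
VRPS = [
    (ipaddress.ip_network("192.0.2.0/24"), 24, 64500),
    (ipaddress.ip_network("2001:db8::/32"), 48, 64500),
]
# Hypothetical local preferences, in the spirit of the config on the slide.
LOCAL_PREF = {"valid": 200, "not-found": 100, "invalid": 50}


def rov_state(prefix, origin, vrps=VRPS):
    """Classify a route as 'valid', 'invalid' or 'not-found' (RFC 6811 style)."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp_prefix, max_length, asn in vrps:
        if route.version != vrp_prefix.version or not route.subnet_of(vrp_prefix):
            continue  # this VRP does not cover the route
        covered = True
        if asn == origin and route.prefixlen <= max_length:
            return "valid"
    return "invalid" if covered else "not-found"


for prefix, origin in [("192.0.2.0/24", 64500), ("192.0.2.0/24", 64511),
                       ("2001:db8:1:1::/64", 64500), ("198.51.100.0/24", 64500)]:
    state = rov_state(prefix, origin)
    print(prefix, origin, state, LOCAL_PREF[state])
```

The version check runs before `subnet_of`, because comparing an IPv4 network with an IPv6 one raises a `TypeError`.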
In the last slide I cover how you can check RPKI as well as IRR status. We maintain a free tool, bgp.he.net: go to that link and it shows the network you are visiting from, or you can enter your own ISP's name, AS number or prefix, and the tool shows icons for whether the IRR check and the RPKI check are valid or invalid. Here's an example for Cloudflare: a green key tells you the RPKI check is valid, and a green tick mark tells you the IRR check is valid. So the routes Cloudflare originates from its ASN match the public registry, and we can also authenticate them based on their signatures. Here's an example for Beam Telecom. There's no RPKI here, so there's no key, but there's a red mark because it's announcing the prefix from an ASN other than the registered one: it's using AS18209 to announce, while the registry has something else written for the prefix. Anyone could actually filter them. You can help contribute to the cleanup. I'll skip the remaining slides, as I'm already out of time; you can read them, and they're essentially about reporting junk entries and helping to solve the problem, plus the references for everything I mentioned during this talk. Thank you.