So there have been a couple of Cloudflare outages in the last week or so. Yesterday, July 2nd, 2019, a Cloudflare outage was caused by a bad software deploy. There is a good post mortem on this, and it's not long because there's not much to talk about. They basically made a mistake in a firewall rule that caused high CPU usage inside of their web application firewall. And this broke them. It caused the CPU to spike to 100% on the machines worldwide, and that 100% spike caused errors. This is just a bad software update. What is way more interesting is the other one, written up as "How Verizon and a BGP Optimizer Knocked Large Parts of the Internet Offline Today." Now, people pointed the finger at Cloudflare because, due to Cloudflare's size, it was noticeable that they were down. But if you work on the IT side like I do and hear customers complaining, you notice that lots of other things were down too, including a lot of things hosted in Amazon. People put it in the cloud because they were told it'll be reliable. And okay, it is reliable. But it's sometimes very noticeable when you have nothing on-prem and everything's in the cloud, and the section of the cloud that you're in goes down. So this starts with the story of Border Gateway Protocol. Border Gateway Protocol is a very complicated thing, but let's try to break it down very simply, like they do here in the Wikipedia article. The Border Gateway Protocol is a standardized exterior gateway protocol designed to exchange routing and reachability information among autonomous systems on the internet. Basically, this is the road map for how we get from one network to another. Let's say a network owned by Comcast, a network owned by Verizon, or a network owned by DigitalOcean. How do we get from here to there? There are always better routes, and you may hear the term OSPF, for Open Shortest Path First, which is a related idea used for routing inside a single network. Say the goal is to get to the Lawrence Systems forums and you're on a Verizon network.
There is a series of networks and steps that your traffic needs to follow, and Border Gateway Protocol is that map. Now this map gets updated constantly. It changes all the time because, well, there are always systems moving around, IP blocks are bought or sold by companies, et cetera, et cetera. So it's a really important, critical protocol. The fun part, and where everything goes wrong with BGP, is that it's an old protocol based on trust. We trust that Verizon will look at the routes sent to them before they advertise them back out as, yeah, this is the best route across our network or across any network, and that they publish them properly. Any of these companies should be doing that. So let me give you a couple of visualizations here. This is what Verizon looks like. This is AS701. Everyone's assigned an AS number, which is how you understand how the autonomous systems are connected. And I'll leave links to this. It's kind of fun to visualize. This is just the Verizon network. This is all the little pieces and nodes that are in there. And if it looks complex, it is. So as much as I said BGP is a road map, these are what the roads look like between all the machines. This is just inside Verizon's network. This is the internet. And when you look at the number of routes on the internet, you're like, wow, the interconnections are amazing. And you can pick any one of these in here. They all represent different companies, different places that have routes. Let's just pick one. Here is the pathing for this little section here. It goes to a bigger path, which gets here. And these are how those hops go out. So you hop from system to system and get to where you want to go. It's fun to play with, and now you get a better idea of the scale and scope of what we're dealing with.
So Verizon, being a fairly large piece of this network, has a big part to play, because a lot of data transits across Verizon to get to other networks. Here is the little company and their little network. I say little, it's still way bigger than my network, but in concept it's not too many routes that they own, and this is where the mistakes were made. An internet service provider in Pennsylvania, DQE Communications, was using a BGP optimizer in their network, which meant there were a lot of more specific routes in their network. Specific routes override more general routes, and they have a whole Waze analogy about finding the most efficient path to something. Well, DQE announced these very specific routes to their customer, Allegheny Technologies, and all of that routing information was then sent to their other transit provider, Verizon, who proceeded to tell the entire internet about these better routes. Basically, picture someone poisoning the map. They just put the wrong map in. Oh yeah, the fastest way between these two points is through our network, is essentially what they said. This was massive, and I found this really cool BGP replay where we can look at what they changed. Because of the way BGP works, routes are announced and then logged, and all these changes have to be logged because they have to be pushed to routers. So this is like a replay where we can see what happened at that time. Here's what their network looked like at 9 a.m. Here were all those route optimizations. I tried rendering it and it actually locks up my browser. When I wanted to see how these route changes were then pushed up to Verizon, it kept hanging on the rendering because there were so many, and that's where the problem came in. So Verizon, they just completely, I don't know how to put it. It's human error at the next level. Human error at scale is what happened here. How could this leak have been prevented?
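As an aside on the point that specific routes override more general routes, here is a minimal Python sketch of longest-prefix matching. The routing table, the labels, and the `best_route` function are all made up for illustration, and the addresses come from the reserved documentation ranges, not from the actual leak.

```python
import ipaddress

# Hypothetical routing table (illustrative only): prefix -> label.
# The /25 is a more specific slice of the /24, like the routes an
# optimizer might generate.
ROUTES = {
    ipaddress.ip_network("198.51.100.0/24"): "legitimate path",
    ipaddress.ip_network("198.51.100.0/25"): "leaked more-specific",
}

def best_route(destination, routes=ROUTES):
    """Return (prefix, label) for the longest prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    # The most specific prefix (largest prefix length) wins.
    winner = max(matches, key=lambda net: net.prefixlen)
    return winner, routes[winner]

if __name__ == "__main__":
    for dest in ("198.51.100.10", "198.51.100.200", "192.0.2.1"):
        print(dest, "->", best_route(dest))
```

An address in the lower half of the /24 follows the leaked /25 even though the legitimate /24 is still in the table, which is roughly why a flood of more specific routes can pull traffic away from where it should go.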
Well, there were multiple ways this leak could have been avoided. A BGP session can be configured with a hard limit on the number of prefixes to be received. Basically, when this tiny little company made the announcement, Verizon went, oh, the internet routes better through your network? Sure. No verification, no limit. They accepted thousands of very specific routes. And like I said, I'll leave the whole link so you can read the details if you want to get real nerdy into this, but essentially DQE said, we have very specific routes that are the most efficient way to get to where you want to go on the internet, and a percentage of the internet then routed through Pittsburgh, and it was just a big mess. So Verizon basically did everything wrong. Verizon did not validate these. They did not say, wait, why are you all of a sudden claiming that the internet would be faster routed through you? They did not stop the whole thing based on a rate limit, as in, this many changes are coming across and they affect a much bigger part of the internet than this specific company. Everything that could have been done was not done. No failsafes in place. To go a step further, Verizon wasn't answering the phone. That would be a good way to describe it. So they didn't validate, they didn't rate limit, and they also didn't answer the phone to fix it. So everyone was affected. Cloudflare was directly impacted, but as I said, many other companies were too. Linode, Amazon, lots of these hosting companies, and people who keep their infrastructure in one of these hosting companies were taken down by this as well. So a lot of people thought these last couple of internet outages were some type of attack. They weren't. They were just human error at scale. One was caused by, oops, we had a firewall rule that pegged our CPU at 100%, and the other by Verizon, and for that one there's just no excuse.
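To picture the hard limit on received prefixes mentioned above, here is a rough Python sketch. It is not how any real router implements it; the `BgpSession` class, its fields, and the limit of 3 are all hypothetical, and the only behavior modeled is that the session shuts itself down once the peer sends more prefixes than expected.

```python
class PrefixLimitExceeded(Exception):
    """Raised when a peer announces more prefixes than the configured limit."""

class BgpSession:
    # Hypothetical model of one BGP neighbor, for illustration only.
    def __init__(self, peer_name, max_prefixes):
        self.peer_name = peer_name
        self.max_prefixes = max_prefixes
        self.prefixes = set()
        self.is_up = True

    def receive_announcement(self, prefix):
        if not self.is_up:
            raise PrefixLimitExceeded(f"session with {self.peer_name} is down")
        is_new = prefix not in self.prefixes
        if is_new and len(self.prefixes) >= self.max_prefixes:
            # Fail safe: drop everything learned from this peer and stop listening.
            self.is_up = False
            self.prefixes.clear()
            raise PrefixLimitExceeded(
                f"{self.peer_name} exceeded {self.max_prefixes} prefixes"
            )
        self.prefixes.add(prefix)

if __name__ == "__main__":
    session = BgpSession("small-customer", max_prefixes=3)
    try:
        for i in range(10):
            session.receive_announcement(f"198.51.100.{i * 16}/28")
    except PrefixLimitExceeded as err:
        print("shut down:", err)
```

The idea is that a small customer who normally sends a handful of routes should never suddenly send thousands, so the limit turns a flood into a dropped session instead of a poisoned map.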
You can't even play devil's advocate and defend Verizon's behavior here. Why did you accept all this, you know? You might expect this type of behavior from a very small company that says, we didn't know any better, we're a small company. But Verizon, for better or worse, owns a major piece of the internet backbone, so tons of networks traverse their network and have to listen to their BGP announcements, because it's a big component of how this all works together. The other flaw is the fact that BGP is still, generally speaking, based on trust. It's a matter of time before more people abuse it. There have been small incidents in the past where people have abused BGP to announce routes to try to do nefarious things, but most of the BGP outages, and I'll leave links to a couple of other times I've talked about this, have been just massive amounts of human error, and here it was human error at scale due to the scale and size of Verizon. So I'll leave links so you can read about all this if you want to dig in further and play with the visualization. It's kind of fun to look at and get an idea of the scale and scope of the internet, which is just outstanding and amazing, and it's fascinating that it all works at the same time. I have several friends that work directly with BGP routes and stuff like that. They're always fun and interesting to talk to, because they say just that. They facepalm about what they deal with in their job. It's like, you don't know how bad this is. We have to stop stupid all the time. Never let the new guy do BGP. It just goes very wrong. And the same goes for using an optimizer to try to do it. We do need more automation tools to optimize these routes, but they still have to be vetted, because the stakes are high. The internet's become an integral part of how we do business, how we transact things online, and how we talk to each other online. So when this all goes wrong, it affects all of us in a very bad way.
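On the point about trust and vetting, one simple way to picture validating what a neighbor sends is an allowlist of the prefixes that neighbor is expected to announce. The Python sketch below is only an assumption-laden illustration: the `EXPECTED` list, the maximum lengths, and the `is_acceptable` function are hypothetical and are not taken from any provider's actual filtering.

```python
import ipaddress

# Hypothetical allowlist for one customer: (covering prefix, longest
# prefix length we are willing to accept inside it).
EXPECTED = [
    (ipaddress.ip_network("198.51.100.0/24"), 24),
]

def is_acceptable(announced, expected=EXPECTED):
    """Accept an announcement only if it sits inside an expected prefix
    and is not more specific than the allowed maximum length."""
    net = ipaddress.ip_network(announced)
    for covering, max_len in expected:
        if (
            net.version == covering.version
            and net.subnet_of(covering)
            and net.prefixlen <= max_len
        ):
            return True
    return False

if __name__ == "__main__":
    for prefix in ("198.51.100.0/24", "198.51.100.0/25", "203.0.113.0/24"):
        print(prefix, "accepted" if is_acceptable(prefix) else "rejected")
```

With a check like this, both an unexpectedly specific route and a route for address space the customer was never expected to announce get rejected instead of being passed along.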
All right, I'll leave links to all this so you can do some more reading. It's a fun topic to dive into, so have fun with it. Thanks for watching. If you liked this video, give it a thumbs up. If you want to subscribe to this channel to see more content, hit that subscribe button and the bell icon and maybe YouTube will send you a notice when we post. If you want to hire us for a project that you've seen or discussed in this video, head over to LawrenceSystems.com, where we offer both business IT services and consulting services and are excited to help you with whatever project you want to throw at us. Also, if you want to carry on the discussion further, head over to Forums.LawrenceSystems.com, where we can keep the conversation going. And if you want to help the channel out in other ways, we offer affiliate links below, which offer discounts for you and a small cut for us that does help fund this channel. Once again, thanks for watching this video, and see you next time.