So my name is Martin Holst Swende. I work with security for the Ethereum Foundation, and I also work on the Geth client. I'm going to talk today about things that have happened during the last three years: I became the security guy basically at DevCon in Shanghai, and we're now in Osaka. And I'm going to talk about the base layer, the infrastructure of Ethereum.

So what is the Ethereum infrastructure, what are the building blocks of Ethereum? Ethereum is a blockchain, and one of the things that can be attacked, or can be a vulnerability, is block processing. A client gets blocks, and a block consists of headers and transactions and some other things, and these need to be validated according to a very particular set of rules. They can't be validated too much, and they can't be validated too little, so the rules are very intricate. And it's extremely consensus critical, because if clients do it differently, we have an immediate chain split. (I'll show a small sketch of what such checks look like in a moment.)

Then there's the EVM, which I think is the one thing that everyone knows exists in Ethereum. This is where the smart contracts operate. These smart contracts execute on top of this thing called the state, the global state of where everything is right now, which is basically the accumulated result of all the transactions in the history of Ethereum. Also highly consensus critical. And aside from that, there can be denial-of-service attacks that cause slow blocks, which is what happened during the so-called Shanghai attacks.

We have networking protocols. One of them is the ETH protocol. It's not actually a networking protocol; it's the Ethereum-specific application protocol that two nodes use to communicate with each other. They can request blocks, they can announce new blocks, et cetera. And this protocol contains the building blocks that clients use for their syncs. There are neighboring protocols too: LES for light clients, and Parity's own custom warp sync. This piece is not consensus critical per se, but of course it's very important that it works, because otherwise we can have problems getting nodes to sync. And there can be denial of service, there can be lost blocks and large reorgs, and generally bad functioning of Ethereum.

The ETH protocol operates on top of the peer-to-peer protocol, the P2P protocol, which sits on top of TCP. It has just some basic guarantees and verifies node identities. So it's not consensus critical, but it's the backbone for the upper-layer peer-to-peer protocols: if it were to fail, you'd have the same failures as with the ETH protocol.

Ethereum peer-to-peer networking is obviously built on the premise that I, as a node, can find another node on the Internet, which is not a trivial task. For that we have the discovery protocol, which is currently in version 4, with a version 5 being worked on. It uses a Kademlia-style distributed hash table to announce presence, and using this DHT, nodes can hone in on each other according to a distance metric. The distance has nothing to do with geographic location or network distance; there's a sketch of the metric below. The discovery protocol is not based on TCP but on UDP, so it has no built-in guarantees from the lower layers against spoofing; it's trivial to spoof IPs. And this protocol can be susceptible to a couple of different types of vulnerabilities. One of them is eclipse attacks, where a node is partitioned off from the rest of the network and only communicates with nodes controlled by the attacker.
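Going back to block validation for a second, here's roughly the shape such rules take. This is a minimal illustrative sketch in Go, not Geth's actual validation code; the `Header` fields, the particular checks and the 15-second clock-drift allowance are simplified assumptions.

```go
// Illustrative sketch of header validation rules. Not Geth's real code;
// fields and limits are simplified assumptions.
package main

import (
	"errors"
	"fmt"
	"time"
)

type Header struct {
	Number   uint64
	Time     uint64 // unix timestamp
	GasLimit uint64
	GasUsed  uint64
}

const allowedFutureSeconds = 15 // hypothetical clock-drift allowance

func validateHeader(parent, h *Header) error {
	if h.Number != parent.Number+1 {
		return errors.New("non-contiguous block number")
	}
	if h.Time <= parent.Time {
		return errors.New("timestamp not after parent")
	}
	if h.Time > uint64(time.Now().Unix())+allowedFutureSeconds {
		return errors.New("block too far in the future")
	}
	if h.GasUsed > h.GasLimit {
		return errors.New("gas used exceeds gas limit")
	}
	return nil
}

func main() {
	parent := &Header{Number: 1, Time: 1000, GasLimit: 8_000_000}
	child := &Header{Number: 2, Time: 1015, GasLimit: 8_000_000, GasUsed: 21000}
	fmt.Println(validateHeader(parent, child)) // <nil>: passes all checks
}
```

The point of the sketch is the fragility: every client must apply exactly these checks, no more and no fewer, or the chain splits.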
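And to make the distance metric concrete, here's a minimal sketch of the Kademlia-style XOR log-distance that discovery uses, shortened to 8-byte IDs for readability (real node IDs are much longer); the names are mine, not the discv4 implementation's.

```go
// Sketch of the Kademlia-style distance metric: distance(a, b) = a XOR b,
// bucketed by the position of the highest differing bit (the "log
// distance"). Simplified to 8-byte IDs; illustrative only.
package main

import (
	"fmt"
	"math/bits"
)

type NodeID [8]byte

// logDist returns 0 for identical IDs, otherwise the bit length of
// a XOR b, i.e. the position of the highest differing bit.
func logDist(a, b NodeID) int {
	for i := 0; i < len(a); i++ {
		x := a[i] ^ b[i]
		if x != 0 {
			return (len(a)-i)*8 - bits.LeadingZeros8(x)
		}
	}
	return 0
}

func main() {
	a := NodeID{0x80} // differs from b in the very first bit
	b := NodeID{0x00}
	fmt.Println(logDist(a, b)) // 64: maximum distance for 8-byte IDs
}
```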
The discovery protocol can also be used to do distributed denial-of-service attacks against third parties that have nothing to do with Ethereum, and amplification attacks, which are really a kind of DDoS. An amplification attack is an attack where I have a small resource, say a small amount of network bandwidth, and I can send my N bytes out to the network, but the effect wreaked on the victim is amplified by some factor.

So basically, this is one way to divide up the attack surface. I omitted some parts; there are also the sync protocols, but there were only four boxes in this slide template, so I had to divide it up like this. I'm going to go through it from the bottom up.

During 2018 and 2019 there was actually quite a lot of research into the discovery protocol. Two papers about eclipse attacks were sent to us before they were released to the general public, some people also reported amplification-attack vulnerabilities via the bug bounty, and Felix and Frank on the Geth team found even more issues.

One of the eclipse attacks found in the first paper: Geth dedicated half of its connection slots to incoming and half to outgoing connections. However, if it hadn't yet filled all the outgoing slots, it allowed incoming connections to fill them up. So an attacker could simply overwhelm a Geth node with incoming connections and own all of the victim's connections (the slot accounting is sketched below). They also found other ways to fill all the slots, by pinging the node and polluting its identity database. And the third attack they found: if you manage to disturb a node's clock, other peers will reject him and forget about him, he will reject the honest peers, and only the malicious peers will accept him.

As for amplification attacks, here's one example that Geth was susceptible to. A malicious node sends a spoofed ping to the target, with the victim's IP as the source address. The target reaches out to the supposed sender, the actual victim, with a new ping. The malicious node then sends a pong pretending to have seen that outgoing ping. It's supposed to supply a reply token, the hash of the ping it is answering, but Geth didn't actually validate the reply token. At that point, the target believed the malicious node resided at the victim's IP address, say 1.2.3.4. And if the malicious node then sends a so-called findnode request, which is only 213 bytes or so, it causes a findnode response, called neighbors, of about 1.5 kilobytes to be sent to the victim. That's an amplification factor of about 7, which is not trivial. More complex examples could be constructed too, where you set up the bonding with two nodes and then have one of them lie about the location of the first one; the first one then does a spoofed findnode request from the IP address that was lied about, causing the same effect.
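To make the missing check concrete, here's a minimal sketch of validating pong reply tokens, with a hypothetical `sendPing`/`handlePong` pair and SHA-256 standing in for whatever hash the real protocol uses; it's illustrative, not Geth's discv4 code.

```go
// Sketch of the missing check from the amplification example: when a
// pong comes back, verify its reply token echoes the hash of a ping
// we actually sent. Names and hash choice are illustrative.
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// pending maps remote endpoint -> hash of the ping we sent it.
var pending = map[string][]byte{}

func sendPing(endpoint string, packet []byte) {
	h := sha256.Sum256(packet)
	pending[endpoint] = h[:]
	// ... actually transmit the packet over UDP ...
}

func handlePong(endpoint string, replyTok []byte) error {
	want, ok := pending[endpoint]
	if !ok || !bytes.Equal(replyTok, want) {
		// Without this check, anyone who can spoof UDP source addresses
		// can complete the handshake and redirect findnode traffic.
		return errors.New("pong with unsolicited or wrong reply token")
	}
	delete(pending, endpoint)
	return nil
}

func main() {
	sendPing("10.0.0.1:30303", []byte("ping-payload"))
	fmt.Println(handlePong("10.0.0.1:30303", []byte("bogus"))) // rejected
}
```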
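And going back to the connection-slot eclipse, here's a sketch of the kind of fix that closes it: reserve a fraction of the peer slots for our own outbound dials, no matter how many inbound connections arrive. The `peerSet` type and the numbers are illustrative assumptions, not Geth's real accounting.

```go
// Sketch: never let inbound connections consume the slots reserved for
// outbound dials, even before we've dialed out. Simplified accounting.
package main

import (
	"errors"
	"fmt"
)

type peerSet struct {
	maxPeers  int
	dialRatio int // 1/dialRatio of slots reserved for outbound
	inbound   int
	outbound  int
}

func (ps *peerSet) addInbound() error {
	reserved := ps.maxPeers / ps.dialRatio
	if ps.inbound >= ps.maxPeers-reserved {
		// An attacker flooding us with inbound connections can fill at
		// most maxPeers-reserved slots; we always keep some for dialing.
		return errors.New("too many inbound peers")
	}
	ps.inbound++
	return nil
}

func main() {
	ps := &peerSet{maxPeers: 50, dialRatio: 3} // 16 slots stay outbound-only
	var err error
	for err == nil {
		err = ps.addInbound()
	}
	fmt.Println(ps.inbound, err) // 34 too many inbound peers
}
```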
Moving up the network stack to the ETH peer-to-peer layer: in the beginning of 2017 we saw that network traffic was really high, there was so much noise, and we started analyzing and looking into what caused it. We found that during about two hours, a node would typically reject 165,000 transactions as invalid; that's 19 invalid transactions every second, just streaming into the client. The problems were that one of the clients, Parity, didn't remember whom it had sent transactions to, and another issue was that it didn't actually garbage-collect invalid transactions as the state progressed over blocks. So if transactions were included, they were removed, but if they became invalid because of the progression of blocks, they were not removed, and they were just constantly being re-sent. After we looked into that and ironed out those issues, you could see that the graphs on the monitoring nodes we had put up started leveling out, and the incoming traffic on a node went down by a factor of 20 to 30 between January and March 2017. A small sketch of the two fixes follows below.

Also at this protocol level there are the sync protocols, as I mentioned. Earlier this year, in April, we saw a new kind of attack which we hadn't seen before: a tarpit attack on the fast-sync protocol. It was a griefing attack. As soon as a node joined the network, the attacker, who was scanning the DHT tables for new nodes, would instantly connect to that node and tell him: hey, I have the blockchain here, I'm fully synced at block five. And the victim, having no other source of truth, would tell him: yes, please send me over the state and the blocks. That of course happens very quickly, since it's a very small chain, and the victim would then be fast-synced to block five or so, and after that switch over to block-by-block sync, effectively doing a full sync of the remaining 8.5 million blocks. It was an interesting attack because it in no way benefits the attacker: he needs to actively run nodes and run custom software to scan the DHT and do this attack, and gains absolutely nothing other than watching the Ethereum world burn a bit. One mitigation is sketched after this section.

Another bug bounty report, from Juno IM, was a vulnerability in the protocol handling in Geth, where someone could query state entries with a kind of null request, requesting the trie meta root, and instantly crash the peer. These kinds of bugs are dangerous: an attacker who has one can iteratively shut down the Ethereum network, by mapping nodes out of the DHT, connecting to them, and shutting them down one by one, which can be really, really bad.

As for block processing: in 2018 there was a live consensus flaw on Ropsten, which is one of the testnets. What happened was that someone had sent one of these EIP-86 transactions on Ropsten. Now, EIP-86 was an EIP that proposed a new transaction format for account abstraction. It was never actually rolled out, never included in a hard fork, but Parity had implemented it and mistakenly enabled it in the client, and not only for testnets. So this was a real critical issue: someone had found the flaw and exploited the testnet, and we were just waiting for it to hit mainnet. Fortunately, it did not, and Parity quickly released a fix. Eventually we could see that there was actually no attacker per se; it was basically researchers playing around a bit with these kinds of transactions. So not all incidents are caused by malicious intent.
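To make those two transaction-relay fixes concrete, here's a minimal sketch: track per peer which transactions it has already seen, and demote pool entries that a new block has made invalid. The types and the nonce-based invalidation rule are simplified assumptions, not Parity's or Geth's actual pool code.

```go
// Sketch of (a) remembering what each peer has seen, so nothing is
// re-announced, and (b) dropping pool entries a new block invalidated.
package main

import "fmt"

type txHash string

type peer struct {
	name string
	seen map[txHash]bool
}

func (p *peer) send(tx txHash) {
	if p.seen[tx] {
		return // never re-announce: this was the missing bookkeeping
	}
	p.seen[tx] = true
	fmt.Println(p.name, "<-", tx)
}

// demote removes pool transactions whose nonce a new block made stale.
func demote(pool map[txHash]uint64, accountNonce uint64) {
	for h, nonce := range pool {
		if nonce < accountNonce {
			delete(pool, h) // invalid now: stop keeping and relaying it
		}
	}
}

func main() {
	p := &peer{name: "peer1", seen: map[txHash]bool{}}
	p.send("0xabc")
	p.send("0xabc") // suppressed the second time

	pool := map[txHash]uint64{"0xabc": 1, "0xdef": 5}
	demote(pool, 3)        // account nonce advanced to 3 in the latest block
	fmt.Println(len(pool)) // 1: only the nonce-5 transaction remains
}
```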
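And for the tarpit attack, one possible mitigation is simply not trusting the first peer that shows up: wait until several peers have been heard from, and pick the sync target by advertised total difficulty. To be clear, this is my illustrative sketch of the general idea, not how any particular client actually implements fast-sync pivot selection.

```go
// Sketch of sync-target selection that a single lying peer can't steer.
package main

import "fmt"

type peerHead struct {
	id     string
	number uint64 // advertised head block number
	td     uint64 // advertised total difficulty
}

// pickSyncTarget returns the peer with the highest advertised total
// difficulty, but only once a minimum number of peers has been heard
// from, so one liar can't make us "fast sync to block five".
func pickSyncTarget(peers []peerHead, minPeers int) (peerHead, bool) {
	if len(peers) < minPeers {
		return peerHead{}, false // keep waiting for more peers
	}
	best := peers[0]
	for _, p := range peers[1:] {
		if p.td > best.td {
			best = p
		}
	}
	return best, true
}

func main() {
	peers := []peerHead{
		{"attacker", 5, 10}, // claims a tiny 5-block chain
		{"honest-1", 8_500_000, 1e9},
		{"honest-2", 8_500_001, 1e9 + 5},
	}
	if t, ok := pickSyncTarget(peers, 3); ok {
		fmt.Println("syncing to", t.id, "at block", t.number)
	}
}
```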
And in the block-processing area there was also, in March, a very long-lived consensus bug, maybe the most long-lived consensus bug we've had, found in the block-processing engine, where both Geth and Parity got it wrong. So blocks have timestamps. A normal block cannot be too far in the future, because the client won't import it. However, an orphan block, an ommer, has no restriction on its timestamp: it can be as far in the future as it wants. And if the timestamp exceeded uint64, Geth would just wrap it around, and Parity would cap it at the maximum. So both would be incorrect, they would calculate different block hashes, and it would cause a consensus failure, with the quirk that neither of them would be correct. This would be kind of hard to exploit, though, because you would need to mine both the uncle and the including block within a period of seven blocks, and then roll it out to mainnet. So you would need quite a lot of mining power to actually do this kind of attack. There's a small worked example of the arithmetic after this section.

I'm going to have to skip a bit on the EVM portion; most of you might know some of these. The Shanghai attacks, which I mentioned, caused quite a bit of trouble for us. During about a month there were denial-of-service attacks, because of, first of all, suboptimal client implementations, and secondly, gas mispricings. Two hard forks followed from that. And there was a mainnet chain split because of how the reversion was implemented. We know how that chain split happened, but we haven't really figured out a good rule to explain it that we could put into the yellow paper. Right now the situation is that all the clients have particular RIPEMD special-case rules. You can see here a bit of code from the go-ethereum code base and a bit of code from the Aleth code base; they basically just special-case this. Don't look at it, it's ugly. We just know that, fortunately, this exact situation will never again arise, because the conditions that would enable it can never again happen on mainnet.

In January, I actually had to remove this slide, because one of the clients was still vulnerable as of last Friday. In February 2017 there was a really interesting consensus bug, a combo bug that could either be used to cause an instant chain split or be used to crash all the Geth nodes. The stack requirements were misconfigured for the SWAP and BALANCE opcodes. The thing is, if the stack requirements are off, and I'm talking about the EVM stack here, not the native call stack, an underflow would make Geth immediately panic, and all the Geth nodes processing such a block would immediately drop off the network. A stack overflow would instead lead to a consensus issue, where Geth had a different state root than Parity, and you would split the chain. So any attacker who found this could just choose which route to go. And it would have been trivial to exploit, and trivial to detect if it had been exploited naively. What we did was refactor the entire code that configures the EVM opcodes, and two weeks later we reversed the whole refactoring but left the fix in. The bounds check itself is sketched below.

In 2017, some attackers were hitting Geth's jumpdest analysis. We saw in our live monitoring system that some blocks took something like 24 seconds to go through. So we had to rework the whole jumpdest analysis very quickly and bring the processing time down a couple of orders of magnitude. There were some ideas that this was possibly the Shanghai attacker again; we don't know.
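The general shape of a fast jumpdest analysis is a single linear pass that builds a bitmap of which offsets are code versus PUSH immediate data, so every jump can then be validated in constant time. Here's a minimal sketch of that idea; it's simplified, not Geth's exact implementation.

```go
// Bitmap-based jumpdest analysis: one linear pass marks which byte
// offsets are real instructions, so JUMP checks are O(1) rather than
// rescanning the code each time. Illustrative sketch.
package main

import "fmt"

const (
	opPush1    = 0x60
	opPush32   = 0x7f
	opJumpdest = 0x5b
)

// analyse returns a bitmap with true for every offset that is a real
// instruction (not PUSH immediate data).
func analyse(code []byte) []bool {
	isCode := make([]bool, len(code))
	for pc := 0; pc < len(code); {
		isCode[pc] = true
		op := code[pc]
		if op >= opPush1 && op <= opPush32 {
			pc += int(op-opPush1) + 2 // skip the immediate bytes
		} else {
			pc++
		}
	}
	return isCode
}

func validJumpdest(code []byte, isCode []bool, dest int) bool {
	return dest < len(code) && isCode[dest] && code[dest] == opJumpdest
}

func main() {
	// PUSH1 0x5b; JUMPDEST -- the 0x5b at offset 1 is data, not a target.
	code := []byte{opPush1, 0x5b, opJumpdest}
	isCode := analyse(code)
	fmt.Println(validJumpdest(code, isCode, 1)) // false: inside PUSH data
	fmt.Println(validJumpdest(code, isCode, 2)) // true
}
```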
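And going back to that stack-requirements bug: here's a sketch of the bounds check every opcode needs before executing, with the two failure modes called out. The `opInfo` numbers are illustrative assumptions; a real table covers every opcode.

```go
// Sketch of the EVM stack-bounds check that the SWAP/BALANCE bug got
// wrong: verify enough items to consume and enough headroom to push,
// and fail cleanly (no panic, no silent divergence).
package main

import (
	"errors"
	"fmt"
)

const maxStack = 1024

type opInfo struct {
	name    string
	minPops int // items the op needs on the stack
	pushes  int // items the op leaves on the stack
}

func checkStack(depth int, op opInfo) error {
	if depth < op.minPops {
		// Configured too low, execution underflows the stack; a naive
		// implementation panics, and those nodes drop off the network.
		return errors.New(op.name + ": stack underflow")
	}
	if depth-op.minPops+op.pushes > maxStack {
		// Configured too lax, clients disagree on whether the op
		// succeeds, state roots differ, and the chain splits.
		return errors.New(op.name + ": stack overflow")
	}
	return nil
}

func main() {
	swap16 := opInfo{name: "SWAP16", minPops: 17, pushes: 17}
	fmt.Println(checkStack(16, swap16))   // underflow: needs 17 items
	fmt.Println(checkStack(1024, swap16)) // <nil>: depth is unchanged
}
```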
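Likewise, the uncle-timestamp bug boils down to unchecked integer arithmetic. The real field is an arbitrary-size integer coming out of the block encoding; this sketch models it with a uint64 add, showing the silent wrap, the cap, and a checked alternative where everyone agrees to reject.

```go
// Worked example of the uncle-timestamp divergence: Go's unsigned
// arithmetic wraps silently on overflow (Geth-style), capping at the
// maximum (Parity-style) gives a different value, and different values
// mean different block hashes. A checked add avoids both.
package main

import (
	"errors"
	"fmt"
	"math"
)

func checkedAdd(a, b uint64) (uint64, error) {
	if a > math.MaxUint64-b {
		return 0, errors.New("uint64 overflow: reject the block")
	}
	return a + b, nil
}

func main() {
	ts := uint64(math.MaxUint64 - 1)
	fmt.Println(ts + 10)                // wraps around to 8 (Geth-style)
	fmt.Println(uint64(math.MaxUint64)) // capped maximum (Parity-style)
	if _, err := checkedAdd(ts, 10); err != nil {
		fmt.Println(err) // the consensus-safe option: agree to reject
	}
}
```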
I don't have time to go through the whole of Byzantium, but as we approached it, we really started getting payoff from fuzzing, and it found all these issues. As you can see, the last one here happened on October 25; that's actually nine days after the Byzantium fork hit. And not until February did we uncover the, hopefully, last Byzantium bug: it was Casey Detrio who found an issue with our bn256 elliptic-curve pairing, and we couldn't actually fix it until a couple of months later, when we swapped out the whole pairing library for an assembly version written by Cloudflare.

Some takeaways from this. Monitoring, metrics, and graphing things really help you see overall network health, what the problems are and what they might become; Péter Szilágyi is giving a talk later about monitoring a live Ethereum network. Here's a chart that displays in great detail what happens when we process blocks and exactly where the time is spent; these kinds of things really help to find out what's going on. Similarly for the network traffic: compared with the naive chart I showed earlier, here's a much more detailed one. We of course need really good tooling to find out what happened, when it happened, and why it happened. And we need good communication channels between teams, so we can coordinate quickly and fix things when the shit goes down on mainnet. And one thing that's been really important from day one, and I think has been great, is that we have this multi-client culture, where we don't just rely on one client to hold up the network. I hope that multi-client culture will continue on Ethereum. That's it for me. Thank you.