Hello, I'm super happy to be here. Or not so happy; I would prefer to talk about something else, but, you know, we do what we can, right? So first we have to start with the basics. I'm not here on behalf of Illusory, which is the company behind Nomad, so the views expressed are my own and do not reflect those of the company. As it's an active investigation right now, an active incident, we will not be doing any Q&A, unfortunately.

My name is Odysseas. I'm a protocol engineer at Nomad. I've worked in web3 as a consultant, and I've also worked in DevOps and IoT. So today we'll be talking about Nomad. What is Nomad? How does the protocol work? We will need that to be able to talk about bridges and how they work, then about the incident, and finally: what are the learnings? What did we learn from having a hack that resulted in 190 million dollars in tokens being evaporated?

Nomad is not a bridge. Nomad has a bridge, right? But if Nomad is not a bridge and has a bridge, what is Nomad? Nomad is an optimistic protocol, first of all, which means that it has an optimistic security model. It's a protocol for interoperability, which means that it allows applications to meaningfully react to an event that happens on another blockchain. Meaningfully: that's super important. We don't define how your application will react to some event. Basically, the Nomad protocol just sends arbitrary bytes from one domain to another, so it's up to you, the developer, to interpret those bytes. So Nomad is an optimistic protocol for interoperability that supports arbitrary messages between domains.

How does Nomad work? The first thing you have to know about Nomad, and probably the last, is that it's super simple. On the sending chain, all the messages that are being sent are added to a Merkle tree, right?
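As an editorial aside, here is a minimal sketch of a Merkle tree of messages, with an inclusion proof checked against nothing but the root. It is a toy under stated assumptions, not Nomad's implementation: SHA-256 stands in for whatever hash the real contracts use, odd levels are padded by duplicating the last node, and all function names are hypothetical.

```python
import hashlib


def hash_bytes(data: bytes) -> bytes:
    # Assumption: SHA-256 as a stand-in hash function.
    return hashlib.sha256(data).digest()


def _pad(level):
    # Toy padding rule: duplicate the last node if the level has odd length.
    return level + [level[-1]] if len(level) % 2 == 1 else level


def _parents(level):
    # Hash adjacent pairs to build the next level up.
    return [hash_bytes(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(messages):
    if not messages:
        raise ValueError("need at least one message")
    level = [hash_bytes(m) for m in messages]
    while len(level) > 1:
        level = _parents(_pad(level))
    return level[0]


def merkle_proof(messages, index):
    # Collect the sibling hash at every level on the path from leaf to root.
    if not 0 <= index < len(messages):
        raise IndexError("message index out of range")
    level = [hash_bytes(m) for m in messages]
    proof = []
    while len(level) > 1:
        level = _pad(level)
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _parents(level)
        index //= 2
    return proof


def verify_inclusion(message, proof, root):
    # Recompute the path to the root using only the message and the proof.
    node = hash_bytes(message)
    for sibling, sibling_is_left in proof:
        node = hash_bytes(sibling + node) if sibling_is_left else hash_bytes(node + sibling)
    return node == root


messages = [b"msg-0", b"msg-1", b"msg-2"]
root = merkle_root(messages)
proof = merkle_proof(messages, 2)
print(verify_inclusion(b"msg-2", proof, root))
print(verify_inclusion(b"forged", proof, root))
```

The verifier never sees the other messages, only the root and a handful of sibling hashes, which is why relaying the root alone is enough for a receiving chain to check inclusion.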
And why we do that is because with a Merkle tree it's very easy to prove whether a message belongs to the tree or not. The information of inclusion, in theory and in practice, is compressed into the root. So all the protocol has to do is send that root from the sending chain to the receiving chain. If we do that securely, then anyone can go to the receiving chain and prove that the message indeed belongs to the tree that has that root. So all that the protocol really does is pass this root from the sending domain to the receiving domain. A domain is a chain, right?

Let's see the life cycle of a message in Nomad. First we go to the Home contract (you see here the Home contract on the sending chain) and we send the message. Then a new root is generated. Then that root is relayed to the receiving chain, where it ends up in a contract called Replica. Then we must wait for the optimistic window to pass. Afterwards, we can go to the Replica on the receiving domain and say: hey, here's the proof of inclusion, here is the message, I want to prove that this message was indeed sent. And after we prove a message, we can process it. When we process a message, basically the Replica contract takes the message metadata, calls the destination contract, and passes it the message payload. Super simple.

Now that we understand, in a very basic sense, how Nomad works, let's talk about bridges. What's a bridge? A bridge is a super simple application built on top of any interoperability protocol. Basically, you go to the sending chain, let's say Ethereum, you go to the contract and say: hey bridge, I want to send my native tokens. WETH, for example, is a native token on Ethereum; it has intrinsic value. And I want to send it to Evmos, right?
Then the bridge will hold that WETH, and it will send a message to the Nomad bridge on the receiving chain and say: hey, you should mint, you should create, a new madWETH, a representation token, which doesn't have any value in itself. Its value is derived from the fact that we can do the opposite: the user that holds the madWETH can go to the bridge and say, hey, I want to burn that token and transfer my madWETH back to Ethereum. So the bridge sends the message back and says: hey, unlock a WETH.

A hack is when all this locked collateral in the bridge is stolen, right? So now all these representations that are flying around are essentially worthless, because they can't be redeemed for the original asset, the asset that has intrinsic value. And I think that's why we are seeing so many hacks on bridges: not only because they are complex systems, and they are, but also because they have so much collateral locked inside them. They're very juicy targets. They make good targets for hackers to probe and test for vulnerabilities.

Let's talk about the incident. How was it possible for the Nomad bridge to be hacked? We'll talk about two mappings in the Replica contract. Two mappings was all it took. The first mapping connects a root to a timestamp, which says that after that timestamp the root is valid and you can start proving messages against it, the root of the Merkle tree. So a new root comes in and a new timestamp is generated. Now I can go to the Replica and, if the time has passed, I can say: hey, here's the message.
Here's the proof. I want to prove it against the valid root. If that checks out, the message is proven. And when proving a message, you connect the message, which is a bytes32, to the root under which it was proven. That's the second mapping.

Now, Solidity and the default values of mappings. The default value for a bytes32 is zero, and for a number it is zero as well. So, in a way, if you look at the second mapping here, all messages are proven under the zero root. "Proven". But of course, in the code we had an authentication flow that said that this is not valid: if the timestamp of a root is zero, then the root is not valid, because the number zero is the default value as well. The problem is that when we deployed the contract, we set that timestamp for the zero root to one, a non-zero value. The result is that the root zero is active after timestamp one, and all messages are proven under the root zero, so all messages are proven against an active root.

So what did users do? They forged messages meant for the bridge that told the bridge: hey, unlock all that collateral. And since any arbitrary message that has not already been proven is proven under the root zero, they could prove and process whatever they wanted. And that, right there, is a 190 million dollar bug.

So why did it come up now? The Nomad protocol has been active for months. We had an upgrade, and during the upgrade we changed the semantics of the second mapping. We used to store an enum here, that is, numbers: if the number is one, the message is proven; if the number is two, it is processed. So we didn't connect messages to the roots under which they were proven, right?
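The two mappings and the zero-root problem described above can be modelled in a few lines. This is a Python model written for illustration, not the Solidity contract: the class, attribute and method names are hypothetical, the `processed` set is an assumption about bookkeeping, and `dict.get(..., default)` imitates the way a Solidity mapping returns zero for a key that was never written.

```python
import hashlib

ZERO_ROOT = bytes(32)  # what a bytes32 mapping returns for an unknown key


class ReplicaModel:
    """Toy model of the two mappings from the talk (names are hypothetical)."""

    def __init__(self, initial_root: bytes = ZERO_ROOT):
        self.confirm_at = {}        # mapping 1: root -> timestamp after which it is valid
        self.proven_under = {}      # mapping 2: message hash -> root it was proven under
        self.processed = set()      # assumption: replay protection kept separately
        # Deploy-time initialisation: the initial root becomes valid at timestamp 1.
        self.confirm_at[initial_root] = 1

    def acceptable_root(self, root: bytes, now: int) -> bool:
        # A timestamp of 0 is the mapping default and means "never confirmed".
        timestamp = self.confirm_at.get(root, 0)
        return timestamp != 0 and timestamp <= now

    def process(self, message_hash: bytes, now: int) -> bool:
        if message_hash in self.processed:
            return False
        # A message that was never proven reads back as the zero root.
        root = self.proven_under.get(message_hash, ZERO_ROOT)
        if not self.acceptable_root(root, now):
            return False
        self.processed.add(message_hash)
        return True


forged = hashlib.sha256(b"unlock all the collateral").digest()

# Initialised with the zero root: a never-proven message is accepted.
vulnerable = ReplicaModel(initial_root=ZERO_ROOT)
print(vulnerable.process(forged, now=1_000))

# Initialised with any non-zero root: the same message is rejected.
safe = ReplicaModel(initial_root=hashlib.sha256(b"real root").digest())
print(safe.process(forged, now=1_000))
```

In the model, the forged message never touches `proven_under`; it passes only because the default value of the second mapping happens to be a key that the first mapping treats as confirmed.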
So even if the root was active, through the authentication flow we made sure that we didn't authenticate against it. But we changed the semantics, and that is why the bug was so hard to find. Our testing didn't find it and our auditors didn't find it, because it needed the old state with the new code. Old state with old code: secure. New state with new code: secure. Old state with new code: not secure.

Nomad now. Right now the protocol is paused. We'll restart the bridge. That's the hard part: how do we restart an uncollateralized bridge? We'll be sharing more information soon. The TL;DR is that users will be able to collect some of the recovered funds as the recovered funds continuously flow into the bridge, and they will do so fairly. You'll be able to read more about it in the coming weeks on our blog and Twitter account.

So what did we learn from this? Bismarck said that stupid people learn from their own mistakes, while wise people learn from the mistakes of others. So be wise. We won't talk about "improvise, adapt and overcome"; we'll talk about test, observe, engage and communicate. You can think about these as different layers in the defenses of a castle, right? Hopefully these layers of defense will stop the attackers from reaching the citadel, the king or queen. It's like the Swiss cheese analogy in security; I'm sure most of you researchers know about that analogy. But I think castles are way cooler than cheese, which is why I prefer castles.

Test: the bread and butter of every developer, right? Although there weren't any best practices, I think the industry is now slowly aligning on some. So we have unit tests, property-based tests, integration tests, forking tests and invariants. I want to do a quick rundown through them.
First of all, concrete tests. Super simple: I want to make sure that if I give the function five, I get 25. These are your concrete tests, the unit tests. Property-based tests are more advanced. They force you to think about the properties of your code. So, basically: verify that this function always returns the input multiplied by five. Then we have integration tests, where we want to test the bigger picture: features, user flows.

Forking tests are a web3 specialty; all the other tests you can find in other paradigms. A forking test could basically be an integration test or a unit test, but we test against the on-chain state. This is very important because, as you saw, a bug can surface only when using the on-chain state.

And finally invariants, my personal favorite. Invariants are the equations, the statements, that should always hold for your protocol, right? If such a statement doesn't hold at some point, the protocol should be paused; the protocol has broken in some way. This is a big project. You do it in two phases. In the first phase you define the invariants, which is a very theoretical phase and definitely not easy. The second phase is testing the invariants using tools.
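To make the difference between a concrete test and a property-based test from the rundown above more tangible, here is a small sketch using the talk's own example of a function that multiplies by five. The function and test names are hypothetical, and the standard-library random generator is only a stand-in for a dedicated property-testing or fuzzing tool.

```python
import random


def times_five(x: int) -> int:
    # The function under test: returns the input multiplied by five.
    return x * 5


def test_concrete():
    # Concrete test: one hand-picked input, one expected output.
    assert times_five(5) == 25


def test_property(trials: int = 1_000, seed: int = 0):
    # Property-based test: state what must hold for every input,
    # then check it against many generated inputs.
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-10**18, 10**18)
        result = times_five(x)
        # Property 1: the result equals five repeated additions of the input.
        assert result == x + x + x + x + x
        # Property 2: the result is always divisible by five.
        assert result % 5 == 0


test_concrete()
test_property()
print("all tests passed")
```

The concrete test documents one known case; the property test forces the author to say what "correct" means in general, which is the habit of thought the talk is recommending.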
There are a lot of tools out there. For example, in Nomad, the invariant that broke was that all messages that are processed, that are received, must have been sent. People were processing messages that were never sent. This invariant broke.

And finally I would like to mention static analysis tools. These can be very useful, especially for newer developers, as they can find some simple vulnerabilities. And storage layout analysis: basically, you can use tools to verify that the storage layout of your upgradable contracts will not change without you noticing, which is a very common source of vulnerabilities.

In my view you should prioritize unit tests, obviously, property-based tests and forking tests. These should be your primary focus. With tools like Foundry, I think it's easier than it used to be. Of course, you should always get audits, although an audit is not a silver bullet. Just do audits. And of course always check the storage layout. Always.

Observe, the second pillar. Now we have tested; next is alerting. If you receive the "u up?" text and you aren't already up, it's too late: your alerting has failed. You shouldn't wait for a certain Twitter account to tell you that there's a problem with your protocol. This is a solved problem in web2. In web3 we like to invent things again and again, but it's a solved problem in web2.
So go and read the Google SRE handbook. You'll get a ton of input. Also talk with your DevOps engineers; if they have worked in web2 before, they will have a lot of insight for you. You should start with an objective, you know, the business objective, and then define actionable alerts. Actionable: that's the key word. That means that if alert A is activated, then you should do B. It should be a very simple if-then clause. You should have a playbook for every alert: if alert A happens, you should do this, this is the way you should do it, and here's the script you run. Super simple. If you don't do that, you will get alert fatigue: you will stop paying attention to alerts because, you know, it's not that important, and by not paying attention you will miss the alert that will wreck your protocol.

A nice mental model for alerts in web3, I think, is heuristics versus invariant-based alerts. Heuristics are rules; they require human intervention to understand whether an alert is a false positive or not. Invariant-based alerts are where you have an off-chain agent running and continuously checking the invariants of the protocol. They are more complex, but they can be automated, because invariants shouldn't produce false positives. If the invariant alert is on, that means that either (a) you didn't define the invariant properly or (b) the protocol is broken in some way. It's probably a good thing to pause the protocol.

Now, engage. Testing has failed; alerting, maybe. So now we are engaging: we are in the first minutes of the incident. What do we do? The poet Archilochus said that we do not rise to the level of our expectations; we fall to the level of our training.
So if you don't have an incident playbook, if you haven't gone through it, you will not be able to act appropriately, and you will get wrecked even if your alerting was good. Good incident management means explicit ownership: very specific people have very specific ownership of the management of the incident. Outcomes: every person should know what the outcome of their work during an incident should be. Game days, game days, game days: do that again and again, go through simulations. The entire organization should run game days for incidents, because they will happen. They will happen. And during the incident you shouldn't be reading the incident playbook for the first time. Here, [unclear] has a very nice blog post about it. I highly suggest you look at it and adapt it to your organization.

We didn't engage in time. We lost money, right? So it's now time to face the music. What do we do? I'm sure everyone would say: yeah, let's talk to the users, you know, we have to be transparent, we have to tell them what happened, we have to tell them before they read it on Rekt. No. Nope. You shouldn't talk to anyone. You talk to your legal team. You don't do a commit, you don't do a tweet, you don't talk to your mother until you have talked to your lawyers. You should think about what the users want, but you should talk with your lawyers first.

When you do communicate, you should be honest. No sugarcoating; you know, you'd just insult them. You should have pre-approved messaging, because that's easier: during a crisis, after the first hours, you're shaking, you haven't slept, you're high on caffeine.
You can't properly create communication or PR output. You should tell them what you are doing right now and what you plan to do.

Let's see a quick timeline of the first days after an incident. First of all, we talk with our lawyers and inform them of the situation, so they can start talking with law enforcement agencies about asset recovery. We do the first batch of public communication: we tell people what happened, what we are doing, and what we are planning to do. We talk with our partners, ideally through a more privileged communication channel, like a Telegram group. Maybe we can share more with them than we share with the public, because of NDAs and all that, because as you are losing money, they are losing money as well.

Then you should talk with a chain analytics firm. That is very important, and I will tell you why: because apparently people suddenly have a change of heart when law enforcement agencies are zeroing in on their real identities, right? Suddenly they just want to return the funds. The only way to recover assets is using a chain analytics firm. They will go and analyze all these Tornado transactions and find correlations, and they will be able to point the law enforcement agencies to centralized exchanges, so they can freeze funds and request personal data. So if you want to recover your assets, you will need a chain analytics firm.

Then you will set up a recovery address where you will start collecting the funds, and align with your team on the bounty. And of course the bounty has many legal externalities, so again, your lawyers will be your best friends.