Hello and welcome to the COSTA (Coinbase Secure Trade Analyzer) talk, where we'll discuss some of the challenges we face here at Coinbase in listing new assets as fast and as securely as possible, and the solution: COSTA, the tool we developed to address them.

In case we have not met, my name is Peter Kacherginsky. I work on blockchain security topics here at Coinbase. I analyze, secure, and review different blockchain systems and smart contracts. I'm also the publisher of the Blockchain Threat Intelligence newsletter and the founder of the Open BlockSec project. In the past, I was a malware reverse engineer and a penetration tester at the Federal Reserve.

One of the things the security team is responsible for here at Coinbase is secure asset listing. In case you haven't noticed, we are listing more and more assets every day, and I wanted to reveal what happens behind the scenes. How does the asset listing and review process work here at Coinbase?

It all begins with the asset issuer sending an application by filling out a questionnaire that we have published on the Asset Hub on the Coinbase website. For smart contracts, the questionnaire asks for things like the address on the Ethereum blockchain. If it's a standalone blockchain, we ask about blockchain explorers, consensus mechanism information, and so on. Once we get this initial request, it goes to three different teams. It goes to compliance review to make sure that the asset will allow us to stay compliant with the regulators. We perform a legal review to make sure that the asset is legitimate and doesn't expose us to any legal liabilities. And, of course, there is the security review, which is what I'll focus on today.
After all of those reviews are conducted, the data is sent to an asset listing panel, which goes through the different characteristics of an asset and eventually decides whether to list it on Coinbase.

So let's talk about the security review challenges. For one, it's a pretty costly and time-intensive process, or at least it was in the past. On average, it took us a couple of weeks to go through a single standalone blockchain asset and a couple of days to go through a smart contract token. Consistency was always challenging because, as you go from one reviewer to another, people vary in how they assess risk. And as we list more and more assets, tracking all of them (what their risk scores are, what mitigations need to be applied, how things change over time) becomes harder and harder as we build that library. Lastly, we tried to use existing tools. There are a lot of amazing tools out there, but they are primarily geared toward security analysts performing standalone audits of a given asset. We needed something that allowed us to really push forward and automate things as much as possible.

When we're talking about automation, the scale we're talking about is going from about one asset a week to 100-plus assets a day that we can analyze, make sure we trust their security, and eventually list. That's our ultimate goal. I would say we're not there yet, but we're definitely in the 10-plus assets territory.

So let's talk about Ethereum smart contracts. As I mentioned, we list two types of assets: smart contract based assets, like Ethereum tokens, and standalone blockchains. The reason Ethereum tokens are easier to automate is that, for one, they use standardized interfaces like ERC-20. There are a lot of existing libraries like OpenZeppelin.
The smart contract languages in use are fairly limited, so it's possible to analyze them completely. And lastly, it's a great target: the market cap is very large, DeFi projects are popular these days, and more tokens come online every day. So it's a fairly good target for automation.

In a normal smart contract review, we basically pull down the source code from Etherscan and go through it function by function. So this is a standard ERC-20 interface, and we review it to make sure that it's safe and secure. This is a very manual process. Take, for example, this particular pausing function. It can only be called by the owner, it requires that the asset is not already paused, and then it sets the flag for whether or not the asset is paused to true. We see the same function over and over and over again. As we went through the analysis, we started noticing patterns and asking: why are we reviewing the same function multiple times? If only there was a way to detect it automatically.

The reason this function keeps popping up is that a lot of projects import existing libraries, such as OpenZeppelin, and, of course, the Ethereum smart contract community is fairly small. Everyone is looking at each other's projects. They're learning from each other and iterating slowly on the code.

The way we decided to approach the automatic detection of known patterns is by looking at them through abstract syntax trees (ASTs). If you take the previous function, the pause function, you can decompose it into just its grammatical attributes. We have two variables. There's an assignment operation. There's a call to an event, Pause, and there are some attributes on this function.
What this allows us to do is remove all syntactic noise, such as comments or how many spaces you have between the equals sign and the variables, and focus on what the function actually does. Next, we take the abstract syntax tree and basically flatten it. This is the next step towards signature generation.

We took an existing library, OpenZeppelin, and performed the same operation: we generated ASTs for 3,000 different contracts and generated 2,800-plus signatures from all those contracts. Next, we took all of the existing smart contracts we had analyzed and generated an additional 500-plus signatures. What we did next says something about our team and that we all like each other. We manually went through all of those signatures, more than 5,000 of them, and individually assigned them risk. What are their characteristics? For example, for the pausing function that we saw before, we would assign a pause tag. If we see something that can upgrade a contract, we would tag it with upgrade. If it can confiscate funds, and so on, we would label them one after another.

After we generated this massive pre-seed library, we could move on to the next step: we took those labels, let's say pause, and assigned a risk to each. We created a risk score from 1 through 5, 5 being the most critical, most dangerous. For each risk, we assigned mitigations, which can be automatically applied for any given risk. We did this for every single function, so we built massive libraries of signatures, risks, mitigations, and, of course, scores for each one of them.

Let me show you a capability of the tool, and see if you could do the same thing better manually. On the left, we have the signature that was generated for the pause function you saw earlier. We're looking at a contract which has another pause function.
What the tool does is compare them, and after it compares them, it says: okay, those two match perfectly. I can pass it. I don't need to analyze it manually.

Let's look at another example: two identical-looking functions. An auditor would need to review them line by line. In this particular case, the tool automatically detects that someone tried to backdoor this function by replacing the letter O with a zero. That's something an analyst may miss, but the tool, because it's looking at the grammatical representation of those functions, will detect it quickly because the signatures will not match.

Let's look at another example. Can you tell me quickly if it's matching? It is. You can introduce a whole bunch of spaces and comments and rearrange the layout, and the ASTs will still match. So those two are equivalent, and we don't need to review them manually. A few more examples: if you rearrange the code itself, that will be detected as a mismatch, and so will changing Boolean values or assignments. All of that will be detected automatically.

So where does this take us? We generate those flattened ASTs. Next, we generate a hash, just a SHA-256 hash of those things, which makes the comparison very quick. These are, as you can see, the sources where we previously saw that particular signature show up. So this signature listing grows slowly as we see more and more contracts that match. And, of course, we record the tag assigned to it, so here we record that this particular function causes pausing.

What this allows us to do, instead of manually reviewing the validity of each smart contract one after another, is turn our process basically into a machine. We can go through hundreds and hundreds of smart contracts quickly, as long as a good portion of them have signatures that we've seen before. And if they don't, we have built mechanisms, which I'll show you, to quickly populate that signature database and make it smarter.
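The signature idea above (parse, flatten the AST, hash with SHA-256, look up in a library) can be illustrated with a minimal sketch. This is not COSTA's code. As a stand-in for a Solidity parser, it uses Python's own ast module on Python-syntax functions, and the names function_signature, lookup, and the sample sources are all hypothetical.

```python
import ast
import hashlib
from typing import Optional


def function_signature(source: str) -> str:
    """Return a SHA-256 hex digest of the flattened AST of `source`."""
    tree = ast.parse(source)
    # ast.dump leaves out line/column attributes by default, and the parser
    # drops comments, so layout changes do not affect the flattened form.
    flattened = ast.dump(tree, annotate_fields=False)
    return hashlib.sha256(flattened.encode("utf-8")).hexdigest()


# A previously reviewed function (Python syntax standing in for Solidity).
KNOWN = '''
def pause(self):
    self.onlyOwner()
    self.paused = True
'''

# Same logic with extra spaces, a comment, and a blank line.
REFORMATTED = '''
def pause( self ):
    # stop all transfers

    self.onlyOwner( )
    self.paused   =   True
'''

# Looks almost identical, but the letter O was swapped for a zero.
BACKDOORED = KNOWN.replace("onlyOwner", "only0wner")

# Hypothetical signature library: hash -> tag assigned by a reviewer.
LIBRARY = {function_signature(KNOWN): "pause"}


def lookup(source: str) -> Optional[str]:
    """Return the stored tag for a known signature, or None if unseen."""
    return LIBRARY.get(function_signature(source))


if __name__ == "__main__":
    print("reformatted ->", lookup(REFORMATTED))
    print("backdoored  ->", lookup(BACKDOORED))
```

Because identifiers are part of the flattened tree, the reformatted copy resolves to the stored tag while the O-to-zero variant comes back as unseen and would need a manual look.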
So that's the foundation for the tool, COSTA. There are currently 8,500 different smart contract signatures from all the smart contracts that we have analyzed so far. If we see a new signature, we add it to our internal storage and we never have to review it again.

It contains a codified feature, risk, and mitigation repository. For every signature that we detect, we can immediately describe what the risk is, what the different mitigations are, and how bad this thing is. If we see something that can, let's say, confiscate funds, we will immediately label it as dangerous, and it will affect the overall score of the contract.

We can track mitigations and how they're applied across all the different assets. For every risk, there are things we can do, such as monitoring or some additional controls we can implement, that make that risk not as bad. So we can keep track of where those things were applied across our entire base of assets.

And finally, we can generate nice reports for our managers and leadership so they can make educated decisions about the safety and the current status of the smart contracts.

The last but not least feature of COSTA: think of it as a Metasploit for smart contracts. It has a small arsenal of built-in tools that are also useful for manual analysis, not purely for fully automatic analysis, such as pulling source automatically, flattening, querying the blockchains, parsing Solidity, and so on.

So let's take a look at how you can analyze a smart contract with COSTA. When we receive a new asset, we get this basket of all sorts of different functionality. We decompose it into individual functions, generate code signatures for each one of them, and then extract all the different risks associated with those signatures. Next, we compile them all together, take whatever the highest risk was, and generate reports.
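The scoring step just described (look up each signature, collect its risk, take the highest) might look roughly like the sketch below. The table contents, the scores attached to each tag, the placeholder signature strings, and the names score_contract and UnknownSignature are illustrative assumptions, not COSTA's actual data or API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tag -> risk score table (1 = informational, 5 = most critical).
# The specific numbers are made up for illustration.
RISK_BY_TAG: Dict[str, int] = {
    "assembly": 1,
    "external_call": 2,
    "pause": 3,
    "upgrade": 4,
    "confiscate": 5,
}

# Hypothetical signature store: signature -> tag, or None when the function
# was reviewed and carries no risk. Real signatures would be SHA-256 digests.
SIGNATURE_TAGS: Dict[str, Optional[str]] = {
    "sig-transfer": None,
    "sig-balance-of": None,
    "sig-pause": "pause",
    "sig-call": "external_call",
}


class UnknownSignature(Exception):
    """Raised when a contract contains signatures nobody has labeled yet."""


def score_contract(signatures: List[str]) -> Tuple[int, List[str]]:
    """Return (highest risk score, sorted list of risk tags) for a contract."""
    unknown = [s for s in signatures if s not in SIGNATURE_TAGS]
    if unknown:
        # No report until a reviewer labels the missing functions.
        raise UnknownSignature(unknown)

    findings = set()
    for sig in signatures:
        tag = SIGNATURE_TAGS[sig]
        if tag is not None:
            findings.add(tag)

    # Assumption: a contract with no tagged risk scores 0 in this sketch.
    score = max((RISK_BY_TAG[tag] for tag in findings), default=0)
    return score, sorted(findings)


if __name__ == "__main__":
    print(score_contract(["sig-transfer", "sig-call"]))
    print(score_contract(["sig-transfer", "sig-pause", "sig-call"]))
```

Taking the maximum rather than a sum means one dangerous function sets the contract's score no matter how many harmless ones sit next to it.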
And finally, we ship the analysis to our leadership so they can make a decision about the asset.

Let me illustrate this through a demo where we'll look at a sample asset. Right now I'm about to run the COSTA analyzer against an Ethereum address associated with the Chainlink token. All that is needed is the address. It will automatically pull the source code, generate the ASTs that I described, create signatures, and extract all the different risk points from the contract. So let's see how this actually works.

There you see it loaded all the signatures. It had already downloaded the contract previously, so it doesn't need to do it again. It detected a total of 19 signatures, and all of the signatures in that contract are well known, so it was able to successfully generate a report.

Let's open the report and take a look. So this is what the report looks like. First of all, we have the smart contract address. Then the executive summary with the score and which mitigations were applied or not. The inherent score is the score as is, without any mitigations; the residual score is whatever remains after mitigations are applied. In this case the score is 2, so the risk is minimal. Then we have a description of why this risk was scored the way it was. For example, we have a finding, which is the external address call. These are the functions and associated signatures that were detected and that produced that score, along with a brief description.

What else do we have in the report? We have assembly instructions. We noted that as risk score 1, just informational; we want to keep track of those things. And then towards the end of the report, we have a full listing of all the signatures detected in this contract. The remainder of them did not have any risk associated with them.

Now let's take a look at an example contract where not all of the signatures are known, so we need to do a little bit of manual work.
In this case, we will take a look at the yearn.finance token. Notice that I added a new flag, --interactive. The reason is that it's a brand new smart contract; I previously ran it and it failed, saying that there are some missing signatures. I can actually show it to you. So if I just run COSTA without any extra parameters, just as before, it detects a total of 31 signatures. And then it says: hey, there was one unknown. The contract has one unknown function signature, so you'll need to do some manual work and define it before I can generate a report.

That's where the interactive flag comes in. It presents an interface where we can label this function with whatever risk is relevant. So let's rerun it with the interactive flag. This is the function that we had a missing signature for: a mint function. It looks similar to what we've seen in the OpenZeppelin libraries, but there was something unique about the combination of the compiler and the way it's used within this contract.

We see that there's a fuzzy match. The fuzzy match engine automatically compares new functions, which we don't have a precise signature for, to see if there's some kind of match with a known function signature. We can see that we previously analyzed a token, YFII, and that obviously matches YFI, so there's a correlation. There must be a compiler difference, which is why this is different. You can see that the offsets are different.

At this point we're satisfied that this is a safe function with minting functionality, so I can put in the label minting. This stores it in our signature store, so any time we see this again, I don't need to analyze it. And now we can generate a report.

So let's take a look at the report. This is the report we just generated for the yearn.finance token. The score here is again 2. Some new things started popping up. This is the minting function that we just added.
There are also superuser account and privilege management functions and an external address call. So let's take a look at those in more depth.

With minting functionality, the risk is, of course, that an administrator can inflate the supply, so we want to make sure that we understand this risk. The other finding is superuser account or privilege management. There are a few functions there, setGovernance, removeMinter, and addMinter, which perform some kind of governance function, so we want to know about that as well. And, of course, the external address call: callOptionalReturn. That's a well-known function that makes an external call, so we're tracking that as well.

For functions which have governance risk, we have our own Slack monitoring system, which we call circuit breaker. Whenever these three functions are called, we automatically get alerts and investigate what kind of governance action is taking place. We also have PagerDuty set up for more critical functions. For example, if someone is shutting down a project or is running something more dangerous, we'll wake up at 2 a.m. and investigate what is happening.

And, of course, this is a complete listing of all the function signatures that matched in this project. We have our SafeMath functions in here and a whole bunch of other things that we didn't need to analyze.

Notice that as we were going through this smart contract, there were more than a dozen functions that we would normally need to scroll through, review, and analyze. Instead, we completed the report in a few seconds because the tool immediately identified what was unique about this contract. If a contract is almost verbatim ERC-20 and tries to implement one little new thing, the tool will razor-focus you on just that one thing so you can be efficient with your time.
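The talk does not say how the fuzzy match engine from the demo works internally. One plausible way to get that behavior, shown here purely as an assumption, is to compare flattened AST token sequences with a similarity ratio and surface the closest labeled function. The sketch uses Python's ast and difflib modules on Python-syntax stand-ins; flatten, best_fuzzy_match, and the sample sources are hypothetical.

```python
import ast
import difflib
from typing import Dict, List, Tuple


def flatten(source: str) -> List[str]:
    """Flatten an AST into a list of node type names plus identifiers."""
    tokens: List[str] = []
    for node in ast.walk(ast.parse(source)):
        tokens.append(type(node).__name__)
        if isinstance(node, ast.FunctionDef):
            tokens.append(node.name)
        elif isinstance(node, ast.Name):
            tokens.append(node.id)
        elif isinstance(node, ast.Attribute):
            tokens.append(node.attr)
    return tokens


def best_fuzzy_match(source: str, library: Dict[str, str]) -> Tuple[float, str]:
    """Return (similarity ratio, label) of the closest labeled function."""
    target = flatten(source)
    scored = []
    for label, known_source in library.items():
        matcher = difflib.SequenceMatcher(
            None, target, flatten(known_source), autojunk=False
        )
        scored.append((matcher.ratio(), label))
    return max(scored)


# Previously labeled functions (Python syntax standing in for Solidity).
LIBRARY = {
    "minting": '''
def mint(self, account, amount):
    self.require_minter()
    self.total_supply = self.total_supply + amount
    self.balances[account] = self.balances[account] + amount
''',
    "pause": '''
def pause(self):
    self.require_owner()
    self.paused = True
''',
}

# A new mint function that differs slightly from the labeled one.
NEW_MINT = '''
def mint(self, account, amount):
    self.require_minter()
    self.total_supply = self.total_supply + amount
    self.balances[account] = self.balances[account] + amount
    self.emit("Transfer", account, amount)
'''

if __name__ == "__main__":
    ratio, label = best_fuzzy_match(NEW_MINT, LIBRARY)
    print(f"closest known function: {label} (similarity {ratio:.2f})")
```

A close but imperfect match only tells the reviewer where to start. As in the demo, a person still confirms the function is safe before the new signature is labeled and stored.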
There's no need to verify the safety of the SafeMath library if you've seen it again and again and it's coming from a well-known library like OpenZeppelin. You don't need to analyze that multiple times. That's the power of the tool: it removes duplicate effort.

Roadmap. We have a few things planned ahead of us. One is that we want to remove the manual review step as much as possible. You saw the power of the fuzzy matching: instead of trying to go through the entire contract, we just had to compare it against something that matched pretty closely. The next step is using formal proofs to automatically deduce that this is the functionality within this particular blob of code, and to prove that that's all it does and nothing else. This will allow us to automate this step further in the future.

Augmented mode enhancement. We want to continue integrating different tools into COSTA and make it a real one-stop shop for analysts, for both automated reviews and manual reviews.

And, of course, open sourcing the tool is definitely on our roadmap, to give it to the community and help them evaluate the security of their own projects and other projects, and help the whole ecosystem become more secure.

There are some risks and limitations of the tool that are important to discuss. One is database reliability. We are manually editing our features, risks, and mitigations, so we take care to do regression testing and run other heuristics to make sure we're not missing anything or mislabeling things. If there are any inconsistencies, we catch them pretty quickly because we run our tests on every commit.

ERC-20 code reuse. It could be that in the future contracts will start deviating more and more from the ERC-20 contract pattern. That would require more manual work, with more code that cannot be reused and needs to be analyzed manually.

DeFi and logic bugs are much harder to analyze.
Notice that all the projects I looked at were pretty straightforward tokens. Once we go into DeFi territory, where you have hundreds and hundreds of different functions with very complex interactions between them, it becomes much harder to catch issues and certify that a particular contract is safe. So COSTA may be a great tool if, let's say, you have certified something like SushiSwap and you want to look at something like PancakeBunny and see where the two projects deviate. But beyond that, if you have a brand new project that you want to look at, it becomes harder.

And, of course, we rely on Etherscan to pull the source code. So to some degree we need to make sure that Etherscan is being honest, but we have mitigations against that as well: we can verify that the source code matches the bytecode.

That's all I have. This is the newsletter that I publish. Check it out and subscribe to Blockchain Threat Intelligence. And since this is a recorded session, please feel free to reach out to me on Twitter or elsewhere if you have any other questions. Thank you.