So, I'm going to introduce Olivia and Nicholas for their live demo of White Rabbit, which combines threat intelligence, public blockchain data and machine learning to go down the dirty money rabbit hole.

Thank you. Hi, I'm Olivia, an engineer at TruSTAR, and this is Nick, my partner on the team.

Yeah, I'm a lead data scientist at TruSTAR.

So, we became interested in doing this research and building this tool because every day we work on finding better ways for security teams and analysts to make use of threat intelligence. With the rise of cryptocurrency in general, but also in the security context in these last few years, we decided to investigate that further in our tool. As you all probably know, ransomware campaigns have been evolving their payment methods from prepaid debit cards to cryptocurrencies such as Bitcoin and Monero in recent years. And there are plenty of reasons why Bitcoin, which we're specifically looking at in our demo, could be appealing to these attackers. First of all, it's decentralized and pseudo-anonymous, which allows attackers to easily hide their tracks and to launder their funds through Bitcoin mixers or tumblers once they've received them. But thankfully for defenders, and specifically for us as we're going through this tool, it's also an immutable public ledger. As you'll see in the next few slides, that allows us to take a few Bitcoin addresses associated with this illicit activity, trace all the transactions related to them, and pretty confidently create a cluster that we identify as a ransomware campaign attacker's wallet. And that brings us to one of the points of our tool and research, which is that Bitcoin balances can be predictors of ransomware campaigns ramping back up.
We've tried to incorporate this idea into what we built as an early warning system. We do that by monitoring Bitcoin wallets and their balances, seeing whether any anomalous activity or patterns we find there are indicators of activity happening off the blockchain, and whether we can map that back to other sorts of observables such as IPs or hashes. And we do this using a four-step workflow. We start off by harvesting seeds, which are what we identify as Bitcoin addresses that victims have paid the ransom to in a couple of crypto ransomware campaigns that used Bitcoin as a form of payment. From these addresses, we form clusters that we then identify as the attacker's wallet, and we monitor these clusters so that cyber teams and analysts can then use them to actively defend.

So, going back to what we were first saying, we use seeds to begin the cluster. An example of a seed could be the address you see in the top right of this screenshot from WannaCry, where the attacker is asking for $600 worth of Bitcoin to be sent to that address. Most of the time these addresses are randomly generated, and once the funds are sent to the attacker, a decryption key is sent back. We were able to find these seed addresses from about two or three dozen open sources.

We then take that address, which alone is really just a random data point, and we expand this data set using what we call a multi-input or co-spend heuristic. To expand this data set, we actually use an open source blockchain explorer called BlockSci. So here we start off with the seed address and recursively look for all addresses that shared inputs in transactions with that original seed address, until we have this final cluster, which is what we identify as the attacker's wallet. Nick is going to go into our specific demo, which looks into the Cryptolocker campaign.

So yeah, as Olivia said, we're going to do a deep dive into how the Cryptolocker campaign evolved.
Cryptolocker was a ransomware that first appeared in September 2013. Then it evolved: there were a couple of releases after Cryptolocker which basically leveraged that initial version and built on top of it. The second one was CryptoDefense, which was released around February 2014, and it was short-lived. And then afterwards came CryptoWall, with all its versions, 1 through 4. The interesting thing is that you can basically track the evolution of that specific ransomware by looking at blockchain data and connecting some of the clusters of Bitcoin addresses. They map exactly to the releases of these different tools.

So I'm going to go to the tool now. What the tool basically allows the user to do is look at the different malware families that we are tracking. The latest one was actually observed a couple of weeks ago. It's called CryptoWallet Hijacker. It was a tool that changed the Bitcoin address that was receiving the funds. So if, let's say, Olivia wants to send funds to me, CryptoWallet Hijacker would swap the Bitcoin addresses and put in the attacker's wallet address instead.

But going back to Cryptolocker: if you click on one of the malware families, the thing that we show the user is the balance of the cluster of Bitcoin addresses associated with that campaign. So as Olivia said, starting from a seed that appeared within that specific malware sample, you can reconstruct a cluster that you can hypothesize is associated with the wallet of the attackers. And to gain confidence in the result, you really need to map whatever you're observing on the blockchain to events outside the blockchain. For example, we know that the first release of Cryptolocker, as I said on the previous slide, was in September. And that campaign ended, or CryptoDefense was released, around March.
So you would expect Cryptolocker's wallet to receive its funds between these two dates. And indeed, if you look at the result from that cluster and the balances, you see that that cluster of Bitcoin addresses started receiving funds around September 7th of 2013. So two days after, technically, that software was released. And then you can see that the balance really evolved September through the end of 2015. And you have a peak here where the attackers had around $140,000 in that specific wallet. Then that campaign started ramping down around February, until these guys moved all their money out of this wallet.

Then, as Cryptolocker was dying down, what was interesting is what you see when you start looking at the CryptoDefense blockchain data. Here, starting with, I think, one seed from CryptoDefense, we were able to reconstruct two clusters, so basically two wallets associated with CryptoDefense. What was interesting about that particular release is that it didn't make as much money as Cryptolocker did in the beginning. So it wasn't a successful campaign for these guys. And one of the reasons that we were able to identify is that it had a cryptographic bug, so the victims were able to easily decrypt the data that these attackers tried to encrypt. Compared to a peak of $140,000 for Cryptolocker, CryptoDefense really didn't make much revenue.

Which brings us to the third release, which was CryptoWall. CryptoWall was actually particularly tenacious in terms of how much revenue these guys collected; I think I estimated around $4 million. And we were able to identify around, I think, 13 clusters of Bitcoin addresses associated with CryptoWall. There are two hypotheses as to why there were so many clusters, or so many different wallets.
One hypothesis that I can think of is that there were different minor releases of that software associated with CryptoWall. The other hypothesis is that, as ransomware was evolving into a ransomware-as-a-service type of industry, there could have been other attackers leveraging that code and also releasing campaigns. And what is interesting is that some campaigns were way more successful than others. So there could be some intelligence outside the blockchain as to why a specific campaign was more successful; maybe their phishing emails were more targeted, or there has to be something else that made it more successful.

But the other thing is that ransomware becomes more successful if the cryptography is strong. CryptoWall was the first of these to implement really strong, rigorous RSA cryptography, which was hard to break. Which is why, if you look closer at the balance data, there were a bunch of these wallet clusters that made a lot of money. One of them, around August 2015, had a balance of around $180,000. And basically, as I said, CryptoWall came in right after CryptoDefense, and they were able to patch that cryptographic bug and implement rigorous cryptography. If you look closely, one of these clusters, the orange cluster, made three times more money than CryptoDefense.

And then what we offer in this tool is the ability to dig deeper into each one of these clusters. As I said, CryptoWall version one was released, I think, around April 2014, and you start seeing that first wallet making money around May 2014. At some point, they made around 60 Bitcoin. The most interesting wallet, or version, the one that made a lot of money, was CryptoWall 3, which at some point was able to collect, as you can see here, 800 Bitcoin at one specific point in time.
So if these guys "hodled", as they say in the crypto community, if they kept their Bitcoin, they're probably millionaires right now.

But going back, the other thing I want to tell you guys is that we open sourced that code, so it's easily downloadable from our own GitHub repository at TruSTAR, called WhiteRabbit. What you need for that code to operate is a running instance that contains BlockSci, which is the analytical tool that allows us to explore the blockchain and do some of the analytics and the clustering necessary to reconstruct the clusters. So, let's say you want to run one of the example notebooks that we released as part of this tool: you would need to spin up a BlockSci node, which allows you to pull in the latest blockchain data. From our experience, it required a large AWS instance, because the blockchain data right now, at least for Bitcoin, is I think around 140 gigabytes. So it's a lot of data.

Anyway, coming back to the example, there are a couple of utility functions that we built as wrappers around that tool, but I'll go quickly through the steps for reconstructing these clusters. The first thing you would do is initialize the blockchain. Then you set the heuristics used to perform the clustering. As Olivia explained, in this case we use the co-spending heuristic, which allows you to determine which Bitcoin addresses spend together in a given transaction. That is basically what allows you to assume that the cluster you obtain is associated with one wallet, one user's wallet. For the Cryptolocker case, we were able to identify two main seed addresses. And the seed addresses, as I said, are the Bitcoin addresses that received the funds from the victims.
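The co-spending step described here can be sketched in a few lines. This is only a minimal illustration on toy data, not the WhiteRabbit or BlockSci code: the function names `cospend_clusters` and `wallet_for_seed`, the address labels, and the representation of a transaction as a plain list of input addresses are all assumptions made for the example. It merges addresses that appear together as inputs of a transaction using a union-find structure, then returns the cluster containing a seed.

```python
from collections import defaultdict


def cospend_clusters(transactions):
    """Group addresses that are spent together as inputs of a transaction.

    `transactions` is a list of input-address lists (a toy stand-in for
    real blockchain data). Returns a list of address sets.
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own root, then walk up to the root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        find(inputs[0])  # make sure single-input transactions are registered
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return list(clusters.values())


def wallet_for_seed(transactions, seed):
    """Return the co-spend cluster containing `seed` (just the seed if unseen)."""
    for cluster in cospend_clusters(transactions):
        if seed in cluster:
            return cluster
    return {seed}


# Hypothetical toy data: each inner list is the set of input addresses of one transaction.
TOY_TXS = [["seed", "a1"], ["a1", "a2"], ["b1", "b2"], ["a3"]]
print(sorted(wallet_for_seed(TOY_TXS, "seed")))
```

In the toy data, "seed" and "a2" never appear in the same transaction, but both co-spend with "a1", so they end up in the same cluster. That transitive merging is what the recursive expansion from a seed amounts to.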
From the seed addresses, you basically create the address object that BlockSci allows you to build. And then you want to find the specific cluster that is associated with that Bitcoin address. In this case, one heuristic was removed, a heuristic called the change heuristic, but I'm not going to go into it. For Cryptolocker, the result that I'm printing out here is the number of Bitcoin addresses that were associated with that wallet. So for Cryptolocker specifically, we were able to reconstruct 968 Bitcoin addresses associated with the Cryptolocker wallet.

And then, from the cluster that you find, what BlockSci allows you to do is obtain all the transactions that transacted with that specific cluster you've constructed. That allows you to compute things such as the volume of money that flowed into that cluster, or into that wallet. For Cryptolocker, the first version, the volume of money that transacted with that cluster was around $11.8 million.

We also built a utility function which allows you to rebuild the balances associated with that specific cluster. In this function, for example, you're able to obtain the block heights at which the transactions happened. And then for these block heights, you can obtain the balances. And of course, you can plot it at the end.

It's true that in this graph, for example, which is the graph I was showing in the tool that is already hosted, you see data from September 2013, which was the start of Cryptolocker, until the end of 2017. But the reason is that these campaigns were so large that there are a lot of researchers out there who perform micropayments in order to be able to reconstruct the clusters and the wallet.
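The balance reconstruction mentioned above (collect the block heights at which the cluster transacted, then compute the balance at each height) can be sketched as follows. This is a minimal sketch on made-up numbers, not the utility function from the released notebooks: the names `balance_by_height` and `total_inflow`, and the `(block_height, delta)` event format, are assumptions for illustration.

```python
from collections import defaultdict
from itertools import accumulate


def balance_by_height(events):
    """Turn (block_height, delta) events into a running balance per height.

    A positive delta is value received by the cluster; a negative delta
    is value spent out of it. Returns a list of (height, balance) pairs
    sorted by height.
    """
    net = defaultdict(int)
    for height, delta in events:
        net[height] += delta  # several transactions can share one block
    heights = sorted(net)
    balances = accumulate(net[h] for h in heights)
    return list(zip(heights, balances))


def total_inflow(events):
    """Total value that flowed into the cluster (sum of positive deltas)."""
    return sum(delta for _, delta in events if delta > 0)


# Hypothetical toy events, in arbitrary units.
TOY_EVENTS = [(100, 50), (100, 25), (105, -30), (110, 10)]
print(balance_by_height(TOY_EVENTS))
print(total_inflow(TOY_EVENTS))
```

The resulting (height, balance) pairs are what would be plotted. Converting heights to dates and amounts to dollars would need block timestamps and a price series, which this sketch leaves out.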
So the reason they do it is that if there's no payment or money flowing through these Bitcoin addresses, it wouldn't be possible to reconstruct the cluster. I thought that was interesting, even though the actual campaign for the first version of Cryptolocker ended around February or March 2014.

Which brings us, now that I've discussed a little bit of what we did and shown the tool, to a final proposition that we'd like to make as part of this release. I think as security people, as threat intel analysts, we have to think about how we can leverage data on the blockchain, and how we can leverage Bitcoin addresses and add them as an indicator to the Pyramid of Pain. And I'm going to explain; I mean, this is debatable, of course, it's not final.

Bitcoin addresses have some interesting characteristics. The first characteristic is similar to hashes, in that you can easily use a Bitcoin address for link analysis and compare events to each other, and the links are as accurate as a hash. The other thing is that they're also easy to generate, right? Hashes are easy to generate: you change one bit in the code and you have a different hash. Bitcoin addresses in some ways share that attribute, in the sense that attackers, or anyone really, can use any open source tool or commercial service to obtain a wallet and generate new Bitcoin addresses and a public-private key pair. So in a way, that's the easy part.

But the cool part, the other characteristic of Bitcoin addresses, is that they in some way carry some form of TTP about the attackers, in the sense that because blockchain data is immutable, researchers and forensic experts or even government agencies can look at the blockchain data and investigate the behavior forever. It's something that is always going to be there, and it's always available.
And the attackers can't do anything about it. And once you investigate on the public blockchain, if you're an offensive person, you can easily disrupt the payment system of the attackers. That could be a really painful attack to perform on these attackers. So that's why I consider it a characteristic that is similar to TTPs, in the sense that it's very painful if you disrupt that payment system. So on balance, I'd consider adding it right in the middle of the pyramid, in between TTPs, maybe, and hashes.

There's of course a lot of future work that we'd like to perform. One problem we identified is that a lot of the time you're heavily reliant on the seed addresses that are the start of the clustering methodology. If you don't have enough intelligence, if you don't have malware samples, you're not able to reconstruct all of these clusters with this methodology, so you wouldn't be able to track the campaigns as efficiently. One potential solution to make this more usable by everybody is to have a common repository for seed addresses, where if someone detonated a malware sample and collected a Bitcoin address, they share it, link it, and associate it with a specific ransomware family so that defenders are able to track the balances of that particular wallet.

The other aspect is that there's a lot of missing labeled data, particularly around whitelists. Sometimes the heuristics that we showcase for reconstructing the clusters can get contaminated by something called a CoinJoin transaction, which is a transaction that launders money that was made by these attackers. Its behavior is in some way similar to the co-spending heuristic that is used to build these clusters.
And if you encounter one of these, you might build a cluster that is associated with, maybe, a mixing service or even an entire exchange, so you can end up with thousands or hundreds of thousands of Bitcoin addresses that are associated with an exchange. So if there's an ability for exchanges and other tools to provide defenders with whitelisted Bitcoin addresses, that could be super helpful.

The other aspect is the computational complexity. Sometimes it is very slow to compute clusters for some of these addresses. The other thing is that if you want to, let's say, leverage things like ML, I think right now there are around 140 million different clusters if you perform the clustering on the whole blockchain. So that gives you a sense of the scale of the problem. Let's say I did some time series analysis of how these wallets associated with bad actors behave, and I want to compare that to other clusters that are unlabeled on the blockchain. With 140 million clusters, that's a highly computationally complex problem.

The final problem is, of course, the matter of ethics and privacy, right? These tools can also be used by bad guys for surveillance and monitoring of transactions. That data is public. Anybody can use it. You can link any off-chain data to blockchain transactions and try to determine who transacted and what they were doing. So you can do a lot of surveillance that could be morally challenging. And you can also have arbitrary discrimination, in the sense that, let's say you start building these heuristics and clusters and identifying different risks associated with Bitcoin addresses, some people could leverage these to discriminate against specific people and not allow them payments. So as a community, I think we need more engagement from good actors to be able to resolve some of these issues.
And again, thank you for listening in. As I said, this tool is open source. And if you have any questions for me or Olivia, please share them with us. We're happy to take questions right now.

Yeah, it's actually a very good question, and we were playing around with it for a while. In terms of a warning system, it's more that, if you were not the victim, if you're a security operator and you know that a specific ransomware is ramping up or ramping back up, this is where it allows you to defend if you haven't been hit. This is what I mean by a warning system. It's not really a system for crypto exchanges to block these transactions or something like that. Maybe I understood your question correctly.

Yeah, I think so, but it requires a more ML-based approach, in the sense where you directly identify something as bad, and this of course requires much more investment and computing power. And yeah, it's