Nicola Greco heads the CryptoNet Lab at Protocol Labs, where he works on technological empowerment by providing secure building blocks for Web3 technologies. Today he'll be speaking to us about his work on the Filecoin Data Retrievability Consortium. Welcome, Nicola.

Hello, everyone. It's a pleasure to be here. I am actually representing Irene, who was meant to give this presentation but couldn't join due to last-minute issues. Most of this is Irene's work, and I'm here just to present it. The key topic of this presentation is: how can we guarantee retrieval from decentralized storage networks? There are several storage networks, like Filecoin, which do offer retrieval, but not retrieval as a guarantee, and we will dive into this more. It's very important that the retrievability of files from a decentralized storage network be web-scale: we can't just have a small decentralized storage network with a small number of files that can be retrieved. Anything on the web should be retrievable. This also allows not just unbounded retrieval from storage, but unbounded onboarding of new files into the network.

We're going to cover four topics: a brief look at CryptoNet, then data availability and friends, then data retrieval, the topic of this conversation, and then one of our proposals. If we work on data retrieval, it's because we believe it's important, but that doesn't mean the other topics we will look into, like data availability and proofs of storage, are not important; we're also working on those. For context, CryptoNet is an applied cryptography research group, where we do fundamental crypto research, vector commitments, SNARKs, and so on, but also protocol design. Our goal is to have our group of researchers collaborate with several other researchers through grants or collaborations, and out of these collaborations come new research, new projects, and so on.
Over time, we have published several papers in the academic world related to proofs of storage, consensus protocols, and so on. We have also made substantial improvements to the Filecoin network, some of the most notable being SnarkPack and SnapDeals. One of the things we also do is gather a lot of ideas and start conversations on potential new protocol designs. All of our work is public, including the work we do with other organizations, and you can see our CryptoNet notebook, which is where we write down ideas the moment they come to us. Some of these ideas, like the one we're presenting, may make it into products as well.

So let's talk at a high level about data availability. Data availability protocols can be divided into two steps: dispersal and retrieval. The dispersal protocol works more or less like this: the node that wants to distribute the data distributes it to several nodes. A majority of the nodes confirm that they have seen the data, and then the protocol can continue. If a majority of the nodes don't confirm having seen the data, then the protocol holds, or waits until there is approval. Why? Because it's very important for data availability protocols that the data is distributed, and has had the chance to be distributed to enough nodes, such that when we want to retrieve, there is at least one of N (in some protocols; M of N in others) available, honest nodes willing to serve the data. This is perfect for rollup data or blockchain data, because this data doesn't have to be preserved for a long time (it can be pruned), and we need to make sure that at the moment of distribution, not only a few miners saw it, but as many nodes as possible. But there are some negative aspects of data availability as a theoretical problem. One is that it requires an honesty assumption that nodes will serve the data.
In most protocols, there is no guarantee that these nodes will be willing to serve the data later on. The second problem is that it requires consensus on dispersal, which means there is a limit on the throughput of new data that can be dispersed in a consensus protocol. Why? One of the core properties of a consensus protocol is that data gets distributed well enough that everyone can verify that something has happened in the correct way. In fact, most of these protocols aim at one to four megabytes per second, which is unfortunately not web-scale if you want to build a storage network or guarantee retrieval. And a reminder: retrieval is not guaranteed in these settings.

Then there is another approach, the proof-of-storage approach, which is the one we mostly use in Filecoin, and many other protocols use it as well. The idea is that the node distributes the data to as many nodes as are willing to replicate it. This is how it works in Filecoin; it's different for other protocols. From this moment onwards, storage providers generate a proof (on a daily basis, in the case of Filecoin) to prove that they have kept on storing the file. But even in this case, there is no strong cryptographic guarantee that the file will be served. The way we guarantee it in Filecoin, of course, is that we believe that if you distribute your file across several storage providers, or if you pick storage providers with good reputations, then these storage providers will do a great job; that's how most protocols guarantee retrieval. And this gives unbounded onboarding throughput, because we don't have to have agreement or consensus on the fact that the file has been distributed. It's the role of the client to make sure that the miners, the storage providers, have received the file. So data retrieval is trying to solve these two problems.
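The dispersal step described earlier, where the protocol only continues once a majority of nodes acknowledge the data, can be sketched as a simple quorum check. This is a toy model for illustration; the names and structure are mine, not any real protocol's:

```python
class Node:
    """Toy storage node that acknowledges data it has seen."""
    def __init__(self, honest=True):
        self.honest = honest

    def receive(self, data_id):
        # An honest node acknowledges the data; a faulty or offline one does not.
        return self.honest


def disperse(data_id, nodes, quorum):
    """Distribute data and count acknowledgements.

    The protocol proceeds only once at least `quorum` nodes have
    confirmed seeing the data; otherwise it holds and must retry.
    """
    acks = sum(1 for node in nodes if node.receive(data_id))
    return acks >= quorum
```

The point of the quorum is exactly the one-of-N (or M-of-N) availability assumption from the talk: enough nodes must have witnessed the data that at least one honest node can later serve it.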
These two problems are unbounded onboarding, and guaranteed retrieval that doesn't rely on the assumption that one of N nodes is honest. Also, most of these protocols are not really designed to provide retrieval long-term. And while data availability solutions are great for rollups, they might not be great for, say, NFTs. Especially if we think that NFTs may become the standard for any digital asset, then one to four megabytes per second is not going to be enough. Plus, we don't want our data to be pruned in the future; we want to make sure that data can still be retrieved later on.

So can we have a proof of delivery? Someone gives me a file and I have a proof that they gave it to me; otherwise, I can prove that they did not give it to me. Well, we can't really have this without adding other assumptions. If we add other assumptions, for example a third party that checks that the exchange has been done, then that would be perfect. But without a trusted-party assumption, this is not possible. So let's look at some attempts at guaranteeing retrieval. One way, the naive way, is to use the blockchain as a witness. If we assume that data on the blockchain is always going to be served by nodes in the network, then we can do the following. We ask the storage provider for the data. The storage provider doesn't give us the data. We force them to post the data on chain. Now, clearly we can post a hash on chain, but we can't post a one-megabyte file, so this will never work for files or data that are gigabytes big. Storing data on chain doesn't work for large data. The alternative idea is proofs of retrievability, a beautiful primitive where, as you query the storage providers for proofs of retrievability, they leak data. But it takes a very long time to leak a full file.
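As a rough back-of-envelope for why this only suits small files: if each proof-of-retrievability response leaks a fixed number of bytes (the 32-byte figure below is purely illustrative, not from any real scheme), the number of queries grows linearly with file size:

```python
def por_queries_needed(file_size_bytes, leaked_bytes_per_query=32):
    """Queries needed to extract a whole file from proof-of-retrievability
    responses, assuming each response leaks a fixed number of bytes.
    The per-query leak rate is an illustrative assumption."""
    # Ceiling division: a final partial chunk still costs one query.
    return -(-file_size_bytes // leaked_bytes_per_query)

small = por_queries_needed(1024)     # a 1 KiB file: 32 queries, feasible
large = por_queries_needed(1 << 30)  # a 1 GiB file: tens of millions of queries
```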
So the idea would be to keep querying the miners with proofs of retrievability until they give you a full file. This works for small data; it doesn't work for large data. The other alternative is to have a trusted single party, but that wouldn't be ideal. So can we have a trusted third party that is decentralized? This is pretty much the approach that other oracles take. For example, Chainlink has a network of nodes that agree on the price they see across exchanges for Bitcoin. In a very similar way, we are proposing an oracle which says whether or not a file can be retrieved.

At a high level, it works like this. The node asks several storage providers for the retrieval. None of them gives it the file. Then the node goes to the oracle and says, "I didn't receive the file; go check if you can receive it." The oracle goes and checks. If it doesn't receive the file, then it slashes the storage provider, assuming there is some collateral that the storage provider must put up in order to participate in this protocol. If it does receive the file, well, then the retrievability oracle has seen the file and can serve it back to the node. This is, at a high level, one of the simple solutions we're proposing. There are many big questions we will not cover in this conversation, like how we can make the oracle truth-telling and how to incentivize the network. The MVP, which we will show at the end, has the oracle as a network of referees, and the client and provider agree on a retrievability deal. You can think of this almost as an insurance contract on the retrieval of a file. When the client requests the file and doesn't get it back, it appeals to the referee committee, the oracle. Then we have a particular protocol where, instead of querying the entire committee, we query a subset, even a single committee node.
That node goes and checks whether it can retrieve the file. If yes, it serves the file back; if not, after K attempts by different referees, the provider is penalized. This is the state machine of what I just discussed; soon we're going to have a very nice diagram, and we're working towards a prototype that I'm going to show in the next few slides. There are several other steps in the protocol: the provider must sign up, we have a sampling of the referees which follows a particular protocol, and then the retrieval deal. The goal is that the retrieval deal can be posted on chain, and the beauty of this protocol is that it could be cross-chain as well. The appeal works very simply, as I described before, and there are two retrieval steps. Retrieval step number one: if the leader doesn't get the file, they propagate a message saying that they didn't get it, and we go to the next step. If the leader does see the file, then they share the file with the rest of the committee, and a majority of the committee signs that they've seen the file. That's how we skip posting the file on chain: we just sample a smaller network to check that the file is available, propagated, and can be witnessed. If the protocol can receive the file, this means the file was there at the storage provider and can then be propagated to the client. If there are K messages saying the file was not well distributed, then the storage provider gets slashed. You can think of this slashing almost like an insurance contract: in the retrievability deal, the client puts up a premium that the miner will get at the end of the period if they're always honest when the deal expires; and if, while the deal is still active, the file is not delivered to the client, then they get slashed.
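The appeal with K-attempt referee sampling just described might look roughly like this. This is a sketch under my own naming; the talk does not specify the real protocol's messages or interfaces:

```python
def handle_appeal(cid, referees, fetch, committee_sign, slash_provider, k):
    """Sample referees one at a time instead of querying the whole committee.

    Each sampled leader tries to retrieve the file from the provider.
    On success, the leader shares the file with the committee, which
    co-signs having witnessed it (so nothing is posted on chain), and
    the file is served back to the client. After k failed attempts by
    different referees, the provider's collateral is slashed.
    """
    for leader in referees[:k]:
        data = fetch(leader, cid)
        if data is not None:
            committee_sign(data)  # a committee majority attests the file was seen
            return ("served", data)
    slash_provider()              # k different referees failed to retrieve
    return ("slashed", None)
```

For example, if the third sampled referee manages to retrieve the file before k attempts are exhausted, the deal survives; if none of the first k can, the provider is slashed.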
We have a prototype of this smart contract. We deployed it three days ago and it's live on an Ethereum testnet. You can create deals, see the live deals, cancel them, and withdraw funds from the contract, and there's the slashing process. Right now we have a small set of retrieval nodes that are part of the referee committee. This is how it's going to look (this UX is just for the MVP): anyone can put in any IPFS CID, along with the value of the deal, the premium, and the duration of the deal. So, for example, someone can insure a CID over the course of a month. Then you will be able to manage your deals; deals are stored as NFTs, so you will be able to see NFTs in your wallet that show which files you wanted to insure.

The goals of this project: we don't want to be a new network by any means; we want to be able to compose with every network. We want to give anyone the ability to insure IPFS files, and any node, not just Filecoin nodes, should be able to provide the service. Of course we want to target Filecoin first, and although the smart contract has been deployed on Ethereum, as soon as we have the FVM this is going to be available on Filecoin as well. Another goal is that it must compose with other retrievability solutions that we at Protocol Labs, and CryptoNet in particular, are also working on; the idea is that there will be a set of smart contracts that other retrieval providers can pick up to offer a single overall retrievability solution. We want to give flexibility on pricing, so clients and storage providers can choose the premium and the slashing. And then finally, we want to be web-scale.
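The retrievability deal from the UI (an IPFS CID, a deal value, a premium, and a duration) might be modeled like this. The field and function names are my own guesses, not the deployed contract's interface:

```python
from dataclasses import dataclass

@dataclass
class RetrievabilityDeal:
    """Insurance-style deal on the retrieval of a file."""
    cid: str            # IPFS CID of the insured file
    value: int          # provider collateral paid out if retrieval fails
    premium: int        # paid to the provider if the deal expires cleanly
    duration_days: int  # e.g. 30 to insure a file for a month
    active: bool = True

def settle(deal: RetrievabilityDeal, delivered: bool) -> str:
    """At expiry: the provider keeps the premium if the file was always
    delivered; otherwise its collateral is slashed to pay the client."""
    deal.active = False
    return "premium_to_provider" if delivered else "slash_collateral"
```

The design choice to leave `value` and `premium` as free parameters mirrors the pricing flexibility discussed later: the protocol does not fix a price, the market does.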
We want to make sure that any file, any data on Filecoin, can be insured, and also data from other storage providers. This is the QR code for the Data Retrievability Oracle project. As I said before, I'm just representing the project; this has been work from Luca and Irene at CryptoNet, with help from Sebastiano at Yamadigital and Giorna from HAD. You can join the conversations on the new Slack channel that we created, which is called Retrievability Oracle. More broadly, there is a lot of work we do at CryptoNet, from SNARK research to vector commitments to threshold networks and Filecoin improvement proposals, and we're always looking for great engineers to join our team, and in particular product managers who can take some of these ideas and turn them into products. So if you're interested in that, contact us; and if you're working on similar solutions and want to integrate with the Data Retrievability Oracle, or with other projects we're working on, feel free to reach out.

Excellent. Thank you, Nicola, and Luca and Irene, for that interesting overview of your work on retrievability. Really exciting news about the prototype; we look forward to seeing it deployed on the FVM. And thank you for giving us some insight into your future goals around composability and scalability in retrieval research. We have time for a few questions, and we've got one over here.

I'm just curious what you would consider prior art in this, or what you have been inspired by on this retrieval-insurance idea.

I think, so let me put it this way. Since 2017, some of us have been thinking about these problems, and in the past five years we looked into a lot of different things. I've been influenced personally by a lot of people who have crossed through Protocol Labs, in person at the company but also at events like this. There have been several things that we didn't think would work in the past, and then they got wide adoption.
And then there's Chainlink; I think they do oracles really well. What's very interesting about it is that the economics work out such that in order to fake an oracle outcome, you need to buy a large amount of their tokens, and that's very expensive. So the question is: it's a solved problem to do an oracle for getting values from websites, but it's very difficult to do an oracle that retrieves data from storage providers, or even from IPFS nodes, and then serves the data back. That's a bit more than a simple oracle. I think Chainlink has inspired a lot of this, in a way. And then there are other solutions that we have at Protocol Labs, some that didn't make it and some that are working in a different way. For example, Filecoin Saturn is a great example to look into, and this is just a different take on retrievability. This is by no means the best way; it's one of many, and it provides a specific service, which is insurance on retrieval.

Thanks, one more question. Cheers, thanks very much for the really good talk, Nicola. I was just going to ask you to expand a little bit more, if you wouldn't mind, on the insurance model: how you're thinking about the pricing, what kind of factors, and how that's structured.

Yeah, so let me tell you: we are a team of researchers, not a team of economists. So what we decided was that instead of writing into the protocol what would be a good price for insurance, we left it open, along with which token needs to be used for insurance; we decided to leave that open to the market. This gives, of course, a lot of flexibility. For the MVP, we're going to pick some numbers; we're working, I think, with some people from BlockScience to try to understand what would be a good value or a good pricing mechanism. But to answer you, I don't have a good answer on what would be a good price.
There are too many factors to take into consideration, and there's still analysis that needs to be done. But one important thing is that it doesn't necessarily mean the storage provider is losing dollars or Filecoin. It could even be that the storage provider puts up some reputation score as collateral. There's another project that Luca is going to present later on, which is called Storage Metrics, and the idea is to collect metrics from the network; these metrics are basically the reputation of a storage provider, and every time you lose files, you lose reputation score. So that's interesting as collateral, in case we want to have under-collateralized retrievability deals. On the premium, likewise: if it's a file that is very important, the client can offer whatever premium they want to the miner as a reward. Thanks again. For those who are interested in doing pricing models for this, this is the best time to get into this research and get in contact with us.

Great. With that call to action, I think we'll thank the Data Retrievability Consortium one more time.