Hi, everyone. Back in June, I introduced our project on provider record liveness, which we're working on with Miquel and Leo from the Barcelona Supercomputing Center. Today's demo presents the almost-final results of the project. This is work done within ProbeLab, which focuses on protocol benchmarking and optimization.

I'm going to start with a little recap of what I said in June: what a provider record is and where it lives. To do that, I'll walk through the IPFS design for the content life cycle, that is, what happens from content publication to content request and retrieval. Assume you have a document: you hash it, and what you put on the IPFS DHT, as you may know, is a provider record. The provider record includes the contact details of the publisher as well as the CID of the content being published. The DHT then does some magic and finds the proper nodes to store the provider record. On the retrieval side, the requestor has to know the CID. They first ask their immediately connected peers over Bitswap, and if those answers are negative, they go to the DHT and ask for the same CID. The DHT will hopefully do its magic again, end up at the same nodes, and request the provider record. At that point the requestor has the provider record, so they have the contact details of the provider, which means they can contact the provider, set up a connection, and transfer the data.

So what is the hypothesis of this work? As we've seen, a provider record is a small record that includes the contact details of the content publisher as well as the CID, and it is published on a number of different nodes in the network. In the previous simplified example that was one node, but in reality it is 20.
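To make the publish step concrete, here is a minimal stdlib-only sketch of hashing content into a CID-like key and pairing it with the publisher's contact details to form a provider record. This is illustrative only: real IPFS wraps the digest in multihash/multibase (the go-cid library), and the record holds a libp2p peer ID and multiaddrs; the struct and peer ID below are assumptions for the sketch.

```go
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

// ProviderRecord pairs a content identifier with the publisher's
// contact details, mirroring what gets stored on DHT nodes.
// (Illustrative struct, not the on-the-wire format.)
type ProviderRecord struct {
	CID       string   // identifier derived from the content
	PeerID    string   // who to contact for the bytes
	Addresses []string // where to reach them
}

// makeCID hashes the content and encodes the digest.
// Simplified: real CIDs use multihash + multibase encoding.
func makeCID(content []byte) string {
	sum := sha256.Sum256(content)
	return "b" + base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(sum[:])
}

func main() {
	doc := []byte("hello ipfs")
	rec := ProviderRecord{
		CID:       makeCID(doc),
		PeerID:    "12D3KooWExamplePeer", // hypothetical peer ID
		Addresses: []string{"/ip4/1.2.3.4/tcp/4001"},
	}
	fmt.Println(rec.CID)
	fmt.Println(rec.PeerID)
}
```

The key property the DHT relies on is that the same content always hashes to the same key, so publisher and requestor independently arrive at the same lookup target.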
And the replication here is done because we want this provider record to stay alive in the network, to be stored somewhere, in case some of the nodes go offline, are overloaded, or cannot respond to requests. This means there have to be other nodes holding the provider record that are findable and online and can serve it to whoever requests the content. If there is no provider record left in the network, if all of the nodes holding it have gone offline, then the content is unreachable, which is pretty bad.

The hypothesis for this work comes from the high rates of churn we've seen in the IPFS network: up to 70% of peers leave the network within only two hours of joining. That's a lot of peers, so a lot of those 20 peers have a good chance of having left the network, which leaves only a few replicas of the provider record inside the network. So we wanted to see whether there are cases in the IPFS DHT where provider records are basically no longer alive and content is therefore unreachable.

What we did is that Miquel built what is called the CID Hoarder; you can find the URL down there on GitHub. It is a tool that produces content, produces CIDs, produces the provider records for those CIDs, stores them on the IPFS DHT, and then monitors those specific nodes to see if they're still online and whether they're serving the provider record or not. There are several more features; I'm not going to go into the details, but that's the main functionality. We tested it over the live network, and we wanted to answer some questions. One of the main questions is, as I said before: does the record stay alive until the republish time?
So, provider records are republished every 12 hours to make sure they stay alive in the network despite churn: at republish time we find the 20 closest peers that are online at that point and replicate the record to them. The answer to the question is yes, and we see it in this graph, where on the Y axis we have the number of nodes where the provider records are available, and on the X axis the time since the CID was published, i.e., since the record was stored. It goes from zero to 38 hours. The provider record would normally be republished at the 12-hour point, but the CID Hoarder deliberately does not republish records; that's the whole point. Despite that, we see that the record stays alive on approximately 15 nodes for more than 35 hours, which is a good thing: it means the current DHT keeps records alive, so no content becomes unreachable.

The next question that comes to mind is whether records stay alive only due to Hydras. We excluded Hydras from the requests we made to get the provider record, and found that, excluding Hydras, we still have on average about 12 nodes that keep the record alive for more than 35 hours. Again great news, because it means we're not really dependent on Hydras.

So what does this mean practically? It means that perhaps we can reduce the value of K, the current replication factor, from 20 to 15. We've done experiments on this as well: we reduced the replication factor, published CIDs and provider records, and monitored again how long peers stay online and keep the provider record alive. We see that, of course, there is a small drop, from an average of 15 nodes down to an average of 10, which again stay alive for more than 35 hours. Great news once more: it means we can apply some optimizations to the IPFS DHT as it is today.
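The measurement behind these graphs can be sketched as follows. This is a stand-in for what the CID Hoarder does over libp2p, with a hypothetical `Prober` interface replacing the real DHT lookups so the counting logic is self-contained:

```go
package main

import "fmt"

// Prober abstracts "ask this peer whether it still serves the
// provider record for this CID". The real tool does this over
// libp2p; this interface is an assumption for the sketch.
type Prober interface {
	HoldsRecord(peerID, cid string) bool
}

// CountLiveHolders probes each original record holder and counts
// how many still serve the record -- the Y axis of the graphs above.
func CountLiveHolders(p Prober, holders []string, cid string) int {
	live := 0
	for _, id := range holders {
		if p.HoldsRecord(id, cid) {
			live++
		}
	}
	return live
}

// fakeProber simulates churn: peers in `gone` no longer answer.
type fakeProber struct{ gone map[string]bool }

func (f fakeProber) HoldsRecord(peerID, cid string) bool { return !f.gone[peerID] }

func main() {
	holders := []string{"peerA", "peerB", "peerC", "peerD"}
	p := fakeProber{gone: map[string]bool{"peerB": true}}
	fmt.Println(CountLiveHolders(p, holders, "bafyExampleCID")) // 3 of 4 still live
}
```

Running this probe on a schedule against the original holder set, without ever republishing, yields exactly the "nodes with PR available vs. hours since publication" curve described in the talk.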
So what does this mean practically? As I said, the republish interval on the IPFS DHT is 12 hours. Perhaps we could consider increasing it, and we found that we can at least double it, because we've seen that everything stays alive for at least 35 hours. Even more than double the 12 hours would still be okay, based on the extensive set of experiments we've run.

But there is something we need to be careful of at this point. To publish a CID, we try to find the K = 20 closest peers to the CID in XOR distance in the Kademlia DHT. So, since peers come and go, we need to make sure that the peers we chose at publication time are still the closest peers after 12 or 24 hours, or whatever republish interval we choose. And again, we found that 15 out of the 20 closest peers chosen initially are still among the closest ones after more than 35 hours. We see here that initially it's around 17, drops down to 16, and then stays stable at 15 nodes. This means that 15 nodes actually keep the provider record alive up until 32 hours, at which point it goes down to 14. And the question here is: does this include the Hydras? It does. But if we exclude the Hydras, we see a drop of two to three nodes, so it would go from an average of 15 to an average of 12 and stay like that for more than 30 hours.

So, the conclusions of the study. A final recommendation will be coming soon. We found that there is definitely significant space for improvement. And this builds on the fact that DHT servers and, sorry, content providers are actually overloaded: they have to run high-CPU machines, they have to consume lots of bandwidth, and so on. It has been a longstanding issue in the IPFS network.
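The "closest peers" check above rests on Kademlia's XOR metric. A minimal sketch of ranking peers by XOR distance to a target key (byte-wise distance compared lexicographically, as in Kademlia; the one-byte keys are toy values, since real keys are 256-bit hashes):

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// xorDistance returns the byte-wise XOR of two equal-length keys.
// Comparing these byte slices lexicographically orders peers by
// Kademlia distance to the target.
func xorDistance(a, b []byte) []byte {
	d := make([]byte, len(a))
	for i := range a {
		d[i] = a[i] ^ b[i]
	}
	return d
}

// closestPeers sorts peer keys by XOR distance to target and
// returns the k nearest -- the set a publisher stores records on.
func closestPeers(target []byte, peers [][]byte, k int) [][]byte {
	sorted := append([][]byte(nil), peers...)
	sort.Slice(sorted, func(i, j int) bool {
		return bytes.Compare(xorDistance(sorted[i], target),
			xorDistance(sorted[j], target)) < 0
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	return sorted[:k]
}

func main() {
	target := []byte{0x0f}
	peers := [][]byte{{0x00}, {0x0e}, {0xf0}, {0x1f}}
	for _, p := range closestPeers(target, peers, 2) {
		fmt.Printf("%02x\n", p)
	}
}
```

Because churn changes which peer IDs exist, the k-closest set drifts over time; the study's finding is that this drift is slow (15 of 20 still closest after 35+ hours), which is what makes a longer republish interval safe.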
So this means that if we can reduce this overhead, it will have quite a significant impact. Roughly, going from K = 20 to K = 15 gives about a 25% reduction in overhead, and increasing the republish interval from 12 to 24 hours halves the republishing work, a 2x improvement. Of course, it has to be noted that by overhead here we don't mean the entire overhead of those machines, just everything that is provider-record related: sending provider records, receiving provider records, storing provider records, and so on. We're not aware of what percentage of the overall resource consumption of the servers this is, but it is definitely going to be a worthwhile reduction.

We've been working on this with the team, as I said, together with Miquel and Leo. We have the final report, which is very extensive, several pages with many tens more figures and results than what I presented here; you can find it on the website. That's it, thank you. You can get in touch: we're in the ProbeLab channel on the IPFS Discord and also on Slack. Thanks everyone. Cheers.
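The savings quoted above follow from a simple rate model (my own simplification, not from the report): per-CID provider-record traffic is roughly proportional to K divided by the republish interval, ignoring the lookup traffic needed to find the K peers.

```go
package main

import "fmt"

// recordsPerDay models provider-record publish operations per CID
// per day: K replicas pushed every republish interval.
func recordsPerDay(k int, intervalHours float64) float64 {
	return float64(k) * (24.0 / intervalHours)
}

func main() {
	base := recordsPerDay(20, 12)           // current defaults: 40 ops/day
	smallerK := recordsPerDay(15, 12)       // K reduced to 15
	longerInterval := recordsPerDay(20, 24) // interval doubled to 24h

	fmt.Printf("K 20->15: %.0f%% reduction\n", 100*(1-smallerK/base))
	fmt.Printf("12h->24h: %.0f%% reduction\n", 100*(1-longerInterval/base))
}
```

Under this model, reducing K yields the 25% figure from the talk, and doubling the interval halves the republish work; combining both would cut per-CID publish traffic by more than 60%, though the study's final recommendation should be the authority on which knobs to actually turn.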