All right, I see we have about 20 people here, so I guess I can start. I'm going to go over a presentation; you can leave your questions in the Q&A section, and I'll take a peek at the Q&A from time to time.

What I'm going to talk to you about today is a way of redacting transactions from Fabric. The technique is not limited to Fabric — you can also employ it in other blockchains that have a similar architecture, that is, execute-order-validate blockchains. I'm Yacov Manevich, and this is joint work with Artem Barger and Gal Assa. Artem is my colleague from IBM Research in Haifa, Israel, and Gal is from the Technion in Israel. By the way, can you guys hear what I say? Yeah, we can hear you. Okay, thanks.

The purpose of this talk is not only to promote a research paper that I presented at a blockchain conference not long ago, but also to promote this work within the Hyperledger Fabric community. Hopefully, with enough support from the community, this work — which has actually been implemented, although the implementation is not production-ready yet — will one day be incorporated into the official Hyperledger Fabric upstream.

I guess I don't need to go over this slide. You all know Hyperledger Fabric: a permissioned blockchain platform in the Linux Foundation, with Turing-complete smart contracts. You can write smart contracts in basically any supported language, like Go, JavaScript, or Java. One of the things that differentiates Fabric from other blockchains is its unique architecture, the execute-order-validate paradigm.

Let's go briefly over the architecture of Fabric. We have an SDK embedded within an application, and the SDK is used to send transaction proposals to peers. A transaction proposal is used to convey an intent — what you are going to do in a transaction — to the peers.
The peers execute the smart contracts and return to the SDK something called a signed proposal response: they execute the smart contract and sign over the execution results, which is also known as an endorsement. The SDK then takes all of these endorsements, stitches them together into a transaction, signs over that transaction, and sends it to the ordering service. The ordering service is a cluster of several nodes, where each node is called an ordering service node, and its role is to actually create the blockchain itself: by running some kind of consensus algorithm, these nodes create the blocks. The blocks are disseminated to the peers, and the peers validate the transactions and only apply the valid ones. Transactions that are deemed invalid are simply ignored; however, as you know, all transactions — valid or not — end up in the ledger, in the blockchain.

Let's see if there are any questions. No, no questions yet.

Now let's go over something completely unrelated to Fabric: the GDPR. About five years ago, the European Parliament came up with a set of laws and regulations that mandate how companies and organizations inside Europe deal with data. The regulations specify three types of actors. You have data subjects, the people whose personal data is processed by entities called data processors. The data processors — usually these are servers — process the data of the data subjects on behalf of the data controllers, which are the entities that determine the purpose of processing that data. The GDPR regulations specify all kinds of things. For instance, data subjects should be informed about what data is being processed. They also specify all kinds of limitations on the processing of data.
For example, you cannot process the data for a reason that is not known to the subject. You also need to minimize the amount of data that you keep for the processing — do not store information if you don't need it. When you store the data, it needs to be accurate, meaning that the data subject has the right to correct the data that you store about him or her. There are storage limitations as well: the GDPR dictates for how long you are allowed to store the data. It also specifies how you need to protect the data against any kind of unlawful processing, and that you need to back up and safeguard the data. Okay, no questions so far.

There is a section in the GDPR called the right to erasure — in other words, the right to be forgotten. This section states that the data subject shall have the right to obtain from the controller the erasure of their personal data. That means the data subject can reach out to a data controller — to a company — and ask that their data be erased completely. It is then the responsibility of the data controller to make sure that the data processors also do not store this data and that it is eliminated completely. Of course, there are all kinds of corner cases, such as financial data, but let's not talk about them.

What I want to focus on is the following: a blockchain and the GDPR are basically a contradiction, because the right-to-be-forgotten regulation requires deletion of transaction data from the past. However, if you delete data from the past, you change the data that the network nodes see in the future — and we know that blockchain is all about immutability. Each block contains the hash of the previous block, and transactions are all cryptographically signed, so if you change any data in a transaction, the signature will no longer be valid.
In some blockchains, such as Fabric, the blocks themselves are also signed. So basically, when you change the past, you invalidate the transactions of the present, and this undermines everything about blockchain.

Let me see if there is a question here. "GDPR is about personal data; Fabric currently is used mostly in enterprise for business transactions. What do you think is the personal data in the transactions?" Yeah, that's a good question. It can be all kinds of data — for example, medical data, sometimes financial data, or, if you use Fabric to audit all kinds of things, certificates belonging to people. Basically, anything that can be related to some person is data that you might want to be able to erase in order to conform to the GDPR. Let's go back to the presentation.

What is the state of the art in blockchain redaction? In the literature, all kinds of techniques have been proposed. The most prominent one uses something called a chameleon hash function. A chameleon hash function is a hash function that uses public-key cryptography, and it allows someone who knows the trapdoor of the hash function to create a hash collision. You see in this example that the hash is given in the blue rectangle, and the hash of the green rectangle is the correct hash. However, someone who knows the trapdoor can create a hash collision: the trapdoor allows you to carefully select a string — the one in the red rectangle — which, under the chameleon hash function, maps to the same hash as the string in the blue rectangle. And how do you use this to redact data in blockchains? Imagine that you have this blockchain on top — these blue blocks.
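To make the trapdoor collision concrete, here is a toy sketch of a discrete-log-style chameleon hash. The parameters are tiny and insecure, chosen only for illustration; this is a generic sketch of the idea, not the exact scheme from any particular paper.

```python
# Toy chameleon hash: CH(m, r) = g^m * h^r mod p, where h = g^x and
# x is the trapdoor. Small, insecure parameters -- illustration only.
p, q, g = 2039, 1019, 4   # p = 2q + 1; g generates the subgroup of order q
x = 123                   # the trapdoor (secret key)
h = pow(g, x, p)          # the public key

def ch(m, r):
    """Chameleon hash of message m with randomness r."""
    return (pow(g, m, p) * pow(h, r, p)) % p

def collide(m, r, m_new):
    """Whoever knows the trapdoor x can pick r' so CH(m_new, r') == CH(m, r)."""
    return (r + (m - m_new) * pow(x, -1, q)) % q

m, r = 42, 7
m_new = 99                            # a different message (the "redacted" content)
r_new = collide(m, r, m_new)
assert ch(m, r) == ch(m_new, r_new)   # collision: the hash stays the same
```

The collision works because both exponents equal m + x·r modulo q, so anyone holding x can swap the message without changing the hash — which is exactly the power (and the danger) discussed next.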
If you know the trapdoor, then you can alter the content of this middle block and create a different block — the orange one. You can use this technique to redact data in transactions — basically, to delete information about users, right? To comply with the GDPR. However, there is one small problem: someone who knows the trapdoor can also maliciously alter the entire blockchain. And this is a problem, right? Why? Because that person can fork all of the nodes in the entire network. Some works in the literature propose all kinds of solutions. For example: let's have one trapdoor per block. That's still a problem, because the trapdoor can be leaked — it can be stolen and used maliciously. Other academic works then said, okay, let's take this trapdoor, split it into many secret shares, and use multi-party computation cryptographic techniques to compute the hash collision without any party knowing the trapdoor. However, this is all very complex.

In this talk, we will see a very easy technique to comply with the GDPR, but only in blockchains of the execute-order-validate paradigm, such as Fabric — meaning that you execute the transactions before they are put into a block; this is what the execute-order-validate paradigm means. By definition, execute-order-validate blockchains lack post-order execution, right? An immediate conclusion from that is that the value of a key does not alter the validity of the transaction. In other words, a peer can safely process transactions with all key values removed. Let that sink in for a bit. If you are a peer, when you look at a block and validate a transaction, you never look at the value of a key — you only look at the signatures and at the MVCC version of that key, never at the value.
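The observation above can be sketched in a few lines. These are hypothetical, simplified structures — not the actual Fabric data types — but they show that the verdict depends only on the endorsement policy and the MVCC versions in the read set, never on the written values:

```python
# Sketch: validating a transaction without ever reading the written values.
# Simplified, hypothetical structures -- not the real Fabric types.

def validate(tx, world_state, policy_ok):
    """Compute the verdict using only endorsements and MVCC versions."""
    if not policy_ok(tx["endorsements"]):
        return "ENDORSEMENT_POLICY_FAILURE"
    for key, version in tx["read_set"]:
        if world_state.get(key, (0, None))[0] != version:
            return "MVCC_READ_CONFLICT"
    return "VALID"  # note: tx["write_set"] values were never inspected

world_state = {"foo": (1, "bar")}        # key -> (version, value)
tx = {
    "endorsements": ["OrgA", "OrgB"],
    "read_set": [("foo", 1)],
    "write_set": [("foo", None)],        # value removed -- verdict unaffected
}
assert validate(tx, world_state, lambda e: len(e) >= 2) == "VALID"
```

Because `validate` never touches `write_set` values, stripping them out — as the redaction technique does — cannot flip a transaction's verdict.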
And this means that a peer that joins much, much later, after the data has been redacted, doesn't need this value in order to answer in the same manner about the validity of the transaction as the peers that processed it before the data was redacted. Let's make a deep dive into the format of Fabric transactions — but first, let's see some more questions.

There's a question here about assets — something like, if I get an asset and then give it to the owner, is the information gone? [partly inaudible] Not really. Basically, if you delete the data after it has been processed, all the peers that processed this data in the past have already recorded it in their world-state database, so they don't really need the ledger. The only peers that need the ledger are the ones that came to the network after the data was redacted — and I'm actually talking about this in a few slides.

Someone asked: "In your opinion, can putting data in a PDC and then removing it from the PDC be interpreted as an implementation?" That's a great question. Indeed, that is one of the ways that customers and users implement this requirement now. However, in Fabric it is still not possible to manually and explicitly delete the PDC data — you need to wait for the blockToLive parameter's number of blocks to pass — and this is why it is still not GDPR-compliant. "Would you consider an identifier of a natural person as PII data?" By the way, PDC stands for private data collection. As for an identifier of a natural person being PII data — I don't know; I'm not a lawyer, so I don't know the answer to that.

So, going back to the presentation: to comply with the GDPR and be able to redact data, you need to slightly alter the data format of the transactions in Fabric.
If we dive deep, we see these fields in yellow — the most interesting one being the value field in the KV-write. These yellow fields are the ones that we want to be able to redact. So what we're going to do is simply replace these fields: instead of putting the value, we put its hash. But we still need the pre-images of the hashes, right? So we allocate a new space in the transaction, which we call a pre-image space, and there we simply dump all of the pre-images, just as in this slide.

In a nutshell, the technique that we implemented — and also benchmarked — allows you to redact data on-chain in Fabric, and it goes like this. The first thing we need to do is relocate the user data to a special place in the transaction, just as I said. On the left you see the transaction structure before, and on the right the transaction structure after. After this change, the value in the transaction contains the hash, and there is a separate place in the transaction which is not signed — you see the signature only covers the payload, while the pre-image is now taken out of the payload. The second thing you need to do is replace the original user data with hashes, just as I said. Another very important thing is that you need to migrate these pre-images across all the messages involved in Fabric's transaction life cycle. On the peer side, when the peer creates its endorsement — the proposal response — it allocates a new pre-image space just for that. Then the client extracts all of these pre-images from the proposal response and puts them in a new place in the actual transaction. And when the client — the SDK — sends this transaction to the orderer, the orderer also does something similar.
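The value-hashing step described above can be sketched as follows. SHA-256 is an assumption here — just one possible hash choice — and the function names are hypothetical, not Fabric's API:

```python
import hashlib

def segregate(write_set):
    """Replace each written value with its hash; collect the pre-images
    into a separate, unsigned pre-image space."""
    hashed_writes, preimages = [], []
    for key, value in write_set:
        hashed_writes.append((key, hashlib.sha256(value).hexdigest()))
        preimages.append(value)
    return hashed_writes, preimages

def preimages_match(hashed_writes, preimages):
    """What a node checks before admitting the transaction: every
    pre-image must hash to the value recorded in the signed payload."""
    return all(hashlib.sha256(pi).hexdigest() == digest
               for (_, digest), pi in zip(hashed_writes, preimages))

writes = [(b"foo", b"bar"), (b"baz", b"qux")]
hashed, pre = segregate(writes)
assert preimages_match(hashed, pre)

# Redaction: zero out the pre-image space; the signed payload is untouched.
pre = [b"" for _ in pre]
assert not preimages_match(hashed, pre)
```

Since the signature covers only the hashed writes, zeroing the pre-images later never invalidates any signature — which is the whole point of the segregation.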
The orderer strips all of these pre-images out of each transaction and puts them in a special place in the block, called the block's pre-image space. In this way, the part of the block that contains the transactions never contains any pre-images — all the pre-images are in one special place.

Now, how do you actually do the redaction? Say we are going to redact transaction TX3 in block 42. What we're basically going to do is simply zero out its pre-image in block 42. Again, this slide is not entirely accurate, because the pre-image for transaction three resides in the block's pre-image space and not in a per-transaction space — the slide is just to show you the idea.

However, there is a problem with doing that, and it's actually related to something that someone asked. Consider a peer that joined the network before a redaction. Let's assume that in block 100 a value bar is assigned to a key foo — transaction three assigns bar to foo. Then in block 102 there is a redaction transaction, which simply zeroes out the pre-images in block 100. Also, for the sake of soundness, let's assume that in block 103 this key is deleted. Now, a peer that joins after the redaction — assume for now that it only got block 100 and didn't get block 103 yet — is not aware that the key foo has been deleted, because that peer is still at block 100. Imagine that this peer now receives a chaincode (smart contract) invocation request in which it needs to read the key foo, but it doesn't have the pre-image of foo. That's why we add a new constraint: if you, as a peer, read a key which was redacted, you instantly abort the execution. We mark such keys as crippled keys.
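The crippled-key constraint can be sketched like this. The class and method names are hypothetical helpers for illustration, not Fabric's actual API:

```python
# Sketch of the "crippled key" constraint: a peer that reads a key whose
# pre-image was redacted must abort the chaincode execution. The key stays
# crippled until the peer reaches the block where it is deleted.

class RedactedKeyError(Exception):
    pass

class WorldState:
    def __init__(self):
        self.state = {}          # key -> value
        self.crippled = set()    # keys whose pre-image was redacted

    def get(self, key):
        if key in self.crippled:
            raise RedactedKeyError(f"abort: {key!r} was redacted")
        return self.state[key]

    def delete(self, key):
        # e.g. block 103 deletes the key: no peer needs its value anymore
        self.state.pop(key, None)
        self.crippled.discard(key)

ws = WorldState()
ws.crippled.add("foo")           # pre-image of foo was redacted in block 100
try:
    ws.get("foo")                # chaincode reads foo -> execution aborts
    aborted = False
except RedactedKeyError:
    aborted = True
assert aborted

ws.delete("foo")                 # block 103: foo is deleted everywhere
assert "foo" not in ws.crippled  # no longer tracked as crippled
```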
This key will remain crippled until the peer sees block 103, where it learns that foo is deleted; then it no longer needs to remember it as a crippled key, because at that block height the key is deleted from all peers.

Another thing you need to do is filter out transactions with mismatching hashes, because a malicious client can send a transaction whose hashes don't match the pre-images, and we don't want to create blocks that are not sound. So the ordering service also needs to do some processing. Also, in order to redact a transaction, you need to collect signatures from the eligible parties.

Most importantly, we performed micro-benchmarks. We benchmarked the peer commit throughput on both CouchDB and GoLevelDB. We ran a total of one million transactions per trial, ten trials in total, and averaged the results; each transaction contains key writes to five out of ten possible keys. The overhead we measured is as follows: if you use Fabric with the redaction technique, you get a 19% reduction in throughput with GoLevelDB, while with CouchDB the overhead penalty is 7%. This makes sense, right? CouchDB is significantly slower than GoLevelDB, so the relative overhead is smaller. Another thing a peer now needs to do, when it receives a block from another peer, is validate the block; I'm not going to go over this in detail because we're short on time.

To close, before I take questions from the Q&A, I want to mention the other approaches for complying with the GDPR in blockchains. The first approach is off-chain storage, which is similar to what PDCs (private data collections) do: you store only the hash on the chain, and you keep the pre-image in some side store, also called a side DB.
However, this creates a data availability problem: if, for example, you use PDCs and for some reason the private data hasn't been replicated to enough peers, and the peers that did have this private data crash, you may lose the data. With our technique you never lose the data, because the data remains on-chain — yet you can still redact it. Another approach from the literature is to collaboratively sign block replacements. However, this is not so good, because you cannot redact data from transactions whose results other transactions' executions depended on. For example, if I transfer a coin to someone, you cannot redact that data, because it would create problems. The final approach is pruning, which involves taking a snapshot of the state database and removing the ledger files — then you have implicitly redacted the ledger, because the ledger files are gone. However, the problem here is auditability, which is lost when the data files are removed.

Let me see if there are any questions. "Where do you stand in terms of implementation?" That's a great question. Let me paste the link — I pasted it in the chat. We implemented the key parts. The things we did not implement are the actual redaction, where you zero out the bytes, and the crippled-key mechanism; besides that, we implemented everything. "Is your proposed solution with pre-image segregation to be implemented proactively or reactively?" I'm not sure what proactively or reactively means here. Benedict said: "I really like the idea, but doesn't this assume that all peers are honest?" No, it does not assume all peers are honest — there is that slide with the validation algorithm, and a peer can verify, when it receives a block from another peer, whether it has problems in it. But a dishonest peer may decide not to... oh, okay, yeah, so absolutely, you're correct, Benedict.
If a dishonest peer decides not to delete the data, then this solution falls apart. However, the problem you describe exists in any system you can imagine, right? I can take the data and post it on Facebook; if I or someone else does something like that, you cannot fight it. The only thing we're trying to achieve in this work is to still write your data on-chain, yet be able to redact it later if someone wants to.

Let me see if there are more questions — yeah. "I believe there's a function to remove..." No, there is no function yet to remove private data explicitly in Fabric, but this is something that is being worked on. "You can still see your data in log files." Yeah, that is correct, and if you go to the JIRA, Rafa, you will see that it also mentions log files and things like that. Mark asked: "Does your approach only apply to state updates, or can it be used to delete invocation parameters?" That's a good question — you can use this approach for the invocation parameters as well. I simply focused on the read-write set because that's the interesting part, but you can also use the same approach to delete the other parts.

Let me see if there are any more questions... I think I covered them all. So thank you very much for listening, and also for asking the questions. You can email me if you want — let me put my email here — if you have any further questions.