Welcome. My presentation is titled Why You Shouldn't Just Trust Your Blockchain and Apply Critical System Design, and I hope you will like it. What I would like to begin with is to hammer home a point that I'm sure most of us are already aware of: a blockchain, by definition, gives us a guarantee on one single aspect. Quoting the Hyperledger Fabric paper, a blockchain can be defined as an immutable ledger for recording transactions, maintained within a distributed network of mutually untrusting peers. This is a strong statement, but there is also an omission in it. It says immutable, but it doesn't say anything about the transactions being the right transactions, meaning in practice that my smart contracts actually compute what I want them to compute. It doesn't say anything about high availability, the necessary throughput and latency, and so on. All it promises is essentially a high-integrity database. So what will this presentation be about? My opening point is that distributed ledger technology is not an extra-functional panacea, if you like. It doesn't solve any and all problems out of the box. We will see that if we are careful, many problems do become solvable. Reasoning about actual guarantees is possible, and this is becoming quite necessary in the systems engineering context, and smart contracts are especially problematic from the point of view of further extra-functional guarantees. I will proceed to lay down the first blocks of a dependable chaincode lifecycle model. This is something we are working on actively. I will try to draw the limelight to the fact that some tricks from the old engineering toolbox are usable, especially on Fabric, and I will show some results in advancing Fabric chaincode static analysis. These are all research efforts that we are working on at the Critical Systems Research Group at my university. Some knowledge of Hyperledger Fabric will be beneficial here, but really not that deep.
The reason for having the picture of the nail and the hammer is actually two-fold here. Point one: what we are seeing in academic research is that more and more people are trying to apply DLTs to domains these systems really weren't originally intended for. Point two: coming from a research group that used to be called the Fault Tolerant Systems Research Group, it is only natural that we try to apply those techniques that have been around in fault-tolerant systems for decades, hopefully usefully and successfully. With distributed ledgers, quite a few exciting possibilities can be read about in various papers, surveys and meta-surveys, and for my sins I have to read these. People are trying to do federated learning, robotics, data marketplaces, grid management (as in electric grid management), industrial IoT, smart cities, digital twins, railway operations and so on, based on a distributed ledger, because these are immutable data stores and multi-party consensus provides security for them, so it should be much better than a single centralized entity. Not so fast. Let's play with one of my wilder ideas, if you like. Is anybody from Siemens around? They are a sponsor. Nobody? Actually, it's good this way. So let's perform the following thought experiment. We have our classic railway infrastructure, and that's been working quite well for quite some time; granted, a bit pricey. But still, these things are appearing, and they are supposed to be self-driving. Whether, when they really become self-driving, an accident is the fault of the software or of the driver if the self-driving software switches off, say, five seconds before the accident, that's again another matter. But these things are coming, and various people are having various wild ideas.
I had one of those too: in order to coordinate those two, maybe we should run some movement authority registry, deciding when this little thing can cross at an unguarded crossing, and because DLTs are so good and immutable and have an audit trail and so on, let's do it on a DLT basis. Okay, let's try. When you give this task to a system engineer, first they will have some choice words, and second they will begin to have a look at the structure of the system. And then a system engineer worth his, or in the case of my wife, her, salt will begin to look at the various internal error modes and system-level failure modes of the thing we have in our hands. The internal error modes simply mean that we have components in this stuff: what are the various ways it can fail? If you are familiar with Hyperledger Fabric, you know that an endorsement you expect to go through may fail, or it can be too slow; you run the thing in a cloud and you have performance interference. Or too many organizations may be unavailable due to network problems, or too many organizations, shall I say, may be compromised, because that's not fully out of the question either. Or you may have messed up your chaincode in a way that it gives back some garbage; let's call that a coarse data error. Or in a way that it gives back some subtle error in the computed results that should be written into the ledger; let's call that a subtle data error. Or your chaincode makes unnecessary range queries. A whole bunch of stuff can happen there. Going forward to ordering, your packaged endorsements may be late getting into a block, again for instance because the ordering service is simply overloaded, or someone is doing some rather malicious, constant front-running specifically of your transactions, or the ordering service may have been compromised. And then there's the MVCC validation stuff.
We are still not out of the water, because unexpected conflicts may happen: do you really know what other chaincodes are working on that channel and what other transactions are trying to get into the ledger? The end result is that you can have updates that are missing, contrary to your original plans; updates may be late, updates may be superfluous on the key-value level, and updates may even be erroneous, in a detectable or hard-to-detect way. So let's see what happens in our thought experiment if we have unexpected conflicts because there are other transactions trying to get onto the ledger too. The message that I will try to put on the ledger is that the train will be in the intersection in a specific 5-minute time interval: don't cross, because this is a time when you shouldn't cross. Everything is okay, so the transaction proceeds through the logical architecture of Fabric. Here it becomes late in the best case, because the client will see that it didn't go through and tries again, but it gets on the ledger later than originally expected. We will have late updates, and the end result is that we either have no problem, because, well, they put it there in the witching hours and I will only go to that crossing at 8 o'clock in the morning, or this may result in a hazardous situation: if that "don't cross" arrives while I'm already approaching the crossing, that will be a hazardous situation. So you will have to create a structured argument on either avoiding or mitigating this during your design phase. Let's see another example. Let's say that we made some subtle chaincode data error. I believe there are quite a few computer engineers and IT people here, and I'm sure that if you are one of us, then you have seen, and possibly even yourself made, some date-time conversion errors. So if we are dealing with times and dates, then, well, stuff happens. It should be caught in testing, but it's not always caught in testing.
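To make the late-update failure mode concrete, here is a minimal sketch of the hazard condition, assuming a hypothetical 5-minute "don't cross" window and invented time values; it is an illustration of the reasoning, not any real movement-authority protocol:

```python
from datetime import datetime, timedelta

def is_hazardous(no_cross_start: datetime, commit_time: datetime,
                 crossing_check_time: datetime) -> bool:
    """A vehicle checks the ledger before entering the crossing.
    If the 'don't cross' record is committed only AFTER the vehicle
    performed its check, the warning is invisible to it: hazard."""
    visible = commit_time <= crossing_check_time
    in_window = (no_cross_start <= crossing_check_time
                 < no_cross_start + timedelta(minutes=5))
    return in_window and not visible

t0 = datetime(2024, 5, 1, 8, 0)  # start of the 5-minute no-cross window

# Commit well before the window: the check sees the warning in time.
on_time = is_hazardous(t0, commit_time=t0 - timedelta(minutes=10),
                       crossing_check_time=t0 + timedelta(minutes=1))
# An MVCC conflict forced a retry; the commit lands after the check.
late = is_hazardous(t0, commit_time=t0 + timedelta(minutes=2),
                    crossing_check_time=t0 + timedelta(minutes=1))
print(on_time)  # False
print(late)     # True
```

The structured argument during design is precisely about bounding or mitigating the `late` case.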
So let's just say that I do some dumb stuff with the conversion, and what gets put on the ledger actually refers to a time slot two hours after the one it should. We go through, we propagate our error through the system. Here enters a subtle data error; it goes through and manifests as an erroneous update at the service level. What it can cause is, again, no problem if I'm actually not interested in that interval, or it can cause a spuriously denied crossing: the crossing becomes unavailable, if you like, because I want to come two hours later, I see that I shouldn't cross, but actually the train is already gone. Or it can, again, lead to a hazardous situation. So what I'm trying to get at with this example is the following. First, that example was certainly a very dramatized use case, if you like, but not overly so. There are many academic proposals for industrial IoT, grid control, robot infrastructure, and in the end it may not be about safety, because that's something that we only touch very carefully, but reliability, availability, timeliness and a whole bunch of other system-level properties are important, and they are influenced by the way Hyperledger Fabric, in our case, actually works. For those more control-inclined: this protocol should certainly have been explicit authorization and positive control, where you may only pass if you get a personally signed slip granting you the authority to pass, but in that case this would have been much less dramatic. And for a good while, certainly, DLT will only shadow control in the safety-critical space; I'm not sure how long that will remain the case. But the more general idea that I wanted to support is that we have truly important functions in our IT systems. These we call critical systems. They can take many forms.
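One concrete way such a two-hour shift can arise is a UTC wall-clock reading being reinterpreted as local time during conversion. A minimal sketch, with an assumed UTC+2 zone and invented function names:

```python
from datetime import datetime, timezone, timedelta

CEST = timezone(timedelta(hours=2))  # assumed local zone, UTC+2

def slot_start_buggy(naive_utc_slot: datetime) -> datetime:
    # BUG: tags the naive UTC reading as *local* time, then converts,
    # silently shifting the recorded slot by the full UTC offset.
    return naive_utc_slot.replace(tzinfo=CEST).astimezone(timezone.utc)

def slot_start_correct(naive_utc_slot: datetime) -> datetime:
    # Correct: the naive value was already UTC, just tag it as such.
    return naive_utc_slot.replace(tzinfo=timezone.utc)

slot = datetime(2024, 5, 1, 8, 0)   # intended meaning: 08:00 UTC
buggy = slot_start_buggy(slot)      # 06:00 UTC ends up on the ledger
correct = slot_start_correct(slot)  # 08:00 UTC
print(correct - buggy)  # 2:00:00
```

Both variants type-check, both produce a perfectly valid timestamp, and the platform happily commits either; only a semantic check against the intended slot can catch the difference.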
We have business-critical systems, mission-critical systems, safety-critical systems, and even if you are not talking about safety-critical systems, your DLT-based financial solution, a CBDC say, can be critical enough for this stuff to actually matter. So we need assurances, and often quantitative ones. We have to assure integrity, reliability, availability, safety, maintainability, and it's not just about crypto-asset security, contrary to what you can see in the Solidity world. Critical system design actually has standards and quasi-standards, requirement-driven design processes, as well as techniques to avoid, remove and tolerate faults, and my specific message is about applying this stuff to Hyperledger Fabric. As we see it, the three major cross-domain concerns for Hyperledger Fabric are the following. One: reasoning about potential fault and effect chains in a specific application context. This is what I've been covering with my simple and dramatized example. The second major pain point, if you like, is hidden common-mode faults in chaincode. We all know from the Solidity world that blockchains are all well and nice until someone has a bright decentralized finance idea, goes on to implement it, leaves a bug in the code, puts it on the mainnet, and hilarity ensues; if you have money at stake, then not necessarily hilarity. Smart contracts in general are a common-mode fault: you run them in 10,000 instances, but it's still the same code, and if it's faulty, then it's faulty. We are already beginning to have approaches to detect and remove those bugs in development and to tolerate them during operation, which is something that Ethereum can't really do but Hyperledger Fabric should be able to do. There's a third category I do not want to talk about here today: performance and ledger update timing guarantees.
It would be very nice to have out-of-the-box admission control for Fabric, to have some form of equitable block space allocation (if I have IoT devices, the guarantee that in every 10th block there is at least one transaction I can put in), and there are no so-called age-of-information guarantees for the ledger; but that's a talk for another time. For the propagating errors, we are doing combinatorial, search-based modeling and analysis called error propagation analysis, the current evolution of which is in answer set programming. If you know Prolog or constraint solving, that's the modern way to do almost the same in those domains, but that's not something I really want to focus on here today. How can we create dependable chaincode, and should we create it at all? You may argue that, well, chaincode smart contracts are small pieces of software, so if I can read it through and audit it manually, then it will be okay. That argument may stand. You can also argue that I don't have to do any special tricks during operation, because at design time I test it, I perform formal validation and verification, and it will be fine. That argument may also stand. We actually don't really know for chaincode, because there's Etherscan for Solidity smart contracts, and people all around the world are doing stuff like getting all the smart contracts from Etherscan, analyzing those, even building AI models for correcting them. Fabric chaincode is bespoke, custom software, so we don't have Etherscan. We don't have rekt.news or any of the similar sites for Fabric, so when some mishap happens due to chaincode faults or errors inside some large company, we don't really know about it. There are indications that chaincode deployed to production is not always completely fault-free, but we won't really have any public registry for that. We don't really have these for Corda or Daml or any of the other permissioned platforms either.
It's just the way these smart contracts are. But I submit to you that, empirically, the Solidity world itself is a counter-argument. Solidity smart contracts are not that long; they can't be that long, otherwise you would pay an arm and a leg to run them. Still, there are errors. Arguably, creating correct chaincode is a harder problem for Fabric, because it's not only about losing assets, or avoiding the loss of assets. For Solidity code, that's the alpha and omega of V&V, because what do you do on the Ethereum mainnet? You juggle virtual assets, and that's it. If you want to use your DLT for something else, control for instance, or command and control of cyber-physical systems, then the set of properties you are interested in will be much wider. And it's still manually written, bespoke software, and faults may be sparse, but they exist: IEEE does have standards for the number of faults to expect per thousand lines of code, and for code that's been developed properly, shall we say, it's a low number, I believe around one per thousand lines, but still not zero. We actually did an experiment to show that it may be about more than losing assets. Back in 2020, with colleagues from Coimbra in Portugal, we did an experiment where we took Solidity smart contracts, automatically injected various programming errors into those smart contracts, and ran them on the Burrow EVM interpreter on Hyperledger Fabric; it was a bit of an unnecessary complication, but it still demonstrates my point. We had a bunch of faulty contracts. Formal verification did detect quite a few of those programming errors, but by far not all of them, and those were the state-of-the-art tools back then.
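A toy version of that injection workflow, using a Python stand-in for the contract and a deliberately tiny mutation-operator set (the real experiment targeted Solidity source with much richer operators):

```python
import random

# Toy "contract" function: debits an amount from a balance.
SOURCE = "def transfer(balance, amount):\n    return balance - amount\n"

# Illustrative mutation operators: (original construct, faulty construct).
MUTATIONS = [("-", "+"), ("<", "<="), ("==", "!=")]

def inject_fault(source: str, seed: int = 0) -> str:
    """Apply exactly one applicable mutation, keeping the fault localized."""
    rng = random.Random(seed)
    applicable = [(old, new) for old, new in MUTATIONS if old in source]
    old, new = rng.choice(applicable)
    return source.replace(old, new, 1)

mutant_src = inject_fault(SOURCE)   # '-' becomes '+': credit instead of debit
ns = {}
exec(mutant_src, ns)                # compile the faulty variant
faulty_transfer = ns["transfer"]

# A weak test (amount == 0) never activates the fault ...
assert faulty_transfer(100, 0) == 100
# ... but an asset-conservation check would: the balance grows instead.
print(faulty_transfer(100, 10))  # 110
```

The point mirrors the experiment's finding: whether a fault is "ineffective" depends entirely on whether the test cases and invariant checks ever activate it.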
Contract self-checks, special asserts that we intelligently put into the smart contract if we do those (most people still don't do that for Solidity), did again catch quite a few errors, so that's nice. But we still had quite a few that were passed simply by what we call runtime platform checks: the EVM didn't throw an exception and everything went through. And what follows here is already a huge problem, because when we say output invariant check, that only means that the client receiving back the transaction sees that something is fishy, and by that time the modifications are already on the ledger. So the fact that so many faults were ineffective doesn't mean it's not a problem; it just means that we should have had a test case set that activates these faults more readily. So what we began to put together is actually a rethinking of very classic patterns from fault-tolerant software. There's a decades-old, still developing library of patterns for architecting fault-tolerant software, and in the end, when we talk about fault-tolerant, dependable chaincode, it's actually about tolerating faults in the chaincode execution service. So it's based on the very nice tome of Robert S.
Hanmer. You can see that, yes, in development we can do fault prevention and removal, and we are doing it by coding practices, testing, fuzzing, static analysis, formal verification; but certainly model-based design will also be important, and we had a presentation on that here today. When I generate my token contracts, then if somebody took the time to go through the generator logic, well, the quality of that code will probably be better than something that I just slapped together. And we are beginning to see reusable domain contracts too: if you are familiar with Daml, they have implemented the Common Domain Model. So if there's something that's as reliable as OpenZeppelin is in the Solidity world, then maybe we should just use that or extend that. My message is that with Hyperledger Fabric we can do even more, meaning that even in operation we can do error detection and fault tolerance for chaincode, and possibly we should also have error recovery: when we see that something is problematic, we should be able to do some quarantining and checkpointing, but that's still in the works. Technologically, the reason we can do this is that in Hyperledger Fabric we do not pay for every execution step, so I can do software diversity. This means that if I have the same chaincode functionality implemented one, two, three, four, five ways, and those different implementations don't have the same implementation faults, then they can cross-check each other, and I can use some voting mechanism. To be able to do that on the technical level, the read-write set model is actually quite important. So I will talk a bit about these, and at the end about the somewhat sorry state of chaincode static analysis, which is arguably important, just not the only piece of the puzzle. So N-version programming is a very simple idea, actually: you do independent implementations of the same function and do very reliable voting. Voting is easy and simple; you can
implement it so that it will work, hopefully. The hypothesis is that after a point it becomes cheaper to do this at runtime than to test your chaincode further and further, and to do ever more formal verification and validation, to reach the same level of confidence. Now, the interesting thing is that for Fabric the application of N-version programming is rather straightforward, and there are multiple possible architectures to implement it. One way to do it is that you have your organizations and a set of diverse implementations of your chaincode: version 1, version 2, version 3. Due to the way Fabric works, we can even say that one team does it in Java, one team in Python, one team in Node.js, and you simply execute all of them and do some voting; with three voters you can actually even mask a single fault in one of the containers. Organization 2 does the same, and the whole thing is transparent to Fabric. Another way to do it is that organization 1 runs version 1 and organization 2 runs version 2, and you can actually even ideologize it, because you can say that all organizations shall bring their own implementation of the common business logic, because they have to understand it and shouldn't necessarily trust an implementation that somebody gave to them; but that's beside the point. In that case, you can say that the Fabric consensus itself is the voting mechanism, a majority voting mechanism: whatever majority consensus you specify in your endorsement policy will be used for that purpose. Granted, if you begin to think about this, it can interfere with your consensus guarantees in a major way, but it works. A second runtime approach is what's called runtime verification. The idea here is that whatever we compute as a ledger modification, there may be certain properties that must hold on all ledger modifications, and there's actually a very simple, widely used property for that from the Solidity world: when we are
handling some kind of asset, money, then the first thing one tends to verify for Solidity is that the cumulative sum of assets at the beginning of the smart contract call and at the end of the smart contract call is the same: no money is created and no money is destroyed, unless the function is explicitly about creating or destroying money. The pattern in fault-tolerant computing for this is that there's a system, we wrap it in an instrumentation wrapper, and there's a monitor component that observes what's happening in the system and gives feedback to it; when it decides that an internal execution path of the system does not comply with the properties, something happens. What should happen is case-specific. So you can say that we wrap the chaincode in instrumentation and see whether the changes it tries to make on the ledger are compliant with known properties; a very good student from the University of Alberta is looking at that in a mentorship project. It's useful and easy to implement technically, because if you want to be really transparent, it's just a wrapper around your chaincode implementation inside your chaincode container. But it's limited in scope, because if you want to cross-check against other data that's on the ledger, you would have to read from the ledger, and that interferes with the read-write set of the chaincode execution; although this may have been a bit too technical. Option two is that you look at the ledger content. There you can do preventive measures, but that needs integration with transaction validation: when we do the MVCC check, at least theoretically, we can do runtime verification of whether the modifications we are trying to make are in line with a set of properties that have to be upheld for the ledger content. Or it can be reactive: somebody monitors the ledger and looks for patterns that bring the ledger content to a state that is out of line with respect
to certain properties; but then you have to be able to do coordinated rollbacks across the participants of the consortium, which we believe is a function that would be needed but is not implemented, and is technically somewhat challenging. There is technology to do this: if you want to delve a bit more into it, the tool support for the Java Modeling Language seems to be readily applicable to chaincode runtime verification. And as a last, closing constructive thought, I would like to go back a bit to the first phase, the only phase that's being done for Solidity, for obvious reasons: when we are looking at our smart contract source code and want to make really sure that no fault remains in there. It's understandable that the Solidity world fixates on vulnerabilities, specific code constructs that have known attacks, if you like. Many of the papers looking at the problems of Hyperledger Fabric chaincode use the same terminology and try to use the same approaches, and this may actually be a misnomer. Usable Solidity vulnerabilities are very specific, while for chaincode the interesting effects are more diverse, and we don't have a vulnerability registry, and arguably never will. So it may be a better approach to focus on detecting patterns of code weaknesses, constructs that may lead to improperly behaving chaincode, and, while we are at it, to inject them to evaluate their effects and gauge the effectiveness of the few V&V tools that we already have for chaincode, using a common platform. The difference between weaknesses and vulnerabilities is easy to catch if one opens up both of the MITRE-hosted lists, the Common Weakness Enumeration and the Common Vulnerabilities and Exposures: one speaks about specific exploitable problems in code, and the other gives you a taxonomy of ways you can mess up creating software systems. So we have an early prototype of a system where we have patterns of weaknesses; we created a visual-syntax, domain-specific
language to describe those. A whole bunch of things happens here, but the key thing is that we are trying to reason about chaincode in terms of its actual structure: no regular expressions, but an annotated abstract syntax tree, and we do pattern detection and pattern rewriting on that. Now, this is what a pattern looks like, as an example: the weakness that we read ledger data but do not use it. This is a known weakness in chaincode programming. Could we do better? Possibly; it's not that readable, and there may be worse ones than this. There is a trick here, because it refers to a "this doesn't use that" relation, a negative relation that we actually computed on the graph database; but that's what we have, and we don't know anything better. Ideally, we should have a chaincode weakness registry: the same way the Solidity world maintains the SWC registry, the Smart Contract Weakness Classification and test cases registry, it would be nice to have something like that for chaincode weaknesses too. We may not be able to create a registry of vulnerabilities for chaincode, but if the community comes together and discusses what the ways are that smelly code can look like, to use a technical term, we may just be able to create a registry. We began to do that, we'll extend it in the coming months, and we hope to reach a point where we can publish it and open-source it. So, in summary: Fabric is a truly great platform, but to use it in critical contexts, critical settings, we need to create engineering bridges to critical system design. We have to have a much firmer grasp of what constitutes good chaincode, and we argue that we absolutely have to begin to implement runtime dependability mechanisms, of which creating diverse chaincode and quasi-independently checking what chaincode actually does are important parts. There are also ledger audits; I'm not convinced that in practice that
would work, but it's also an option: we constantly audit the contents of the ledger against a data model. And to be able to rectify errors that have already been committed to the ledger, we should begin to think about creating mechanisms, not one-off procedures where the consortium comes together and everybody rolls back the state of the ledger to a given point; these things rarely, but do, come up. Ideally there should be mechanisms at the platform level to do that, for instance ledger roll-forwards or rollbacks, granted, tied to a consensus of the participants maintaining the ledger. I did steer clear of timeliness on purpose, because it's a topic on its own. I hope I could pique your interest and that there were actually useful messages in these half-finished academic results, and, truly uncharacteristically, I left five minutes open, so if you have questions, let's discuss.

Yes? Yeah, great question. So weaknesses have multiple definitions. Originally the term was used just for security weaknesses: code structures and behavior in code that may lead to vulnerabilities, which refers transitively to the definition of vulnerabilities. But the term has been generalized. Some weaknesses can be caught with static analysis, just by looking at the code; this is actually what I tried to show you with "ledger data read but not used", which is pushing the limits of static analysis. And there are weaknesses that you can formulate well informally, in prose, but which you can't really attack with static approaches, so you have to look for dynamic verification approaches. You can do model checking on chaincode: there were a few people in Karlsruhe who, years ago, were playing with using a Java bytecode model checker on Java chaincode. Theoretically doable, but if you are involved with model checking, you know the pitfalls. But yes, there are weaknesses, there are
kinds of problems in code that are weaknesses per se, but which you can't really detect with static analysis. But great question; I believe there are multiple PhDs to be done in these particular niches.

Yes? Yes, one of them is "ledger data read but not used", actually. Let's look at specifics. Initializing global variables in init functions: that's a classic that you can find in n-plus-one copies on Medium, that when you are writing chaincode you shouldn't initialize global variables in init functions. Non-deterministic chaincode is a big no-no. Then modifying a narrow subset of keys, no input type validation, no input validation, no execution timer set, unchecked call or return value, and so on and so on. So defining what we mean by weakness is a bit like nailing Jell-O to the wall, I know, but I hope the general message goes through.

Yes? Yeah, so look, the technology is great, but exactly this: for embedded software, this stuff is already behind us, because it had to be dealt with. You have MISRA C, you have coding practices at practically all companies doing serious development in C, you have static analysis tools for weaknesses, you have dynamic analysis tools, which are orders of magnitude harder, but there are tools for that, and so on, all just because there are applications where we have to be this serious. This is still missing for Fabric. I don't say that it all has to be available, but if we want to climb gradually up the criticality ladder, this will become necessary, because I don't see any way to avoid recreating these techniques, if you like, and reapplying them. Okay, my time is up. Thank you for being here.