Good afternoon everyone. My name is Han Liu, I'm currently a postdoctoral researcher at Tsinghua University in China, and I will be talking about a research work on statistical linting of smart contracts. Like normal computer programs, smart contracts have smells, such as unused function parameters, unprotected message calls, or maybe a delayed state update that is vulnerable to a reentrancy attack. These smells are not necessarily causing any catastrophe, but they are something you want to avoid in your smart contracts, because they may be stylistic errors, or they may not follow best practice; some of them are bugs and even security issues. The most straightforward way to capture and remove these smells would be: when developers finish their code, we use a bunch of analyzers, for example formal verifiers or static analyzers, to check this code and see if the bad code is there. If you are doing this, you will commonly need your code to be compilable, and in most cases you should be aware of the predefined rules which tell the analyzer what kind of thing it should be searching for. What we are trying to do is make these checks earlier in the development lifecycle, and in an interactive way, which means that developers can check their code even if they haven't finished it, or even if they have no clue what the patterns look like. To realize this idea, we have proposed the S-gram framework, which exploits the naturalness of smart contract code.
So the naturalness notion actually comes from the software engineering community; it tells you how natural or how irregular your code is with respect to a large collection of other code. What we do with this naturalness notion in S-gram is the following: given the contract code, we use a parser to turn it into a token sequence, and based on this token sequence we build a statistical language model which captures the regularity of all the tokens. The language model is able to answer the question of whether a token sequence is likely to occur in a specific context, and then we can identify irregular code in the smart contract and flag potential problems. More specifically, the S-gram framework works in a two-phase manner. In the first phase, we need a large collection of smart contracts to train the model. To do that, we use static analysis to extract semantic metadata out of the contract. Basically, we focus on two types of things: accesses to storage data, and flow sensitivity.
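To make the language-model idea concrete, here is a minimal sketch of an n-gram model over contract token sequences. This is an illustration of the general technique, not the engine S-gram actually uses; the class name, the add-one smoothing, and the padding tokens are assumptions of this sketch.

```python
import math
from collections import defaultdict


class NGramModel:
    """Minimal n-gram language model with add-one (Laplace) smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(int)          # n-gram occurrence counts
        self.context_counts = defaultdict(int)  # (n-1)-gram context counts
        self.vocab = set()

    def train(self, token_sequences):
        """Count n-grams over a corpus of token sequences."""
        for tokens in token_sequences:
            padded = ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]
            self.vocab.update(padded)
            for i in range(len(padded) - self.n + 1):
                gram = tuple(padded[i:i + self.n])
                self.counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def prob(self, context, token):
        """Smoothed probability of `token` given an (n-1)-token context."""
        gram = tuple(context) + (token,)
        return (self.counts[gram] + 1) / (
            self.context_counts[tuple(context)] + len(self.vocab))

    def perplexity(self, tokens):
        """Higher perplexity = the sequence is more irregular to the model."""
        padded = ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]
        log_prob, m = 0.0, 0
        for i in range(self.n - 1, len(padded)):
            log_prob += math.log(self.prob(padded[i - self.n + 1:i], padded[i]))
            m += 1
        return math.exp(-log_prob / m)
```

A token pattern that occurs often in the training corpus gets low perplexity, while a sequence the model has rarely or never seen gets high perplexity and can be flagged as a candidate smell.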
So if we take a look at this simple smart contract, at these two lines of code, the analyzer will tell you that these two lines are accessing the same storage data, called userBalance, and that one of them is a read operation while the other is a write. These two operations are dependent on each other, because they come from different public functions and are not commutative with each other. In terms of flow sensitivity, if we look at this kind of code, the flow condition of this line of code includes constraints from the modifier and also from the if statements. The way we model this flow is by using the addresses and the operators involved in these flow conditions; in this case we will be using msg.sender and the two operators as specified here. Then we use a tokenizer to generate a token sequence from this contract. The generation is basically done by traversing the abstract syntax tree in a type-based manner, which means that we generate a corresponding token for each specific type of AST node. Then we train the model, using an underlying n-gram model engine, to build the statistical language model.

In the second phase we pretty much do the same thing: given a smart contract, we generate the token sequences, and then a detector queries the language model built before and calculates the regularity, or perplexity, scores of subsequences. We then highlight the top candidates, the parts of the smart contract with the highest perplexity scores. If you are trying to use this candidate information to help optimize existing smart contract analyzers, for example a symbolic execution engine, what you can do is design a ranker which takes the candidate information and generates scores for all the functions in the contract. These scores tell the symbolic executor which functions are more likely to be buggy than the others, and the symbolic executor can then prioritize the exploration of functions with high scores, so as to detect vulnerabilities more efficiently.

In the future, we plan to work on optimizations of the language models; for example, we are trying to figure out more efficient ways to encode both syntactic and semantic regularities. We are also considering porting S-gram to more existing techniques, for example formal verification, static analysis, and random fuzzing. Also, to create a better developer experience, we are planning to integrate S-gram with an IDE, so that we can capture and model developers' feedback and use it to optimize S-gram itself. We have actually published an academic paper about S-gram; if you are interested, you can look into the details, and I will be around here for offline discussions. That will conclude my talk. Thank you.
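The detection and ranking phase described in the talk can be sketched as follows. This is a minimal illustration under assumed names (`flag_irregular_windows`, `rank_functions`, the window size of 8), not the S-gram implementation; `perplexity` here stands in for whatever scoring function the trained language model provides.

```python
def flag_irregular_windows(perplexity, tokens, window=8, top_k=3):
    """Slide a fixed-size window over a contract's token sequence,
    score each window with the language model, and return the top_k
    most irregular (highest-perplexity) windows as smell candidates."""
    spans = [tokens[i:i + window]
             for i in range(max(1, len(tokens) - window + 1))]
    return sorted(spans, key=perplexity, reverse=True)[:top_k]


def rank_functions(perplexity, functions, window=8):
    """Rank contract functions by their most irregular token window,
    so a symbolic executor can explore the likeliest-buggy ones first.
    `functions` maps a function name to its token sequence."""
    def peak(tokens):
        worst = flag_irregular_windows(perplexity, tokens, window, top_k=1)
        return perplexity(worst[0])
    return sorted(functions, key=lambda name: peak(functions[name]),
                  reverse=True)
```

A symbolic execution engine would then simply visit functions in the returned order, spending its exploration budget on the highest-scoring ones first.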