 and DevCon. So this is the outline. Number one, we will introduce our malware scaling system. And number two, we will show you how we design the Dalvik Bico loader. And number three, we will go through two cases of real malware analysis using Quark Engine. And number four, we will share our strategy of generating the detection rules. And the last thing, yes, the future works because we still have a lot of things to do. All right, so let's introduce the malware scoring system. As we know when developing a malware analysis engine, it is important to have a scoring system. However, those systems are either business secrets or just too complicated. Therefore, we decide to create a simple but solid one and take that as a challenge. And since we wanted to decide a novel scoring system, we stopped reading and decoding whatever people do in a field of cybersecurity because we don't want our ideas to be subjected to the existing systems. So we started to find ideas in fields other than cybersecurity, and luckily we found one. Yes, the best practice we found is the criminal law. So when sentenced a penalty for the criminal, the judge weighs the penalties based on the criminal law. And after decoding the law, we find principles behind it and we develop a scoring system for Android malware. There are only eight principles decoded from the criminal law. And I'll go through it in the following slides. Now let's see principle number one. A malware crime consists of action and targets. In the criminal law, the definition of a crime consists of action and targets. For example, steal money and kill people. So with this principle in mind, we developed the definition of a crime for Android malware. And the definition is the malware crime consists of action and targets. For example, steal photos or steal your banking account password. Now let's see principle number two. We consider that the loss of fame is greater than the loss of wealth. So in the criminal law, physical body injury is more serious than psychological injury. So the principle we decoded here is when things are hard to recover, we consider it a family. So with this principle decoded, we developed our second principle. We consider the loss of fame is greater than the loss of wealth because it's easier to make money back than rebuild your reputation. Okay, now let's see principle number three. Arithmetic sequence. In the criminal law, when a murderer is sentenced to 20 years in prison and a robber is sentenced seven years in prison for his crime. Why the number, why 27 years? Why the number? We found no obvious principles in the criminal law. So we use arithmetic sequence to weigh the penalty of each crime. For example, the penalty weight of Y1 is 10 and Y2 is 20 and Y3, 30, et cetera. So now let's see the most important part of the scoring system. We create an order theory, which consists of three principles. There are principle number four, five, and number six. Now let's look at principle number four. The later the stage, the more we're sure that a crime is practiced. As mentioned in chapter four of Taiwan criminal law, each crime consists of a sequence of behaviors. Those behaviors can be categorized in the specific order. So let's take murder, for example. Determined in stage one, determined means somebody decided to kill someone. And in stage two, conspiracy, it means he or she started to make a plan for the murder. And in stage three, preparation, it means a fine staffs, for example, weapons or arranging services for a murder plan. And in stage four starts, it means when things are all set, the murderer takes action and is on the way to kill someone. And in stage five, practice, it means the murderer does pull the trigger and shoot someone. So as we can see here, the later the stage, the more we're sure that a crime is practiced. So with this principle in mind, we developed Android malware crime order theory. And in this theory, we also have five stages for the crime. For example, if a malware tries to send out your location data by using SMS in stage one, we will check if related permission is requested by the malware. And then we will check if the key native API is called. And in stage three, we will see if certain combination of native API exist. And then we will check if the APIs are called in a specific order. Finally, we check if the APIs are handling the SAM register. Okay, so now you can see from this picture, this is a two-dimensional map for Android malware crime. For the crimes, we put them in Y-axis. And for each crime, we use X-axis to see the evidence we caught for the crime. So X5, Y1 means in crime number one, we have found a native API that are called in a correct sequence and they're handling the SAM register. And X3, Y5 means in crime number five, we have found certain combinations of native API that is used in this APK. So now let's look at principle number five. The more evidence we caught, the more penalty weight we give. So we give stage two more weights than stage one and stage three more weights than stage two, et cetera. Okay, principle number six, proportional sequence. As we decoded from the criminal law, the later the stage, the more we're sure that the crime is practiced. So we consider proportional sequence, for example, two to the power of N to present such principle in our scoring system. All right, principle number seven, crimes are independence events. For simplicity, we assume crimes are independence events and penalty weights can be added up directly. So this is an example of adding up two crimes in a malware, we find two crimes, there are stealing photos and stealing your banking account password. So the calculation of a total penalty weight is quite simple. For example, for each crime, we use penalty weights of crime to multiply proportion of caught evidence and add up the results of the two. The last principle, principle number eight, thresholds generate system. So after calculating the total penalty weights for a malware, we need to have threat level threshold so that we can tell which threat level does the malware fill in. Unfortunately, we can't find them in criminal law, but we know we need to design a threshold generate system for that, not just give any number by intuition. So we decided that threshold for each level is some of the same proportion of caught evidence multiplies penalty weight of crimes. So we know it's not a perfect one, but we're sure that we build a foundation for future optimization. All right, now let's talk about the design logic of Delphi Bicol loader. And Junwei, he will take care of this part. Hello everyone, my name is Junwei and I will take care of this part. So now let's talk about the design logic of Delphi Bicol loader. Our Delphi Bicol loader is actually the implementation of the Android malware crime order theory. We implement every stage of the theory. There are five stages. The first three stage are easy. We simply use APIs in another open source tool, Android Guard, to implement the first three stages. As I just mentioned, the implementation of the first three stage are easy, but in stage four, we need to do a little bit more. So before the implementation, we need to know what does stage four do? In stage four, we find the calling sequence of native APIs and check if they are code in a specific order. For example, if a malware sends out your location data by SNS, then first, it will call native API, get cell location to get your location data and then call native API, send text message to send your location data by SNS. Normally, native APIs are wrapped in functions. So we trust back to see which function is cross-reference from the native APIs. And we call those functions the parent function. And we will keep tracing back until we find mutual parent function for both the native APIs. Here is the example. SendTestMessage is code by sendSNS, which is the parent function of sendTestMessage. And getCellLocation is code by getLocation, which is the parent function of getCellLocation. And if we keep tracing back, we will see that both sendSNS and getLocation shares the same parent function, sendMessage. And after we find the mutual parent function, we will scan through a small like code of the mutual parent function and check which function is code first. So this is the small like code of sendMessage. We can see that getLocation is code first to get the location data of the cell phone. And sendSNS is code to send out the location data. And in stage four, we found that our design can also overcome the observation techniques used by the malware. When applying observation techniques, function except native APIs are renamed. This has made the decompiled source core hard to read for human. The machine can still run the code because the logic of the code remains the same. Here is the example. When applying observation techniques, native API sendTestMessage is code by function K and function K is code by function F. The alternative API getLocation is code by function E and both function E and F shares the same parent function, which is A. You see, if you start reading the decompiled source core of A, it will be hard to figure out what is going on there. And by the way, since our goal is to find the mutual parent function, so it doesn't matter how many layers the wipers are. Now, let's see the implementation of stage five. Yes, this is the most important part. In stage five, we need to confirm that if the native APIs are handling the same register. Let's use the same example. Send out your location data by using SNS. So when native API getSaleLocation is code, it will return the location data of the cell phone. And what we do in stage five is to check if the alternative API sendTestMessage sends out the location data returned from getSaleLocation. So in stage five, we simulate the CPU operation. We read line by line of the Somali-like source core and operate like CPUs to get two things. First, the value of every register. Second, the information like functions who have operate the same register. To make this happen, we create a self-defined data type. We call it register object. In each register object, we store three kinds of information. Number one, the registerNAN. Number two, the value of register. And number three, the function who use this register. Let's see an example. So the registerNAN is v7 and the value of the register is a string. And the string depends the value of string one and the result of function one. And then we can see that the register is used as the input resource of the function two. By the way, when filling the value of use by which function in the register object, we expand every register by cross-referencing outer register object in the table. So for example, by cross-referencing, we know that vA is a string called userLocation and v3 is a function called getLocation. As you can see in the lower right corner, the result of getLocation is appended to the string which is userLocation. And the new string is sent out by using function sendSNS. In other words, the value of register v7 is generated by using function getLocation which has native API one in it. And the value is used as a input for function sendSNS which has native API two in it. So now we prove that by using the register object, we can check if the APIs are handling the same register. So after we scan through the source code, we produce dots of register objects. And those register objects will be organized when the two-dimensional Python list. This is a similar idea like hash table. We use it to boost out the read and write of the list. So now let's see the table. As you can see here, register v4 has three register objects. That's mean in the source code we scan v4 was used three times. And every time when it was used, we store the present value of register and the function who use it if there is one. So basically the whole table is the history of the registers. So when we finish constructing the table, we then scan through all register objects in the table to check if the native APIs are handling the same registers. So now let's see how to use quark engine to analyze the malware. Now let's get back to Qoongi. Okay, in this section, we prepare two malware. One is non-obfuscated and the other one is obfuscated. And for each malware, we will show how we detect the behavior of the malware with the detection rule. Now let's look at the first malware. This is a non-obfuscated one. We will use the rule in quark engine to detect whether if the malware send out a cell phone location data by using SMS. So this is the detailed report of quark engine. In this report, the engine shows the detection results of one single malware behavior or you can say one single malware crime. For example, we try to find if the malware sends out your location data by using SMS. So in this report, we list out the evidence we found in each stage of the Android malware crime order theory and this report shows we find evidence in every stage, which means we have 100% of confidence that the malware has this behavior. So let's see, in stage one, permissions like send SMS, access course location and find locations are requested. And the second stage, the key native APIs like get cell location and send text messages are used. And in stage three, we find certain combinations of native API exist. And in stage four, we found out that in functions like send message and do bytes, the APIs are called in the right sequence. And in stage five, in function, send message, we find out that those APIs are handling the same register. So now let's think if you're analyzing this malware and you want to trace the compiler source code to see the evidence, how do you do it? Our suggestion is you start the backwards. That means you start from stage five. For example, in stage five, we know that inside a function of send message, it has two functions that contains the two native APIs respectively and you're handling the same register. So you start to locate function, send message in the decompiled source code. And in stage four, we know that those two functions are called in the right sequence. So we can start to find the functions that contains the native APIs and check if they're really caught in the right sequence. The information of the two functions and the sequence will be shown in the next version of Quark Engine. So now let's look at a real example. Let's locate the function, send message. And we found out that two functions that contains the two native APIs respectively there are send SMS and get location. And if we dive into the function of get location, we will see that it contains native API, get location. And if we dive in this function of send message, send SMS, we will see that it contains native API, send text message. So the code here means it first collect your cell phone location data and send it out through SMS. So now let's dive into the source code of get location. As you can see in the source code, it tries to call native API, get cell location and return this information at the end of the code. And now let's dive into the source code of send SMS. Native API send text message is used to send out the location information. So quite simple, isn't it? Now let's look at the second malware. This is an obfuscated one. We will use the rule in quack engine to find whether if the malware detects Wi-Fi hotspot by gathering information like active network info and cell phone location. Okay, so as a malware analyst, we read the report backwards. As you can see in stage five, there are functions like p.a at view.c and af.run. And those functions has two functions that contains native APIs respectively and they're handling the same register. And in stage four, those two functions are also called in the right sequence in functions like p.a at view.c and af.run. So according to the reports, we can see that the malware has the behavior of Wi-Fi hotspot detection in three parts of the source code. And we can pick any parts for further analysis. So we just pick function p.a. So now let's look in, now let's see the source code. Let's first locate the function p.a. And we found out that two functions that contains the two native APIs respectively, they're ap.a and f.f. And if we dive into the source code of function ap.a, we will see that it contains native API, get active network info. And if we dive into the function of f.f, we will see that it contains native API, get cell location. So this code here means after collecting information from function ap.a and f.f, they sent the information as an input for function am.a. So now let's dive into the source code of function ap.a. As you can see in the source code, it tries to call native API, get active network info, and return the related information at some points. So now let's dive into the source code of f.f. Native API gets cell location is used to get the cell phone location data. And this information is processed with some other strings. And at the end of this function, it returns the strings with the information. As mentioned earlier, after collecting information from function ap.a and f.f, they use the information as an input for function am.a. So, and we noticed one thing, the function am.a used by a rate output stream as one of its input parameter. And we know when seeing by the rate output stream, it means the function is probably trying to write the data into a file. This is amazing, isn't it? So with quack engine, malware analysis can really boost up your productivity. Okay, so now I will introduce our detection rule generate strategy. So why do we need to develop the detection rule generate strategy? Because to make our engine practical and easy to use, we have to, we need to have more detection rules. However, the speed of rule generated by human is quite slow. And human generated rules is subjected to his or her experiences of malware analysis. So we developed that rule generate strategy to boost up the production of detection rules. And since our goal is to find all kinds of behaviors in the malware, so that if we use permissions and native APIs to generate all possible rules, we will have an amazing amounts of rules. After generating the rules, we then use quack engine to find the intersection between those amazing amounts of rules and the malware we prepared. In other words, we find rules that match the malware behaviors. However, this is not a good way to generate detection rules because it's time and resource consuming. So we developed a seven step generate strategy. Step one, we curl down all native API information on Android official API reference. For example, this is the native API information of send text message. You can see the input parameters, return value and the description of this API. Okay, step two, we did a little bit modification to our engine. We ignore the permission checks in stage one of the Android malware crime order theory. And in step three, we find all kinds of API combination and generate rules without permission information. And in stage four, we use the modified quack engine to find the intersections of the rules and the malware. We call the rules, we call rules in intersection the first stage verified rules. In other words, these rules need to be verified again. And since we don't need to generate rules with permission and verified permissions in quack engine, the whole process of rule production speed up. And in stage five, we try to generate rules with permissions. So inside the intersection, we have first stage verified rules matched with malware. We then use the first stage rules and permission in the matched malware to generate rules with permissions, which is the second stage rules. And in step six, we then use the quack engine. Yes, the four function version to find again the intersection of the second stage rules, which are the one with permission and the malware we prepared. So after that, for each rule, we label the number of matched malware. For example, the behavior of rule number one can be found in a hundred malware, et cetera. And in step seven, after labeling the rules, we then sort the rules by number of matched malware we will review the rules from the highest matched one to the lowest one. All right, the last part, future works. As I mentioned earlier, we still have a lot of things to do. For example, we need to have more detection rules. And we also need to deal with the data so files and the packed APKs. And we want to have more features of Dalvik biker loader, for example, the downloader. And we will also want to apply the scoring system to other binary formats. Yeah, it is in order to the list. And we notice that API changes in different versions of Android. So we will also take care of this problem in the next version of quack. Oh, we probably would change the core library since Andrew God is quite inactive recently. And one more thing. Actually, we're trying to make quack itself easier to be integrated to other tools. For example, users can import quack as in Python library and output the analysis result as a JSON file. So now quack is collected in black arc Linux and will soon be integrated to a threat intelligence analysis tool, Intel all. And there's one nugget that I would like to share. We work at the limits of our tools. When new tools come along, new things are possible. Okay, so that's all for today. And if you have any question, please feel free to DM our Twitter accounts. Thank you.