Hi everyone, my name is Feng Xiao, from Georgia Tech. It seems like I'm the first DEF CON Safe Mode main talk, and I really hope everyone enjoys it. This is of course not only my work, but also the work of my wonderful co-authors from Georgia Tech and Texas A&M. Today I'm going to talk about some interesting new vulnerabilities in the Node.js ecosystem. Before the talk formally begins, let me introduce myself a little more. I am a CS PhD student at Georgia Tech. My research goal is to build automatic systems that detect and exploit vulnerabilities. We want the tools to exploit vulnerabilities because we want people to know about the existence and the consequences of their security bugs. I do research in web and application security, but I'm also researching security problems in other areas, such as software-defined networks and x86 virtualization.
Okay, so this is what I'm going to cover today. The talk is divided into three parts. First, I will introduce the technical details of the new vulnerabilities and discuss their exploitation. Then I will talk more about the bug-finding part, which is about the lessons and insights from building a tool to detect and exploit HPA. In the last part, I will give an impact analysis of the new risk and some evaluation data about the tool.
So first of all, let's take a quick look at the vulnerabilities we found during our research. In total, we discovered 13 zero-day vulnerabilities in widely used programs such as the official MongoDB driver and class-validator. These bugs can be exploited to launch serious attacks such as leaking credential data, bypassing security checks, and denial of service. Before we touch the technical details of these vulnerabilities, let's have a brief background introduction to Node.js. Node.js is for executing JavaScript code outside of browsers. The picture on the right is the overall system diagram of Node.js.
To interpret and execute JavaScript, Node.js implements a runtime engine based on Chrome V8 that satisfies the needs of a server-side language. The runtime engine also provides a set of APIs that let JavaScript interact with the host environment. With these APIs, JavaScript code can access the host environment like any other server-side language. For example, it can read and write the file system or execute system commands. So Node.js is pretty powerful. Nowadays, many websites are deployed on Node.js. For example, Node.js is used intensively at companies like PayPal and LinkedIn. Also, we all use a lot of Electron apps, such as Skype and Discord, and such Electron apps are also based on the Node.js runtime.
Among the many Node.js applications we have seen, web-based apps are one of the most common types. For those web-based applications, packing the communication data into object representations like JSON is pretty common, and this feature is convenient. For example, Node.js programs can use it to send and receive very complex data structures. From the monthly download statistics in the picture on the right, we can get an idea of how widely object sharing is supported and used by the Node.js ecosystem.
The diagram demonstrates how object sharing is used in the Node.js ecosystem. There are two major methods of serializing object data. The first is query-string-based serialization, and the second is JSON-based serialization. As shown in the picture, if the user wants to update the age information in a Node.js web application, he can send his data either via a standard query string in the URL or as JSON in the request body. Upon receiving the request, the web application converts the JSON data or the query string data into an object, so that the object can propagate further through the program logic. Okay, so basically this is how object sharing is carried out in the Node.js ecosystem.
Usually, if we want to evaluate the security of such a program, we inject different payloads into the age field to try to trigger certain vulnerabilities, such as SQL injection or cross-site scripting. However, what if we choose not to test the existing data field? Since we can pass an object into the program, what will happen if we inject additional properties that the server program does not expect to receive? In particular, if an attacker can send properties that forge or override certain internal program states, the attacker may easily obtain dangerous abilities to manipulate key program logic.
So we are going to introduce hidden property abusing, HPA. Hidden property abusing leverages object sharing in Node.js to tamper with or forge critical program states. We call the additional properties we inject hidden properties, because they are like hidden parameters that are valid to the endpoint API. These parameters are associated with certain internal program states. However, nobody knows of their existence, so an attacker can leverage HPA to tamper with those states.
In this talk, we mainly focus on server-side scenarios where a remote attacker wants to attack Node.js web applications or microservices. To exploit the vulnerabilities, the attacker accesses legitimate interfaces, such as a web API endpoint, to send his payloads. In most cases, the attacker's payload is in the form of a plain object, which is the simplest object representation in Node.js.
During our research, we discovered two typical attack vectors of HPA. We call the first one app-specific attribute manipulation. This one is for manipulating certain internal properties defined by the applications themselves. Such internal properties are supposed to be initialized and managed by internal functions, and they usually represent certain internal states of the program.
As shown in the picture, the init-role function is an internal function that is responsible for maintaining the access right on the user object. However, with HPA, the attacker can propagate a property with a conflicting name to the user object, and thus control the internal states of the program. As shown in the picture, the program also provides an API called update, which is for external usage. If a malicious user injects an additional key-value pair into this API, which is access and admin in the picture, then the additional property will override the existing access right. This payload is pretty useful when we want to abuse certain concrete logic in large applications, such as order information or user privilege management.
HPA can also target some unique JavaScript mechanisms, such as prototypes. We call the second attack scenario prototype inheritance hijacking. In JavaScript, every object has a link to a prototype object. When JavaScript code accesses a property of an object, the property is searched not only on the object itself, but also on the prototype of the object, and even on the prototype of the prototype, until a property with a matching name is found. As shown in the picture, when the JavaScript code wants to get the constructor property from the input object, it first searches locally within the input object. Since there is no property named constructor there, the code continues its search on the prototype, where the constructor is really located.
With HPA, we can hijack the inheritance chain and forge our own payloads as the internal properties on the chain. As shown in the picture, if we inject a property named constructor, the search process will be very different. Since there is already a property named constructor within the input object, the search stops immediately and ends up returning a user-controlled value.
As demonstrated in the red circle, the value of the constructor becomes a string, Rick and Morty, rather than a normal JavaScript function. This second attack vector is really useful because we found that many JavaScript developers tend to trust the properties inherited from prototypes and make many security-sensitive decisions based on them.
We should also be aware of the differences between these attack vectors and prototype pollution. The two attacks are totally different. Prototype pollution, as the name suggests, is about tampering with the prototype object. Our attack vector, however, does not modify the prototype object. The root cause of HPA is that Node.js fails to isolate unsafe objects, such as user input, from critical internal states.
To make a clear classification, HPA can be seen as a new security risk under CWE-915, whose child variants are all about improper modification of dynamic object attributes. As shown in the hierarchy tree on the right, there are some similar issues on other language platforms, such as Ruby mass assignment and PHP object injection. Although these variants share the same behavior denoted by CWE-915, they all have their own patterns due to the language differences. For example, Ruby mass assignment is a set of vulnerabilities discovered in a widely used Ruby web application framework called Ruby on Rails. Unlike HPA, the attacker does not pass objects into the program. Instead, the attacker abuses a framework-specific assignment feature in Ruby on Rails to modify certain existing objects on the right side of the assignment. The payloads of the two attacks are also different. Mass assignment payloads are literal values, whereas HPA can introduce hidden properties in either literal-value or nested-object form. More importantly, Ruby is strongly typed, so mass assignment vulnerabilities cannot introduce new properties to the objects.
However, HPA can inject arbitrary properties, which makes HPA really flexible and powerful.
Okay, so after several pages of concept introduction, I think it's time to hack some real targets. In this example, we target a popular web framework named routing-controllers. We will attack its official example code to demonstrate an end-to-end prototype inheritance hijacking exploit, from bypassing a security check to SQL injection. The figure on the right gives you a brief idea of how our example works. In the example, a server program is deployed using routing-controllers. If a remote user wants to authenticate with the server, his data flows through the following components. First, he sends his serialized data to the authentication module. Then, the authentication module instantiates an object according to the JSON he provides and sends it to the param handler. Here, we use the green box to represent the user data object. The param handler is responsible for ensuring that the user input object is not malicious. The handler first collects the internal format specification object, which is the blue box in the picture, merges the specification with the input object, and invokes the input validation API. The input validation API sanitizes the user input data according to the format specification. In this case, it checks whether the email field is legitimate or not. If the check passes, the user object flows into the database.
Okay, so this is the overall data flow. Let's analyze how we can attack the logic step by step. The first step is the hidden property injection, in which the attacker injects a hidden property into his request, which is the constructor in this case. As shown in the picture, when the server program instantiates the user object, there will be an additional property named constructor, which is a payload for bypassing the input validation module.
In the second step, the program prepares the parameters needed by the input validation API. The server program merges the user input, which is the param, with the object named schema. The merging operation is carried out by putting every property from the param object into the schema object, so this process is very much like Object.assign. To simplify the demonstration, let's just use Object.assign in this example. Through such a merging operation, the hidden property constructor is also transferred to the schema. After the transfer, we can hijack the inheritance of the constructor on the schema. Actually, the constructor of the schema plays a very important role in the input validation module. As shown in the picture, the constructor stores the important format restriction information. As a result, the merging operation enables us to hijack the inheritance of these important format restrictions. As shown in the picture, when the constructor is read by the getSchema function, our hidden property immediately matches and is returned to the code. To bypass the input validation, we just need to set the format specification to an invalid value so that our SQL injection payload can escape the check. The last step is much more straightforward. The payload that escaped validation then flows into the sensitive API to finish the entire attack. So this is what an entire HPA chain looks like.
Actually, the code logic behind the vulnerability is much more complex. For example, the input validation module contains 30,000 lines of code. So it would be helpful to have a tool that automatically tracks these data structures and discovers the hidden properties existing in the program logic. So what are the challenges of building such a tool? First of all, it is JavaScript. Analyzing JavaScript is known to be hard due to its dynamic nature. Second, HPA is an attack that creates unexpected, new data dependencies.
However, program analysis such as data flow tracking is usually meant for analyzing existing flows. Third, from our running example, we can observe that HPA tampers with internal program states, so the attack effect highly depends on the roles of the compromised states. This makes the detection more challenging.
To overcome these challenges, we designed and implemented Lynx, a hybrid JavaScript program analysis tool to detect and exploit HPA vulnerabilities. Lynx mainly consists of two components. The first component, on the left, is for identifying potential hidden properties. It combines dynamic data flow tracking and static syntax analysis to track all the user input and infer potential candidates. The component on the right is for detecting the harmful hidden properties and generating exploits for them. To help future Node.js security research, we decided to open source our Lynx project at the GitHub link at the bottom. So if you are interested in the technical details, you can check the GitHub repo.
The very first thing Lynx does is dynamic data flow tracking. First of all, Lynx generates a label object, which is a unique key-value pair. Lynx injects the label into the input data of the program. Different properties of the input object may flow into different program logic, and we want to track all of these propagations, so we perform the label injection in a recursive manner. That is, as shown in the picture on the left, Lynx generates three different inputs by injecting labels into the original test case. Each time, Lynx injects the label into a different property. After the injection, Lynx observes the program execution. We leverage a JavaScript analysis framework called Jalangi to instrument our test programs. Since we are studying the data flow, we instrument variable reads and writes, object property indexing, and function calls. Then we execute the test program.
During the execution, Lynx examines every object we see in the data flow. If an object carries our label, we record it for further analysis. So now we have a list of property carriers. Recall that an object is flagged as a property carrier because we detected our injected label on it. So if we can propagate our label there, maybe we can also propagate another, malicious property there. More specifically, if we can inject a property whose name conflicts with a certain internal property of the program, maybe we can control that property by overwriting it. So now we want to extract all the child properties of the property carriers from the original program and flag them as hidden property candidates. To achieve this goal, we need static syntax analysis to extract the necessary syntactic information from the code. The picture on the slide demonstrates how we parse a statement from our running example. Lynx traverses the syntax tree until it reaches a property carrier, which is circled by the red line in the graph. Then it records all the properties under the carrier. In our case, the hidden property candidate is the constructor.
Here is a screenshot of the output of the first component. As you can observe, Lynx first instruments the code base, and then it tracked 43 property carriers. As indicated by the red circle, Lynx successfully detected a hidden property named constructor.
So in the previous component, we discovered the key names of the potential hidden properties. By injecting a property with the same key, we might override certain internal states. However, we still don't know whether the candidates can be controlled or not, and we also don't know how to introduce attack effects with these candidates. So apparently, Lynx could do more. Let's revisit our running example to see if there are any insights to help us design such an exploitation component.
The figure on the left is the vulnerable code from our running example. As we have discussed many times, the hidden property tampers with internal program states, which means HPA exploitation is highly related to the code context. So it is important to summarize a set of sensitive behaviors. These behaviors should clearly indicate certain security consequences, so that we can decouple the harmfulness detection from the code context. Also, in the running example, exploitation is mainly about manipulating the return result. More specifically, there are two possible paths here. If the execution enters the branch on line 19, we get a validation failure. But if we can get to line 21, we successfully pass the check. So the exploitation point and the override point may not be the same place, which means we shouldn't stop our analysis at line 11. Instead, we should continue exploring all the possible paths that can be triggered by manipulating the hidden properties. So we studied and summarized six general types of sensitive sinks. Due to the time constraint, I will not introduce the details of each type. If you want to know more, you can check our Git repo.
After defining our sensitive sinks, we want the hidden property to trigger as many branches as possible, and we monitor whether we can hit a sink. To achieve this goal, we use symbolic execution to explore the whole value space of the hidden property. Lynx first generates an exploit template that can reach the potentially vulnerable hidden property. We call such a data structure an exploit template because Lynx does not specify a concrete value in the input. Instead, we insert a special placeholder, which will be used by the symbolic execution later on. Then we run the test program with our constructed template and symbolically execute the hidden property. As shown in the picture, Lynx explores all the path constraints along the input path. Once Lynx finds that it hits a sink, which in this example is the sink I2, it fetches the corresponding payload that can trigger the sink as the final exploit. A little background here: I2 is the sink we define to detect whether our input can manipulate the return value of a module or not.
So this is the output of the exploit module. From the circled area, we can observe that the key-value pair for the constructor triggered the sink I2. In the last line, we can see that Lynx successfully generated an exploit that leads to a successful validation. So this is pretty much how our system works.
Let's see some interesting results of our research. During our research, we chose 60 widely used programs from npm. There are 55 modules and 5 web applications. With the help of Lynx, we tracked more than 1,300 carriers and detected more than 300 hidden property candidates associated with those carriers. In the end, we confirmed 13 zero-day vulnerabilities. With the help of symbolic execution, Lynx even synthesized 10 exploits automatically. So what is the impact of these vulnerabilities? We found that HPA can introduce various attack effects, such as leaking credential data, denial of service, or bypassing security checks. Based on the impact analysis, we can observe that HPA can compromise previously unreachable program states, which effectively enlarges the attack surface. Moreover, we notice that HPA is not a simple input validation issue, and many input validators themselves are also vulnerable to HPA.
In the following slides, I will pick two interesting vulnerabilities from our results and go through them as case studies. The first case comes from the official MongoDB driver. We found that we can tamper with an internal state named _bsontype. A little background here is that BSON, or MongoDB, leverages the internal state _bsontype to indicate the data type of the query object.
However, when serializing the query object, MongoDB will ignore an object with an unknown _bsontype. So what if we abuse this logic for a query condition object? The code on the right is from an open source online game. The online game uses the vulnerable API to implement its user management logic. As shown in the picture, by injecting an unknown _bsontype into the input, the attacker can force MongoDB not to serialize the query condition object, so that MongoDB will always return the first user at the top of the database. With this ability, the attacker can log in to or delete arbitrary accounts.
The second case is from another widely used in-memory database. Here the hidden property is more like a backdoor, which helps the user access sensitive data. In TaffyDB, we discovered a hidden property named ID, which is an internal index for each data item in the database. Once we specify our own ID in the query, TaffyDB ignores the other query conditions and directly returns the result associated with the index. So as shown in the picture, even though we got the password and username wrong, we can still leak valid user data from the database with our crafted hidden property.
Okay, so thanks for attending our talk. I hope you guys keep safe in this special year.