Hi, welcome, and thank you for joining this talk on Node.js deep debugging. Before we get into what this session is all about, what its stated purpose and expected outcome are, let me provide a brief intro. I am Gireesh Punathil and I work for IBM. I'm an architect with a team called IBM Runtimes, which is responsible for developing and supporting language runtimes such as Java and Node.js. I'm also a member of the Node.js Technical Steering Committee and the Diagnostics Working Group.

What is the motivation behind today's talk? I've been involved in debugging a number of end-user-reported issues. In fact, I'm personally interested in troubleshooting complex issues coming out of production deployments, such as crashes, hangs, performance problems, and so on. That essentially means I have spent a good amount of time debugging Node.js internals, and I felt it prudent to share some of these experiences, some of the paths I traveled, with people who are using Node.js and might come across the same or similar situations. Instead of reinventing the wheel, it would be pretty easy for them to reuse some of these experiences and lead the issues at hand towards quick resolutions. That's the whole idea and the motivation behind this talk.

Today I have five issues, or five topics, at hand. For each of the issues, this is how I want to explain it: I'll start with the execution environment where the issue was reported and how the issue manifested externally, then the whole path of proceedings towards problem determination, what the actual issue was and how it got resolved, and then the tools, methodologies and best practices that I followed, as well as the learnings and insights that are reusable.

All right, here is our first topic. The title says "high and steady-state RSS". What is RSS? It stands for resident set size; here, size means memory size. As we know, there are operating system primitives to allocate and deallocate memory to and from a process. Now, at any given point in time, the allocated or deallocated memory is not necessarily a true reflection of the actual memory accounted to the process. This is because we rely on the virtual memory system: at any point in time in a sufficiently loaded, multi-tasking system, there can be other processes demanding memory, and memory that has not been used by our process for a considerable amount of time is swapped out and given to the other processes that need it. At a later point in time, when the current process requires it, the memory is brought back in. This means the actual accounted memory is going to be equal to or less than the allocated memory in most practical scenarios. The RSS, the resident set size, is expected to be, or supposed to be, the accounted memory of the process. I stress the words "supposed to be", because that is the whole essence of the issue at hand, and we will see why.

The issue manifested in the form of a high and steady RSS in the Node.js process, and was reported by many users with a similar symptom: they all were seeing very low heap usage but very high RSS. This is not a surprise in itself, because we know that Node.js does not use memory only from the JavaScript heap; it also has bearings on the native backend, and many objects have memory backing them in the native heap as well.
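To make the symptom concrete, here is a minimal sketch, using only the core process.memoryUsage() API, of the kind of comparison users were making between the V8 heap figures and the process-level RSS:

```js
// Compare V8 heap figures with the process-level resident set size.
const toMB = (n) => (n / 1048576).toFixed(1) + ' MB';
const { rss, heapTotal, heapUsed, external } = process.memoryUsage();

console.log('rss      :', toMB(rss));       // resident set size of the process
console.log('heapUsed :', toMB(heapUsed));  // live JavaScript objects
console.log('heapTotal:', toMB(heapTotal)); // total V8 heap reserved
console.log('external :', toMB(external));  // native memory V8 knows about
```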
So a low JavaScript heap but high native memory and high RSS is, on the face of it, not a problem; but because many users reported the same issue with a similar symptom, we thought it prudent to investigate and started looking at it. One common thread was that they all had a history of huge memory allocation, in the form of various objects getting created back to back and later all of them getting garbage collected, but the RSS not coming down proportionately. That was the common symptom reported by most of these customers and end users.

Now, coming to the problem determination: as we know, the JavaScript heap is not a true reflection of total consumption, so we make use of operating system tools to see the actual memory consumption of the process. Tools such as top, pmap, etc. help here. We look at the consumption, and correct, we see high RSS; but then, unfortunately, we don't have a mechanism to see the breakup of the native memory consumption, the finest level of detail about where this high RSS is accounted for within the program or within the virtual machine. So the only way to get to the details was through simulation testing.

We wrote custom programs to simulate the behavior, but we were not able to reproduce it consistently. That was the point at which we got another hint: the behavior was not consistent even with the same program run on different systems, for example systems differing in specification or in load conditions. On the face of it that was a challenge, but it gave us a lead on how to proceed. One of the first steps was to disambiguate, to get an answer to why the same program behaves differently on systems with differing specs, differing amounts of physical memory, or different load characteristics. That actually gave us an opportunity to dig into the problem.

We saw two patterns: when the system is high-end, with an ample amount of physical memory, and is relatively idle, the RSS, once hiked up, never comes down; whereas if the system is tight on physical memory relative to the running process, and there is a lot of load in the system, the RSS comes down immediately, which is in accordance with the expectation for the Node.js process. Reading through various documentation, manuals and specifications around virtual memory and resident set memory, and with the help of a small C program which faithfully simulated the Node.js scenario, we could actually understand the root cause of the issue. In my small C program, what I was doing was allocating small chunks of memory in rapid succession but in large amounts, and then immediately freeing all of that memory: say, 1000 chunks of 2KB each, allocated from the free pool and freed immediately, then sleeping for some time to see whether the RSS that shot up would come down or not. We could see that the RSS stays on top forever: unless you add more processes to the system and create demand for memory, the process is never going to relinquish the RSS accounted against it. That was really a surprise to me, and equally to my fellow customers and end users. I don't know if that is known art, but it was really surprising: the resident set size is not truly reflecting the resident memory of the process.
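For those who want to try this themselves, here is a Node-level analogue of that small C experiment. This is a sketch only: it assumes the --expose-gc flag, and the exact figures depend on your operating system and overall memory pressure.

```js
// Run as: node --expose-gc rss-sketch.js   (file name is illustrative)
// Allocate many small chunks back to back, drop every reference,
// force a GC, then idle and watch whether RSS comes back down.
const rssMB = () => (process.memoryUsage().rss / 1048576).toFixed(1) + ' MB';

let chunks = [];
for (let i = 0; i < 100000; i++) chunks.push(Buffer.alloc(2048)); // ~200 MB
console.log('after allocation:', rssMB());

chunks = null; // drop every reference
global.gc();   // collect it all (requires --expose-gc)
console.log('after gc        :', rssMB());

// On a lightly loaded system with ample physical memory, this figure
// typically stays high even though the heap has been released.
setInterval(() => console.log('idle            :', rssMB()), 5000);
```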
The process has long since freed up the memory, but unfortunately it is all still accounted against the same process. Now, what is the resolution? We don't have a resolution identified yet; we have an RFE opened against the libuv component. Unfortunately, the challenge with respect to implementing a resolution is that we don't have a consistent, well-defined API that would capture the actual accounted memory of a process. The actual accounted memory of a process is something called the working set size in Linux, and unfortunately we don't have an API to get that value. Secondly, this is not something we can implement cross-platform: Node.js is supported on a number of other platforms where we may not be able to get this value consistently. For these reasons the RFE is still not completed. The key learning from this exercise is that RSS does not truly reflect the accounted, active memory of your process; it also depends on the system characteristics. If the system is largely idle with ample memory, never trust the RSS. That's my bottom line.

The next one is ExitRace. This is an issue that was reported on Node.js version 11. What is ExitRace? The name was given later, once we had completely diagnosed the issue and identified the resolution. The issue manifested as random crashes in all sorts of programs, with no specific pattern whatsoever that could be identified: we got segmentation faults, we got aborts, we got illegal instructions, to name a few. The problem determination approach was to run the programs many times under the problematic version, collect as many system dumps as possible, associate the failing contexts from the dumps, and see if we could classify the failing contexts into a discrete set of patterns rather than a set of random crashes. This approach helped in two respects. Number one, it helped us understand that there were only two or three or four discrete patterns, not too many. Number two, it revealed that in all the cases the main thread was in its exit path; that essentially means that in all the programs, the process was about to exit, or the main thread had initiated the cleanup activities. This was a really strong lead. Among the other challenges we faced: when we did a git bisect, it pointed to a commit whose changes made it evident that the commit had no relation to the issue, it was in no way connected; it was merely widening the timing window.

The problem was that there is a set of resources, entities, assets, whatever we want to call them, that Node creates or initializes at the time of the Node bootstrap, and those assets are destroyed or cleaned up in the reverse order at the time of the Node exit. The order is absolutely important, because there are interdependencies between the resources and the code that accesses them. The issue pops up when there are multiple threads; these could be worker threads, or the internal helper threads that Node uses to perform asynchronous operations. When multiple threads are involved, and we don't do the cleanup properly in the reverse order, or we don't rein in those threads, the threads end up accessing resources that have already been destroyed, and that was causing the random crashes; depending on the exact point at which a thread accessed a destroyed resource, we were seeing different random crashes.

So the resolution was to follow, at destruction time, the order in which the resources were created, and to make sure that the helper threads which can potentially access these resources are kept in a consistent state at the time we are doing the cleanup. That means if one or more threads can access one of the resources, we make sure each such thread is brought to a state where it can no longer access that resource. One of the problem determination techniques that we learned and applied in this scenario is called timing injection. Timing injection essentially involves adding sleep delays between activities where we suspect the presence of multiple threads that can cause issues: basically widening the timing window in an artificial manner and thereby simulating race collisions.
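As a rough illustration of the technique, here is a JavaScript-level stand-in using worker_threads; the real injections were sleeps placed in native code paths, so treat this purely as a sketch of the idea:

```js
// Timing injection, sketched: artificially widen the window between a
// helper thread's activity and the main thread's exit path, so that a
// suspected exit-time race collides more reliably.
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  const w = new Worker(__filename);
  w.on('online', () => {
    // Injected delay: let the worker reach the suspect code path,
    // then begin the exit sequence while it is still busy.
    setTimeout(() => process.exit(0), 50);
  });
} else {
  // The worker keeps exercising shared machinery right up to exit time.
  setInterval(() => parentPort.postMessage('tick'), 1);
}
```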
All right: data truncation with re-piping. Again, one of the more complex issues we have worked on; it is not even easy to explain to a typical programmer. So what was happening? Consider two Node.js child processes. Like any other process, these processes have the standard streams: standard input, standard output, and the error stream. A key difference with Node.js child processes is that these streams are essentially pipes. Long story short, to avoid confusion between the parent and the child around which process owns the console from which the children are spawned, the native standard stream handles of the child processes are closed by default, and three pipes are created by the parent; one end of each pipe is held by the parent while the other end is given to the child, and the child uses it as if it were one of its standard streams. So whenever the child writes into its output stream, the data goes to the parent, and whatever is written into the child's input stream flows into the child process. That is how the child is able to use the standard streams effectively, as if they were the real standard input, output, and error streams, while the parent retains fine control over those streams. That mechanism works pretty well.

Now, coming back to the problem context. What was happening is that two child processes were engaged in communication by re-piping the standard streams: for example, one child process's output stream was piped into the other child process's input stream, so that whenever the first child writes, the data should flow into the second child process. That was the understanding; that was the design. But unfortunately, the data was getting lost in an arbitrary manner. The challenges with respect to problem determination were, first, that the truncation pattern was not consistent: every time we got a different set of results, sometimes the test passed, and sometimes we did not see any data flowing into the second process at all. The second challenge was that we did not have a proper tool, nor a proper methodology, to debug this kind of problem.

One hypothesis that we made was this: the child endpoint is already a pipe, we know that, and that essentially means there is already a channel open between the first child process and the parent process. Now, when the child end of the pipe is re-piped, the file descriptor, the underlying handle of the pipe, is copied and redirected to the endpoint of the second child. What happens to the channel that is still open between the first child process and its parent process? Arguably, it is still open, and data could flow through that channel as well.
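For concreteness, the shape of the setup looks roughly like this. This is a hypothetical reconstruction: the defect was version-specific, so current releases handle re-piping correctly.

```js
const { spawn } = require('child_process');

// First child: writes a large payload to its stdout (a pipe to the parent).
const producer = spawn(process.execPath,
  ['-e', 'process.stdout.write("x".repeat(1e6))']);

// Second child: counts whatever arrives on its stdin.
const consumer = spawn(process.execPath, ['-e', `
  let n = 0;
  process.stdin.on('data', (d) => { n += d.length; });
  process.stdin.on('end', () => console.error('consumer got', n, 'bytes'));
`], { stdio: ['pipe', 'inherit', 'inherit'] });

// Re-piping: producer.stdout is already one end of a parent-owned pipe;
// this redirects that data path into the consumer's stdin.
producer.stdout.pipe(consumer.stdin);
```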
So now, depending on which pair of processes is currently active, which processes are getting CPU cycles, and which process is currently writing while another is ready to read in terms of its code and control flow, the data would actually go partly to the parent process and partly to the second child process, because both pipes are open; it depends on which pipe is being exercised, which process is active, and so on. A really funny scenario. With the help of some powerful tooling, such as strace, we were able to get to the bottom of it and prove this hypothesis: we were able to see exactly what we suspected, that part of the data was going to one process while the rest, without any loss, was flowing into the second. The solution was pretty straightforward: when a pipe is re-piped, the existing pipe should be deactivated; that means the remote endpoint of the existing pipe needs to be closed.

One of the main learnings from this exercise is that we spent a lot of time writing test cases and debugging without getting many clues. What is important is to first build a theory that explains the problem at hand, and then write test cases and debug around that hypothesis, to prove or disprove it; if it turns out not to hold, make a new theory and write test cases to validate around that. Without that, if you just start by writing test cases, traveling in the reverse direction, I don't think it gives much benefit to the problem determination exercise.

Okay: shutting down Node.js in flight. It looks pretty straightforward; it doesn't even look like an issue. What is shutting down in flight? Terminating the JavaScript runtime and the platform that was started by the Node.js process. Isn't it as simple as calling process.exit, or killing the process with a signal such as Ctrl+C? The point is that in this particular case we don't want to get out of the process, but we still want to terminate the Node.js sequences. But then, what is left when Node.js terminates? That is what makes this case very special. The use case here is that we make use of an embedder. An embedder, by definition, is a process that embeds Node.js in a container-component relation: not a parent-child relation, not a server-client relation, but as an in-process component. This means an embedder has a custom launcher, an entry-point function, and that entry point is responsible for initializing a number of components of the embedder application, one of which happens to be Node.js. In this setup, the lifecycle of the embedder process can be much larger, much wider, than the lifecycle of Node.js. To give a simple example: in response to certain external commands, the embedder may want to shut down the running instance of Node while continuing to work with the rest of the components of the embedder. There are many Node.js embedders, but the two that I know of are Electron and IBM IIB; IIB stands for IBM Integration Bus, which defines a framework for connecting heterogeneous applications.

The problem is that the Node.js event loop is designed in such a manner that as long as there are active handles in the event loop, the loop is not going to exit. The only way to get out of the event loop is to send an exit signal, but that is going to terminate not only the event loop but also the wider embedder process. That is why we need a mechanism to shut down the Node.js instance specifically, and in a custom manner, where the event loop is quiesced and silently, gracefully comes out of all its processing so that the embedder can continue.
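The rule at the heart of this is visible at the JavaScript level too. Here is a minimal sketch of the "active handles keep the loop alive" behavior; this is not the embedder API itself, which lives at the native layer:

```js
const net = require('net');

// An active handle (a listening server) keeps the event loop alive.
const server = net.createServer();
server.listen(0, () => console.log('loop kept alive by an active handle'));

// Stand-in for an external "stop Node now" trigger: close every open
// handle; with nothing left to wait on, the loop drains and the process
// exits gracefully on its own, without process.exit().
setTimeout(() => {
  server.close(() => console.log('last handle closed; loop will now exit'));
}, 2000);
```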
The way the proposed API works is like this. It inserts an async handle into the event loop at the beginning; when the embedder wants to stop the Node.js instance, it fires an async event on that handle. This triggers an async event in the event loop, and subsequently the handler gets called, which is invoked on the main thread of course, stopping all other processing on the main thread. In the handler we close all the handles in the event loop, and basically we close the event loop itself and come out of it; we terminate all the worker threads and then start the cleanup activities in a graceful manner. This eventually shuts down the Node.js instance and returns to the caller, which in this case is the embedder. My key takeaway from this particular activity is that when new architectural possibilities are attempted, we learn a lot: in this case, specifically around how the event loop works, how the overall architecture of the event loop is designed, and how the threads coordinate around the event loop.

And the last one in this series: truncated stdout data on process exit. This was reported by many users with varying problem descriptions. Here the process is again a child process, and the stdout data is typically large, in the kilobytes. That means if you have a child process that writes kilobytes of data and exits immediately after the write, the parent does not receive the data in its completeness; some amount of data gets truncated. That's the issue. As we know from one of the previous topics, the parent and the child communicate through a pipe, pipes are subject to buffering and chunking, and there is a maxBuffer parameter that defines how much data is allowed through the channel in one shot. A trivial suspect was the value of this maxBuffer; we increased it, but did not see any positive result. We tried testing on different platforms, which exhibit different pipe behaviors, but that did not give any clue either. Changing the data volume, however, had a visible effect, and more importantly, a different amount of data was truncated each time. So clearly there was some race condition, but between which entities? The sequence that writes the data to the parent is one sequence; the sequence that wants to exit the process is a second sequence. Clearly there is a race condition between these two sequences, and depending on which one was able to progress further, the other was lagging behind: based on CPU cycles, on operating system scheduling, and on various other control-flow-related sequencing, the data truncation varied in a variety of ways.

But why wouldn't the process exit wait for the data write to complete? That is where the crux of the problem lies. The console.log API, which was used to write the huge chunk of data, is asynchronous by nature in Node.js. This means that if you present a large amount of data to it, only a chunk, the bare minimum volume that the pipe can hold, is written in the first place, and the rest of the data is scheduled for writing once the pipe has drained completely and is ready to be written to again. What if an exit is pending on the process; is this remaining data written? No: the pending write activity is completely abandoned, and the pending exit is processed first. And that is actually the issue. Now, what is the solution? The solution would be not to use the console.log API, which is asynchronous in nature. One known workaround is to use the process.stdout.write API as opposed to the console.log API: process.stdout.write holds the process in a blocking state until the whole data is written. That is essentially the known workaround.
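A minimal sketch of the failure shape and a defensive form of the workaround, assuming stdout is a pipe (for example, a parent capturing the child's output):

```js
// Failure shape: console.log queues an asynchronous pipe write, and an
// immediate exit abandons the unwritten remainder:
//
//   console.log('x'.repeat(1024 * 1024));
//   process.exit(0); // parent may observe truncated output
//
// Workaround: hand the payload to process.stdout.write and exit only
// after its completion callback fires, so the exit no longer races the
// pending write.
process.stdout.write('x'.repeat(1024 * 1024), () => process.exit(0));
```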
But fundamentally, the fact that console.log is asynchronous, and that data still pending at the exit point gets truncated, does not have a comprehensive solution as of now; we do have a number of issues open on that. I don't think we will have a proper, design-level fix in the near future, but we do have a stable workaround.

So that's pretty much it. I hope you enjoyed this talk. I want to conclude by saying that these are some examples of issues where we spent a considerable amount of time debugging the Node.js internals and gained wider insights into how things work, along with using a variety of tools, techniques and methodologies. I would strongly recommend that folks who are interested in further details on this, in how to use some of these tools, or even in getting started with contributing to Node.js and debugging it, write to me or to any of my fellow project members. We are more than happy to assist you. There are enough issues in the backlog at the moment waiting to be hand-picked, so you wouldn't be disappointed at all. Once again, thank you very much for listening to this talk.