Okay, so good afternoon. Thanks for joining this session. The whole idea of "insights from the help repo" is to look at the usage patterns, as well as the patterns in the problems users encounter when they make use of Node.js and the types of problems they report in the help repo, and to see what we can do as a team: discuss those things at a high level, see if there is any trend, and see if there is anything we can do at the design level, at the code level or at the documentation level, back in core, so that the overall user experience is improved. So that's the whole idea. I've been spending some time on the help repo for the last year and a half or so. A couple of caveats: these are my personal observations, not necessarily representing any group or the project as a whole. And the help repo is a running record, which means over the years there have been hundreds of issues. I'm not representing all the issues as such; the ones which came across me, the ones where I was part of the problem triage, et cetera, are the ones represented. And then I put recommendations against each of the problem patterns. Again, those are not necessarily hard-bound or final recommendations; I just put down some observations based on my understanding. My expectation is that, as a team, we look at those as a baseline, apply some collective intelligence, and see what we can do about them. So that's the whole idea. Then let's look at the first one. I have four problems at a high level, and for each of them I have the problem, the root cause and the recommendation. So: npm and Node installation, reinstallation, uninstallation or migration. This is by far the most common issue which we see across the board, without any exception.
If you have 100 open issues, you see around 40 of them belonging to this category, so that's a very conclusive pattern. So what is the problem? The problem is that users are not able to install Node.js: they get exceptions, a file-not-found error, native-level crashes or things like that. Then they're not able to upgrade: they had a valid, healthy Node installation, but when it comes to upgrading, it breaks in very strange ways. And then uninstallation: either the uninstallation is not complete, or it leaves some files behind in the file system, et cetera, and they get those sorts of exceptions as well. When it comes to npm, everything that applies to Node.js installation applies as-is, and in addition there are a number of other issues, like proxy failures, network connectivity failures, version mismatches between Node and npm, and all sorts of things. So I would say npm installation issues are by far more problematic than the Node.js-related ones. The root cause, in my opinion, is that you have a number of distributors of Node. At a high level I will classify them into two. One is where the platform has a native installer, something like apt-get, or brew install in the case of macOS. The second classification is where the distributor has a custom installer, like the one the official Node.js community delivers, which has an MSI installer associated with it. So: native installer versus custom installer. Now, if you install Node.js, say version 10, using the native installer, and then you want to install or upgrade to version 11 or 12 using another installer, there is no coordination between these installers about where the cache files are kept, where the hidden files or the configuration files are kept, et cetera: which shortcuts were created and, when you upgrade, which items need to be deleted, which items need to be replaced, and things like that.
That is by far the most common root cause, in my opinion. And then when it comes to npm, it's trickier. I haven't looked at the npm client as such to understand what types of problems come in, but in my opinion the npm client, which is npm-cli.js, performs a number of operations: resolving the module, connecting to the network, looking at some cache files in your local file system, trying to make some meaning out of them, and finally downloading the module. And then, if there is a native component in it, building it with the build toolchain and finally installing the module. In that sequence of operations there are any number of expectations that can fail. For example, the version of Node.js which the tool figures out is not necessarily the Node that is prevalent in the system, and things like that. At a high level I will classify this into two. One is the resolution of the npm script itself. As we know, the npm client is JavaScript, npm-cli.js, and it is resolved through a number of magical steps. npm is a script available in the PATH, and that script basically redirects to another file called npm-cli.js. So that resolution can fail based on what shortcuts you've kept in your file system, et cetera. Then npm-cli.js, in turn, resolves the Node executable, because the first line in npm-cli.js is a shebang line, which means you are resolving Node all over again, and this multi-level resolution can go for a toss like that. That would be one of the main root causes, in my opinion, of npm installation or uninstallation failures. I have no real insight into the steps involved inside npm-cli.js, so I can't make a detailed recommendation there. But in my opinion, installation should be standardized across all the distributors. We should have a conversation with all the distributors about how you install, how you uninstall, and how you upgrade.
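As a rough illustration of that multi-level resolution (the real npm wrapper scripts are more involved than this), the chain is: the `npm` command on the PATH is a thin wrapper that hands off to a JavaScript entry point, whose shebang line then resolves whichever `node` happens to be first on the PATH. The sketch below is hypothetical, not the actual npm source:

```javascript
#!/usr/bin/env node
// Hypothetical sketch of an npm-cli.js-style entry point. Running `npm`
// resolves a shell wrapper on the PATH, which runs a file like this one;
// the shebang above then resolves `node` from the PATH a second time.
// If the two resolutions disagree (stale shortcuts, mixed installers),
// the Node that npm runs under may not be the one you think you installed.

// process.execPath tells you which node binary actually won the resolution:
console.log('node binary :', process.execPath);
console.log('node version:', process.version);
```

Comparing `process.execPath` here against `which node` in the shell is a quick way to spot a broken resolution chain.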
Should we have a standard protocol for those steps, agreed upon by all the distributors? So when you say distributors, do you mean like through the Debian registry and the cloud images, and how they're doing their thing? Yeah. So you're talking about anyone that's basically giving a developer Node? Yeah. Okay, I just wanted to make sure I understand that. Yeah. So in my understanding, NodeSource distributes Node, Red Hat distributes Node, Homebrew does on macOS, and there are our official builds. Each of them has its own protocol; they assume the end user uses their own tools to install or uninstall, and that's where the problem comes in. Then, when it comes to the npm installer, I commonly look at npm-debug.log to figure out what is going on. It's not a bad tool at all: it provides a step-by-step log of what is happening, at what stage things are failing, et cetera. But one serviceability pitfall in the log, I would say, is that it provides too much information for a minor issue, whereas for a very critical issue it doesn't provide any information. The balancing of the log verbosity, the normalization of the quality of the log, is very skewed. I mean to say: whether you're downloading, connecting to the internet, natively compiling, or caching some information, these are structured, high-level activities, and the log is not necessarily recorded in proportion to the actual activity or its nature. So that's the key thing I want to highlight there. Then, I have no idea who owns the npm client at this point. I believe there is an upstream, but I don't see anybody raising a PR or addressing an issue as such; we always download from the upstream and bundle it in Node.js. It's npm. It's npm. I was going to say, I've had a question about that recently.
Is it npm, Inc., the company, that supports the people who work on it? So there's the npm org, and there's npm, Inc. Regardless of whether it's in the org or the Inc., I believe the Inc. ultimately holds the IP for things. Yes. They currently hold it, and employ the people, too. Yeah. I just did a course on this recently, explaining some of this; I was always wondering about that. We're derailing a bit there, I know. Yeah, it's from npm. Okay. Sorry, so do they control the source? Yes, they have 100% of it. Ultimately, yes. That said, I am not sure whether they have granted merge rights to anyone outside of the company. I believe they were working towards that. They were, and they did not end up doing it. So only they are able to commit things? Yes. Oh, absolutely, yes. That's why I can't say for sure. Yeah, then I guess the most important thing is this: what we should be looking at is the npm client, but the issues and problems coming from installation are always reported against Node.js. So either we should have documentation or a best practice that says these should all get redirected over there, or we should have a better stake and ownership in the npm client. Oh, like a redirect, directing them to npm.community for questions related to npm? Something like that, give or take. This is a dumb question, I think, but nvm, is that not recommended for folks to start with? Would that help with the "I installed 8 eighteen months ago, now I have 10 installed, but I forgot how I installed 8" problem? Yeah. Do we recommend nvm? We do recommend nvm, but the problem is it's Unix-only; it doesn't work on Windows. nvm-windows exists, but it's not feature-compatible. It's not pretty. Yeah, and I guess with Windows 10 and WSL getting better, I'm more hopeful.
I might get a Windows machine just to use that, to see if it works and see what the issues are. But other than that, yeah, for Windows it does work; that's how I recommend it to people in the courses I'm making right now: install through nvm. They're used to that by now. Yes, exactly. There is a package manager for Windows called Chocolatey. Yeah. Anyone familiar with that? Yes. Is it a Node package manager? No, it's a Windows package manager. Oh, really? It's basically brew, but for Windows. I see a third party maintaining a Node.js install through Chocolatey. Yeah, I believe so; sometimes I recommend it, though. Interesting. I've seen it recommended, but I haven't tried it myself, so I don't know if it's good or not. Sorry, we totally derailed you there. Yeah. So, it would be a good idea to look at tooling as an option to redirect the issues to npm, but then my question would be: if the open culture around the whole process of addressing issues in the npm community is totally different from the inclusiveness, the openness and the activity of the Node.js community, then eventually there's a gap in the user experience. In an ideal world, what would the interaction with the community look like? Because I know npm totally closed their issues on GitHub. If they were to undo that, or use some other method on GitHub, what would that look like to you, to be able to work well with the help people? I'm curious about that and how we can deal with it. I mean, in my opinion, it's about having better control of npm-cli.js and the whole installation and migration process. Yes. If I were owning the npm CLI, what I would do is look at improving the overall logging, the serviceability, and the documentation.
If you remember, we have well-regarded documentation around module resolution: you look at the node_modules in the current folder, then you look at the parent folder, and so on up to the root of the file system, and then you look at the package.json. There is a sequence of maybe 20 or 25 steps involved. That is well-known, well-documented and well-controlled. If you had something of that sort for the npm installation process, that would really help. But for that, we might need a better platform: we should be able to refactor the code, put in more structured logs, and then write better documentation. So if the npm community is able to do that, that is better; otherwise, somebody in core doing that, I guess, would improve the situation. Wait, so are you saying you make the recommendation that we should have access, that we should have commit rights to npm? That's right. Well, I mean, to get the conversation started: somebody from the Node community taking ownership of championing this as an initiative, having a conversation with npm, looking at ways to improve the overall npm installation experience. Yeah. I would say that has happened many times. Yes. So it's not for lack of trying. I think if we look at, like, V8, right, other upstreams and examples of that for Node and how we're affected by them: we had to spend, I think, nearly a year in advocacy and talking, people from our project talking with that team, to get them to work with us. And they were already amicable and they already had relationships, but we sort of codified it as: we need to be communicating on a regular basis, so that we're both benefiting from each other's work instead of suffering the side effects of somebody else's work.
Whereas with npm, we have the added complication of prior community relationships between npm folks and the rest of the Node project. Early on, some of the Node project's contributors were much more forceful, a little bit entitled, in the sense of "of course we need to have access to be able to modify npm", but that's stepping into somebody else's project and telling them you're going to take over. And I think because of that insensitivity back then, it sort of put us at a standstill. And unfortunately, well, I shouldn't say unfortunately, the leadership and the people who work on it have not changed on their side. So they still have that memory, that history, of not working well with us, even if on our side the people have changed. So it's not that we shouldn't have that conversation again, because we're a new group, but that's the reality of the project. This all ends up being side effects, right? Of our past choices. And it's not even our past choices, it's individuals in the project. Yes. But just to give you a heads up: if you were ever to suggest that on GitHub, that would be some of the feedback you may end up seeing. Okay. Sure. Let's move on in the interest of time. Yeah, sorry. No, no, right. Okay. So the next one is on child process. This is not necessarily a coherent or connected set of problems; they're probably discrete, one-of-a-kind in many situations. We have an outstanding problem of truncated child data. What that essentially means is: if you spawn a child process, and it's a short-running process, like a CLI tool that just prints a piece of information and exits, then because console.log is asynchronous, the child process does not really wait for the console output to be completely flushed before it exits, and by the time control comes back to the parent, data is essentially lost.
This has been around ever since I can remember; since I joined the project I've seen this issue, and on and off we get people complaining about it. We know what the problem is. We know how to solve it, but there are side effects: there are known solutions which have more severe side effects than the advantage they bring, et cetera. But there are one or two options which I believe we can look at to solve this once and for all. Then, error conditions with spawn. This is again basically a design-level issue. When you spawn a child process, up to some point in the sequence the child does not exist, which essentially means all the issues coming out of that part of the spawn sequence are thrown back to the caller in a synchronous manner. And the moment the child comes to life, any issue occurring in the process which has just been born cannot be passed back to the caller synchronously, because it's running asynchronously; the child is completely independent. So those errors are delivered back to the parent asynchronously. What that essentially means is: when you're spawning a process, depending on where the issue happens, the caller should be prepared to catch the error either synchronously or asynchronously. And if you don't have enough insight into what type of problems to expect, unexpected issues can happen on the consumer side. So that's a pretty tricky little thing. We don't have a complete solution for this yet, other than documenting the fact which I just stated: it's a sequence of activities, and up to some point the caller should expect errors synchronously, and beyond that, asynchronously. That should be stated and documented as a protocol. So that's what I essentially meant by normalizing the runtime behavior. And then, performance and footprint issues with spawn.
The way spawn works is: you replicate the parent process, in all its aspects, via the fork system call, and then exec the target program within the forked child. Depending on the nature of the child, say the child is a small Unix utility like ls or pwd, what you're essentially doing is replicating the whole Node.js process into the child's address space, and depending on how much memory the parent has been consuming, say two or three GB for example, you are notionally committing that much memory for the child, basically causing memory-footprint issues. There could be a lighter-weight capability than this full replication, and one class of users is affected by it: especially if you're running things in a cloud deployment, for example, where memory is charged for, this particular way of doing things can cause real issues for the consumer. So the recommendation is to look at ways to spawn the child in a customized manner, depending on the deployment scenario, the type of child you are spawning, et cetera. Fortunately, the spawn API has a parameter called options, which can be overloaded with additional flags or additional input. So the API already provides the opportunity; we just need to figure out the right way to implement it. If a child process spawns and the memory of the parent is low, but the child's demand is higher, will that cause an error as well, or can it grow? It can grow, okay. The child process, at the time of the fork, doesn't acquire any memory of its own. The child process grows organically based on its memory demand, which will in any case need fresh memory; it cannot make use of the parent's memory as such. Say, for example, the parent is one GB and the child is going to be 10 GB: the one GB replicated into the child is of no use to the child. It has to allocate a fresh 10 GB anyway.
So essentially the one GB of parent memory becomes useless to the child, and it becomes an overhead in the whole system. Is there a reason why we've not addressed that? We don't have to get into it if it's too distracting. So there are discussions happening. It's about being able to implement it in a platform-neutral manner, and being able to do it in a non-breaking manner. Child process has been in the field for quite some time, so we're wary of touching anything. Yeah, yeah. So the third one is the embedding scenario. Embedding basically means using Node from an existing native application. That means you don't use node.exe or the node binary as such; instead, you have your own native process which is running the major chunk of your workload, and you just want to make use of Node for a specific subset of it, say an I/O-bound or interactive portion. The problem we see is that embedders do not know how to consume Node. What is the entry point? What are the things you need to initialize? What are the things you, as an embedder, can customize? What are the control-flow points, what are the tunables, et cetera? If you look at the main file that implements Node's entry point, which is node.cc or node_main.cc, you'll see at least three start functions, four init functions and one main function. These are entry points with different levels of abstraction. For example, one start function takes the whole arguments as-is as its input; another takes a subset of that, but provides you the flexibility of configuring the inspector, or the V8 engine, or the isolate, and things like that. So essentially you have five or six discrete APIs to call to control the submodules. This is flexible, but it can be confusing.
You don't necessarily have five or six discrete use cases for embedding. You may have two or three, maybe more than that, but the question is: do we have documentation, or a higher level of abstraction, that any embedder can relate to? For example, an embedder may not be interested in customizing the inspector; they will just go with what Node.js provides as a capability. And again, if you are embedding Node.js, you don't necessarily need to control the V8 engine as such. So the whole idea is: either talk to the embedding users or collect their feedback, then settle on two or three discrete ways of embedding Node.js, define them, expose the entry points, and document them. At this point, one of the main pain points is that there is absolutely no documentation. So people look at the C++ APIs, I mean, look at the source code, then open questions in the nodejs/help repo, and they just get along. So, a question I have: I know Electron is an embedder. Are there any other examples of embedders, like a list of embedders or things like that? So, I know at IBM we have a product called IIB, IBM Integration Bus. It's a messaging system. We have a lot of legacy data coming from the mainframe, and we implement a queue and things like that. So there's a fine-grained API, a fine-grained data flow happening, and based on the type of message, if it is highly asynchronous, we pass it on to Node. So there is a way of embedding Node. We found this issue maybe two years back, and we've been trying to see what is the best way of abstracting, the best way of invoking the right abstraction within Node.js. That's when we looked at which entry points make sense. Yeah, yeah. One other question.
So I talk to Shelley Vohr on the Electron team a lot about this kind of stuff that I don't understand. But, like you said, there's no documentation: would it be helpful for someone who doesn't understand C++ at all to go build out the structure of that documentation and allow people to fill it in? Or would that be kind of worthless, like, hey, we have this, but there's no content, someone will maybe eventually fill it out? Which way do you feel would be helpful, I guess? So my way of thinking is: if you document the existing state as it is, it can cause problems, because the current state may not be optimal. Yeah. For example, tomorrow somebody comes and wants to embed Node in a particular way which is neither existing nor documented; then we are stuck with what we froze. So the ideal starting point is to get feedback on actual embedding usage, then look, holistically or from a design perspective, at what the discrete ways of embedding Node should be, freeze on that, and then document it. Okay. Okay, that's... yeah. I guess this is the last one: streams. I would say this is not a pattern from the help repo as such; it's there in the help repo as well, but anybody working on core will also say this is an ever-recurring pattern of problem statement. People complain that, say, event X is occurring after event Y, whereas they expected Y to happen first, or vice versa. This keeps changing and keeps recurring; the X and Y can be P and Q, but every time, we see the life-cycle events of the stream coming in a seemingly arbitrary manner. And then transform streams: I see quite a few users complaining, "I want to transform this particular stream in a particular way, but I don't see the data coming out."
"I see the original data coming out, the transformation is not really happening," or "the data event is not triggering," et cetera. And multi-level piping: A pipes to B pipes to C is probably the most common use case, whereas A and B both piping into C, or A piping into both B and C, are different piping scenarios, and not all combinations necessarily produce the data in the desired manner. That is not necessarily because there's a severe bug in Node.js, but because these were never defined as valid use cases. So when many people come with various issues, we look at them and say: oh, okay, there is a specific use case missing; A and B piping together into C is something we did not think about; let's implement it. That kind of thing. And there could be real bugs as well; I am not sure about that. But the fundamental problem is that the stream API does not have a specification. Any API should have well-defined input and well-defined output: either the design follows the documentation, or the documentation follows the design. Here, the fact is that however the stream happens to work becomes the expectation; that becomes the documentation. But keeping in mind that streams need to work with a number of other parties, possibly outside of Node.js as well (there could be child processes, there could be other endpoints on the network feeding you data), you need to adhere to the protocol defined at the other endpoint, or align with that expectation. If those don't match, the result is exactly this: haphazard life-cycle events. So here my clear recommendation is that we should have a specification. By specification, I mean a set of defined principles, a set of expectations about how the exposed API should behave in any implementation. The implementation can be customized, but the expectations should be standard.
And that should work seamlessly across all endpoints, child process interactions and things like that. Just so I understand you clearly: in the example of the write-after-end event, is this specifically a problem in user land, where people are implementing streams improperly? No. No, this is basically with HTTP, or any other APIs in core which internally make use of streams. But the whole point is: though you are making use of a core stream, the data control is within the application. The stream does not control the data as such. The velocity at which the data comes, the type of the data, and the endpoint from which the data is coming are completely in the control of the application. Okay, so someone somewhere that created the API you're consuming streams from, they're the one that is, perhaps, sending a write after the end event. Is that what I'm hearing? That's what I would consider user land, and I should probably use that term properly. So someone somewhere is not using streams and their events properly, because there's not a well-defined spec for how to use them. Oh, okay, let me clarify once again. Say you are making use of an HTTP capability, and as an HTTP client you know that the response coming from the server is actually a stream. You implement the callbacks for the 'data' event, the 'end' event, the 'finish' event, et cetera. These are implemented in user land: the callback belongs to the user, whereas the emitter of the event belongs to core. As a user, your expectation of these callbacks is that they get executed in a particular order, say, the writes happen first and then the end, whereas that expectation is not met. So the responsibility, the break, is happening in core. So that's pretty much all I have.
Just to reiterate what I said in the beginning: these are not necessarily all the problems in the Node.js help repo. They're the ones which I believe are following a pattern at the moment. We also looked at the root causes and what recommendations can be made, because some of them, as discussed, will not necessarily work as-is; we need a better approach. But some of them we know can be addressed in a systematic manner. Somebody who is interested can take ownership and champion it. Ownership doesn't necessarily mean implementing the end-to-end solution; it can be just kickstarting the conversation, in some cases defining some milestones, seeing if you can engage people and delegate part of the work, and having that as an initiative and a contribution. Do you have a link for your slides that you can share? Come again? Do you have a link for your slides? I don't have one; I can share them with you. Thank you. Yeah. I feel like something that, if it hasn't already been done, would be: for each recommendation, file an issue in the relevant repo, alongside the problem, the root cause, and the volume you've seen. It's such great user feedback, if you haven't already done it. I haven't done it. I don't think you need to do it alone either, but I think it's really good justification for people considering how we build things out moving forward, or where we can devote time, right? Like defining the Node streams spec. Yeah. I mean, personally I would very much agree with that, but it's about finding the time to do it, or the people, right? Maybe tagging: if it's easy enough for you to tag things, people can subscribe to the tags. Yeah. It's always good to have broken use cases to test against. Yeah. This is great. We've got so much work to do. It's great work. Okay. Thank you so much. Thank you.