Hello everyone, thanks so much for coming to my talk. We'll talk about loader hooks in Node.js. And yeah, I said it will be fun. I hope it will be fun. A few words about myself: I'm Vladimir de Turckheim. I handle Node.js for a security company named Sqreen. We do other languages too, but I'm also a Node.js collaborator and part of the Node.js security working group. So if you have security questions regarding Node.js, feel free to ping me. I'd love to help.

Let's first ask: what would you need to hook modules in JavaScript, and how would you do that? Say you want a way to know about every module that has been loaded into a Node.js application: each time a module is required, you want to know about it. You've got two main solutions. Either you check the cache: each time a module is required in Node, it's placed in a cache that is accessible by the user, so you can check it. Or you hook into Node.js and get told each time someone requires or imports a module.

The cache is pretty straightforward to access. You can all test it: just type require.cache and you will see a massive object that knows about the modules that are loaded and which modules required them. So you have the whole dependency tree. It's not the npm tree; it's really why each module has been loaded in Node.js and who imported it. That's doable. It just sits there, so it's pretty much free to access. You have the full require tree.

But first of all, you don't know when to check it. Let's say you want to instrument Node.js: it's an arbitrary decision to check the cache at one time and not five minutes later. You also don't know what people passed as an argument to require; you only know which module was loaded. You don't know that require('express') actually loaded express/index.js. And you can't rewrite the import.
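As a rough illustration of the cache approach, here is a minimal sketch that reads require.cache and lists which file required which. The helper name requireTree is made up for this example; it relies only on the documented require.cache object and the children array that each cached module carries.

```javascript
// Hypothetical helper: list "parent required child" edges from require.cache.
function requireTree() {
  const edges = [];
  for (const [filename, mod] of Object.entries(require.cache)) {
    // Each cached module keeps the modules it required first in `children`.
    for (const child of mod.children) {
      edges.push({ parent: filename, child: child.filename });
    }
  }
  return edges;
}

// Keys are absolute filenames, not the strings that were passed to require().
console.log(Object.keys(require.cache));
console.log(requireTree());
```

The sketch shows the limitation mentioned above: the keys are resolved filenames, so the original argument given to require is not recoverable from the cache.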
You can't modify the module as it is loaded in Node.js, and that's maybe something you will want to do; we'll see later why.

So the other solution is to hook into require. Basically, you should not do that, because it means overriding a private method in Node.js. That's why its name starts with an underscore: _load. And some people in the room will throw things at me if you do that. So you override Module._load, a method that is called under the hood by require each time something is loaded. The two arguments we care about are request, which is the string that has been passed to require, and parent, which represents the module from which require was called. In our case, on line four, I just log the parent, then an arrow, then the request, to tell you which module required what. If I run that, we have the log at the bottom. We know that ".", the main module, required express, and that a file in my demo app folder required ./lib/express, et cetera.

We are able to see the modules as they are loaded. You still have access to the tree, because you know who required what. And you can rewrite the imports when they are loaded, so you can change the code that people see. It's ugly, and you should not do that. And it's synchronous hooking. Basically, it's what every APM does under the hood, to be honest.

Thanks to the great Node.js core team, there is an API for that in Node core now. It's still experimental. It's designed to replace the second behavior, the ugly monkey patching, with a clean API. It's asynchronous. And so far on Node 12, because I tested that on Node 12, it only supports ES6 modules.

What does it look like? That's a great question. Thanks for asking it. Let's say I've got two modules: one named main.mjs, which imports another module, lib.mjs. lib.mjs just exposes a function named hello that returns world.
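Going back to the Module._load override for a moment, a minimal sketch of that monkey patch could look like the following. It is not the code from the slide: the seen array and the restore step are additions for illustration, and Module._load is a private Node.js API that may change without notice.

```javascript
const Module = require('module');

const seen = [];                    // every (parent, request) pair we observed
const originalLoad = Module._load;  // private API, used here for illustration only

Module._load = function (request, parent, isMain) {
  // `request` is the raw string given to require(); `parent` is the calling module.
  seen.push({ parent: parent ? parent.id : '(none)', request });
  // Delegate to the real loader; a hook could also alter the returned exports here.
  return originalLoad.apply(this, arguments);
};

require('os');
require('path');

Module._load = originalLoad;        // restore the original once we are done
console.log(seen);
```

When the file is run directly, the parent id of the main module is ".", which matches the log described in the talk.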
So if you start Node with --experimental-modules, or even without a flag on the latest version of Node, the code will actually run, and the ES6 modules will be resolved. You will have a warning in the console, but it will display world, as you wanted in this code. That just works, which is great.

Now let's try to hook into everything that is imported. We create a file named loader.mjs. This script just exposes an asynchronous function named resolve. That's what we do on line three. This function takes three parameters. The first one is a specifier. Remember the _load method we were using before? It's basically the same first argument: it's what people have imported. When you do an import in Node.js, it goes through this method and the specifier is passed in. Then you've got the URL of the parent module; once again, that's the module that imported the other one. And finally a default resolver, which is basically a function that tells Node how to resolve the URL.

What we do in this loader is just console.log the specifier and the resolved URL. Meaning that when you import express, this loader will resolve the full path to the express main file and log it. Then we return the URL in an object with a property named format set to module. Meaning: hey, Node, it's a standard ES6 module, you can load it as you know how to.

So we start the program with node --experimental-modules --loader, then we give the loader as a parameter, and then we put main.mjs. We have two warnings this time, because we are using another experimental feature. So it's another thing you should not do in production. But then you start to have logs. It's exactly the same code as on the previous slide, but we can see that the path to main.mjs, under my WebStorm projects folder (yeah, I love WebStorm), actually resolved to the same URL. And on the second to last line, ./lib.mjs resolved to a full URL to where the module is actually located.
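The logging loader described above can be sketched roughly as follows, assuming the experimental Node 12 hook signature from the talk (specifier, parent URL, default resolver). To keep the sketch runnable on its own, the hook is a plain function and fakeResolver is a made-up stand-in for Node's default resolver; in a real loader.mjs the function would be exported and Node would supply the resolver.

```javascript
// In loader.mjs this function would be prefixed with `export`.
async function resolve(specifier, parentModuleURL, defaultResolver) {
  // Ask the default resolver where the specifier points to.
  const resolved = await defaultResolver(specifier, parentModuleURL);
  console.log(`${specifier} -> ${resolved.url}`);
  // Tell Node to load the result as a standard ES module.
  return { url: resolved.url, format: 'module' };
}

// Hypothetical stand-in for Node's default resolver, only to exercise the hook.
const fakeResolver = (specifier, parentURL) => ({
  url: new URL(specifier, parentURL).href,
  format: 'module',
});

resolve('./lib.mjs', 'file:///app/main.mjs', fakeResolver).then(console.log);
```

Always returning the module format mirrors the talk's example; a loader meant for more than ES modules would probably pass through the format chosen by the default resolver.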
And then, of course, we've got the world, which is the hello world from the code. So basically, this API enables us to hook more cleanly, with just an asynchronous function, into everything that is loaded in Node.js. And that's a totally legitimate question on the screen: what is it good for?

Let's do examples then. Who here loves TypeScript? Yeah, I like TypeScript too. It's a great language. Reminds me of Java, but I like it. And the Node.js community, and the JavaScript community at large, loves TypeScript too. So let's build a loader that will intercept everything that is imported, read the source of the TypeScript file you are trying to import in Node.js, transpile it to JavaScript, and tell Node to load the JavaScript file, not the TypeScript file. If we manage to do that, we will basically have rewritten ts-node or, in other words, taught Node how to deal with TypeScript files. We will tell Node how to deal with TypeScript without changing our code, just by creating a loader.

So here we've got two files, index.ts and lib.ts. You can see that despite loving TypeScript, I don't put a lot of types in my examples, but that's a detail. It's technically TypeScript; it's in a .ts file. Can we run that with one command, without any build stage? Yes, we can.

Once again, we create a function named resolve that is exported by the loader. What resolve does on line seven is: if the specifier, meaning the string you pass to import, ends with .ts, it computes a new target, which is just a variable holding the result of a method named ts (we will check that later) that actually gives the URL to a JavaScript file. So basically, ts takes the URL of a TypeScript file, returns the URL of a JavaScript file, and guarantees that the JavaScript file exists and contains JavaScript. So let's check this method.
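A sketch of what such a loader might look like, under a few assumptions: the helper names tsToMjs and makeResolve are invented for this example, the hook is a plain function rather than an export so the sketch runs standalone, and the transpiler is passed in as a parameter so nothing needs to be installed. With the typescript package, the transpile argument could be (source) => ts.transpileModule(source, { compilerOptions: { module: ts.ModuleKind.ESNext } }).outputText.

```javascript
const fs = require('fs').promises;
const { fileURLToPath, pathToFileURL } = require('url');

// Read a .ts file, transpile it, write a sibling .mjs file, return its URL.
async function tsToMjs(tsUrl, transpile) {
  const tsPath = fileURLToPath(tsUrl);
  const source = await fs.readFile(tsPath, 'utf8');
  const mjsPath = tsPath.replace(/\.ts$/, '.mjs');
  await fs.writeFile(mjsPath, transpile(source));
  return pathToFileURL(mjsPath).href;
}

// Build a resolve hook (Node 12 experimental signature) around a transpiler.
function makeResolve(transpile) {
  return async function resolve(specifier, parentModuleURL, defaultResolver) {
    if (specifier.endsWith('.ts')) {
      // Turn the specifier into an absolute URL, then swap it for the .mjs file.
      const tsUrl = new URL(specifier, parentModuleURL).href;
      return { url: await tsToMjs(tsUrl, transpile), format: 'module' };
    }
    // Anything else is left to Node's default resolution.
    return defaultResolver(specifier, parentModuleURL);
  };
}
```

Writing the output next to the source file follows the talk's description; a cache directory would avoid littering the project, at the cost of a little more path handling.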
It's pretty straightforward. On line eight, we just read the TypeScript file. It's just a promisified fs.readFile, so we get the source code in TypeScript. Then we call the TypeScript module. I did npm install typescript and I got the typescript module, which I imported into this file. It exposes a method named transpileModule. As a first argument to this method, on line nine, I just give it the source code in TypeScript, okay? I give it an option to tell it: please don't compile the imports, give me ES6 imports. And as a result, I obtain a huge string that is JavaScript code transpiled from TypeScript. That's what I have in transpiled. It's synchronous code, but basically it's just a piece of JavaScript. I probably should put that in a worker thread, but that's the next talk. In the same room, stay here.

On line 12, I just write this JavaScript code into a file with the same name as the file from which I read the TypeScript, except that I replace the extension .ts with .mjs. And then I just return the URL to my new file.

Does that work? I never do live demos in my talks, because I'm not crazy. But if you run node --experimental-modules --loader loader.mjs and then you put test/index.ts, the code will actually run. So instead of giving Node a JavaScript file to execute, we give it a TypeScript file, and we just told it what to do with TypeScript, which is pretty straightforward. We don't have a compilation step. We don't have anything to do before running this code. And that was the first example out of three.

Let's do another totally different, crazy and stupid example. This one is a remote loader. I heard some people dislike package.json. Unpopular opinion: I love package.json.
So some people want to be able to transparently get dependencies from the internet, not knowing where they come from, not having any kind of hashing, security, signatures or anything. Let's do that. We create a loader that will intercept what's imported, exactly the same thing as the previous one. But this one will download the source file you required, because you will give me a URL to import. I will write it to the disk and tell Node.js to load this file.

So here again, an example. In index.mjs, we've got import * as remote from an https://gist.githubusercontent.com URL ending in /lib.js, and we console.log everything that comes from that gist. And in that gist, in lib.js, we just have export const hello = 'world'. Can we make this work? Can we just start Node, have Node download the code and do the thing? Of course we can, because since the API lets us provide an asynchronous method, we can run any asynchronous code we want in the loader.

Once again, we create a resolve function that takes the specifier as its first argument. In our resolve function, on line three, we check whether the specifier starts with https://. Yeah, I did not put HTTP. I'm crazy, but I have limits; let's use HTTPS here. What we do is just await Wreck.get. Wreck is a cool utility, an HTTP client. I recommend using it; it's pretty good and promise ready. I'm being paid by the people in the first row for saying that. So you just Wreck.get the specifier, which means you do an HTTP request, you get the content, it's available as a payload, you take it and you write it to /tmp/tmp.mjs. Meaning you basically just downloaded the code from GitHub and wrote it to your disk before executing it, which is probably the safest thing you can do over the internet, right?
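Here is a sketch of the remote loader idea, with several stated assumptions. It swaps the HTTP client from the talk for Node's built-in https module, ignores redirects and status codes, writes to the OS temp directory instead of a fixed /tmp path, and uses invented names (download, makeResolve). The download function is injectable so the sketch can be exercised without network access, and the hook is a plain function where a real loader.mjs would export it.

```javascript
const fs = require('fs').promises;
const https = require('https');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

// Fetch the body of an https:// URL as a string (no redirect or status handling).
function download(url) {
  return new Promise((done, fail) => {
    https.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => done(body));
    }).on('error', fail);
  });
}

// Build a resolve hook (Node 12 experimental signature) around a download function.
function makeResolve(fetchSource = download) {
  return async function resolve(specifier, parentModuleURL, defaultResolver) {
    if (specifier.startsWith('https://')) {
      // Save the remote source locally, then resolve the local copy instead.
      const file = path.join(os.tmpdir(), 'tmp.mjs');
      await fs.writeFile(file, await fetchSource(specifier));
      return defaultResolver(pathToFileURL(file).href, parentModuleURL);
    }
    return defaultResolver(specifier, parentModuleURL);
  };
}
```

Because every remote module lands in the same temporary file, this only works for a single remote import, which is in keeping with the tongue-in-cheek spirit of the example.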
Then, on line six, you tell Node to resolve, with the default resolver, the file you have just written to the disk. Does it work? Of course. You run node --experimental-modules --loader index.mjs test/index.mjs, and it will display a module that has a property named hello with the value world. So it worked. And of course you can mix the TypeScript thing and the remote thing if you want. I think there's a project like that, based on "10 Things I Regret About Node.js". But you can mix all of that and do whatever you want based on it. So it's pretty cool.

Let's go to a more usable use case, one that can be used in real life and that might make sense. Maybe the TypeScript thing makes sense too. Let's do dependency injection. Testing code is sometimes painful, and often you change your code to make it easier to test. I actually don't like changing my code just for the sake of making things easier to mock. I want to write code that is performant, that is easy to maintain, that makes sense. And then we will find a way to test it. But I don't want to have this constraint of: oh, maybe I should pass that as an argument because it will be easier to mock later. No, your code should make sense to you, not to arbitrary machines.

So what we will do is write a loader that will intercept the imports. This part doesn't change. And when something is imported, it will replace every export of that module with proxies, and it will expose these proxies to you so you can manipulate them.

So, question: what is a proxy in JavaScript? To talk about proxies, I will need to talk about dogs. Let's say you have a Dog class, and on line two I declare that all dogs are good, because dogs are good dogs. Dog has a constructor that takes the name, and it has a method named sayHello that console.logs woof woof, my name is such-and-such and I am a good dog. And all dogs are good.
So on line 11 we create a new good girl, who is named Onika. She's pretty. My parents believe she's their dog, but everyone knows she's mine, even if we live 600 kilometers away from each other. I love you, Onika. Then we console.log the dog and we tell the dog to say hello. It works in a pretty standard way. If we console.log the dog, we've got kind: good, name: Onika, and it's a Dog. If we ask her to say hello, she says woof woof, my name is Onika and I am a good dog. That works perfectly well.

And one morning you wake up and there is a text from your mother showing this. The dog had a party in the rubbish at home, and my parents are mad at her. Maybe we misaligned the dog. Maybe, maybe Onika is not a good dog. She might be a chaotic good dog. Thanks for laughing; in France, they did not get this joke. So yeah, she's still a good dog. She's just a chaotic good dog. We cannot realign all the dogs based on that. So we want to find a way to make Onika a chaotic good dog, while she's still a good dog, and without impacting all the other dogs. I don't want to change the constructor of Dog just because one dog is not only good.

So I create a proxy. On line one I create my dog, Onika, with new Dog, and then I create a proxy around it. I call new Proxy. As a first argument, I pass the object I want to create a proxy on. As a second argument, I pass an object called a handler. Basically, what we will do here is define a getter on every property. The handler is just an object. You put properties into it; they are listed in the MDN documentation, and the one we'll use is get. It's a trap named get, meaning that each time someone tries to access a property on the object returned by the Proxy constructor, this piece of code will be called. It's like a universal getter on that object.
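The dog example can be reconstructed roughly as follows. This is a sketch based on the description in the talk rather than the original slide, so details such as using a class field for kind and the variable names are assumptions.

```javascript
class Dog {
  kind = 'good'; // all dogs are good

  constructor(name) {
    this.name = name;
  }

  sayHello() {
    console.log(`Woof woof, my name is ${this.name} and I am a ${this.kind} dog`);
  }
}

const onika = new Dog('Onika');

const chaoticOnika = new Proxy(onika, {
  // Called for every property read that goes through the proxy.
  get(target, prop, receiver) {
    if (prop === 'kind') return 'chaotic good';
    // Fall back to the real value stored on the underlying object.
    return Reflect.get(target, prop, receiver);
  },
});

onika.sayHello();
chaoticOnika.sayHello();
console.log(onika.kind); // the underlying object is unchanged
```

Calling sayHello through the proxy makes this refer to the proxy, so the read of this.kind inside the method also goes through the get trap.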
So the trap takes two arguments: the first one is target, the object that owns the property, the one wrapped by the proxy, and the second is prop, which is the name of the property you are trying to access. If the property is kind, instead of good we return chaotic good; otherwise we return Reflect.get, which basically gives me the real value of the property on the object, because if I read it back through the proxy it would be an infinite loop. So I do Reflect.get and I return the original property. Then I console.log my dog and I ask her to say hello. As a result, we can see that she's still a dog. She's still good. She's still named Onika. The base object hasn't changed. It's the way you see it, the way you access its properties, that has changed. So instead of saying woof woof, my name is Onika and I am a good dog, she says woof woof, my name is Onika and I am a chaotic good dog. And that's a very educated dog if she can say that.

Let's go back to our resolvers, because that's still the main point of this talk and I don't want people to throw things at me if I don't deliver. We create a resolve function. On line five, we check that the module is not imported from the loader itself; it's to prevent an infinite loop, just an implementation detail. Then we append an underscore to the URL on line seven. It's to avoid having two modules with the same URL, because instead of having a format that is module, remember the first example, here the format is dynamic. It's a way to tell Node.js: this module, you won't find it on the disk. It will be created dynamically by the loader. And then we return that.

So what does it look like then? Because I don't call any other function. Fortunately, the Node.js API is well done: when you have dynamic modules, it expects another method to exist, named dynamicInstantiate.
That's the one that will be called when you have a dynamic module. First, we remove the last character of the URL. Remember, it's the underscore I added on line seven. We remove it and we call import. Since we call import again, that's why we had to check for an infinite loop: we say that if a module is being imported from the loader itself, then we load the real module. So in mod, on line two, I have the actual real module loaded, just as if you had imported it yourself. Now I can rewrite it.

First of all, I need the list of the things that are exported by this module, because Node.js will want that. That's what I do on line three: I just do Object.keys and I get the list of all the exports of the module. I push an extra one that I call test_mock. That's basically the one people will use to manipulate the proxies. Then I return an object whose first property, named exports, contains the list of everything that is exported by the module. No problem.

Then we've got a function named execute, which is executed when the module is dynamically instantiated. What does it do? It creates a map on line eight that I named mock, and I set exports.test_mock to mock. Meaning that if you import the modified module, there will be an extra export named test_mock that exposes the map we created on line eight. Then, for every property exported by the module, that's the for loop on line 10, we create a handler object, which is an empty JavaScript object, on line 11. We make it available in our map, meaning that we are making the handlers of our proxies available through the map. So if the module exports a method named x, the map created on line eight will have an entry whose key is x and whose value is an empty object that is the handler for the proxy. And then we set the exports; that's what we do on line 13.
We set the export for that key to a new proxy, with the previous object, the one that was genuinely exported by the module, as the base object, and with the empty handler we created. Meaning that right now it does absolutely nothing.

So let's try it out. We import assert first, and then we import a module named lib1.mjs. You see that this module exports a method named main, but I also import test_mock. My IDE is not happy with that, because the real module does not have an export named test_mock; that's why the syntax highlighting is bad. But I know it is there, because I created it in my loader and made it available to this file. So test_mock is actually a map.

On line six, I check that main returns hello world, and it works. Main is a function that returns hello world. What's happening on line seven is the interesting part. I take test_mock, which is a map, and I ask for the value behind the key main, meaning the handler of the proxy on the function named main. I know it's a bit deep. To this object, I add a property named apply. Remember, I showed you get in the example about the dogs. Apply is another proxy trap, which is called each time someone calls the proxy as a function. I'm pretty sure that if you could replace undefined with a proxy, you would be able to make undefined a function. Should you? No. And on line nine, we tell Node what to do each time someone calls main: we do Reflect.apply with target, self and args, so we call the original main, but we append two exclamation marks, two bangs, at the end of the string.
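The proxy-wrapping step and the test that uses it can be sketched without the loader machinery. The function name wrapExports is invented for this example, the export name test_mock follows the talk's description, and the guard for primitive exports is an addition (a Proxy target has to be an object or a function). In the talk, this logic lives inside the loader's dynamicInstantiate hook.

```javascript
// Hypothetical helper: wrap each export in a Proxy and expose the handlers.
function wrapExports(realExports) {
  const mock = new Map();               // export name -> proxy handler
  const wrapped = { test_mock: mock };
  for (const key of Object.keys(realExports)) {
    const value = realExports[key];
    const wrappable =
      typeof value === 'function' || (typeof value === 'object' && value !== null);
    if (!wrappable) {
      wrapped[key] = value;             // primitives cannot be proxy targets
      continue;
    }
    const handler = {};                 // empty handler: the proxy is transparent
    mock.set(key, handler);
    wrapped[key] = new Proxy(value, handler);
  }
  return wrapped;
}

const { main, test_mock } = wrapExports({ main: () => 'hello world!' });
console.log(main()); // hello world!

// Install an apply trap: call the original, then append two bangs.
test_mock.get('main').apply = (target, self, args) =>
  Reflect.apply(target, self, args) + '!!';
console.log(main()); // hello world!!!

// Removing the trap restores the original behavior.
test_mock.get('main').apply = undefined;
console.log(main()); // hello world!
```

This works because traps are looked up on the handler at call time, so adding or removing a property on the handler changes the behavior of a proxy that already exists.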
On line 11, we call main again, and here, instead of just hello world bang, we have hello world bang bang bang. It worked, meaning that we were able to replace the exports of a module in a loader and to expose to the end user, to the importing module, a way to manipulate these modifications. So in your test file you can basically rewrite every piece of code without having to make it easier to rewrite. I know this sentence doesn't make any sense except in my head. On line 12, we put undefined in apply, meaning we basically remove the property from the handler, and when we call main on line 13, it shows us hello world bang, as expected. So it works, and that's already something.

Let's wrap up. First of all, none of that is prod ready, so don't use it. I mean, it was a talk for the show. It will probably be extremely useful in the near future. Secondly, we can probably do Deno things with that. You remember: running TypeScript transparently, getting modules from a remote location without package.json. Do we want to? That's another debate. I don't want to get into that.

We could have a generic API for transpilation, and this one is exciting. Let's say that TypeScript, the module, exposes a method named transpileForNode, and that CoffeeScript, for the people who still use that, exposes a method named transpileForNode too, and maybe we could imagine even weirder languages exposing transpileForNode. We could start Node with --transpilation, or -t, and then the name of any npm module. You would say node -t typescript, and then Node would run TypeScript transparently for you. We could be even weirder and go one step further: put a Rust to WebAssembly transpiler here, and you would run Node and call Rust files directly from Node, but thanks to a loader, they would be transpiled to WebAssembly and loaded in Node.
Or you could do that with Emscripten and load any C++ library, or anything that compiles to Node or WebAssembly; we could transparently require it in Node. Once again, do we want to? It's another debate. Thanks so much for your attention. Let's keep in touch: you've got my contact details, and I have swag at the end of the room, including these dog stickers and some checklists.