We have 10 people waiting on the stream already, so it's gradually ramping up. Yeah, but I guess it's nighttime in the United States. Yeah, for the US that's going to be difficult, but that's why we do the recording as well. I'm just going to go live now as we hit 10 o'clock. Okay, the live stream is up. We'll give it one or two more minutes before actually starting the talk, just to let people join in. Again, for the people on the stream, we'll get started now that it's two past the hour. Okay, so I would like to welcome Rui Ueyama, the developer of the mold linker, a modern linker. Rui is an independent software developer who used to work at Google and who recently got a master's degree in computer science at Stanford University. He's also the original developer of the LLVM lld linker, and he has now started working on a new implementation, a new linker called mold. So go ahead, Rui. Yeah, thank you for introducing me. So my name is Rui Ueyama. Can you see my face on the screen? Yes? Okay. So today I'm going to give a talk about my new linker, whose name is mold, which is obviously just a joke name, but it's a backronym for "modern linker", since ld is the traditional linker command name. Here is the overview of today's talk. This is going to be a 45-minute presentation followed by a 15-minute Q&A session, so I don't think I can cover everything. In the first half of the talk, I will focus on what a linker is, because people do not really understand the actual job a linker does. In the rest of the talk, I will give a high-level idea as to why mold is so much faster than other linkers. And lastly, I will give you some ideas on how to write faster programs in general. So, who am I? My name is Rui Ueyama, and I have been a full-time independent developer since 2020.
Before that I was working at Google for 10 years, where I was on the compiler team, specifically working on LLVM, and I created the lld linker; I'm the original creator of that linker. I left the company, started a new project called mold, and I've been working on that project for almost two years. Besides that, I have a few other open-source projects, like a small C compiler called chibicc. So if you are interested, please visit my GitHub page to see what I have done in the open-source world. Since I'm an open-source developer, I don't have any revenue so far, so if you are happy with my product, please consider supporting my project via GitHub Sponsors, or you can make a commercial support contract with me, so that I can help you introduce the mold linker into your organization, and we can also fix bugs for you. So, what is the status of the project? How usable is the mold linker? For Unix and Linux, the mold linker is actually production-ready and is used by a lot of projects and many organizations. The reason why people want to use a non-standard linker instead of the standard one is that mold is extremely fast; I will talk about how fast later. These days I'm actively working on mold for macOS, or more specifically for Apple operating systems such as iOS, watchOS and of course macOS. It's pre-alpha, but its performance looks pretty promising. Even though it's not ready for general use yet, I think it's going to be an interesting project, so if you are working on macOS or iOS, stay tuned. After I finish the macOS version of mold, I plan to work on a Windows version of mold, so that we will cover all the major platforms with a single linker. And all of these linkers are just drop-in replacements for the existing linkers.
So you don't have to change the command line, other than the command name itself, to use the mold linker instead of the default one. So what is a linker? What does the linker do in the build system? Linkers seem to be a complicated piece of software because they have a lot of features and a lot of command-line options, but the fundamental concept of a linker is actually pretty simple. If you are writing a program in a compiled language like C, C++ or Rust, the build consists of two stages. In the first stage, you compile individual source files into object files, which usually have the .o file extension. In the second stage, you combine all the .o files into a single executable or shared object file. Even if your program consists of a single source file, chances are you are linking that object file against the standard library, so you are still using the linker. You are using linkers every day, on every compiler invocation. To understand how fundamental the idea of the linker is, here is a quote from the "Linkers and Loaders" book published in 1999. According to the book, the very idea of the linker was mentioned in a memo in 1947, which is only two years after the digital computer was invented. In that memo, a small program that combines pieces of machine code into a single program was described. If you think about it, 1947 is even before the assembler was created. So the linker was a program people wanted to write right after inventing a computer, even before writing an assembler for that computer. And that's understandable: after inventing a digital computer, chances are you are writing code directly in machine code, and you are writing subroutines, like a sorting routine, that you want to reuse in many programs.
So naturally you want to write a program that combines fragments of programs into a single program. So where is the linker invoked in the usual build process? Here is example output from a build, in this case actually building the mold linker itself. Because the linker takes all the object files, it is usually invoked at the very last step of the build; in this case, it's the highlighted line of the command invocations. The linker's command name is usually ld, which is typically installed under /usr/bin. But you don't usually invoke ld directly from the command line; instead you invoke it via cc, gcc, clang, clang++, or whatever your compiler of choice is. You usually think of cc or gcc as the compiler name, but it's actually the name of the compiler driver, the front end, and that driver invokes the appropriate back-end command based on the extensions of the given file names. If you pass .o files to the compiler driver, it invokes ld. That is how you invoke the linker. You can see the actual command line of the ld invocation by appending -### to the compiler driver. In this case we are compiling a hello world program, and I'm appending -### to clang, which prints out the actual command-line invocation of ld. The compiler driver adds a lot of command-line arguments to ld, like -z relro or whatever, and it also passes a lot of file names like crt1.o and the other crt files. These are implicitly linked into your program because they are the startup and shutdown routines that need to be linked into every program. The compiler driver also passes the names of the directories in which the standard libraries are stored, like /usr/lib. So the compiler knows the directory structure of the system and passes all the necessary command-line options to the linker.
And if you have any questions, please interrupt me anytime. So here is a question: why do we need a linker at all? Why can't we make the compiler directly emit the entire, complete program instead of fragments of a program? The answer is: we can actually do that; we can make the compiler emit a complete program. But to do that, you have to pass all the source code to the compiler, including the standard libraries. Sometimes, if you are building something like a web browser, the amount of source code is on the order of tens of millions of lines, so it's not practical to compile all the code every time you make a single-line change. It's also just a waste of time to compile the same function again and again on every compiler invocation. For example, if you are using printf, you don't want to waste time compiling the printf function every time you use it, because it is the same function. You compile it once and link it into every program that needs printf. This idea of separating compilation from linking is called separate compilation. Now, in order to understand what the linker is doing, you first have to understand what's inside an executable and how it is executed on the computer. This is a very simplified diagram of an executable file and its memory image when being executed. Essentially, an executable contains initialized data, which is global variables with their initial values, along with machine code, which is mapped to memory to be executed. A little bit of terminology: in this context, "text" doesn't mean ASCII strings; it means machine instructions. So the text segment contains machine instructions and the data segment contains initialized data, and they are mapped to memory at execution time.
The OS kernel initializes the stack area, which contains local variables, usually at the very top of the memory address space, and then jumps into the middle of the text segment to start executing the program. It's a very simplified picture, but it's the fundamental idea of what's going on. So how is the executable file created? Object files and executable files are quite similar. Object files also contain data and code, in separate sections rather than segments. The linker combines them section by section and concatenates them to construct an executable file. So what's in an object file? If you compile the small program shown on the left-hand side of this slide down to assembly, you get something like the assembly on the right-hand side. This is a hello world program; it prints a constant string to the standard output. Take a look at the assembly output. If you look carefully, you will notice similarities between the C program and its compiled assembly. The assembly contains the literal string as-is, and it also contains a function call to printf, on the fourth line from the bottom. So it kind of preserves the original structure of the program: a function call is still a function call in assembly. But assembly text is not what is in the object file. What's inside the object file is actually machine code, which is not readable by humans, but we can disassemble it and display it as assembly. This is the disassembled output of the same function. Since assembly is a language that's very close to machine code, disassembling is more or less a one-to-one operation. But not all information can be restored from the compiled code. Take a look at the call instruction in the disassembled object file.
Here is the line that calls the printf function. E8 is the x86 opcode for making a function call, and it is followed by a four-byte displacement to the location where we want to jump next. That four-byte offset has to be filled with the relative distance from this very instruction to the beginning of the printf function in memory. But the problem is that the compiler doesn't know where the printf function will be at runtime when it compiles the source code. We include the header file, but that only tells the compiler about the existence and the type of the function; it doesn't say where the function will be at runtime. So the compiler cannot fill in that value; it has no choice other than to leave it at zero. Instead, it leaves a piece of information in the object file saying "please fix this four-byte value with the relative distance to the printf function". That kind of information is called a relocation. The linker, when combining object files, interprets relocation records to fix these offsets. It essentially binary-patches the object files created by the compiler to make a complete program. Based on that understanding, here's the basic picture of what the linker is doing: it combines object files into a single file and then applies relocations so that inter-object-file references correctly refer to the right places. That is the fundamental understanding of the linker; everything else is secondary. That being said, there are a bunch of features the linker has to support to link real-world programs. Notably, linkers usually have to handle libraries: the standard libraries, third-party libraries, or libraries you wrote yourself. So let me explain. There are two types of libraries, static and dynamic, and I will explain static libraries first. So what is a static library? A static library is just a bundle of object files. It usually has the .a file extension.
The file is literally just a bundle of object files, like a zip or tar archive. There is no technical reason not to use zip or tar, but historically we use a dedicated file format, usually just called the archive file format. You can peek inside an archive file using the ar command, as shown on this slide: "ar t" prints out the file names inside an archive. On this slide we are displaying the contents of the libc static library, and it contains a lot of object files. libc.a actually contains an object file for each public function; for example, it contains printf.o and vfprintf.o for printf and vfprintf. So the functions are separated into their own object files. Why are they separated? Why can't we make a single object file that contains all the libc functions? Well, we can. But if we do, by default the linker links the entire object file into the output file, which means you always link the entire libc. That's a huge waste of space, right? You don't usually use all of libc's functions, just a fraction of them. For that reason, the libc source code is split per function, each function is compiled into its own object file, and the object files are bundled into an archive file. That way, you can save a lot of space. But you don't want to have to specify which object files to link, because you don't really know which files are needed for your program. So that part is automated. If you give a static library to the linker, here is what the linker does: it reads symbols from the object files in the archive and pulls out the object files needed to complete your program. It automatically fills the gaps in your program by resolving the missing symbols, and that's exactly what you expect from the linker. So what are the advantages of static linking? First of all, it's pretty simple.
It's just a bundle of object files, and once object files are pulled out of the archive, they are just linked normally. So it's simple. The other benefit of static libraries is that the library code is copied directly into your executable, so your executable is self-contained. For example, if you fully statically link your executable, then you can just copy that one executable to another machine to install your program; you don't have to copy any other files. But there are disadvantages. First of all, duplicate copies of the same functions end up in many executables. The other disadvantage is that if you want to update a function, for example for a security fix, you have to re-link every executable that links against that library. So there is another style of library, the dynamic library, which I will explain now. A dynamic library is a separate file, and that file is loaded into memory alongside your executable and other dynamic libraries. The pros and cons of dynamic libraries are the opposite of static libraries. The advantages: you don't have to copy the same function bodies into many different executables, because you only need a single copy of the shared object file. And updating the shared object file is very easy, because you only have to replace that one file, and everything that uses it is updated. But there are of course disadvantages to this mechanism. First, it's complicated; much more complicated than a simple static library. And there is process startup overhead, because just loading the dynamic library side by side with your executable is not enough to make it executable in memory; you have to fix up some places to combine them in memory. So there is some work at process startup, which makes startup a little bit slower.
The other issue is that, historically, Windows had a problem known as DLL hell: if you replace a shared library with a newer version whose behavior is slightly different from the previous version, or if your executable depends on buggy behavior of the previous version, then your executable stops working even though you never touched the executable file itself. So how are dynamic libraries implemented? After loading the two files into memory, you have to fix up a little bit of data and code so that inter-file references point to the right places. For example, say you are linking against the dynamic library version of libc and you are using the printf function in your program. After loading your program into memory, you have to fix up the machine instructions that call printf, because the exact location of printf is not determined until you actually load libc into memory. So you have to fix them at runtime, just like we did at link time for static linking. But there is a problem with applying that approach naively. If you patch function calls directly, you essentially have to mutate all the pages of your program: if you want to fix every call site of printf, chances are there are thousands of printf calls, and you don't want to patch thousands of places in your program, because that slows down program loading and it also breaks physical page sharing. There is a trick to circumvent the problem. As the saying goes, every problem can be solved by introducing one more level of indirection. In this case, we create two special sections called the GOT and the PLT, and we route all external references through the GOT or the PLT. For example, if your program calls the printf function, the PLT has an entry for printf, and all printf calls jump to the PLT first.
The relative distance between your text segment and the PLT in the same file is always fixed, wherever the image is loaded. So you don't have to fix up every printf call site; you only have to fix up the PLT entry. That significantly reduces the amount of data you have to patch for dynamic linking. The GOT and PLT are created by the linker, and you don't have to create GOT or PLT entries for symbols that are not external references. So you have to scan relocations first to determine the contents of the PLT and GOT, and then apply the relocations. This is the tricky part: it has to be two steps, because in order to fix the file layout, or memory layout, of the executable, you have to know the exact sizes of the GOT and PLT, but you don't know those sizes until you scan the relocations. So you scan relocations first to determine the sizes of the GOT and PLT, fix the file layout, and then apply relocations by scanning them again. Here is a summary of the talk so far; this is what a linker has to do: it reads object files, resolves symbols, and scans relocations to determine the sizes of the PLT and GOT; then, once the file layout is fixed, it copies data from the input object files to the output executable while applying relocations. So that was the overview of what the linker is doing. Now for the second part of the talk, which is specific to mold. Why does the linker's speed matter in the first place? Well, because faster is always better. Linking is still one of the slowest steps of the build. If you are building a program from scratch, chances are the entire build time is dominated by compiling, not linking. But if you are doing incremental builds, that is, if you modify only one file and build again, then the compiler compiles only that one file, and the linker is invoked to recreate the entire executable.
That second part sometimes takes more than 10 seconds, or minutes, or even more. And that's very important because it happens in an interactive session, in which you are actively writing code; you are in front of an IDE or editor. If we can save, say, 27 seconds on each build, it's not only saving 27 seconds; it keeps you focused, which is very important. If a build takes 30 seconds, you will switch to the browser and start browsing, but if it takes only two seconds, you can just wait. So this is the fun part: how fast is mold? It's extremely fast. This is a graph of GNU gold, LLVM lld (which I originally created a few years ago), and the mold linker. Note that the default linker on Linux is GNU ld, not GNU gold, and I omitted its numbers from this graph because it's too slow. mold is sometimes more than 10 times faster than the others. Roughly speaking, on a modern high-core-count x86 machine, like a 10-core or 16-core machine, you can expect mold's throughput to be about 1 gigabyte per second. So if your executable is 2 gigabytes, mold can link it in about two seconds, which is very easy to reason about. And that's extremely fast: it's actually only about twice as slow as cp copying the executable to another file on the same machine. Considering that the linker does more than just copy file contents, it's extremely fast, which probably means it's almost impossible to create a linker that's significantly faster than mold, because it is already almost I/O bound. So why is mold so fast? What's the secret behind it? That's a question I often get, and there is no single answer. If you improve some existing program and get a better number, you can attribute the speedup to your change. But if you create something from scratch, everything is different, and you cannot really break down what contributes how much. Still, I have a few ideas as to why it's so fast.
The first thing is that we do not do too much: we just mmap object files into memory and directly consume the data structures in the file. We do not create intermediate representations of the data structures if we can avoid it, because constructing an intermediate representation takes time and memory, and taking memory means taking more time. The other important thing is that we parallelize all internal passes of the linker. That's important because if you only partially parallelize an internal pass, you can still speed it up, but not by much, because of Amdahl's law. Say half of your program is parallelized: you cannot make it faster than 2x overall, because the single-threaded part becomes the bottleneck no matter how many cores you have. We also have other clever techniques in mold, which I'll talk about later. Here is a visual illustration of the difference between lld and mold. On the left is lld; this shows per-core CPU usage during execution. lld runs mostly on a single core and only sometimes uses all cores; throughout its execution it is mostly single-threaded, and that limits the overall speed because of Amdahl's law. On the right is the same benchmark for mold. mold uses all cores from the beginning to the end. In this demo I capped the number of cores at 16; it can scale further. So all internal passes are parallelized, and it finishes very quickly, as you can see. So how do we parallelize the internals of the linker? To understand that, you have to understand the size of the problem. Here is the list of elements in the input object files when you are linking Chrome with debug info; the output size is about 2 gigabytes. In total you have, for example, 16 million relocations and more than 20 million symbols.
So you have a very large number of data items of the same type, sometimes on the order of millions or tens of millions. We employ data parallelism to parallelize the linker. Data parallelism is a paradigm for processing large amounts of data of the same type in parallel. It scales very well, because to process each individual piece of data you don't have to synchronize with other threads. It's also easier to understand than programs with more complicated synchronization and communication mechanisms, because data parallelism is essentially a for loop that happens to be executed in parallel. Besides that, we also use some clever data structures for concurrent programming. One is the concurrent hash map implemented in Intel TBB. Intel TBB is a library that supports concurrent programming, and I really recommend it if you are doing concurrent programming. A concurrent map is just a map, but it is designed so that you can efficiently insert new elements from multiple threads into a single map. We use the concurrent map for symbol name resolution. There is a central hash map which knows all symbol names, and for each object file we run a thread to read the symbol table and insert the symbol names into the concurrent hash map. Since the concurrent hash map scales well, we can do that in parallel, and that is much faster than reading each file sequentially. We also use other parallelism patterns, like the map-reduce pattern. Here's an example: if you pass --build-id to the linker, it instructs the linker to compute a cryptographic hash of the output file and embed that hash value as a signature in the output file. As long as you produce the same output file, you get the same signature, so it can be used as a signature. And computing a cryptographic hash of a large file takes time, on the order of seconds.
So we want to parallelize it. Here's what we do: we treat the output file as a sequence of small chunks, say 10 megabytes each, and we compute a SHA hash for each chunk. Then we compute the SHA hash of the concatenated chunk hashes as the signature of the file. This way the first step is fully parallel, and since the list of chunk hashes is very small, the second step doesn't take much time. We use many other techniques to speed things up, because mold's whole selling point is speed, so we use every technique we can. One change that made a significant difference was using an alternative malloc implementation. The glibc malloc didn't scale well with many threads. We benchmarked jemalloc, tbbmalloc, tcmalloc and mimalloc, which are all excellent malloc implementations, and based on the benchmark we chose mimalloc, created by Microsoft, which scaled best for us. Thank you, Microsoft, for creating mimalloc and open-sourcing it. Another technique: we noticed that writing to an existing file that's already in the buffer cache is much faster than creating a new file and writing to it, so we overwrite the existing output file if it exists, unlike other linkers. Yet another: we noticed that if you mmap a lot of input files and a large output file, the exit system call takes time, a few hundred milliseconds. If you are aiming for a one- or two-second link, a few hundred milliseconds is a huge waste of time. So we organize the linker into two processes: the parent process spawns a child process, and the child does the actual job of linking the program. As soon as the child completes its job, it notifies the parent, and the parent exits.
To the user, that looks like the program has exited, while the child takes its time to exit in the background, where nobody is waiting on it. It's just a trick, but it's effective. So I think I can give a few hints on writing faster programs that are not specific to mold; this is, I think, the last slide of the talk. The first piece of advice is: don't guess, measure, as is often said. Speculation is usually wrong, and if you optimize based on speculation, you are optimizing code that doesn't really matter. That's not just a waste of your time; it can complicate your program unnecessarily. So don't optimize until measurement proves it's needed. The second piece of advice: don't just try to write faster code. Fast code is important, but you should design your data structures so that it is natural to write fast code. As Rob Pike said, data is the central piece of the program, and the code goes around it; I think that's a correct statement. The third piece of advice is that sometimes you have to implement multiple algorithms for the same part of the program and then choose the best one, because you cannot guess in advance which algorithm will be best for your program. Sometimes there is no choice other than to just do it and pick the winner; that's the reality of writing code. The last piece of advice is an extension of the third. Sometimes you have to rewrite the entire program, just like I did going from lld to mold. You learn from your first implementation, and with that knowledge you can write a better program on the second and third iteration. This is sometimes necessary because if you change the fundamental data structures of the program, you have no choice but to redo everything from scratch.
I don't think that's a total waste of time, because writing the same program twice doesn't take twice the time; the second attempt goes much faster. So it's sometimes worth it. I don't recommend it casually, but in reality you sometimes have to do it. And I think that's the end of my talk. Right. Thank you very much, Rui. I think this was really good; well focused, technical but not too deep, let's say. And I'm sure there are lots more technical tricks you pull in mold to make it faster. Yeah, if you are interested in the actual optimizations I made to the mold linker, you should take a look at the source code. I also wrote design documents and committed them to GitHub, so please take a look at the source code and the documentation. All right. We have, I think, a bit of a technical question from someone watching the stream, so I'll just read it out: do all calls to a common external symbol share a unique record in the PLT and GOT tables, so the advantage is that there's only one relocation per symbol instead of N relocations? Could you repeat the beginning of the question? Do all calls to a common external symbol share a unique record in the PLT and GOT tables? All calls? Yes, all calls. Yeah, so you're basically making those tables so that there's a unique place to reference from. So you aggregate all function calls into one place before jumping to the external module. Right. Yeah, okay. And the follow-up question I had related to that: you mentioned that before you can create those tables, the GOT and the PLT, you need to know how big they are, so you have to scan all the relocations first to know how much room is needed. Yeah. That seems like something you might be able to win some time on as well. I'm not sure if this could ever work, but what if you just make an educated guess for the size?
Let's say, I don't know, one megabyte, I don't know if that's sensible, but just to name something. So you say, okay, let's assume it's one megabyte, and you start filling it. And only when you notice it's not big enough do you redo things. So you may end up with a situation where in most cases it's very fast, and in some cases you have to redo some work and it's maybe a little bit slower. You could do that. And you could also put the PLT and GOT at the end of the file so that they can grow without pushing other stuff down in the address space. Yeah. So the thing is, maybe the shortest answer is that scanning all the relocations doesn't take too much time. We can do that for a Chrome-sized program in well under 100 milliseconds, so there is no strong reason not to just scan the relocations instead of guessing and retrying. It wouldn't make much of a difference, I guess. No, it doesn't. So we just scan the relocations for the sake of simplicity. Yeah, it goes back to one of the hints here on the slide: optimize where it actually makes sense. Right. And there is another issue if you place the GOT and the PLT at the end of the file, because now you have two text sections: one text section for the main executable, and another text section for the PLT, since the PLT is an executable piece of code. Which means that the kernel has to map pages twice, and I guess that incurs some memory pressure if you have tons of processes on a single computer. So if it's doable, a contiguous piece of data is always better than separated pieces of data. Yeah, in terms of caching and things like this. Yeah. Okay. Well, I see other questions coming in, but I'll ask a high-level one first. And I'm not sure if it matters, but does mold also support linking of object files that came from a Fortran compiler?
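The sizing pass being discussed could be sketched like this (a hypothetical illustration; the `Reloc` struct and field names are made up, not mold's data model). The linker scans every relocation once and gives each distinct symbol that needs a GOT slot exactly one entry, which is also why N call sites collapse into a single slot and a single dynamic relocation:

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical relocation record for the sketch.
struct Reloc {
  uint32_t symbol_id;  // which symbol this relocation refers to
  bool needs_got;      // e.g. a GOT-referencing relocation type
};

// One pass over all relocations: deduplicate symbols so that each one
// gets exactly one GOT slot, then the table size is known up front.
size_t compute_got_size(const std::vector<Reloc> &relocs) {
  std::set<uint32_t> got_symbols;
  for (const Reloc &r : relocs)
    if (r.needs_got)
      got_symbols.insert(r.symbol_id);  // N references -> 1 slot
  return got_symbols.size() * 8;        // 8 bytes per slot on 64-bit targets
}
```

As the answer notes, this scan is cheap enough in practice that guessing the size and retrying would not pay off.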
Is that relevant at all, what the source programming language is, or is that irrelevant? It does. I personally didn't try, but obviously people are using mold to link Fortran programs and Common Lisp programs, and sometimes the D language, and of course Rust. The object file format is essentially the same; otherwise, you couldn't link Fortran programs with C++ code. There are some differences in the feature set they use, so there could be a bug. But in order to test mold, I built all Gentoo packages using mold, and I found a few issues with Fortran, because Gentoo contains a few Fortran programs. And even though I cannot write or even read Fortran programs, I somehow managed to fix all the unit-test failures for the Fortran programs, so I believe it works. And if it doesn't, please file a bug so that I can fix it. Yeah, for us this may be important, because in scientific software there's still quite a bit of Fortran. Yeah. But it's good to hear that it supports lots of different languages, and it sounds like you're not aware of a language where it doesn't work, right? You haven't found anything where it's known not to work at all? No, essentially it should work for all languages. Well, speaking of which, some languages have their own object file format. For example, Go. The Go language is so isolated that it has its own set of compiler, linker, and runtime, and its own object file format, so it doesn't work with any other language. But I don't think that matters to you, because if you are writing Go code, then you must use Go's linker anyway. Okay, we have another question coming in. So: how easy is it to use or tweak a linker to change the content of a constant string in the data segment of a binary and relocate all the addresses properly? And then the follow-up question: would that even be possible having access only to a binary and not the originating source files?
Um, I don't quite get the meaning of the question. So, can you use a linker to change the content of a constant string in the data segment of a binary, and then relocate everything properly? For what? It's basically rewriting. Yeah, I don't know what the use case is. So, rewriting that stuff shouldn't be hard. It depends on what exactly you want to do, but if you want to rewrite, for example, a string constant with another string, then you can just hack it up. It shouldn't be too hard. It all depends on exactly what you want to do. And the use case is rewriting paths to files when relocating a binary. Okay. So you basically have a path hard-coded, I guess with RPATH or RUNPATH linking, and you want to change that to something else. I think that's not too hard. Of course, it's not a feature that's supported by the standard linker, so you would have to modify the source code of the linker to look up that string and rewrite it, but the change itself shouldn't be too hard. Okay. I hope that answers the question. One thing that I was thinking about as well: when the linker glues together these object files and makes sure they link to shared libraries, all of that works nicely, but there are lots of choices you have to make in terms of how things are laid out in memory, right? There are also opportunities there for making the actual binary itself faster to load, just by puzzling the text segment together differently, so maybe trying to minimize jumps between functions that sit too far apart in memory. Yeah, there is a lot of research in that area. So recently, Facebook donated BOLT to LLVM, and what that does is optimize a program so that its memory layout is better for CPU caching. It's based on profiling.
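The "just hack it up" case mentioned above, rewriting a string constant in place, could look like this sketch (a hypothetical illustration, not a mold feature; `patch_string` is a made-up name). The key constraint is that nothing needs relocating as long as the new string is no longer than the old one, because every address in the binary stays the same:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Overwrite a NUL-terminated string constant in a binary image in place.
// Only safe when the replacement fits in the old string's footprint;
// a longer string would shift addresses and require real relinking.
bool patch_string(std::vector<char> &image, const std::string &old_str,
                  const std::string &new_str) {
  if (new_str.size() > old_str.size())
    return false;  // would not fit without moving other data
  auto it = std::search(image.begin(), image.end(),
                        old_str.begin(), old_str.end());
  if (it == image.end())
    return false;  // old string not found in the image
  // Copy the new string and pad the leftover bytes with NULs so the
  // C-string terminator stays intact.
  std::copy(new_str.begin(), new_str.end(), it);
  std::fill(it + new_str.size(), it + old_str.size(), '\0');
  return true;
}
```

This is essentially what tools like `patchelf` do for RPATH entries, modulo the ELF-specific bookkeeping.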
So it runs the program first and then uses the profiling result to optimize the same program further. So if you can identify the cold functions and the hot functions of the program, then you don't want to mix them, because that's a waste of cache entries. You want to separate cold functions from hot functions. And you also want to place hot functions that are often called one after another next to each other, so that they cache very well. So there is a lot of research done in that area, and some linkers have a feature to rearrange functions as instructed on the command line. Mold doesn't have that feature yet, but we'll eventually have to support it. Okay. I have two more questions; if I don't see any others popping up, I'll just raise mine. So, you mentioned that a big part of mold is that all the data structures internally are processed in parallel, so you're using all the available cores, and that's a big part of the speed. So why is this new for a linker? Like, why wasn't this done in LLD? Did it never occur to you that this would make sense, or did you not see the point, or were there not enough cores back then? That's a good question. So speaking of GNU linkers, the original GNU linker was written in the 1980s, I guess, so it's extremely old, and I don't think multi-core machines were popular in that era. GNU gold was designed to scale well with multi-threading, but it turned out that it didn't scale well in practice for some technical reasons. And when I was creating LLD, my main focus was to create a linker which is compatible with GNU, and speed was secondary. It turned out to be much faster than GNU ld, but that wasn't the original goal. So if I had been aiming for better performance from the beginning, it could have been different.
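The hot/cold layout idea described above can be sketched as a tiny ordering pass (an illustration of the concept only; the function name and input format are made up, and real tools like BOLT use much richer call-graph heuristics than a plain sort). Given per-function call counts from a profiling run, hot functions are sorted to the front so they share pages and cache lines, and cold ones sink to the back:

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Produce a function layout order from (name, call count) profile data:
// hottest first, so frequently-executed code is packed together.
std::vector<std::string>
order_by_hotness(std::vector<std::pair<std::string, uint64_t>> profile) {
  std::sort(profile.begin(), profile.end(),
            [](const auto &a, const auto &b) { return a.second > b.second; });
  std::vector<std::string> order;
  order.reserve(profile.size());
  for (auto &p : profile)
    order.push_back(p.first);
  return order;
}
```

A linker that supports symbol-ordering files would consume a list like this to decide where each function lands in the text section.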
I think that's the answer, but the other thing is that even if I had been trying to get better performance when I was writing LLD, maybe I couldn't have done it, because I didn't know the linker very well, nor how to optimize it in terms of parallelism. So I learned a lot from writing LLD and then used that knowledge to make a better one. No, that makes a lot of sense. There's one more question coming in, which is quite close to one I had as well: how does mold figure out how many cores it can use? Does it just use everything, and can you limit it as well? So can you tell it how many cores it can use rather than using everything it sees? So it's automatically recognized by the library, which is Intel TBB. By default it tries to use all cores, so essentially it spawns the same number of threads as the number of cores and then tries to parallelize as much as possible. But you can limit the number of cores for the linker. Mold takes a --thread-count or --threads option, I don't remember the exact command-line option, but it takes a number as an argument. If you pass that option, then mold won't start more than the given number of threads. Yep. Okay, excellent. That makes a lot of sense. Yeah. Our main worry here is that we're often building software in a constrained environment, like cgroups or anything like this, where maybe there are 16 cores available on the system, but for that build you're only supposed to use four. And I'm not sure if Intel TBB is aware of that. I would guess it is, but I'm not sure. But if it's not, we can just tell mold how many threads it can use, and then that's okay. Yeah. All right. Cool. I think we're running out of questions to ask. Yeah, I think that's pretty much it. So I want to wrap up by thanking you very much for the excellent talk. I think the technical level and the amount of depth was perfect.
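The thread-count policy described above could be sketched like this (a hypothetical illustration; the function name and flag handling are made up, and mold itself delegates this to TBB rather than managing threads by hand). A value of 0 means "no limit given", so all hardware cores are used; a positive value caps the thread count:

```cpp
#include <algorithm>
#include <thread>

// Pick how many worker threads to run: default to all hardware cores,
// but honor an explicit user-supplied cap (e.g. from a --threads=N flag).
int pick_thread_count(int requested) {
  int hw = static_cast<int>(std::thread::hardware_concurrency());
  if (hw == 0)
    hw = 1;  // hardware_concurrency() may return 0 when it can't tell
  if (requested > 0)
    return std::min(requested, hw);
  return hw;  // no cap requested: use every core
}
```

In TBB terms, such a cap would be applied with `tbb::global_control` and its `max_allowed_parallelism` parameter, which limits how many threads the library's parallel algorithms may use.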
So thank you very much for spending the time to really explain to us what the linker is and what it does. And yeah, all the best of luck with your work on mold. And again, I just want to echo one of the points you made at the start: you're doing this basically independently, so if people are up for sponsoring you to keep working on mold, they should definitely consider that. They can take a look at your GitHub page for more information. Yeah. Thank you for inviting me to give the talk. Thank you very much. Bye-bye. Bye-bye.