So I've written a program for you today that will greet the conference. It's a very straightforward C++ program. We invoke a function that's defined in this file, and we print a string using a function from a library that lets me print strings. If everything goes well, we should see that string in the output. So let's take a look. We've compiled it, we run it, and indeed, we see the output. Great.

Let us iteratively improve this program. I've written another library that lets me bid farewell to this conference. Specifically, once we're done greeting it, we can call farewell bang bang con, a function declared in this header file. If we look at the header, it only tells us how to use the function; there's no actual implementation there. The implementation lives in a separate file called farewell.cpp, and when we call this function, it prints some more strings. So if everything goes well when we run this program, we should first see a greeting followed by a farewell.

So why don't we give that a try? The trick here is that we now need to compile not one file but two, and then take the results of those two compilations and combine them into one final binary. And we should see it all work. Unless it doesn't.

That's actually really surprising. There are two mysteries here. First, this program is aborting, and it's supposed to just be printing strings. That's weird. Second, the only change we made was adding more code after the initial code, which we saw working correctly. Yet the greeting that used to be printed is no longer printed. How did code that should not have had a chance to run yet affect the behavior of code that already ran? This is very curious. Let us dig into exactly what the new code does. In farewell.cpp, we're using this thing called print strings, so let's look at where that comes from and what it's actually doing.
So there's one mystery out of the way. We have this function called print strings. It takes two strings and calls print string with each of them. And the print string being called here is actually poorly named, because rather than printing anything, it just aborts the program. What we can clearly learn is that somehow this code is being executed instead of, or before, the code in the original file, which we saw working. Somehow this is being interposed in front of it. That's really curious.

Let's learn a little about what compilation and linking actually mean in this environment. Say you have a program that contains a function called foo, for example, and it uses a function called puts to put a string on the screen. It doesn't define puts, though; it just says it's using it. We send this through the compiler and get an object file as output. This is an intermediate state: it contains compiled code, but it can't be executed yet. It also has something called a symbol table associated with it. A symbol table is a table of symbols, and a symbol is just a name for something in your object file. In this case, because we defined a function called foo, we have a symbol for it: this is code that this object file provides. But we're also using something we don't define, which means this object file can't be used until we find some other code that provides the missing piece, puts.

That's where linking comes in. If you compile multiple things, they each have their own symbol table. For the first thing we compiled, we know it has a foo but needs a puts. So the linker says, great, I'm keeping track of this: I know we still need to find a puts before I can make a real binary out of this. Then we look at the next object file, and maybe that one defines a puts we can use. And the linker says, great. I now know where foo is.
I now know where puts is. I can now put that together and make a binary that contains the sum of all of these objects. And it has its own symbol table, which now says: I have a foo, I have a puts, and here are the parts of the binary where those pieces of code reside.

We can actually inspect a symbol table; this is a real thing. Let's look at the symbol table for... oh, that's not an object file. Let's look at the symbol table for this object file, the one containing the main function. We can see the symbols here. There's a main function; we defined that, and it has an offset within the file. There's some weird name mangling happening here, but the point is that we can see greet bang bang con, the function we defined, is defined in this file. Great. There's also a print string that we used, and that's defined in this file too. And we're using farewell bang bang con, which is defined in a different file, so it doesn't have a location in the object file we're inspecting. It is an unresolved symbol.

However, if we look at the symbol table for the farewell object file we compiled, we see that it actually defines the farewell bang bang con symbol. This symbol table also contains print strings and print string. It doesn't define abort, because that's part of the C standard library, which gets resolved later. That means we can now look at the final symbol table for the binary we created, which has the sum of all those other symbol tables. It has farewell bang bang con, it has greet bang bang con, it has the main function. The final binary knows all about these symbols.

But there is something very suspicious here, because I said the final symbol table contains the sum of all the symbol tables it brought in. And if we look closely, there's a print string symbol in the main object file we looked at, and there's also a print string symbol in the farewell object file.
But there's only one print string symbol in the final binary's symbol table. That's because symbols have to be unique: if code can't tell which instance of a symbol it needs, you're going to have trouble. So the linker is responsible for noticing when duplicate symbols come in and reporting an error. Except that here, it doesn't. So why would that be?

Let's look at the code a little more closely: print strings and print string, in the two different files. We see something a little odd here. The print strings function is declared as a plain void function, but print string in both files is declared inline void. That's unusual and curious, so let's see what happens if we take the inline out. If we compile that again, suddenly the linker does notice that there's a duplicate print string symbol. Okay, so we've learned something: somehow this inline annotation changes how the linker behaves when it encounters what would otherwise be an error.

What does that mean? Let's talk about the one definition rule in C++. This rule says you can declare something as many times as you want. Here's an API to square a number: it takes a number and returns the squared value. Every file that wants to square a number can declare that function, which amounts to saying: someone somewhere provides the actual implementation, but we can just use it and assume it exists. That's fine for two different .cpp files. And the third file, the one that actually contains the implementation, holds the definition. That is the single, unique definition of this square number function; it says how to square a number. Great. Those files will all have different symbol tables: in the first two, square number will be an unresolved symbol, and in the third it will be a resolved one.

When you have an inline function, things change. Inlining is a hint to the compiler: normally, every time I use a function, I have to do all the work of calling it, and sometimes, in a loop for example, that's going to be slow.
Inlining says: copy and paste the body of the function into the place that's trying to use it, taking more code space but reducing the time needed to execute it. So the result will look more like this: every file that uses this inline function gets its own copy of the function while it's being compiled, because the compiler needs the body of the function to perform that transformation in each caller. So instead of actually calling the function, we just inline the squaring operation.

But this has an implication for the symbol tables. If you have a copy of this function in every single file that uses it, that breaks the rule I just talked about, the one definition rule, and you're going to have non-unique symbol names. So the one definition rule actually says that's okay for an inline function, because all the copies are going to be the same. As long as they're all identical, you can just chill; there's no need to report an error. So that's great. But the linker still needs to choose which copy will be the canonical version, because the final symbol table can only contain a single, unique symbol. As long as the copies are all identical, the linker can choose any one it wants. But linkers don't actually verify that property, which means we have the opportunity to introduce an evil twin that will turn around and stab you in the back as soon as you stop looking.

And we can actually affect which copy wins. If we reorder the way we pass the object files to our linker, suddenly we have an application that doesn't abort anymore, and that's kind of spooky. The proper way to solve this is to avoid breaking the rule entirely: rename the function, and you'll get unique symbol names. That's great.
So what I'd like to leave you with is that... not farewell, I'm sorry. Finale. What I'd like to leave you with is that this is a real problem. It's not made up; I've encountered this in the wild. And I challenge you to figure out whether your language is vulnerable to this, or can support this, because the answers will actually surprise you. That's all.