Welcome back, everyone. It's been a while since the last stream, which I feel like I now say basically every stream, which is a little unfortunate, but somehow life tends to get in the way. This is one that I've been promising for a while: a Crust of Rust on build scripts and FFI, so foreign function interfaces, which is basically the way that you get Rust code to interact with code written in other languages. We'll talk some more about what that actually means.

The reason this came about is that build scripts are decently well documented in the Cargo manual, but they are very general mechanisms, so it can be hard to wrap your brain around why they're useful, what they should and should not be used for, what their limitations are, and what patterns usually apply when you do use them. And for FFI, there aren't great resources that talk in general about what's going on. There are some: the Nomicon has a pretty good chapter on the basics of FFI, and the Rust reference is pretty decent when it talks about the extern keyword. But, as is sort of the standard for Crust of Rust, sometimes it's useful to dig into an actual real example to see how the pieces fit together.

Now, because this is a Crust of Rust, it's going to be on the shorter side as far as my streams go, so I'm guessing about two hours. In two hours, it's kind of limited how much we can actually do in terms of implementing something that uses real FFI. So I won't get too much into reverse FFI, that is, letting other languages call Rust, like generating C-compatible bindings from Rust code. I'll talk a little bit about it, but not too much. I'll also primarily be focusing on linking against C. The mechanisms that we're going to talk about do apply to linking against other languages as well, like anything that has a C-compatible ABI, including
things like C++, but there are a lot of nuances to bridging the gap from the C ABI to the other language as well, and in some cases it's pretty important what those distinctions are. But we're not going to dig too much into that. I'll point at some crates that might be useful, but that's the extent of it. Also, someone mentioned in chat that Ryan Levick has a great video about FFI in Rust, which is also worth checking out. I'll link it somewhere in the recorded version.

I will talk a little bit about bindgen, when it's appropriate to use and when it's not, and a little bit about cbindgen. In particular, the plan here is to write bindings to libsodium. If you're not already aware of libsodium (let me see if I can make this dark for those of you who are averse to light, which I know is many of you; apparently I cannot, that's fine), libsodium is a cryptography library written in C that is both widely used and widely known to be a very sane implementation of many cryptographic primitives. In particular, it aims to provide implementations that are hard to misuse and relatively low in configuration. It's really just: you want to encrypt a thing, this is the way you encrypt a thing; you want a random key to use for your encryption, we generate one for you in the appropriate way. So it's intended to be sort of foolproof. The other thing that's nice is that it's written entirely in C, so the API is decently easy to bind to. And, you know, it's a real library.
It's not something where I just hacked together a little C file, so that way we'll get a feel for some of the subtleties that are involved.

Okay, let's start a little bit before we get to libsodium and the actual bindings, and talk about what build scripts are. A build script is effectively a program that gets to run before your crate is compiled. This is a property of Cargo, not of Rust the language; it's entirely a Cargo construct. The idea is that if you have a file called build.rs in the root of your Cargo project (the root of your Cargo package, technically), then Cargo will compile that file, run it, and then build your crate. It will do so even if your crate is being consumed by some other crate: build.rs runs whenever your crate is compiled, because Cargo always compiles your transitive dependencies too.

The output of the build script that you run on your local computer is not included in cargo publish. The build script is, but not its output, which means that the build script has to be written in such a way that it's going to run on other people's computers. That's part of where it gets really tricky to get build scripts to always do the right thing. It can be fairly convoluted to handle all the possible ways in which your consumers' build environments are set up. We'll talk a little bit about that, and look at some build scripts from existing libraries as well that might be interesting.

Now, as I mentioned, the build script is essentially just a program: it defines a main function, and that's really the main entry point to a build script.
It doesn't have to be called build.rs. In your Cargo.toml you can also set build equal to the path of some other file, and Cargo will compile that instead. So you can use something like build/main.rs and then have your modules and whatnot. We'll see some crates that do that as well, if you end up with a very convoluted build process.

Now, when that program runs, it doesn't really have any special integration with the crate that is actually being compiled. The way that these two communicate is primarily through environment variables and the out directory. When a build script runs (let me see if I can go down here, yes), it gets access to a couple of things. It gets access to the OUT_DIR environment variable, which points to a subdirectory deep inside the target build output directory that is writable by the build script. And when your crate is compiled after running the build script, the OUT_DIR environment variable will be set for it as well, to that same directory. The idea is that the build script can, for example, generate Rust files and put them in OUT_DIR, and then your Rust code can include things from under OUT_DIR and get access to that generated source code.

So let's see what that actually looks like. I guess we'll do a bin, and we'll call it build-and-ffi. Very boring crate name.
I know. So if I make a build.rs here, give it an fn main, have it print std::env::var("OUT_DIR") with dbg!, and then do cargo run, you'll see what it does. I haven't changed anything under src at all, and build.rs does not generate any code, nothing like that. But you see that when I now run cargo run, I get a warning from the build script, and that's Cargo deciding to build my build script. Then it runs the build script, then it compiles my crate, and then it runs my crate.

You might have noticed that you don't actually see the output from that build script. That might strike you as strange, because in the build script we used dbg!, which basically does a print. The reason is that build script output is not printed to the terminal unless the build script fails. We can see this if I stick in a panic here and then do cargo run: now it prints the standard error of the build script from when it ran. But it does not do that if the build script succeeds. The reasoning behind this is that build scripts tend to produce a bunch of output, and we'll see a little bit about why they do that. Part of it has to do with the fact that in their output they can issue special instructions to Cargo that allow them to do things like change the link search path and set environment variables. So the standard output is going to be fairly large, and therefore Cargo just hides it by default. Now, it is possible to get at this output even if it doesn't fail.
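As a rough sketch of the build script described above (not the exact code typed on stream), something like the following would do. The helper describe_out_dir is a made-up name that exists only to make the sketch easy to check, and eprintln! stands in for dbg!, since both write to standard error.

```rust
// build.rs (sketch): lives next to Cargo.toml, not under src/.
use std::env;

/// Formats the line we log. Hypothetical helper, split out for clarity.
fn describe_out_dir(out_dir: Option<&str>) -> String {
    match out_dir {
        Some(dir) => format!("OUT_DIR = {dir}"),
        None => String::from("OUT_DIR is not set (not running under Cargo?)"),
    }
}

fn main() {
    // Cargo sets OUT_DIR before it runs the build script.
    let out_dir = env::var("OUT_DIR").ok();

    // This goes to standard error, like dbg! does. Cargo captures it and
    // only prints it to the terminal if the build script fails.
    eprintln!("{}", describe_out_dir(out_dir.as_deref()));
}
```

Adding a panic!() at the end of main is the quickest way to make Cargo show the captured standard error.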
So let's remove the panic again. Oh, I have that set, don't I? Let me just see. Yeah, so I have the CARGO_TARGET_DIR environment variable set so that all of my Cargo projects across my host use a single target directory rather than having one per project, which saves disk space and sometimes compile time, but it's a little inconvenient in this particular case. So I'm going to unset CARGO_TARGET_DIR and then run cargo run again, and now you'll see that I have a target directory in the current directory.

If we now look into target, and debug, and deps (no, not deps, build), so inside of target/debug/build, or it could be release if you're doing a release build, you'll see that there's a subdirectory for every crate, and in this case there are actually two for this crate. The reason is that one is for the build script being built and one is for the build script being run. So this one here is the result of compiling our build script; this is sort of the crate that is the build script. And this one is for the real crate, and included in there are the output and the standard error of its build script. So if we cat the output file in that build-and-ffi directory under target/debug/build, standard output is empty, which is fine, right?
We didn't print anything to standard out, because the dbg! macro prints to standard error. But if we cat the stderr file, you see there is the output from that dbg!. And if you do a build where you take a dependency on OpenSSL, for example, the openssl crate takes a dependency on a crate called openssl-sys, which has a build script that does lots of things with OpenSSL. So if you build something that depends on it, you can use this to look at the output of the openssl-sys build script. Even if it's deep in your dependency graph, it'll always end up at this kind of path. This is really useful for trying to figure out, if the build script did something weird, what it did and why. It's a little hard to discover, but it is really useful to know about, because you're going to run into these kinds of things.

Okay, so the OUT_DIR environment variable when our build script ran, you'll notice, was this path. And you'll see that under the target directory, under debug, under build, and under the same subdirectory of build where, unsurprisingly, the output of the build script ends up, there's an out directory. And this is entirely on purpose. In main.rs, instead of hello world, I'm going to print out env!("OUT_DIR") (not cfg; I'll talk a little bit about cfg later), and I'll explain in a second why I'm using the env! macro here and not std::env.
So if I now do cargo run, you'll see that this path here is the same path as this path here. That's entirely intentional: the idea is that this is a directory shared between the build script, which can put .rs files or any other kind of file there, and the crate that's being built, which can pull those generated sources from it.

Now, the reason why this is env!, the macro, rather than std::env::var is that the env! macro reads environment variables as set at compile time, not at runtime. Remember, for a main.rs there are two steps: one is to compile main.rs into a binary, and the second is to run that binary. OUT_DIR is only set at compile time. Cargo has no way of setting it at runtime, because you could build your binary, copy it to another system, and run it there with no Cargo involved. So this is only set at compile time, and we access it using env!.

You might wonder: okay, how do I actually use this? The way you use this in practice is that in build.rs you do something like std::fs::write to out_dir.join("hello.rs"). What are we going to write into that file? We're going to write pub fn foo() {}. That's fine, it doesn't need to do anything special. We can unwrap in this build script, that's fine, and out_dir is going to be a std::path::Path. So what I'm doing here is creating a file inside of the out directory; I guess I'll call it foo.rs instead. And it has a single function that does nothing, just called foo. If I now do the same thing here and then ls what's in that out directory, you'll see that there's now a foo.rs file in there. And what I can do in my src/main.rs is something like mod foo with an include! inside. The include! macro takes a path and, at compile time, substitutes the call of the macro with the contents of the file.
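The generate-then-include flow just described can be sketched roughly as follows. The function name generate_foo and the temp-directory fallback are assumptions added so the sketch also runs as a plain program; under Cargo, OUT_DIR is always set for build scripts.

```rust
// build.rs (sketch)
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes a tiny generated module into `out_dir` and returns its path.
fn generate_foo(out_dir: &Path) -> io::Result<PathBuf> {
    let dest = out_dir.join("foo.rs");
    fs::write(&dest, "pub fn foo() {}\n")?;
    Ok(dest)
}

fn main() {
    // Assumption: fall back to the system temp dir when run outside Cargo.
    let out_dir = env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir);

    let dest = generate_foo(&out_dir).expect("failed to write foo.rs");
    eprintln!("generated {}", dest.display());
}

// In src/main.rs, the generated file is pasted in at compile time:
//
//     mod foo {
//         include!(concat!(env!("OUT_DIR"), "/foo.rs"));
//     }
//
//     fn main() {
//         foo::foo();
//     }
```

The concat! is there because include! wants a single string literal, and env! expands to one at compile time.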
So include! is not eval, right? It's copy-paste that file into here and then treat it as Rust source. So we can include OUT_DIR with /foo.rs appended, and now, at least in theory, we should be able to call foo. And indeed we can: this compiled. To see why (do I have cargo expand? Otherwise I want to cargo install cargo-expand), cargo expand is a really handy tool for expanding macros and showing the result after macro expansion. So that should show us that the actual code that gets compiled here, after expanding the macros, includes inline that source file we just generated. Does that make sense so far, what the connection is between the build script and the crate? If I do expand, you'll see that mod foo now just contains pub fn foo. This is the result of running cargo expand, so our call here ends up being valid by the time this actually gets compiled. Someone asks whether include! is kind of like #include in C. Yeah, it's more or less the same thing.

Okay, so now we understand how these connect together. As I mentioned earlier, OUT_DIR is one of the ways that a build script can communicate with the crate being compiled. There are others. One is a way for build scripts to communicate with Cargo, which is not really about the crate being built, but rather about telling Cargo: I have discovered a thing that Cargo needs to deal with. These all work by printing to standard out a line that begins with cargo:, and there are a lot of these. There's cargo:rerun-if-changed and friends, which we'll talk about in a second. There's cargo:rustc- with various args. This is usually used for things like this: imagine you have a build script that's looking for a particular shared library file, and it discovers that it's not in a standard system path, like it's not in /usr/lib, for example, but somewhere else. The build script can tell Cargo:
Hey, when you go to link the final binary, also look in this directory for a particular library. You can do that using rustc-link-arg, which passes additional flags to the linker. They don't have to be a path necessarily; they can be any linker flag. There are also variants that only pass the additional linker argument when building binaries, or tests, or examples, or benchmarks. Or you can set rustc-link-search, which is specifically for setting a search path, so it's sort of equivalent to -L (capital L) for regular linkers. You can also pass rustc-link-lib, which tells Cargo: when you link, also link this library. This is very often at least part of the job of a -sys crate: the build script says which shared library to link against, which it does by emitting rustc-link-lib (equivalent to lowercase -l, if you're familiar with linker syntax), and also where to search for that library, which is rustc-link-search with a path.

Separately from linking, the other thing that you can instruct Cargo to do is to pass additional flags to rustc itself. Very often what this is used for is rustc-cfg. Let me find a good example of this: cfg!(hello), right? So there's the cfg! macro, and this macro is similar to writing the #[cfg(...)] attribute syntax; they both end up evaluating in the same namespace. cfg is used for compile-time properties, and these can be used for all sorts of things, but they're basically a way to say: is this thing available or not? It's used for features, right?
So you've all seen feature = "foo". What this means is that when Cargo builds your crate, it looks at which features are enabled, and it passes --cfg feature="foo" to rustc if the foo feature is enabled. That in turn enables this Rust language feature, which is no longer related to Cargo, that lets you do conditional compilation based on which cfg flags are passed in. So if you use rustc-cfg, that means: pass additional cfg flags. One thing that the OpenSSL crates do, for example, is pass in ossl110, which indicates that OpenSSL is at least version 1.1.0. What that allows is that the openssl crate can then do things like cfg(ossl110) to say: only compile this function if OpenSSL is at least that version. So it's a way for the build script to inform the crate about conditional compilation options it might need to enable or disable.

All right, does that make sense? Does the cfg aspect of this make sense, and what these cargo: options do? We'll look at some examples later as we look at real build scripts. One thing that's interesting here, which people don't think about very often, is the distinction between Cargo things and rustc things. cfg is a rustc thing: it's a language feature that says you can conditionally compile based on properties that are passed to the compiler. This is similar to C, where you can pass -D to define something and then conditionally compile based on that. It has nothing to do with Cargo. And then there are Cargo features, right?
Which you set in your Cargo.toml. What Cargo does is turn features into rustc cfg flags, which you can then use to do conditional compilation. But the two are sort of disconnected. Someone says cfg is like #ifdef. Yeah. And you can also access them through a macro, like this, so you can say cfg!(ossl110) in this case, for example.

Okay, so that's the other way that we can communicate between build scripts and Cargo, and ultimately the crate itself. There are two more here that I want to touch on. One is cargo:warning. I mentioned that in general the output from build scripts does not get printed when they run unless the build script fails. cargo:warning is the exception. If a build script writes to standard out a line that starts with cargo:warning=, then the remainder of that line does get printed when Cargo runs. We can try this: if we go to our build.rs and do println!("cargo:warning=generating foo.rs") (and in general these are intended for warnings, not for logging), and I now run cargo r, you see that I get this warning saying generating foo.rs.

It's worth noting that these kinds of warnings from a build script are only shown if you are working on that crate or have a path dependency on it. They're not shown for a transitive dependency. For example, if the openssl-sys build script generates these kinds of warnings, you will not generally see them when you do a build if you just happen to have a dependency on it. You'll only see them if you have a path dependency on openssl-sys or if you are specifically building openssl-sys yourself. And the other one is cargo:KEY=VALUE.
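To make the rustc-cfg and warning instructions concrete, here is a minimal sketch under stated assumptions. The library name mylib, the cfg names mylib_1_0 and mylib_1_1, and the helper cfgs_for_version are all hypothetical, and the version is hardcoded where a real build script would detect it.

```rust
// build.rs (sketch)

/// Returns the cfg names to enable for a detected library version.
/// The names `mylib_1_0` and `mylib_1_1` are hypothetical.
fn cfgs_for_version(major: u32, minor: u32) -> Vec<&'static str> {
    let mut cfgs = Vec::new();
    if (major, minor) >= (1, 0) {
        cfgs.push("mylib_1_0");
    }
    if (major, minor) >= (1, 1) {
        cfgs.push("mylib_1_1");
    }
    cfgs
}

fn main() {
    // Assumption: a real build script would detect this, for example by
    // asking pkg-config or parsing a header. It is hardcoded here.
    let (major, minor) = (1, 1);

    for cfg in cfgs_for_version(major, minor) {
        // Cargo passes this on to rustc as `--cfg <name>`.
        println!("cargo:rustc-cfg={cfg}");
    }

    // One of the few build script lines Cargo shows on a successful build.
    println!("cargo:warning=detected mylib {major}.{minor}");
}

// The crate being built can then compile code conditionally:
//
//     #[cfg(mylib_1_1)]
//     pub fn only_on_new_versions() {}
//
//     if cfg!(mylib_1_1) { /* ... */ }
```

Emitting one cfg per "at least version X" threshold, rather than one per exact version, keeps the conditions in the crate simple.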
So cargo:KEY=VALUE is metadata that essentially turns into environment variables, and there's a lot more discussion of this in the Cargo book, in the Cargo reference. Basically, if you emit something that starts with cargo: and is not one of the known instructions, like let's say I print cargo:include=foo here, include is not a special value in that list. What this does, if we then switch to main.rs, is declare a DEP_... oh, it might only work if I have a links key, which I haven't set up. That's a good question. DEP_BUILD_AND_FFI_INCLUDE, let's see if that's set. Yeah, of course not.

Okay, so the intention here is that you can emit additional information for your dependents that actually contains values. cfg is for conditional compilation, but sometimes you want your build script to do things like figure out where the include directory for a C dependency is and communicate that path to the downstream crate that's being built, because it might be useful for it to pick up a particular C header file or something. You can do that using the cargo:KEY=VALUE syntax, and it will declare environment variables that the build scripts of those downstream crates can consume. The name of the environment variable is DEP_, then the links attribute (which we haven't talked about yet, but it's basically the name of the library that the package links against), then an underscore, and then the key, and its value is the value. There's an example in the Cargo book; we're not going to look at it because it's a little separate.

Which, in fact, is a good reason to talk about links. This is not a requirement, it's worth pointing out, but in general, if you have a crate that links against a particular shared library, you're supposed to declare links equal to the name of that shared library in that crate's Cargo.toml. This doesn't do anything magical as far as linking is concerned. You're still expected to use things like
rustc-link-lib to tell Cargo to also link against that library, and you might still need to set the search path and everything. But what it does is allow Cargo to check that only one crate in the entire dependency graph links against that shared library. This is useful because if you have multiple crates that try to link against the same library, you can get into really weird cases where they're linking against slightly different versions, or they both statically link against something and therefore you get duplicate symbols, and then you get weird linker failures. So doing this allows Cargo to do a sanity check of the build: that there's actually only one crate that binds against any given shared library, or static library for that matter.

There are some interesting implications here. Because Cargo checks that at most one crate links against any given library name, you don't want to end up with an ecosystem where you have lots of crates that provide bindings to the same thing. In general, the way we approach this in the Rust world is that for most shared libraries we try to have a single crate whose name ends in -sys. For libsodium, that would be libsodium-sys. The only thing that crate does is bind against the shared library and expose the raw FFI functions of that library. It is not supposed to do any kind of safe wrapping or provide an ergonomic interface, none of that. It is a pure binding to the library. Then you can have lots of different libraries that all use that -sys crate and build nice wrappers on top of it. The idea is that because you only have a single -sys crate that does the binding, that one has the links key, and that way it just sort of works out that you only link against the library in one place.

This also has some implications for semantic versioning. So if you have such a -sys
crate, one of the things that you want to do is try to avoid doing major version bumps to it. If you do a major version bump, it will be impossible to use two different major versions of that -sys crate at the same time in the dependency graph, because two major versions are considered two different crates, and if two different crates both have the links key for the same name, Cargo will complain; it won't build. So generally, if you cut a breaking release of a -sys crate, you're going to want to make every consumer of that -sys crate also bump to the new version at more or less the same time.

I haven't gone into how you generate a -sys crate yet. You can write one manually, or you can use bindgen; we'll talk about that in a second. The Cargo manual talks a little bit about this as well, about -sys packages, and links has some other nice properties, like letting you override what the build script would do for a particular links value.

There's one more thing I wanted to talk about. In these cargo: output lines from build scripts, you can also set arbitrary environment variables.
This is sometimes useful. But one key thing that you set with build scripts is rerun-if-changed. By default, if a crate has a build script, Cargo is very conservative about it: the build script gets rerun whenever any file in the package changes, because Cargo doesn't know the conditions under which it needs to rerun. Who knows, that build script might fetch something from the internet that needs to be done every time. So the expectation from Cargo is that every build script that knows it only needs to run in certain circumstances should emit either rerun-if-changed or rerun-if-env-changed, or both (you can emit multiple of them), to say: run me if these files or these environment variables have changed, not otherwise.

Very commonly you'll see things like rerun-if-env-changed for something like OPENSSL_LIB_DIR, which the openssl-sys crate uses to locate OpenSSL. If the environment variable that the user uses to point at OpenSSL has changed, the build script should be rerun, and so that build script emits rerun-if-env-changed=OPENSSL_LIB_DIR. rerun-if-changed is usually used for things like this: if your build script compiles a little C program into, let's say, a shared library or a static library or whatever that your crate then links against, then if the C file changes, the build script should get rerun, and you can use rerun-if-changed to do that.

Okay, I think that's all I wanted to talk about for build scripts generally. We're going to dig into some and actually look at what they do and why. One thing to keep in mind with build scripts is that, at least at the moment, they're very blunt tools. Well, maybe not blunt tools. In fact, I'm going to say the exact opposite.
They're very sharp tools, and you can easily hurt yourself with them, because build scripts are not really sandboxed in any meaningful way, at least not at the moment. They can go talk to a database. They can connect to the internet. They can read arbitrary files and write arbitrary files, whatever the current user has access to. That should already be setting off alarm bells, right? Things like reproducible builds sort of go out the window. If a build script does something weird, it might just overwrite the files in src, and now you've lost changes that you had pending in Git. It might read all the documents in your home directory and upload them to the internet. Build scripts are very sharp tools, and they're implicitly trusted because they're automatically built and run for all your dependencies, which is worrying. This is an example of with great power comes great responsibility, but they're also very troubling because you don't really control what build scripts your dependencies have, and you're probably not auditing all of them. There is a bunch of work in the ecosystem on trying to figure out how we can sandbox build scripts in meaningful ways. One way to do that would be to compile and run them as Wasm, so that they have a very constrained API to the rest of the system for what they can do. But it's concerning, and it's worth knowing about build scripts.

Okay, with that out of the way, let's actually look at the build scripts for some very common crates. And I forgot to change my GitHub to dark mode.
So give me one second. I'm sorry to burn all of your eyes. It'll be over in a second, I promise; you can just close your eyes in the meantime. There we go. Okay, you can open your eyes again if you have an aversion to the light. We'll talk about bindgen and cbindgen a bit later.

The first thing we're going to look at is a crate called git2. It provides bindings to libgit2, which is a Git C library, but not the real Git C implementation; it's a re-implementation of Git in C. And it's a great little library. You'll see that in its Cargo.toml, it takes a dependency on libgit2-sys, which is a path dependency here, so there's one repository that contains both git2 and libgit2-sys. This is the pattern that we talked about, of having a separate crate that does the actual linking. So let's look at libgit2-sys and its Cargo.toml. You'll see it declares build = "build.rs", which is not necessary: as long as your build script is called build.rs, Cargo will pick it up automatically. But it doesn't hurt to declare it; this key just allows you to name it something else. You'll see it declares links = "git2", like we talked about. And it has a bunch of features and whatnot.
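To tie the links key back to the cargo:KEY=VALUE metadata from earlier, here is a hedged sketch for a hypothetical foo-sys crate (not libgit2-sys itself). The manifest lines are shown as comments; the helper dep_env_var is a made-up name, and the uppercase-and-underscore mapping it applies reflects how Cargo is understood to name these variables, so treat it as an assumption.

```rust
// build.rs of a hypothetical `foo-sys` crate whose Cargo.toml contains:
//
//     [package]
//     name = "foo-sys"
//     links = "foo"
//     build = "build.rs"   # optional, this is the default name

/// Name of the environment variable that the build scripts of direct
/// dependents see for a given `links` value and metadata key.
/// Assumed mapping: uppercase everything, turn `-` into `_`.
fn dep_env_var(links: &str, key: &str) -> String {
    format!("DEP_{}_{}", links, key)
        .to_uppercase()
        .replace('-', "_")
}

fn main() {
    // Assumption: the include path would normally be discovered, not hardcoded.
    let include_dir = "/usr/include";

    // `include` is not a key Cargo knows about, so it is treated as metadata.
    println!("cargo:include={include_dir}");

    // Printed to standard error purely for illustration.
    eprintln!(
        "dependents' build scripts can read {}",
        dep_env_var("foo", "include")
    );
}
```

A crate that depends directly on foo-sys could then call std::env::var("DEP_FOO_INCLUDE") in its own build.rs to find the headers.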
Let's look at that build.rs. Let me increase the size of this a little bit. Very often, the pattern you'll see for these kinds of -sys crates is very similar. In general, what they do is first look at whether there are environment variables telling them where to look. If there aren't, they use the standard system paths. Then they look to see whether they can find the library they need in that path, and whether it is the right version. If so, they just generate the bindings and tell Cargo to link against it. Otherwise, they will often build that dependency from source. This is known as vendoring: the package contains the source code of the shared library it links against, and it'll build it for you into OUT_DIR and then link against it. And then usually, however it ended up with the shared library, it'll also use something like bindgen to generate Rust bindings to that library. We'll look a little bit at what those bindings look like, because that's when we get more into the FFI space.

In the case of git2, you'll see it reads out a bunch of environment variables. I have some opinions on these environment variables that we can talk about after we've looked at a couple of these. One thing that's pretty common is to have an environment variable that says whether or not to vendor the library in question, which you can set to one if you want to always vendor (never use the one from the system even if it's available, always build from source), or set to zero if you want to say never build from source. Some crates will also have it as a feature.
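The decision flow described above can be sketched like this. It is not what libgit2-sys actually does: the environment variable FOO_SYS_VENDOR, the Source enum, and choose_source are hypothetical names, and the precedence (explicit override, then feature, then system lookup) is one reasonable choice among several.

```rust
// build.rs (sketch) for a hypothetical `foo-sys` crate.
use std::env;

#[derive(Debug, PartialEq)]
enum Source {
    System,
    Vendored,
}

/// Decides where the native library should come from.
fn choose_source(
    env_override: Option<&str>,
    vendored_feature: bool,
    system_lib_found: bool,
) -> Source {
    match env_override {
        // An explicit 1 or 0 from the user wins over everything else.
        Some("1") => Source::Vendored,
        Some("0") => Source::System,
        // Otherwise a `vendored` Cargo feature forces building from source.
        _ if vendored_feature => Source::Vendored,
        // Otherwise prefer the system copy, and vendor as a last resort.
        _ if system_lib_found => Source::System,
        _ => Source::Vendored,
    }
}

fn main() {
    // Ask Cargo to rerun this script if the (hypothetical) override changes.
    println!("cargo:rerun-if-env-changed=FOO_SYS_VENDOR");
    let env_override = env::var("FOO_SYS_VENDOR").ok();

    // Cargo exposes enabled features to build scripts as CARGO_FEATURE_<NAME>.
    let vendored_feature = env::var_os("CARGO_FEATURE_VENDORED").is_some();

    // Assumption: this would come from a pkg-config style probe.
    let system_lib_found = false;

    let source = choose_source(env_override.as_deref(), vendored_feature, system_lib_found);
    eprintln!("using {source:?}");
}
```

Keeping the decision in a pure function makes it possible to reason about the precedence rules without touching the environment.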
I forget whether libgit2 does. Yeah, so it has a feature called vendored that carries the same meaning: if this feature is enabled for this crate, then build from source. Of course, a challenge with features is that they can get enabled anywhere in the dependency graph, and then they get enabled for all consumers, because Cargo will only build a given crate version once. So let's say you're a crate down here. You take a dependency on git2 and you don't set the vendored feature, but you also take a dependency on crate foo, and crate foo also takes a dependency on git2 but with the vendored feature. Then you will get it vendored, because Cargo takes the union of all features set across the dependency closure. So if we go back to the build.rs, you'll see that first it looks at whether it's allowed to look for the system-provided version of the library; that is, if it's not explicitly asked to vendor, then it will look. It uses this crate called pkg-config, and this is one that you'll see used a lot in these kinds of contexts. The pkg-config crate is a relatively thin wrapper around a command that ships on most Unix systems, also called pkg-config, and we can look at pkg-config pretty easily. What pkg-config does is you give it the name of a library and you say what information about that library you'd like.
We want --libs, so we want the linker flags used to link this library. In this case, for libsodium, I just get -lsodium, and that's because libsodium is on my standard system path, so no -L (capital L) is needed; it can just be linked directly with -lsodium. Then it has other properties too, like --cflags, which are additional flags you have to pass to a C compiler if you want to compile against this library. The libs output will often include -L if the library is in some other path. This is a very standard way of locating shared libraries, and that's why you'll generally see these sys crates use pkg-config to locate the shared libraries rather than implement their own mechanism for searching through /usr/lib and the like. The other thing that's nice about pkg-config is that it lets you do things like version requirements, so you can say at-least version or max version. You can also say things like I want to link against this statically, so you'll see here that with --libs I can also pass --static to pkg-config, and it'll say that if you want to link statically, then you also have to pass -pthread. I don't know why, but that's the rule for libsodium. So pkg-config just gives you the linker flags you're going to emit. And furthermore, the crate actually prints, let's see if I can find it, here: after running pkg-config, all appropriate Cargo metadata will be printed on standard out if the search was successful. So what this is saying is that as long as you use this crate, not only will it tell you whether a given library was available, it'll also output all of the necessary Cargo standard-out instructions for a build script to do things like set the link search path and the linker args and stuff. So it's a really convenient way to do these kinds of bindings.
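To make that last point concrete, here is a simplified sketch of the translation from pkg-config --libs output into the instructions a build script prints for Cargo. This is not the pkg-config crate's actual implementation: the function name cargo_directives is invented, and flags other than -L and -l (such as -pthread) are simply skipped here.

```rust
/// Turn the output of `pkg-config --libs <name>` into the lines a build
/// script would print for Cargo. Simplified: only `-L<dir>` and `-l<lib>`
/// are handled, and any other flag is ignored.
fn cargo_directives(pkg_config_libs: &str) -> Vec<String> {
    let mut directives = Vec::new();
    for flag in pkg_config_libs.split_whitespace() {
        if let Some(dir) = flag.strip_prefix("-L") {
            // Extra directory for the linker to search.
            directives.push(format!("cargo:rustc-link-search=native={}", dir));
        } else if let Some(lib) = flag.strip_prefix("-l") {
            // Library to link against.
            directives.push(format!("cargo:rustc-link-lib={}", lib));
        }
    }
    directives
}

fn main() {
    // Example input in the shape pkg-config prints; the path is made up.
    for line in cargo_directives("-L/opt/sodium/lib -lsodium -pthread") {
        println!("{}", line);
    }
}
```

When the library lives on the standard system path, the input is just -lsodium and only the link-lib line comes out, which matches what we saw on the command line.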
Oh, yeah, I can show the... So the way pkg-config knows whether or not a library is available, where it's available, and what version it is, is through these pkg-config files (.pc files). There's a PKG_CONFIG_PATH environment variable that you can set to tell pkg-config to also look in other places. But if we look at one of these files, it really just says where the thing is located, where its lib directory is, where its include directory is, the name of the library, the version, and additional link arguments that might be useful. So it's a very straightforward syntax for declaring these. So back in libgit2-sys: if it's allowed to use the system version of libgit2, then it creates a pkg-config config, uses that to specify a range of versions that it knows it can bind against, and probes for libgit2. If it finds it, then it also walks the include paths. I don't know why it emits this; it's probably so that when building the crate itself, it knows where the include files are. I'm not sure why it does that, but it does. This is just the cargo:key=value form; root doesn't have any special meaning as far as build scripts go, so this is just to communicate information about where the include directories are to downstream builds. And then you see it returns: if it finds the library this way, there's nothing more to do, and the build script is done. We'll talk about how the Rust side of the bindings actually gets produced in a second. If it doesn't find it on the system path, though, you'll see it emits this rustc-cfg saying libgit2_vendored. The reason it does this is that you can now do conditional compilation on whether or not you built libgit2 from source, right, by using cfg(libgit2_vendored). I don't know whether they use this anywhere, but at least now they can. And then it does what's needed in order to
build from source. So in order to vendor, it needs to have the source for the library that it's going to build, and the way it does that is it has libgit2 checked out as a submodule. But because git clone doesn't include submodules by default, they have this extra stanza: if the submodule hasn't been checked out, then run git submodule update --init so that you get access to the submodule. This of course won't work if the build is sandboxed, for example, because you wouldn't be able to run git commands that need to access the internet. But this is more of a convenience thing, where if someone wants to build libgit2-sys, you want it to be that they can just run cargo run and it works. And so that's why you have these kinds of additional stanzas. When you run cargo publish, what happens is, and we'll see if they did anything special with it in Cargo.toml: by default, cargo publish will include anything that's not ignored by .gitignore. So if they haven't gitignored libgit2, all of libgit2 is going to be included as well. So basically the libgit2-sys source tarball on crates.io includes the source code for libgit2. And this is one of the reasons why the version of libgit2-sys includes this plus here: it says which version of libgit2 itself is vendored and bundled with that version of libgit2-sys. And you see they have an exclude stanza in here to say don't include all these other files when you publish, because they're just irrelevant and large. And so in the build.rs, once they've made sure that they have the submodule, what follows is basically all of the steps needed to build libgit2 from source. So it uses OUT_DIR to figure out where to build it, what the scratch directory is, essentially, for building the artifact.
And then it's all the traditional things: figure out what target you're compiling for. They use the cc crate, which is a really nice crate that's a wrapper around a standard C compiler and knows about all of the standard environment variables like CC and AR and LD and LDFLAGS and CFLAGS and CXXFLAGS and CXX, all of those things that the C world has accepted as the things we use for compiling C code. So you can just do cc::Build::new() and then set things like additional include paths, where to build, whether warnings are enabled. And all of this is just, you know, someone had to figure out all the steps that are required to build libgit2 from source. You see there are all these defines, and there are a lot of steps. Down here it ultimately calls compile on cfg, which is still the cc builder, and that is what's going to actually invoke the C compiler and so on. As long as that succeeds, it then needs to emit all of the necessary rustc-link properties, which the cc crate also takes care of; the script just emits some extra ones for Windows and Apple devices. And then it emits some additional rerun-if properties here to make sure that if the vendored source changes, the build script reruns. Okay, so that's the entirety of the build.rs here. Any questions about all the stuff that this build.rs is doing before we dig into how the Rust bindings come about?
Someone asked why the API for pkg-config uses the name statik, with a k instead of a c. This is because static with a c is a keyword in Rust, so you're not allowed to use it for function names, for example.
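To make the naming question concrete, here is a small sketch of the two options for a method that would naturally be called static. ProbeConfig is a made-up type for illustration, not the pkg-config crate's own struct.

```rust
/// Hypothetical builder with a "link statically?" option.
#[derive(Debug, Default)]
struct ProbeConfig {
    statik: bool,
}

impl ProbeConfig {
    // `fn static(&mut self, ...)` would be rejected by the compiler,
    // because `static` is a reserved keyword.

    /// The conventional workaround: deliberately misspell the keyword.
    fn statik(&mut self, on: bool) -> &mut Self {
        self.statik = on;
        self
    }

    /// The raw-identifier alternative: legal, but every caller has to
    /// write `r#static` as well.
    fn r#static(&mut self, on: bool) -> &mut Self {
        self.statik = on;
        self
    }
}

fn main() {
    let mut a = ProbeConfig::default();
    a.statik(true);

    let mut b = ProbeConfig::default();
    b.r#static(true);

    println!("{:?} {:?}", a, b);
}
```

Both methods do the same thing; the difference is only in how awkward the call site looks.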
There is a raw identifier syntax, is it r#? Yes, you can use this to declare that the name of a function is specifically intended to not be interpreted as a keyword. But it means calling the function is a little annoying too, so most people just use, like, the standard is: instead of static use statik, instead of class use klass, instead of crate use krate, and so on.
Does the user have to build the C library on their machine, or can we publish a pre-built .so? So Cargo doesn't prevent you from including a .so in your package, and in very rare cases it's a good idea. Usually it's not .so files, but you'll see this more with .o or .a files, usually for embedded platforms where it's really annoying to build the hardware bindings for that device, and so they'll just bundle them. The problem with doing that is that those artifacts are tied to your build and execution environment. So imagine, to take a stupid example, that you're building on 64-bit Linux and someone else is trying to run on 32-bit Linux. Your .so will not work on their machine. That particular example is outdated, but any kind of difference in target or difference in versions of libraries can come into play here. In general it's safer to use whatever they have on their machine, or build on their machine, than to try to build the .so files for every possible consumer. It's not impossible, but usually you want to avoid it.
Okay, so all of the stuff that we've seen in build.rs right here, all it does is tell Cargo how to link against libgit2, right? Which is either just -lgit2 if it's already on the system, or, if it's built from source, a -L pointing at something in OUT_DIR. But that doesn't explain how we actually call into these functions, right? Ultimately that just means that the symbols are available in the binary, but how do we call them from Rust?
There are many ways to do this. If we go to lib.rs here, you'll see that it actually has lots and lots of code with this extern "C" keyword on it. They declare lots of types, structs with repr(C), all of this stuff. That is one way to do it, and it's actually a fairly common one for very stable libraries. I'm going to guess that this was generated by bindgen and then manually changed to be better. So this is a good time to talk about bindgen. bindgen is a tool that, as it says, automatically generates Rust FFI bindings to C libraries. The idea is that if you have a C header file that contains the C function declarations and type definitions for an interface, you can call bindgen on it and it'll generate a Rust file that has the equivalent Rust types and extern fns. So if we went to, I wonder if I can pull this up easily, so this is the libgit2 source code, and if I go to include, git2, and I guess any arbitrary file in here, let's do something like commit.h. This is just from the actual C library's header files. One of the things that it declares is git_commit_lookup. If we go to lib.rs, you see, far down the file, I'm guessing, oh lord, down here, they have a giant extern "C" block that just has lots of function declarations, and one of them is git_commit_lookup. And here they have the Rust-equivalent types for all of the arguments. This takes a pointer to a pointer to a git_commit, a pointer to a git_repository, and a pointer to a git_oid, and these are the Rust equivalents of those arguments. And this is also how you do FFI in general, is that you declare... So extern, how do I want to describe this in the best way?
The extern keyword, all it really does is change the calling convention used for that function. Wait, if you look at this git_commit_lookup, there are two things that are different with extern. One is that you don't give a body for the function, which is to say this function is not defined here. This is a declaration of the function, but it's not the definition of the function. Or did I get that backwards? It doesn't contain the body of the function, basically. extern is saying this is defined elsewhere, so just look in the symbol table of the binary, and if you see a call to this, actually call that. And then it changes the calling convention: if you just write extern, as in extern fn, or in this case an extern block with an fn inside of it, what it's saying is use the C calling convention for this function. Don't use the Rust calling convention, use the C one, which is what you would expect. And for anything that is extern "C", you have to make sure that all of the arguments are valid C-equivalent types. So basically any struct here has to be repr(C) so that it can actually be used across that kind of call. And so if we went here and looked for struct git_commit, oh, pub enum. Ah, so this is another pattern you'll see sometimes in FFI: pub enum with no variants. What this is saying is that git_commit is an opaque type to us. In a bunch of places in the code we're going to pass around *mut git_commit, and what we're saying is we don't want to try to turn git_commit into a Rust struct. git_commit is defined somewhere in the libgit2 library, but its internals are none of our business. We're only ever going to pass around pointers to it and then use functions to get access to inner fields. If we want to look at the author of a Git commit, we're going to call something like git_commit_author and pass in that same pointer.
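Here is a minimal sketch of those three ingredients together: an extern "C" declaration with no body, an opaque type with no variants, and a repr(C) struct. To keep it runnable without libgit2, it declares strlen from the C standard library instead of a libgit2 function; opaque_handle, Pair and c_strlen are names invented for this example.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// A declaration only, with no body: the definition lives in the C standard
// library, which Rust binaries already link against on common platforms.
// (In the 2024 edition this block has to be written `unsafe extern "C"`.)
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Opaque type with no variants: Rust code can hold a `*mut opaque_handle`
/// but can never construct or inspect a value of this type.
#[allow(non_camel_case_types, dead_code)]
pub enum opaque_handle {}

/// A struct that crosses the FFI boundary needs a C-compatible layout.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Pair {
    pub id: c_int,
    pub flag: c_char,
}

/// Safe wrapper around the C function.
pub fn c_strlen(s: &str) -> Option<usize> {
    // C strings are NUL-terminated, so an interior NUL cannot be represented.
    let c_string = CString::new(s).ok()?;
    // Safety: `c_string` is a valid NUL-terminated buffer that outlives the call.
    Some(unsafe { strlen(c_string.as_ptr()) })
}

fn main() {
    println!("{:?}", c_strlen("sodium"));

    // Pointers to the opaque type can exist even though values cannot.
    let handle: *mut opaque_handle = std::ptr::null_mut();
    println!("{}", handle.is_null());
}
```

The safe wrapper at the bottom is the kind of thing that usually lives in the non-sys crate, with the raw declarations kept in the sys crate.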
So this is a way to say this type is really just an opaque type that we're only ever going to handle through pointers. You'll see this is the case for a lot of these types, and there are some that it's not true for. So git_revspec, for example: you see we actually have a struct defined with repr(C), with fields, and here the assumption is that git_revspec is defined to be the equivalent of how it's defined in libgit2. In this case they've written all of these out manually, and there are some upsides and downsides to this. The upside is that you can control the exact layout of all these types, not just the layout in a memory sense, but you can control the naming of every field; there are some type definitions that are equivalent but differ ergonomically; you can add accessors; you can implement Clone as you want. You have more control over exactly what the bindings are. And over time they're stable: you implemented these, and then you can just keep using them forever. You know that the definition of git_revspec here is not going to change under you, because you wrote the definition. It does however mean that if the underlying library, if libgit2, changed in some way that matters for the bindings you wrote, you're going to have to change those bindings. The alternative to that is to use something like bindgen. What bindgen does is take the C header files and generate a file like this for you.
So no longer do you write this by hand; it gets generated for you by a program. But that also means you have less control. bindgen is sometimes going to generate some very convoluted bindings that are hard to sit down and read manually. They're going to be fairly unergonomic, but they are in general going to be correct. Sometimes there are some really subtle things about making sure the memory layout is exactly the same between the Rust type and the C type that bindgen will know about, and if you hand-write these bindings, you're going to have to know about them. The problem with automatically generating the Rust bindings with something like bindgen is that they might not be stable over time. If bindgen changes, it might start generating different Rust code for the same C code, and so this can be a backwards-compatibility hazard for sys crates. Imagine that your sys crate just calls bindgen in build.rs, and this is a fairly common pattern, by the way. In fact, we can show how that works. So if I pull up the bindgen docs, library usage, tutorial. Yeah, so you add a build dependency on bindgen. Build dependencies are dependencies of your build script. Then you create a wrapper.h, and in our case that's going to include sodium.h. And in your build.rs, I guess we can just copy-paste this whole thing, actually, and we'll see what the contents actually are. We don't need extern crate here. You see this assumes that you already have the necessary link stuff, so this is where you would use something like pkg-config in our case.
We can probably just do link against sodium, rerun if wrapper.h changes, and then we say, you know, bindgen, generate based on what's in wrapper.h and write the bindings out to bindings.rs in OUT_DIR. So if I now run cargo r, it'll build bindgen.
Oh, before I go into this, someone asked why these are empty enums rather than just unit structs. The reason is that you don't want someone to be able to construct an instance of it, because you're saying this type is entirely managed by that library. You don't want someone to be able to construct a pointer to one out of nowhere. And an empty enum is impossible to instantiate in Rust: you can't create a git_commit because there are no valid variants. So this is a way to truly declare that this type is opaque as far as the Rust side is concerned. If it was pub struct git_commit; you could just construct it by writing git_commit, and we don't want people to be allowed to construct one.
Right, our src/main.rs is going to have to change. Let's name this module ffi, and it's now going to include bindings.rs. All right, so now if we go into the build directory under target/debug... actually, let's print that out. It doesn't want to let me do that. Yes, you see, in fact this already prints the path we're talking about. You see we get lots and lots of warnings now. And if I open this file, which is going to be giant, you see this is in the out directory of the build script, bindings.rs. You see it says automatically generated by rust-bindgen, and this is taking sodium.h, which is the main header file for all of libsodium, and generating the equivalent Rust bindings for everything in there. Every type, every function now gets a generated binding. But of course, this is huge. This is lots of stuff, and everything is marked pub, which means that if I publish this as a sys crate and bindgen then changes, well, usually that ends up being a major
version change. But even so, if I try to upgrade bindgen, I have to make sure that my public API did not change at all in backwards-incompatible ways, because if it did, I would have to do a new major release of my sys crate. And as we talked about, because there can only be one crate with links for a given library, I would also have to make everyone who uses my sys crate bump theirs. So it ends up being this giant explosion of things that have to change. Usually you can draw the boundary at the crates that use the sys crate, because their API can usually stay stable, but it's still a major undertaking if your sys crate just uses whatever bindgen produces directly. But it is also really convenient, because it means you don't have to hand-write these bindings. There are a couple of ways around this. For example, you can have a private module that includes the bindings, and then you can have a pub use ffi:: of the specific things you want to expose from there, so that you don't expose the whole thing. You can also, in your build.rs, and we'll do this once we actually start writing these bindings for real, do things like blacklist functions or types, to say any type that contains foobar, don't include it in the bindings. Or you can do it the other way around, with a whitelist: only include things whose names contain, say, crypto_box. That way you're limiting what bindgen will actually generate bindings for. We'll take a look at that when we actually look into the libsodium API. So in general, for sys crates, you want to find ways to ensure that your public interface is at least somewhat stable, and for libgit2 the way they do this is to just check in the bindings. Whether they're auto-generated and then tuned, or entirely auto-generated and just checked in, or hand-written, check them in so that you don't generate them automatically on every build. And
in fact, this is one of the reasons why the bindgen crate comes with a command-line tool as well, which you can use to generate the bindings just as a one-off. Okay, so that's what bindgen does. That's the step for generating these bindings, which libgit2-sys does not make use of; it just hand-writes these instead. Before we get to libsodium, I want to continue walking through the build scripts of some existing crates that do this, just because it's useful to see a little bit of variety here. So this is the ssh2 crate, which provides bindings to the libssh2 C library. It also has a workspace with two crates, one called ssh2 and one called libssh2-sys. If you look at it, you're going to be unsurprised to see that its Cargo.toml says that it links against ssh2 and it has a build script. And if we now go to its build.rs... In fact, let's look at its lib.rs first. You see it too has manually written bindings, for the same reason: it wants to be able to control its stability over time. But you recognize a lot of the patterns, right? So you see these pub consts; if we look at the generated bindings here, you see bindgen also generated a bunch of pub consts, right?
And I bet you there's a... If we go down a little bit. So you see it generates a lot of pub type items, which are going to be type aliases for types that are used elsewhere in the definitions. And some of these we don't really want; it shouldn't need to generate bindings for max_align_t. So some of these we would allowlist or blocklist. But I want to see if I can find some non-trivial types here. Yes, here, for example, there's a type for crypto state. This is an example of something that should probably just be an opaque type, right? We shouldn't be knowing about the field called opaque here. So this is something that in manual bindings we might turn into one of these empty enums, because we know it should never be constructed in the first place. Those are examples of patterns where bindgen doesn't know that we want to treat a type that way. And if we look at the bindgen documentation for Builder, you'll see that it actually has a lot of configuration options. So you can say things like how you want it to generate the equivalent Rust code for C enums, and there are a bunch of different options. If I minimize this, you'll see you can say things like whether to generate comments, which things to allowlist and blocklist. You can inject arbitrary Rust code or C code. You can mark a type as opaque, so that's the case where it's going to generate an opaque placeholder type instead of actually trying to traverse into the fields, and we'll end up using a bunch of this when we work with libsodium. Type aliases; whether to derive different traits, like sometimes for FFI types maybe you don't want to derive any of them and implement them all manually, and you can control that here; whether to support namespacing; what to do with callbacks. There's lots and lots of stuff you can do in bindgen here. So back to ssh2, let's look at its build script. This is going to look pretty familiar, right?
It's going to, down here... So libssh2-sys is a little weird: by default it doesn't use pkg-config. It requires that you specifically opt into it, and there's some debate about whether that should be the case or not. But if it's allowed to use pkg-config to discover the system one, then it will use pkg-config like we talked about, and use find_library for libssh2, which automatically emits the necessary build script Cargo instructions. It also sets the include path, which might be convenient for downstream libraries, and then it returns. So this is the same structure as we saw for libgit2-sys. You'll notice that this doesn't set any version requirements for the library at all, which, if you can avoid it, is nice, because it means that in more cases you can avoid building from source. And you see it does the same thing: libssh2 is checked out as a submodule, so it makes sure you've actually checked out that submodule so that it can build it, and then goes and tries to build that entire C library from source. So it creates a, where is the config, a cc::Build into OUT_DIR, and this will all look familiar. One of the other things it might need to do is set the include paths, and in fact here, you know, this is DEP_Z_INCLUDE. This is an example of one of those Cargo instructions using just key=value.
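The key=value mechanism can be sketched roughly like this. The helper names dep_var_name and dep_include_dir are made up for illustration, and the upper-casing and dash-to-underscore rule is my understanding of Cargo's naming convention for these variables rather than something taken from the libssh2-sys source.

```rust
use std::env;
use std::path::PathBuf;

/// Name of the environment variable through which a dependent's build script
/// sees a `cargo:<key>=<value>` line printed by the build script of the crate
/// that declares `links = "<links>"`.
fn dep_var_name(links: &str, key: &str) -> String {
    // Assumed convention: upper-case everything and turn dashes into underscores.
    fn envify(s: &str) -> String {
        s.to_uppercase().replace('-', "_")
    }
    format!("DEP_{}_{}", envify(links), envify(key))
}

/// Read the include directory exported by a dependency, if there is one.
fn dep_include_dir(links: &str) -> Option<PathBuf> {
    let name = dep_var_name(links, "include");
    // Tell Cargo to rerun this build script if the variable changes.
    println!("cargo:rerun-if-env-changed={}", name);
    env::var_os(&name).map(PathBuf::from)
}

fn main() {
    match dep_include_dir("z") {
        Some(dir) => println!("zlib headers are in {}", dir.display()),
        None => println!("{} is not set", dep_var_name("z", "include")),
    }
}
```

Outside of a Cargo build the variable is not set, so running this directly just reports that; inside a build script that depends on a crate with links = "z", Cargo would populate it.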
So this means that the libz-sys crate emits a cargo:include= of its include path for the z library, the compression library, and we consume that here so that we can tell our build of libssh2 to also discover the include path for the compression library, by using this environment variable that Cargo automatically sets based on cargo:key=value. And you see that if we consume an environment variable, we also tell Cargo about it, so that we get rerun if it changes. And it does some stuff to parse out which version of libssh2 we actually used and stores that in a file, then builds libssh2, and then it's done. You'll see the mention of this vcpkg as well. This is for when you're on Windows using MSVC: then you can't use pkg-config because it's Windows, so you have to use vcpkg instead. So nothing terribly surprising here either. You'll see it makes slightly different decisions about whether to use things from the system or not. And there are other examples here. So the openssl-sys crate is a little bit of a beast, because there are so many different versions of OpenSSL and different variants, like LibreSSL as well. You'll see that it actually has a subdirectory for its build script, so it's build/main.rs, because they have all these other files to use. If we look at the main.rs here, it looks somewhat familiar: it finds OpenSSL, links against it, and tries to find the version by looking at the include directories. determine_mode is where it tries to figure out whether you want to build against OpenSSL statically or as a shared library. And if we look down here at determine_mode, it's down here somewhere.
I think. So it reads this environment variable called OPENSSL_STATIC. This is also a pretty common pattern in sys crates, where there's an environment variable named after the library with a _STATIC suffix. If it's set to 0, then we don't statically link; if it's set to anything else, then we do statically link; and if it's unset, then we use whether a .a and a .so exist to decide what to do. If only the .a exists, then we statically link; if only the .so exists, then we dynamically link; and if both exist, then we link dynamically. So the OpenSSL one is taking a lot of care to try to make sure that every possible combination of flags works, and I don't think we're going to dig too much into all the details of what it does. But if we go down here a little bit further in main, you'll see it calls this function find_openssl, which says: if the vendored feature is enabled and we haven't explicitly said don't vendor through an environment variable, then use this find_vendored module to get OpenSSL. find_vendored is going to check out the source, do a build, and then link against it. Otherwise, so that means if the feature isn't set, or the feature is set but the environment variable to not vendor is set, then find it from the system paths, which is going to use pkg-config. Okay, so this is a very long procedure of steps, but you'll see that they're fairly similar between these different libraries. The general pattern is: if you can use it from the system, then use it from the system; otherwise, build it from source. What I would recommend, if you're doing this yourself for some library, is to think really hard about whether it's worth vendoring. Vendoring is convenient for consumers, right?
Because it means that if they don't have the library installed locally, you just build it for them and it works. But the chance that building it from source is extremely complicated and error-prone is pretty high, and so there's a decent argument that if you can't find it on the system, you should just error and tell the user to install the library instead. Like in the case of libsodium: if someone doesn't have libsodium installed, I don't want to have to build libsodium from source. So instead our build script can just issue an error saying install libsodium and then try again, and that's what we're going to do as well. We're not going to try to figure out how to build it from source through cc in Rust; I don't think it's worth it. The other thing is that how you decide whether or not to vendor is also somewhat convoluted. It seems like the general best practice that's emerged is that you have a feature that lets people opt into vendoring if they specifically need vendoring, I don't know why that would be, but if you specifically need vendoring. And then you have an environment variable that overrides the feature, saying never vendor. The reason this is useful is that there are some users who have very strict requirements about how source code is brought into their builds. So if they're trying to do hermetic builds, for example, or if you're at a company and you want to make sure every source that's brought in has been checked in all sorts of ways, you generally don't want any vendoring; you want to make sure that everything is provided by your build system. Imagine you're in something like Buck or Bazel or whatever: you want it to error if OpenSSL wasn't available, because that means it hasn't been declared in the standard build environment. You don't want most of your application to use one OpenSSL but your Rust parts to use a different OpenSSL; that's usually not what you want. And so in general you want to provide this kind of override mechanism to
say: if you can't find it on the system, then I just want an error; I don't want you to build it yourself. Okay, so now we've looked at a bunch of these other libraries and how they do their bindings, so let's now talk about libsodium. I'm not going to go through all of what libsodium does; that's not really important for what we're looking at here. Also, libsodium does have Rust bindings. There's a crate called libsodium-sys-stable, which does basically everything that we're going to do today and more. You'll see it has a build.rs, and that build.rs, you're going to be unsurprised to learn, uses pkg-config to find the library, and if it doesn't find it, it vendors it by building it from source, like all of the stuff that we just saw. And it too has, in its source, sodium bindings, which have all of these auto-generated things from bindgen. It has a script that calls bindgen to regenerate that file, but that way it's not automatically run, so they can choose when they regenerate the bindings. And then it has a lib.rs which just pub uses everything from the generated bindings. So what we're building today is not intended to be published as the thing to use; use the one that already exists. In fact, there are even nice ergonomic wrappers around the FFI stuff, which is usually how you want to structure this, right? You want the sys crate to just be the FFI bindings, and you write another crate that provides safe and ergonomic wrappers around everything. Vendoring usually means built from source, yeah. Okay, so let's now figure out what to do about libsodium. The place we're going to start is installation. I've already installed libsodium.
I installed it just on my system. So let's go over here; I actually want to exit from that, and I want to do cargo new --lib, and we're going to call it libsodium-sys. In our Cargo.toml we're going to say links = "sodium". For our build-dependencies, we're going to steal that from over here. And then we're going to do exactly what they tell us to do, which is a wrapper.h that just includes sodium.h. This is basically copying over what we had elsewhere. We're going to copy the build.rs as well. In this case I'm not going to be too concerned with API stability, so I'm just going to run bindgen at build time.

We can do a little better here, though: we could use pkg-config. If we go to docs.rs for pkg-config, in our Cargo.toml we add another build dependency, pkg-config = "0.3". You'll see in the docs for pkg-config that it says it's not recommended to have no version requirement. The reason is that it's unlikely your bindings truly have no version requirement. When you generate the bindings, you generate them from a particular version of the C header files, and that means they're going to include functions that exist in that version but not in earlier ones. So you generally do want to include an at-least version, and we'll do that too, and then probe sodium. If you're curious which version of libsodium is installed, you can provide version requirements to the pkg-config command too. Here I just put in a version to try, and it says the requirement "libsodium > 1.0.18" could not be satisfied because libsodium has version 1.0.18, which suggests that's where we should start, right?
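That error message comes down to the difference between a strict "greater than" requirement and an "at least" requirement. Here's a small sketch of that difference for plain dotted versions. The helper names are hypothetical, and this is a simplification rather than pkg-config's actual comparison algorithm; in a build script, the at-least form is what the pkg-config crate's atleast_version setting expresses.

```rust
/// Parse a dotted version such as "1.0.18" into its numeric components.
/// Returns None if any component is not a plain number.
fn parse_version(v: &str) -> Option<Vec<u64>> {
    v.split('.').map(|part| part.parse::<u64>().ok()).collect()
}

/// True if `found` satisfies "at least `min`" (found >= min).
fn at_least(found: &str, min: &str) -> Option<bool> {
    let found = parse_version(found)?;
    let min = parse_version(min)?;
    // Vec comparison is lexicographic, which matches component-wise ordering here.
    Some(found >= min)
}

/// True if `found` satisfies a strict "greater than `min`" (found > min).
fn strictly_greater(found: &str, min: &str) -> Option<bool> {
    let found = parse_version(found)?;
    let min = parse_version(min)?;
    Some(found > min)
}
```

With an installed 1.0.18, the strict form against 1.0.18 fails while the at-least form succeeds, which is why bindings generated from 1.0.18 headers want the latter.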
We're generating bindings based on 1.0.18, so that's what we're going to use. pkg-config is going to generate the link-search and link-lib instructions for us, so those we can just remove. We do want to make sure that if wrapper.h changes, we rerun the build script. And this bindgen config we're almost certainly going to have to change. If we go back to the bindgen docs and go to the Builder, we want allowlist_function. So, back in the libsodium docs, we need to start somewhere, right? Usage, or quick start, maybe. Quick start. Okay, there's a function called sodium_init. I just want to check that the FFI bit works, so for now we're going to allowlist only that function and generate the bindings based on that. Why does it not...? Am I blind? Oh, it's because there's a newer version of bindgen; the tutorial is wrong. That's why.

The pkg-config command-line tool is a Linux tool; it is not provided by a Rust package. Okay, so it says pkg-config could not find the system library sodium. Oh, that's because it's not called that; it's called libsodium. Great. Now if we build with verbose output, after touching wrapper.h, which is going to redo a bunch of things, you'll see at the end of the build that it includes -l sodium, which comes from pkg-config. You'll also see it passes a capital -L.
That's the search path /usr/lib. I'm going to call this a bug in pkg-config: it will emit the Cargo instruction for setting the library search path even if that path is the standard location for shared libraries. We want to opt out of that, because it's just wrong and it sometimes causes really hard-to-debug problems. So I want print_system_libs(false). If I now run this, you'll see it just passes -l sodium, which works because libsodium is in /usr/lib for me.

Okay, so now, at least in theory, our build.rs is going to generate the bindings for us, and hopefully it generated just the sodium_init function. Our src/lib.rs currently doesn't do anything. What we want here is something like a mod ffi that does include!(concat!(env!("OUT_DIR"), "/bindings.rs")). What does bindgen tell me to do here? Yeah, great. And they say to allow these lints, so we'll allow them on just that module. In theory we should now be able to do pub use ffi::sodium_init. You see I got autocompletion here; that's because the build script was run, and this include is just a regular macro that includes that file, which exists. You can see that the signature bindgen generated for sodium_init is an unsafe fn that returns an i32. If we go to definition on this, it's not going to let me do that. That's fine; I want that file. It's going to be in target/debug/build. If I ls there, you see it generates directories for all sorts of dependencies that are built. In our case what we want is libsodium-sys, which has a bunch of different files, and the one in particular we want is anything that ends in /out/bindings.rs. Why are there two of them?
I'm not sure, but let's go look at this file. This is what bindgen generated. It's an extern "C" block, which says these functions are defined elsewhere and use the C calling convention. It's a pub fn. Every function in an extern block is inherently unsafe, because we don't know whether the signature even matches what the real function does, much less what that function does in the first place. So there's no need to write unsafe here; it just is unsafe. sodium_init takes no arguments, which makes it really easy to generate, and the return value is a c_int. Great, so far so good.

If we go back to the libsodium docs, you'll see they also recommend using pkg-config to link against it. They say that in int main, all you have to do is call sodium_init, and if it returns -1 then it's an error; otherwise use the library. In fact: sodium_init initializes the library and should therefore be called before any other function provided by libsodium. It's safe to call this function more than once and from different threads; subsequent calls won't have any effect. This is pretty common in C libraries, that you have an initialize function you have to call. For GTK it's the same.

There are ways in Rust to run a function at load time. There's a crate called ctor that lets you annotate a function with ctor, and it'll run before main as long as your crate is actually included. We could do this: we could have a ctor for our library that runs sodium_init, and then everything else just uses it. The downside of using something like ctor is that if there's an error, there's no good way to report it to the user. I wonder if the docs tell us anything more: the sodium_init function must be called before any other function; it's safe to call sodium_init multiple times or from different threads; it will immediately return 1 if the library has already been initialized.

So what we probably want here is something more like this. Here I'm doing both the FFI bindings and the nice thing around them in the sys crate, which is sort of a no-no; this is just to demonstrate what's going on. We declare a pub struct Sodium. We're going to mark it non_exhaustive because we want people not to be able to construct this type without explicitly calling our constructor, but we're going to say it's Clone and it's Debug. Then we do impl Sodium with a pub fn new that returns a Result of Self or, well, we don't really know the error type yet. What it does is call ffi::sodium_init in an unsafe block, and if the result is less than zero, then it returns Err. The intention is that any other function we want to declare that uses libsodium is going to be a method on this type. I don't have a good example yet... Ah, bindings for other languages, in the quick start and FAQ: "I want to write bindings for my favorite language. Where should I start?" Start with the crypto_generichash API. Okay, great, so that's probably the function we want to start with.

So there's going to be a method like this, and I don't know what it's going to return yet. By doing this, what we're basically guaranteeing is that the user will have satisfied the library invariant that sodium_init has been called first, by virtue of having a Sodium, which they can only get by calling new. There are other ways to do this too. You could have an init function and a static HAS_BEEN_INIT, which could be something like a OnceCell of a bool that starts out as false, and then init would basically say: if it hasn't already been initialized, initialize. That way you don't need to keep self around; you could just have every function down here assert HAS_BEEN_INIT. The problem with doing it that way is that you're enforcing this at runtime rather than at compile time. With an API like the one we originally started with, you guarantee at compile time that init has been called.

non_exhaustive on a struct mostly has the effect that external users, as in users of this crate's library API, cannot construct one of these or exhaustively destructure one; they have to use our constructors. Basically, this has the same effect as giving the type a non-public field, but without needing the extra field.

Okay, so this now does FFI. If we write a test here, it_works, we should be able to do Sodium::new().unwrap() and run cargo test. Okay, so this called sodium_init, right? We know that it works because Sodium::new calls sodium_init, and it should return an error if that returns a value less than zero. But it returned Ok, because the unwrap didn't fail. We saw that it links against libsodium, and this test actually calls that function, so we now have FFI working. Okay.
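Here's a minimal sketch of that "you can only get a Sodium by initializing" pattern. To keep it runnable without libsodium installed, the ffi module below is a stand-in written in Rust, not the bindgen output; in the real crate it would be the included bindings.rs with the extern "C" declaration. The stand-in's return values (0 first, then 1) are an assumption modeled on the documented behavior.

```rust
// Stand-in for the bindgen-generated module, so the sketch runs without libsodium.
mod ffi {
    use std::os::raw::c_int;
    use std::sync::atomic::{AtomicBool, Ordering};

    static INITIALIZED: AtomicBool = AtomicBool::new(false);

    // Mimics sodium_init: 0 on first success, 1 if already initialized.
    // (A real failure would be reported as -1.)
    pub unsafe fn sodium_init() -> c_int {
        if INITIALIZED.swap(true, Ordering::SeqCst) {
            1
        } else {
            0
        }
    }
}

/// Proof that the library has been initialized.
/// non_exhaustive stops other crates from writing `Sodium` directly,
/// so the only way for them to obtain one is `Sodium::new()`.
#[non_exhaustive]
#[derive(Debug, Clone)]
pub struct Sodium;

impl Sodium {
    pub fn new() -> Result<Self, ()> {
        // SAFETY: sodium_init takes no arguments and may be called more than once.
        if unsafe { ffi::sodium_init() } < 0 {
            Err(())
        } else {
            Ok(Sodium)
        }
    }
}
```

The check is for "less than zero" rather than "not zero" so that a repeated call, which reports 1, still counts as success.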
Is there anything on the path to where we are right now that doesn't make sense, or that you'd like me to go into in more detail? Is there anything you think it would be useful to get a second explanation of?

"I always wanted to port or write a wrapper for a simple C library for learning purposes, but every time I try to tackle that it becomes very complicated to understand the C code." Usually you shouldn't need to understand the C code as much as you need to understand the C API. And with something like bindgen you might not even need to do that, because it'll generate the equivalent Rust types for you, and all you need to figure out is the semantics of these functions: basically, how do you turn the direct, unsafe C API into a nice ergonomic interface? That can definitely be some work, like we did with Sodium::new just now. We'll try to do crypto_generichash too, just to see whether we can get it to work. But most of the time you shouldn't need to dig deep into the C code itself.

Okay, so let's see if we can get crypto_generichash to work as well. We're going to go back to our build.rs and, in addition to allowlisting sodium_init, allowlist crypto_generichash. Great, and let's see what that generates. If I go back to my lib.rs, we can in fact get rid of the pub use of the ffi item here. Again, remember that the FFI bindings should probably be a separate sys crate, and the ergonomic interface should not be in that sys crate. The reason is that you're more likely to make breaking changes to the wrapping API, and you don't want to have to make breaking changes to the sys crate, because those are really annoying. But for now, just for exposition, let's pretend that ffi here is a separate crate.
This is a little easier to set up. So let's look at the binding we got here. All right, so crypto_generichash does this. Aha: the crypto_generichash function puts a fingerprint of the message in, whose length is inlen bytes, into out. The output size can be chosen by the application. The minimum recommended output size is this constant. Okay, so there's a constant here that we also want, so allowlist_var. In fact, we probably want anything that starts with this prefix, because there's a constant for the minimum, the maximum, and the recommended number of bytes. Here they're basically giving the invariants that this API expects, and these are things we should turn into invariants in our Rust code that we assert and panic on if they're violated.

Why did the bindings use c_int? The C integer types are not necessarily the same as the fixed-size Rust ones like u32. Sometimes the C ones are platform-dependent, for example, and we want to capture that in the generated bindings, which is why we use c_int instead.

Okay, so this suggests that there's going to be an out, which is going to be a &mut [u8]. Wait, in fact, because we don't require this to be initialized already, we could here use... I'm completely spacing on the name. MaybeUninit, that's what it's called. So we're going to take the input, which is going to be a slice of u8, a key, and an output. There's an argument here for, instead of having the caller provide the output, allocating a vector internally, for example. But if we now go back to the definition of crypto_generichash, we want to assert that the output size is at least this, so we assert that out.len() is greater than or equal to this constant. "Expected usize, found u32." That's interesting.
I wonder why; let's go look at the bindings again. I don't actually want it to do that; I just want it to regenerate the bindings. target/debug/build/libsodium-sys-*/out/bindings.rs. Yeah, so you see it did bring in all of these consts now. I wonder why they're defined as u32s. You see it brought in exactly the two functions we wanted, and then all of the consts. So this is one way to build up the bindings incrementally, and a useful one: allowlist specifically the things you're ready to handle.

Okay, so what this means is that we need as usize here. In fact, Clippy would also yell at us for this: we'd rather have a conversion that gives us an error if the type ever changed in the future and might no longer fit in a usize. But usize::from for a u32 is not defined. What else did they require? The minimum output size and the minimum recommended. Okay, so we know it has to be at least the min, we know it has to be less than or equal to the max, and we're not going to deal with the recommended one, because it's not something we have to enforce in the API; it's up to the user to follow that. The key can be null, so you don't have to provide a key. There's a recommended key size too, but the key handling is going to be the same, right? Let's go to our build.rs. We should still get, yeah, KEYBYTES_MIN and KEYBYTES_MAX.
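As a small sketch of that u32-versus-usize point: since usize::from(u32) doesn't exist, one option besides a bare as cast is usize::try_from, which fails loudly if the value ever stops fitting. The constants below are stand-ins for the bindgen-generated ones; the values 16 and 64 are what I believe libsodium defines, but treat them as placeholders. The helper names are hypothetical.

```rust
use std::convert::TryFrom;

// Stand-ins for the generated constants, typed u32 as seen in bindings.rs.
#[allow(non_upper_case_globals)]
pub const crypto_generichash_BYTES_MIN: u32 = 16;
#[allow(non_upper_case_globals)]
pub const crypto_generichash_BYTES_MAX: u32 = 64;

/// Convert a generated u32 constant to usize, panicking if it would not fit.
fn to_usize(v: u32) -> usize {
    usize::try_from(v).expect("u32 constant does not fit in usize")
}

/// The invariant the wrapper asserts before calling into C:
/// the output buffer length must be within [BYTES_MIN, BYTES_MAX].
fn out_len_is_valid(len: usize) -> bool {
    len >= to_usize(crypto_generichash_BYTES_MIN)
        && len <= to_usize(crypto_generichash_BYTES_MAX)
}
```

In the wrapper itself this would sit inside an assert!, so a caller passing a badly sized buffer gets a panic instead of undefined behavior in C.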
So we're going to do the same thing here: if let Some(key) = key, then we assert that the key length is at least KEYBYTES_MIN and at most KEYBYTES_MAX. Then we should be able to pass out.as_mut_ptr(), out.len(), input.as_ptr() for the input, and input.len() for the input length. For the key and key length: if let Some(key) = key, then key.as_ptr() and key.len(); otherwise it's going to be ptr::null() and zero. It didn't like that. Did I do something foolish?

So the out is a MaybeUninit, and the reason is that we know it's going to be overwritten by crypto_generichash, so we're totally fine with being handed uninitialized memory, because we know we're going to overwrite it. And so what we can in fact do is say that the function returns a &mut [u8]. But in order to do that, we need to turn out into the appropriate pointer type when we pass it in here. That's fine. And the lengths need to be cast, as u64. Go to definition doesn't work here, because the definition is the include!, which is a little sad. If I go to definition on that, I just get the include!. There might be a way to go to the implementation, but I can't find it now.

"This operation is unsafe." That's fine. So here we can write a safety comment: we've checked the requirements of the function, the min and the max, and the presence of self means that init has been called. I don't know what the result here is. What does it return? Why do the docs not just say what the return value is? That's unhelpful.
I'm guessing it's just success or failure. We probably want a real error value here, but it sounds like a lot of these functions just use a generic failure error, so in this case we're just going to use unit. So we say: if res is less than zero, which is the same as what we did for init, then we return Err. Otherwise we need to turn out into the appropriate thing. We know crypto_generichash has overwritten the contents of out, so what we'll do here is MaybeUninit::slice_assume_init_mut of out. And I think there's also a MaybeUninit::slice_as_mut_ptr, which is what we actually want up here. These are of course unstable library features, because why would anything be stable. Fine. So we need the feature maybe_uninit_slice.

This is also pretty common: you're converting between the types the C API expects, which is usually a pointer to the first element of a slice, and the real Rust slice types. In our case, when we call into C, we turn them into raw pointers, and then coming back, we do whatever checking we might need to do and then basically assume that the C library did what it promised, and that out now contains the appropriate bytes. So here we can write a safety comment: crypto_generichash writes to, and thus initializes, all the bytes of out.

The question then becomes: what does crypto_generichash actually promise? Does it promise that it writes to all the bytes of out? It doesn't actually say that it does, but it says the output size can be chosen by the application. Someone is saying in chat that BLAKE2b, which is what crypto_generichash uses, guarantees that it writes the number of bytes you dictate, which is what we're going for, as long as it's no more than the max, which we check up here. So as long as that is true, this is fine.

So now the question becomes: can this actually hash something? I'll add a test, it_hashes, that does Sodium::new().unwrap(). And in fact, Sodium here can be Copy too; all that matters is that new has been called at least once. That makes it a little more convenient to use, and because it can be copied, this method can just take self. So in it_hashes we should be able to call s.crypto_generichash of... I don't know what the input should be here. I wonder if they have an example; it'd be useful to have something to test against. Aha, nice, let's use this one. So the input is going to be b"Arbitrary data to hash". The b prefix says to treat this as a byte string rather than a UTF-8 string. For the key, we'll say we don't have one. And the out... it's a little annoying that we have to have an out here. One thing we could do, with slightly fancier const generics, is make this generic over the lengths of the output and the key, and then put restrictions on them to say this must be an array that's longer than min and shorter than max. But that's out of scope for this particular implementation. Here I just want to see that it works.
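Putting the pieces of the wrapper together, here's a sketch of its shape. Two things differ from the stream, and both are deliberate substitutions: the ffi module is a stand-in written in Rust with a toy mixing function (it is not BLAKE2b and ignores the key), so the example runs without libsodium; and instead of the unstable slice helpers it uses plain pointer casts, which work on stable. The constant values are assumed placeholders.

```rust
use std::mem::MaybeUninit;
use std::os::raw::{c_uchar, c_ulonglong};
use std::ptr;

// Stand-in for the bindgen output so this runs without libsodium installed.
#[allow(non_upper_case_globals)]
mod ffi {
    use std::os::raw::{c_int, c_uchar, c_ulonglong};
    use std::slice;

    pub const crypto_generichash_BYTES_MIN: u32 = 16;
    pub const crypto_generichash_BYTES_MAX: u32 = 64;
    pub const crypto_generichash_KEYBYTES_MIN: u32 = 16;
    pub const crypto_generichash_KEYBYTES_MAX: u32 = 64;

    // Same parameter list as the C function, but a toy mixer, NOT BLAKE2b.
    // It ignores the key and writes exactly `outlen` bytes to `out`.
    pub unsafe fn crypto_generichash(
        out: *mut c_uchar,
        outlen: usize,
        input: *const c_uchar,
        inlen: c_ulonglong,
        _key: *const c_uchar,
        _keylen: usize,
    ) -> c_int {
        let input = unsafe { slice::from_raw_parts(input, inlen as usize) };
        let mut state = input
            .iter()
            .fold(0x5a_u8, |acc, &b| acc.wrapping_mul(31).wrapping_add(b));
        for i in 0..outlen {
            state = state.wrapping_mul(31).wrapping_add(i as u8);
            unsafe { out.add(i).write(state) };
        }
        0
    }
}

/// Holding one of these is taken to mean sodium_init has been called.
#[derive(Debug, Clone, Copy)]
pub struct Sodium;

impl Sodium {
    pub fn crypto_generichash<'a>(
        self,
        input: &[u8],
        key: Option<&[u8]>,
        out: &'a mut [MaybeUninit<u8>],
    ) -> Result<&'a mut [u8], ()> {
        assert!(out.len() >= ffi::crypto_generichash_BYTES_MIN as usize);
        assert!(out.len() <= ffi::crypto_generichash_BYTES_MAX as usize);
        if let Some(key) = key {
            assert!(key.len() >= ffi::crypto_generichash_KEYBYTES_MIN as usize);
            assert!(key.len() <= ffi::crypto_generichash_KEYBYTES_MAX as usize);
        }
        // No key is represented as a null pointer with length zero.
        let (key_ptr, key_len): (*const c_uchar, usize) = match key {
            Some(k) => (k.as_ptr(), k.len()),
            None => (ptr::null(), 0),
        };
        let out_ptr = out.as_mut_ptr() as *mut c_uchar;
        let out_len = out.len();
        // SAFETY: the length requirements were asserted above, all pointers come
        // from live slices (or are null for the optional key), and holding a
        // Sodium means initialization has happened.
        let res = unsafe {
            ffi::crypto_generichash(
                out_ptr,
                out_len,
                input.as_ptr(),
                input.len() as c_ulonglong,
                key_ptr,
                key_len,
            )
        };
        if res < 0 {
            return Err(());
        }
        // SAFETY: on success the callee wrote, and thus initialized, all
        // out_len bytes; MaybeUninit<u8> has the same layout as u8.
        Ok(unsafe { std::slice::from_raw_parts_mut(out_ptr, out_len) })
    }
}
```

The returned slice borrows from out for the same lifetime, so the caller gets back an initialized view of the buffer they passed in rather than a second, independent reference.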
So what we'll do is a let mut out. What do they use for out here? They do this. So let mut out equals... I wish there was a sort of alloc here. Oh, maybe there is, actually: we can make an array of MaybeUninit::uninit() whose length is ffi::crypto_generichash_BYTES as usize. Then here we pass &mut out, and now these bytes should be the hash of the input. Let me bring in, under dev-dependencies, something like the hex crate, 0.4.3, and I want to println! the hex::encode of bytes. Now if I run cargo test... okay, both tests pass, which doesn't really help us here. Oh, and we probably want to say that dead code is okay in the ffi module too; this is a bunch of "never used" warnings for the constants. What I want is to run it_hashes and see its output. Okay, so it prints a hash, and the hash is the same every time; that seems promising. And if I change the input to "Arbitrary data to hash in Rust" and run it, I get a different hash. So it seems like we have the hashing API working.

And that's sort of all there is to this part of FFI. You make sure you get the raw definitions working, the raw externs and the raw types, which bindgen can really be helpful with. Then you write wrappers that make the assertions needed by the real API's invariants and structure it with more ergonomic Rust types, and then you have a safe API that's implemented on top of the raw API.

Check it against b2sum? Do I have b2sum? I do have b2sum. That's a good point. I wonder if that will work, actually. That's unhelpful. Can I set the output size here?
There's a length option. I need to know what this length is: 32 bytes, so if I pass -l with 32 times 8, 256... This first one is going to be different because the input includes a newline. And hey, with that fixed, we get the same hash. So the hash we got from doing the FFI is the same as this hash, which is what we get from the real BLAKE2. Nice.

Great, so now we have working FFI in this direction. You might wonder: what about going the other way? What if the C code wants to call Rust code? The process is mostly the same. Let's say the C API expects a function pointer of some kind. Well, you can just declare an extern "C" fn, call it this_is_rust, and you have to make sure that the arguments are valid C types. I thought I could just cheat here, but I guess not. So you declare a function like this, and then, if there were some FFI function that required a function pointer to it, you can just pass this_is_rust. I think you have to cast it specifically to a fn pointer type, but that's all you really need to do, and then the C code can treat it just like it would any other function pointer in C itself.
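A minimal sketch of the callback direction. The function apply_twice is a hypothetical stand-in for a C API that takes a function pointer; it's written in Rust here only so the example runs without a C compiler. With real bindgen output, the parameter type is typically an Option of an unsafe extern "C" fn, so the callback would be wrapped in Some.

```rust
use std::os::raw::c_int;

/// A Rust function that uses the C calling convention and only C-compatible types.
extern "C" fn this_is_rust(x: c_int) -> c_int {
    x + 1
}

/// Stand-in for a C function that accepts a callback and invokes it.
extern "C" fn apply_twice(cb: extern "C" fn(c_int) -> c_int, x: c_int) -> c_int {
    cb(cb(x))
}

/// The function item coerces to the function pointer type at the call site,
/// so no explicit cast is needed in this case.
fn demo() -> c_int {
    apply_twice(this_is_rust, 40)
}
```

From the callee's point of view there is nothing Rust-specific about the pointer it receives; it is called like any other function with that signature.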
It just happens to be calling a Rust function. The things you want to make sure of here are that the function is extern, and that all the arguments and the return type have valid representations in C and match what the C code expects to be calling.

The other thing you want to be careful about is memory allocation. In general, if you allocate memory in Rust, you want to make sure it gets freed and dropped in Rust. If it gets allocated in C, you want to make sure it gets deallocated in C as well. Where things get weird is this: imagine there's a C function that expects to call some callback you provide, and that callback is going to return memory. It calls a Rust function that returns a vector or something, cast into the appropriate array pointer for C. If C then tries to free it, you're going to be in for a bad time. So you want to keep track of where the allocations and deallocations happen, and usually it's best if it all happens either in the C code or in the Rust code, rather than trying to mix and match and remember.

The other thing that's worth knowing about is no_mangle. You can put no_mangle on a function so that its name ends up exactly like this in the final binary's symbol table. If you don't, the compiler is still going to compile the function, but it's going to have a sort of auto-generated name. That isn't a problem if you only ever pass pointers to it, but it can be a problem if you actually want to name this function from C. Let's pretend this is C for a second: imagine you have an extern declaration in C for this_is_rust that returns an int. Then the C code expects the final symbol table of the binary to have a function called this_is_rust. So there's no function pointer being passed here.
It's just that the C implementation expects this function to exist under that name. In that case you actually have to declare it as no_mangle, to ensure that it gets included in the final binary (so it's basically pub, and it should also be pub) and that it retains its name, so it ends up under the name the C implementation expects. But that's all there is; there's nothing more special about FFI going the other way. Just make sure you match the calling convention, the types, and the representations, and then you're kind of good.

Okay. "Are you duplicating the out reference? Is that allowed?" I'm not duplicating the out reference I'm given, because what I'm saying here is that the return type is tied to the same lifetime as out. So if someone gives me a mutable borrow, the mutable borrow I give them back depends on that borrow, and they're not allowed to continue using the original mutably as long as the returned value still lives. We can check that in the test: bytes here is referring into out, so I should not be allowed to do something like out[0] = 1, or even read it. And indeed I get the error "cannot use out because it was mutably borrowed" up here. One thing I could do is restrict this a little and say that the thing you get back is read-only, but there's no real reason to do that.

Okay, I think that's actually all I wanted to cover. We've talked about build scripts in a lot of depth, we've talked about FFI in both directions, and we did a basic binding to libsodium, which we've tested actually works. Is there anything else I want to talk about here?
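Going back to the borrow question for a moment, here's a stripped-down sketch of why returning the buffer isn't duplicating the reference. The fill function is a hypothetical stand-in for the wrapper: it takes a mutable borrow and hands back a mutable borrow with the same lifetime.

```rust
/// Simplified stand-in for the wrapper: writes into `out` and returns the same
/// buffer, with the returned borrow tied to the lifetime of the borrow passed in.
fn fill<'a>(out: &'a mut [u8]) -> &'a mut [u8] {
    for (i, b) in out.iter_mut().enumerate() {
        *b = i as u8;
    }
    out
}

fn demo() -> u8 {
    let mut out = [0u8; 4];
    let bytes = fill(&mut out);
    // Uncommenting the next line is rejected by the borrow checker, because
    // `bytes` still holds the mutable borrow of `out` and is used below:
    // out[0] = 1;
    let last = bytes[3];
    // After the last use of `bytes`, the borrow has ended and `out` is usable again.
    out[0] = 9;
    last + out[0]
}
```

So there is only ever one live mutable path to the buffer at a time: first the caller's, then the returned one, then the caller's again.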
Let's see. So the Cargo book gives examples of what to use build scripts for: bundled C libraries, finding C libraries, generating Rust modules, platform-specific configuration. There are actually two last things I wanted to talk about, but let's do autocfg first. autocfg tries to do something that a bunch of different crates were previously using build.rs for in sort of ad hoc ways, which is basically compiler feature detection. The idea is that sometimes you want your code to be compatible with old versions of Rust, but on newer versions of Rust you want to take advantage of newer features. autocfg lets you do this, because what it basically does is test-compile a program and emit a cfg that you can do conditional compilation on, based on whether or not that program compiles. So, for example, it can declare a cfg if the compiler has a given type available, or if it supports nightly features. There's a lot you can use this for. Very commonly it's used for detecting whether you can use a particular nightly feature, whether a given type is available, whether a trait exists, whether a constant exists, or for getting the sysroot from rustc in case you need that somewhere else in your binary: basically all of the things you can detect in build.rs to figure out which conditional compilation properties you can make use of. autocfg is a very light dependency; it has no transitive dependencies, and it's intended entirely to be used in build.rs. It's used by things like, I think, anyhow, to figure out whether it can use nightly features to do things like make backtraces nicer.
And so that's one way to have your crate optimistically, or conditionally, use nightly features rather than making the whole crate nightly-only. There is some work on cfg(accessible). This is also a cool feature that we don't have yet, but it's something they're working on adding to the language. There's another one, which I think is cfg(version). Both of these are going to let you do conditional compilation based on whether a given path, so a type, trait, or constant, is available in the version of Rust being used. That's basically going to mean you can do a lot of this without needing autocfg.

The other thing I wanted to talk about: we talked about bindgen, which lets you generate Rust bindings based on C header files. There's also a tool called cbindgen, which is the inverse: it takes a Rust API and generates C header files that then let you call the Rust code from C. This is useful if, say, you have a Rust library that you build as a shared library and you want to be able to use it from Python or Node.js or just C or C++, where those languages know nothing about Rust but do know how to use the C ABI. So you generate a C header file that they can then use with basically their equivalent of bindgen to get bindings into their language, and thus call your Rust code through the C ABI.

If you happen to be talking between Rust and C++, bindgen doesn't work that well for C++ header files, and calling Rust APIs from C++ when you're constrained to the C ABI is a little limiting. So there's a great crate called cxx, which tries to let you build a more ergonomic interface specifically between Rust and C++. It does require that you make some changes on both sides of the interface, so you have to be able to control both the C++ code and the Rust code. But if you do, you can use this crate to define the bridge between the two languages and have it generate bindings that are much nicer for each language. So if you happen to be in that situation, I recommend giving cxx a look.

Okay, I think that's where I want to end. I don't think there's anything else I wanted to talk about. There are examples of a lot of the things I talked about today in the Cargo book under build script examples, so definitely give that a look too if you want to refresh some of this after the fact. Any questions at the tail end here before we end for the day? I think two hours was a pretty good estimate; happy with that. I don't know when the next stream is going to be; I've given up on trying to promise. I've gotten some good suggestions for more Crust of Rust episodes I could do, and I do have some longer implementation streams I want to do, but I'm not going to promise when I'm going to do them.

All right, in that case, thank you for coming out, everyone. Hopefully this was interesting, and I'll see you all next time, whenever next time ends up being. If you want to keep an eye out for when I stream, you can follow me on Twitter; I'm generally pretty good about mostly tweeting things that are related to my streams, or at least related to Rust. There's also a Discord for Rustacean Station, the podcast, which also has sub-channels for streams, so you can keep an eye out there too; I'll put the link in the video description. All right: so long, farewell, auf Wiedersehen, goodbye.