about using Rust from C and any other language. I gave a talk at a meetup a month or so ago where I went into great technical depth about exactly what Skylight's strategy is, and that talk is gonna be way better for the technical details. What I wanna talk about today is more the philosophy of how you should think about doing it. It took me a while of writing Rust code that interacted with Ruby, which is our use case, to figure out philosophically how to think about it so that you avoid bugs, and I'll talk more about why those bugs end up happening. First I wanna give my usual spiel about Rust. I think the thing that's exciting about Rust in general, and it's reflected in a lot of people in this room, is that Rust is fundamentally a language that enables a lot of people who weren't able to write systems code to write systems code now. Without going into great detail about what systems code is, here's what that means. There are a lot of companies in the world that have a lot of engineers, and they have some wizards floating around, and those wizards' job is to write the C++ code that powers the company. And really, in order to build a project that is very efficient in those spaces, you do need some wizards floating around who know C++. What that has meant historically is that people at competing companies or startups haven't really been able to compete in any area where you needed a legion of C++ wizards to get the job done. So there are all these areas where startups just couldn't really compete, unless you happened to be a startup made up of people who quit one of those companies. But you, a regular person, a web developer, could never expect to go and compete in these areas.
And I think what Rust does is let people do work they couldn't have done before. That helps you as a developer: it helps you solve problems you might not have been able to solve, and that can manifest in a lot of ways. But one I've been thinking about recently is that there are all these areas where people feel like they're safe. There are huge companies with teams of wizards who feel safe because how is a little agile startup gonna compete with their grizzled C++ wizards? And that might not be true anymore. I think that's cool. I'd also say that if you yourself are a grizzled C++ wizard, and there are many such people in this room, Rust helps you do your job better, but it also helps you bring more people into the fray, and if you really wanna be competitive, that's good for you too. So I think that's awesome, and that's actually sort of the story of Skylight. Skylight is a small company. We have about seven engineers (I say "about" because it fluctuates), and those seven engineers had largely never written systems code before. What we were building was a product that let you instrument your Rails application so we could give you better information about it. There were already 800-pound gorillas in our field with many engineers, and we wanted to compete in this area with a very small number of engineers. Fundamentally, what this thing does is collect information and send it off to a daemon; the daemon sends it to the server; we collect it, process it, and give you some information. And of course, because we were Ruby programmers, the first thing we did was write the whole system in Ruby. The daemon was written in Ruby, the individual pieces that run inside your Rails app were written in Ruby, and this actually worked great.
It's actually a pretty good way of getting off the ground when you're a startup. But pretty quickly we realized there were other things we wanted to do. For example, we wanted to collect information about memory allocations and report it, and with Ruby code we had hit the limit of what we could collect without imposing unacceptable overhead on your application. At this point we had a few options, right? We could have said: we're gonna stay where we are, collect whatever information we can within the overhead of Ruby, and that's it. But that would have meant we couldn't be very competitive, right? We really wanted to do things our competitors couldn't do. So that wasn't a great option; it felt like giving up. Another thing we explored was rewriting some of these purple boxes in C++. In fact Carl, who just spoke, wrote a working prototype of a lot of that stuff in C++. I took a look at it and said, Carl, you are a much better C++ programmer than me. I do not feel confident that I could personally make any changes to this whatsoever without introducing segfaults. And I think that's basically the story of C++, right? Even the best hackers in the world, the ones who work on browsers, are constantly introducing not just segfaults but exploitable crashes. So C++ is not a great option for a small company that's trying to compete with the big boys: you might get better performance that lets you do things you couldn't do before, but now you're gonna spend all your time tracking down bugs. And in our case, our code runs in environments we can't even see, right? There are thousands of machines we have no control over, so we can't even log in and debug. We really need to avoid segfaults. So we were kind of at an impasse. Our options were bad and worse, right?
Our options were: don't build the product we wanna build, or very likely introduce segfaults onto our users' computers, which would probably cause them to stop being customers. So that felt pretty bad. And I got lucky. Around the time we were starting to run into these limitations, and customers were starting to report excessive memory usage and things like that, Patrick wrote the blog post where he said Rust was getting rid of the garbage collector. Rust hadn't gotten rid of the garbage collector yet, but it was talking about it, and so it became obvious at that point that it would be possible to write Rust code that integrated with Ruby code. So the first thing we actually did (I'm telling this kind of backwards) was take a lot of the Ruby code that was running inside your Rails process and convert it into Rust. The really awesome thing about this was that we didn't have to change everything. In fact, there's still a fair amount of Ruby code today, because we're instrumenting Ruby code, right? But we were able to take big chunks of it, the things that were using up the most memory or taking the most CPU time, and rewrite them in Rust. I think this is a really nice thing about how Rust works: you can sort of dip your toe in, take a piece that's expensive, and rewrite it without the fear that you might cause segfaults. So we did that, and some time after that we rewrote the daemon in Rust. That was actually the biggest one, because the daemon in Rust could run in eight megabytes of rock-solid memory forever, whereas in Ruby, as you probably know, that's not even a fever dream, as someone said earlier today. So we were able to run eight-megabyte processes that are stable, and that was a pretty big win.
But Ruby talks to C: if you wanna write any kind of native code for Ruby, like we were doing in Rust, at some point you have to go through the C boundary. And the C boundary is what my talk is about. Everything I just said about writing Rust code is true, and I could give a thousand talks about why it's awesome that we wrote Skylight's daemon and all this other stuff in Rust. But at the end of the day, at some point Ruby talks to Rust through a boundary that is unsafe. Ruby is safe and Rust is safe, but there's this dangerous boundary in between. What I wanna talk about today, and what Carl and I have spent a lot of time thinking about for Skylight, is how we think about that boundary more correctly and how we use it in a safe way, right? Again, Carl is an awesome C++ hacker and I trust him to write the C code, but if anybody else has to go in and fix bugs, if anyone has to change even a log message, we wanna make sure there's a pretty strong way of thinking about what's going on at that boundary, one that doesn't create so many bugs, or ideally creates zero bugs. What this means is that the Rust compiler protects us from low-level bugs, but it only protects us from low-level bugs in Rust. And this is the big point of what I'm gonna be talking about: Rust not only has a compiler that protects us from bugs, it also helps us communicate about more universal constraints that exist in all low-level languages. The code you write in Rust is actually describing ownership problems that aren't specific to Rust the programming language; they're specific to systems programming. So let me show you what I mean by that. Here is a function that you can write in C. You can imagine we have a trace object, and the trace object is a thing that collects up information before it gets sent to the server.
So I wrote a function here, trace_name: it takes a pointer to a trace, returns a char*, and has some implementation. There are a bunch of questions we have here. Is the trace_name function allowed to retain an alias to the trace*? Is it allowed to mutate it? Can a caller mutate the trace* during the call to trace_name? That one is actually one of the more pernicious kinds of bugs that can occur. And who's actually responsible for freeing that char*? Basically all of these questions are left up to the imagination of the implementer. In a really, really good library somebody will write documentation that explains it, but every single time you write a function you sort of have to go through this checklist for every parameter and every return value to understand what is allowed, and this is why there are a lot of bugs. Now let's look at the equivalent story in Rust. In Rust we have a function trace_name that takes a borrowed Trace and returns a String. Very similar function, but we actually have an answer to all of these questions, right? No, you can't retain an alias, and if you write Rust code that tries, the compiler stops you. No, you can't mutate it; I didn't ask for it mutably. No, the caller can't mutate it during the call. And the caller owns the returned string and is responsible for freeing it. These are all very useful things. And importantly, just like when you write in a typed language you don't have to restate every fact about every type in every piece of documentation, when you write in Rust you don't have to restate all of these ownership facts in every piece of documentation. In fact, this is what is awesome about Rust, right? You just look at the type signature and it expresses a ton of information in a very short, terse way. So before I go further into the methodology, I wanna say what's in scope here.
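A minimal Rust sketch of that signature, assuming a hypothetical `Trace` with just a `name` field (the talk doesn't show the real struct, and this is not Skylight's code). The comments spell out how the types answer the questions asked of the C version.

```rust
// Hypothetical stand-in for the trace object; the real fields aren't shown in the talk.
pub struct Trace {
    name: String,
}

// How `&Trace` answers the questions from the C version:
// - the borrow lasts only for the call, so the function cannot store an
//   alias to the trace that outlives the call;
// - it is a shared borrow, so the function cannot mutate the trace;
// - while the borrow is live, the caller cannot mutate the trace either.
// The returned `String` is owned by the caller and freed when the caller drops it.
pub fn trace_name(trace: &Trace) -> String {
    trace.name.clone()
}
```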
What's in scope today is writing a library like Skylight's that is meant to be called from C, and because C is the lingua franca, that's the "or any other language" in my talk, right? So we're talking about writing a library that you compile into a .a file or a .so file and then call from some C code, or from some language that has a C API. That's the perspective: we're using Rust to create something that is just a C library. What I am not talking about today is calling C code from Rust. There are some similar ways you could think about that, but I'm explicitly keeping it out of scope to keep things relatively simple. So earlier I showed the trace_name function. What happens if we change it to an extern "C" function with #[no_mangle], right? Now we've said this function is callable from C. It's important to note that everything we said before about what this trace_name function communicates is still true, right? It is still communicating that trace_name cannot retain an alias to the trace, that trace_name cannot mutate the trace, and that the caller cannot mutate the trace during the call. But there's a little bit of a difference here, right? Before, we were communicating exactly the same things, but we were expecting the Rust compiler to catch us if we made a mistake. We were expecting some other Rust code to call into this, and we expected the contract between the caller and the callee, part of which is enumerated here, to be enforced by the Rust compiler. So if I made a mistake, it got caught. When you write an extern "C" function, you're just moving that responsibility from the Rust compiler to the person writing the collaborating C code, right? You're expressing all the same requirements, and in fact if the person writing the C code violates those requirements, Rust is no longer safe, right?
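A sketch of what a C-callable trace_name might look like. One wrinkle: `String` isn't a C type, so this version (an assumption, not the talk's code) hands back a NUL-terminated `*mut c_char` and adds a hypothetical `trace_name_free` so the string goes back to Rust to be freed. The `&Trace` parameter carries the same contract as before; only the party enforcing it has changed. Recent Rust accepts (and the 2024 edition requires) `#[unsafe(no_mangle)]`; earlier editions also accept the `#[no_mangle]` spelling used in the talk.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Hypothetical trace type; C only ever sees a pointer to it.
pub struct Trace {
    name: String,
}

// The C caller now has to uphold what the Rust compiler used to check:
// `trace` must be non-null, valid, and not mutated during the call.
#[unsafe(no_mangle)]
pub extern "C" fn trace_name(trace: &Trace) -> *mut c_char {
    // A C string cannot contain interior NUL bytes; return NULL in that case.
    match CString::new(trace.name.clone()) {
        Ok(s) => s.into_raw(), // ownership of the allocation moves to the caller
        Err(_) => std::ptr::null_mut(),
    }
}

// Hypothetical destructor: the caller must hand the string back here exactly once.
// Calling C's `free` on it would be wrong, because Rust allocated it.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn trace_name_free(s: *mut c_char) {
    if !s.is_null() {
        // Safety: `s` came from `CString::into_raw` in `trace_name`.
        drop(unsafe { CString::from_raw(s) });
    }
}
```

From C, the hypothetical prototypes would be `char *trace_name(trace *t);` and `void trace_name_free(char *s);`.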
So if I call this function from C and violate one of these rules, Rust does not promise me that any code called from this function is safe. It can have segfaults galore, right? So we're really talking about exactly the same requirements as before; we're just saying that because the code is called from C, the C code is responsible for maintaining the invariants. One way I'd put this is that Rust is just a DSL for describing ownership concepts you already have to think about when you're writing C or C++ code, right? When you're writing C code, all of those questions still apply, right? I asked them; we didn't have answers for them, but I asked them. In C we have to communicate them in ad hoc ways, just like in a dynamic language you have to communicate things like types in an ad hoc way through documentation. What Rust gives us is a way to communicate these things directly and tersely. The concepts of ownership are really fundamental. They're things Rust makes you think about, and this is sort of why a lot of people say that when they go back to writing C++ code, or even Ruby code, after writing Rust code, they think a lot more clearly. Why do they say that? Because the concepts Rust makes you think about are fundamental concepts. They describe things that exist in C or C++; Rust just gives us vocabulary to talk about them, and it makes the contracts terse and enforceable. So let's look at another example. Here we have a function called trace_id. It takes a borrowed Trace and returns a u64. (Actually, this isn't the example I'll use for the rest of the talk, but it's an example.)
It's important to note that at a basic level, if you're not thinking about anything else, Rust is expressing things that also exist in C. We have a pointer: &Trace is basically a pointer in C, and a u64 is basically a uint64_t. So at a basic level, when we aren't thinking about ownership contracts, we're talking about something rather similar. And so there's a process we can use to get from start to finish when we're communicating across this boundary. Step one: identify and implement the shared type definitions in Rust and C. We have a Rust function, and we need to figure out how to express all the types in that extern Rust function in C. We can sort of see how to do that. Step two: now that we've done that, use Rust's ownership system to flesh out and describe the ownership and mutability rules. Step three, which is optional: in your C or C++ code, find ways to dynamically enforce some of those requirements. I'll show an example of that later. So, step one: identify and implement the shared type definitions in Rust and C. Here we have this extern "C" function that takes a trace and returns a u64. The first question is always: okay, I have this Rust definition; what is this type in terms of C? For simple numeric types there's a basic one-to-one mapping. You might have to read the C standard to figure out exactly what's guaranteed, and I wish there were a table somewhere; that'd be awesome. But the go-to answer for simple types is effectively the same type in C, and that covers most of the simple built-in Copy types.
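A minimal sketch of where step one leads for trace_id, assuming a hypothetical `Trace` with only an `id` field and hypothetical `new_trace(id)` and `trace_free` functions (the talk's `new_trace` may look different). The `u64` crosses the boundary as C's `uint64_t`. `Box<Trace>` and `&Trace` cross as plain pointers, and `Option<Box<Trace>>` as a nullable one, so C can hold the trace as an opaque `void *` without ever seeing its fields.

```rust
// Hypothetical trace type; its fields are invisible to C.
pub struct Trace {
    id: u64,
}

// C: `trace new_trace(uint64_t id);` A `Box<Trace>` is passed as a non-null pointer.
#[unsafe(no_mangle)] // older editions also accept `#[no_mangle]`
pub extern "C" fn new_trace(id: u64) -> Box<Trace> {
    Box::new(Trace { id })
}

// C: `uint64_t trace_id(trace t);` C reads fields only through functions like this.
#[unsafe(no_mangle)]
pub extern "C" fn trace_id(trace: &Trace) -> u64 {
    trace.id
}

// C: `void trace_free(trace t);` `Option<Box<Trace>>` makes NULL a no-op;
// dropping the box frees the Rust-side allocation.
#[unsafe(no_mangle)]
pub extern "C" fn trace_free(trace: Option<Box<Trace>>) {
    drop(trace);
}
```

The matching C declarations would be `typedef void *trace;` plus the three prototypes in the comments.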
So if we wanna write a prototype for this Rust function in C, we're gonna write something that looks like this, right? It takes a pointer to a trace and returns a uint64_t. Now what about the other thing? What about this trace object? Here we have a function new_trace, and what it returns is a Box<Trace>. What is this type in terms of C? I actually want to approach this sort of in reverse. When we're communicating an arbitrary structure from Rust to C, I think the best starting point, unless you have a really good reason otherwise, is to treat anything in Rust that's not a basic numeric type as an opaque structure. You might think: well, I have this structure, and I wanna get at its fields in C. I'd point out that you can just write an extern function that does literally anything you could possibly want. Writing the extern function in Rust makes sure that all the rules you're supposed to follow are followed, whereas trying to do the same thing from C introduces a whole bunch of rules you have to remember to follow. So my go-to solution for anything that's not a basic numeric type is to treat it like an opaque void pointer and write any functions I need to manipulate it in Rust. The simple thing is to just typedef void* to trace, and then you can do the right thing. Here's the simple example I'm gonna use for a little while. I made a simple structure called Point. It has an x and a y, two i64s. I have a function that takes two i64s and returns a point, a Box<Point>, and I have a function that takes two points and returns the length of the line between them as an f64. I'm not going to write the implementation here on this tiny slide.
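The Point example might look like this on the Rust side. This is a sketch: the talk leaves the implementation off the slide, and `point_free` is a hypothetical addition, since something has to hand the `Box` back to Rust. The differences are computed in `f64` so that extreme coordinates can't overflow `i64` and panic at the boundary.

```rust
// Opaque to C, which only ever holds a pointer to it.
pub struct Point {
    x: i64,
    y: i64,
}

// Returns a heap-allocated Point; C receives it as a plain pointer.
#[unsafe(no_mangle)] // older editions also accept `#[no_mangle]`
pub extern "C" fn new_point(x: i64, y: i64) -> Box<Point> {
    Box::new(Point { x, y })
}

// Borrows both points for the duration of the call and returns the distance.
#[unsafe(no_mangle)]
pub extern "C" fn line_len(a: &Point, b: &Point) -> f64 {
    let dx = b.x as f64 - a.x as f64;
    let dy = b.y as f64 - a.y as f64;
    dx.hypot(dy)
}

// Hypothetical destructor: each point must be handed back exactly once (NULL is ignored).
#[unsafe(no_mangle)]
pub extern "C" fn point_free(p: Option<Box<Point>>) {
    drop(p);
}
```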
So this is a pretty simple thing, and what I want to talk about is how you represent it in C in terms of the rules I just gave. Basically, you say: okay, a point is a void pointer, like I said before. I'll make some typedefs here to say i64 is just int64_t and f64 is double, according to relatively modern C standards, and now I'm going to write two functions that look like the equivalent Rust functions. So now I can write point new_point(i64 x, i64 y), and I can write f64 line_len(point a, point b), right? Remember, so far we're not talking about ownership at all. All we're doing is finding a consistent methodology for taking a thing that looks like an extern Rust function and accessing it from C, again ignoring ownership rules entirely. The simplest way to get started, if you don't have very complicated requirements, is this. For numeric, portable Copy types, just use the portable C equivalent. For your own structures, just return a Box<T> and typedef it to void*. If your function takes a borrowed pointer, that's again the same thing: it's a pointer in C. There's an exception I don't want to go into, which is that strings and vectors are a little complicated, and you probably want to define some kind of mechanism for transferring them. But the TL;DR is that effectively anything that is not a number should be treated in C as an opaque structure whose insides you don't have access to at all. I think that actually simplifies your ability to reason about things. I'd also say that in practice you basically always end up wrapping the Rust pointer you get in some kind of object for your language.
In C, of course, you don't have to do that, because pointers are pointers, but if you're writing a Python extension or a Node extension or a Ruby extension, you're gonna end up taking that pointer and putting it into some other object, and that's out of scope for this talk. So that was step one. Again, I think step one is actually the most boring part; it's the part you could sort of reason through yourself if you're trying to write C extensions that talk to Rust. We have some Rust types, we have repr(C), we have u64, which looks like a uint64_t, pointers look like pointers, so basically everything's fine. You can sort of muddle along, and if you write this code without any more methodology, basically everything will work, because most of the time people don't have bugs, right? In the same way that you'd basically be fine if you wrote everything in C, you'll basically be fine if you do this. But what we really wanna do, and the really interesting part, is to say: okay, now that we've done that, what are the actual requirements on the C code, right? Right now we've written some code that treats Rust's complicated ownership types as bare pointers, and if you just write some random C code, you're not going to know to do the right thing. So how do we describe the requirements on the C code, and how should the C code think about its communication with Rust? Let's go back to this example, right? Here we have an extern function; it takes two points, returns an f64, and calculates a line length. What have we done so far? We said these &Points are the same thing as a void pointer, and we said this f64 is the same thing as a double, right? Again, that's pretty straightforward, and if you muddle along it will work, right? Your program will run.
But again, Rust types are saying more than just "this is a pointer" or "this is a 64-bit float." So let's do an exercise: let's imagine we're going to write some docs for this hypothetical C API. If you were trying to hide the fact that you're writing Rust code and just ship a library, you might do this anyway, right? Write some docs that say what guarantees you get. So here we have the same type definitions, but now we want to say a few more things. Number one: we take two point structs, and you can rely on the fact that the line_len function will not create aliases to the points that outlive the invocation. That's the guarantee the line_len function is giving you. Your requirement, your part of the contract, is to provide memory that is valid and immutable for the duration of the call to line_len. So we're basically saying those two things. I also added const here because I didn't want to write out in English the sentence that const means, but it actually adds very little. It basically just says the line_len function won't mutate things, and if you look at the rest of the text, there are more points about mutability that the Rust signature is making, right? So let's look again at the documentation we wrote and at the Rust function. We have these two ampersands here, and... (audience comment) yes, yeah, yes. So I think his point is that the const isn't actually saying the thing I wanted it to say: since point is a typedef for void*, const point makes the pointer itself const, not the memory it points to. Okay, this is actually why C is painful. The point is that you might have wanted to write a sentence saying "by the way, this function is not going to mutate the memory you give it," and that sentence would have been more useful than writing const, regardless of whether I got the const right or wrong.
So we have these ampersands, and first of all, these ampersands are saying const, but they're also saying these other things, right? They're saying you can rely on the fact that no aliases are being created. And again, once you get into the Rust side, the Rust compiler will actually enforce the callee's part of this, right? The callee has some requirements, like not creating aliases, and the Rust compiler totally does enforce its side of the contract, but there are some parts of the contract you have to enforce. The requirement that the points be memory that is valid and immutable for the duration of the call to line_len is on you, right? So the point I'm trying to make is that Rust expresses, in a very short, terse form, something you'd otherwise have to write a lot of documentation for. So if you're writing Rust code that you yourself are consuming from C, I'd claim you should just use the Rust type signature as the explanation of what the requirements are. Instead of trying to always remember the exact set of rules, transcribe them correctly, and keep them up to date as the signatures change, I'd claim that the signature we have here, the ampersands we have here, are enough to express all of those things, and that that's actually the right way to think about it, right? When you write an extern function, it has a bunch of Rust annotations. The Rust compiler will enforce the callee's part of the contract, the Rust side, and those annotations also express some requirements on the C side. When you're writing C code, you should look at that signature and remember that you actually have to follow those rules, okay?
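To make the contrast concrete, here is a sketch of `line_len` written two ways; the raw-pointer variant `line_len_raw` is hypothetical. With `&Point` the signature carries the contract. With `*const Point` the same contract has to be spelled out by hand in a `# Safety` section and kept in sync as the code changes.

```rust
pub struct Point {
    x: i64,
    y: i64,
}

fn distance(a: &Point, b: &Point) -> f64 {
    let dx = b.x as f64 - a.x as f64;
    let dy = b.y as f64 - a.y as f64;
    dx.hypot(dy)
}

/// Signature as documentation: `&Point` already says "non-null, valid,
/// not mutated during the call, not retained afterwards".
#[unsafe(no_mangle)] // older editions also accept `#[no_mangle]`
pub extern "C" fn line_len(a: &Point, b: &Point) -> f64 {
    distance(a, b)
}

/// The same function with raw pointers: the types now say almost nothing.
///
/// # Safety
/// `a` and `b` must be non-null, properly aligned, and point to live `Point`s
/// that are not mutated or freed until the call returns. The function does
/// not keep either pointer after it returns.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn line_len_raw(a: *const Point, b: *const Point) -> f64 {
    // Safety: upheld by the caller, per the contract written above.
    let (a, b) = unsafe { (&*a, &*b) };
    distance(a, b)
}
```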
And of course, the fact that this part of the contract is unenforced is why we tend not to like writing a lot of C, but in this case we can't get around it (that's the premise of this talk), so at least we have a clear understanding of what the rules actually are. Now, step three is optional, but I do it and I think it's pretty useful: try to find some ways to dynamically enforce the rules we've set up. I'm going to show a hypothetical library, without any implementation, that you could imagine working, and then talk briefly about what we actually do. In the hypothetical library, you #include an ownership.h. Every time you make a new Rust object, you put it inside a cell, and every time you want to pass it to Rust to be borrowed, you call the borrow macro. That's not so interesting so far; it's basically just getting a pointer out. The reason it's interesting is that there might be another macro called transfer, and when you call transfer, it removes the pointer from the cell, so if you try to borrow again in the future, you get a dynamic error. That's a really simple set of macros; you can imagine a more robust set that handles more cases. But the point is that when you're writing C code, I think it's pretty useful to have a set of macros that at least expresses to future readers of your C code what you were trying to do, even if the macros don't actually do anything or provide dynamic errors, and it's especially nice if the macros can enforce things. Okay, so that's sort of my methodology. Step one: do the boring muddle-through thing where you convert your Rust types into C types, mostly by using opaque pointers and writing extern functions to do anything interesting.
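The ownership.h described here is hypothetical and lives on the C side. To keep these examples in one language, here is a Rust model of the same semantics, with hypothetical names `OwnedCell`, `borrow`, `transfer`, and `OwnershipError`. Borrowing lends the value out, transferring empties the cell, and any use after a transfer becomes a dynamic error rather than a use-after-free.

```rust
#[derive(Debug, PartialEq)]
pub enum OwnershipError {
    AlreadyTransferred,
}

// Models a cell holding a Rust object on behalf of foreign code.
pub struct OwnedCell<T> {
    value: Option<Box<T>>,
}

impl<T> OwnedCell<T> {
    // Like putting a freshly created Rust object into a cell.
    pub fn new(value: Box<T>) -> Self {
        OwnedCell { value: Some(value) }
    }

    // Like the `borrow` macro: lend the value out for one call.
    pub fn borrow(&self) -> Result<&T, OwnershipError> {
        self.value.as_deref().ok_or(OwnershipError::AlreadyTransferred)
    }

    // Like the `transfer` macro: move ownership out; the cell is empty afterwards.
    pub fn transfer(&mut self) -> Result<Box<T>, OwnershipError> {
        self.value.take().ok_or(OwnershipError::AlreadyTransferred)
    }
}
```

In an embedding for Ruby or Python, the host language's own object type would play the role of `OwnedCell`.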
Step two: now that you actually have the set of rules, the ownership requirements and mutation requirements, either document them, if your consumer is somebody who doesn't know about Rust, or use the signatures themselves as your documentation, if your consumer is you. Then step three: maybe do some dynamic enforcement. I want to mention some things I didn't cover here. I only talked about return types that are super simple. If you have a non-simple return type, or you want to express errors, I think the right answer is to use an out pointer for the return value and a boolean to express whether an error happened. There are other things you could theoretically do, but whatever you choose, make it consistent. I also haven't talked about #[repr(C)] structs, using a Rust struct directly from C and getting at its fields. Again, I think that most of the time that's more trouble than it's worth, and you should think carefully about why you actually need it. That's not to say you never will, if you have an extremely chatty situation, but it makes you think about all of these problems very granularly instead of very coarsely, which is not great. I also didn't talk about exactly how to implement dynamic ownership. That's because if you're writing an embedding for Ruby or Python or something like that, there is probably already a type, the object type of your system, that lets you put a C pointer into it, and if that type exists, you should use it as your cell and write some macros against it. So it's gonna be specific to what you're actually trying to do, but I think the rough contours of what I said are correct. I also didn't talk about the fact that when you call into Rust code, you probably want to catch any panics at the boundary, because panics that cross the boundary abort your process, and we do some nightly, unstable stuff in Skylight today to make that work.
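A sketch of the out-pointer convention, using a hypothetical `point_parse` that turns a string like `"3,4"` into a new `Point`. It returns `false` on any failure and writes to `*out` only on success; `point_free` is likewise hypothetical. The raw pointers are checked for null because both arrive from C.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Opaque to C (`typedef void *point;`).
pub struct Point {
    x: i64,
    y: i64,
}

// Parses "x,y" (whitespace around either number allowed) into two i64s.
fn parse_pair(text: &str) -> Option<(i64, i64)> {
    let (x, y) = text.split_once(',')?;
    Some((x.trim().parse::<i64>().ok()?, y.trim().parse::<i64>().ok()?))
}

// C (hypothetical): `bool point_parse(const char *input, point *out);`
// On success, writes a newly allocated Point to `*out` and returns true.
// On failure, returns false and leaves `*out` untouched.
#[unsafe(no_mangle)] // older editions also accept `#[no_mangle]`
pub unsafe extern "C" fn point_parse(input: *const c_char, out: *mut *mut Point) -> bool {
    if input.is_null() || out.is_null() {
        return false;
    }
    // Safety: the caller promises `input` is a valid NUL-terminated string.
    let cstr = unsafe { CStr::from_ptr(input) };
    let text = match cstr.to_str() {
        Ok(text) => text,
        Err(_) => return false, // not valid UTF-8
    };
    match parse_pair(text) {
        Some((x, y)) => {
            // Safety: `out` is non-null and the caller promises it is writable.
            unsafe { *out = Box::into_raw(Box::new(Point { x, y })) };
            true
        }
        None => false,
    }
}

// C (hypothetical): `void point_free(point p);` Frees a Point from `point_parse`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn point_free(p: *mut Point) {
    if !p.is_null() {
        // Safety: `p` came from `Box::into_raw` and is freed only once.
        drop(unsafe { Box::from_raw(p) });
    }
}
```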
The correct answer is to have a stable catch-panic solution, and maybe I'm speaking out of turn, but I think that's coming soon, or eventually. And then there's another thing, which is writing an extern function that is meant to be called by people you don't trust at all. There you want to take your pointers, first convert them to Options, check that they're not null, and do other checking. I think that's totally a thing you might want to do, but it's a different kind of programming, right? This talk is about the kind of programming where you use the Rust type system to express the contract with C code, and that's different from the "I'm writing a library that's pretty paranoid about who's calling me and whether I trust them to actually follow the rules" kind. So yep, not covered. Thanks.
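Both of these are easier on stable Rust today. `std::panic::catch_unwind` was stabilized in Rust 1.9, and since Rust 1.81 a panic that tries to unwind out of an `extern "C"` function aborts the process instead of being undefined behavior. Here is a sketch of the defensive style, with hypothetical names `line_len_checked` and `ffi_guard`. `Option<&T>` has the same representation as a nullable pointer, so a NULL from C arrives as `None`, and any panic is turned into a `false` return.

```rust
use std::panic::{self, UnwindSafe};

pub struct Point {
    x: i64,
    y: i64,
}

// Runs `f` and turns a panic into `None`, so the unwind stops here
// instead of reaching the C caller.
fn ffi_guard<T, F: FnOnce() -> T + UnwindSafe>(f: F) -> Option<T> {
    panic::catch_unwind(f).ok()
}

fn distance(a: &Point, b: &Point) -> f64 {
    let dx = b.x as f64 - a.x as f64;
    let dy = b.y as f64 - a.y as f64;
    dx.hypot(dy)
}

// C (hypothetical): `bool line_len_checked(point a, point b, double *out);`
// Returns false if any pointer is NULL or the computation panics; otherwise
// writes the length to `*out`. Non-null but dangling pointers still can't be detected.
#[unsafe(no_mangle)] // older editions also accept `#[no_mangle]`
pub extern "C" fn line_len_checked(
    a: Option<&Point>,
    b: Option<&Point>,
    out: Option<&mut f64>,
) -> bool {
    let (a, b, out) = match (a, b, out) {
        (Some(a), Some(b), Some(out)) => (a, b, out),
        _ => return false, // a NULL pointer arrived from C
    };
    match ffi_guard(|| distance(a, b)) {
        Some(len) => {
            *out = len;
            true
        }
        None => false,
    }
}
```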