Thank you. I'm super happy to be here and present our work today. This is joint work between Galois and the company I'm from, Immunant. I'll get into that in a second. But just in case... I've got to figure out how to switch slides. Okay, just in case you thought this was going to be a talk with cute cat pictures and GIFs and jokes, it's not. What we do has been funded in part by DARPA, and they require us to start with this legal disclaimer. So I'm sorry, but this is what I have to show you. We asked if we could please talk to people about this at RustConf, and DARPA said yes. So that's all the legalese I have to bore you guys with.

So just really quickly, my name is Per. I'm a co-founder at Immunant. I work with a crack team of compiler and linker experts. I'm from Denmark originally, in case you wonder why my accent is weird. I enjoy living in Southern California these days, and I'm actually not an expert on Rust, so I have a little bit of imposter syndrome going on today. My background is mainly in C and C++ exploit mitigation, and I'll talk about how that led us to the C to Rust work. But first, I just want to acknowledge that I'm part of a much larger team. There are three gentlemen from Galois shown here and three of my colleagues at Immunant who are all crack coders, and they do much of the heavy lifting. If you have any really hard questions, I'm going to defer to Andrei, who's in the audience today. There are also some researchers at UC Irvine who help with this work: Michael Franz and Stijn Volckaert.

So with that out of the way, why should you even consider moving your well-tested C code to Rust? Even as awesome as Rust is, I think that's a fair question to ask. I'm showing a graph here from the National Institute of Standards and Technology's National Vulnerability Database. It essentially tracks 20 years' worth of stats on buffer errors, which are not the only security issue you can have in C and C++ code, but they're among the most common.
So that's the kind of stuff where you underflow or overflow an array index and access memory that you're not supposed to. In many cases that's exploitable, and all hell can break loose. As I said, I have a background in C and C++ exploit mitigations, where we try to change the way we compile C and C++ code. You can do something to take the lowest-hanging fruit off the table for adversaries, but in the end the pattern is that you come up with a new exploit mitigation, and then the adversaries find another way to work around it and still exploit the C and C++ code. So it's an arms race, really, and Rust is an attractive migration target. I'm sure I don't have to sell you guys on Rust, but it does have this interesting property of providing not just type and memory safety but also freedom from data races. That's like catnip to people who have written a lot of C and C++.

So we thought, well, okay, Rust is great, but it's obviously not easy to get into Rust. Is there anything we can do to lower the barrier to entry if you have a C code base that you depend on and must use? We're going to be talking about doing two things. The first one is reducing the tedium of just getting into Rust syntax. I'm going to show you some really disturbing and ugly Rust code later, and that's something you're going to want to refactor into pretty, idiomatic Rust that doesn't make your eyes bleed. As you're doing that, it's going to be a mostly manual process, so the second thing is to help you catch errors while refactoring into more idiomatic Rust.

So this is the grand idea. This is what we pitched initially and what I hope we can get to one day in the far future. The main thing to notice here is the flow on the top, where we take untrusted, unsafe C code and run it through a transpiler, and out comes unsafe Rust.
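To make that last step concrete, here is a minimal hand-written sketch of the style of unsafe Rust that a syntax-driven translation tends to produce. It is an illustration under stated assumptions, not actual transpiler output: the function `sum` and its C original are made up for this example.

```rust
use std::os::raw::c_int;

// Hypothetical C original, shown for reference:
//
//     int sum(const int *xs, int n) {
//         int total = 0;
//         for (int i = 0; i < n; i++) total += xs[i];
//         return total;
//     }

// The Rust version keeps the C structure and the C types: raw pointers,
// `c_int` counters, and a `while` loop standing in for the C `for` loop.
// The caller must guarantee that `xs` points to at least `n` readable ints.
pub unsafe extern "C" fn sum(xs: *const c_int, n: c_int) -> c_int {
    let mut total: c_int = 0;
    let mut i: c_int = 0;
    while i < n {
        // Pointer arithmetic plus dereference, exactly as in the C code.
        total += *xs.offset(i as isize);
        i += 1;
    }
    total
}
```

Calling it looks like `unsafe { sum(xs.as_ptr(), 4) }` for a four-element array `xs`. Nothing here is checked by the compiler beyond what C would check, which is why output like this is only a starting point for refactoring.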
It's a syntax-driven translation into Rust that basically projects the structure of the C code into Rust and uses C types. It doesn't buy you much in terms of security, but at least it compiles with rustc, and that's something. So that's the initial stage, from 0 to 1. The later stages are iterative, from N to N plus 1: you have some Rust code that you're not entirely happy with yet, and you want to make it more idiomatic. You basically have two options. You can either manually rewrite it, or you can hope that we provide some sort of refactoring that can either suggest or do a rewrite for you that improves the quality of the Rust in some way.

In case you're forced to do a manual rewrite, we know that humans tend to make errors; we're not perfect. What we do to catch those errors is cross-check the Rust code you have against the original C program. We have a tacit assumption here that you want to preserve, at some level, the functionality of your C program. Let's say you're migrating a decoder: you still want the decoded output from your Rust code to be the same as from your C code. So there are lots of opportunities to check the two versions, or sanitize them, against each other. We'll talk about that later.

But first, transpiling. These are the things we wanted to accomplish with our transpiler. I'm going to talk about a few other transpilers in a second. We have some unique goals, partially informed by the experiences of others who built a transpiler before us and inspired us in that way. We wanted to do robust C and C++ parsing. We basically want to handle more than hello world; we want to be able to take huge, crusty old C code bases and parse them. And we want to preserve the functionality of the code we take in and convert to Rust, because it's most likely well tested, especially if you care enough about it to move it into Rust. We also want to, to the
extent possible, generate output that's fit for human consumption. Somebody's hopefully going to be refactoring it later, so if there's something we can do to not make your eyes bleed, we'll try to do that. Finally, we're also excited about Rust, so we want to write as much of the transpiler as possible in Rust, and we want to reuse some of the Rust compiler internals on the back end.

So first, a hat tip to other efforts. Corrode was the first C to Rust transpiler. It's written in Haskell by Jamey Sharp. We met him yesterday; nice gentleman. It uses the Haskell C parsing library, and because that is less used, less maintained, and less battle-tested than the Clang C compiler, that leads to certain limitations. But it's certainly an impressive effort for one guy, so that's good work. There's also Citrus, which is interesting because it's based on Clang like ours is, but it's not making any effort to generate C code, sorry, Rust code that would actually run right away. It's trying to generate the closest approximation of what you would ultimately want to write, so it's merely trying to help you with the syntactic changes. We slot in between these two by handling all C input and also generating something that runs and that you can work on further.

Our transpiler is kind of a chimera. It's C++ in the front end because of Clang, and then it's Rust in the back end. The flow is that we take in a bunch of C sources, and we take a JSON file called compile_commands.json, which tells us how the compiler was invoked, so we can have Clang preprocess the C code, and then we translate it. That has some interesting consequences that I'll mention briefly later. All of this is driven by a Python glue script, and under the hood we have two binaries. One is a C++ binary called the AST exporter. It's fairly boring; it merely serializes the Clang AST into a CBOR file. It's a completely arbitrary choice of format. That CBOR file is then consumed by our AST importer. The AST importer is the most
interesting part of the transpiler. It deserializes the CBOR file, and that gives you a Clang AST, which we represent in Rust. I think we use bindgen, too, to make sure that we keep some of the data structures in sync. We transform that Clang AST into our own internal importer AST, and then as a second step we do a similar transformation: we walk over the importer AST and build up a rustc syntax AST from the Rust compiler. That means all this code has to use the nightly channel, because the Rust compiler's syntax tree is only exposed on the nightly channel for now. In the process of this second-stage conversion, we prune C declarations that we don't see being used in the current translation unit. Even in your hello world, you'll probably have to include a header file for printf, and that pulls in a lot of other gunk that you don't necessarily use, so we prune that out. We also look at loops that contain unstructured control flow, such as gotos, and we try to generate valid Rust code for that. I have a few details on that later, too.

But first, preprocessing. We do our transpiling after preprocessing, and that means we need to know how to invoke the compiler with the right flags for a given platform. This is a problem that IDEs and analysis tools also have, so there's an existing solution: Clang will actually read this JSON file I mentioned earlier. There's an example on the right showing the contents. It's very simple; it just gives the arguments to the compiler and the build context. You can get this file automatically, but the way you do so depends a little on what build system you have and what platform you're on. If you're using CMake, great, you just add another flag. If you're not using CMake, Clang comes with a script called intercept-build that you can use for Makefile projects on Linux and macOS, or you can use Bear on Linux. So there are a few different ways, and you've got to find out what's right for
your environment. It's messy in C land, as usual.

So let's get back to transpiling loops. In the simple case, where the C code doesn't have any unstructured control flow, we simply generate Rust while loops. That's fairly simple and straightforward. We generate while loops no matter whether you put in a for loop or a while loop on the C side. In case you have gotos, things get a lot more interesting. Take this code example. We have a demo, I don't know if you've seen it out front, but the folks from Galois have been running the demo website, and somebody actually submitted this code example to it. We're presenting it here for your viewing pleasure. It actually reloops quite nicely to a while loop in Rust. I'm calling it relooping because we're using the Relooper algorithm by Alon Zakai of the Emscripten project. They faced a similar problem insofar as they're translating LLVM IR into JavaScript, which also doesn't allow unstructured control flow. So we reused Alon's work, but we're doing a little bit more, because we're interested in human consumption of our output, whereas Emscripten is simply feeding the JavaScript to a JavaScript compiler, and hopefully nobody touches it once it's been transpiled. So we try to preserve comments while relooping, and we also optimize for readability. This is obviously an example of where it works.

All right, so there are a couple of things we can't transpile today. The one we run into the most is lack of support for variadic function definitions. We can call variadic functions that are external, in C code, so we can call printf, for instance, no problem; there's syntax for that. But there's still a blocking issue on actually having Rust function definitions that take a variadic argument list. Bitfields are also something that we're blocked on, but there's a Rust RFC for that, so if that ever gets implemented in Rust, we can translate them. I'm not going to go into long doubles and complex types; that's a libc Rust crate
issue. But macros are something that we're not showing in the output, because we do the translation after macro expansion. That means whatever the macros expand to on your platform is what the Rust code is going to reflect. Now, it would be much nicer if you could preserve the portability and flexibility of your C code by at least handling macro cases where there is a reasonable translation into Rust. Remember that C macros don't operate on the syntax tree; they're purely textual replacements. So it's possible for a macro to expand to something that doesn't form a valid syntax tree, and sometimes you have pairs of macros, a begin and an end, that have to be used together to get valid C code. Stuff like that is something we don't see ourselves ever supporting in Rust. There's probably a good deal of macros that we can support with reasonable effort, but as of now we don't.

So this is the web demo. I hope everybody has seen this already and actually talked to people at the demo slot, but if you haven't, go to c2rust.com. You can either type in your own C code or choose from a few pre-baked examples. You can translate it and download the output, and hopefully it runs. There are also links to the source code and an FAQ. We consider this our business card and a way to get hold of us if you want to complain.

If you want to do more than translate one file at a time, you've got to clone the software and build it locally, and the way you do that depends a little on what platform you're on. Some of our code is Linux only, so if you're on another platform I'd encourage you to take a look at either Docker or Vagrant. One is a containerization technology, the other is a virtualization technology, and we provide scripts that build a Docker or Vagrant environment and provision it so that you have all the right packages in the right places for the C2Rust build system to just function flawlessly. And if
they don't, I hope you'll file an issue or write me a sternly worded letter. Because it's kind of a chimera, we use Python to glue together the build processes. You simply run the build translator script with Clang, and that gets you the translator. You can similarly build all the projects that are required for cross-checking on the C and Rust sides. The refactoring tool is pure Rust, and that's built with cargo the way we know and love.

So here's an example of how to transpile. I have a little buffer library that I cloned from clibs, and I removed the variadic functions so I won't be embarrassed by warning messages and errors. You simply use Bear so that you get a compile_commands.json file as you build it, and the build automatically runs the test suite and prints out okay. So that's just the C code. Then you can point our transpile.py script to the compile commands file, and it will look at all the C files and run the transpiler on them. It will pick up the main function from the test translation unit because we pass in the -m argument. Then you can simply go into the C2Rust build subdirectory and run cargo, and lo and behold, it does the same thing. Great success.

So here's an example input function from that buffer library that I just showed you, and here is the translated Rust output. You can see it's not exactly the Rust a human would write. It's all unsafe, it uses no_mangle, and it basically calls malloc just like the C code did. It has the same kind of error handling and initializes the allocation after it's returned. You can see pointer accesses are not pretty. For adds we use wrapping_add to preserve C semantics, so that's also not pretty, and there are a few superfluous casts in there, or not superfluous, but ideally one wouldn't have them in idiomatic Rust.

So here's something we rewrote by hand, and you can see it does the same thing. It's much cleaner and more compact. There's one little change, which is that when we allocate on the heap
with Rust, we get zero-initialized memory, so this is actually not a semantics-preserving transformation. That's why we do refactoring separately, so we can have a programmer approve these changes rather than just doing them silently.

When we're doing manual refactoring, I mentioned that there's the possibility of errors creeping in, so we have a cross-checking tool to verify that your current Rust version does something similar to the original C code, to the extent you want it to. What you've got to do is instrument the original C code and the translated Rust code. We provide plugins to do that; I'll get to those in a second. Once you have your instrumented C and Rust code, you run both programs and feed them identical inputs, such that you'd expect them to carry out the same computations. You see an example here where we have a simple ID function. What we'll do is cross-check that the function names are the same and that the argument values coming into the function are the same, and we'll also cross-check that they return the same value. This is a very simple example; it gets a little more hairy once the functions are non-trivial.

You have two options for cross-checking. One is to do the cross-checking online using something called an MVEE. I'll explain that in a second, but for now just think of it as a way to execute two processes side by side and make sure that they get the same input. You want to do this because you don't have to do any logging, and the MVEE will replicate inputs to the two processes for you. It does have some drawbacks in the area of compatibility. You can't take something as complex as a web browser, run two copies, and cross-check them with an MVEE, for a variety of reasons. So if you can't use an MVEE, you have the option of doing offline checking, where we log the program behavior to two log files. If your program is small enough, that's something you can do, and you avoid some of the compatibility issues of using MVEEs. So
really quickly, this is a fairly large research area, but the idea is that you have two variants of the same program. In our case it would be a C program on one side and the program translated to Rust on the other. As soon as they make a system call, we have a monitoring component that intercepts system calls and forwards them to the kernel. You can see that first we call brk to allocate memory, and in that case we want to allocate memory for both processes, so the monitor forwards both calls. Later we make a write system call, and in that case the monitor intercepts both calls but forwards only one, waits for the result, and then sends the result back to both processes. So essentially the monitor provides the surrounding host with the illusion of one process running, when in fact you have two processes running and doing the same thing and receiving the same input. The monitor will also cross-check that the two variants produce the same output. That's how we can detect differences between the C code and the Rust code online, without logging.

As I said, we have plugins to instrument the code. We have different runtimes based on whether you use MVEE-based cross-checking or log-based cross-checking. We have a zeroing malloc replacement, so in case we end up cross-checking data that would be uninitialized on the C side, that doesn't create any problem for us. I'm going to skip the last one because we're running a little behind schedule.

So here's an example of cross-checking the very simple library that we just transpiled. We make it again and transpile it, and this time we add two new flags to transpile: -x to enable cross-checking and -u to select log-based cross-checking. Then we build it again, and I set LD_LIBRARY_PATH to point to our log-based cross-checking runtime. I simply cargo run and log standard error to a file. Then I run the C program, which has also been instrumented, because I have a Makefile target called
test_xcheck, which passes the right plugin parameters to Clang such that the C code will be instrumented for cross-checking. Again, when it's run, I log the standard error output to buffer.c.xchecks, and then I simply diff the two, and that returns zero. So they did the same thing, which is lucky for this example.

Finally, briefly, I mentioned that we also want to do refactoring. This is the least mature area of our work right now, so I'm just going to show you a simple example where you have some while loops that step from zero to nine with strides of one and two. We have simple patterns that can recognize that this is better expressed as a for loop over a range in Rust. This is obviously fairly simple, and we hope to get to substantially more advanced refactoring across files, involving substantial amounts of program analysis, to do much more useful things.

If we have a few minutes, I'll really briefly talk about the pie-in-the-sky stuff that we hope we'll be able to do. The main feedback we've gotten so far on our project is that it would be really nice if we could generate safer Rust right away or automate some of the tasks. 100% automation is really, really difficult. I have this illustration here that shows the arrows: we're sort of moving on rings, and without any kind of safety transformation we're just staying on the outer rings. What we want to do is bend those rings so we go into a provably safe subset of Rust. The challenges in doing so are that an automated analysis has no domain knowledge, and that's really hard, and then Rust has a substantially different type system, so things like ownership get in our way of simply propagating types around. A really quick example here is something I stumbled upon while refactoring a very simple quicksort example. The partition method, function, sorry, calls swap, and this is all done with raw pointers, so there are no restrictions on how we can do this, as
opposed to if we use array slices. If I start using array slices in my partition and swap functions, I run into a problem, because I want to pass the swap function two mutable references into the same slice, and that's not allowed under Rust's ownership rules, and for good reason. So that's kind of a bummer for us. Any sane Rust programmer would realize that the slice type provides exactly what we need, and one should simply use that, and that allows us to get rid of the C code. Hooray, right? That's awesome. The problem is that we have no domain knowledge in a mechanized analysis, so we have to do something else. Well, I'm not sure what the best solution is, but one thing I could imagine, or be convinced, that a mechanized analysis could do is some magic swapping, where we basically take elements out of the array, pass them to swap, both mutable, and then put them back into the array. What that lets us do is actually preserve the swap function as we have it. So that's one way to get to safer Rust, but it's pie in the sky. It's an open question. If somebody has a really good idea in this area, I'd love to hear about it.

Speaking of that, we would love to work with anyone who has C code that they want to migrate to Rust, because we think we really need to get some hands-on porting experience to figure out where to go from here, and we need to learn where to focus our effort. Please get in touch if this sounds like something you care about. So this is the link to the demo website, and this is the link to the source code. Please go to our GitHub, clone it, and open issues if you run into problems. Thank you very much.
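The quicksort borrowing problem from the last part of the talk can be sketched roughly as follows. This is a hand-written illustration, not C2Rust output and not the code from the slides: the names `swap_ints`, `swap_elems`, and `partition` are hypothetical, and the element type is assumed to be `i32`.

```rust
/// C-style helper kept from the original program: swaps two integers in place.
pub fn swap_ints(a: &mut i32, b: &mut i32) {
    let tmp = *a;
    *a = *b;
    *b = tmp;
}

/// "Magic swapping": copy both elements out of the slice, hand them to the
/// unchanged helper, then write them back. Calling
/// `swap_ints(&mut xs[i], &mut xs[j])` directly is rejected by the borrow
/// checker because it borrows `xs` mutably twice at the same time.
pub fn swap_elems(xs: &mut [i32], i: usize, j: usize) {
    let mut a = xs[i];
    let mut b = xs[j];
    swap_ints(&mut a, &mut b);
    xs[i] = a;
    xs[j] = b;
}

/// Lomuto partition around the last element; returns the pivot's final index.
/// This version uses the slice's built-in `swap`, which is what a Rust
/// programmer with domain knowledge would reach for.
pub fn partition(xs: &mut [i32]) -> usize {
    if xs.is_empty() {
        return 0;
    }
    let last = xs.len() - 1;
    let pivot = xs[last];
    let mut store = 0;
    for i in 0..last {
        if xs[i] <= pivot {
            xs.swap(i, store);
            store += 1;
        }
    }
    xs.swap(store, last);
    store
}
```

`swap_elems` shows the mechanical route, which keeps the translated helper intact at the cost of extra copies, while `partition` shows the idiomatic route that makes the helper unnecessary. For element types that are not `Copy`, taking elements out of a slice would need something like `std::mem::replace`, which this sketch does not cover.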