I can't see anything. Hi. My name is Emily Stolfo. I work for MongoDB on the Ruby driver to the database. That's the mongo and bson gems, if you've ever used them. I'll soon be working on Mongoid as well, which you're probably familiar with if you're a Rails developer using MongoDB. I am adjunct faculty at Columbia, where I teach Ruby on Rails. I recently moved to Berlin, so I haven't been teaching this semester, but I'll probably go back in the spring and teach some more. And I'm going to talk about peeling back Ruby's layers and C extensions.

So who here has written a C extension? OK, who's an expert on writing a C extension? Because I'm not. If you're an expert, this might not be the talk for you. Who wants to write a C extension? Are you curious about writing a C extension, is that why you're in this talk? OK.

So I had the experience of writing a C extension over the last year to provide Kerberos authentication support for our driver. And I learned a lot from the experience: about Ruby itself, about the C interpreter, MRI, and about RubyGems, what to do and what not to do. I'm going to talk about what you need to know if you want to write a C extension for Ruby and package it with your gem, and which of my mistakes you can avoid.

I think Ruby as a language is really unique because it has these different implementations. The primary ones are a C implementation, MRI, and a Java implementation, JRuby. And I think that's interesting, partially because it means that you can write extensions that work with external libraries. In the case of Kerberos, which is a particular kind of authentication protocol, there are external libraries that you use to do all of the fancy math. They're not implemented in Ruby. They're implemented in Java or, in this case, C.
And you have to write some kind of glue that can go back and forth between your Ruby code and these external libraries.

So once upon a time, in January 2013, a ticket was created for the Ruby team at MongoDB (there were three of us at the time) saying: implement GSSAPI, the Kerberos authentication support. It was originally assigned to one of my colleagues, but it was a hot potato. We just passed it around between the three of us for about a year, and nobody did anything about it because nobody wanted to deal with this ticket.

Kerberos authentication is not usually popular among Rubyists. It's a kind of authentication that is used in pretty large enterprises, usually under some policy that says every authentication with the database has to be done using Kerberos. If you ever find a Ruby project that needs Kerberos authentication, it's usually in the context of one of these big enterprises, as part of a suite of technologies that has to meet this policy. So it wasn't really a high-priority or highly demanded feature of the Ruby driver. It was something we had to do in order to comply with standardization across all drivers. Hence the resistance, or just the lack of interest, in implementing it. The return on investment in writing a C extension to provide this authentication was not very high.

So we spent about a year trying to figure out how we could avoid writing the C extension. We researched a bunch of alternatives. There was a gem called gssapi, which wasn't a C extension for working with the Kerberos library. It used something called Ruby FFI, which allows you to write Ruby code that calls directly into C code. It doesn't need the C extension glue in between. Essentially this gem was doing the Kerberos authentication in Ruby code and calling into an external C library directly through Ruby FFI.
I tried using that and got a bunch of segmentation faults. I consulted with our C developers, and after some time, with them helping me read assembly code, we realized that debugging this wasn't a good use of our time. It would be time better spent if I just wrote a C extension myself. The other thing I investigated was using Ruby FFI myself, which again is a way of going directly from Ruby code into a C library and skipping the extension writing altogether, and that didn't really work out either. I also got segmentation faults, and I figured it would be time better spent if I just sucked it up and wrote the extension.

So I did finally write it, and learned a lot in the process. I wrote it over the course of about a month, August into September. By that time the PHP team had also implemented it in C, so I was able to talk with them and learn how they wrote their C code to use the SASL C library. I released it the day before Baruco, the Barcelona Ruby Conference, on September 9th, and on September 10th, yeah, September 10th, I had to yank it. I'll tell you towards the end why I had to yank it, and how you can avoid having to yank your gems that depend on C extensions.

So let's talk about what it means to write a C extension, why you shouldn't be afraid of writing one, and what you can get out of it. So, thinking about Ruby as an onion in the context of MRI: Ruby is a very high-level language, pretty elegant, very expressive, but underneath all of that expressiveness and simplicity that we like, there's a lot of C code. So we're sort of underwater in C code, and Ruby is up there above the water, like these beautiful floats floating above us.
So let's think about it almost like an onion, in the sense that we have this core understanding of how Ruby objects work and how C works, and then we provide more and more abstractions on top of the C code, and eventually we get up to a released gem that has an extension we've written sort of at the core of this Ruby red onion.

So what Ruby knowledge do you gain in writing a C extension? I'm going to talk about that. I'm going to talk about how you actually write the extension: how you work with Ruby objects, and how you go back and forth between C data and Ruby data. How do you package it with your gem? And finally, what are the RubyGems limitations? What did I learn after making a mistake in packaging it with my gem?

Starting with knowledge gains: what resources can you use to learn about this, besides looking at source code? There are a bunch of extensions out there. Nokogiri is one well-known gem that has a C extension, and you can look at other C extensions too. But the core documentation you should look at, in my opinion, is README.EXT in the Ruby source code. It's pretty straightforward. It's not very rich, it's just text, but it's really clear, it has an appendix at the end, and it tells you exactly what you need to do as a Ruby developer who's writing C code against the C implementation of Ruby. There are also a lot of blogs out there that talk about how to write C extensions, and there are a couple of really good ones from the last three or four years. I wouldn't highly recommend any one of them in particular. I would highly recommend looking at this README, which is short and sweet but really gets to the core of what you need to know. And one line in it that I find sort of charming is that when you're writing a C extension, you're adding new features to Ruby.
So again, Ruby is nothing but a way to write code that is then implemented in another language. When you write code in that other language that lets you do new things in the higher-level code, you're actually adding features to the language. I think that's a cool way of thinking about writing a C extension. You're in the belly of the machine, playing around with how it works and providing new abstractions to the thing above you.

So the actual knowledge you'll gain is how to go back and forth between Ruby and C. When you're going from Ruby to C, keep in mind that Ruby objects have types and Ruby variables don't, while C data doesn't have a type and C variables do. So how do you go back and forth between these constructs, these data structures? In C code, when you're working with Ruby objects, you'll be handed a structure, and you'll have no way of knowing what type it is, how it behaves, or what features the object has unless you check the type on that object. And there's a certain way to do that. Each Ruby object, when you're working with it in C code, has an identifier, an integer flag, that tells you how to work with it and how to convert it into C data. There are 18 of them that correspond to the Ruby core objects. You'll recognize them: nil, object, class, string, array, false, symbol, and the list goes on. And again, these are integers. They're constants that tell you how the object you're working with should behave and how you can use it.

So when you have a Ruby object and you want to convert it to C data, you first check the type, which gives you the integer, that constant, and then you convert the VALUE into C data. So, I didn't explain what VALUE is.
VALUE is sort of a generic C type that doesn't carry a specific type or object class of its own, but that could be any one of those Ruby objects. So when you want to convert Ruby into C data, or C data into Ruby, the go-between is this generic VALUE.

So how do you actually check the type? There are two main ways. There's a macro called TYPE: you give it the VALUE, this generic structure, this Ruby object, and it gives you back the integer that corresponds to one of the constants in the list I showed you. The other way is a function, Check_Type, that takes the VALUE and the integer you want to check against, and throws an exception if the object is not of the type you're expecting. So that's how you check the type.

Once you know the type, you know how to convert the object into C data. Built-in Ruby objects, like a string or an array, have corresponding C data types. So once you have the Ruby object and you've checked the type, you know which constant in that list it is, and you can then convert it into C data. The way you do that is with macros like RSTRING and RARRAY. On this slide those two are just examples; each of the types has its own corresponding C data type. RString and RArray are the C representations of those Ruby objects. When you use RSTRING or RARRAY, they take the generic VALUE and give you back a pointer to the corresponding C data type, so in this case a pointer to an RString or an RArray. You can then work directly with those data structures if you want, but it's highly recommended to use functions instead.
So there are helper functions like rb_str_set_len, which takes the VALUE and a length and sets the length on that string. You could go into the structure and change it yourself, but that's not recommended, because there are implementation details of that C data type that you avoid clashing with if you use the helper functions. Then something like rb_ary_entry is the equivalent of using brackets with an array. If you provide an offset, an index into the array, it returns the object in the array at that offset. So this is another way of accessing an element in the list. I can imagine that with an RArray you could take the structure, access the list inside it, and get the element you want, but it's much better to use this function, because it knows the implementation details of RArray and can protect you from anything you might mess up or misunderstand.

So that's how you go from Ruby to C. In Ruby you have these objects. They have this identifier, this flag, this constant, which tells you how to convert them into C data. There are corresponding C data types for the Ruby objects you're familiar with. And we'll talk about what happens when you have your own custom Ruby objects and how you go back and forth with C data.

So now we're going to talk about going from C to Ruby. How do you go from variables that have types and generic structures that don't, and convert them into Ruby objects that do have types, held in variables that don't? You can take a structure of C data and cast it to a VALUE, or use a function or macro to convert it into the corresponding Ruby object. For example, you can use functions to create Ruby strings from C data, given a certain length.
There are a number of things you can do, but I think the most interesting is wrapping C data in your own custom Ruby object, and that's what I'm going to show you in the next couple of slides. The first two are pretty straightforward, and you can look at the documentation for those. But the third one is where writing a C extension is the most valuable, because you can take C data that you've gotten from interacting with an external library and turn it into something your Ruby code can actually use.

Encapsulating C data in Ruby objects: these are three lines taken from the C extension I wrote, and they pretty much show what you would do if you wanted to create a custom Ruby object in your C code. I create a variable of type VALUE, so again this generic data, and then I wrap the C data using Data_Wrap_Struct. It takes four arguments. The first one is the class of the object, and I just use the generic Ruby Object class. The second argument, where I have NULL here, is a function you provide that tells the garbage collector, when it's running, how to mark that object, or rather the objects that it points to. So if my C data pointed to other objects that needed to be dealt with when the garbage collector came around to mark this one, I would handle them inside that function.

The third argument is another really interesting one, and there I actually do have a function: a SASL connection free function. It tells the garbage collector how to free the pointer. In this Kerberos authentication C extension I work with an external C library. I set some variables, I allocate memory, and then I create this Ruby object that's wrapped up and available to my Ruby code.
When that object is freed in my Ruby code, I need to free up some other C things I've done that the Ruby interpreter has no idea about, and you do that in that function. So the free function takes the C memory I've allocated and frees it, because the Ruby interpreter doesn't know about it. And the last argument is the actual C data that I'm wrapping.

The third line is another really interesting one. This is how you tell your Ruby code how to access the data, now that you've wrapped it in a way Ruby understands. This code is written within a function that can be called from my own Ruby code, on an instance of a class I've created. I've actually defined that class inside my C code, and I'll show you later how I do that. What the function does is work with the external Kerberos library, create this data, wrap it up in a Ruby object, and then assign it to an instance variable that's available on this class to the Ruby code I'll be writing or using at some later point. It's pretty easy to understand: rb_iv_set sets it on self, because this is an instance method being called on a specific Ruby object, and it assigns the wrapped C data to the instance variable context, and then I can access it from my Ruby code.

So again, that's encapsulating C data in a Ruby object, the third thing you can do when going between C and Ruby. I find it the most interesting and the most customizable, and it's where you can have the most fun in your C extension.

Okay, so now we understand how to go from Ruby to C and from C to Ruby. How do we actually write this extension and include it in our gem? To include an extension with your gem, there are four things you need to do, and I don't find the documentation for this online to be that straightforward.
So maybe you could even use these slides as a reference in the future if you need to do this.

The first one is a file called extconf.rb, which is essentially a script that checks your environment, your platform and system, checks whether the external library you depend on is there (if you do depend on one), and creates a Makefile that can be used to build and install the extension. It looks like this. Ours is actually pretty simple. It requires mkmf, which gives you functions like find_header, which looks for a header file on your system, and have_library, which asks whether you have this library. If so, it creates a Makefile, and then either a rake task, which we'll look at later, or RubyGems will actually build and install the extension. If it doesn't find the library, it aborts, so it never even gets to the step of compiling and installing the extension. So that's a script you have to write, and it's essential for having a C extension.

The second thing isn't strictly essential, but if you're ever going to test your extension, whether that's testing the build itself or using the extension in your own Ruby tests, you're going to have to write a rake compile task. You'll probably use rake-compiler, which is sort of the go-to gem for this, and you'll write a task that builds the extension and installs it so you can use it in a testing environment. That's not that interesting to look at. You can find really good documentation on rake-compiler online, but you'll probably have to write this task yourself.

Then the third thing is the gemspec. The gemspec is the file inside your gem that has all the metadata about your gem.
It lists the authors, it says what to do on different platforms, it has all this information. When you have an extension in your gem, you have to specify in your gemspec that you have a C extension or a Java extension, so you check the platform in there and do something different depending on which one you want RubyGems to look for. The gemspec will look like this. This is a simplified version of ours. It says that if the Ruby platform is Java, it's going to include the jar. Java extensions work differently from C extensions. I'm not going to explain it, but you already have the binary, and you add it to the files that will be used when RubyGems goes to install the gem. When you're doing C, you want to add the header files, the C files, and the Ruby file, which is the extconf.rb file that it'll look for in that directory, and then you tell it to run the extconf.rb file when it installs the extension. You basically just need those two lines in your gemspec, and then RubyGems will know: this gem has an extension, I need to run this file, create the Makefile, et cetera.

And then finally, obviously, the C code. You need the C code in your gem directory. There are some conventions on where you're supposed to put it, and you can find that in documentation online as well. Here is an example of some C code, again taken from my C extension, and this is probably the most valuable thing you need to know about writing a C extension. I wrote an extension called csasl, with a class that I instantiate and use within my Ruby code inside the gem, and when you write require 'csasl', this is the code that actually runs. It defines the class, the methods on that class, and other behaviors. Going line by line, the first line creates a variable that will hold the definition of this class.
That's the variable for the GSSAPI auth class. Then there's this Init_csasl function. When you do require 'csasl', Ruby looks for a function called Init_csasl. In general, require 'x' looks for Init_x. It goes through these lines. The first one, rb_const_get, looks up the module Mongo, and the second one looks up the module Sasl, so I have a pointer to each of those modules. Then it defines the class under, I think here I'm doing it under the Sasl module. So it defines this GSSAPI auth class under Sasl, and this is the same thing as creating a file with module Mongo, module Sasl, and then the GSSAPI auth class inside.

The next couple of lines define methods on that class. They define them on that object, the variable I declared at the top. The second argument is the name of the method on the class, the third argument is the C function that corresponds to that Ruby method, and the last argument is the number of arguments the method should expect to be passed. So this is just a little snippet of code, but it's something you will definitely write if you write a C extension. It's essential for defining anything that will be loaded when you say require 'x'. So this is the C code.

So again, the four things you need to do when you package a C extension with your gem: the extconf.rb file, which creates the Makefile. A rake-compiler task, which isn't strictly necessary, but you probably should have one if you want to test your gem. The C code. And your gemspec, which tells RubyGems that you have an extension and that it needs to be built when the gem is installed.

So that seems pretty straightforward. I understand how to work with the Ruby API. I know how to write the C code. I've packaged it up in my gem, and now I'm ready to release. What you need to know about releasing takes us back to the story of how I released this gem and had to yank it.
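Stepping back to the Init function for a moment: as a rough Ruby-side picture of what it sets up, the sketch below defines the same nesting (module Mongo, module Sasl, then the class) in plain Ruby. The class name GSSAPIAuthenticator and both method names are assumptions made up for illustration; in the real extension the method bodies are C functions, registered together with the number of arguments they expect.

```ruby
# Plain-Ruby picture of what a C Init function registers.
# The class and method names here are hypothetical stand-ins.
module Mongo
  module Sasl
    class GSSAPIAuthenticator
      # In C this would be registered with an argument count of 0,
      # e.g. rb_define_method(klass, "initialize_challenge", c_func, 0).
      def initialize_challenge
        raise NotImplementedError, 'implemented in C in a real extension'
      end

      # In C the last argument would be 1: the method takes one argument.
      def evaluate_challenge(payload)
        raise NotImplementedError, 'implemented in C in a real extension'
      end
    end
  end
end

# The argument count given at registration is what Ruby reports as arity.
klass = Mongo::Sasl::GSSAPIAuthenticator
puts klass.instance_method(:initialize_challenge).arity
puts klass.instance_method(:evaluate_challenge).arity
```

From the calling side there is no difference between a class defined this way and one defined in C, which is why the C code can stay hidden behind an ordinary require.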
Again, I released it on September 9th, and then on September 10th I went to the airport. I was on my way to Barcelona for the Barcelona Ruby Conference, and my flight was delayed by four hours. I took out my phone to check my email. There was no wifi in the airport, it's one of those tiny domestic airports. And I see this ticket saying: Ruby driver 1.11 does not install on systems without libsasl. All I had to do was look at the first couple of lines of the error, and I knew exactly what the problem was. It says that 1.11.0 cannot be installed on Amazon Bamboo because of the following error: failed to build gem native extension. Then it shows the output from running extconf.rb, looking for the header files, and trying to build the extension as part of installing the gem. The one line that told me everything is down towards the bottom: fatal error, sasl.h, no such file or directory.

When I saw this, I knew that extconf.rb was running, so that part was fine. RubyGems was finding everything just fine, and I had put all the files in the right place. What was happening was in extconf.rb, which I have on the next slide. It went through the file and said: I don't have sasl.h, I can't find this header file, I don't have this library, so I'm just going to abort installation. And aborting installation does not mean aborting installation of the C extension. It means aborting installation of the gem itself. You can't optionally install a C extension inside your gem with RubyGems. So the RubyGems limitation is that an extension dependency is a gem dependency.

So remember in the beginning when I said nobody really uses Kerberos authentication with Ruby? I had just released a Ruby driver with a hard dependency on a C SASL library that lets you do Kerberos authentication, but nobody does Kerberos authentication with the Ruby driver. So why am I requiring that?
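To make that hard dependency concrete, here is a minimal sketch of a gemspec of the kind described earlier, not the driver's actual one. The gem name, version, and every path are assumptions for illustration. The point is the extensions entry: once it is listed, RubyGems runs that extconf.rb on every install of the gem, so an abort there fails the whole install.

```ruby
require 'rubygems'

# Illustrative gemspec that ships either a Java jar or a C extension.
# Gem name, version, and all file paths are hypothetical.
spec = Gem::Specification.new do |s|
  s.name    = 'example_driver'
  s.version = '0.0.1'
  s.summary = 'Illustrative gemspec with a native extension'
  s.authors = ['Example Author']
  s.files   = ['lib/example_driver.rb']

  if RUBY_PLATFORM =~ /java/
    # JRuby: the prebuilt jar is just another packaged file.
    s.platform = 'java'
    s.files += ['ext/jsasl/target/jsasl.jar']
  else
    # MRI: package the C sources, the headers, and the extconf.rb script...
    s.files += ['ext/csasl/csasl.c', 'ext/csasl/csasl.h',
                'ext/csasl/extconf.rb']
    # ...and tell RubyGems to run extconf.rb whenever the gem is installed.
    s.extensions = ['ext/csasl/extconf.rb']
  end
end

puts spec.extensions.inspect
```

Nothing in the specification lets you mark an entry in extensions as optional, which is the limitation the ticket ran into.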
So I quickly consulted a lot of people, and I realized that the only solution was to put the extension in a separate gem. What you gain by doing this is that you can have a dependency on the other gem, or you can programmatically require it. In this case the other gem was called mongo_kerberos. If the driver can't require it, if it doesn't find it, then I can just say: you know what, you can't do Kerberos authentication. But it doesn't affect the installation of the Ruby driver, the mongo gem, at all.

So that's what we did. We put the C extension in its own gem called mongo_kerberos and released it. It was pretty easy to do, just a matter of moving directories around and creating another gem specification. Then, as I said, when someone goes to do authentication, I programmatically require csasl, and if it can't be found, that means the dependency on mongo_kerberos is not met. In that case I can just throw an exception saying, hey, you can't actually do Kerberos authentication, but that doesn't affect the installation of the driver itself. And as you can see, we have 309 downloads, and I can assure you 300 of those are from a testing environment.

But anyway, I learned a lot from that, and I hope that if you get the chance to write a C extension, it'll be a little clearer to you than it was to me when I started out, fumbling around and doing things by trial and error. Because now you know that you'll gain a lot of Ruby knowledge by writing a C extension. Even though you're writing C code, you'll understand what the Ruby language, or rather the C implementation of the Ruby language, actually is by doing this.
You'll know how to actually write the extension, those four things that you have to do. And you'll know the RubyGems limitation: if you have a hard dependency on a header file on the system, that had better be a hard dependency of the gem you're trying to install, because if you abort in the extconf.rb file, you abort the installation of your gem, not just the extension. And that's it. Thank you.
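The programmatic require described near the end of the talk can be sketched roughly as follows. This is an illustration of the pattern under stated assumptions, not the driver's actual code: the error class and the method name are hypothetical, and the file name csasl follows the talk.

```ruby
# Optional-dependency pattern: load the extension lazily and fail only
# when Kerberos authentication is actually requested.
# KerberosUnavailable and load_kerberos_support are hypothetical names.
class KerberosUnavailable < StandardError; end

def load_kerberos_support
  require 'csasl' # provided by the separate extension gem, if installed
  true
rescue LoadError
  raise KerberosUnavailable,
        'Kerberos (GSSAPI) authentication requires the mongo_kerberos gem'
end

begin
  load_kerberos_support
  puts 'Kerberos support loaded'
rescue KerberosUnavailable => e
  puts "Cannot authenticate with Kerberos: #{e.message}"
end
```

Because the require only runs when authentication is attempted, a missing system library surfaces as a runtime error for the few users who need Kerberos, and never as a failed install for everyone else.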