So, this time I am going to give an introduction to the Simplified Wrapper and Interface Generator, commonly known as SWIG. If you say SWIG to somebody, they usually know what you are talking about, at least in the Python world. The outline of my talk is basically this: I am going to introduce you to SWIG, what it is and what it does. I am not going to get into the details of how SWIG works internally. Instead I am going to focus on how you can use SWIG, then on one specific piece called the interface file, which you need to understand, and give you a bunch of examples of how you can do real-life things with an interface file.
As we saw in some of the earlier examples, writing things in pure Python is ridiculously slow, especially if you are doing for loops. But the nice thing about Python is that it lets you prototype your algorithms extremely rapidly. So in half a day you have finished writing your core algorithm, you know how it should look, you know how it should be written, and then it is time to say, okay, let me write that in C and speed it up. And we saw that you can actually get a speed increase of the order of a thousandfold if you go to C directly. So how do you go about doing this? One way is to use weave, but often that is not enough. Suppose you have a more complicated situation. In my case, during my PhD, I had to build a particle-based solver, and it is absolutely not clear how you can do that with arrays. You need an object-oriented hierarchy, you want to do things where C++ is probably better suited. So that is the premise I started on. I went ahead and built a huge C++ library, in spite of being an engineer and not a computer scientist. It ended up being something like 30,000 lines of C++ code, and then I found that just doing experimentation, figuring things out, was so inconvenient with C++ that I had to wrap it to Python. And I just started using Python.
And to do this it really helps if you have an automated tool like f2py. So think of SWIG as being like f2py, but able to deal with a really complex language like C++. I was using fairly sophisticated features in C++, like templates. I was using containers, I had inheritance trees, and I wanted to derive classes from Python. It's a large code base, so I had lots of documentation, which I had written in the C++ code. I wanted that documentation to be reflected in my Python code, so that people scripting it would be able to look at the docs. So I needed all of this, and hand-wrapping code, which we've not covered today, is non-trivial. It's a lot of effort.
So what SWIG does is give you an approach by which you can take a C library and glue it to Python by building an interface, which is basically an extension module. It's relatively easy to use. You do have to learn how to build a SWIG interface file, but it does let you wrap C and C++ libraries for various target languages. And it's open source. At the time I made this slide it supported 11 different target languages: Guile, Java, MzScheme (I don't know what that is), OCaml, Perl, PHP, Python, Ruby, Tcl, Chicken (I don't even know what Chicken is, but it's a language they support) and C#. There are more now; I think there's a Lua target as well. So basically they have a generic way by which they parse C++ and generate wrappers for different target languages. And the Python version, which I've highlighted here, is very mature. You can do lots of really neat things with Python wrapping.
The chief architect is David Beazley, and the code base has been around since 1996, so it's been there for a while. The initial versions didn't have great C++ support. And then there's 1.3, which is the current development version. They call it the development version, but it's really quite stable. Things change.
It's a development version, but it really works, and it works on really large code bases. I have something like 80 classes which just wrap and build, no problems. The nice thing about SWIG is that it's implemented entirely in ANSI C, and the sources that it generates, the wrapper code (I'll explain what that is in a bit), are also entirely ANSI C. Which means that if you have really complex code and you're generating a wrapper, the wrapper is not using any sophisticated C++ feature that takes hours to compile. If you have very sophisticated C++, if you're using template metaprogramming and things like that, it could take you an hour to compile some bit of code, because it can get so complex that the compiler has to figure out way too much. Whereas SWIG generates pure ANSI C, which is a lot easier and a lot faster to compile. It's also highly extensible and configurable using what are called typemaps, which I will not be covering in this talk.
So basically the way it works is this. In C you have basic types like integers, floats and null-terminated character strings, and you have structs and classes. All of the basic types SWIG tries to handle natively. Anything more complex than that, it deals with as a pointer. I'll come to that in the next slide. But the basic features are that it lets you deal with data types, structures and classes. It will let you deal with pointers, references and, to an extent, smart pointers. It will deal with functions. It also deals with function overloading pretty well. It supports inheritance, which is pretty amazing. What you can do is have a class in C++ with a virtual function, and then derive from that class in Python. Which means that if you pass a derived object from Python to a C++ function, it'll actually call the Python code, which is very neat. So if you want to do something and test something out, you can do it.
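To make the virtual-function point concrete, here is a minimal sketch of the C++ side only. The names `Function`, `integrate` and `Constant` are hypothetical and not from the talk. The library routine knows only the base class, and whichever override the object carries gets called; as far as I know, with SWIG's director feature enabled a Python subclass overriding `eval` would be picked up through the same mechanism.

```cpp
#include <cassert>

// Hypothetical base class with one virtual function.
class Function {
public:
    virtual ~Function() {}
    // Default behaviour: f(x) = x.
    virtual double eval(double x) const { return x; }
};

// Library routine that only knows about the base class.
// Midpoint-rule integration of f over [a, b] using n slices.
double integrate(const Function& f, double a, double b, int n) {
    assert(n > 0);
    const double h = (b - a) / n;
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += f.eval(a + (i + 0.5) * h) * h;  // virtual dispatch happens here
    }
    return total;
}

// A C++ subclass standing in for what a Python subclass could do: f(x) = 2.
class Constant : public Function {
public:
    double eval(double) const override { return 2.0; }
};
```

Passing a `Constant` to `integrate` calls the overridden `eval`, not the base one; that is the behaviour the talk describes for classes derived in Python.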
It supports function and operator overloading. It supports templates. It supports C++ exceptions. It also supports some of the standard library containers like vector and map, and it also supports NumPy arrays. There were various wrappers around for NumPy array support, and what Bill Spotz did was put them together, make it really nice, and write some very good documentation for it as well. So basically, if we have a C function (we'll look at an example later), you can actually send it a NumPy array. You can do things like that, so it's extremely convenient. It's available on most Linux distributions. It's available in MacPorts for the Mac. Enthought eggs are available for Win32. It's very well documented; there's a huge amount of documentation.
Okay, so with all that done, how do you use SWIG? First, let's say we have some library called liba. Let's say you built a bunch of functions. Let's say you've implemented sine. You shouldn't do that, but let's say you did: you implemented sine, you implemented a root finder, you implemented some derivatives, you did some calculations. And you see that some of these things can be reused. Let's say you have sine and cosine, and lots of other code that's going to use sine and cosine. So you want to put that together in one place where you can reuse it. The way you do it in C++, or C (I'm going to use C and C++ interchangeably, which is not right, but never mind), is that you build a header file which declares your functions and your classes. It basically says, this is the kind of API that I have. And then you implement the details of each of those functions in a .c or .cpp or .cxx file. Now, you don't want everybody who wants to use your functions to have to build your code. You want to give them something that's reusable. So what you do is build a library, and typically you build what's called a shared library.
A shared library basically has all of the code that you've written, compiled into object code and put together inside what's called a library file. Typically a library file under Unix looks like libsomething.so, where .so stands for shared object. You also have what are called static libraries, which typically look like libsomething.a, which is basically an archive file holding all your object files. Now somebody comes along and wants to use your library. What they do is take your header file and include it. You all know about including header files in C, right? So they include the header file and code against your API. And now they need the object code. The object code is sitting in the library, so they link to your library. So the way in which you make reusable pieces in C is that you build libraries. You give people header files and you give them library code. Then they write an executable, or some other library, that links to this particular library and uses the header file. So is that clear?
So now let's say you have a shared library, liba. What SWIG does is parse the header file. That means it looks at the signatures of your functions and your classes. And for help it uses an interface file. The interface file basically guides the wrapping process. A lot of it is done automatically, but you can help it along. And then it generates two files: a Python interface, and a C or C++ file which actually wraps your library. So it builds a piece of code that creates what's called a Python extension. You did extensions, right? So basically the extension is something that can be imported from Python and used through a Pythonic interface. Then once you have the a_wrap.cpp, you build it into a Python extension, and I'll show you how to do that. And then all you do is import the generated .py file, the Python module. And that's it. You can call your code.
It's as straightforward as that. And usually what people do is put all these four steps into things like makefiles, or a setup.py, or an SCons script.
So let's look at the details. As I said, your library is sitting here. You have some a.cpp and a.hpp, let's say, and you've built a shared library called liba.so. We'll look at how you do that, but let's say you have it. Now you take SWIG and give it an interface file, sitting here, the green guys. SWIG then parses the header files and generates the wrapper code. The wrapper code in turn uses your header file. It's a C or C++ program. And SWIG also generates what's called a shadow module, called a.py. Now you build this a_wrap file using what's called the Python C API. Python has a C API which lets you do things with Python from C. So this a_wrap.cpp that's generated by SWIG uses the Python C API to manage getting inputs, returning values, and all of that stuff. You use GCC or whatever compiler to build an extension module from it.
So now you can just forget about this entire side. On your Python side, all you do is import the shadow module. The shadow module imports the extension module, which in turn links to your shared library. To recap: SWIG looks at the header file and generates a_wrap. a_wrap is built to give you _a.so, which is an extension module, not an ordinary shared library. Python then imports a.py, the shadow module that SWIG generates automatically. And this guy imports the extension module and gives it a nice Pythonic interface. You can use _a.so directly. That's fine, but it's not a very clean interface. a.py gives it a clean interface, which underneath is actually calling your library. So now you have a library and a header file, and you've exposed them to Python.
Okay, so how do you do this? Let's just do a quick GCC primer. I'm only going to cover Unix, because the examples today, and the lab session, are only on Unix.
But on Win32, Cygwin is similar. So if you have Cygwin, it's pretty much the same. MinGW is the same. Okay, so MinGW is what they use, and if you use the Enthought edition, it comes with MinGW, and you can build things pretty much the same way. I'm going to go through the details here so you understand how it actually works underneath. A lot of this is made easier by using what are called setup.py files, but I'm going to expose what's underneath just for this part.
Basically, you use GCC to build object code from C code. The typical options that you'll see are -c, which is to compile to object code and not build an executable. When you build a C file, the compiler usually expects something called main. It's a convention: when the program starts, it jumps to main and starts executing from there. Whereas when you say -c, it does not expect that. It knows that this is library code that's going to be compiled. It's not going to generate an executable. It's just going to generate an object file, which can be bundled with others to make a library. You can also optimize the code. GCC has a whole bunch of optimization options. I would seriously recommend that, if you're going to use GCC, at some point in life you do man gcc. There's a huge amount of information. Just look at it once so you know what's there. Of the various options, you typically use -O2, with a capital O. If you're building a shared library, it's a good idea to add the flag -fPIC. And when you build a shared library, you use -shared. Don't worry, I'm just listing out the options. We will do an example.
Now, when the C compiler needs to compile your code and you are including other people's libraries, the compiler needs to know where to find their header files. You tell the compiler where the header files are by giving it a -I flag, with a capital I, followed by a directory.
You can only specify one directory at a time, so it's -I directory, -I another directory, and so on. And GCC will go to those directories and hunt for headers. Capital -L is similar, but for libraries. When you're linking your application or your library, the linker needs to know where the other libraries are. So you tell GCC, hey, the libraries are here, in this directory. You use -L directory, and again it's one directory per flag. Finally, when you want to link to a particular library, you say lowercase -l followed by the library name. So let's say you have a library called libmath.so or libmath.a. You will say -lmath.
So let's see how we build a library from a simple a.cpp. You first generate the object file: you say gcc or g++ -c a.cpp, give it whatever flags, and -o will send the output into a.o. Then you build a shared library. You say g++ -shared, give it the object file, and send that to -o liba.so. The -shared flag tells it that the output has to be a shared library. A shared library has a specific format, so the instant you tell GCC this, it does the necessary work to get it done. And when you link against other libraries, you need to specify the library paths and the -l library names. So again, using a makefile or SCons, or a setup.py, makes life a lot easier.
So now you have liba.so, which is your library, and you want to use SWIG on it. Let's say you have built an a.i interface file. We'll come to that next. All you do is say swig -c++, for C++ code, and -python, which says, SWIG, please treat this as C++ and generate Python wrappers. If you wanted Tcl wrappers, you would use -tcl. Then come any other extra SWIG options, which you can find using swig -help, then -o and the file you want to generate, and then the interface file. So I'll just recap. You have a C library, consisting of a header file and a library.
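Written out as commands, the steps described so far might look like the comments in the sketch below. The file names a.cpp, a.i and a_wrap.cpp follow the talk; the function `sum_to` is a hypothetical stand-in for real library code, and the exact flags you need may differ on your platform.

```cpp
// a.cpp: a stand-in for the library source (sum_to is a hypothetical function).
//
// Build steps from the talk, written out as shell commands:
//   g++ -c -O2 -fPIC a.cpp -o a.o          # object code only, no main() needed
//   g++ -shared a.o -o liba.so             # bundle the object into a shared library
//   swig -c++ -python -o a_wrap.cpp a.i    # generate a_wrap.cpp and the a.py shadow module
#include <cassert>

// Sum of the integers 1..n, used here only so there is something to compile.
long sum_to(long n) {
    assert(n >= 0);
    return n * (n + 1) / 2;
}
```

Because the file has no `main`, it only compiles with `-c`; that is exactly the library-code situation the `-c` flag is meant for.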
The library can be built using GCC as I just indicated, or anything else. And now you have an interface file that lets you interface this library to Python. So you say swig -c++ -python, plus any other options for SWIG, and generate the wrapper file and the shadow Python file using this a.i. Then you get a wrapper file. You build the wrapper file using the same approach as before, where you say gcc with -O2 and -fPIC for position-independent code, and you point it to the Python headers. When you build the wrapper, the a_wrap, it's basically calling into the Python C API. Python has its own header files and you need to point to those. So this basically shows you how to do that: a -I flag will point it to that directory. And then you compile the wrapper code and you build the extension. Building the extension is exactly the same as building your shared library, with very similar flags. So it's just g++ with -shared and -o _a.so. The _a.so name is a convention, and you must stick to it when you build SWIG extensions. Okay, and then the user simply imports a.py and that's it.
Okay, so what's the big picture on this? This is a recap, plus a little bit on the internals of how SWIG actually does this job. The first thing is that SWIG preprocesses all input. C has a preprocessor; you are aware of that? You can do #ifdefs, you can have conditional code using the preprocessor. SWIG has its own preprocessor. It will preprocess your header file, so you can actually tell it to define certain things and undefine certain things if you want. Then it parses the C++ declarations that you give it, either through a header file directly or explicitly in the interface file. And then it keeps an internal representation of whatever you have given it.
So this is very useful when you're trying to do inheritance and things like that, because it keeps track of it and lets you do all of that properly in Python, where it has to generate the suitable Python shadow class. And then it generates the wrapper code for the target language. So basically it parses the input, keeps a representation, and depending on which target language you want, it generates the suitable code. It's very nicely designed that way.
What it first does when it generates Python code is convert all the basic types to their equivalents in Python. If you have an integer, it converts it to the suitable integer type in Python, and it has the basic code for all of that. Now the trouble is that if you have a class or a pointer, there's no native representation of these in the target languages. So what SWIG does is replace each of these by a pointer, an opaque pointer, which basically means you don't ask what's underneath. It wraps up the address of that pointer and passes it into the target language. Whenever you need this pointer, SWIG has code which will take the wrapped form, unravel it, get the actual pointer and pass it along to the C library. So the C library doesn't care, because it's getting the pointer. Your Python code doesn't care, because it's just dealing with some string of sorts. And that's it. It used to be a string; it's no longer a string right now. Pointers are non-trivial to express; you can't express them in Python. There is no pointer in Python. So what SWIG does is convert anything that's not a simple basic type in Python to something like a string internally. And when it calls from Python into C, it takes that, unravels it, gets the actual pointer and sends it in. So basically what they say is that everything else is treated like a pointer, and it works.
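As an illustration of the "pointer as an encoded string" idea, here is a minimal sketch. It is not SWIG's actual code: the helper names `encode_pointer` and `decode_pointer`, the `Particle` type and the exact handle layout are assumptions made for the example, loosely in the spirit of the string form the talk says SWIG used to use.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

struct Particle { double x; double y; };  // hypothetical wrapped type

// Encode an address plus a type tag as text: "_<hex address>_p_<type name>".
std::string encode_pointer(const void* p, const std::string& type_name) {
    assert(p != nullptr);
    std::ostringstream out;
    out << "_" << std::hex << reinterpret_cast<std::uintptr_t>(p)
        << "_p_" << type_name;
    return out.str();
}

// Decode a handle back into a raw pointer.
// Returns nullptr if the handle is malformed or carries a different type tag.
void* decode_pointer(const std::string& handle, const std::string& type_name) {
    const std::string suffix = "_p_" + type_name;
    if (handle.size() <= suffix.size() + 1 || handle[0] != '_') return nullptr;
    if (handle.compare(handle.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return nullptr;  // type tag does not match
    }
    const std::string hex = handle.substr(1, handle.size() - suffix.size() - 1);
    std::uintptr_t addr = 0;
    std::istringstream in(hex);
    in >> std::hex >> addr;
    if (in.fail()) return nullptr;
    return reinterpret_cast<void*>(addr);
}
```

The scripting side only ever holds the text handle; the wrapper decodes it just before calling into the library, and the type tag gives a basic check against passing the wrong kind of object.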
So basically, as I said, SWIG encodes the pointer into some form in Python, like a string with an encoded address, and passes that along to any C or C++ functions. The shadow class is responsible for giving you a nice interface for using it from Python.
Now in all of this, let's say you have a NumPy array, and you want to send that NumPy array down to C. How does SWIG know how to translate the NumPy array into something that's equivalent in C? The answer is, it doesn't know. So what you can do is customize it by writing explicit code to say, here is how this type is to be translated into Python, and here is how the Python type is to be translated back into C. Now, writing this is non-trivial. You need pretty solid C and C++ and target-language know-how. But the good thing about SWIG is that most of the dirty work has already been done. If you look at NumPy wrapping, Bill Spotz has already gone and collected what existed, polished it, improved it, and created something called numpy.i. It's another interface file, which automatically does all of this for you. You just have to say use this, use this, use this, and it'll use the suitable wrapping and figure it out for you. So basically it's very customizable. It's customizable by you, and you don't always have to do it, because most of the time it's already taken care of.
So what I'll do now is take a bunch of explicit examples. I'm not going to get into how you write the type-mapping code that does the mapping of types. But I will take specific examples and explain how you translate them into SWIG. SWIG is huge, because, as you can imagine, C++ is a very complex, large language, and SWIG can parse it and wrap it. So it's a very complex piece of software, with a huge number of options. I would recommend that you treat this as just an introductory primer on SWIG so that you can get started.
And if you really want to get your hands dirty with SWIG, please read the SWIG documentation. It's well written, it's replete with examples and it covers a lot of ground. I won't say it's entirely complete with respect to typemaps and things like that, but for the basic user it's really, really good.
So basically interface files, as I said, control how SWIG generates the wrapper files. The SWIG preprocessor preprocesses all input files, as I said, and you have a bunch of options for that. SWIG's own preprocessor is more powerful than the C preprocessor, which means you can actually define macros in SWIG. A lot of their internals are done using macros. But basically, in an interface file, everything is controlled by what are called directives, and each of these is preceded by a percent symbol. Here are some common directives that you'll see all the time. If you write something like the code here, %header %{ ... %}, you can put any C or C++ code in there, and it is injected into the wrapper's header section. So if you want to declare something, or include some files, all of that can be done at this point. If you leave out the word header and just write %{ ... %}, it's the same as %header. It's just a shorthand.
The other thing is that sometimes certain Python modules need some specific code to be called when they're imported. If you look at NumPy arrays, they need some setting up, and that is usually encapsulated in a single function or a few functions. In NumPy, you have to call import_array every time. So there's another section directive, called %init, which makes sure that whatever code you put there gets injected where the module is initialized. So you have a header section, where code goes into the headers, and you have init, where it goes into module initialization.
The other common directive is %inline. If you look at any of the SWIG examples (there are a lot of them, they have hundreds of tests in their test suite), pretty much all of them use inline. Inline basically lets you put C++ or C code in the interface file, as you see here, and have it parsed as well. There are two steps involved. One is the code that goes into the header section, which is basically the declaration of your functions or your classes. The other is SWIG parsing those declarations. When you're compiling the C code, you need headers, so things like an include of so-and-so need to be injected there. But you also need to tell SWIG, I want you to parse this code explicitly, I want you to generate a wrapper for this code. Inline does precisely that. It injects the code into the header and also parses it. So it's convenient, and it's often used to write what are called wrapper functions.
So here's the simplest C++ example code. Here's a simple library. It's a stupid library. It starts with an #ifndef on an example macro. This is what's called an include guard, which means the header file will not be included twice. Including it twice would be a problem: if you define a function twice, it's an error in C. If you have long fact(const long n) and then another long fact with a long argument, it's illegal. So this include guard, which you'll commonly see in well-written libraries, prevents that from occurring. If two files include the same header, it will not be re-included. So this is the example header file. The bottom one is the example C++ file, which is just a silly implementation that returns the factorial in a recursive manner. Very straightforward.
So this is how you wrap it. It's as simple as this. The first directive is %module. It says the name of the module to generate will be example.
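The header and implementation just described might look roughly like the sketch below, with both files folded into one listing. The guard macro `EXAMPLE_HPP`, the variable name `my_constant` and its value are assumptions, since the slide itself is not part of the transcript.

```cpp
#include <cassert>

// ---- example.hpp (sketch) ----
#ifndef EXAMPLE_HPP   // include guard: a second #include of this header is skipped
#define EXAMPLE_HPP

extern float my_constant;   // a global variable (name and value are assumed)
long fact(const long n);    // the declaration SWIG would parse

#endif  // EXAMPLE_HPP

// ---- example.cpp (sketch) ----
float my_constant = 3.14f;

// Recursive factorial; returns 1 for n <= 1.
long fact(const long n) {
    assert(n >= 0);
    return (n <= 1) ? 1 : n * fact(n - 1);
}
```

A matching interface file would, following the talk, name the module with `%module example`, put `#include "example.hpp"` inside `%{ ... %}`, and then `%include "example.hpp"` so that SWIG parses the declarations.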
So it will generate an example_wrap.cpp and an example.py, which is the shadow module. And you are really interested in the example.py. Then, in the header section, you include example.hpp. And then you say %include example.hpp here. What %include does is take the contents of the header file and parse it. Instead of doing the include, you could have just cut and pasted the declarations. You don't even need all of the header, because you don't want the #ifdefs. You could have just cut out float my_constant and long fact(const long) and dumped them here. You don't have to use the include; SWIG will treat anything here as something you want parsed. That's it. Now you compile it, generate the extension, import example, and it will work.
Now this is extremely simple, so let's complicate things. Again, I'm going to do very basic things here. Look at the SWIG documentation for a detailed guide. The easiest approach usually does work, especially now that SWIG has matured quite a bit: you just say %include header and it usually works. Now if SWIG hasn't parsed a particular structure, it treats it like an opaque object. Let's say I have a header, and the header refers to some object which I have not wrapped. Let's say I have something that deals with file pointers. FILE, capital F-I-L-E, is a C structure for dealing with files, right? And let's say I'm not going to sit and wrap the FILE structure, but my code happens to use file pointers. It doesn't matter. It will still work. If you call fopen, fopen will return a file pointer. SWIG doesn't care. SWIG will return some wrapped pointer and give it to you. You can't do anything with it in Python except pass it back. But you can pass it back, and once you pass it back, SWIG knows how to deal with that and send it in. So this will work. If it's something that SWIG has not parsed, so it doesn't know the data structure, it will treat it like something that's opaque.
And it'll just pass it along between here and there. So it works. Global variables are a bad idea, but if you do have them, they go in the module: if you have my_constant as in the previous case, you'll have example.cvar, which holds all the globals, so example.cvar.my_constant. A better option is to declare it as a const. Then it just becomes a constant, which means you can't change it, and because it's a const it doesn't go in cvar. It's not a variable. Classes, inheritance, all of these work as you would expect. Operators are another matter. Suppose you have the stream redirection operators, << and >>, in C++; there is no real equivalent in Python, so you can't quite wrap them. You can ignore some of these. But if it's an add, or it's a get-item, operator[] I think, which gives you an element inside a container, for example, that would be wrapped automatically to __getitem__ in Python. So it takes care of all of this for you. It also supports C++ namespaces.
So let's look at a bunch of simple, common directives and wind up with that. %rename will let you rename a function. Let's say in C you have defined a function called print, but print is a keyword in Python. So you have a problem. If you use the rename directive, you're renaming the print that's in C to my_print. Now the important thing to note here is that you must only include your headers, these guys, after you've done this. Because once a declaration is parsed, it's finished. You can't do anything about it. So in actuality, the rename should come above the include.
The other thing you commonly need to deal with is that C++ doesn't support multiple return values. You can return a pointer, or fill in values through pointer arguments. But in Python it's natural to return three integers. So there is a built-in set of typemaps in a library interface file called typemaps.i. It defines what are called input and output typemaps, which let you do this.
This is how it works. Let's say I have two functions. One is get_size, which is basically just setting xs and ys to 100 and 200, and you want to return these. So in this case, both values are going to be changed through the pointers. In the other case, the function is just returning something. If you want to include a library interface file, you just say %include that_thing.i, so here you include typemaps.i. And then you can do what's called applying a typemap. You say %apply int *OUTPUT, which means, apply this specific typemap that's defined in typemaps.i to the arguments you see here. So wherever SWIG sees int *xs, int *ys, it will automatically inject the type-mapping code to deal with them as outputs. So now, in my Python wrappers, if I call get_size, it will return a tuple. It will return the xs and the ys, because you have explicitly said these two guys have to be outputs. So it's very convenient. You can wrap functions like this and make them return their outputs.
In the other case, suppose you have the sub function, subtract. You want to call it without creating pointers. You can't create a pointer in Python. I want to call the subtract function with two integers, but the C or C++ function expects pointers. So there is the input typemap. The instant I've declared something as INPUT, like so, SWIG assumes that the sub function is given two integers. It'll write the wrapper function underneath for you such that, when it is called from Python, it's just given two integers, not two pointers to integers. Then it takes those integers, creates the pointers, and passes them down to the underlying library. So this is again a very handy typemap. So basically, typemaps work in the following fashion. You have a particular signature, a particular thing like int *xs, int *ys, and then you've defined the typemap that says how to handle the int *.
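A minimal sketch of the C++ side of these two functions is below. The bodies and the parameter names of `sub` are assumptions; the `%apply` lines in the trailing comment use the `INPUT` and `OUTPUT` typemaps that SWIG's typemaps.i provides.

```cpp
#include <cassert>

// Output-style function: the results come back through the two pointers.
void get_size(int* xs, int* ys) {
    assert(xs != nullptr && ys != nullptr);
    *xs = 100;
    *ys = 200;
}

// Input-style function: reads two integers through pointers and subtracts them.
int sub(int* x, int* y) {
    assert(x != nullptr && y != nullptr);
    return *x - *y;
}

// In the interface file, before these declarations are parsed, one might write:
//   %include "typemaps.i"
//   %apply int *OUTPUT { int *xs, int *ys };
//   %apply int *INPUT  { int *x,  int *y  };
// From Python, get_size() then takes no arguments and hands back both values,
// and sub(5, 3) is called with plain integers.
```

From C++ the caller still has to create the integers and pass their addresses; the typemaps exist precisely so that the Python caller never has to.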
So all you're doing here is saying, apply the output typemap to these two. And here, I have just declared the function. This part is in the header section. This part over here is in the parsing section. So the parser is just told, there's a function called sub, which is expecting two integers as input, and you convert them to pointers underneath. It takes care of all of that.
If you want to override C++ virtual functions from Python, you use what's called the director feature. You have to turn on directors at the module level, and then turn on the director feature itself. You can turn it on for every object, or you can generate directors for specific objects, or for explicit virtual functions that you want to wrap. This will take care of that.
It handles templates. The only thing you have to worry about with templates is that you have to explicitly instantiate the template. For instance, if I have a pair template here, I have to create a pairii, which is basically pair<int, int>. And in Python I simply say import example as x, p = x.pairii, and pass it two integers. It actually uses the instantiated type. So it basically works. It also deals with vectors, std::vector, and you can pass lists, and they behave like lists. It also deals with exceptions. If you have code that explicitly throws exceptions in C++, SWIG will take those exceptions and convert them into Python exceptions for you. So you can actually catch exceptions from your C++ code. It's not going to segfault and kill your interpreter. You can also document your code. You can pass documentation strings that will be injected into the shadow file. So when you type fact? and ask for help, this help will be shown. You can add that also. So basically, your interface file lets you build lots of sophistication into the generated wrappers. Finally, there's NumPy support. Again, it's the same thing.
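Going back to the template point for a moment, here is a minimal sketch of what the C++ side might look like. The template is spelled `Pair` here only to avoid any clash with `std::pair`, and the helper `width` is hypothetical; the `%template` directive in the comment is the SWIG directive for instantiating a template under a Python-visible name.

```cpp
#include <cassert>

// Hypothetical two-member template, in the spirit of the pair example.
template <class T1, class T2>
struct Pair {
    T1 first;
    T2 second;
    Pair() : first(), second() {}
    Pair(const T1& a, const T2& b) : first(a), second(b) {}
};

// Explicit instantiation on the C++ side.
template struct Pair<int, int>;

// A function taking the instantiated type, which could be wrapped as well.
int width(const Pair<int, int>& range) {
    assert(range.first <= range.second);
    return range.second - range.first;
}

// In the interface file, after the template itself has been parsed:
//   %template(pairii) Pair<int, int>;
// Python side, following the talk:
//   import example as x
//   p = x.pairii(3, 4)
```

SWIG cannot wrap the template as such, only a concrete instantiation of it, which is why each wanted combination of types needs its own `%template` line.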
You have what's called a typemap, and the typemap here is called IN_ARRAY, for input arrays. And then you have something called INPLACE_ARRAY, for arrays that are going to be changed in the C code and used afterwards. So for NumPy support, in the header section you declare whatever headers you want. In this particular case, I'm considering one single function. You need to include the numpy.i interface file. And then in the init section, as I showed you earlier, you have to call import_array, which basically sets up NumPy so that it can be used. Now you just need to apply the typemaps declared in numpy.i to your signature. The function here is going to calculate the RMS value of some NumPy array. You simply say, apply the IN_ARRAY typemap with its int dimension to this signature, double *seq, int n. And that's it. Now SWIG will generate a wrapper. Once you compile it, you can actually call this with a NumPy array, and it will work on that NumPy array and return a double-precision number.
So basically, we've just done a very, very brief tour. As you can see, SWIG is pretty complicated, but for simple things it's very easy. And you can build things up gradually. The best way is to take one thing, write a function, wrap it, and say, okay, yeah, things work. Now add one more, and keep adding. Then say, okay, let me throw my whole header file at it. Some things may not wrap, and you say, okay, let me improve it. So it's easy to work with in an iterative way once you have the basics. But this should get you started. Read the SWIG docs. They're pretty good. Thanks.
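For reference, the RMS example from the NumPy part of the talk might look roughly like this on the C++ side. The function body and the header name rms.hpp are assumptions; the directives in the comment use the `IN_ARRAY1`/`DIM1` typemap names that numpy.i defines for one-dimensional input arrays.

```cpp
#include <cassert>
#include <cmath>

// Root-mean-square of the n doubles starting at seq.
double rms(double* seq, int n) {
    assert(seq != nullptr && n > 0);
    double sum_sq = 0.0;
    for (int i = 0; i < n; ++i) {
        sum_sq += seq[i] * seq[i];
    }
    return std::sqrt(sum_sq / n);
}

// Interface-file side (a sketch):
//   %{
//   #define SWIG_FILE_WITH_INIT
//   #include "rms.hpp"
//   %}
//   %include "numpy.i"
//   %init %{
//   import_array();
//   %}
//   %apply (double* IN_ARRAY1, int DIM1) { (double* seq, int n) };
//   %include "rms.hpp"
```

With the typemap applied, the Python caller passes a single array argument and the wrapper supplies both the data pointer and the length to `rms`.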