Hi everyone, I'm going to talk about my experiences wrapping a C++ machine learning library in Python. It's a bit of a negative thing, more of a post-mortem, but I think there's something to be learned here. So what is it? JML is a really high-performing machine learning library. I'm going to talk about the innovation cycle, which is how I use it to be really effective in solving machine learning problems. Then I'll talk about my requirements for the Python wrapping, the difficulties I encountered, some recommendations, and what I would do if I were to start again.

So here's a link; it's on Bitbucket. I'd say it's a highly modular, efficient machine learning framework. It uses very idiomatic C++: Boost libraries, templates, function objects, things like that. It turns out that those things are really hard to wrap. It's very modular, so a lot of problems you solve not by taking the high-level interfaces, but by plugging little bits and pieces of it together. It's very, very fast: multi-threaded, vectorized, and it offloads things to the GPU where it can. It's also highly memory-efficient. For the kinds of problems I work on, which are machine learning on big data sets, it's really the key to my productivity. Now, I've used it for various things here; this is more like bragging than anything else, but it has been used in some real situations.

So here I talk about the innovation cycle. As a researcher, this is basically what I spend my time doing. I have an idea. I have to test it somehow, so I write some code. I then have to run the experiment. I wait for the results. Once I get the results, I analyze them and get some ideas about what to do next. A lot of the data sets I use are really, really big, so I spend a very large amount of time waiting. I also don't necessarily have a massive cluster at my disposal, so I have to be careful about that.
And my productivity there, it's a bit of a generalization, but it depends really on the amount of time spent coding, the amount of time spent waiting for results, and the amount of time spent thinking. For a lot of machine learning problems, the waiting time absolutely dominates all the others. So that's really what you want to reduce to improve your productivity. And this is where JML increases my productivity. Take the GitHub contest, which was a contest to design a recommendation engine. I came in second place in the competition. It took me about 10 minutes to get results, which meant that in a day I could run a lot of experiments. Other people didn't necessarily have that. There's a quote from a guy who was using Ruby: he had to basically start up a whole bunch of servers on Amazon just to get going.

So with that waiting time taken care of, how can I improve the coding time? That's where I want to use Python. Here's what I want in my wrapper: I want it to be idiomatic Python. I don't want it to be possible to crash the interpreter. I want it to have a natural feel, and I want it to expose the full power of the underlying modules. I don't want to just write C++ code in Python.

I encountered a lot of difficulties. The biggest ones were mismatches between C++ and Python idioms. For example, it's easy to make something that looks iterable in Python. The problem is you have to understand that underneath it's C++, which has really strict requirements; if you don't meet them, you get a crash. There's also complex ownership, like shared pointers, which is very difficult to deal with in a Python binding. C++ data structures are clumsy in Python because you always have to convert them back and forth, and templates are really difficult to handle too.

What did I use? I tried Swig first.
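To make the iterable mismatch concrete, here is a minimal pure-Python sketch. The `RawCursor` class is a stand-in for a hypothetical low-level C++ cursor binding whose dereference or advance past the end would crash the process; the generator wrapper enforces those strict requirements on the Python side, so misuse raises an exception instead. All names here are illustrative, not part of JML.

```python
class RawCursor:
    """Stand-in for a low-level C++ cursor binding: misuse is fatal."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def at_end(self):
        return self._pos >= len(self._data)

    def value(self):
        if self.at_end():
            # In a real binding this would be a segfault, not an exception.
            raise SystemError("crash: dereferenced a past-the-end cursor")
        return self._data[self._pos]

    def advance(self):
        if self.at_end():
            raise SystemError("crash: advanced a past-the-end cursor")
        self._pos += 1


def safe_iter(cursor):
    """Wrap a raw cursor as a Python generator that checks before each step."""
    while not cursor.at_end():
        yield cursor.value()
        cursor.advance()


print(list(safe_iter(RawCursor([1, 2, 3]))))  # [1, 2, 3]
```

The point is that the C++-style precondition ("never dereference past the end") is checked once, inside the wrapper, rather than being left for every Python caller to remember.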
It was good for getting started, and it does a lot of automatic stuff for you, but it's buggy and it's full of black magic. If it doesn't work, there's no way to know why; it just doesn't. It creates enormous files as well, like 45,000-line C files. It ended up being unsuitable for the job. So after that I moved to Boost Python. It's powerful, it has good syntax for manipulating Python objects, it's flexible, and most importantly it's explicit: it does what you ask it to. But it takes a long time to compile, it's buggy, and it's really difficult for distributions to package. Despite this, I feel like Boost Python is the best choice.

So, recommendations. If you want to wrap some C++ code, start small with a really narrow interface. Write an interface module in pure Python; don't try to do just one single binding. Have an outside binding, which is what people see, and an inside binding, which is your helper classes, written in C++. Construct internal helper objects in Boost Python: you can increase the safety of things like generators, you can manage references, you can do type conversion, things like that. And think about binding when you're writing the library. Create safe iterator classes, so instead of a pointer you use an offset; then it can't crash. Don't expose complex data structures as part of the interface; expose a functional interface instead. Minimize coupling within the library, and remember that functions are much, much easier to wrap than classes.

That's all. Thanks. Thank you.
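The two-layer recommendation can be sketched in pure Python. This is a hedged illustration, not JML's actual API: `_inner_train` stands in for the narrow, functional inner binding that the C++ side (e.g. via Boost.Python) would export, the `Classifier` class is the idiomatic outer layer users actually import, and `SafeRows` shows the offset-with-bounds-check style of iterator class that can't crash.

```python
# Inner layer: stand-in for what the C++ binding would export --
# a narrow, functional interface, no complex data structures.
def _inner_train(rows, learning_rate):
    """Hypothetical binding call; here just a stub returning a summary."""
    return {"n_rows": len(rows), "learning_rate": learning_rate}


# Outer layer: pure Python, idiomatic, what users actually see.
class Classifier:
    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self._model = None

    def train(self, rows):
        # Convert and validate on the Python side before crossing into C++.
        rows = [tuple(r) for r in rows]
        self._model = _inner_train(rows, self.learning_rate)
        return self


class SafeRows:
    """Index by offset with a bounds check, never by raw pointer."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, offset):
        if not 0 <= offset < len(self._rows):
            raise IndexError(f"offset {offset} out of range")
        return self._rows[offset]
```

Because the outer layer is plain Python, conversions, argument checking, and reference management all live in one place, and the C++ binding underneath can stay as small and functional as possible.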