Thank you, everybody, for coming to my talk, and thanks to the conference organizers for all their hard work and for accepting my proposal to give a talk about computer vision using Ruby and LibJIT. Okay, a footnote: two days ago I conveniently broke my glasses, so if I cannot recognize some of you at 1,000 meters distance, please don't be offended.

Most of this work was done in the context of the EPSRC Nanorobotics project, which is a research project in the UK about implementing vision-based closed-loop control in a transmission electron microscope. The work is all released on RubyForge, among other websites, under the name HornetsEye. The link to the slides, for some reason, is not visible. Sorry about that. But I put the link on the Ruby website as well. So, while I did this work in the context of the Nanorobotics project, the intention is to write a general-purpose machine vision library.

Here are a few example applications of computer vision. I don't have time to go into the details, but what I want to show today is a bit of augmented reality, because many people get stuck doing two-dimensional computer vision, since it takes a bit of a step to go into 3D. But once you have taken that step, you can do so much more. It's very interesting.

By the way, I'm changing the slides using a camera as well. This presentation software is all written in Ruby. When my hand comes here, I can choose slides using my camera. And because I'm nervous and my hand is shaking, I also added some functionality so I can just do this: if I do this, it goes one slide forward; like this, it goes one slide backward. All right.

But computer vision means array operations. It's really like this: a lot of it is done using array operations. And as you can see, even the recent version of the Ruby virtual machine is still 30 times slower on this particular machine than the recent version of GCC.
This is an example operation, creating an index array with 10 million elements. So you may think that maybe using Ruby is not such a good idea. I don't know if you've seen the movie, but if you have, you know what that red light means. So I said, okay, it doesn't matter, I'll do it in C++. I'm very good at C++. So I wrote, for example, a plus operator for the Boost multi_array class, which is the class of choice if you do n-dimensional arrays in C++. And it turns out it's quite a lot of code just to implement a stupid plus operator, an element-wise plus operator, because the very data types which are most important for your work are arrays, integers and floating point numbers, and all these types are basic types in C++. So they have the worst support from the language. If you do the same thing in Ruby, with less than 10 lines of code you already have a working version of an array operator which runs on multi-dimensional arrays.

So the trick, in order to combine performance and flexibility, is to use an approach similar to the NArray library: you implement a Ruby extension providing uniform arrays, that is, arrays where each element has the same data type, for example, a duck. But I'm not using NArray. I'm implementing my own extension because NArray is written mostly in C, and it is, for example, not easy to add a custom element data type.

Now I'm going to show you how this is done. It's actually not very difficult. The core of the library is very small. You start by writing a C extension which defines a class, a Malloc class. You use this Malloc class to, for example, allocate 10 bytes of memory, and then you can write a Ruby string to the memory, or you can read back five bytes of memory and get the content as a string. And you can do pointer operations like m + 2, and if you then read five bytes, it will skip the first two bytes.
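The behaviour of the Malloc class described above can be imitated in plain Ruby. The following is only a sketch under stated assumptions: the real class is a C extension working on raw memory, whereas this stand-in uses a shared binary String as the memory block, and the method names `write`, `read` and `+` are taken from the description in the talk rather than from the actual HornetsEye API. Unlike the unchecked version in the talk, it already includes the boundary checks.

```ruby
# Pure-Ruby stand-in for the Malloc class (the real one is a C extension).
# A shared binary String plays the role of the raw memory block.
class Malloc
  attr_reader :size

  def initialize(size, buffer = nil, offset = 0)
    @size = size
    @buffer = buffer || ("\0".b * size)
    @offset = offset
  end

  # Pointer arithmetic: returns a view on the same memory, shifted by +bytes+.
  def +(bytes)
    raise ArgumentError, 'offset out of range' unless (0..@size).cover?(bytes)

    Malloc.new(@size - bytes, @buffer, @offset + bytes)
  end

  # Copies the bytes of +string+ into memory, with a boundary check.
  def write(string)
    data = string.b
    raise ArgumentError, 'write beyond end of buffer' if data.bytesize > @size

    @buffer[@offset, data.bytesize] = data
    self
  end

  # Reads +length+ bytes back as a binary string, with a boundary check.
  def read(length)
    raise ArgumentError, 'read beyond end of buffer' if length > @size

    @buffer[@offset, length]
  end
end

m = Malloc.new(10)
m.write('0123456789')
first_five = m.read(5)           # the first five bytes
shifted_five = (m + 2).read(5)   # five bytes, skipping the first two
```

Because `m + 2` shares the buffer with `m`, writing through the shifted pointer changes what `m` reads, which is the property the later view classes rely on.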
Of course, you have to be aware that this is not safe against buffer overflows. If you read 10 bytes after increasing the pointer, it will crash. So you have to reintroduce boundary checks at a later stage, but this is part of what saves a lot of time and gains a lot of performance.

You can use this class to define a native data type in Ruby. In the constructor you allocate four bytes of memory, and then you have a set and a get method. I don't know, maybe some of you were in the talk earlier on. You just use the standard Ruby library methods Array#pack and String#unpack to convert a Ruby integer into a four-byte string containing the native representation of that integer, and back. Then you can write a native integer. On the right side of the slide is some example use where you create an integer with initial value three, then you set it to five, and you can read back the value.

You can do all this in a more generic fashion. You can make a more generic integer class, a parametric class if you will, which has as parameters the number of bits and whether it is signed or not. Then you create a method which creates classes inheriting from that class, and on the right side you can see the purpose of this. I say, for example, an unsigned short int is an integer with 16 bits and unsigned, and if you have taken care to choose the appropriate descriptors, memory size and whatnot, then it will behave as expected and you can start using this integer. What's also useful to implement is some means of having several Ruby objects viewing the same memory location.

In the same fashion you can implement static arrays. In this case you have a parametric class with the parameters element type, number of elements and stride. The stride you can assume to be one at this point.
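The parametric integer class can be sketched in a few lines of plain Ruby. This is an illustration, not the HornetsEye source: the names `Int`, `INT`, `USINT` and `SINT`, and the use of a binary String instead of a Malloc object for storage, are assumptions made for this example. Only `Array#pack` and `String#unpack1` from the standard library are used for the conversion.

```ruby
# Hypothetical minimal version of the parametric integer class.
class Int
  class << self
    attr_accessor :bits, :signed

    # Array#pack / String#unpack directive for this type (native byte order).
    def descriptor
      { [8, true] => 'c', [8, false] => 'C',
        [16, true] => 's', [16, false] => 'S',
        [32, true] => 'l', [32, false] => 'L' }.fetch([bits, signed])
    end

    def bytesize
      bits / 8
    end
  end

  def initialize(value = 0)
    # A binary String stands in for the Malloc object used in the talk.
    @memory = "\0".b * self.class.bytesize
    set(value)
  end

  # Stores the native representation of +value+ in memory.
  def set(value)
    @memory[0, self.class.bytesize] = [value].pack(self.class.descriptor)
    value
  end

  # Converts the stored bytes back into a Ruby integer.
  def get
    @memory.unpack1(self.class.descriptor)
  end
end

# Creates a new class inheriting from Int with the given parameters.
def INT(bits, signed)
  Class.new(Int) do
    self.bits = bits
    self.signed = signed
  end
end

USINT = INT(16, false) # unsigned short int
SINT = INT(16, true)   # signed short int

i = USINT.new(3)
initial = i.get
i.set(5)
updated = i.get
negative = SINT.new(-3).get
```

Each generated class carries its own `bits` and `signed` settings, so adding another integer type is a one-line call to `INT`.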
Selecting an element basically means taking the pointer, adding i times the size of an element to it, and then creating a view of the element type onto that part of the memory. You can use this to implement the standard Ruby array operators. On the right side is some example code where I say, okay, a short int is a 16-bit signed integer, then I create an array with eight elements of that type, I fill it with the default value, and then I can set and retrieve elements as you would expect from an ordinary array class.

If you have implemented everything properly, this will work multidimensionally straight away, so you can create an array of arrays. In this case it is a three-times-two array of 32-bit integers, and if you implement some inspect methods to do some pretty printing, it will pretend to be a multidimensional array. You can also improve your methods for selecting elements so that they accept multiple indices or even ranges as indices, in a similar fashion to what the NArray library provides. In contrast to NArray, this will actually create a view of that memory, not a copy.

So now we have arrays, and the next thing is to define operations on them. I will show you how I implement unary and binary operations; of course there are more array operations than this. We start with scalars. I just define a method called op, and the method passes its arguments to instance_exec. It just calls instance_exec, nothing more. You can already use this to define a negate operator.
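The static arrays and element views described above can be sketched as follows. All names here (`Memory`, `Element`, `Sequence`, `sequence`, `SINT`, `INT32`) are hypothetical and chosen for this example; a shared binary String again replaces the Malloc C extension, the stride is fixed at one, and the first index selects the outermost array, which may differ from the index order of the real library.

```ruby
# Shared byte buffer plus offset, standing in for the talk's Malloc class.
class Memory
  def initialize(size, buffer = nil, offset = 0)
    @buffer = buffer || ("\0".b * size)
    @offset = offset
  end

  # Pointer arithmetic: a new view on the same buffer.
  def +(bytes)
    Memory.new(0, @buffer, @offset + bytes)
  end

  def write(string)
    @buffer[@offset, string.bytesize] = string
  end

  def read(length)
    @buffer[@offset, length]
  end
end

# Scalar element; subclasses carry a pack descriptor and a byte size.
class Element
  class << self
    attr_accessor :descriptor, :bytesize
  end

  def initialize(value = nil, memory: nil)
    @memory = memory || Memory.new(self.class.bytesize)
    set(value) unless value.nil?
  end

  def set(value)
    @memory.write([value].pack(self.class.descriptor))
  end

  def get
    @memory.read(self.class.bytesize).unpack1(self.class.descriptor)
  end
end

SINT = Class.new(Element) { self.descriptor = 's'; self.bytesize = 2 }  # 16-bit signed
INT32 = Class.new(Element) { self.descriptor = 'l'; self.bytesize = 4 } # 32-bit signed

# Fixed-size array; selecting an element yields a view, not a copy.
class Sequence
  class << self
    attr_accessor :element_type, :num_elements

    def bytesize
      element_type.bytesize * num_elements
    end
  end

  def initialize(value = nil, memory: nil)
    @memory = memory || Memory.new(self.class.bytesize)
    set(value) unless value.nil?
  end

  # View of element i: pointer plus i times the element size (stride 1).
  def element(i)
    raise IndexError, "index #{i} out of range" unless (0...self.class.num_elements).cover?(i)

    type = self.class.element_type
    type.new(memory: @memory + i * type.bytesize)
  end

  def [](i, *rest)
    view = element(i)
    return view[*rest] unless rest.empty?

    view.is_a?(Sequence) ? view : view.get
  end

  def []=(*indices, value)
    i, *rest = indices
    if rest.empty?
      element(i).set(value)
    else
      element(i)[*rest] = value
    end
  end

  # Fills every element with the same value.
  def set(value)
    self.class.num_elements.times { |i| element(i).set(value) }
  end

  def get
    Array.new(self.class.num_elements) { |i| element(i).get }
  end
end

# Creates an array class with the given element type and size.
def sequence(element_type, num_elements)
  Class.new(Sequence) do
    self.element_type = element_type
    self.num_elements = num_elements
  end
end

a = sequence(SINT, 8).new(0)               # eight 16-bit integers, filled with 0
a[1] = 5
m = sequence(sequence(INT32, 3), 2).new(0) # array of arrays
m[1, 2] = 7
row = m[1]                                 # a view of the second inner array
row[0] = 4                                 # writes through to m
```

Because an array class exposes the same small interface as a scalar element class (`bytesize`, `new`, `set`, `get`), it can itself serve as an element type, which is why the nested case works without extra code.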
The negate operator just creates a new object of this class and then calls op with the Ruby value of this native element as a parameter. The operation you execute with instance_exec just does set(-x), so it sets the negative value, and that's all. On the right side you can see an example where it works as expected: you create a short integer with value three, and -s gives you a short integer with value minus three.

For arrays you also define the op method. It simply loops over all the elements and calls op recursively, and if one or more arguments are arrays as well, you also iterate over the elements of those arrays. If you do it like this, you can implement the negation operator in exactly the same way as for the scalar. On the right side you see some example usage where I create an index array with values zero, one and two, and if I negate that array I get zero, minus one, minus two.

So this is all nice and good, but it's still very slow. So how to speed things up? If you look at the title of my talk, you can already guess that we're going to use a JIT to do this. I also use reflection to do this: I use reflection to compile the code instead of interpreting it.
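The op mechanism for scalars and arrays can be sketched without any native memory at all. The class names `Scalar` and `Sequence`, the helper `empty_copy`, the module `Arithmetic` and the constructor `indgen` are assumptions for this example; the only Ruby feature that matters is `instance_exec`, which runs the operator block with the result element as `self`.

```ruby
# Operators shared by scalars and arrays: both rely only on op and empty_copy.
module Arithmetic
  def -@
    empty_copy.op(self) { |x| set(-x) }
  end

  def +(other)
    empty_copy.op(self, other) { |x, y| set(x + y) }
  end
end

class Scalar
  include Arithmetic

  def initialize(value = 0)
    @value = value
  end

  def set(value)
    @value = value
  end

  def get
    @value
  end

  def empty_copy
    self.class.new
  end

  # Runs the block with self as receiver, passing plain Ruby values.
  def op(*args, &action)
    instance_exec(*args.map { |arg| arg.is_a?(Scalar) ? arg.get : arg }, &action)
    self
  end
end

class Sequence
  include Arithmetic

  # Index array 0, 1, ..., size - 1.
  def self.indgen(size)
    result = new(size)
    size.times { |i| result.element(i).set(i) }
    result
  end

  def initialize(size)
    @elements = Array.new(size) { Scalar.new }
  end

  def size
    @elements.size
  end

  def element(i)
    @elements[i]
  end

  def empty_copy
    self.class.new(size)
  end

  # Loops over all elements and calls op recursively; array arguments are
  # iterated in step, other arguments are passed through unchanged.
  def op(*args, &action)
    size.times do |i|
      element(i).op(*args.map { |arg| arg.is_a?(Sequence) ? arg.element(i) : arg }, &action)
    end
    self
  end

  def to_a
    @elements.map(&:get)
  end
end

s = Scalar.new(3)
negated_scalar = -s
a = Sequence.indgen(3)
negated_array = -a
shifted = a + 10
doubled = a + a
```

The operators live in one module because, as the talk points out, the array version needs nothing beyond what the scalar version uses: the difference is entirely inside `op`.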
Here is an example illustrating the reflection part, where I use a small class which uses method_missing to record what methods are being invoked on an object of this class. On the right side you can see some example use: I create two objects of this class, a and b, and you can see that if you do operations on them, it will faithfully record what's happening. Of course Ruby reflection has some limitations, so you have to restrict yourself to a subset of Ruby when doing this. You can see there are a few operations where you won't be able to distinguish what's happening.

Then you integrate the JIT like this. This is for scalar operations. We implement a new op method, and the op method will first check whether self and all the arguments are JIT supported. If yes, then we call the JIT compiler and give it all the variables, self and the arguments. The JIT compiler will convert the variables to JIT variables and call the code block which I'm passing here. In the code block I just call the method itself again, so I enter again at the top, and because JIT variables themselves are not JIT supported, I jump into the second case. This calls the old implementation of the op method, which calls instance_exec in turn.

To understand what happens, you can for example implement a negation operator and add some print statements to it. Then you can see that when executing the operation you don't have ordinary Ruby variables, you actually have JIT variables. So self is an integer with the value that the machine register i1 is pointing at, and the parameter x is another register. When you do set(-x) with this variable, then instead of doing computations it will generate machine code: it generates the code for negating and writing the value to a new register, and then code for writing the value of that register to the memory location i1 is pointing at. This is actually inspired by Ruby-libjit, which is another project using the LibJIT library, and
for arrays it gets a bit more complicated. Again you start the same way: you check for JIT support, you call the JIT compiler, you call yourself, and then you jump into the second case. Then we check: are we JIT compiling? If yes, then I first get a reference to the JIT function, then I create a copy of the array pointer, I compute the strides, and I compute a pointer to the end of the array. Then, and this is a bit complicated, I create two lists of functions to process the arguments. If the argument is an array, I add two functions: one function for incrementing an array pointer and one function for extracting an element. If an argument is a scalar, then I just add a function for returning that scalar. Then finally comes the loop. In the loop I simply extract all the current arguments, I call op recursively, and then I increment all the pointers, and I do this until I have reached the end of the array.

Here's the illustration. You see that it is very similar to the scalar operation. It's almost the same, only the register indices are higher. This is because while we are generating the same machine code for the operation, the library will put some looping code around that operation, so you end up with machine code doing an element-wise array operation.

Here are some performance results. They vary greatly depending on the machine. You can see it's still sometimes six times slower than C++, or even worse. But if nothing else helps, I will just implement a small C++ code generator for GCC, to beat C++ with C++. It's no problem. For comparison I also added the NArray library, so we should at least be able to reach NArray's performance with this concept.

Now I want to give an example of what one can do with the library. The question is: we have an image showing a rectangular marker, and we want to determine the position of that marker in order to be able to draw a coordinate system properly into that image, as
shown on the right side of the slide. I'm not showing you all the code; I will explain some methods instead of showing the code. Basically you start with opening the camera, then you read an image, and then you threshold that image. This is an example of an element-wise array operation. Then I do connected component analysis. In this case we have 93 components including the background, so you set n to the maximum index of the component image plus one. Then we create a histogram of the components, which gives us the area of each component, and we create a binary mask to filter for the components whose size is within the acceptable range. Then I create an index array, select just the indices I'm interested in using the mask, convert it to a Ruby array, and then I can loop over the indices.

So I'm now looping over the remaining components which are not too small and not too big, and c is the loop variable. I can create a mask of the component by just saying components equals c, and then I can create a mask selecting just the edge pixels by taking the difference of the dilated component and the eroded component. So I get a mask with all the edge pixels. Then I compute the gradient. I'm not showing you the source code of this method, but it basically just computes the gradient vectors, that is, which direction the gradient is pointing in. For convenience's sake I store the gradients as complex numbers, so the real and imaginary parts form the two components of the gradient vector. Then, using a mask operation, I create a one-dimensional array with just the gradients of the edge pixels.

You see what I'm heading at: now I try to detect dominant orientations. In order to do that I compute the argument of the gradients. The complex argument is in this case the gradient angle. Then I quantize this into 36 bins and I create a histogram, and if this is a rectangle, you should
end up with a histogram with four peaks. So I threshold this histogram and then I look for connected components in the resulting binary image. If this is a rectangle, I should end up with four components. In the end I also create an array, partitions, which tells which component each gradient belongs to. If I have four dominant orientations, then I assume I have found a rectangle.

Then I compute a complex ramp. I'm not showing you the source code of this method, but it basically is a complex two-dimensional array with the real and imaginary parts containing the x and y coordinate of each point. So the first row will be 0, 1, 2, 3, 4; the next row will be 0 + i, 1 + i, 2 + i; the third row will be 0 + 2i, 1 + 2i, 2 + 2i; and so on. Then I do a masking operation to compute x, so that x will be a one-dimensional array containing the complex coordinates, if you will, of all the edge pixels.

The following two statements just compute the center of gravity of each line. I first create a histogram of the labels. This gives me an array with the size of each line, the number of pixels of each line. Then I create a weighted histogram where I use the coordinates as weights, and if I divide this by the number of pixels each line has, I end up with the center of gravity of each line. I also get the center of gravity of the unclassified pixels, but I will ignore that later.

Then I need to compute the orientation of each line. In order to do this, I first subtract from each edge point the coordinate of the corresponding center of the line. So I end up with vectors pointing in both directions along the line, and if I just summed them up I would end up with 0, so that's not useful. But these are complex numbers, so I use a trick. Let's say I have vectors pointing at 20 degrees and vectors pointing down the line at 200 degrees. If I square the complex
number, I end up with twice the angle. So I now have 2 times 20 is 40 degrees, and 200 times 2 is 400, which modulo 360 is also 40 degrees. So now I can sum them up. After taking the sum I take the complex square root, and I have my orientation, 20 degrees. If you do a web search for line fitting algorithms, I'm sure you will end up with more source code than this. Then finally I compute the intersection of each two neighboring lines. I'm not going to explain how I did this, it was a bit difficult, but basically you then get the corner points of the rectangle.

So you see we have gotten very far already, and now we want to do 3D. I want to find a function where I can give that function a coordinate, for example x is minus 2.5 centimeters and y is minus 2.5 centimeters, and it will give me the top left corner in the image; or x is minus 2.5 centimeters and y is plus 2.5 centimeters, and then it should give me the next corner. So I'm looking for a function like this. In order to do this we need to do a little bit of maths, but you will be surprised: in 10 minutes you will know how to do 3D vision.

We start out with, I guess everybody knows this, the pinhole camera model. The pinhole camera model just says that the size of the object, x1, divided by the distance of the object, x3, equals the size of the image on the camera chip divided by the focal length. The size of the image on the camera chip is the number of pixels times the actual size of a pixel on the chip. So delta s is the pixel size on the camera chip and f is the focal length. The ratio f divided by delta s you need to know; you can for example determine it using camera calibration, or maybe you can get it from the specification of the camera. This of course holds for both dimensions: you have an equation for the first dimension, x1 over x3, and you also have an equation for the second dimension which gets projected. Now we take these two equations and we
reformulate them a little, and we introduce an additional variable lambda. This is called homogeneous coordinates; anybody who has done 3D graphics will be familiar with this. We just add an additional equation that lambda has to be x3, and we require lambda to be non-zero. The reason to do this is that it allows us to write all these projection equations using matrix-vector multiplications, which makes them easier to handle. This matrix is actually the intrinsic camera matrix, which does the projection part. When you've done this, it's much easier to introduce additional 3D rotations and translations: you take the point, the vector x1, x2, x3, you multiply it with the rotation matrix, and you add a translation vector. If you also use homogeneous coordinates for the 3D point, which we do by just adding an additional fourth element to the vector which is 1, we can write the rotation and translation using one single matrix multiplication.

What we will do now is insert all those point pairs of screen coordinate and desired coordinate. We have four of those pairs, so we insert those, so bear with me, and we also need to allow for an error. So now we have four of those equations; I just write it once, with i between 1 and 4, with the projection and the rotation and translation. Because our object is a planar object, the third coordinate is always 0, as you can see. This allows us to drop one matrix column and the zero, so we end up with the equation in the second row. Then, for the moment, we multiply the intrinsic and extrinsic camera matrices, and now we are going to look for the best matrix H, the one which minimizes our overall error.

Now we get rid of lambda. We take the last row, compute lambda, and substitute the value, and you can see we have the unknowns h31, h32 and so on multiplied with the error, so we cannot isolate the error. So what we do is we assume
that lambda 1, lambda 2, lambda 3 and lambda 4, which are actually the depths of the points, do not vary too much. So we say these are weights for our error, and if we omit those weights we will not introduce too much of a bias. So instead we optimize for the non-weighted errors, and now you can see you can isolate the error easily. For i equal to 1, 2, 3 and 4, we have two equations for each of these values, so the bottom row is basically written as one matrix equation. Then we put all these equations into one big matrix, and now the problem becomes the problem of finding a vector h (I've written the matrix H in stacked vector form) such that the norm of M times h is minimal. In order to avoid the trivial solution, because you could say h equals 0 is the best solution, we impose the additional constraint that the norm of h must be an unknown number mu, where mu must not be 0.

And here comes the big mathematical trick: the solution is actually given by the singular value decomposition. I don't know if everybody knows the singular value decomposition. You can say a singular value decomposition is a more generic kind of eigenvalue decomposition, which you can also do on non-square matrices like this one. The thing is, if you have a singular vector and you multiply M with that singular vector, the norm of the resulting vector will be the singular value. So you just take the singular vector with the smallest singular value, and then you have the optimal solution. That's the big idea. It's not my idea, I read it in the paper. But once you have this idea, it's solved, because now you know the matrix apart from a constant factor. But you know that it consists of the known camera matrix, the rotation matrix and the translation vector, and the vectors in our rotation matrix always have length
one. And for the third vector, which we dropped earlier because the third dimension is always zero for our points, you know that it must be orthogonal to those two. And you know the object must be in front of us, otherwise we couldn't see it, so t3 is greater than zero. And because otherwise we wouldn't see it either, the scalar product of the third axis and the translation vector must be negative or zero.

And if you compare, that's the source code. So after lots of thinking you end up with very little source code, actually. Okay, I haven't included the source code of a singular value decomposition; I just used the singular value decomposition from the linalg Ruby extension. But that's all the code, and at the end you have the extrinsic camera matrix, and then you have enough information to draw a coordinate system like this into your image. If you want to go over this again, I can recommend Zhengyou Zhang's paper about camera calibration, and I can also recommend a chapter of Ballard and Brown's Computer Vision book. The slides are released on the web anyway, by the way, so you don't need to take notes of the URL and all this. If you can't find them, come back to me later. I put the URL on each slide, but it's not visible. I'm sorry about that.

So now I want to give you a small demo of this. Basically I have a camera here and it's viewing this marker. I hope it will work. So here I'm playing a video, and you see how it gets properly placed onto the marker. It's a bit dark, but it's still working. I knew you would like it, except maybe for one person. You can see it really works in 3D, although there's a bit of a problem because it's a bit dark in here. I also added some special effects, so that it sometimes does this, just for fun. All right, that's it, pretty much. So if you have any questions... Yes?

Ruby is the one we have, possibly, you know, you could take quite a lot of this library and use the camera and all the facilities to do some interesting applications?

I mean, you would just have to call
my own classes, so you would have... I guess, I don't have a Mac, so I guess the camera input and maybe the display will be different. It's pretty pluggable, because it only relies on memory objects to exchange data. You would need to port that part, and as far as I know Ruby-libjit runs on macOS as well. Next question?

All of the primitive operations you created on top of LibJIT, is that part of HornetsEye?

Yes. Basically HornetsEye provides the array classes, and it has some bindings to different I/O libraries. That's the other problem you need to deal with when doing computer vision. At the moment it runs under Linux, and a lot of it runs under Windows as well, on PCs. So I have I/O for loading images and loading video files, and also for displaying. There is hardware-accelerated video display, under Linux only at the moment, and Qt4 integration to display as part of a GUI. Then there are the array classes which make use of LibJIT, and there are also means of conversion to use Ruby OpenCV or NArray, and also to make use of the FFTW library for Fourier transforms. FFTW stands for the Fastest Fourier Transform in the West. So the Fourier transform I'm not doing with LibJIT; I'm just using the FFTW library at the moment.

There's another thing. This is a comparison with Python and OpenCV. By using closures, or lambda functions, you can save a lot of code and provide a much more concise API. And there's another thing I noticed: I think in order to beat OpenCV and other C++ libraries in performance, it will also be necessary to avoid intermediate results. It's quite interesting, you could hear this problem mentioned earlier on this morning. Here's an example where I tried something: basically you have an array, but the constructor behaves differently depending on a thread variable. The purpose of all this is: if you do two array operations, let's say you compute A plus B plus C, at the moment
my library will compute A plus B and write the result to memory, and then it will compute this result plus C and write that to memory. So you could avoid the intermediate result to save some time. I'm thinking about implementing something like this in the near future: in the constructor you behave differently depending on the lazy variable of the current thread. Then you have a method where you can just say lazy do, then you do some computations, and then end. After the end, it will have marked all objects as, you know, still to be computed, and once I retrieve the result of one of these objects, it will do the computation only at that point. That's another thing I'm thinking about at the moment.

You also need to deal with color space conversions when you do I/O. Many cameras provide the luminance channel with high resolution and the color channels with lower resolution, so the image is split up into gray, chroma red and chroma blue. So you need to upsample the chroma channels. That's another thing you need to do when doing computations. Otherwise, thank you.
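The lazy evaluation idea from the answer above can be sketched in plain Ruby. This is a guess at how such a scheme might look, not the planned HornetsEye implementation: the class name `Vec`, the thread-local key `:lazy` and the allocation counter are all assumptions for this example. The constructor checks the lazy flag of the current thread and either computes all elements immediately or keeps the element block until the result is retrieved.

```ruby
class Vec
  class << self
    # Counts how many element buffers were actually computed and stored.
    attr_accessor :allocations

    # Runs the block with lazy evaluation enabled for the current thread.
    def lazy
      previous = Thread.current[:lazy]
      Thread.current[:lazy] = true
      yield
    ensure
      Thread.current[:lazy] = previous
    end
  end
  self.allocations = 0

  attr_reader :size

  # The constructor behaves differently depending on the thread's lazy flag:
  # eager mode stores all elements now, lazy mode only keeps the block.
  def initialize(size, &element)
    @size = size
    if Thread.current[:lazy]
      @element = element
    else
      @values = compute(element)
    end
  end

  def [](i)
    @values ? @values[i] : @element.call(i)
  end

  def +(other)
    Vec.new(size) { |i| self[i] + other[i] }
  end

  # Retrieving the result forces the deferred computation.
  def to_a
    @values ||= compute(@element)
    @values.dup
  end

  private

  def compute(element)
    Vec.allocations += 1
    Array.new(size, &element)
  end
end

a = Vec.new(3) { |i| i }
b = Vec.new(3) { |i| 10 * i }
c = Vec.new(3) { 100 }

Vec.allocations = 0
eager_result = (a + b + c).to_a      # stores a + b, then (a + b) + c
eager_allocations = Vec.allocations

Vec.allocations = 0
deferred = Vec.lazy { a + b + c }    # nothing is computed inside the block
lazy_result = deferred.to_a          # one pass, no intermediate array
lazy_allocations = Vec.allocations
```

In eager mode the sum of three arrays stores two buffers, one of them the intermediate result; in lazy mode the element blocks are chained and only the final buffer is stored.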