Hello everyone, my name is Ghislain Durif, from CNRS in France. First of all, thank you for the opportunity to present our work at useR! 2020. Today I am going to talk about KeOps, a library for kernel operations on the GPU without memory overflows. This is joint work with Benjamin Charlier, Jean Feydy, Joan Glaunès and François-David Collin. All the information about KeOps can be found online on the library website, kernel-operations.io.

So, first, what is KeOps? KeOps stands for Kernel Operations; it is a C++ library, and it is available in R through the R package RKeOps. What KeOps can do is compute generic reductions over very large arrays. For example, if you have a very large matrix indexed by i and j, a reduction is a sum over its entries: here you have the reduction over the index i, and here the reduction over the index j. More generally, KeOps can compute kernel reductions and the associated gradients. I will explain later what a kernel function is, but intuitively, imagine that you have a matrix whose entries are given by a formula, and you want to compute row-wise or column-wise sums over it. KeOps was created to handle very large dimensions, even larger than the GPU memory, and to run fast computations on the GPU without memory overflows.

A quick word: kernels are widely used in statistics and machine learning, for example in kernel density estimation, in classification, in regression, in kernel embeddings to compare distributions, in interpolation and kriging, and in optimal transport. And what is the motivation behind KeOps? It is to develop a user-friendly tool for GPU computing. In R, there are only a few solutions for GPU computing, and they were created for specific tasks; you can check this web page. Over the last five years, GPU development efforts have mostly been oriented toward deep learning. For example, you have libraries such as PyTorch and TensorFlow that provide GPU implementations of the common operations used to implement neural networks. But GPU computing can be used for general-purpose computation, not only for neural networks. To do that, however, you have to write generic code using lower-level tools such as CUDA or OpenCL. So there was a need for an off-the-shelf tool for GPU computing with a wide range of applications in statistics and machine learning.

So, what is the core of KeOps, and what can it do with kernel operations and reductions? First, let's imagine that we have some data vectors x and y that are d-dimensional and indexed by i and j. A kernel function is a real-valued function applied to a pair of vectors, and it corresponds to a scalar product between these two vectors, but in a different space than the usual d-dimensional real-valued space. Very intuitively, a kernel function is a similarity measure between the data vectors, one that differs from the Euclidean distance.
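To make this concrete, here is the kind of kernel reduction the talk is about, written out for the Gaussian kernel; the weights b_j and the scale parameter s anticipate the R example given at the end of the talk:

```latex
% Row-wise sum reduction of a Gaussian kernel matrix over the index j,
% with weights b_j and a scale parameter s:
a_i \;=\; \sum_{j=1}^{M} \exp\!\big(-s\,\lVert x_i - y_j \rVert^2\big)\, b_j,
\qquad i = 1, \dots, N.
% The N-by-M kernel matrix K_{ij} = \exp(-s \lVert x_i - y_j \rVert^2)
% is never stored in full.
```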
And for instance, you have the linear kernel, which is simply the standard scalar product, or the Gaussian kernel, which is based on the Gaussian function. A kernel reduction is simply a row-wise or column-wise reduction over the kernel matrix whose entries are given by the kernel function applied to the data vectors indexed by i and j. And you can compute even more complex operations, like reductions over combinations of kernels and scalar products, again row-wise or column-wise.

Why would you want to do that on a GPU? Kernel matrix reductions are a combination of generic matrix operations, and GPUs are very good at matrix computation. So we are going to use GPUs. What are the strengths of GPUs? They are made of thousands of computing units, so they are very fast on heavily parallelized computations. The problem is that the amount of memory available to each computing unit is very small, so you run into issues when processing very large data. And the challenge is that the kernel matrix K can be very large, up to millions by millions, and it is not possible to store it in the GPU memory; sometimes it is bigger than the GPU memory itself. So you have to be smart about how you iterate through the rows and columns to compute the kernel matrix entries and do the reduction.

On the GPU, memory is managed in two steps. First, the data is initially stored on the host, in the RAM of your computer, and it has to be transferred to the GPU to do the computation. That is a bottleneck, because the link between the RAM and the GPU is not that fast. Inside the GPU, you have different kinds of memory, some local to each computing unit and some shared. And actually, if you want fast computation, the key is to use the shared memory smartly, to reduce the number of transfers between the host and the device. To do that, KeOps uses a tiling implementation. I don't have time to detail it here, but I have included an example in the slides, so you can get the slides online and check it to understand better.

I have a small benchmark comparing computation times between KeOps and PyTorch for a Gaussian kernel matrix-vector product on GPU with different data sizes. Here are the results: data size on one axis, runtime on the other. The first point is that for small sample sizes, up to thousands, PyTorch and KeOps have similar performance. But for larger sample sizes, beyond thousands, KeOps (shown with triangles) outperforms PyTorch (shown with squares). And you get a memory overflow with PyTorch on large samples: it stops here, where the data no longer fits in the GPU memory. With KeOps, you can process data larger than the GPU memory. I also have another benchmark available in the slides that you can check online.

So, how does KeOps work? Basically, what you want to do is compute a mathematical formula based on data vectors. Here, for instance, you want to apply the exponential to the scalar product between x and y; that is what you want to compute. To do that, you just have to write the formula in the KeOps encoding as a string: you write a text string describing your operation. So here, to apply the exponential to the scalar product between x and y, this is what you are going to write.
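To give an idea of the tiling scheme the slides illustrate, here is a minimal sketch in plain R, under stated assumptions: the block size is arbitrary, the weights b_j are taken as scalars for simplicity, and the real KeOps implementation does this in CUDA with shared memory, not in R:

```r
# Illustrative tiled Gaussian reduction: a_i = sum_j exp(-s * ||x_i - y_j||^2) * b_j.
# The full N x M kernel matrix is never materialized; only one tile of
# columns is held in memory at a time, and partial sums are accumulated.
tiled_gauss_reduction <- function(x, y, b, s, block = 1024L) {
  n <- nrow(x)
  m <- nrow(y)
  a <- numeric(n)
  for (jstart in seq(1L, m, by = block)) {   # loop over column tiles
    jend <- min(jstart + block - 1L, m)
    yb <- y[jstart:jend, , drop = FALSE]
    bb <- b[jstart:jend]
    # squared distances between every x_i and the current tile of y_j
    sqdist <- outer(rowSums(x^2), rowSums(yb^2), `+`) - 2 * x %*% t(yb)
    a <- a + exp(-s * sqdist) %*% bb         # accumulate partial reductions
  }
  as.vector(a)
}

# Usage on toy data:
set.seed(1)
x <- matrix(rnorm(5000 * 3), ncol = 3)
y <- matrix(rnorm(8000 * 3), ncol = 3)
b <- rnorm(8000)
a <- tiled_gauss_reduction(x, y, b, s = 0.5)
```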
And then, inside KeOps, under the hood, the formula is expanded into C++ code using templates. Here you have the templated function: a formula is actually an instantiation of a variadic, recursively defined templated class. And KeOps compiles your custom operator on the fly, so that it can run on the GPU. With KeOps, you can use a wide range of elementary operations: simple vector operations, elementary real-valued functions, simple matrix operations, and matrix reductions, like sum, but also minimum, maximum, and so on. And a formula is a combination of these operations.

So, how do you use it? First, we come back to the website, where you have complete documentation and installation instructions; you should know that RKeOps is available on CRAN, and you also have some examples. A quick word on the KeOps stack: KeOps is hosted on GitHub and is distributed under the MIT license.

And to finish, I will present a small example of how you use KeOps in R. What we want to compute here is a reduction: a sum, over the index j, of a Gaussian kernel between two data vectors x and y, indexed by i and j, applied to a filter vector b indexed by j. So, okay, it is an operation; it is what it is. And how are you going to do that in R? First, you write your computation using words: here you take the squared norm, that is, the squared distance between the data points, you apply the exponential, you multiply by the filter, and you do a sum reduction over one dimension. That is your formula. Then you define the arguments of your formula and their dimensions: x is a vector indexed by i of dimension 3, y is a vector indexed by j of dimension 3, b is a vector indexed by j of dimension 6, and s is just a scalar parameter. Based on the formula and its arguments, you can define the new operator op: here you define a new function, and this actually compiles some C++ code. Then, when you call the op function, it calls the compiled C++ code. Say you have some data whose dimensions correspond to what you wrote here; you can then call the new op function on your data. The code is the same whether you want to run your computation on CPU or GPU; the only difference, if you want to use the GPU, is that you have to call a function that tells the new operator that its computations should run on the GPU. And then you can also take the gradient of your operator, for instance with respect to the variable x.

So, to conclude, KeOps allows you to do seamless kernel operations: you just write formulas with simple matrix operations in R, on the GPU, to do fast computations, with autodiff based on automatic gradient computation, and without memory overflows, thanks to the tiling implementation. A quick word on the future: we hope to develop lazy evaluation in R, like what we have in PyKeOps in Python. There, you don't need to write a formula: you just write symbolic operations that are not actually computed, based on a symbolic representation of the data, and the computation actually happens only when you perform the reduction. So, thank you very much for your attention. We have a paper on arXiv and a pending publication, and I recall the website of the library and the GitHub page. Thank you very much.
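As a companion to this walkthrough, here is a minimal sketch of the example in R. The function names (keops_kernel, use_gpu, keops_grad) follow the RKeOps documentation; the data sizes and values are arbitrary placeholders, not from the talk:

```r
library(rkeops)

# Formula in the KeOps encoding: sum over j of exp(-s * ||x_i - y_j||^2) * b_j
formula <- "Sum_Reduction(Exp(-s * SqNorm2(x - y)) * b, 0)"

# Arguments and their dimensions, matching the talk:
# x_i in R^3, y_j in R^3, b_j in R^6, s a scalar parameter
args <- c("x = Vi(3)", "y = Vj(3)", "b = Vj(6)", "s = Pm(1)")

# Define the new operator: this compiles the C++ code on the fly
op <- keops_kernel(formula, args)

# Some data whose dimensions correspond to the declared arguments
nx <- 100; ny <- 150
X <- matrix(runif(nx * 3), nrow = nx)
Y <- matrix(runif(ny * 3), nrow = ny)
B <- matrix(runif(ny * 6), nrow = ny)
s <- 0.2

# The call is the same on CPU or GPU; switch to GPU computations first
use_gpu()
res <- op(list(X, Y, B, s))   # result indexed by i (nx rows)

# Gradient of the operator with respect to the variable x
grad_op <- keops_grad(op, var = "x")
```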