Hi everyone. My name is Max Turgeon. I'm an assistant professor in statistics and computer science at the University of Manitoba, and the title of my talk today is "Arbitrary Precision Linear Algebra in R using RcppEigen and BH", which are two R packages.

I want to talk a little bit about the motivation for doing arbitrary precision linear algebra, but first I want to explain what I mean by arbitrary precision. Maybe the best starting point is floating point arithmetic, which is how computers hold a finite amount of precision when they do computations; they achieve this by using, essentially, a form of scientific notation in binary. It's very important to understand this phenomenon because it can lead to counter-intuitive results. One example: if I test in R whether 0.1 + 0.2 is equal to 0.3, the result is FALSE. The reason is that these numbers have a non-terminating expansion in binary, so inherently some rounding error happens. This is also why you should really use all.equal() when testing for this sort of thing: all.equal() accounts for rounding errors and finite precision.

Arbitrary precision is when you want, or need, more precision than floating point arithmetic allows. This can be useful when dealing with very large numbers, which is the case in cryptography for example, but it's inherently slower: instead of performing the arithmetic using machine instructions on the CPU, it is done in software. And I want to point out that even though arbitrary precision seems natural in most cases where we want more precise computations, whenever possible we should really try to use a more accurate fixed precision algorithm instead. Many people have written about tricks for developing more accurate algorithms, so I won't go into this here.
The example where I encountered the need for arbitrary precision is the distribution of the largest eigenvalue of Wishart matrices. We say that a matrix W follows a Wishart distribution in p dimensions, with degrees of freedom equal to n and covariance matrix equal to Sigma, if we can write W as X transpose times X, where X is an n-by-p matrix and each row of X follows a multivariate normal distribution in p dimensions with mean zero and covariance Sigma. This is a very famous distribution in multivariate analysis because, as you can probably guess, it's related to covariance matrices, and so knowing the distribution of the eigenvalues is important in some applications.

That has been a very hard problem in general, but in 2014, in the Journal of Multivariate Analysis, Chiani gave an exact algorithm for computing the CDF of the largest eigenvalue of a Wishart matrix. The first step of the algorithm he gave is to construct a skew-symmetric matrix A, whose dimension is related to the minimum of n and p, so you can see how the size of the matrix grows as both n and p grow; this matrix A also depends on the value at which we want to compute the CDF. The second step is to compute the determinant of A and take its square root, and the third step is to divide by a normalizing constant. The problem that arises with Chiani's algorithm, even though it is exact, is what happens when n and p are both large.
It turns out that both the determinant and the normalizing constant are then very large, so you end up dividing two very large numbers by one another, and this can lead to inaccuracies. This was my motivation for looking at ways to do arbitrary precision linear algebra: in the paper, Chiani says he implemented his algorithm in Mathematica directly, but when you try to do the same in R, you run into numerical accuracy problems.

The solution I came up with is based on two packages built on top of Rcpp. But first I want to mention existing R packages that do arbitrary precision arithmetic in R. There are actually several, in different varieties, and I want to highlight a few of the options. Two examples are the gmp and Rmpfr packages, which provide R functions that let users directly call the C and C++ libraries that do multiple precision, or arbitrary precision, arithmetic. Another example I've seen is the Ryacas package, which allows arbitrary precision arithmetic by interfacing with a computer algebra system called Yacas. So those are several options you can use, but they were not quite flexible enough to do what I wanted, which was linear algebra with arbitrary precision.

The two packages I ended up using are BH and RcppEigen; they are both part of the Rcpp package family. The first one, BH, is an interface to the Boost C++ libraries for R users and R package developers. If I just read what is written in the documentation of the BH package, it says that Boost provides free, peer-reviewed, portable C++ source libraries. So BH allows R users to work with these libraries, which cover a wide array of applications; in particular, it provides access to a multiprecision library.
In the code snippet here, what I'm doing is including the header files for multiprecision, or arbitrary precision, arithmetic, and then creating a new arithmetic type using a typedef. I'm calling it mp_float, and it represents a 100-digit precision number. So this is how we can use the BH package to do arbitrary precision arithmetic in general.

It becomes very powerful when you combine it with the RcppEigen package. Again reading from the documentation: Eigen is a C++ template library for linear algebra. The template part means you can use the same functions not only with the built-in arithmetic types but also with user-defined types; in other words, you can do linear algebra with those user-defined types. In the code snippet here, I give an example of creating a 100-by-1 matrix, really a column vector, that uses the arbitrary precision type we defined on the previous slide, and then computing the dot product of this column vector with itself, which is now an arbitrary precision number. So you can see how the two packages interface and play nicely with each other and allow us to do arbitrary precision linear algebra.

I implemented this solution in a package called rootWishart, which computes this distribution using Chiani's algorithm. I just want to give you a quick comparison with what happens when you try to implement it using fixed precision.
Here I have n equal to 25 and p equal to 15, so this is a very realistic example; it doesn't have to be very high-dimensional. You can see that if I try to compute the CDF over the range between 0 and 100, we don't get a monotonically increasing function: in some places the line skips, and we also see a lack of precision near 80 to 100. This clearly shows that the naive implementation in R doesn't work. What happens when we try the same computation using arbitrary precision? We now get a smooth curve: no kinks, no missing values, and monotonically increasing, just as expected.

To conclude, the main message here is that the combination of the BH and RcppEigen packages provides a lot of flexibility to do linear algebra using arbitrary precision, along with all of the other libraries in BH. The cost of this flexibility, of course, is that you get slower algorithms, which is inherent to arbitrary precision arithmetic, but also that the user needs to code in C++; the advantage of some of the other packages I mentioned, like gmp, is that they can be used from R directly. Chiani actually has a second paper where he proposes a similar algorithm for double Wishart problems, which arise in multivariate ANOVA and canonical correlation analysis, and I implemented that algorithm in the rootWishart package as well. But to finish, I want to point out once again that, as a general rule, more precise fixed-precision algorithms with better numerical accuracy will always be better than arbitrary precision, just because they're going to be faster.