Welcome everyone. I would like to speak to you about the preservation of numerical algorithms, and first I would like to tell you how I got into this project. I am a physicist, not a software developer, not a hardware developer, and in my work I use numerical methods to solve the equations of physics. It has happened to me a couple of times that I was looking for a method to solve a certain kind of problem, found it in a paper in a scientific journal, and the paper said that the code was available to interested people. The dates of these papers varied from the 1970s to now. So I wrote an email to the address given there, or, for older papers where there was of course no email address, only a postal address, I googled the name of the person and wrote an email. And it bounced back. Or there was a web address, I went to it, and it turned out that the head of the research group who came up with the algorithm had retired or moved to another institute, and the next time the website was redesigned the code was gone. So I will talk a little about why it might be important to preserve the code itself; you would think that if there was a paper about it, the algorithm is there.

Numerical mathematics obviously has a long history, far longer than computers, so most of the mathematical background was worked out before the computer age. But the way algorithms were described then was aimed at hand computation, at a time when a computer was not an electronic device but a job title. When you describe an algorithm to a person, the person is of course far more intelligent than any machine, so there are a lot of things you do not have to care about.
Lots and lots of algorithms were worked out well before the computer age. For example, the fast Fourier transform was invented by Gauss and reinvented by a number of people; even Lanczos invented it before Cooley and Tukey, he just did not have a computer to actually implement and run it. Methods to solve differential equations were worked out by Euler, and by Runge and Kutta in the late 19th and early 20th century. But all these descriptions were aimed at human computers: you can expect them to think, you can expect them to recognize exceptional cases, and if there is a loss of significance you can expect them to realize "oh, I did not calculate enough digits", go back, fix it, and calculate with longer numbers. Computers do not do that.

The computer age started around the late 1940s, and from then on everyone has had to communicate their algorithms to humans and to computers at the same time. For the machine, obviously, nothing can be left to the reader: the machine will very diligently follow what you program, and if your program has mistakes, or it hits an exceptional case, what you get is garbage in, garbage out. That is the reason why a good computer program is the most precise description of an algorithm that is possible, at least for the machine and the compiler the author wrote it for; nothing is left to the reader. If you get an algorithm in that form, you have the chance to run it, and it might solve your problem. Or, if you have to re-implement it for a modern machine in a modern language, the best way to test whether you can trust your version is to run it, run the original version, and compare the results; if you come up with enough test cases, you can be more and more sure that it works.

So that is how I got into this project, and I think I did not yet tell you the answer: I was looking for a certain algorithm, googled it, did not find it, and then googled whether there is someone who is actually trying to preserve
numerical algorithms. I found that the users of the R programming language, which is mostly used for statistics, have run into the same problem, and they came up with a project which aims to preserve numerical algorithms; and not just to preserve them, but to find out where the ones being used by the R interpreter come from, what the original is, and, if there is doubt, whether it is possible to run the original and compare the results.

The initial idea for this project came from a bug report. R has built-in optimization code for unconstrained minimization, and someone reported a bug in it: they could come up with a function of multiple variables for which this minimizer does not converge; it does lots and lots of steps and then finds something which is not a minimum. If you look at the source of this code, the first couple of lines are remarks: a copyright message, and a note that this has been translated by f2c from a FORTRAN version of some code which is not included. You start looking for it, you find many FORTRAN versions of the code, and you want to know whether the bug came from the FORTRAN version, from translating it with f2c to C, or from the later edits that people made to make it look more like a modern C program. In such a case you need to find out whether the original version of the code is still available: is it in machine-readable form, or was it published as a printed listing? Has someone retyped it, or translated it through multiple languages? Some numerical algorithms have a very long history: something first written in FORTRAN might have been translated to C by f2c and then edited manually.

Most numerical methods were published in one of the two old programming languages, either FORTRAN or ALGOL 60. ALGOL 60 was not a very successful language from a commercial point of view, so for many computers there was no compiler available, and even if there was one available, it might have
been not very efficient, so people often did manual translations to other languages, and every time you touch the code it is a possible source of errors. So it would be very nice if in such cases we could compare the results with the original.

There are very many programming languages. The first computers were usually programmed in binary; then came autocodes, each made for one machine, then assemblers, and in 1957 FORTRAN became the first programming language to get quite widespread. At that time it was not standardized, but it was available for IBM machines, and later on people wanted to improve on previous languages and came up with newer ones. So FORTRAN was 1957, the ALGOL effort started in 1958 and there were later versions, BASIC, which is still popular in retrocomputing, started in 1964 on a university time-sharing mainframe, and later languages include Pascal, APL, PL/I, C, C++, and Python. With so many languages holding old code, it is an important question whether we can still run them on a PC, at least for the goal of comparing results, or for actually using the programs, or for finding bugs if they were there in the original. In the case of numerical software an additional difficulty is that you might not even know what computer the original program was written for, and computers before 1985, and some even after that, had quite varying formats for floating-point numbers. So even if you can run the language, you might have to think about what it means by a real number.

After this history and motivation, what follows is a quick look at the programming languages used in mathematics and numerical calculations. The first one is FORTRAN: the first compiler was sent out to users in 1957, and a lot of people were skeptical when IBM, or John Backus, first spoke about it, that he wanted to come up with a high-level language where you can actually type in your mathematical formula and the computer will evaluate and understand it, and
it would be readable for humans: not binary, not autocodes or assembly for one specific machine. But within one year, most computing centers that had IBM machines answered in a questionnaire sent out by IBM that more than 50% of their new code was actually written in FORTRAN; it was such a great help in the development process. They also realized that to reduce duplicated effort they could form user groups and send their subprograms and programs to each other, and part of this was collected by the user group into the IBM SSP, the Scientific Subroutine Package. From then on it became widespread that people implemented numerical methods at different places and sent them to each other through user groups, and people working at computing centers started collecting subroutines for themselves.

While people were interested in reusing code as much as possible, the language also kept developing. It became standardized in 1966, and there has been continued development ever since; the latest standard came out last year. This is one of the reasons why a lot of computing work is still done in this very old language: the newer versions of the language always include a very, very large part of the old ones as subsets, so if you have a FORTRAN program from 30 years ago, it is most likely still a valid FORTRAN program now, at least if it was a valid program then. Validity can have multiple definitions: valid according to the standard, or valid according to one compiler on one machine. On a present computer like this one here you have almost perfect support: you have not just one but multiple compilers, and they are very good at standards compliance. On the older machines, the original FORTRAN from the 50s and then FORTRAN 66 and FORTRAN 77 were quite conservative, not very complicated languages, so some computer vendors started adding extensions in their own compilers. The good thing is that gfortran supports even
some of these, like the quite unusual way of writing a double-precision real as REAL*8 instead of spelling out DOUBLE PRECISION. Besides actual compilers there is also a source-to-source translator to C, f2c. That is one way of running a FORTRAN program, but it is also how some old codes got into newer projects: someone translated them, and people have been editing the resulting C code ever since.

The other language, probably used even more often than FORTRAN in early publications, was ALGOL 60. It was not a language invented by one vendor, like FORTRAN at IBM, but by a group of experts. The ideas started with the International Algorithmic Language in 1958, and then came the ALGOL 60 report, an almost mathematically precise definition of the language syntax in Backus-Naur form, which is still used nowadays to specify programming languages; later there were some fixes to ambiguities in that report. The language has some beautiful properties: it introduced the block structure, which makes it much easier to avoid mistakes that you would easily make in FORTRAN, and a lot of other things, which is why we still say that some languages we use nowadays belong to the ALGOL family. On the other hand, it might have been an unfortunate decision not to include input-output in the standardization of the language. They saw that computers were developing and changing so rapidly that whatever standard they came up with for input-output would be hardware-dependent and quickly outdated; if you compare the input-output of FORTRAN and C, well, it is quite similar, so that might have been a bit pessimistic. What was even worse, the language was not supported by IBM, which was the biggest computer vendor at that time, so it did not get that widespread. Also, the printed representation of the language in a book and the machine representation in some binary
format were quite different, so if you get a printed version, it is not enough to type it in; you have to adapt it to your compiler and your hardware. So with an old ALGOL code, even if it is preserved in machine-readable form, you might not be able to run it as it is, and the language is often used as a kind of pseudocode aimed more at humans. Still, we have support on Linux in the form of source translators to C: there is the GNU MARST translator, and another ALGOL 60 translator written by Jan van Katwijk which you can find on GitHub, and there is at least one further implementation as well.

Mathematical software often came in the form of subroutine libraries. The idea of a subroutine, separating out a part of the program, is quite old, from the time of the first computers: the first scientific paper on it was written by Goldstine and von Neumann in 1948, and three years later the first published subroutine library appeared, in the form of a book which included the machine-code form of these routines. Then people started personal collections and user groups, and quite shortly after that there were formal collections: the Association for Computing Machinery started publishing an algorithms section in its journals, and Numerische Mathematik and a couple of other journals listed here also published mathematical code. There were also a number of books which not only taught people numerical analysis but included a set of subroutines which you were expected to be able to use afterwards. As for the languages: in the Collected Algorithms of the ACM, the first hundred were printed in the journal in ALGOL 60; later more languages were allowed, like FORTRAN, PL/I, MATLAB and so on, and now we only have the later ones in machine-readable form. That is more or less true for the other journals too. What I list here might be interesting because these are codes we can use to test our tool chain, to see if we can still run them.
And later, subroutine libraries were also available in machine-readable form: the SSP from IBM; PORT from Bell Laboratories, the first part of which was actually published in a journal; SLATEC from the U.S. national laboratories; CMLIB from the National Institute of Standards and Technology; the NSWC library; NUMAL; and commercial libraries.

To see how well these codes are preserved, we actually tried to test some of them. We used ACM Algorithms 1 to 4, of which the Bairstow root finder is the most complicated, to check whether our tool chain is actually good enough. They exist in a printed representation, not a machine representation, so you definitely need some hand editing to get them into computer-ready form, but with GNU MARST it was actually possible to run them. You could also see that what was printed in a journal was not necessarily printed directly from a computer-readable form but written out in a human-readable form, where people could easily lose a semicolon, which in ALGOL 60 can result in the next block being treated entirely as a comment. At first, with every algorithm I tried to run I found a compiler bug, but later on it really was possible to just type one in and run it. What I tested were Algorithms 1 to 4 from the ACM, and from the NUMAL library I picked out one bigger code, which evaluates the exponential integral, together with its dependencies; again, after a couple of compiler bugs were fixed by the author, it did run. So with ALGOL my experience is that there are ALGOL 60 compilers on Linux, but they have a small user base, so if you want to run them you might need to build your own Ubuntu package; I did, and you can find it in my PPA. There is a big difference between the printed and the computer representation of ALGOL 60: some codes use the I/O recommended by Knuth, and sometimes you just do not get a main program and have to write one yourself to test the algorithm. I think
now, with a bit of bug hunting on our side and bug fixing by the compiler authors, there is good enough support to compare a code with your own implementation. I will skip most of this.

With FORTRAN the experience is much better; there is almost perfect compiler support. What you have to do to preserve code is to check whether it is standard-conforming and maybe fix it, or find the compiler switches that help you compile old code with certain kinds of standard violations. Here we did not test single routines but whole subroutine libraries, because FORTRAN was, and still is, much more widespread in the community. The experience is that you can often even find an old Debian source package and build it with minimal changes. In the case of SLATEC and the PORT library we managed to get very quickly to the point where all the tests included with the library actually ran. Here preservation mostly means keeping the source, at least for yourself, or convincing authors, before they give up on a project, to publish it completely, not just on their homepage with a license. In the case of PORT, a large part is public domain and you can find it on the internet, but the non-public-domain parts were still given freely for non-profit use; however, the web page is not there anymore.

Back to the project that started with UNCMIN: we found many versions of the code, and I implemented the test case where the R version failed in FORTRAN, using the original version, and that one worked perfectly. So the conclusion is that in this case it was worth finding the original version of the code and comparing, because then you can establish that the bug actually appeared in some later development.

My conclusion is that there is a lot of legacy mathematical code out there that is still used. It is essential to save the original versions, to preserve the algorithmic knowledge, for comparison, and for testing later developments. Many of these codes are of very high quality: they
were written in the 70s and 80s at big US laboratories and universities, and they were made public domain, for example because government money was involved. However, they are in old programming languages like FORTRAN and ALGOL 60, and the task is to future-proof the codes: test your tool chain, especially if it is not a really widespread language like FORTRAN, save the code at least for yourself, and check whether there are bugs. So thank you for your attention.

[Question about where this knowledge is shared; from off camera: "Please repeat the question for the camera."] Sorry; so the question was where we intend to publish this knowledge. There is a GitLab page which includes not just code: in the case of the R algorithms it is written in R Markdown, so it is a document in which you can actually run the code.

There is one more question? Yes. Sorry, so the question was: even if we have the code and the language is correctly implemented, what about hacks tied to the original machines? In these big subroutine libraries, luckily, the code was actually aimed to be portable among at least a couple of computers, so we have not really run into ugly hacks so far, only some standard violations. It was quite surprising that we did find one case of an ugly hack, but it was not really in the code: its documentation recommends it as a way to save memory when you use the routine, and its test case actually used it. It was aliasing FORTRAN variables, and a modern compiler does a lot of optimizations, so it just reorganized the order of operations and immediately the test failed; but the bug was not in the original code, it was in the test and the documentation.

[Inaudible question.] No, not really. The codes are high quality, and they were written in a way that does not demand too much from the floating-point implementation. If you look at the floating-point implementation of many older computers, they were much worse; IBM computers, for example, sometimes used chopped hexadecimal instead of rounded binary. Thank you.