Hello, everyone. My name is Kiran. I am a compiler engineer working at ARM in the Manchester office in the UK, working mainly on the Fortran front end of our compiler. The topic of my presentation today is flang, the Fortran front end of LLVM. This is a summary of the contents of this presentation. First of all, I'll say a word about Fortran (I wasn't sure how many people here would be familiar with it) and why it is still important. Then there was a similarly named project, which I'll call old flang. Then I come to the new flang project, which is the subject of this presentation. I discuss the various compiler stages, giving a bit of detail about how things are implemented. Then I talk about how OpenMP is handled in this compiler, and, very briefly, about the driver: the plans for it and how it is interfaced with the current LLVM driver. Then I cover the process and status of the submission of this project to the LLVM repository, give some information about how to contribute if someone is interested, give a brief status of where we are with the implementation and the tentative timelines for this project, and finally conclude. Whenever people talk about Fortran, they think it's a very old language and tend not to understand its importance. It was probably the first high-level programming language, but it still continues to be popular, particularly in the HPC community. I have a quote here from Steve Lionel, who is also the chair of the Fortran Standards Committee. He gives some reasons why Fortran continues to be popular: mainly that there is source code over 40 years old that still compiles, thanks to the standardization of Fortran, and that Fortran has strengths in floating-point computation, array processing, and so on.
It's actually now a modern language with support for object orientation, modules, and parallelism. Old Fortran required you to write things in fixed columns, but new Fortran is free-form: you can write code anywhere on the line and it does not matter. It continues to be used in the real world, particularly in applications like weather forecasting, numerical simulation, and modeling, and also in very important libraries like LAPACK, SciPy, and so on. It also continues to be standardized on a regular basis. The latest Fortran standard came out in 2018, and the next standards, 202x and 202y, are in the works, expected to arrive this decade. To give one more slide about the popularity of Fortran: in the UK there is a supercomputer called ARCHER, and they publish statistics about the kinds of applications that run on it. The graph on the right side is organized by language (C++, Python, C, and Fortran), with one bubble for each application run on that supercomputer. The size of the bubble represents the amount of time spent on the supercomputer, and the darkness represents the number of users. You can see that a lot of the applications run on that supercomputer continue to be Fortran: more than 60% of the application time is Fortran. So it's very important in the HPC community to have a Fortran compiler that can generate high-performance code. Now we come to the old flang project. Old flang was a project designed to generate LLVM IR and to interface with the LLVM infrastructure. It was sponsored by the US Department of Energy and its national labs. They signed a contract with PGI (now part of NVIDIA), under which PGI was to take the front end of their compiler, make it generate LLVM IR, and then open-source it under some license.
So they open-sourced it under the Apache 2.0 license, and it recently switched to the LLVM license. This project has been available since May of 2017. It runs on various platforms: AArch64, x86-64, and PowerPC. The availability of this compiler filled a key gap in the LLVM story for HPC. Previously LLVM did not have a front end for Fortran, so you would have to use GFortran or some other compiler; there was no free alternative that generated LLVM IR. This was the first project that filled that key gap for HPC. And it was not just a project to demonstrate that such a compiler is possible: it was adopted by many companies and became the Fortran front end of their compilers. The PGI compiler, obviously, since it was the source; it was adopted by the ARM compiler as its Fortran front end; and it was also the front end of the AMD optimizing compiler. It is not just a compiler, it's also a very performant compiler. I have two graphs here, taken from a presentation by Steve Scalpone of NVIDIA at EuroLLVM. There are several benchmarks here, various versions of SPEC along with parallel benchmarks, and the last bar shows the geomean of all of them. The light blue color is flang, the dark blue is GFortran, and the yellow is the PGI compiler. As you can see from the geomean, flang is actually better in performance than GFortran and approaching the performance of the PGI commercial compiler. The graph on the right side is a comparison across all the Fortran benchmarks in SPEC 2017. It's the same story: flang is better than GFortran but a bit slower than the PGI commercial compiler. I'm showing this slide to make the point that it is a performant compiler. Currently the latest standards in Fortran are Fortran 2003, 2008, and 2018, and flang supported a good amount of these standards.
When it came out it supported Fortran 2003, mostly. Fortran 2008 support was initially partial, but over a couple of years more features were added. The only important omission is that there is no plan for co-arrays, a parallel feature embedded in the language; other than that, most features are supported in the flang compiler. As for Fortran 2018, there is no plan in the old flang compiler to support it. Now, although this flang compiler is performant and was adopted by many companies as their front end, it still had a lot of issues, and these became the reasons why a new flang compiler was necessary. The first was that it was not a pure open-source project, in the sense that it was ripped out of a commercial compiler, and the way it was open-sourced there was no practical way to make submissions to the project and get them accepted. Only a handful of pull requests were accepted over a period of two years. The reason was that any pull request you submitted had to go into PGI's or NVIDIA's commercial compiler first, run through all their CI, go through reviews there, and only if everything was fine would it come back to flang and finally get approved. There were no developers assigned to this (it was the managers who were doing it), so that did not work out well. Another reason was that the code is very old: when the flang project was open-sourced, it had a code history of almost 30 years. You can imagine that over 30 years a lot of rotting has happened. The original people who implemented the compiler are no longer there. And there are a lot of other problems, like flags that are just numbers, a lot of global variables, and so on; typically when you try to fix something here, something breaks there.
So it had all these legacy issues: the code is difficult to maintain and the entry bar is high. The error messages the compiler gives usually contain only the line number, with no column number, so they are not high-quality error messages. When it was proposed to the LLVM community, the answer was that it could not be accepted as the LLVM front end, for some of these reasons and also because of the way it generates IR: it generates LLVM IR but does not use the IR builder, it emits IR using printf statements and the like. And it cannot be used as a library, which is one of the cornerstones of LLVM, that things can be used as libraries. Because of all these issues it never became the front end of LLVM. So there were two options: try to improve the existing flang compiler, or write something new. Work started on trying to improve the old one, but that did not go well, so people decided, OK, let's write something new for the next 30 years, and that's the new flang project. The new flang project was initially named F18, but when it was accepted as the front end of LLVM, the community decided that flang is the right name for the project; it rhymes with Clang as well. It was accepted as a front end somewhere in the middle of last year. It uses the LLVM license, Apache 2.0 with LLVM exceptions. PGI/NVIDIA continues to be the lead developer of the project; ARM, AMD, and people at the US national labs are contributing, and everyone else is welcome to contribute too. The project is being developed in the open on GitHub. The main initial features of this project are that it uses the Fortran 2018 standard as the primary reference for implementation, and that it is very standards-friendly; we will see that in the next few slides. It is written in modern C++: actually C++17, while the existing LLVM code base is C++14, so it's a step up from there, and the project got a specific exception for this.
The parse tree is written as C++ classes, and lowering happens only after all the semantic checks are done; old flang lowered early. It has high-quality source provenance, so it can be used for tooling, and somebody at NVIDIA has already written a portion of a flang-tidy tool on top of it. So now we come to the various stages of this compiler. In the next few slides I'll be looking at preprocessing, parsing, and semantic analysis. The first stage is composed of two things: pre-scanning and preprocessing. The whole idea of this stage is that Fortran is actually a very difficult language to parse: it has a lot of context sensitivity, whitespace is not significant, and so on. What the pre-scanner does is remove as many of these issues as possible. It normalizes the source, expands macros, and folds everything to lower case; the output is a cooked source, and a provenance map is also generated. All the later stages look at the cooked source rather than the original source, and the provenance maintains maps back to the original source, so that if a later stage produces an error message, it can always be mapped back to the original source. The next stage is parsing. It is a recursive descent parser; there are no tables or anything like that. The parser is written in a declarative fashion; there is an example on the right side, which we'll come to. The grammar is taken from the standard, with left recursion and the other things that hurt a recursive descent parser removed. It uses the idea of parser combinators: there are various token parsers, and there are functions and combinators to combine these into more complicated parsers. The parse tree closely follows the specification in the standard.
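As a rough illustration of the cooked-source and provenance idea just described, here is a simplified sketch. The names `CookedSource`, `Provenance`, and `Cook` are invented for this example and are not flang's actual classes; flang's real cooked character stream and provenance machinery are considerably richer.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified sketch of a "cooked" source with provenance: every
// character in the cooked stream remembers where it came from in the
// original file, so later stages can report errors against the
// original line/column even though they only ever see cooked text.
struct Provenance {
  std::size_t line, column;
};

struct CookedSource {
  std::string text;               // normalized source
  std::vector<Provenance> origin; // origin[i] = where text[i] came from
};

// Cook the source: fold to lower case and drop insignificant blanks,
// recording provenance for every surviving character.
CookedSource Cook(const std::string &original) {
  CookedSource cooked;
  std::size_t line = 1, column = 1;
  for (char c : original) {
    if (c == '\n') {
      ++line;
      column = 1;
      continue;
    }
    if (c != ' ') { // treat blanks as insignificant in this toy model
      cooked.text.push_back(static_cast<char>(std::tolower((unsigned char)c)));
      cooked.origin.push_back({line, column});
    }
    ++column;
  }
  return cooked;
}
```

A diagnostic produced at cooked offset `i` can then be reported at `cooked.origin[i]`, which is the essence of mapping errors back to the original source.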
That's why one of the points I mentioned earlier is that this is very standards-friendly. I have different entries on the right-hand side. The first is the original Fortran source, which says integer :: x = 1; I'll concentrate on the x = 1 part. Next, the standards document says that an entity declaration contains an object name, which is x here. It can be an array, in which case there will be an array spec; it can be a co-array, which is a parallel feature in Fortran; it can be a character, in which case the length information has to be there; or it can contain an initialization. The array spec, co-array spec, character length, and initialization are all optional. The parser follows directly from that: it says you generate an entity declaration, and it should have an object name, maybe an array spec, maybe a co-array spec, maybe a character length, maybe an initialization. So the way the parser is written follows what is written in the standard, and when you come to the parse tree node, you can see it's basically a tuple containing an object name followed by an optional array spec, co-array spec, character length, and initialization. All these things follow the standard very closely, so that if someone wants to add a new feature, it's probably easier to do. Once parsing is complete, you go to the semantic analysis stage. The basic job of semantic analysis is to ensure that whatever code passes through this stage conforms to the standard, so a lot of error checks happen here. It is composed of many parts. First, it does label resolution: there are jump statements with labels, and it checks that all these labels are valid and in the correct scope.
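The declarative, combinator-based style described above can be sketched in miniature. Everything here is hypothetical and simplified (the names `Parser`, `maybe`, `entityDecl`, and the omission of the array/co-array/character-length specs are mine); flang's real combinators live in its parser library and are far more complete, but the shape is the same: the parser for an entity declaration reads like the grammar rule in the standard.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// A parser consumes a string_view and, on success, yields a value plus
// the remaining input.
template <typename T> struct Result {
  T value;
  std::string_view rest;
};
template <typename T>
using Parser = std::function<std::optional<Result<T>>(std::string_view)>;

// Token parser for a Fortran name: a letter followed by alphanumerics.
Parser<std::string> name() {
  return [](std::string_view s) -> std::optional<Result<std::string>> {
    if (s.empty() || !std::isalpha((unsigned char)s[0]))
      return std::nullopt;
    std::size_t n = 1;
    while (n < s.size() && std::isalnum((unsigned char)s[n]))
      ++n;
    return Result<std::string>{std::string(s.substr(0, n)), s.substr(n)};
  };
}

// Combinator: turn a parser into an optional one that never fails,
// mirroring the optional pieces of the grammar rule.
template <typename T> Parser<std::optional<T>> maybe(Parser<T> p) {
  return [p](std::string_view s) -> std::optional<Result<std::optional<T>>> {
    if (auto r = p(s))
      return Result<std::optional<T>>{r->value, r->rest};
    return Result<std::optional<T>>{std::nullopt, s};
  };
}

// Token parser for "=" followed by an integer literal (the "= 1" part;
// remember the cooked source has no blanks).
Parser<int> initialization() {
  return [](std::string_view s) -> std::optional<Result<int>> {
    if (s.empty() || s[0] != '=')
      return std::nullopt;
    std::size_t n = 1;
    int v = 0;
    bool digits = false;
    while (n < s.size() && std::isdigit((unsigned char)s[n])) {
      v = v * 10 + (s[n] - '0');
      ++n;
      digits = true;
    }
    if (!digits)
      return std::nullopt;
    return Result<int>{v, s.substr(n)};
  };
}

// EntityDecl following the standard's rule: an object name, then an
// optional initialization (array/co-array/char-length specs omitted).
struct EntityDecl {
  std::string objectName;
  std::optional<int> init;
};

Parser<EntityDecl> entityDecl() {
  return [](std::string_view s) -> std::optional<Result<EntityDecl>> {
    auto n = name()(s);
    if (!n)
      return std::nullopt;
    auto i = maybe(initialization())(n->rest);
    return Result<EntityDecl>{{n->value, i->value}, i->rest};
  };
}
```

Notice how the body of `entityDecl` reads almost exactly like the grammar text: a name, then maybe an initialization. That is the sense in which the parser and the parse tree node both mirror the standard.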
Then it does name resolution: it goes through the declarations, fills in a symbol table, and assigns a scope to each symbol. Now, the problem with Fortran, because as I said it is difficult to parse, is that the parse tree can be ambiguous, so you might actually have to modify the parse tree based on new information that becomes available during name resolution. This stage goes and changes it. One example is that the way an array reference is written and the way a function call is written look very similar, so there can be misparses there. Once the parse tree is fixed up, constant expression evaluation happens, then checks for expression and statement semantics are done, and then a module file is generated if it's a module. That is the subject of this slide, which describes the module file format. I have some Fortran source code here: a module with two declarations, an integer and a real, which contains a subroutine that takes a parameter x and adds it to the variable a, part of the module. The idea is that once you have this as a module, other places in the code can use it to access the variables or the functionality here. So once you process this file, it's written out as a module file, and other parts of the code can read it and do whatever they want. How do you represent this? Different compilers do it in different ways. You could dump it in an internal format and read it back, but what the new flang does is dump it as Fortran source itself. Then it can be read back, and the parser can parse it and construct the tree, and it does this fast. One reason it can do this fast is that you don't have to do preprocessing or anything; all that has already been done.
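The array-reference versus function-call ambiguity mentioned above can be made concrete with a tiny sketch. In Fortran, `a(i)` parses identically whether `a` is an array or a function, so only the symbol table can decide. The types and the fallback rule here are invented for illustration; flang's real rewrite of the parse tree during name resolution is much more involved.

```cpp
#include <map>
#include <string>

// Sketch: "a(i)" is ambiguous in Fortran. After name resolution has
// filled the symbol table, the parse tree node for "a(i)" is rewritten
// to the correct kind based on what "a" turned out to be.
enum class SymbolKind { Array, Function };
enum class NodeKind { ArrayElement, FunctionCall };

using SymbolTable = std::map<std::string, SymbolKind>;

NodeKind Classify(const SymbolTable &symbols, const std::string &nm) {
  auto it = symbols.find(nm);
  // Hypothetical default: treat names with no visible declaration as
  // external functions (an assumption of this sketch, not a rule of
  // the standard).
  if (it == symbols.end() || it->second == SymbolKind::Function)
    return NodeKind::FunctionCall;
  return NodeKind::ArrayElement;
}
```

The point is only that the decision cannot be made by the parser alone; it has to happen in a later pass that consults the symbol table and then mutates the tree.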
And the parser is very fast, so it can read this and create the internal structures. The module file contains the name of the module, its version, a checksum, and all the publicly visible variables and functions that are part of the module. Now we come to the optimizer. Once the semantic checks are all done, you can either generate LLVM IR directly or have an intermediate high-level IR to do some high-level optimizations. The latter is the approach the new flang compiler takes. It uses the MLIR framework to define a new IR called FIR, or Fortran IR; the AST is lowered to that IR, and a lot of optimization passes run there. The reason is that some optimizations require knowledge of the Fortran language, and only with that knowledge present can you actually perform them. I'm not going into the details of this; there was a talk at the LLVM Developers' Meeting, and if you're interested you can look that up. I missed a point: I said that you generate the MLIR dialect for Fortran, and there is also an LLVM dialect in MLIR. What finally happens is that the Fortran MLIR dialect, FIR, is lowered to the LLVM dialect in MLIR, and there exists code in MLIR which translates the LLVM dialect down to LLVM IR. That is used to generate the LLVM IR. The next slides discuss how OpenMP is handled in this compiler. It should be noted that OpenMP is standardized for both C/C++ and Fortran, and there already exists a lot of code in Clang which generates LLVM IR for the various OpenMP constructs. The design is such that we should be able to reuse some of what is already there in the LLVM infrastructure. To do this we have two components: one is MLIR again, and the second is the OpenMP IR builder.
We define a new dialect for OpenMP, the OpenMP MLIR dialect. The reason is that although the FIR dialect exists, it represents only Fortran language constructs; there is no OpenMP in it, and we want to restrict the FIR dialect to the Fortran language. So we have this separate dialect for OpenMP. The second component is the OpenMP IR builder, a project started by Johannes. What it does is take the code in Clang which generates LLVM IR for OpenMP constructs and move it into LLVM itself, so there are functions like create-barrier which generate the code for the barrier construct. Now the flang project can also call create-barrier in the OpenMP IR builder to generate LLVM IR for that. That's what is summarized here: you have the Fortran source, the parser generates the AST, and during lowering a mix of FIR and OpenMP MLIR is generated. It is then transformed into a state where the OpenMP MLIR exists together with LLVM MLIR, and during translation the translation library calls the OpenMP IR builder to generate LLVM IR for each of the OpenMP constructs. It can be a barrier, the parallel region, and so on, and the outlining and all those things actually happen at this layer. I'll just look at a couple of examples. The first is the parallel construct. You have this Fortran code here, which has a parallel construct containing an assignment and an addition. The complicated thing in the middle is a representation of the AST.
You can see that it's similar to Clang's, but it's Fortran-specific and generated by the flang compiler. On the right you can see that we have generated omp.parallel, which is an operation in the OpenMP dialect; the add operation here is actually from the standard dialect, but there could be more FIR dialect surrounding it. In the final step you will have the OpenMP dialect sitting with the LLVM dialect, and then the OpenMP IR builder is used to generate the outlined function and the fork call which invokes that outlined function. It's not necessary that we always use the OpenMP IR builder; things can sometimes be handled completely inside the MLIR layer itself. Here is an example of that: the OpenMP collapse clause, which says to collapse the next two loops into a single loop. In the example we have two do loops inside an OpenMP parallel region with collapse set to two. In the first stage there is an OpenMP operation sitting together with FIR do loops; it is converted into another dialect, the loop dialect, with loop.for operations, and the loop dialect already has coalescing implemented. That is used to convert the two loops into a single loop, so the collapse clause is fully handled inside the MLIR layer, and for that particular operation you don't have to use the OpenMP IR builder at all. Next we come to the driver. When we have a new front end, it has to be integrated with the existing driver. The approach being taken is to create a new binary, bin/flang. It will reuse the clangDriver library and the options file, Options.td. A sample invocation would look like bin/flang foobar.f90, and it will call bin/flang -fc1.
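Before moving on from the collapse example above, it may help to show what collapsing two loops into one actually means. This is plain C++ standing in for the rewrite that the MLIR loop-coalescing utility performs on the loop operations; the function name and shape are mine, purely for illustration.

```cpp
#include <functional>

// What "collapse(2)" means operationally: replace a doubly nested loop
// over n x m iterations with a single loop over n*m iterations,
// recovering the original indices by division and modulo. This is the
// kind of rewrite loop coalescing performs on nested loop ops in MLIR.
void collapsed(int n, int m, const std::function<void(int, int)> &body) {
  for (int k = 0; k < n * m; ++k)
    body(k / m, k % m); // i = k / m, j = k % m
}
```

A single coalesced loop gives the OpenMP runtime one larger iteration space to divide among threads, which is the whole point of the collapse clause.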
Just as calling clang internally invokes clang -cc1, calling bin/flang will invoke flang -fc1, which then calls the real flang (F18) front end to do the compilation. It has to be noted that many HPC applications are mixed-source programs, a mix of C, C++, and Fortran, so it's important that the compilers are aware of each other. This flang compiler can also be invoked through clang with the driver mode set to Fortran, and clang will then call flang. Currently clang has a feature that calls gfortran if you pass it a Fortran file; that will stay for now, but if you call it with --driver-mode=fortran, the flang compiler will be called. I have put a pointer to the RFC if anyone is interested in the details. Next we talk about the submission to the LLVM project. As of now this project still exists as a separate GitHub project; it's not yet part of LLVM, although it has been accepted as the Fortran front end of the LLVM compiler. We made an initial attempt to submit it to the LLVM project and got some feedback. The initial attempt was to submit the parser and the semantic analysis checks. At that stage, not much of LLVM is used: it is modern C++ using the standard library, and it does not use the LLVM APIs or the LLVM data structures. So the community came back with suggestions saying that we have to use the LLVM APIs, conform more to LLVM practices, and so on. Some of these are listed here, and some are already done, like moving the public headers to the include folder and renaming .cc files to .cpp. Also, although this project used clang-format, it had a few additional settings.
We are seeing whether we can remove those. There are other things as well: for file-system handling and output it uses std::ostream and the like, but LLVM has its own stream handling; it used some scripts for testing, which have to be ported to lit; and finally it should use LLVM data structures wherever applicable, DenseMap, SmallVector, and so on, wherever possible. One of my colleagues, David Truby, is working on all these things to make the project more LLVM-friendly. In this slide I describe the status of the project. The parser work is complete: the project parses Fortran 2018 in its entirety, and also OpenMP 4.5. The semantic checks are mostly complete. Work is in progress on the MLIR-based optimizer. Work is beginning on the runtime: the initial plan was to use the old flang project's runtime for I/O, but it was finally decided to rewrite it rather than reuse it. The math library will continue to be pgmath, the math library of the old flang project. Work has also begun on implementing the OpenMP portion. I have also given a tentative timeline for this project. Moving to the LLVM project repository should happen in one or two months; serial code generation by the middle of this year; parallel code generation with OpenMP 4.5 early next year; and OpenMP 5.0, along with co-arrays, the parallel feature embedded in the language, by the end of 2021. I have a slide about contribution if someone is interested in contributing to this project. This project welcomes contributions. The code is out there on GitHub; you can submit code or even bug reports. There is a documentation directory which has a lot of documentation.
For people concerned with C++ style, there is a C++ style guide, and if someone is not familiar with Fortran, there is a guide called Fortran for C Programmers, which lists the differences in Fortran for someone used to C; or you can start with the overview documentation. There is a project page with a list of items in progress, completed, or about to start, and you can pick up issues from the GitHub issue tracker. If you want to contribute something, it is good to send a mail to flang-dev first so that you're not duplicating someone else's work. Code reviews happen in GitHub itself. There is a file called pull-request-checklist, also in the documentation directory, which is basically a checklist to go through before you make a pull request. It should be noted that once this project makes it into LLVM, some of these things will change: pull requests will no longer be on GitHub, and reviews will happen in Phabricator. In conclusion: old flang demonstrated that an industry-strength, high-performance, LLVM-based Fortran compiler is possible. The new flang, or F18, project addresses the deficiencies of the old compiler. The new flang is accepted as the Fortran front end of LLVM, and submission is expected to happen soon. It fills a key gap in the HPC story for LLVM, which did not have a native Fortran front end until now. It is written in modern C++, uses MLIR, and shares code with LLVM wherever possible, particularly for OpenMP, the driver, etc. As I mentioned, it is also very standards-friendly, so it aspires to be the platform where people come and prototype a new feature to check whether it can be taken up in the next standard. It is under active development, and you can contribute if you're interested. Thank you.
[Audience] WRF, which is a weather forecasting application from the Americans: they state, please compile with Intel, because it's 30 percent faster.
[Audience] I didn't see Intel compiler results in your benchmark comparisons. I don't know whether WRF is there; it's part of SPEC also.
No, none of them is Intel: it's flang, PGI, and the GCC compiler.
[Audience] It makes me buy 800-euro CPUs. So I was wondering, in the future plans, is there a performance review or an optimization effort planned?
So the question is that for applications in weather forecasting like WRF, the suggestion is to use a proprietary, platform-dependent compiler like Intel's, and is there a plan in the new flang project to do a performance review at some stage. As I have mentioned, there is a high-level optimizer as part of this project. When it reaches maturity and generates LLVM IR and you can actually run executables, these things will be checked to see whether it's as good as the Intel compiler or the other compilers. And to be honest, I work for ARM, and if we finally decide to have this as our front end, it should be at least as good as the old one, and preferably as good as the Intel compiler; the same goes for PGI. Performance is very important for all of us, and that will definitely be a strong check applied before this replaces the current flang compiler in commercial products.
[Audience] You mentioned that a lot of HPC code is a mix of Fortran, C, and C++. For Rust, the developers worked on being able to do cross-language LTO between Rust and C++, and they have that working. Are there any plans for, or what is the state of, interop for doing LTO between C++ and Fortran with flang?
If my understanding is correct (I am not an expert in this area), once you generate LLVM IR and then object code, I don't know how much is language-specific at that stage. I have actually seen examples where LTO works fine with Fortran as well as with mixed Fortran sources, so I think it should work fine. So the question was whether LTO will work with flang for mixed Fortran and C or C++ applications.
[Audience question, partly inaudible.] So the question is: have we started testing this against a Fortran reference test suite? Do you have a particular reference suite in mind? There are some test suites out there, but the one which is popular with companies, at least, is the NAG Fortran compiler test suite. That is something people test against, but the problem is that different compilers provide error messages in different ways: the text is not the same, and sometimes the line numbers match but can also be a bit different, so that comparison is actually hard to do. But we will do that check. We do already run a lot of Fortran reference codes: we have run many applications, as well as tests that are part of the old flang project and internal tests from various companies, and it seems to be doing a good job. That's why I mentioned that the semantic analysis checks are almost complete, because it actually finds a lot of the issues that are out there. Another thing being done is that whenever a restriction is mentioned in the standard, developers write tests for each of those checks, and there is someone who is always going through and reviewing whether anything has been missed. So a lot of emphasis is placed on those things. Sorry, someone at the back had raised their hand. So the question is that there are two compilers called
flang, one the old one and one the new one, and which one do these performance results refer to? These performance results are for the old one; they are not for the new one. The new one might just be able to generate LLVM IR on some branches, but even the master branch does not generate LLVM IR, so it's not in a state where you can actually run code. We have not reached that stage with the new flang compiler. It's currently being merged into the master branch; somebody has done that on a branch, and it's still a work in progress. And it's not just that: you also have to have the libraries. Fortran has a big runtime library, so that also has to be done. There are still a lot of portions that are not yet complete, so these timelines are all tentative, and I have seen such timelines be missed before. For OpenMP 4.5, early next year or the middle of next year is the current plan. So the question was: what is the timeline for OpenMP 4.5?