And he studies insects, stick insects. Peter?

G'day. Well, I know that you all want to be here, because I didn't make it into the little booklet, so you must have actually found out and wanted to be here, and I'm really pleased with that. I thought I'd start with a little bit of a spiel and a little bit of a demo.

In 2010 I was undergoing chemotherapy for leukemia, and I wanted a project that I could work on for that year where, if it turned out to be complete crap, I could just throw away all the commits for a whole year. So it occurred to me to have a look at a project I last touched in 2007. I gave a talk about it at LCA in 2007, and unfortunately that's assumed background for some of this.

I expect a lot of you are going to be following along on your machines. I know I do with talks, so there'll be links. This is on the website, but it's a little bit challenging to find. SourceForge has been having some issues lately, otherwise I would have made sure it was all up there and ready for you.

So the compiler I'm talking about is written in C++. It's a cross compiler for a very old dialect of Pascal. Back in the very early 80s I was playing with Apple Pascal. A number of people were. I always loathed a number of properties of the compiler, and I always wanted the source so I could fix it. Of course that was a dire secret, and that sort of thing didn't happen.

When I gave the talk in 2007 the compiler was a proof of concept. I could get very basic functionality to happen, but it really couldn't compile significant programs. In 2010, however, I got the compiler to the point where it could compile the UCSD Pascal operating system sources. It could also compile, that is cross compile, the UCSD Pascal native compiler and the associated tools that go with it. And there are instructions on the website for how to go about building all of the necessary pieces to make that work.
There's a cross compiler, there are file system tools, and there's a virtual machine written by a talented bloke called Mario Klebsch, which I repackaged with his permission and put on SourceForge with a matching name. So those are the tools I'm going to be using for this demo.

One of the features of the website is that there are pre-compiled disk images. For the last 30 years people have been using bootleg Apple Pascal disk images when they needed to get their retro computing fix. As of about October 2010 I've got disk images compiled from source, from the original 1979 UCSD Pascal sources. These are available under a moderately permissive non-commercial license. UCSD published the license on their website, but they couldn't find any source code. Once the license had been published, lots of people went, "Oh, here's some." And so now they've got source code to go with the license. They didn't originally; they couldn't find their own source code. But they were sure there was some out there, and that if they let people say so, people would say where it was.

This demo is small and simple. Wrong window. I've already downloaded the system volume disk image. Apple Pascal at the time had huge disk images of 140K. They were doing much bigger than other people with five-and-a-quarter-inch drives; at that time we only had 120K. So we've got a disk image, I've got a working directory, and the instructions say make a file called hello.txt. Let's say it's Hello World.

Now, with the necessary software installed. Being a good open source enthusiast, I've made these all available on a Launchpad PPA. If you're using Ubuntu Linux, all you have to do is apt-get install goodness and it all happens. I'm currently packaging the OS, but SourceForge has had problems and I hadn't finished that prior to the conference. It should be possible in the next week or two to say apt-get install for the UCSD p-System OS and get that too.
There's also a reconstructed user manual available. It was taken from scans of the dot matrix manual, OCRed and hand corrected, so there's full documentation available as well, which is kind of a surprise. The license for that is slightly less clear, but we're guessing it's the same as for the source code.

So let's go to the other window. Let's close that one so it doesn't confuse me. We can compile this, and we get our enormous executable. There's a wrapper which says: turn the current directory into a disk image, and use that system volume as well, so we'll have two disk images mounted. And here we are, welcome to 1979.

If anybody ever used Apple Pascal (I realize some of you are much too young), this is very familiar, except for the one extraneous line which I inserted so people wouldn't be misled into thinking it really was the original. These are ever so slightly changed, to fix a couple of compiler bugs that I found in the original sources and a couple of OS bugs I found in there too.

So this does a bunch of things. It actually adapts to the size of the terminal, and lies to the Pascal programs about how big the terminal is, to get it right. It used to read that out of a static file, for example, in many places.

So we can execute. Sorry, I just have to change directory into the work volume. Now we can execute our hello program, and we get hello world. So our cross compiler is capable of making an executable.

The fun thing about this is that it's reconstructed from sources, so we can also compile this with the original UCSD native compiler, which has itself been cross compiled by my cross compiler. It's a slightly more involved, slightly more complicated program, and it compiles at some astonishing number of lines per minute. You've got to remember that on an Apple, in Apple Pascal, that exercise would have taken several seconds. Not very slow on a 1 MHz 6502. And we can execute that, and once again it says hello world.
So that's my live demo of back to the future, 1979, and to get out of there you say H for Halt.

What I wanted to talk about today was of course this compiler, and one tiny aspect of the things I learned. I got to the end of this, to the point where, fantastic, this thing now actually builds from source. I had email conversations with people who worked on it in 1979 and said, "Now I've got the sources, how did you guys actually build it?" "Don't know." There's a bunch of files in there for the disassemblers and the assemblers and things, and they're binary formatted opcode tables. "Where's the source for the programs that wrote the binary files?" "Don't know." So there's a whole bunch of stuff which is really interesting.

So anyway, I was thinking, wow, I've actually got it to work, and people said, really well done, and like 68 people have visited the website. But as a project I could throw away if it was complete rubbish, it worked very well.

So what else did I learn? I actually achieved something which I thought was really good while I was suffering from leukemia and could hardly stand. But I also learned something about C++. So today I'm talking about a single small aspect, and I'm using examples of that technique as I applied it to the cross compiler.

Now, let me acknowledge up front: C++ is a horrible language for doing compilers in. Pick something else, almost anything else. But it was a language I knew well and was comfortable using, so I didn't have to wrestle with that aspect of a project I chose to work on.

So let's work our way through. This is a really fast revision of the 2007 paper. There's a link there if you want to do a quick abstract syntax tree revision, but judging from most of you, I'm sure you're not going to need it. One of the fundamental things about the cross compiler was that I wanted to be able to retarget the grammar, and that is what prompted my 2007 exercise.
I wanted to be able to take the one grammar and, instead of hacking the .y file to put different rule bodies in for every different tool, share it across the cross compiler and, say, a pretty printer, as an example. It turns out I've got a bunch more of them these days. I actually needed some exactly accurate C++ code to do the linker the same as the original. Since I had a pretty printer, it was a very small exercise to clone that and get it to put out C++ instead of Pascal. It also looked a hell of a lot nicer. This reminded me why I stopped coding in Pascal in 1979. Really awful.

So the fundamental revelation in this was to use an abstract base pointer, and all of the syntax rules go through the abstract base pointer. It means that I can retarget the grammar, leave the yacc grammar entirely alone, and simply derive another class. Fundamentally it maps onto the grammar itself, so you get abstract syntax trees as a consequence. There's an abstract translator; that's the context pointer. There's an abstract statement. There's an abstract expression, and there are refinements all the way down. The neat thing is that it's virtual, and we wind up with, well, actually a really big abstract base class with about 140 factory methods. And so we can churn out interesting objects. So that's the revision. Follow the link if you need it slightly slower.

So, the first revelation. What you're looking at is how C compilers used to be written when I started working on C compilers in the early 80s. They typically would have a C struct with a type specifier and then a union, which was accessed according to the type of thing it is. Now, that is really horrible C++. How do you write that in C++? The first thing to realize is that this is a type-based dispatch. Frequently that discriminator member in the struct was called type, or a synonym of type. And C++ has a much better type system than C, although 100 times better than 0 is still 0, right?
So how do we rewrite this code so that, instead of me doing type-based dispatch manually, I can get the language to do the type-based dispatch for me, to maintain the type-based dispatching machinery for me, and save me some effort? This is why I wrote it in C++ and not in C.

Our example here is of an expression, and we're going to... which method did I choose? Code generation. Okay. As near as possible I have used the actual opcodes that the cross compiler uses, but you'll find that some of the classes and some of the other machinery have been vastly simplified. All this example code uses simple actual pointers; the implementation uses smart pointers, which are neither smart nor pointers.

The point to be made with this slide is that the actual guts, the actual piece that does the work, is the same. It's only that the machinery has been done automatically for me by the compiler. If anybody knows any of the ways that virtual methods are implemented, you'll realize that this is no slower than the original C code, and there is a chance it is faster. It won't be any slower. I'm going to come up with some other examples later.

The challenge here is that I have a convention in my code of one class, one source file, which means that of course suddenly I have an explosion of source files where we didn't have it before. So etags and all the rest of it is your friend, because you've got to be able to navigate the sources.

The next problem comes along in that there are some other things I want to do. When I put the grammar together: Pascal was designed to be LR(1) parsable, and then people went and added sizeof, and it stopped being an LR(1) grammar, because you now have an ambiguity on the identifier when you say sizeof identifier. Is it a type or is it a variable?
Up until that point Pascal was, sorry, not even LR(1), it was an LL(1) grammar, easily parsed with a recursive parser that you could understand. Why on earth would you need yacc? But it got more complicated.

The second thing was that the compiler would frequently say syntax error at me when I wrote perfectly sensible code, until I dredged out of the back of my consciousness the fact that assignment in Pascal is written colon-equals. I've been coding in C and C++ for so long that getting that colon to come out of my fingers was very hard. So the compiler would keep going syntax error, syntax error, and what I really wanted was for it to say, "Doofus, I bet you meant colon-equals there." I needed it because I was really suffering.

The reason it was saying syntax error was of course that a statement required a left-hand side, a colon-equals and a right-hand side in the grammar, and an equals sign can't appear in a left-hand side expression. So: syntax error, bad person.

So I rearranged the grammar into a much more C-like grammar. A statement would be an expression, and then it would look at the expression to see whether it returned anything other than void, and bitch at you if it did. Unlike the C semantics, where an assignment returns the value, I made an assignment return void, so that all of the other type checking wouldn't bitch further down the line. Then I was in a position to say: I'm looking at a single expression, and it's an equality test instead of an assignment, so probably you made a mistake, doofus, fix it.

So now I got sensible error messages again. But this changed the game, because how do you know if an expression is on the right-hand side?
You don't, right up until you see that assignment operator. And it doesn't matter: statistically, most of the time you're going to be on the right-hand side, or effectively the right-hand side. You want the value, not the address. So for an arbitrary expression, on seeing an identifier in the grammar, it must be a variable: access the variable. And when it's discovered that it's actually on the left-hand side, we now have to grope it, pull the address out of it, and turn it into the appropriate assignment abstract expression tree node. Not quite so abstract anymore; a very definite one.

But of course in any program, whether the target is a virtual machine for UCSD Pascal or native compiled C code, different variables are accessed in different ways. Variables on the stack are accessed as local variables. Global variables are accessed by address; they're not stack relative, they're not indexed off the stack pointer or anything else. And then you've got external variables that have to be linked later. Apple UCSD Pascal actually has external variables that get linked later. It's got all of those facilities, and they're all working in the cross compiler, yay.

So we have to rearrange our expression tree, and we can do this in a moderately, kind of, sort of, really fugly type-safe way, in that we can find out if we've cocked up, with really slow, nasty downcasts in C++. At least the compiler will tell you if you really, really screw the pooch. These are slow. And get address: that getter exists solely to grope its privates so that we can build another store expression. So aesthetically it reeks twice, not just once. It's slow, it's groping privates, and we've got getters that it would be really nice to do without. And if all of those tests fail, then we have an error: dumb thing to do.

So it turns out that is (let's go back one, thank you), it turns out that is another example of a type-based dispatch. Test the type, and depending on the type do something else; test the type, and depending on the type
do something else. But instead of a switch we're doing it longhand, and we're trawling through this stuff. So how do we rearrange this code to use the type-based mechanisms that C++ already provides to do type-based dispatch? Virtual method again.

So here we go. We wind up with the same mapping I did before: turn it into a virtual method. The amount of code is the same. The lines of code that do the real work, there's just the same number of them. But this definitely goes faster. I'm not doing any downcasts with the nasty tests; I'm going straight to it.

The trick is, where do I put it? In what class do I put "make me an assignment expression"? I actually put it in the expression itself. The expression is the very thing that I need to grope, so instead of groping it, ask it to do it. So the piece that goes in our grammar now says to the left-hand side: I know you think you're a right-hand side, but please turn yourself into a left-hand side, and here's your right-hand side, make an assignment please. And that's what we've done. The expression object now has a factory method of its own, saying please manufacture a new assignment abstract tree node that's appropriate.

The second thing to note is: no more groping privates. Don't need it anymore. That getter's gone, because it's private and it stays private, which is nice.

And last of all, where do we put the error message? Oh shit, it's easy: a default implementation in the expression class. For addition opcodes, for function call opcodes, for multiplication opcodes, they all go, "What a dumb thing to put on the left-hand side of an assignment," error message.

So there we go. It comes out nice and clean, and it goes faster. No more downcasts; bang, one indirection, whoomp, straight into the appropriate method. Not that we've got a real speed competition happening here. The original compiler compiled at a few lines a minute. This is doing all those compilations in under one second. For the largest compile that I can find, the compiler itself, the native UCSD compiler compiles
in less than a second. We're screaming along compared to our 1 MHz competition. But that's not the point. Actually, this is way higher on the aesthetic meter.

The downside: because I keep insisting on one file per class, because I think it's cleaner, things get interesting. One of the techniques I've been using for many years, and it actually makes it much easier to navigate, is that I make the class hierarchy and the directory hierarchy map onto each other. So now you can look at a class name, expression, expression ldo, and know that it's going to be in a file called expression slash ldo, dot cc or dot h, with some decoration depending on whether it's in the library or one of the tools.

So now we have a factory factory. We've got a factory that manufactures an expression, which in turn later manufactures another expression. So we have a factory factory, yay.

The interesting, fun part of this: occasionally even I, who wrote most of this stuff, look at a code fragment and think, how did I get here? What makes it go? And how does somebody coming to the sources cold figure out what the heck is going on? I'm going to write another expression tree node, and I know that I have to do the code generate method, and I glue that in, and I don't have to write anything except the code generate method, yay. Even the pretty printer, by the way, has a code generate method, but it generates text. Same deal.

And we've got our global store opcode. It turned out in the end that, instead of calling it store global, actually calling it after the opcode made mapping the source code onto what the heck was going on much easier. So the naming convention that you'll see in the source code is that the expression tree nodes, when they're no longer abstract, when they're quite definite, have the opcode name. I don't have the second level of indirection that GCC has.

But how did I get here? If I'm adding a new expression tree node, do I care? The piece I want to debug is this piece of code. I don't want to debug the other machinery; the
machinery's already been debugged for all the other opcodes that I already did. It's just this one I want to work. So do I care? Well, yes and no. I've got a slide later on which enumerates all the levels. But the levels are there, and the levels are there in the grammar, and the grammar points you where you want to go, because you've got that "left-hand side, pointing at right-hand side, factory please" sitting there in the grammar. The levels in the grammar tell you where you're going to be. But yes, there are times when even I look at the code and think, how did I get here? I've got to debug it.

But how do we get there? How do we write a test case? This thing comes with 570-something test cases. Test cases turn out to be easy to write, because you just write Pascal. You know, a colon-equals 1. You just write the test case. The grammar's already been debugged, the expression junk has already been debugged; I'm just testing the global store opcode instead of the local store opcode. That's the only thing I'm interested in. So you can get there.

But your second problem is, hang on, that explodes. If I've got factories making factories, don't I now have this enormous cross product of space to test? Well, yeah, it's a language. The problem space already had the cross product in it; it's just that the code is the same shape now. So yes, you do get an explosion. Is it unnatural? No; in fact I think it maps onto the code very well. Does it make more testing required rather than less? Well, no. I could have written it the old-fashioned manual way, doing the exact explicit thing, because I have the same amount of code doing the same job, but with the type-based dispatch part being done by the language for me instead of me maintaining it manually. So we don't have any more code, we don't have any more test cases, and it isn't terribly much harder to navigate, except that I exploded it out to one per file instead of one switch case each. So yeah, there's a few more files to navigate. etags is your friend.
So yes, I don't think this makes the problem worse. I should have remembered I had this slide. This demonstrates our leftness versus our rightness: we generate a test case, x equals 1, in order to get to our appropriate piece, and it all reaches the methods we're after.

So this talk had more factories in it. Another place in the code that I needed to explore: you've just seen an identifier in your grammar. What is it? Is it a type, is it a local variable, a global variable, is it an external variable? There's actually a bunch of other things it could be. So we take this, and our grammar says just build me a name expression, because I might be a pretty printer, I might be a compiler, whatever. And inside of our compiler we've got our name expression factory.

We can implement that name expression factory two ways. We can laboriously grope it for each sort, same as we did last time for assignments, only this time for local variables and so on. And it can be dealt with the same way: take this explicit type-based dispatch, turn the handle, and turn it into "let the compiler do it for me." I'd rather it did it. And again it's faster. Does it add additional complexity? No. The bits that do the work are the same, and that's the point. We're not doing any extra work, we're getting there faster, we're not adding test cases, we're not adding test complexity. It's the nature of the problem that contains the complexity, but the way we're using the tool now reflects that.

But now it gets really complicated, because how did I get here? I've got a factory factory factory. Not only did I have to figure out what sort of variable it is and leap into the thing, but then for that variable, what is the variable, how did that work again? Where was the natural place to put that dispatch? This time we said to the symbol: symbol, make me a right-hand side expression node. That's our next factory. It's manufacturing the appropriate instance. It might be a pretty
printer node, it might be a pretty printer symbol, it might be a compiler symbol, it might be one of the back ends that I want to write as a Doxygen documenting thing, because this compiler is big, and it was written in 1979, and it's disgusting, horrible. It's based off the P4 compiler by Ammann, who didn't even speak English; well, he did, but not very well. So yeah, it's been through a few iterations.

So we've got our factory factory factory. The problem is, we got here, but we haven't actually changed anything. We haven't changed the complexity; it's the same deal. How did I get here? When we're trying to debug, that's not the bit you need to worry about. This is the bit: if that's the opcode you're adding, that's the bit you need to worry about.

Now, I promised you that we'd get a list. This is the sequence of how I got there, and it reflects the grammar quite well: the different layers when you break out the abstract syntax tree. So we have a bunch of methods that get called in sequence, and it arrives there, and it's actually really short. It's not very deep, which is nice. I figure I can get my head around this, because it's not very deep and it's not very many lines of code.

So I actually like this better, after I got my head around the cross product of how big the problem space was. Oh, hang on, it's a programming language; of course it's got a big cross product. It's meant to have an infinite cross product. Compare the code that I started with, huge switches, multi-multi-multi numbers of pages, with these much smaller pieces that I could get my head around, even though my brain had been largely switched off by the chemotherapy drugs.

Did you ever wonder where those symbols came from? Same trick, same technique. What scope is it? Oh, here you go: there's a variable name, there's its type, declare it. Hello scope, declare one. So now we have a bonus factory in the title, just for you: a factory factory factory factory. They're used kind of all over, and we still get the same ugly object out
at the other end.

So this is the piece that I realized: there are numerous places in my code where, even though I've been writing C++ for longer than I should probably admit, I was still writing type-based dispatch manually when I thought I was writing good C++ code. It turns out that there was something to learn, even this many years later, about exploiting the language better, and I think coming up with actually clearer code. So if you should happen to take a wander in the code, there's a bit more abstraction going on, to maintain less code rather than more. If you look at the code, one of the things that I've tried very hard to do is to eliminate copy and paste, because that's largely a design error, and so I've tried to use derived classes very smartly to avoid any possibility of copy and paste where possible. So that was my insight after 8 months of chemotherapy: type-based dispatch, you can still do it better.

So, up the back, right? I see these recalcitrants of learn, right.

Could we throw in a fifth level, to dispatch the output, whether it goes to pretty print or code?

There might be one or two more places where it's actually done. In particular, one of my gripes with the old compiler was that it didn't optimise very well; in fact it optimised almost not at all. I've got an infinite amount of memory compared to a 48K Apple II. So it won't surprise you that the optimise method is a factory.

Anyone else?

Is the optimisation something that can be portable enough to fit into the same factory architecture? Can you optimise for different architectures in the same way?

Well, you could use the technique. I don't believe it's going to cross CPUs very well, but you can certainly optimise that way. You can write yourself an optimiser that would do that. I've been thinking of writing one, but given that 68 people have actually bothered to visit the web page, and fewer than that have downloaded the source code, I don't think it's a good use of my time. But one of the things that has been suggested is to read in
old 1979 p-code, optimise it, and write out brand spanking new, much faster, much smaller p-code. But no, I don't think it's going to cross.

But the optimisation by itself: what is used in all of this is that I've tried very hard to have everything be write-once. I'm trying to use a bunch of functional techniques, particularly single assignment, so that the optimise method, for example, does not operate in place. It manufactures a new expression instance. It will recycle as much as it can from further down below, of course, but it manufactures a new expression instance. And that's how we get, to look at the example, a store global. It doesn't come out that way initially. It is an address-of-variable, indirect assignment tree, and the address-of and the indirect get folded, because we didn't do an array calculation, and they all get smushed together into an immediate store opcode. That's done as an optimisation, because you don't know at the time what you need to do.

And this is a virtual machine that I'm compiling to; I'm not compiling to 6502 assembler. Interestingly, if you read up on some of the history of Java, this is the virtual machine that inspired some of Java's features, which is probably why some of it is so bloody ugly. Actually, these guys were writing at a time when Unix was already working. Feldman wrote make in 1977, and this is 1979-era code. So there was already an example of how to use a PDP-11 much better than what these guys were doing with their PDP-11 in 1979, but they didn't realise that.

It's interesting, isn't it? LCA is coming to an end and you guys are starting to slow down. Where are the fiery questions? What happened to yesterday? Oh look, we've got a game one. Here we go.

Design patterns: what do you think about design patterns, and how do they map to the use of multiple levels of factory in your design?

The multiple levels are a reflection of the nature of the problem, not a reflection of the nature of the pattern. The factory pattern is there; the factory pattern makes a
great deal of sense. Letting the language do it rather than doing it manually makes even more sense. It's got a better chance of optimising it than you have manually. As your situation changes: initially a good author will be able to write better code than a moderately mature compiler, but as the compiler matures, your ability to out-guess it is probably less. As the problem space changes, as the available developers change, the chances of them doing better than the compiler are slim. So I much prefer letting the compiler do this to me doing it manually and having to maintain that machinery manually. If you've ever done this in C, of course, you've got addresses of function pointers all over the place, and indirection, and then life gets grim. Or you're writing something like the kernel, where it's absolutely essential.

But yeah, I use design patterns a lot. Some of it has been an exercise of, oh look, I've been using that pattern for all those years, but that wasn't what I called it. People who are experienced coders are frequently using design patterns and not knowing it. Design patterns are very real, and I think going out and discovering more only gives you more tools in the toolbox. So you go and read the books, and you read the papers, and you hang on to the ideas, because one day it's going to be, oh, I know how to do that.

You've done a great job, mate. Please take this in appreciation from LCA. It's been a very interesting talk. Most of it was way over my head, but that was still bloody great. There, chaps, put your hands together for him please.

One of the things you probably don't know about, unless you've been keeping up to date, is the next talk in here. There is one now: it's How to Lie Like a Geek.