Hello, my name is Christophe de Dinechin and my talk today is called The Pet Projects of Dr. Frank Einstein, or How I Learned to Stop Worrying and Love My Monsters. First, do you have a pet project? Do you have one of these crazy ideas that nobody understands and that keeps you awake at night because you really have to make it work, no matter what?

What do I call a pet project? Well, we may have various reasons to develop software, like ambition or being lazy. Maybe you do it for the big bucks, getting paid to write code, or for the warm and cozy feeling of being part of a great community, or for the fun of solving tough problems. Maybe you want to fight some injustice in the world. But here I will be talking about scratching an itch: something that really annoys you and that you have to fix.

First, I'm going to explain how to do it all wrong, illustrated with my own pet projects, with a guiding principle: too big not to fail. Because you see, I have about 25 years of rather zany pet projects under my belt, and when you fail for that long, you need to redefine success to keep going. So I hope that my mistakes will serve as a warning to others.

It all started in 1995, when I was young and naive and thought I could invent a programming paradigm called concept programming. Of course, that came along with a programming language, XL. In 2010, I went commercial and published something called Tao3D, a real-time interactive tool for 3D presentations. In 2017, I joined Red Hat and extracted two smaller projects out of these larger endeavors: one called make-it-quick, auto-configuration using only GNU Make, and one called Recorder, which is a flight recorder for C and C++ programs. To be fair, chances are that you've never used or even been aware of any of these projects. So by that metric, they failed, and I'd like to analyze why: explain what they are, how they failed, and also what I learned from that.

Let's start with concept programming, or the art of turning ideas into code. This was presented at FOSDEM in 2020, and I invite you to watch that talk if you want to know more about these ideas. What is the itch there? Well, why does the code not look, not behave, like the concepts? Turning ideas into code sounds quite easy until you actually start doing it. Then you realize that whatever you have in your mind has very little bearing on what you can fit in the computer, so the code behaves differently, and that's frustrating.

So the first idea there was to study how we turn ideas into code. I can't do that by planting electrodes into your brain to compare what's there with the code, so instead I came up with pseudo-metrics to evaluate the differences between code and concepts. I hope you will understand what I mean. The first one is syntactic noise, which measures the fact that the code does not look as expected. An example in Lisp would be writing (+ 2 3) to add two numbers, instead of 2 + 3 as in mathematics. Semantic noise measures the fact that the code does not behave as expected. For example, in Smalltalk, when you write 2 + 3 * 5, you are passing messages between objects until you end up with the value 25, instead of 17 as in mathematics. When I pointed that out to Alan Kay, the inventor of Smalltalk, he responded that mathematics was wrong.
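To make the Smalltalk example concrete, here is a rough C++ illustration of the evaluation order (this is C++, not Smalltalk, and the Number class is purely hypothetical): binary messages are sent strictly from left to right, with no arithmetic precedence.

```cpp
// A sketch of Smalltalk-style semantic noise: messages are evaluated
// strictly left to right, so 2 + 3 * 5 becomes (2 + 3) * 5.
#include <iostream>

struct Number
{
    int value;
    Number plus (Number other) const { return {value + other.value}; }
    Number times(Number other) const { return {value * other.value}; }
};

int main()
{
    // Smalltalk reads 2 + 3 * 5 one message at a time.
    Number smalltalkish = Number{2}.plus(Number{3}).times(Number{5});
    std::cout << smalltalkish.value << "\n";    // 25
    std::cout << 2 + 3 * 5 << "\n";             // 17, as in mathematics
}
```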
Another pseudo-metric is bandwidth, which measures the fraction of the problem space that the code covers. For example, most languages have a + that also covers floating-point values, but OCaml has a smaller bandwidth: you need a distinct +. operator to add floating-point values. Finally, the signal-to-noise ratio measures the fraction of the code that is useful. On the C++ code snippet you see on screen, all the stuff in red is noise; you could remove it. Now, what is very interesting with this analogy is that, much like in music, what is noise to one person may be music to another. There's a lot of taste involved: maybe people are really happy with all these parentheses and semicolons.

So what was the effort in this first project? Well, mostly slides and talks; this is just an idea. And as Linus Torvalds famously said, talk is cheap, show me the code. So the real effort comes next, specifically with XL. XL is an extensible programming language, designed to take advantage of Moore's law instead of being defeated by it. This was presented at FOSDEM in 2020, and the itch there is: why do we need so many programming languages? I believe the reason is that Moore's law, initially thought to apply to hardware, also applies to software. As the hardware complexity grows exponentially, so does the software complexity. But since our brains are not designed to grow exponentially over time, the complexity defeats programmers over and over. As a result, we invent generation after generation of programming languages just to cope with the increasing complexity.

It all starts well: when you create a new language, the tools work, and two guys in a garage can make a ton of money. Everybody is happy. But if you keep using the same tools over time, you fly under the curve. Tools become outdated, you need more hands, you get delays and cost overruns, and the boss isn't happy. The problem is that most of the time, most programmers use tools from the previous generation, so you spend most of your time under the curve, and that's why programming is so hard. The idea of XL was to take advantage of Moore's law, to create an on-demand language with an ecosystem of domain-specific languages, where Moore's law would be a help instead of a foe. As time goes by, maybe you hit the curve, but now you have more CPU power, more memory, more disk space, more whatever: you can extend the language and keep moving up.

There is a proof that this approach works, and it's called Lisp. Lisp was one of the first programming languages ever invented, but it was also one of the first to incorporate object-oriented programming. How did they do that? By writing object-oriented Lisp and rewriting it into non-object-oriented Lisp. To do that, you need two properties: the language needs to be extensible, and the language needs to be able to manipulate itself. If you want to push the idea to the limit, you want the language itself to be as small as possible. In other words, a good metric of perfection is to quote Saint-Exupéry: perfection is achieved when there is nothing left to remove.

Let's take the parse tree, for example. In Lisp, it's a list. In XL, I also wanted to obey the ideas of concept programming, so I did not want the (+ 2 3) syntactic noise. So I came up with the smallest possible parse tree I could think of that would still allow me to represent programs the right way. That involves having, for example, an infix node that represents things like A + B.
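As I'll mention again later, XL ends up with only eight node types: four leaves and four structured nodes. Here is a minimal C++ sketch of that kind of parse tree (names and layout are illustrative, not the actual XL sources):

```cpp
// A minimal parse tree in the spirit of XL's eight node types.
#include <memory>
#include <string>

struct Tree { virtual ~Tree() = default; };
using TreeP = std::shared_ptr<Tree>;

// Leaf nodes
struct Integer : Tree { long long   value; };
struct Real    : Tree { double      value; };
struct Text    : Tree { std::string value; };
struct Name    : Tree { std::string value; };    // names and operators

// Structured nodes
struct Prefix  : Tree { TreeP left, right; };    // e.g. -X or sin X
struct Postfix : Tree { TreeP left, right; };    // e.g. 3!
struct Infix   : Tree { std::string name; TreeP left, right; };       // A + B
struct Block   : Tree { std::string opening, closing; TreeP child; }; // (A), {A}

// "A + B" parses as an Infix named "+", with Name "A" and Name "B" as children.
```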
In terms of semantics, there is a single operator that can be used to define all the others. It can be used to define the standard programming entities: constants, functions, operators, notations, types, optimizations, programming constructs, and so on. Again, I'll refer you to the more extensive talks I gave about XL to learn more. What is important is that this allows me to create a very small programming language, because most of the stuff can be moved to the library. So operations that are fundamental in other languages, like arithmetic, become part of the library, and the same goes for basic programming constructs like if-then-else. That makes the language much smaller, but it also makes compilation and optimization more difficult. What you see on screen is basically the direction I'm going with this right now. If you want to see older approaches, you can look at the XL repository for details, notably the XL2 self-compiling compiler; I'm going to talk more about this in a while.

So this makes for a weird and hard-to-compile language, and that's what makes it fun. For example, it has self-contradicting types: a type that describes entities that are not in the type. It also needs advanced optimizations, even for the simplest things. Since the while loop is not built in, you need to be able to define it, as shown on screen; but for an efficient implementation, that means you need to understand tail recursion right away. And selecting the best compilation strategy is not obvious. What you see at the bottom of the screen is the definition of the min function, the minimum. The problem with this function is that you can implement it in a variety of ways, with dynamic dispatch or with static type analysis, and so on, and finding which one is best is hard.

So how did XL fail? Well, it's an ongoing failure; I would say it's failing forward again and again. This language has had more reboots than the Amazing Spider-Man series. It started in 1995, as I mentioned. Back then it was called LX, for Langage Expérimental (French for "experimental language"), and I show on screen a reconstruction of what I think it looked like, based on a more recent email. Around 1998, I moved to California, and with another guy, Daveed Vandevoorde, we came up with the idea of a parse tree API. We created a separate project for that, called Mozart, with the idea of having compiler plugins that would manipulate the parse tree structure. There was an article published about it in Dr. Dobb's, about Moka, a Java-to-Java compiler. But that was too complicated: the API was really complicated, with hundreds of node types. Simplifying this structure down to only eight node types, which is the structure I still use today, made for a much, much simpler approach, but that of course required rewriting the compiler from scratch.

Around 2003, I met Alan Kay, the inventor of Smalltalk, at a scientific conference at HP. I was very proud of where I was with the language at the time, so I showed it to him, and he asked a single question: does it self-compile? I said no. So Alan Kay said, come back to me when it does. That was devastating to me, but then I started working on writing a self-compiling compiler, and I learned a lot from that. Notably, I learned that having an extensible language makes compilation very interesting. For example, you can have an extension dealing specifically with the process of translating parse trees, and you can do that in a distributed way. The translation XLSemantics statement that you see there, which is taken verbatim from the XL2 self-compiling compiler, shows this: the multiple instances of this translation statement are collated together by the compiler, or rather by a compiler plug-in called translation, and from them it builds a single function called XLSemantics that takes a parse tree as input and generates a parse tree as output. So you have a program concept that is distributed across source files, a bit like aspect-oriented programming in AspectJ, if you're familiar with that.
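Here is a C++ sketch of that collation idea (hypothetical names and a runtime registry, whereas the real XL2 mechanism is a compile-time plug-in): rewrite rules registered from many source files are gathered into a single tree-to-tree function.

```cpp
// Sketch: "translation" rules scattered across files, collated into
// one XLSemantics function that rewrites parse trees.
#include <functional>
#include <vector>

struct Tree;                                  // parse tree, as sketched earlier
using Rule = std::function<Tree *(Tree *)>;   // returns nullptr if no match

std::vector<Rule> &translation_rules()        // one registry per translation name
{
    static std::vector<Rule> rules;
    return rules;
}

// Each file registers its own rewrites at static-initialization time, e.g.
//   static RegisterRule fold([](Tree *t) -> Tree * { /* match, rewrite */ return nullptr; });
struct RegisterRule
{
    RegisterRule(Rule rule) { translation_rules().push_back(rule); }
};

// The collated entry point: try each registered rule in turn.
Tree *XLSemantics(Tree *input)
{
    for (auto &rule : translation_rules())
        if (Tree *output = rule(input))
            return output;
    return input;                             // unchanged if nothing matched
}
```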
Another interesting idea that came up around that time was the notion that you need a way to focus a specific plug-in on a specific piece of code. For example, if you have a plug-in that does symbolic differentiation, you don't want it to apply to all your source code, so you need a notation to focus it somewhere. That's the example on the first line, with differentiation. Another thing that came up was conflicts between plug-ins and the base semantics. The example here is the constant-folding plug-in, which will replace 1 * x with x, then say: oh, f(x) - f(x) has the same thing on both sides, let me replace that with 0. What is interesting is that once it has done that, the f is gone, so the code will compile even if there is no function named f, which is a bit weird. The language was also extensible through a text-based backend that could generate C and Java; it would basically emit C source code.

Around 2008, I noticed a project called LLVM that had the ability to build just-in-time compilers. That was very interesting: much like Java, you could run the code and it would behave like an interpreter. So I decided to create a backend for XL2 called XLR, for XL runtime, based on LLVM. And for some reason, I decided to take as an example a small functional programming language called Pure. That language became so interesting so quickly that it basically became the language: I gave up on XL2 very quickly, and that is more or less the ancestor of the XL you see today.

Now, around 2012, I thought that another way to do something that would work well was to write a small interpreter that would be a reference implementation. So I tried to write a very small interpreter, and that rewrite had an interesting property: the symbol table itself was represented using the parse tree. That meant I could serialize the source code and the symbol tables and send them over a wire, so now I could distribute pieces of a program around. That's how ELIoT, or Extensible Language for the Internet of Things, was born. The example you see on the screen here is a small script that you start on one computer, and it asks two other computers to compare their temperatures, in that case a reading from a temperature sensor. So this little example, which fits on one screen, actually runs on three computers and synchronizes between them.

All in all, one of the problems I had with XL was that the type system was really bogus, and I wanted a type system that was well suited to parse trees. I came up with the idea around 2016, and it's fairly simple: the type of a tree is its shape, so a type is a pattern that trees match. Very simple, but again, that required practically a complete rewrite of the language.
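Here is a stand-alone C++ sketch of this "type is a shape" idea (hypothetical, not XL's implementation): a type check is a pattern match on the tree.

```cpp
// "The type of a tree is its shape": a type such as "addition" is the
// set of trees matching the pattern X + Y.
#include <memory>
#include <string>

struct Tree { virtual ~Tree() = default; };
struct Name  : Tree { std::string value; };
struct Infix : Tree { std::string name; std::shared_ptr<Tree> left, right; };

// Does this tree belong to the "addition" type, i.e. match X + Y?
bool is_addition(const Tree &tree)
{
    auto infix = dynamic_cast<const Infix *>(&tree);
    return infix && infix->name == "+" && infix->left && infix->right;
}
```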
As a result, XL is a fairly large project, again failing again and again. It has about 4,000 commits from about 10 authors, but none of these authors would stick around long enough for the project to stabilize; they got ejected regularly when I rebooted. The result is that the second author only has 202 commits, and only 267 commits are not by me. I basically discouraged the community by rebuilding again and again. It's not otherwise a large project; the largest part is actually the documentation. So I would say that this project exemplifies a pet project that fails forward, and I think that's okay for a research project: we learn as we go, we decide that there's a better way to do things, and we restart. Why not?

That being said, there was a stable version of XL called Tao3D, which was a tool designed to create better presentations for engineers. I gave a talk at DevConf.cz about Tao3D, but that talk was lost, so I invite you instead to look at the videos on the Tao3D project site. Now, Tao3D failed in a very different way. What was the itch there? Well, presentation tools are not sufficient for engineers or salespeople. You want presentation tools that are more dynamic and interactive, that use graphic effects, that can change languages on the fly, that can be used to present objects or models, user interface examples, scientific data, mathematics, you name it. All these things are a little hard to explain or present with tools such as PowerPoint or Google Slides. What was the idea for Tao3D? It was basically to use XL for real-time, interactive 3D documents. What was needed was to add reactivity to XL and build a graphical user interface around it.

Tao3D is a derivative of XL, and you've been watching a demo of that product since the beginning of this talk: what you see on the right is the actual source code for the slide you're watching. So how does that work? Well, it takes advantage of the functional aspects of the XL language: you can pass code around, and while it looks declarative, in practice you're really executing code in real time. However, for efficiency, it's also reactive: events can drive the re-evaluation of parts of the code, depending for instance on the mouse position or on time-based events (see the sketch below). That makes it even weirder than base XL. For example, here I can edit a shape, I can move an object; but since there is a single ellipse in the source code, inside a for loop, when I move any of the ellipses, I move all of them. I find this both weird and cool.
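Here is a rough C++ sketch of that reactive-document idea (illustrative only, nothing like Tao3D's actual engine): the document is code that gets re-executed whenever an input it depends on, such as the time, changes.

```cpp
// The "document" is executable code describing what to draw. Reading a
// changing input such as the elapsed time makes it dynamic: the engine
// simply re-evaluates the code whenever such an input changes.
#include <chrono>
#include <cmath>
#include <iostream>
#include <thread>

void document(double seconds)
{
    // A declarative-looking description that is really code being run:
    double angle = std::fmod(20.0 * seconds, 360.0);
    for (int i = 0; i < 3; i++)                 // one ellipse in the source...
        std::cout << "ellipse " << i            // ...three ellipses on screen
                  << " rotated by " << angle << " degrees\n";
}

int main()
{
    using namespace std::chrono;
    auto start = steady_clock::now();
    for (int frame = 0; frame < 3; frame++)     // stand-in for the event loop
    {
        document(duration<double>(steady_clock::now() - start).count());
        std::this_thread::sleep_for(milliseconds(100));
    }
}
```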
So what is the failure mode for Tao3D? Mostly, it was too big not to fail, because the product never attracted a user and developer base. At some point I may need to give up, because you see, the number of current users and developers at the moment is exactly one, which is insufficient for a project that big. In particular, there is serious, ongoing bit-rot of the LLVM engine I depend on. That's the same problem as with XL, but it's worse, because it turns out that, at least on Linux, parts of the graphics stack depend on LLVM as well. So now you need to find a version of LLVM that is compatible with both your graphics drivers and your XL language. The project also simply grew a bit too large: 9,300 commits in the main repository, plus 38 submodules for things like video, graphics, 3D objects, and so on. In terms of community, it might look a bit saner, with 10 authors, mostly the same as in XL, and 3,300 commits by the second author. The problem is that this happened while we were attempting a commercial venture, so we were sort of paying ourselves to do it; it's not really an open-source community in that sense. In terms of lines of code, it's fairly big, meaning that it's a bit too large to be maintained easily by a single developer. As a result, this one failed because even I, the only person who's still interested in it, have a lot of trouble rebuilding a runnable version of it. I've described the build process as a bit similar to a self-compiling INTERCAL compiler: it's really tough.

So I decided to try to extract smaller projects, hoping to draw some interest in some parts of what I had done over time. One of them was make-it-quick, auto-configuration using only simple makefiles, which was presented at DevConf.US in 2018. The problem there, the itch, is that building for multiple platforms always felt overly complicated to me. I see autoconf as a rather complex non-solution to a relatively simple non-problem. Initially, the problem was complex: there were many, many variants of Unix. But today there is a much smaller number, yet we ended up with a really complicated solution that does not even solve the problem correctly. I invite you to watch the talk for more details; for now, let me just show you what I likened the problem to, with this multi-headed python monster.

If you're on Linux or another well-supported platform, autoconf is able to answer its questions relatively well, and if you are in a standard use case, things go smoothly. So it looks easy, and you might be tempted to say, OK, it's going to be the same if I run on another platform, like macOS. The problem when you do that is that while the tools are there, and they have the same user interface, so it looks like you can use your standard commands, the tools only know how to answer very simple questions. When you hit something slightly more complicated, you may end up with a tool that does not know how to respond, and you lose. That's bad. Now, you might think that on Linux it always behaves right, but that's not true either. There may be cases where you want to do something that is outside the purview of Autotools, for example building statically in a build environment that was initially designed for shared libraries, and again, you lose. Finally, the last case where Autotools does not really help you is when you know more than what Autotools knows how to answer. An example of that came up with Spice, when trying to figure out how to deal with the Objective-C modules in Spice for macOS: this required an analysis of the source code that Autotools did not know how to do correctly. So if you know more than Autotools, Autotools does not help you.

The idea was instead to do the auto-configuration using standard makefile rules; a sketch of what such a configuration probe can look like follows below. It's a small set, about 1,500 lines of code total, which can compute the configuration, and can do so in parallel. It's really a refined and improved version of the makefiles that were used for XL over the years. One of the benefits, you see it here applied to Spice: on the left, you have an autoconf build; on the right, make-it-quick, and you see that it's already compiling. The configuration happens as part of the build and in parallel, and you end up with a much, much faster build, in this case 16 times faster than autoconf. You can do a debug and an opt build long before autoconf is done with its first build. So there is this speed win, and that's not the only one.
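To give a flavor of the approach (a hypothetical sketch, not necessarily make-it-quick's actual conventions): each configuration question becomes a tiny source file that either compiles or doesn't, ordinary make rules build these probes like any other target, in parallel with the rest, and the outcome is turned into a feature macro.

```cpp
// Hypothetical probe file, e.g. config/check_clearenv.cpp: it compiles
// only on platforms that provide clearenv() (glibc has it, macOS does
// not). A make rule attempts to build it like any other object file;
// on success the build defines something like HAVE_CLEARENV=1, on
// failure HAVE_CLEARENV=0. Because probes are plain make targets,
// `make -j` runs them in parallel with the real compilation.
#include <cstdlib>

int main()
{
    return clearenv();
}
```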
Now, you could say, did this project fail? In reality, yes, it did. It's small, it's efficient, I think it's useful, but it never gained popularity, in large part because of mistakes I made: I missed a few golden opportunities, and as a result, it never gained any serious traction. I did get a few users, and quite frankly, this is a project I can recommend you use: if you start a small C or C++ project, it's really much easier to write your makefiles correctly with it than with Autotools. For larger projects like Spice, it doesn't take that much of an effort either, and I could actually integrate it into Spice in a way that was compatible with autogen. I could measure vast improvements in the builds, not only in the size of the build descriptions, but also in the build time. But the Spice team still went with Meson, and they spent six months switching. So I was not too enthusiastic when I went on to also look at QEMU; the Spice experience had shown me that there were some things I could not easily parallelize, and QEMU had a very smart build system that already parallelized everything. QEMU also switched to Meson, but this time I did not invest that much time trying to make it work. As a result, make-it-quick has very low traction, and there aren't any large projects that use it that I know of. The transition cost still exists: you can't simply convert an automake makefile to make-it-quick. And there is a not-invented-here aspect: when you come to a project and say, why don't you build this way, boy, did I hear "oh, you don't have a team maintaining it". Because, you see, Meson is about 40,000 lines of code, and Autotools is also quite large, so both are much, much larger than make-it-quick; but I'm the only person maintaining my stuff, so nobody cares.

Last project: the Recorder library. This one is a flight recorder for C and C++ programs, which I developed and presented at DevConf.US in 2018. The itch there is: why is it so hard to know why a program crashed? Well, the problem is knowing what happened before a crash, because the memory of what happened is very short-lived; knowing what happened just before is difficult. Maybe someone hit the wrong key, so you don't know how to reproduce it, and sometimes you have heisenbugs that only happen once, so it's hard to see them happen. Your users may have other things to do than reproducing your bugs. As a result, you end up with bugs that live for a long time; you close them and you never know what happened. That makes everybody unhappy.

So the idea for the Recorder is to have printf-like, always-on instrumentation, with side benefits; a minimal sketch of the underlying flight-recorder idea follows below. One of the side benefits that I'm showing here is that you can actually extract data and graph it. You see here an example with Spice, graphing at the same time the Spice server, the Spice client, and the Spice agent, all of this with printf-like instrumentation that can also be used for other purposes. The effort there was relatively small, but the project still gained no real traction. To me, it was instrumental in understanding the Spice smart-streaming problem; unfortunately, Spice is a dying project, I think. The flight recorder was an old idea that I had implemented in other jobs, so for me this was relatively quick to develop.
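Here is a minimal C++ sketch of the flight-recorder idea (this is not the actual recorder library API): events go into a small, always-on circular buffer, cheap enough to leave enabled in production, and the buffer is dumped when something goes wrong, showing what happened just before.

```cpp
// A tiny flight recorder: events go into a fixed-size ring buffer that
// is always on. The real recorder library is lock-free and safe to use
// from multiple threads, which this sketch is not.
#include <array>
#include <cstddef>
#include <cstdio>
#include <string>

class FlightRecorder
{
    static constexpr std::size_t SIZE = 8;      // keep the last 8 events
    std::array<std::string, SIZE> entries;
    std::size_t writer = 0;                     // total number of records

public:
    void record(const std::string &event)
    {
        entries[writer++ % SIZE] = event;       // overwrite the oldest entry
    }

    void dump() const                           // e.g. from a crash handler
    {
        std::size_t begin = writer > SIZE ? writer - SIZE : 0;
        for (std::size_t i = begin; i < writer; i++)
            std::printf("%zu: %s\n", i, entries[i % SIZE].c_str());
    }
};

int main()
{
    FlightRecorder recorder;
    for (int i = 0; i < 20; i++)
        recorder.record("processing item " + std::to_string(i));
    recorder.dump();                            // shows only events 12..19
}
```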
And it was seriously refined, with a few iterations based on Spice feedback, like making it more shared-library friendly, having a printf-like syntax, adding Fedora packaging, a man page, and so on. It's a small project, only 360 commits, and I'm happy that at least one other author contributed significantly to it. And the good thing is that it replaced the original flight recorder in XL and Tao3D, so that's much less code to maintain, with something that is actually better. It still gets very little use, though. Why?

Let's compare with projects that did it right: small projects that have a community. The first case I want to talk about is git-publish, which makes it quite easy to publish patch series on a mailing list. The problem statement is extremely simple, and the stats of the project tell the story: 24 contributing authors for only 246 commits, so about 10 commits per author, and 25 forks and 64 stars on GitHub. The community is mostly the virt team at Red Hat. Another case that I want to examine is Bichon, a terminal-based user interface for reviewing GitLab merge requests. Again, the problem statement is extremely simple, and again, the stats tell us that the project is small but actively used: 320 commits, seven authors, so it's a bit larger, with a number of stars and of open and closed issues. Again, the community is mostly the virt team at Red Hat. Let's look at a third one, qboot, a minimal x86 firmware to boot Linux, mostly used to accelerate the boot of QEMU. Again, the problem statement is extremely simple, and the stats here are even more amazing: only 84 commits, 13 authors, 117 forks, 624 stars. I'm not sure there are that many other projects with more forks and stars than commits. Why? Because the project was mostly developed overnight. Again, the community is mostly the virt team at Red Hat, which is not a surprise.

What can we learn from that? Well, first, that the community is the key. If you measure success by traction, an existing community, a simple problem, and an elegant solution are essential. All three of the successful examples I gave had a pre-existing and well-established community. Because of that, the members share the same problems, which means they will probably agree on the solution, and they have also learned to trust one another over time. The number of stars reflects an interesting stardom effect inside the team. And the code is very simple, so it attracts more followers.

Now, let's go back to my own pet projects. If you prefer to measure things by personal development, maybe you have other metrics. It may be worth solving a problem that nobody else has. In other words, maybe it's okay to be Frankenstein rather than Albert Einstein, and I know it's not the right spelling. XL is still failing forward, yes, but I love every minute spent solving problems there. I enjoy using Tao3D quite a bit more than Google Slides. Also, remember that it takes 20 years to build an overnight success. There is, of course, a lot of value in solving simple problems, but larger problems, well beyond your personal capabilities, are the ones that make you grow. If popularity is not your primary driver, then it may be okay to love the monsters that you created. Go for it, even if it's too big, even if it will fail.

So that's the end of this talk. You have two QR codes there: one on the left with a pointer to XL, and one on the right with a goodie for French readers.