It's our honor and our pleasure to be joined by the great Gerald J. Sussman. It's difficult to sum up all of Jerry's achievements and contributions to computer science, but let me attempt with the following. Jerry is professor of electrical engineering at MIT. He has been involved in artificial intelligence, programming languages, and science education. He also co-created Scheme with Guy Steele. Jerry is very well known for his work on Structure and Interpretation of Computer Programs, fondly known as SICP, a reference textbook that is often cited as one of the best in terms of elegance and functional purity. He recently published another amazing book titled Software Design for Flexibility. We're going to hear about a few interesting developments of some of the ideas there. I would ask the audience members to get these books ready if you have them. Take it away, Jerry.
Well, thank you. I am very pleased to be here. And I wanted to thank Tovia and Marcus for being just before me. I listened to their talks and they were very interesting. Of course, the stuff I was doing was mostly for very advanced students at MIT, graduate students and undergraduates at the advanced level in physics and in mathematics type subjects, although SICP was originally for freshmen. But today I'd like to talk to you about stuff that's brand new. It's sort of hot out of the gray matter, as we would say. It's not completely worked out. And it's sort of what I've been thinking about recently. So I suppose we'll get onto it. Let me see if I can advance my slides. Yes. What I'd like is to make systems that are easy to change. That's the most important idea, that they have the property that they can be adapted easily to unforeseen needs, that they're flexible, that they're incremental, and I don't see a lot of detail. One of my biggest goals, you see, is to improve the expressiveness of programming.
All my life, I've thought about programming as being something like other linguistic disciplines, like mathematics or English text. It's an expressive medium that you write things down in, and you can write beautiful things. You can write beautiful programs, the way poetry or Maxwell's equations are beautiful. You can write prose: Maxwell's equations, when you use them to describe the way electromagnetic radiation works, that's prose. It's describing the world. You can make art and music, and all of those things are kinds of expressive medium. And programming is another kind of expressive medium, where we make expressions that we couldn't say in other languages. So the goal is to figure out how to say things that are otherwise hard to say. The current programming styles we have in most languages, and most of programming in general, are ponderous. The elaborations we want to express in an idea complicate the expressions. And I'll be more clear about that.
So here's an example. Now, I'm sorry that I happen to like parentheses. I did replace the colons that I would use in Lisp or Scheme with dollar signs, because I found out that in your language, Clojure, colons mean something else. So these are presumably perfectly good variable names. But here's a simple program that computes the current in a junction diode, given the voltage across the diode. As an electrical engineer, you might know what that means. It doesn't matter; you don't have to know what it means. So it's a simple program, and it's a product of the saturation current and an expression which contains various physical constants, like the charge on the electron and Boltzmann's constant, and the absolute temperature and things like that, and the voltage that I'm measuring. OK, so that's what this is. And I want to be able to say things about this program. First of all, the voltages and things like that are inexact numbers with specified uncertainties.
There are units: the voltages in volts, the currents in amperes, the charges in coulombs, the temperatures in kelvins. And Boltzmann's constant is in joules per kelvin. And the result is in amperes. I want provenance. I want to make sure that if something goes wrong, I know who to blame. Where did I get this data? The data might be wrong. So the company that made the diode, perhaps International Rectifier, has to be able to sign the saturation current for the diode. Boltzmann's constant and the charge on the electron are published by the National Institute of Standards and Technology every few years. They are the best estimates of the physical constants. Perhaps I measured the temperature in the room I'm working in. The formula comes from a book by Searle and Gray. OK, and the procedure happened to be written by me. And then there are reasonable constraints. I could never have a voltage for this diode bigger than one volt, or I'd have thousands of amperes going through it and it would go boom. So I want to put in constraints like that.
But I want more. I want checking of my units and reasonable constraints at runtime, and at compile time where possible. I want symbolic partial evaluation, so I can look at these things as symbolic expressions as well as numerical. And I suppose Tovia and Marcus were showing some of that with the kinds of code that I tend to write. I said that I want estimates of the uncertainty; that one I don't know how to do, but that's what I want. I want tracking of dependencies. So if I get a wrong answer, I want to know who to blame. Is it the person who wrote the program, like me? Is it somebody who gave me a wrong value for a physical constant? I want to know who did it. OK, I want that sort of thing. I want automatic derivatives of my program, and intervals perhaps, and I want the derivatives and intervals to inherit the same annotations.
And I want the annotations to be incremental and additive, requiring almost no change to my code. Now, here's the big one. I have a very beautiful program. See, over there is a tiny little program. It's simple. It is elegant. It doesn't say any more than it needs to. However, I want to be able to add all this information without burying that program in a mess of declarations that make it impossible to read or understand. So I wanted to invent a new idea, which I call layering, and that's a very hard problem, whereby I can sort of do what an architect does. An architect, at least in the old days, would sketch a plan on a big board for some structure they want to make. And then, after the plan explains what rooms there are and what hallways and stuff like that, they put a piece of plastic down on top of it and draw in the next layer, which might be something like how the elevator connects, or how the HVAC, the heating, ventilation and air conditioning system, is built in. And then another layer, where the plumbing goes. And each of those things has to be sketched on top, but those layers don't destroy the basic plan. The plan has not become too complicated. If you take off the layers, you can see the original plan. So that's sort of what I care about.
Now, we don't know all the answers. So of course, my friend Chris and I (by the way, Chris worked for me at MIT for 26 years; then he worked for Google for about 10 years, and now he's doing something else, but I'm not sure I want to talk about that), in any case, we wrote a book about this. And we don't really know all the answers. We put down what I would say is our current best estimate at the time, which is about a couple of years ago. And I want to consider one less developed idea, the layering idea, the one I've been trying to tell you about, okay? So first I want to say a little bit about the fundamental ideas in this book, okay?
At the very bottom of it is something called predicate-dispatch extensible generics. We know what generic procedures are. For example, in all Lisps, the addition operator adds integers, and it can add floating point numbers. In Scheme, it can probably deal with complex numbers and rational fractions, okay? And not only that, they can mix correctly. It's a generic operation, say addition, okay? On the other hand, people don't usually make it so it's easily extensible. Python allows some of those things for very simple operations. They call them dunder methods, which I'm not sure I care about. But the bottom line is, what if we wanted to make it completely general, so that all we have to do is say that a predicate determines what kind of thing something is, and we dispatch on those predicates, okay, to determine what handler is appropriate for every operation, okay? That's actually how we implement the forward-mode automatic differentiation that was nicely explained by Tovia, okay? If you look here, this is what we have. We extend all of the primitive operations to handle the dual objects, dual numbers, which were invented, by the way, in the 1870s by the famous mathematician Clifford, okay? If you pass a dual number through a function, it should produce the function's value and the derivative of the function multiplied by the increment. So the chain rule is automatically correctly implemented, because if I take this output and pass it through G, I get the correct answer for the base value, the primal value, and I get the correct value for the derivative, which is evaluated at the correct places, okay? That's just one kind of thing you could do when you have automatic differentiation.
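The mechanism described above can be sketched in Python. This is an illustration only, not the code from the slides or from the book: the names `Generic`, `add_handler`, `Dual`, `lift`, and `derivative` are invented for this sketch. It shows ordinary arithmetic handlers being registered first, and dual-number handlers being added afterwards without touching the earlier definitions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    primal: float   # the function's value
    tangent: float  # the derivative multiplied by the increment


class Generic:
    """A generic operation that picks a handler by testing predicates."""

    def __init__(self, name):
        self.name = name
        self.rules = []  # (predicates, handler); newest rules are tried first

    def add_handler(self, predicates, handler):
        self.rules.insert(0, (predicates, handler))

    def __call__(self, *args):
        for predicates, handler in self.rules:
            if len(predicates) == len(args) and all(
                    p(a) for p, a in zip(predicates, args)):
                return handler(*args)
        raise TypeError(f"{self.name}: no handler for {args!r}")


def is_number(x):
    return isinstance(x, (int, float))


def is_dual(x):
    return isinstance(x, Dual)


def is_scalar(x):
    return is_number(x) or is_dual(x)


# The original arithmetic: handlers for plain numbers only.
add, mul, sin = Generic("add"), Generic("mul"), Generic("sin")
add.add_handler((is_number, is_number), lambda a, b: a + b)
mul.add_handler((is_number, is_number), lambda a, b: a * b)
sin.add_handler((is_number,), math.sin)


# The extension: dual-number handlers, added without editing the code above.
def lift(x):
    return x if is_dual(x) else Dual(x, 0.0)


def dual_add(a, b):
    a, b = lift(a), lift(b)
    return Dual(a.primal + b.primal, a.tangent + b.tangent)


def dual_mul(a, b):
    a, b = lift(a), lift(b)
    return Dual(a.primal * b.primal,
                a.primal * b.tangent + a.tangent * b.primal)


def dual_sin(a):
    return Dual(math.sin(a.primal), math.cos(a.primal) * a.tangent)


for generic, handler in ((add, dual_add), (mul, dual_mul)):
    generic.add_handler((is_dual, is_scalar), handler)
    generic.add_handler((is_scalar, is_dual), handler)
sin.add_handler((is_dual,), dual_sin)


def derivative(f):
    """Forward mode: seed the tangent with 1 and read it off the result."""
    return lambda x: f(Dual(x, 1.0)).tangent


def f(x):
    return add(mul(x, x), sin(x))  # x*x + sin(x)
```

Calling `f(2.0)` uses only the number handlers, while `derivative(f)(2.0)` routes the same body of `f` through the dual handlers and yields `2*2 + cos(2)`. The sketch ignores nested derivatives, which need extra bookkeeping to keep different increments apart.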
It's one thing you could do when you generically extend your arithmetic, all the possible functions, to deal with, say, a new kind of object that was not thought of before, and this should be done easily, okay? So that's an example of that, but it can also deal with symbolic expressions and everything else once you do that. Now, layering is not the same as generic procedures. It's similar in the following way. Generic procedures extend a procedure so it can operate on data it could not have operated on before the extension. When we extend addition so that it appends strings when it's applied to strings, okay, we can do that. But with layering, the data potentially has metadata associated with it, and the procedures are extended to deal with the metadata. So an example of metadata is the units, okay? Another example of metadata is the provenance, the support set: where that data came from, who signed for it, okay? And what you wanna do is extend all the procedures to operate on the metadata in parallel with the operation on the data. So for example, when we add a units layer, okay, we extend multiplication to do multiplication of the data as well as produce the units for the product, which basically adds the exponents, if you wanna think of it that way, for each of the units. Of course, the metadata is itself data, so it may have its own metadata. So it's a whole other thing besides just generics. And although in what I showed you before, we implemented the automatic differentiation in scmutils as a direct extension to arithmetic, it can also be implemented as a layer, because the automatic differentiator must compute the value as well as the derivative to make the chain rule work, as you just saw here. It's computing the value as well as the derivative, okay? So let's do an example of layering just to show you what it's like, okay?
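As a rough picture of what "operating on the metadata in parallel with the data" might mean, here is a minimal Python sketch using a provenance layer. It is not the system from the talk: `Layered`, `signed`, and `layered` are names invented here, the saturation current and temperature are assumed illustrative values, and, unlike the interpreter-based approach in the talk, the formula has to be written with the wrapped operators. The two physical constants are the exact SI values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Layered:
    base: float                          # the ordinary value
    provenance: frozenset = frozenset()  # who signed the inputs it depends on


def base_of(x):
    return x.base if isinstance(x, Layered) else x


def provenance_of(x):
    # Default for a missing layer: the empty support set.
    return x.provenance if isinstance(x, Layered) else frozenset()


def signed(value, *signers):
    """Attach signers to a value, keeping any provenance it already has."""
    return Layered(base_of(value), provenance_of(value) | frozenset(signers))


def layered(base_fn):
    """Extend base_fn so provenance is computed alongside the base value."""
    def extended(*args):
        value = base_fn(*(base_of(a) for a in args))
        support = frozenset().union(*(provenance_of(a) for a in args))
        return Layered(value, support) if support else value
    return extended


mul = layered(lambda a, b: a * b)
div = layered(lambda a, b: a / b)
sub = layered(lambda a, b: a - b)
exp = layered(math.exp)


def diode_current(v, i_s, q, k, t):
    # I = I_S * (exp(q*V / (k*T)) - 1)
    return mul(i_s, sub(exp(div(mul(q, v), mul(k, t))), 1))


Q = 1.602176634e-19  # electron charge in coulombs (exact SI value)
K = 1.380649e-23     # Boltzmann's constant in joules per kelvin (exact SI value)
I_S = 1e-13          # saturation current in amperes (assumed for illustration)
T = 300.0            # room temperature in kelvins (assumed for illustration)

plain = diode_current(0.6, I_S, Q, K, T)
annotated = diode_current(signed(0.6, "GJS"),
                          signed(I_S, "manufacturer"),
                          signed(Q, "NIST"),
                          signed(K, "NIST"),
                          signed(T, "GJS"))
```

With unsigned inputs the procedure returns a plain number of about 1.2 milliamperes; with signed inputs it returns the same base value together with the set of everyone who signed something the result depended on.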
So we go back to the example I had before; it's small enough to fit on a screen. I mean, I like big examples, but small examples are appropriate for the screen. So here we have this, again, method of computing a current given a voltage, based on some other data that's given to us from somewhere else, okay? So here we have the saturation current, the electron charge, Boltzmann's constant, and the absolute temperature, in kelvins, of the room. And over here we have the result. The result is, what is it, 1.2 milliamperes. That's a reasonable number for that kind of diode. But now we add provenance. Let's sign each of these pieces of information. The manufacturer can sign the saturation current. So I'm changing IS to be the signed version of IS. The physical constants are periodically published by the National Institute of Standards and Technology. So I sign Q and I sign K by where I get that data. I personally sign the temperature because I measured it; I have the thermometer, okay? And I sign the voltage I'm putting in because I'm doing the experiment. And what I get out now is more information. I get the base value, which is indeed 1.2 milliamperes. We'll worry about units later. And I get all of the sources that this value depended upon.
Now, I consider this to be a very important idea, because consider all of the great scientific papers that are written in, say, places like medicine or biomedicine. Many of them actually are wrong, okay? Because an experiment turns out later not to have been done right or something like that. But there's a huge pile of deductions based on those papers. And supposing we find something really is wrong after much deduction, wouldn't it be nice to be able to mark all of that tree and say there's something wrong in this tree? And maybe in another case where something's wrong, we mark all of that tree, and then we find, hey, the intersection is this paper that's got the wrong measurement.
That might make things move much faster in that science. Well, I'm gonna keep going with this. Let's maybe add more information. The formula was itself signed by the source book I got it out of, Searle and Gray, okay? And I wrote the code. So I'm gonna, oh, no, sorry, that's not yet. I'm just checking that it still works, okay? That's just added. And the code still works, of course, with unsigned inputs. But later I can seal the procedure. I can say I signed the procedure that computes the current; that's ID, okay? Going back there just to remember what it is: ID is the procedure that computes the current, okay? That was signed by me, because I wrote the code. And now if I look at the result of calling that procedure on 0.6 volts, okay, that I had measured, I have two layers in the result. The ultimate result is the 1.2 milliamperes. That has its provenance, how it was computed, okay? Including all of the ingredients that went into it. But also there's provenance for that structure, which is that I passed in a number, which was signed by GJS, into the code that I wrote. So this is the provenance of the code that processed it. It might be argued that we should flatten this, if that's easy, but the real bottom line is that I'm collecting that information, okay?
Now, real programs are much more complicated. For example, there are conditionals and loops. So here's a nice little looping program that does nothing very much. It says, given a count N, if N is zero, then I'm done, okay? Otherwise I want to count by subtracting one from N and go around this loop, okay? Now remember, Scheme is tail recursive. So this is not intended to push any stack, okay? Now that's going to produce a complication, which we'll talk about.
But in any case, I'm signing every little piece of this. The zero, which I'm comparing with N, is signed by Frodo; the done result is signed by me; and the one I'm subtracting off of N is signed by Mr. Bilbo. I see somehow he got into the Lord of the Rings when I was typing this. And if I call this by asking Sam Ritchie to give me a number (actually that might be Sam from Lord of the Rings) and he gives it a zero, then of course the only thing that matters is these two parts of the clause, the predicate and the consequent, okay? So Bilbo's irrelevant here, okay? But if Sam gives me five, then it goes through the entire loop, and therefore Bilbo is relevant also.
Now this is a complication. Anytime you deal with conditionals, it changes the way the interpretation mechanism works. One way to do this in a standard language, the first way I did this and the way I did it to make it work at all, is by making if into a macro, a macro that expands if into a slightly more complicated situation. But that basically causes the compiler to produce lousy code. It works, but it produces lousy code. And it's also very inelegant. And with layering, because I have to combine, at the end of every computation, the various layers (the various metadata) with the data and pass it along, that would automatically kill tail recursion if it's done in the simple way, the easy way. Because when it does that, it puts extra stuff on the stack, which is the guy who's collecting that result, okay? Now in the last few weeks, actually, I've conquered this problem, and I will explain that, okay? So that's why I'm saying this is hot out of the gray matter. The way we did it in the SDF book, we didn't have tail recursion working for layering, but now I know how to do it, okay? So I'll tell you more about that. But in any case, just to prove that the stuff I do works, okay, I did a big problem.
No, it's not a big problem. It's big enough. It's so big that I can just barely fit it on a screen, but it's small enough, and it tests lots of stuff. This is basically a Y-operator-like definition of factorial and Fibonacci. And they're co-recursive in the way I wrote it here. And so I can compute the factorial of five and the Fibonacci of 10, okay? And I'm getting, of course, the results, and also the sum. So I get 55 for Fibonacci of 10, and factorial of five is 120, and the sum is 175. And these are the various contributions, which are based on the signatures of all the constants. This is just to make sure that everything was working sensibly, that the interpreter I built wasn't broken. By the way, if anybody has questions, I'm pleased to answer, even if you want to interrupt me, but we can talk later as well, okay?
So first of all, a little bit of a summary of what this is before I go further. The way layering works is that it's somewhat like generic procedures, but it's not like them in other ways. Unlike generic procedures, there's a single layered-object data type that associates a layer name with its data or procedure. There's a base layer and there are annotation layers, such as units, provenance, derivatives, whatever we want to add; it could be types, okay? But it's very much like generic procedures in that a layered procedure has a handler for each layer of interest. So I have to build a handler for all of the primitive objects and the ways they're combined, of which the most important is, well, that's what if is, that's one way of combining, and another way of combining is by definition, by lambda abstraction, okay? A layered procedure has a handler for each layer of interest. The unlayered procedures must look only at the base layer of a layered argument, okay? And most layers are self-contained.
The handler will not look at arguments other than the ones for its layer, but remember, multiplication needs more information. And that's because if I pass two arguments in to multiply and one of them is zero, then the provenance of the result depends only on the provenance of the zero, because zero times anything is zero. So it doesn't matter what the ancestry of the other argument is. A handler will also not generally be invoked unless arguments for the layer are provided. And there are defaults for missing layer values. So there's an empty support set for provenance. There are dimensionless quantities. There are mathematical constants. We'll worry about that. So I just want you to get the idea that it's something like generic procedures, but instead of there being one object which is being passed through a procedure, which may have many handlers for different kinds of objects, here we have an object with many layers, many parts, and each of these parts is being sort of routed into a layer processor for the layered procedure. And then the results are recombined at the end. And this has to be done without killing tail recursion.
So just a little bit further. Layering is compatible with generic procedures, because a layered object is one for which there is a test: the layered-object predicate can be used for generic dispatch. So in fact, generic procedures can have handlers for layered objects. A layered procedure is itself a layered object. The base procedure of a layered procedure can be a generic procedure. And the handler procedures can be layered procedures, and they can be generic procedures. So this all mixes together in a way that sort of works nicely. So first of all, I'm gonna tell you what I did, okay? I'm not gonna show you the interpreter that I wrote, because it's a long story, it's boring, okay? And the one great thing about it is it's generically extensible, okay?
But it's a continuation-passing interpreter. Now, I don't know how many of you know about continuations. That's a thing that is available in Scheme. But think of it this way. Suppose I have a compound expression like the sum of one, the product of two and three, and four, okay? So the sum takes three arguments, one of which, the middle one, is a product. That expression is, at some point, waiting for that product. In particular, I can't do any of the add until that product comes in. Another way to think about that is that the interpreter has got a procedure that takes one argument, called the continuation, that's waiting for that answer and will then pass it to the addition. That's called the continuation, okay? And if you write your interpreter correctly, so that everything in the interpreter, and therefore in the compiler, is continuation passing, then you have what's called a continuation-passing interpreter. And once you have that, you could also return that continuation to the user somehow, in which case you get call-with-current-continuation. That's just a simple way of doing it. Anyway, I needed that because I wanted to be able to do something that makes all of this work nicely. And I'll probably explain that.
But first I want to just show you what the setup is once I've done that. What I really have to do is give the handlers for everything. So for the provenance layer, I'm gonna have a default value, which is the empty set. I'm using lists as sets, okay? So I have an empty set. I have a union of those, because much of the time when I'm dealing with provenance, the most important thing to do is say, oh, here are all the contributions, I want to union those together. And so there's a little bit of that. And then there's the special thing for dealing with some of the primitive procedures. Okay, let me see, where's my mouse, okay?
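The idea of a continuation waiting for the product can be made concrete with a tiny continuation-passing evaluator. This Python sketch is only an illustration of the style, not the interpreter from the talk; `evaluate`, `eval_operands`, and the `APPLY` table are names made up here, and expressions are written as nested tuples.

```python
import math

# Primitive operators, applied to a list of already-evaluated operands.
APPLY = {"+": sum, "*": math.prod}


def evaluate(expr, k):
    """Evaluate expr and hand the result to the continuation k."""
    if isinstance(expr, (int, float)):
        return k(expr)
    op, *operands = expr
    # The continuation built here is what "waits" for the operand values
    # before the operator can be applied.
    return eval_operands(operands, lambda values: k(APPLY[op](values)))


def eval_operands(operands, k):
    """Evaluate operands left to right, passing the list of values to k."""
    if not operands:
        return k([])
    first, *rest = operands
    return evaluate(
        first,
        lambda value: eval_operands(rest, lambda values: k([value] + values)))


# (+ 1 (* 2 3) 4): the addition cannot proceed until the product arrives.
result = evaluate(("+", 1, ("*", 2, 3), 4), lambda value: value)
```

Here `result` is 11. Nothing in the evaluator ever returns a value to a waiting caller in the ordinary way; every result is passed forward to an explicit continuation, which is what makes it possible to hand that continuation to the user or, as in the talk, to extend it. Python itself does not eliminate tail calls, so this sketch would exhaust the stack on very deep expressions.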
And then there's a special setup for conditionals, which is also uniting stuff. This is very, very complicated in the insides of the interpreter. But what's happening is I'm making a handler so that the results of the predicate calculation can contribute to the results of the consequent or the alternative calculation for provenance. That is, if I determine that the predicate part of a conditional is true, okay, what that means is that I pick out the consequent to be evaluated, and the provenance of the result has to be the combination of the provenance of the predicate plus the provenance of the consequent. That's the provenance of the conditional as a whole. And that has to be arranged. Then primitive procedures have to be given their provenance handlers, like addition, okay? Subtraction. Multiplication has to be a little different, for the reason I described, okay? Division is a little more complicated in the same way, because it's like multiplication. If the dividend is zero, the result is zero independent of what the divisor is, unless the divisor is zero, in which case we get an error, okay? There are also comparators that have to be told what to do. All I care about is the provenance of the arguments, to produce the provenance of the Boolean result, okay? There are one-argument functions like exponentiation, sine, cosine, et cetera. And there are a lot of those. And then there are car, cdr, and cons and everything else; for every primitive, I'm giving a layer handler. Then there's, I don't even want to get into this, okay? This is a hack that describes how to write a signer, okay? And how to sign procedures and so on. I'm gonna ignore it. But I had a lot of trouble with that; it took me a while. The really important thing is this, okay? Tail recursion. And I want to make clear to you what it means.
For those of you who are not familiar with this, I could write factorial as a recursive procedure like this, as you all know. Factorial of n: if n is zero, the result is one. Otherwise, it's n times factorial of n minus one. Because this factorial of n minus one has to be computed before the product can be computed, the product has to wait for that result. Therefore, there's a stack frame produced for catching that answer. So this builds up linear stack. It's linear in n. However, I could write it differently, this way, as an iterative procedure. I could have a procedure which takes two numbers, a product and a counter, okay? They're initialized to one and one. If the counter is greater than n, then I've got the product. So that's okay. Otherwise, I go around the loop, multiplying the counter by the product and adding one to the counter. This does not have to build stack, because there's nobody waiting for this answer. The only guy who's waiting for this answer is way back over here, the guy who called factorial. So there's only one continuation waiting for this answer. So this has constant stack, okay? Rather than this, which has linear stack. So that's called tail recursion or tail call optimization, okay? And in Scheme, we do that, and I care about it a lot, because it makes the language very simple. I like to have the bottom language very simple. I don't care if you build lots of useful macros and things that turn it into while loops and for loops and all that. I like to have the bottom language very simple, so the interpreters and compilers are very simple. So I can rewrite the interpreter very fast. I can do experiments on the whole language and everything works, okay? So now here's the trick, okay? The essence of tail call optimization is that the mechanism of procedure call itself does not push stack. That's strange, right? It's not like other languages.
The mechanism of procedure return does not pop stack. If any information, like the environment or the return address, will be needed after a call, the caller pushes that information. The caller is responsible for popping what is pushed. This is called a pure caller-saves convention, which is not what they do in, say, C. A layered procedure has multiple components. Each of the components must produce a value. Those values must be combined. Is that a contradiction? For months, I was stuck on this, but it's not the case, okay? What I can do is augment the continuation to collect and combine the results of the layers. And that extended continuation is still one continuation. So it's only allocated once on the stack, okay? This extended continuation provides a locus for collecting the information from the predicate of a conditional with the values of the consequent, for example, or the alternative, but it also works for tail calls. So what I'm doing is I've changed the interpreter by augmenting every continuation in the continuation-passing interpreter with an extension that can catch stuff, okay? But that extension doesn't grow except for the data that's put into it. There are not more layers added in (layers is the wrong word), not more stack frames added in.
So let's add a units layer just to show you, okay? So now I have to put in handlers for all the primitives. I'm not gonna show you what the handlers look like. They're pretty simple. What this one's doing is checking that all the inputs have the same units and then making sure the output has those units. And that's true of subtraction. For multiplication, well, I'm gonna multiply the units, okay? For division, I have to divide the units, okay? And what multiplying units means is basically this: the units are things like, you know, kilograms cubed. Well, multiplying that by kilograms to the minus one gives me kilograms squared. So it's really just adding exponents.
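The "adding exponents" view of units can be sketched directly. In this Python illustration (not the handlers from the talk), a unit is assumed to be a dictionary from base-unit names to integer exponents; the function names and the constant names are invented for the sketch, while the SI decompositions of the coulomb, the volt, and the joule per kelvin are standard.

```python
def multiply_units(u1, u2):
    """Multiply two units by adding exponents; drop anything that cancels."""
    result = dict(u1)
    for unit, exponent in u2.items():
        total = result.get(unit, 0) + exponent
        if total == 0:
            result.pop(unit, None)
        else:
            result[unit] = total
    return result


def divide_units(u1, u2):
    """Divide by multiplying with the inverse (negated exponents)."""
    return multiply_units(u1, {unit: -e for unit, e in u2.items()})


def same_units(*units):
    """Handler for addition and subtraction: all inputs must agree."""
    first = units[0]
    if any(u != first for u in units[1:]):
        raise ValueError(f"incompatible units: {units!r}")
    return dict(first)


COULOMB = {"ampere": 1, "second": 1}
VOLT = {"kilogram": 1, "meter": 2, "second": -3, "ampere": -1}
JOULE_PER_KELVIN = {"kilogram": 1, "meter": 2, "second": -2, "kelvin": -1}
KELVIN = {"kelvin": 1}

# kilograms cubed times kilograms to the minus one is kilograms squared.
kg_squared = multiply_units({"kilogram": 3}, {"kilogram": -1})

# The exponent q*V / (k*T) in the diode formula should have no units at all.
exponent_units = divide_units(multiply_units(COULOMB, VOLT),
                              multiply_units(JOULE_PER_KELVIN, KELVIN))
```

Here `kg_squared` is `{"kilogram": 2}` and `exponent_units` is the empty dictionary, which stands for a dimensionless quantity, as a unitless function such as the exponential requires of its input.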
There's equality and the comparators: the output is unitless because it's a Boolean, but the inputs all have to have the same units, okay? Then again, exponentiation and all of the standard complicated functions are unitless in input and output. Square root is special. It divides the units of its input by two. Well, it takes the square root of the unit, which is dividing the exponents by two. And now we can do the example and show it to you. So continuing with the same thing I had before, I'm gonna add another layer. Remember, we already had a layer for provenance, okay? So here, the saturation current is in amperes, okay? The charge on the electron, which was told to us by NIST, is now in coulombs, which is amperes times seconds, which is literally just a list here of amperes and seconds. These are the exponents. Boltzmann's constant is actually joules per kelvin, which I'm writing here in the traditional way. Okay, actually, if I put little squiggly brackets in, I could do this with TeX, okay? But let's not worry about that. So that's kilograms meters squared per second squared per kelvin, okay? The temperature is in kelvins, and voltage is a kilogram square meter... Okay, voltage is basically, what is it? It's joules per meter, okay? So that's what's happening here, I think. No, I just said that wrong. No, it's joules per coulomb, excuse me, okay? So that's what this is. And so I hope I wrote that right. The answer is right, so I don't have to worry about it. Whether I wrote it right or wrong is a different question, okay? And so if I just look at, for example, what that object looks like, it's a thing with a provenance layer, with a units layer, and a number, okay? But the original program, we can still work with it. We don't have to change anything to deal with units, okay? It still has the signature from signing the formula by Searle and Gray.
But now if I put in a voltage with units that was signed by me, out comes exactly what I expect, okay? I get the provenance for the call, I get the provenance for the result, I get the units for the result, and I get the 1.2 milliamperes as output, okay?
So this is what's going on in my head right now. I'm sort of beginning to summarize. The idea of layering seems right to me. What it's doing is factoring the problem of programming into smaller pieces, and that's always better. It's dividing it up so that my poor program over here is not covered with lots of detail, the same way, you know, if you're writing in Java, what's the right way to say it? It requires a great deal of ceremony to get anything to work, okay? And that's because you have to make the compiler writer happy by putting in all sorts of detail that's irrelevant to your idea, okay? That's a problem in most languages, but Java is a beautiful example of one that does it to a great extent; it makes it really hard to write code and to see what you wrote, okay? I wanna keep my code simple. This is my simple code, but I want to be able to dynamically add layers, okay? Without changing my code. I want to dynamically and extensibly annotate something with metadata without actually changing the program I'm writing. And so it means that there's gotta be ways of writing the layers, okay? And hooking them together with the code base that you're working on. My experiment here that I'm showing you, with changing an interpreter, writing my own little interpreter, shows it's not hard to approximate. I need layered objects, and I need small modifications, perhaps, to the interpreter or compiler. If you know enough about interpreters and compilers, which of course I do, it's not a big deal to do that. But it took me a great deal of effort to think out that extension of the continuation objects in the interpreter. There are lots of unsolved problems.
There's a painful problem of IO, which is related to the challenge of editing a program factored in this way. My favorite program of all, my favorite piece of software ever written, is Emacs, okay? Because Emacs was written 40 years ago and it keeps being extended and it keeps working and it works really reliably and really well, okay? And the one nice new thing that's been put into it recently, in the last five years, is Org mode, which is sort of like a layering for text, okay? You could put in code, you could get out sort of what would be called literate programming style, you could get it to output in LaTeX, you could get it to output in Texinfo, you could get it to do all sorts of things, you could put pictures in and everything else, and it's a great thing. And it's an outline, okay? And you could have different layers with different levels of detail, okay? So I'm inspired by that but I don't know how to do it yet, okay? So what I'm saying is I need your help, okay? You're the power, okay? I'm just producing ideas. Yes, indeed, there are some problems. In general, conditionals need special treatment. The provenance of an if is a combination of the provenance of the predicate part and the provenance of the selected alternative. There's a problem with tail recursion, okay? Because I have to make a layered object at the end, but perhaps that idea is wrong. I have solved both of these problems by making a new interpreter. But is there a way of fully implementing layered systems without a new interpreter? Is there some trick I hadn't thought of? I don't know, okay? Maybe you have an idea, okay? And what would, whoops, that's got a problem. That should be "an", not "and". What would an IDE for layered systems look like?
Is there a kind of Emacs-like structure that we can invent for being able to edit these layered systems so that the various layers can be hooked together, okay? And yet we can separate them to work on them separately. Okay, so that's where I need your help, okay? I need your help in one more way, okay? Everything I do and all my books, okay, are basically free, okay? I want you to support and contribute to free software. Where free doesn't mean free as in money. It has nothing to do with free as in free beer. It's free as in free speech, okay? So please support and contribute to free software. Thank you. Now I'm happy to take questions. Fantastic keynote, Jerry. I wanted to say that predicate-based dispatch procedures personally are some of my favorite techniques that I've learned to adapt from your work to my work and I've benefited a lot from that. So thank you and I look forward to extending that with what I've learned from you today. And so to everybody else, we would like to go into the next section and we'll have the opportunity to take some questions from you. But first we'd like to do a fun little show of books. I know a lot of us have these handy. And so if we can get these handy and turn your cameras on, we're gonna see how many we can get. Yeah, come on. Gonna give you, yeah, we'll wait for like a minute. Everybody grab it and then I will take a screenshot. It's all like that. Jerry's got many. I don't have four hands. Okay, five, four, three. Two, one. All right, keep them up. Screenshotting. Woo. Thank you, Jordan. Okay, so let's start the Q&A. I just wanted to repeat a little bit the mechanism of the Q&A. Please raise your hand if you want to ask a question. We'll go with the raised hands first of all for interaction, or post your question on the dedicated Q&A channel. So we have already a few. We can take James first, whom I see here at the top. Should I allow? Yes, sorry, go ahead, James. Hey there, Dr.
Sussman, thanks so much. The one of the, a while ago I was very inspired by the concept of generics. This was of course before layer procedures came out and I wrote a, what I thought at the time was an extremely readable program, at least at the top level, the top level elegant main function. And I came back to it about two years later and I had to modify it and to basically take the complex data type that was returned by it and then add another operation onto it and do some further things. And what I found was, is I had difficulty knowing what it was actually doing at that point. Cause I could read it like an English sentence at that point, it's very beautiful, but I had difficulty with the code navigation because I was using a dynamically typed language and I wasn't able to sort of navigate without doing some complex debugging down there. Now I have the methodology that you've outlined here and I think the interpreter might be part of the answer for this, but I was kind of curious if you had any thoughts on good ways to organize the code, at least in terms of code organization, so that when you know, for instance, let's say that you're the locality of the declaration of your data type is not visually near where you end up using. Let's say I just have some variable Q right there or something like that. It's not necessarily immediately obvious to me what the effect the plus operator is gonna have on it. And then I've defined various behaviors for the plus operator all over the place. And it's difficult for me to tell from like the argument signature what type that is or what sort of operations it's supposed to support. So it's kind of curious if you've dealt with anything like that and what your techniques you use for navigating those generics. Fine, that's a great question and it's very interesting. The way in which I think about it is two ways actually. 
There is a traditional thing that you would learn from perhaps early programming where you have a conditional with a big string of, you know, cond, eq mumble quote foo, okay? For a whole bunch of types, okay? And that would be the way you would just make a dispatch on the arguments to a function. The function is sort of all possibilities. Then there's the thing you learn from the object-oriented world, which is you organize it by the data type and you put the handlers in the data type for all the same operators, okay? And of course the real truth is that it's really a table, a sparse table of the operations and the types, okay? In which you're filling in some entries and not the others. And so you could organize it by the operator or you could organize it by the type, but organizing by the type, which is sort of the object-oriented view, is not very good for multi-argument functions, okay? That's so because it becomes a rather high-dimensional table. So I like to think of things in terms of the table itself, just to be very clear, okay? I tend to not want to decide a priori that I have something that looks like an object or something that's being dispatched by conditionals. Now, that does produce a bit of a way of confusing yourself, which is what you're alluding to. What I tend to do is I tend to organize the material based on the meaning. That is, for example, I might say that there's a whole bunch of things that I have to do with making a symbolic extension to my arithmetic, okay? The symbolic extension does not interfere with any of the numerical stuff, the old numerical stuff, okay? So I never have to think about it if I'm changing the numerical stuff, okay? It doesn't hurt, okay? Or if I change the symbolic stuff, that doesn't hurt the numerical stuff. And then I might add the automatic differentiation structure, okay? By adding, for example, new numbers, okay?
That doesn't cause any trouble, okay? Because all the old stuff still works. So if I think of it as all I'm ever doing is modifying a system by adding new capabilities, it's additive. The real problem has to do with whether or not I want to change the old, the bottom layer, okay? That's I think what you're worried about. So for example, a possible bug that could come up is supposing I'm doing arithmetic and I have some complicated algebraic expression and I extend it to matrices, okay? Whoops, but remember multiplication of matrices is not commutative. Therefore if I did multiplication in my original program I've just introduced some bugs, right? And the problem is knowing that you have to go back and fix that. Is that helping you? That's a really good example of it. And I love the idea of organizing it in a table. And I think that really plays in well with your idea that we could use some better IDE tooling to support these sorts of ideas as well. Like being able to not just mentally but visually see that table. Because sometimes if I come back to something two years later, even if it was crystal clear to me at the time, I have no idea what that mental table is anymore. So great, great thoughts though. But does that, so that helps you? That's what I was hoping, I mean, yeah. I don't know exactly whether my answer got to your point. Okay. Yep, thank you. Wonderful. Well, we have another hand up here, Edward Hughes. Hi, Professor Sussman. Thanks for the talk. I had a question more about pedagogy, and one of the themes running through Structure and Interpretation of Classical Mechanics is that sort of impressionistic and unclear mathematics and notation can get in the way of reasoning about problems and solving them.
Now in undergraduate calculus, some of us might recall we were told that we cannot treat dy over dx as a fraction, even though a lot of the symbolic manipulation that we do in math and physics requires exactly that. And now with dual numbers you sort of extend your number system a bit so you can actually do that. And given that it's such a powerful programming tool, I was wondering if you had any thoughts as to whether its introduction to freshman math classes could have a similar effect to the one SICP has on people trying to learn freshman-level computer science? I see, okay, that's very interesting. A lot of the motivation for SICM came from reading these physics texts and realizing that they were hard to read. And part of the reason was of course that the math was, as I say, impressionistic, which means it left out a lot of detail. It was sort of just sketches of what's going on, and programs are not like that. But the other thing that's special is that Leibniz notation, the dy by dx notation, is very misleading when you go to high dimensions or higher dimensions. I was inspired when I was an undergraduate at MIT, probably before most of you were born. There was a book I got, I suppose it was referred to me by Marvin Minsky, okay? Which was Calculus on Manifolds by Spivak. Okay, it's about a hundred pages, it's a delicious book. And the whole purpose of this book is to start with basic principles and get to Stokes' theorem, which is the n-dimensional, what's the right word? Generalization of the fundamental theorem of calculus, which is that derivatives and integrals are inverses of each other. And this book had the property that it alerted me to the problem that Leibniz notation was confusing. You could get into all kinds of paradoxical situations by doing the division, as you say, of dy by dx and thinking that that's what's happening.
In fact, he's the guy who pretty much alerted me to the fact that what I really want is derivatives of functions, not derivatives of expressions. Okay, the big D operator is the derivative of functions. Okay, so going down that path, of course, you can eventually transform this, as when I did the functional differential geometry. There is an interpretation of the d by dx and dx, okay? Those are basis vectors and covectors on the manifold, in the tangent space and in the cotangent space. And that simplifies matters. So then it looks like you're doing the arithmetic on the d by dx and dx type objects as if they were sort of divisions, even though you don't have to think that way. But it just justifies that, okay? But you have to build that through the understanding of the capital D. I think that introductory calculus would be vastly improved if we got rid of Leibniz's notation and used the big D and talked about functions the same way that Spivak did in his little book. Is that helpful? Yeah, that was the answer to my question. That was more or less what I was thinking. I think it's on page 40, I think it's 40, 44 in Spivak's book, just pulling it out of my memory, where he has a diatribe against Leibniz notation. Oh, I'll have to look that up, thank you. Yes. Another raised hand: Ag Davies, take it away. All right, thanks so much for the talk, that's a really fascinating idea that I'm gonna keep thinking about. Very, very simple question. What are some other layers that you can imagine being useful besides units and provenance? Oh, one that's really nice is types, okay? And type inference. Type inference is really easy to do, it turns out. And in fact, in our new book, we give a very simple, several-page construction of type inference, okay? And it takes only a few pages to just show how to do it, okay?
But the point is that's something I like, type inference. I don't like languages where there's enforcement; I'm sort of a libertarian programmer. Okay, but I do like to get the information that the types give me. And so I think that's an example of a nice layer. There are many other layers I can think of that are very helpful. There are things like tracking some of the uncertainty in a number. Now, remember that's hard; for example, backwards inference of exactness is insanely difficult work. Floating point is the scariest business in all of computer science as far as I'm concerned. To be perfectly clear about it, the only reason I'm willing to fly in an airplane is because there's a test pilot. I don't trust floating point numbers. But that's an example. What other things can I think of that are really useful in there? I want to be able to put on things like wrappers that are paranoid programming wrappers. So for example, I'll give you a simple case where it's trivial. I want to say, the output of this procedure, I know what it ought to be approximately, by reasoning, okay, given the inputs. I want to have an approximate answer computed that I'm comparing every time with the output, okay? Because that's the same way I do it when I'm doing real problems: I don't know if my computation is right until I ask, is it reasonable? If it turns out that an orbit of a planet I'm computing has the planet crashing into Jupiter, it's obviously wrong, okay? So I want to have that kind of idea. So there are reasonableness wrappers that I want to put on things. I want to imagine that of course I'm doing symbolic evaluation as well. That's very useful. If I'm debugging a complex numerical problem, then one of the ways to do it is to look at the symbolic representation of the result and see what I actually am computing.
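The "paranoid programming wrapper" idea above, where every output is compared against a rough estimate computed from the same inputs, can be sketched in Python as a decorator. This is a minimal illustration, not code from the talk: the names `reasonable`, `UnreasonableResult`, `pendulum_period`, and `checked_period`, the pendulum example itself, and the 50% tolerance are all assumptions chosen for the sketch.

```python
import functools
import math


class UnreasonableResult(Exception):
    """Raised when a result strays too far from its rough estimate."""


def reasonable(estimate, rel_tol=0.5):
    """Wrap a function so each result is compared with estimate(*args)."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            rough = estimate(*args, **kwargs)
            # isclose with rel_tol=0.5 accepts results within 50% of the
            # larger of the two magnitudes.
            if not math.isclose(result, rough, rel_tol=rel_tol):
                raise UnreasonableResult(
                    f"{fn.__name__}{args} returned {result}, "
                    f"expected roughly {rough}")
            return result
        return wrapper
    return decorate


# Base code: the careful computation, written with no checking clutter.
def pendulum_period(length_m, g=9.80665):
    return 2 * math.pi * math.sqrt(length_m / g)


# The paranoid layer is attached separately, using the crude rule of thumb
# that a pendulum's period in seconds is about 2 * sqrt(length in meters).
checked_period = reasonable(
    lambda length_m, g=9.80665: 2 * math.sqrt(length_m)
)(pendulum_period)
```

Calling `checked_period(1.0)` passes the check, while a version of the formula that forgot the square root would raise `UnreasonableResult`. The base function stays readable because the estimate lives outside it, which is the separation the talk argues for; a real layered system would attach such checks without the caller needing to use a different name.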
But way back when I was developing the Digital Orrery, which was a piece of machinery for doing orbital mechanics calculations. This was in the 80s. I did it when I was on sabbatical at Caltech. And I had a bunch of friends helping. I made a special machine for doing orbital mechanics calculations. And one of the things we did is we did the computer-aided design of the hardware with code that I had written to do the computer-aided design. But one of the things I did is I made it so it could symbolically evaluate what went through the chips, okay? In the simulation, so that I could say, here's a floating point number, these are two pieces of it, and they get recombined over here and so on. And eventually I could put in the algebraic expression that was the thing I was trying to compute and then compare it with what it actually did, okay? At every stage. So that's the sort of thing I want to be able to do in everything. I want to have lots and lots of ways of making sure my code is good, which is not the same thing as trying to prove it correct, okay? Yeah, thanks. Awesome ideas, yeah. Okay, we have another hand raised from Philippa Silva. Hi, sorry to parrot everyone else, but thank you so much for your talk. I have a very concrete question. At some point you were listing, okay, like we have to define, for the arguments of all of these operations, how we're actually going to process this extra information, this metadata. And it's like on addition, you combine it, et cetera. And then you got to multiplication and you mentioned the multiplication by zero. If I remember correctly, you kind of discard all of this extra information. Which kind of- It's some of the information. Yes, it discards some, yes. So my question there is, okay, you returned the zero, but one of the arguments certainly was zero. Why is the information for that zero not being used?
So like the question is actually, which zero is actually being returned and which information is associated? Because presumably, if this is just like the nil element, there's a nil element somewhere that says, the provenance of this nil element is this proof, et cetera. Okay, very good. You caught it. Very interesting question. The answer of course is that you're absolutely right. The result being zero depends upon any zero in the inputs. But supposing there are two zeros in the inputs to a multiply. Then it's the disjunction of those two, so it's either this set or that set, but we don't know which, right? So that's really actually a place where backtracking can be useful. What that really is is a place where we're saying that the actual provenance is an amb. If you know what amb is, amb is the procedure invented by John McCarthy to describe, what's it called? Basically non-deterministic automata: amb of any number of arguments is one of the arguments, but I don't know which, okay? And so the way to say this is that the provenance of something with several zeros in its inputs is the amb of the provenances of the pieces that were the zeros. And I didn't do that here because of course writing that code is big and complicated. And of course I have to do that if I'm doing a serious thing, okay? And I couldn't put that on the slides. How's that? That answers my question. Thank you very much. And thank you for the question that shows you were really sharp and clenched. James, do you want to ask the next question? Yeah, I'd love to. Okay, this is a bit of a softball question for you, Dr. Sussman, but I haven't seen the answer online or explained anywhere else. So I'm just really curious, what is the story with the hat? The hat, your Shriner hat that's in every picture of you. Oh, okay, yes.
Way back when I was running the introductory class, 6.001 at MIT, which Abelson and I actually ran together, and we wrote the book SICP, okay? I would give a lecture every term which is the eval-apply interpreter lecture, okay? And that lecture was always done with fanfare. A lot of effects. I would blow things up. I would use magician-type tricks with stuff that would explode and things like that because it's sort of magical, okay? In other words, the very fact that you can make an eval-apply interpreter, what does that mean? What does it mean for the system to sort of raise itself by its own bootstraps, okay? And it's beautiful, and of course it does. And to explain that required some sort of fanfare and it was always a fun sort of performance, which I would do. And I wear this hat as a magician hat, okay? Hal wore a different one. He wore a pointed one when he did the lecture. You know, sort of the traditional wizard hat. So that's where it came from. How's that? That's great, thank you. Love that hat, favorite hat. So we have Sebastian Crane that has their hand up. Okay. Hello, thank you. I have two questions, and the first is about your talk, which was very interesting. And the second is a more practical question about using the Scheme programming language. So my first question was that when you mentioned the signing of values and functions, is that cryptographically signing or some other mechanism? No, no, I used the word sign. I probably shouldn't have used the word sign. What I mean is that that's just an attribution of who is responsible for this value or where that value comes from. Indeed, if I were worried about making this secure, of course we'd use cryptography. But as an old locksmith, I will tell you that cryptography doesn't answer a lot of questions anyway about security.
Another way to say that is when a burglar goes after your house, it doesn't matter how good the lock is, he breaks down the door, right? Perfect way of describing it, yeah. So my second question was directly related to what I did earlier today, in fact, which was writing a Scheme program to use the libnotify library, which is a C library. And I was wondering if you had any suggestions on how to write code that calls C code in an elegant way, because just copying C expressions doesn't sound terribly nice. So I'm wondering if you had better suggestions for that. Actually not, but I'm gonna tell you the real story as I think of it. I think probably the main reason why Lisp didn't make it as a major language in the world is because it refused to make an easy foreign function interface to C. And the reason why it generally does refuse to do that is because Lisp is its own operating system, right? What you basically try to do is, you don't want C programs, which are bad enough that they can clear memory, okay? They can basically destroy evidence of why they fail, okay? And the nice thing about Lisp programs is that when they fail, you're still there in the live interpreter, where you can look around and see what happened, okay? And you could fix it and you can understand it and inspect all the pieces. And this has been the big fight, of course: most of the time you really don't want to incorporate C code into your memory image. What you wanna do is to have what the operating system does, which is to have a separation of processes and memory spaces. C code is so dangerous that you may as well just connect to it by some good foreign function interface that goes through sockets or something, and that's not bad. And these days machines are fast enough so unless you're doing that at very high rates, it's not an expensive proposition, okay? Yeah, that's very interesting. So it might be, yeah.
So I would make a server, a little server in C, for programs that you want to call up, okay? And you make a foreign function interface in your Scheme or Lisp. And I have one right now that was made by one of my former students that allows me to call other pieces of code. Thank you for the suggestion there. So I'm writing the core interoperability parts in C and then using a socket, for instance. That's a really nice idea. Thank you. Okay. Thank you. Last thing, do you want to get the next one? By the way, I have to leave at four because I have another meeting, believe it or not. Oh yeah, fine. Okay, strangely enough, I have a meeting. Yeah, go ahead. We'll try to make it quick, yeah. Jerry, can you explain again what you mean by layered Emacs and what you're looking for? What I'm looking for is a convenient mechanism where I can edit my base code, okay? Or any layer, and make sure they still remain connected correctly, okay? Such that if I have layered procedures and if I edit them or manipulate them, I can walk around and see only the part that I'm interested in, just the same way that the architect could look at the parti, the base diagram of a structure he's trying to design, and pick off the plastic sheets and put them down and say, I only want this sheet but not that sheet, you see? That sort of thing. So I'd like, if I wanted to take my program and I only want to see the type information, those are all numbers, okay? Except for there's a Boolean somewhere. Or I want to say, take that off and I want to see the provenance information. All right, take that off, and I want to work on each piece. So you want to see the live runtime state of the running process per various layers?
Yes, and you want to be able to edit that, you want to be able to edit your whole program, because indeed I want it live. I believe that the right way to think about things is the code lives for a long time, right? I don't like having to reboot things or recompile and reload every five minutes. That's a different kind of programming than the interactive way I like to live. Jerry, should we email you directly when we find the answer to this? Is there a GitHub repo? If anybody wants to start talking to me, I have an email address at MIT, right? I'm GJS at MIT.edu. And if you want to talk about this stuff, great. And we should figure out how to solve it, and Sam Ritchie is a good intermediary because he talks to me all the time, okay? So he can be a person you can contact who knows how to catch me at the right times. Yeah, so Sam is... I do not have a cell phone, by the way. I do not carry one. Sam is in the keynote Discord channel. So if you want to maybe chat with Sam, he can be a little bit of an intermediary, between like the emails and so on, then that's a good study group that can happen there. Right, and yeah, it would be good to do that, to have a nice such arrangement of people who are really interested in trying to figure out layered programming as a thing that you can play with conveniently rather than what I'm doing. I'm just developing infrastructure. I'm a plumber. The designer, perhaps. Plumber, I'm figuring out how you hook it together. But I don't know how to make the nice user interfaces. I've never done that well. Okay, we're gonna take... Sorry, Jordan, go ahead. Well, I wanna follow up on, how do you not have a cell phone? How do you do 2FA without a cell phone? I don't, people didn't have cell phones 50 years ago. Well, two-factor authentication, with everything that makes you, it texts you a code. And then you have to, you know. I have a dongle. I have a little dongle that I plug into my computer.
And when I have to 2FA, I authenticate, okay? But look, I'll tell you, I had a student who I told to monitor all the radio communications on a cell phone for a month; he threw it away afterwards. Because it's a privacy violation. The machine is a thing that's sitting there telling everything about you to people you don't want to know about you. Like where you are, what you're doing, who you're with. Do you want that information being transmitted all the time to people who basically sell that information to each other? Okay. So I just don't have one. I'm on the call with it. Yeah, I think you suggested a really great solution with doing all your 2FA on a dongle. Cause that's the problem that I know people are trying to get around. And, you know, if people worry about emergencies, we didn't have these problems 30, 40 years ago or so. We had the old-fashioned, well, landline telephones work very well. Wonderful. Well, we have a hand raised from Mike Nardell here. Hello, and thank you very much, Professor Sussman. This might be a pretty obvious question or maybe a comment I'm raising, but it seems like one of the aspects of what you're proposing with the layered approach is that it would subtly maybe push programmers towards really refining that base layer so that it really captures the essence of the meaning, and that subsequent layers could actually build upon that in meaningful ways. So that maybe, you know, again, sometimes you write a function and it's really beautiful and captures really the meaning of what you want. And then sometimes it's just gunk. And it seems to me one of the interesting aspects of this is it really leads people like me, perhaps, more towards that beautiful expression of the idea and away from the gunk. I don't know if that's an overly simplistic interpretation. That's certainly what I intend. What I intend is to encourage you to make your code so you can read it next year.
The most terrible thing about a piece of code is you elaborate it to put in all the special things you wanna make a copy about it, like all the things that are enforcing the types and making sure that all the details are exactly right. And then you can't read it because it's full of detail. You want those details to be pushed to places where that detail is separable. So I agree with you completely. That's my goal. Okay, we are gonna take a question from Discord now. Maybe a couple and then you probably need to leave. I have no idea what Discord is. Oh, sorry, from the chat. Got it. We just want to make sure we can take another picture with you because a few people were not able to turn on the camera, so we can have like a larger grid. And we want to try that before you go. But before, let's grab a question from Discord. So did you know about the Kernel programming language from John Shutt? It is a Scheme-like Lisp with orthogonal first-class continuations and first-class environments, allowing no mess of expression by using the very elegant vau calculus. Actually, I don't know about it. Please send me email and I'll read about it. Is that new or old? I'm not sure. I'm not sure. There's an elaboration of that, but I'll have to go back and read. Well, the answer is I don't know. Okay, we'll get that question to you via Sam. So if I squint hard, I can see spec being a layer. And I wonder if spec could be used in a way that interacts with object metadata. This would be Clojure spec. So I'm not sure we can ask you that because I'm not sure you know Clojure spec. I don't know Clojure spec, but if it's a method for, for example, writing contracts about code, is that what it is? It's similar to declarative typing. So we have declarative types. Sure, I would love to have that as a layer. Absolutely. Okay. So another one. Mr. Sussman, can you expand on the details of what makes tail recursion so challenging with the layered approach? Oh, I can say it again.
The problem is that when I have a layered procedure, it has many components. Each component is computing some result. Those components have to be combined after they all complete, after they get their answers. Okay. That means that there's something that's waiting for those answers. If that happens in the usual ordinary way, then you end up with a stack frame that waits for that answer. Okay. That would hurt tail recursion. What I have to do is figure out some way to get rid of that extra stack frame. And the way I do it is by expanding the stack frame, or I'll call it the continuation, that is the right word, the continuation that's waiting for the answer anyway. Okay. It doesn't have to expand linearly in the number of guys contributing. It just has to expand by a constant amount, which is the amount needed to capture all that stuff. Okay. Is that helpful? I think so. I should show you the details. Unfortunately, there are many pages of details. Yeah. They are telling me, yes, it is helpful. Thank you. Okay. Okay. We have one last question from Discord and then we'll get together for our pictures. So we'll do a redo. Everyone get your books again. From E.B., we have the question, did you imagine or give any thought about how a security capability layer could work? No. And the reason why is security is not easy. Security is definitely, like floating point numbers, one of the hardest things in all of computing. I would say, security has to be thought of from the beginning. That's one thing, but the other terrible thing is that security is mostly not about your computer programs. Security is about a system as a whole. And the difficulty is that there are components in the system called people who can easily be caught by things like spear phishing. Okay. So it's not at all clear what you do. Tightening up, basically tightening up one place like the software being exactly right.
Okay. Doesn't necessarily help solve the real problem. I think the most important thing, the way I say it, again, from the locksmith point of view, is the lock on your door does not protect you from the burglar. The lock on the door is a signal to say there is a taboo, a social taboo against breaking this door down. And we're going to enforce this taboo by legal means. We haven't figured out how to do that in the computer industry, especially in the network. And I think we have to figure out how to make taboos that are enforceable if we really want real security. Is that helpful? We think so. Okay. Okay. Everybody gather the book again. Yeah, everybody gather the book again. Okay, I'll go to the book. In the meanwhile, we are going to thank one more time Dr. Sussman for everything he's done for us. Okay, yeah, we have a good 16 read. Can we make it bigger? Let's see, can we make it bigger? We're going for 20. Five, four, three, two, and one. Good, I also took one. Okay. All right, thank you very much everyone. It was nice, nice cover. We're going to post it straight away. Oh, very, okay. So thank you, big huge thanks to Jerry for staying with us, inspiring us to learn better ways to program, think about problems and design better applications. I think that was like the general message. And yeah, thank you very much Jerry. Thank you. Have a nice day. And I'm going to go away and get into my other meeting, about grading this class that we just finished this term. So it's, you know, teachers, that's a good day. Oh, good day, bye. Bye bye, thank you very much. Bye, thank you. Bye bye. Thank you. Thanks.