Okay, welcome to the second lecture. A few announcements to begin. In order to submit the homework on Sunday, you will need a class account. I'm sure you are familiar with these. There are a few left; please pick them up at the end of the lecture. Number two, maybe not all of you are on Piazza yet. If you are not signed up for Piazza, please do sign up, because that is where you will find announcements, such as the one that the starter code for the homework was released. Next thing I need to announce, now I need to remember. Okay, that's right. If you want to work with a partner for projects, and in fact you will have to, because the projects are hard enough that you will want to have a partner and we will insist that you have one. So if you have a partner already, or you know who that would be, send us email by end of Sunday. And for people who do not have a partner, or will not have a partner by Sunday, we will try to use the homework and a little questionnaire to match you up according to your skills and schedules. That's probably all there is to it. So the homework, you need to know about that. If you are on Piazza, you do know. Today is Thursday, so no laptops. We'll talk instead to each other and to me, hopefully. And that's probably all about administrivia, unless you guys have some questions. The lectures are screencast. The one on Monday was recorded, probably missing audio for the first six minutes or so. When they will post it, I don't know. You guys know how long it takes to get this posted. This is the first time I'm trying it. I'm also recording it myself, but this mic doesn't seem to work really well. So we probably have one screencast with good audio and one with good video, and then you can maybe put it together with the slides. Yes? It gets recorded. What I write on the screen gets recorded. Things that I only scribble to underline or use as a pointer I erase, but things that I mean to keep stay part of the published PDF. Okay, so let's start.
So two more things. You probably noticed on the calendar that there are two midterms, one of them in the middle of the semester, the other one at the end, where the last lecture would be, and then the final exam is actually show and tell with posters and demos. The grading is roughly like this: of course, quite a bit goes into the projects. The final project is important too, but you only have about three weeks for it. So only 15% for that, considering it's about three weeks. Class participation is important because it's an upper-level class. In fact, it's sort of a design class where you should build stuff. We want to think of it, to the extent possible, as a studio: you build stuff, you show it to people, they come, they give you ideas, they critique it. So the talking is important, and I know not everybody feels comfortable raising their hand in the lecture room. I never did as a student, but there are other ways. You can talk in the smaller venue of the recitation section, you can talk on Piazza, you can post your comments to the lecture notes. There are many ways to express yourself, ask questions, and participate. So what did we do last time? Last time we looked at about ten examples, five of them in more detail, of small languages that programmers are actually likely to write in their practice, and you would be among them perhaps. We talked about what a programming abstraction is. A programming abstraction is essentially, you could say, a bunch of data types with operations on them: objects with method calls. When you have streams, you would have filters and the data flowing between them, so the streams. And usually, and that's important, a way of putting abstractions together, composing bigger abstractions out of smaller abstractions. So building bigger procedures out of smaller procedures would be an example of using abstractions to get bigger and bigger and bigger software.
And this is how software is built, because you cannot write one million lines of code completely flat, of course. So what is a language and what is a small language? We also covered that to some extent, but let's actually practice it with something that we did not have a chance to cover. So the MapReduce story. So who knows MapReduce? You probably used it, all of you. So what would you call it? Is it a library, framework, tool, compiler? Actually, it doesn't matter what it is. I'm curious what you perceive it as. Higher-order functions. Okay, so it is some sort of a library with higher-order functions, right? You pass it the operation that happens in the map phase and the operation that will take place in the reduce phase, and then this big runtime will get the computation crunching on thousands of machines, if necessary. So to compare MapReduce to the state of the art before, how do you think people implemented these large-scale data processing computations on thousands of nodes in a cluster? You can pose the question differently. You can ask, if MapReduce is so useful, it's presumably because it abstracted a lot of plumbing, nasty, dirty detail away from the state of the art before. So even if you don't know what it was, you can probably venture to guess what tools people used to write such applications before. Exactly. So they used a library like MPI, which is a library for sending messages from one process to another, receiving them, unpacking them, and probably a library for starting processes on all these nodes. And when nodes crashed, it was in the hands of the programmer to recover the nodes, recover the data, and do all of this. So you could say that, oops, it all started with a library like MPI for message passing, and this is nothing but a bunch of procedures that you could call. And then came MapReduce, which encapsulated more, abstracted more of the things, and I would call it a framework.
Now, there are a million definitions of frameworks on the web, but I have something clean for you. So a library is sort of a simple flat function that you call. It returns and it gives you some value. Another example of an abstraction packaged as a library would be a socket. It is a beautiful abstraction, right? It's a complete illusion, in fact, because if you look at the hardware, at the network, there is no such thing as a socket. You send a message across the socket, and everybody listening on the Ethernet can listen to the message, but you have the illusion that the message goes from the socket on one machine to the socket on the other machine. It's a fantastic abstraction because it presents something that doesn't really exist in reality, and therefore it hides a lot of plumbing. So a socket would be such a library. You just call it and it returns back the data. A framework would be a library that is higher-order in the sense that you give it some functions. Right? So a framework would be a library that you could say has holes in it. Right? And these holes are plugged by the user. So in these holes you would put functions F and G, and these are the functions that would be invoked in the map phase and the reduce phase. It's higher-order. So you could say that the framework is a parameterizable library. Of course, it comes with abstractions. Libraries come with them as well. Sockets are abstractions. But it allows you to hide more and parameterize it, and therefore you can write more stuff, because you can sort of plug in the functions that the user wants where they belong. So now if you think about the limitations of MapReduce, what do you think would come as the next step to make the programming of such clusters easier? Well, it could be sort of an optimizing compiler where you write a C program or a Scheme program and it compiles it. This is in general quite hard to do well. So it could work in some cases, but what would be a sober solution that is guaranteed to work?
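To make the "library with holes" picture concrete, here is a minimal single-machine sketch in Python. The names `run_map_reduce`, `map_fn`, and `reduce_fn` are made up for this illustration. This is not the API of any real MapReduce system, and it leaves out everything that makes the real thing valuable (distribution across machines, recovery from crashes). It only shows the shape: the framework owns the control flow, and the user plugs two functions into it.

```python
from collections import defaultdict

def run_map_reduce(records, map_fn, reduce_fn):
    """Toy framework: map_fn and reduce_fn are the two 'holes' the user fills."""
    groups = defaultdict(list)
    # Map phase: each record yields (key, value) pairs, grouped by key.
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce phase: collapse the values collected for each key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# User-supplied functions that plug the holes (a word count).
def count_words_map(line):
    return [(word, 1) for word in line.split()]

def count_words_reduce(word, counts):
    return sum(counts)

word_counts = run_map_reduce(["a b a", "b c"], count_words_map, count_words_reduce)
print(word_counts)
```

A plain library call, by contrast, would take only data and hand back a value; here the caller hands over behavior as well.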
A small delta that improves on the state of the art of MapReduce would be what? Something that does not actually require the compiler to do these heroic acts of distributing the program automatically. So you could lift it up a little bit indeed, by sort of making it easier for the user to write the normal program and then tag things: okay, so this needs to go to map, this needs to go to reduce. That's true, but from what I know from MapReduce programmers, this sort of lifting it up a little bit to shorten the number of lines of MapReduce code is not really the burning problem. The burning problem is that MapReduce does what? It gives you one pipeline, right? You have one map, one reduce. Real programs require putting maybe three or four such pipelines together in various ways. And if you want to do it with MapReduce, you are back to the original MPI problem of taking care of the buffers and runtime management, and again all the low-level details of buffer management are exposed to you. So MapReduce has an abstraction, but it's really not a true language, because these abstractions cannot be arbitrarily composed. You cannot take two MapReduce pipelines and stick them together into a bigger MapReduce, or whatever other composition. And you cannot keep doing this. So that was the big limitation. So this is where languages like FlumeJava come in, not that well known, but existing, and in fact this one is developed by Google and used apparently by hundreds of programmers at Google. They give you the power to lift it to a higher level and compose these pipelines again into a bigger one, and a bigger one. So you could now have libraries of MapReduce pipelines. You stick them into bigger ones, you stick them into libraries, somebody else can use them.
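What "composable" buys you can be sketched in a few lines of Python. Everything here is hypothetical: the `Pipeline` class and the helpers `map_stage`, `filter_stage`, and `reduce_stage` are invented for illustration and are not FlumeJava's actual interface. The one property the sketch is meant to show is that gluing two pipelines together yields another pipeline, so the result can be stored in a library and composed again.

```python
class Pipeline:
    """A sequence of stages; each stage maps a list of items to a new list."""

    def __init__(self, stages):
        self.stages = list(stages)

    def then(self, other):
        # Composition returns a Pipeline again, so it can be repeated forever.
        return Pipeline(self.stages + other.stages)

    def run(self, data):
        for stage in self.stages:
            data = stage(data)
        return data

def map_stage(fn):
    return Pipeline([lambda data: [fn(x) for x in data]])

def filter_stage(pred):
    return Pipeline([lambda data: [x for x in data if pred(x)]])

def reduce_stage(fn, initial):
    def stage(data):
        acc = initial
        for x in data:
            acc = fn(acc, x)
        return [acc]  # keep the result a list so more stages can follow
    return Pipeline([stage])

# Two small pipelines, built separately...
square_evens = filter_stage(lambda x: x % 2 == 0).then(map_stage(lambda x: x * x))
total = reduce_stage(lambda a, b: a + b, 0)
# ...and glued into a bigger one that is itself reusable.
sum_of_even_squares = square_evens.then(total)
result = sum_of_even_squares.run([1, 2, 3, 4])
print(result)
```

Because a real system would see the whole chain of stages before running it, it would also have a chance to optimize across stage boundaries; this toy version simply runs them in order.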
So FlumeJava is called Java because technically it's a library. Somebody implemented it in Java, it looks like a library, you can download it, but it comes with abstractions like these pipelines that are stuck together, and they are optimized by a compiler. The compiler has a reasonably easy job in this case because it's not working with C code or Scheme code, but with dedicated abstractions of pipelines, filters, joins and such. And so optimization is easy because the library programs use just those. So you could say FlumeJava is composable abstractions. Okay, so you take one MapReduce, you take another. Okay, so this is map, reduce, map, reduce, and you can stick them together into something that looks like another pipeline that can be plugged in somewhere else. So this FlumeJava is what we call here in this class a small language. It's written for only a small, narrow class of programs, but in a company like Google it takes care of many different problems and makes programmers more productive. And FlumeJava is not the only one; there is a bunch of things that lift MapReduce to a higher level, like Sawzall, defined soon after MapReduce. So we talked about all of these: what are composable abstractions and what are not. Sockets are not, because you cannot take a socket and put two of them together to build a bigger socket. Regular expressions are composable, because this here is a regex, this here is a regex, and so this whole regular expression here is itself a regex, and you can build them up like this forever. In fact this one here is itself a regex built out of this and the star, and this one here is in fact built as a concatenation of these three strings, each of which is a regex. We'll talk more about regexes later. Okay, so what do we do today?
So today we look at sort of the very basics: what are programs, what are values, what are types, how do you evaluate a program, how do you represent programs, how do you represent values, how do you represent types. But we'll do it all in a small fun example that will actually present design challenges and make you think about all these issues. And most importantly we'll get to the question of how you can actually make your language extensible. Right? So often these small languages are written in terms of big languages. FlumeJava is written sort of as an extension of Java. Java is not quite well suited for writing these small languages, for embedding them into Java, but you sort of can get along with generics. Scheme is better for that purpose. So we want to learn today how to build a small language such that somebody else who uses the language can grow it, can extend it, without going into the interpreter or compiler and doing hacking. And what we are going to do is, remember the Google calculator, right? That's a language after all. It's a true language because you can take an expression and another expression and combine them into arbitrarily big expressions, so the programs can be arbitrarily large. It doesn't have procedural abstraction, so in a sense it cannot hide things, but I'd say it's a language. Now, we'll do better than this. We'll show you how to implement that, but we want to see what other features we could add to this calculator language so that it is even more useful than the Google calculator. Okay? So this is what the calculator does, to refresh your memory. Imagine you want to find out how fast those ferry boats go on the San Francisco Bay. You go to their website and figure out it's 34 knots.
I have no clue what that means, but this calculator will translate it. And here is the same expression that we have seen, and this is really, of course, a volume, which is combined with an energy and another volume and a power, and the result happens to be a time expressed in days. So what constructs does the language already have? Can we get a quick list of what's available in this calculator language? So we would say there are numbers. Of what type do you think they are, the numbers? Ints, floats, both? Floats. Okay. There might be ints. We would need to poke a little bit to see, but I don't know, it's not so important. We'll need to decide for our language what we do. So what do you think these units are? Are they themselves numbers, these units? Is there something you can liken these units to? Yeah, that's one legitimate view, that they are a type, sort of a type. Not like a string type or a float type. I think it's fair to say that these are types. Definitely, that's how I plan to talk about them. What else is there? So we have numbers and units. These are the leaves of the expressions. Do you see? So then there are operators. Some operators are obvious. Some other operators are not so obvious and are specific to this example. There would be this converter operator, this "in", which you could call a conversion operator. That's probably it. Is there more? "Per", I see. So I would say, well, okay, good. So what is "per"? Is it a conversion, or can we somehow clump it with something else? What's that? Division. So I think that "per" is actually an alias for the division operator. It's of course cool to make it so easy to add new operators by just saying this is the same as that. It can work on types: just like division, you can put it between meter and seconds. But if you think of numbers and units as sort of the leaf things on which you operate, they are either values or types. Then the operators, okay.
The operators are overloaded to work on one times two, but they can also work on meter times second. So in a sense, the operators work for both. We'll get to it during this lecture. So is there more in this language than that? Okay, excellent. So parentheses. Okay. And parentheses are interesting because they are in the language kind of, and kind of not. Does the interpreter, you think, really know about parentheses? At some stage of the evaluation of the program, these parentheses evaporate. Why is it that we don't need them at some point? Okay, I'm looking. Right, more people. Why don't we need those parentheses? How do you think those programs or expressions are represented internally? Okay, that they are structured and pre-parsed, and they are structured as what? As what data structure? A tree would be the natural choice, just like the Scheme expressions, right? Scheme doesn't need much of a parser because the program is practically already parsed for you into these expressions, okay? Exactly. So these parentheses are there, but they only exist on the surface level, where you type the program. After that, they are gone. And when they print you the result, they put them back in so that you can understand the precedence of these operators. All right, I think that's it, unless somebody can tell me that we are missing something. All right, okay. So, like, having a synonym for some constant. Good point, right? But even with these, these are fun to add in and you'll add them when you build your parser, but this is not enough to make it challenging for you, as you will see. What I want to do now is ask how we extend it. How do we turn the Google calculator into a Berkeley calculator to make it actually more useful to people? So can you now... Actually, this is right. This is where we want to be. So can you talk to your neighbor for a few minutes and try to come up with scenarios of how we could extend this calculator to make it more useful?
I have a few such scenarios in the lecture and they will drive the development of the language, but I'm curious what you'll come up with. I want some extra scenarios because I would like to show you that maybe we can support fun, useful features by doing very little in the language. Okay, so I think you probably have something, so let's hear from a few people. In a future lecture I may consider having a random generator to generate an x-coordinate and a y-coordinate and pick somebody, but for now it looks like we could hear from volunteers. Okay, we could have variables, and have essentially a constraint solver, an equation solver. That would probably complicate things quite a bit. Not that this is a bad idea, it would be great, because now all of a sudden it's not just a matter of evaluating some program; you need to essentially solve equations. These things are not too hard, but all of a sudden the inside of the interpreter would change quite a bit. So I love the idea. It probably would not be a tiny extension that I can do in three lines of interpreter code. The idea is good. Okay, so if you didn't hear: more complex numbers, base 2, base 16, okay, complex. But let me write these. So essentially equation solving, which involves variables of course, whose values we solve for. Now we say complex, base 2, okay. What was that? Matrices, okay, I'll add it here since these are essentially richer types, right? Just like a complex number is two numbers, two floats, a matrix would be a matrix of numbers. All right, okay. So that may also require quite a rich library of algebraic rules that we can use to perhaps simplify, to do integrals, since there isn't a sort of deterministic algorithm. But it would be useful, okay, please. Okay, so that should be easy to add, right? Because this is just one expression. So let's call it a power function. So I'm looking for something that is easy to understand and useful. So a good ratio in that way. Okay, so new unit types, right?
And conversion naturally comes with it, please. Okay, okay. New operators. So this is power, and arbitrary operators. So presumably you would need to give them some general language, right? Well, sometimes you may create operators out of existing ones, and probably you would need essentially the ability to define a procedure and use it as an operator. Okay, three more. One. Yes. Okay, so you could say infinite arithmetic, right? Infinite precision, including for rationals. Okay, okay, okay. So time, dealing with time, including the current time, right? Oh, okay. Okay. Reverse Polish notation. So essentially you are saying new syntax, right? Since that's really all that it changes. Sure, why not? Right? If you grew up with the HP calculator. Okay, one more. Integer division. So we do have integer division. I think it just does the usual, that it takes the floor of the result. Actually, no, it doesn't. You're right. It will convert it into a float. So I would put it here. You're right. Int division. So this is great. I put new unit types and time up on the left because this is what I happen to have prepared in the slides. Okay? And the interesting thing is that I view this as letting the user, not the programmer, the user, I mean your grandma, extend this language with new units. So you want to do it in such a way that they don't need to learn Python and change the interpreter. They can just define new units. And there is something cool about the time as well. So let's go there. Okay? So we could, yes, add variables, functions. These are the operators. But let's see what we'll do. Okay. So how do we grow the language? So now I don't mean growing by the end user. I mean by us. We'll build the language sort of piece by piece. So we'll start with simple arithmetic on numbers. And the language is defined by syntax and by semantics. Syntax will tell you how the program looks on the surface, right? What operations you have, how you can compose them together.
And semantics will tell you what they mean. What does it mean to do multiplication of integers? If I take five and two and multiply them, what is the result? That's the semantics of the language. Okay? So in this language, what do we have in this sort of subset for arithmetic expressions? We say the program needs to be an expression, which could be an int, so a number. Let's just have ints. It could be an expression in parentheses. Or it is two expressions connected with some operator. And the operator can be, well, plus, minus, times, division and power. Right? So this is nothing but a grammar that defines the language. We need to use a grammar because these expressions could be arbitrarily large. So we need a grammar to define an infinitely large language, right? Because the grammar refers to itself, right? This E recursively refers to that E, right? So one example from this language would be the program 1, or the program 1 minus 2. It could also be 2 minus 3 divided by 4. These are all examples from this language. Now, the meaning. So the syntax defines that we have these programs. In here we're assuming we have some floating point numbers as well, but that doesn't matter. Now how do we define what e1 plus e2 means? So given the values of e1 and e2, e1 plus e2 is the sum of these numbers. So we're saying whenever we see a plus operator we need to take these numbers and add them. Seems trivial, but we have some subtle issues there, right? So when we tell somebody, go and implement this language for us, he will come back to us and ask, well, what should plus do? What little details do we have to spell out for the implementer? Okay, so we need to parse the numbers. So there are these little things like how is the number represented as text? Is it binary? Is it decimal? Can it have a decimal point? Okay. Once we turn them into some binary representation, then how do we specify what plus means? What other issues are there? Exactly.
Are these 32-bit numbers, and past that do they throw an exception, or overflow back to zero or back to negative 2 to the 31 or something? Okay, so all this is important. So these are roughly the questions. Now what we are going to do is say we are not going to worry about this. We are going to implement this language in Python and we will use Python arithmetic to implement our plus. So whenever we see our plus we will just call Python plus. Whenever we see our times we call Python times. Okay, let's see whether this works for us. Okay, and the same, right, for float. So there is nothing much to say here. We just made the decision to delegate arithmetic to the Python interpreter. So now how do we represent... how little it takes to scare two adults. So how do we represent the program? So we talked about the fact that these programs will be trees, and so there is nothing much more we need to say, except I'd like to make the distinction between the concrete syntax, which is what you type in, and the abstract syntax, which is how the program is represented internally for the interpreter, right? And so the concrete syntax is 1 + 2. The abstract syntax would be ('+', 1, 2). This is the Python notation for what kind of data structure? It's a tuple. In this case it's a triple which has a string for plus. We represent operators as strings. It's not beautiful but doesn't really matter in this case. And then the two numbers are the leaves, right? And then this is a somewhat bigger tree, and so on. So the concrete turns into abstract, and it's abstract because useless stuff that is needed in the text, like parentheses, is abstracted away. So this is the so-called abstract syntax tree, or AST. This is essentially what we'll be working with for the first few weeks. This should be all well known from 61A. Okay, so now we are going to build an interpreter. So here is one AST. We built it by hand. We also have a parser that you can download.
It will parse it for you from the concrete syntax to this AST. And now we want to call an eval procedure that walks over this tree and prints the result back. So how do we write the eval? What sort of program structure will you use? We have all written such evaluators in 61A, I bet. Unless that was the week when the weather was really nice and you went to the beach instead. So what is the structure of the interpreter that we'll use? Okay, we need more people for this. Hopefully it's obvious. Well, let's try over there. Yes, please. So you do recursive evaluation. So you presumably walk the tree. Are you propagating the values in-order, pre-order, post-order, bottom up, top down? How will that work? Which is bottom up, right? So we have a star with a plus, and here we have three, four, which is this tree here, and then five. We'll push the values bottom up, right? So here we'll push value five, here we'll push values four and three. What value will flow here? Seven, and then, oops, the result would be 35. That's it. Not all evaluators work like that, but this is what this particular language needs. So what do we do here? You get the tree, and so this e, okay, what do we do? What do you think we are checking for in here? Let me show you a few more cases. This is Python, which you may not have seen, but it should be self-readable, self-documented, self-evident, everything. This is what's good and bad about Python. So what does this do in the recursive scheme? Please. So this is the leaf of the recursion, or you could say the base case, right? And so this line would be invoked for three and four and five. So type(1) evaluates to int, and if the type of e, so this is the e, is int, then we just return the number. If the number is a float, we don't have one in this program, then it is also returned. Okay? So what do you think this case means here? I see you, but I want others to participate as well.
So what would be the type with this funny parenthesis inside? It's a tuple. So this essentially says, if this is a tuple, this means we are at some internal node of the tree. Okay? And now you are breaking up the tuple, you are looking at the first element, the operator, which is here, and see, oh, it's a plus. You recursively evaluate the left child and the right child, then you take the results, do a plus, and you return it. Same for all the other operators, including power. See, so we are turning our power operator here into the Python operator double star, because that just happens to be the syntax for power. Okay? Is there anything funny here, do you see any dangers? Okay, so if the input is an empty tuple, so essentially an empty tree, then yes, this will crash with a big fire, right? So there is no error handling here. We are assuming that these trees are nicely well formed, so an excellent point. There is absolutely no error handling, and no interpreter should look like this, but I want to have it nice and small on one slide. Okay? It doesn't differentiate between unit and integer, because so far we are only assuming integers, so there are no units here so far. Okay? Please. And division by zero, okay, excellent. So this is another piece of error handling: what will happen if I divide by zero? Is the program going to crash silently, or is there going to be an error message? Who generates it? Yeah, the Python interpreter will be called here, right? It will perform this Python division with this value and the zero. This operand will evaluate to zero, right? And an error will be thrown by the underlying Python interpreter, which may not be the right thing. We may want to check for zeros ourselves and throw our own message. If you are writing an interpreter for users of some other language, I don't know, Mesopotamian, then you may want to at least translate the Python English message into that language. So doing your own error checking may be useful.
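The slide itself is not part of this transcript, so the following is a reconstruction of that kind of evaluator, not the original code. It assumes the tuple-based tree described above, with operators stored as strings in the first slot. The `'^'` spelling for power and the name `eval_expr` are assumptions. Unlike the one-slide version, it adds the two checks raised in the discussion: malformed trees and division by zero.

```python
def eval_expr(e):
    """Evaluate a tuple-based expression tree bottom up."""
    # Base case: a leaf is a plain number.
    if isinstance(e, (int, float)):
        return e
    # Internal node: (operator, left subtree, right subtree).
    if isinstance(e, tuple) and len(e) == 3:
        op, left, right = e
        x = eval_expr(left)
        y = eval_expr(right)
        if op == '+':
            return x + y
        if op == '-':
            return x - y
        if op == '*':
            return x * y
        if op == '/':
            if y == 0:
                # Our own message instead of Python's ZeroDivisionError.
                raise ValueError("calculator error: division by zero")
            return x / y  # Python 3 true division: the result is a float
        if op == '^':
            return x ** y  # our power operator maps to Python's **
        raise ValueError(f"unknown operator: {op!r}")
    raise ValueError(f"malformed expression: {e!r}")

# (3 + 4) * 5: the values 3 and 4 flow up into the plus, then 7 and 5 into the star.
print(eval_expr(('*', ('+', 3, 4), 5)))
```

Arithmetic is still delegated to Python here, so questions such as overflow behavior are answered by whatever Python does.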
But otherwise, this should be really straightforward, unless you have some questions. Let's leave your comments for the more fancy stuff. Okay, so we are done with arithmetic expressions. We are going to do physical units. So far only the standard units like meters and kilograms, not feet and so on, all right? So we want to do now things like this: two meters, squared, which evaluates to four meters squared, okay? And what we did so far is absolutely trivial. We just said we now define units, okay? And we add U among the expressions. So now we can have things like 1 times m, okay? So did you notice a new, funny operator in our grammar? So what is, okay, what is the epsilon doing there? So the epsilon, you may not know, means an empty string. Do you know what construct we are enabling with this? Okay, things like, thanks to this, writing 1 m is now legal in this language, okay? Yes. Or you're saying that what we want is a matching number of left and right parentheses. Oh, I see. You're asking what it means when I write, say, 2 space 3, right? Is this 2 times 3, or is it 23, or is it 2 3, right? Okay, so that's a good question. I was hoping to avoid it for now since we'll talk about these issues later. But usually the programs are processed in such a way that first you have a lexer, okay, which is our lexical analyzer, and what it will do is first break the input down into sort of chunks of text. So the chunks of text would be 42, space, and 35. And the space would be omitted, actually. So 42, 35 would actually be the sequence of lexemes, or tokens, that is passed down to the parser. So this lexical analyzer would be responsible for throwing out spaces, merging things when spaces are irrelevant, and dividing things where they should be divided. So here the space is important because it breaks 42 and 35 into two things.
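The lexer's job described here can be sketched with Python's standard `re` module. The token classes below (numbers, alphabetic unit names, single-character operators and parentheses) and the function name `tokenize` are assumptions made for this sketch; the lexer used in the course may differ.

```python
import re

# One token: a number, a run of letters (e.g. a unit name), or an operator/parenthesis.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+|[-+*/^()]")

def tokenize(text):
    """Split input text into tokens, dropping whitespace along the way."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # spaces separate tokens but are not tokens themselves
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens

print(tokenize("42 35"))   # the space splits the digits into two tokens
print(tokenize("4235"))    # without it there is a single token
print(tokenize("2 m^2"))
```

Note that the lexer only decides where tokens begin and end. Whether two adjacent tokens such as `42` and `35` mean multiplication is left to the parser.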
Now whether the parser, which comes later and processes these tokens, understands this to be a multiplication is a different story, but in our language it should, right? Because if we have 42, 35, they can only be next to each other with this epsilon operator between them. And we presumably are defining this to be multiplication. So the meaning of 1 m is 1 times m. So remember, this here is concrete syntax, which the parser turns into the simpler abstract syntax. So notice what happened here. Do we have a multiplication here? We don't, right? Yet the parser says, oh, I see, what you really meant is the epsilon operator, and the parser decided to put the star at the top of the tree because it is really 3 times m squared. So notice this. There is really an epsilon here, okay? An empty string. Please. Oh excellent, I like it. So 1 plus m, all right. Don't tell them I paid you $20 to ask the question. Okay, so this is a fantastic opportunity to ask the question. Clearly, according to this grammar, 1 plus m is a legal program and so is 2 feet minus 3 kilograms. So the question is, presumably we want to catch these errors and give the programmer an error message or a warning. Where will that error checking happen? A, in the parser, as we go from the text into the tree; B, in the evaluation, as we are walking the tree; choice C, these two will have to collaborate together to make the error checking possible; and D is, well, we can catch these errors in some cases but not in general. So who thinks it's A? Don't be shy, you'll have to raise your hand once during these four options. Okay, so A. Okay, and who thinks B? Okay, how about C? Okay, and D? Okay, so there are quite a few people who thought either A or C or D. Well, even B. Okay, try to now talk to your neighbor and convince him or her that you are right. Without violence, of course. So, okay, so it has quieted down. Some people are not talking, so I think I'll have to write a random number generator to poke at people.
So let's take the vote again. So who thinks it's A? Okay, it looks like opinions have not changed that much. B? Okay, there are more B's now. Alright, how about C? A few C's. And D? Alright, so you have an example that cannot be caught. This is good. I'd like to hear. B and D, okay. But there are some cases that you'll miss. Oh, I see, I see, I see. So you are really thinking about whether we can verify, prior to running the program, that there is no error. Okay. So that might be difficult depending on the language. In our language we could do it, unless we don't know what value the programmer will input. So I see what you are saying. But I think we are working in the domain where we would be happy to just catch those errors at evaluation time rather than prior to seeing the data. So not compile time, but we are happy to catch them all at evaluation time. And these errors can all be caught at evaluation time easily. Now clearly you know how you would do it during the evaluation. You propagate the types bottom up, right, together with the values. And when there is some mismatch, you are subtracting kilograms from feet, you throw an error. That's exactly how it works. Now, why would it be difficult for the parser to do such a thing? The parser can clearly catch an error here because it will see, oh, this is a number, this is a unit. Clearly these two should not be added together. It's obvious, right? Right, so if we assume that the parser only has this local knowledge, it can see 1 plus m, then yes, this error can be caught. But as soon as you have something like, well, 1 plus something in parentheses, all of a sudden the type is not local. It's hidden under the parentheses somewhere deeper. And now the parser would need to propagate the type. It can. But now it's all of a sudden doing the job of the evaluator, right? So it's not that the parser cannot do it, but it can do it only by playing the role of the evaluator and pushing the type bottom up. So I'd say that this is the right answer, okay?
So the units are essentially types. We will now be propagating these types during the evaluation because we need to know at each operation what the type of a particular value is. So how do you think we will represent these types? What would you suggest? So how would we represent this AST, right? This AST, when evaluated, is 1 meter squared, right? What data structure would you use to propagate the value? Okay, somebody else now. There are many choices, none of them the single right one. There may be some less optimal than others, but more or less anything will do. Maybe a tuple. So what would be stored in this tuple, okay? So for this case, what would be in the tuple? Okay, so that definitely will be a tuple, right? We'll have the value, the numeric value, and the unit, the type. Okay, let's call it a unit from now on. How do we represent the unit? Now here is where we have more choices. Okay, somebody who didn't speak can make a guess or an informed suggestion. Now you need to start predicting what operations we'll have to do on this unit. And now some representations will become better than others. Uh-huh. Two m's meaning m squared, okay? Uh-huh, all right. So what would work? Try to think abstractly. Are we mapping something to something, or is it a set of somethings, a set of pairs? Okay, so what about things like meters per second, right? We could have values of that type. The result of some computation could be five meters per second. So we need to be able to represent this meters per second thing. So it would be a list of what? We would have a list for the numerator and a list for the denominator. That could work. So I think that's right, because you cannot have a plus, so you are saying it doesn't make sense to have something that represents meters plus kilograms, right? Okay, so that's why just numerator and denominator. Okay, I think that's very good. That could work. A simplification of this, an exponent, exactly. So now we are getting closer.
It would be a set of pairs. You need an exponent, all right? Uh-huh. And so, but now we need to understand the semantics a little bit better. If you're saying, well, if it's a set, then I can represent something like the pair m, let's put it as a string, and 2, together with the pair m and 1, right? That's a set of pairs. But you would like to say that this unit is illegal. You would like to represent this differently, right? You would like to represent it as m to the third. You know what I'm saying? What you are saying is correct, but if we allow any set of such pairs, then I could create such a set. And it's probably not what you want. Gray shirt. Okay. So it would be a map from, you could say, SI type to exponent, right? So this would now become essentially m mapped to 3. So if I write a program which says m squared times m, it will be normalized into m cubed. It's essentially what you said, but canonicalized to a sort of normal form, all right? So this is indeed a map. How do you represent a map in something like Python? It could be an array, or a hash table, exactly, right? This is what we'll do. So my pen misbehaved just a second. So here this is essentially a map, m mapped to 2. So here is our tuple. This should be an arrow, okay? 3, and m maps to 2. So with that, I think that we actually got it quite nice. And so here is the program. Very little change is needed. Now, this eval still takes an AST, so the type of this is AST. And it will return a pair of, say, a number and this unit. And let's see how we evaluate. When we do subtraction, we get a number and a unit and another number and a unit, and it will check: are the units compatible, are they the same? We cannot subtract meters squared from meters. So here the normalization to the same representation is actually the right thing, okay? Oh, okay. So let's hold that thought for a minute because so far we are only doing SI units, so milliseconds are not allowed, but we'll do it in a second, right?
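A minimal sketch of the evaluator described above, assuming a hypothetical AST encoding (not necessarily the one used in the course interpreter): numbers are Python numbers, unit names are strings, and operations are tuples such as `('-', left, right)`. Each value is a `(number, unit)` pair where the unit is a dict from SI unit name to exponent.

```python
def eval_units(e):
    """Evaluate an AST into a (number, unit) pair; unit maps SI name -> exponent."""
    if isinstance(e, (int, float)):
        return (e, {})                    # a bare number carries no unit
    if isinstance(e, str):
        return (1, {e: 1})                # a unit name such as 'm' means 1 m
    op, left, right = e
    n1, u1 = eval_units(left)
    n2, u2 = eval_units(right)
    if op in ('+', '-'):
        # Units are kept in one normal form, so compatibility is plain equality.
        if u1 != u2:
            raise TypeError(f"incompatible units: {u1} and {u2}")
        return (n1 + n2 if op == '+' else n1 - n2, u1)
    if op == '*':
        merged = dict(u1)
        for name, exp in u2.items():      # add exponents of matching units
            merged[name] = merged.get(name, 0) + exp
        return (n1 * n2, merged)
    raise ValueError(f"unknown operator: {op!r}")

# 3 * m * m evaluates to the pair (3, {'m': 2})
print(eval_units(('*', 3, ('*', 'm', 'm'))))
```

Because `m * m` and `m^2` end up as the same dict, the subtraction check needs no extra table of equivalent spellings.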
Here the fact that they are normalized allows us to do a really simple test of compatibility, and if they are compatible you just keep the same unit and do the normal subtraction. Multiplication does the multiplication of the numbers, and you need to write multiply units, right? What does multiply units do, you think? So this will take, say, m squared times kilogram squared. It will just merge these two hash tables together, right? If it is m squared times m to the third, it will result in m to the fifth, right? This should be relatively straightforward. You can actually click on this and see the code. It takes you to the version of the interpreter which has just exactly this sub-language. Okay, so now we add non-SI units. So things like milliseconds, okay? So now we want to add feet, here, and in fact milliseconds here, okay? Uh-huh, okay. So let's go here. So this would be the right place to... So look at this expression here. This one here is three times meter squared, right? And I'm going to evaluate it into a Python data structure that is a tuple of the number three and this unit here, and the unit is what? It is a hash table. So this is the key. Okay, the key is the string m, and the value is two. So this is, sorry, I should have explained this, the Python syntax for an associative array which maps the key m to two. And of course it could have other exponents. It could have kilograms in it and so on. Good question, please. Okay, so if you, uh, well, so what happens if you do meter divided by meter? The result needs to be one, comma, and the description of a unit that is actually no unit. So how would you represent that? Okay, as an empty map, right. In fact, that is a good question. So m divided by m would evaluate into one and an empty map, which means you essentially have no units associated with it.
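The merging of two unit maps can be sketched as follows. The names `mul_units` and `div_units` are illustrative, and removing entries whose exponent reaches zero is one possible design choice that makes m divided by m come out as the empty map.

```python
def mul_units(u1, u2):
    """Multiply two units by adding the exponents of matching unit names."""
    result = dict(u1)                      # copy so the inputs stay unchanged
    for name, exp in u2.items():
        total = result.get(name, 0) + exp
        if total == 0:
            result.pop(name, None)         # drop m^0 so it never shows up
        else:
            result[name] = total
    return result

def div_units(u1, u2):
    """Divide units: multiply by the divisor with all exponents negated."""
    return mul_units(u1, {name: -exp for name, exp in u2.items()})

print(mul_units({'m': 2}, {'kg': 2}))   # two different units are simply merged
print(mul_units({'m': 2}, {'m': 3}))    # exponents of the same unit add up
print(div_units({'m': 1}, {'m': 1}))    # the unit cancels, leaving an empty map
```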
Which could be the same as m with exponent zero, but at least in my implementation I believe such entries would be flushed from the hash table, just so that it's not printed as m to the zero at the end. Okay, so like joule, for joule, the unit of energy, we'll get there too. Okay? So, non-SI units. We want to do feet and milliseconds, and clearly the foot will evaluate to 0.3-whatever meters, because we want to normalize it to that, and how do we do it? It's very simple. At the leaves of the AST, when we evaluate this feet, we evaluate it into 0.3-whatever of the SI unit. If we cannot do it automatically, we need to code it into the interpreter, and it needs to come from somewhere. So you effectively take these non-SI values and turn them into these SI representations. Well, how does it happen? Well, look at the interpreter. If it is an int, you just return it with no unit. If it's a float, again, no unit. And now it becomes interesting if this E is a unit, any unit for that matter. So we have encountered a string in the AST. We go into this lookup function, and what's in this lookup function? Can you see what we are doing? The lookup function takes this string, the representation of the unit, and uses it as an index into what? Into this hash table. And if it finds feet, it returns this value, which is 0.3 meter to the first. And you can see how you would do a millisecond here, right? You would here have millisecond mapped to one thousandth of a second. So all these units you need to of course insert here yourself, because how else would the system know them? Well, so you are essentially saying, rather than normalizing everything into SI, let's keep it in feet and propagate the feet units up and only normalize later, okay. So that's actually an excellent comment. So what we are doing right at the leaves of the evaluation is turning feet into meters and other such things, and then we do the evaluation in meters, and maybe somebody actually wants to print the result in feet.
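The leaf conversion can be sketched with a lookup table like the one below. The table contents and the names `UNIT_TABLE`, `lookup`, and `eval_leaf` are assumptions for illustration; the foot is entered as 0.3048 m, its exact definition, where the lecture says "0.3 whatever".

```python
# Hypothetical table: unit name -> (value in SI, SI unit as name -> exponent).
UNIT_TABLE = {
    'm':  (1, {'m': 1}),
    's':  (1, {'s': 1}),
    'kg': (1, {'kg': 1}),
    'ft': (0.3048, {'m': 1}),   # one foot is 0.3048 meters
    'ms': (0.001, {'s': 1}),    # one millisecond is a thousandth of a second
}

def lookup(name):
    """Return the SI (value, unit) pair for a unit name."""
    if name not in UNIT_TABLE:
        raise NameError(f"unknown unit: {name}")
    value, unit = UNIT_TABLE[name]
    return (value, dict(unit))   # copy so callers cannot mutate the table

def eval_leaf(e):
    """Evaluate a leaf of the AST: a number or a unit name."""
    if isinstance(e, (int, float)):
        return (e, {})           # plain numbers carry no unit
    if isinstance(e, str):
        return lookup(e)         # units are normalized to SI right here
    raise TypeError(f"not a leaf: {e!r}")

print(eval_leaf('ft'))   # feet become meters at the leaf
print(eval_leaf(3))      # a number has the empty unit
```

Since every leaf is already in SI, the operators above the leaves never see feet or milliseconds.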
So we will do one conversion from feet to meters here, and then on the result we do a conversion from meters back to feet. We could do it the other way. It would mean a more complicated evaluator, because it needs to work with non-SI units, and you could do it. I just wanted to have something simple. If the arithmetic is with infinite precision, then we are clearly not losing anything, okay. So if you can rely on Python using infinite-precision arithmetic, then these two are equivalent, and my solution is probably simpler because inside these operators we are only dealing with SI units. And the conversion to SI at the leaves, so all feet go to meters, is fantastic because I no longer need to check, when I do, say, addition, whether feet is actually a unit of length or not. It would be converted to meters, and because it has the same unit as the other operand on the other side of the plus, I know they are compatible and I can do the plus. Otherwise I would need to have another table somewhere which will tell me feet are actually meters and therefore they go together, okay. So you can see here how we do the evaluation. For example, 3 has no unit, the plus will have this unit, the result has that unit, and after multiplication we have a unit where the exponents 1 plus 1 add up to 2. Okay. Now the more fun stuff, okay. So in the interpreter that we have written so far, if we write 1 meter divided by 1 year, the result will be 0 meters per second. That's clearly not what you want. Anybody can venture a guess why this happened? We would like to have a more accurate answer, right? 0.00 something. So why did it happen? Right, because we put an integer on the input, and the interpreter just delegates it to the integer division in Python, and integer division in Python produces an integer, so it correctly rounded it down to the closest integer. So how would we fix it? Well, we need to have a little bit smarter division, okay.
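One way to write such a smarter division is sketched below; the name `smart_div` is hypothetical. The idea is to keep an int result when the division is exact and fall back to a float otherwise. Note that in Python 3 the `/` operator already returns a float, and `//` is the flooring integer division that produces the 0 described above.

```python
def smart_div(a, b):
    """Divide a by b, keeping an int result only when no precision is lost."""
    if isinstance(a, int) and isinstance(b, int) and a % b == 0:
        return a // b        # exact: stay in integer arithmetic
    return a / b             # inexact or already float: use float division

SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # assumes a 365-day year

print(smart_div(6, 3))                  # exact, so the result stays an int
print(smart_div(1, SECONDS_PER_YEAR))   # a small positive float, not 0
```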
And what we do when we see something like int divided by int is we'd like to keep it as int whenever possible, whenever no loss of accuracy would happen. If producing an int would mean that we are losing some accuracy, we need to turn it into a float. So now we see how you may want to build your own arithmetic operators on top of those provided in the underlying implementation language. So again, we keep it as an int whenever this would mean no loss of precision. Okay, so again the code is there. Now this is becoming more fun because we are going to handle this operator here. So the first question is how do we extend the grammar with this E in C, where C is some unit. Okay, so this is our first attempt. I just made E in C part of the expression. Is that a good idea or a bad idea? Well, C would be whatever can appear, I didn't specify what C is, but let's imagine that C is any unit that can appear after in, so you can say in feet, or in meters divided by seconds. Let's keep that open for now. So without knowing precisely what we allow for C, is this a good idea, what I have done? Try to think where E in C can appear in the resulting program. Please. Exactly, so this is exactly the problem: the way I have written it now, you can write this expression which has in feet inside parentheses. What does it mean? We are doing a conversion which doesn't make any sense. The in presumably only wants to convert how you print it. So the second attempt is right. It will be a top-level operator and it will decide how the value is printed. So now the program is either an expression without in or an expression with in, and then the rest as before. Okay. Does it make sense for C to be this? Yeah, no, somebody doesn't like it. Okay. Please. So it is true that we cannot subtract meters and seconds, but we already made a choice that the parser cannot catch this. So we could in principle allow it here.
The question is whether allowing plus and minus in C gives us anything, any power. Would you ever want to say print something in meters minus feet? Probably doesn't make much sense, right? So the question is, do you want to say two feet in meters minus millimeters? So this is nonsense, right? So this C would of course allow us to do it. So we don't want C to be the same as an expression, an arbitrary combination of units. So what do we want to have in C? Which operators make sense? Right. So in terms of which units we want to allow, it would be the whole dictionary, right, of SI units and non-SI units, so that's good. How about operators? Looks like we don't want minus or plus because they are meaningless when you are defining the format in which things should be printed. How about division and times? Makes sense, right? You want to print something in meters per feet, sorry, meters per hour rather than meters per second. So the right thing seems to be that C would have multiplication, also this little epsilon here, division, and it turns out that we want to have a power as well, right? It took me actually a while to get this right. But this seems to be what we want for C. I think you could write. So this needs to be, okay, sorry. I knew there was a bug somewhere here, okay. I think this needs to be all C, and I think this is okay, and then a U for the base case. Okay, that's right. Seems to be correct, maybe not entirely correct. So meter in parentheses squared, yeah, you could allow it. I think at that point I went to Google calculator to see whether they do it, and I said no, let's keep the exponent simple. So this is sort of expressive enough for you to express everything you want, maybe not with all the parentheses. Question? And non-SI, absolutely. Yeah. And now I don't know what would happen if you print something in meters per feet. It could be that we want to complain. Oh, you would need a different symbol for pound as a force and pound as a mass.
No, I think you would need to have a different sort of text symbol for the two. So before we close, how do we actually evaluate C? Now we have an expression that has E in C, so it's a tree that has in on top. It has an expression here on the left that we want to evaluate. Right, so we have now in, and now we have an expression here and we have a C here. We know how to evaluate E. And ideally at this point I don't want to make any changes to that part of the interpreter, because I debugged it and I don't want to touch it again. I would like to evaluate C somehow and then write the case statement in my Python interpreter for in. So what should C evaluate to? So that's essentially right. It will be a pair of a value and a unit, but both of them are different than before. So syntactically it is again a number and a unit, except this unit here is not converted to SI. It will stay in the original format. So if I say feet times meter times feet, it will stay as feet squared times meter, because that's how I want to print it. So the evaluation is different. There is no normalization; you just collect the units and collect their exponents. The first number, again, is not a value but sort of the conversion factor between the SI value and this funny unit, feet squared times meter. So this is the value used at the end to scale the result we got from the evaluation of E. You need to know it's still there. Yeah. I circled it so it could be there. It would be handled just the same way. So I have a few more extensions to the language, but at this point I want to stop and ask you to fill out a little questionnaire about the courses you are taking and a little bit about your background. There are some simple questions that should take you just maybe three minutes to answer, I hope.
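Going back to the evaluation of C described before the questionnaire, here is a minimal sketch under stated assumptions. The AST encoding (`('*', c1, c2)`, `('/', c1, c2)`, `('^', unit, n)`), the `FACTORS` table, and the names `eval_c` and `convert` are hypothetical. C evaluates to a pair of a conversion factor (how many SI units one display unit is worth) and the display unit kept in its original, unconverted form.

```python
# Hypothetical table: SI value of one of each unit (a foot is 0.3048 m).
FACTORS = {'m': 1, 's': 1, 'kg': 1, 'ft': 0.3048, 'ms': 0.001}

def eval_c(c):
    """Evaluate a C tree into (conversion factor, display unit as name -> exponent)."""
    if isinstance(c, str):
        return (FACTORS[c], {c: 1})            # the unit is kept as written
    op, left, right = c
    if op == '^':                              # left: unit name, right: int exponent
        return (FACTORS[left] ** right, {left: right})
    if op not in ('*', '/'):
        raise ValueError(f"operator not allowed in C: {op!r}")
    f1, u1 = eval_c(left)
    f2, u2 = eval_c(right)
    sign = 1 if op == '*' else -1
    units = dict(u1)
    for name, exp in u2.items():               # collect exponents, no SI conversion
        units[name] = units.get(name, 0) + sign * exp
    return (f1 * f2 if op == '*' else f1 / f2, units)

def convert(si_value, c):
    """Rescale a number computed in SI units into the display unit described by C."""
    factor, units = eval_c(c)
    return (si_value / factor, units)

# feet * meter * feet stays as feet squared times meter
print(eval_c(('*', ('*', 'ft', 'm'), 'ft')))
print(convert(0.3048, 'ft'))   # 0.3048 m shown in feet
```

The evaluator for E is untouched by this: `convert` only post-processes the SI number that E produced. A fuller version would presumably also check that the SI unit of E matches the SI unit implied by C.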