Welcome to chapter two. Chapter one was all about abstractions for flow of control in a program. Chapter two is about data abstraction and the mechanisms we have for that, so it will be a little bit of a change of pace. I'm going to start by trying to introduce the idea of what data abstraction is.
So up on the screen, you see a very simplified version of counting the number of points in a hand, like in the 21 project. It's simplified by leaving out aces and picture cards, so I could keep the thing stripped down to its essence. Namely, if the hand is empty, then its point count is zero. Otherwise, we take butlast of last of hand. Butlast is leaving off the suit, so that's the rank of the last card in the hand. (I'm going right to left only because it makes the example work out nicely.) And we add that to a recursive call to total-hand for the rest of the cards. So take one card, see how many points it is, and add that to the number of points in the rest of the hand. Simple.
The trouble with this is probably completely apparent. There's this call to butlast right here, which means the rank of the card as opposed to the suit of the card. And there's this call to butlast here, which means something completely different, namely the rest of the cards. The form of the program doesn't really help you understand that. In particular, imagine that this procedure is just one of 1,000 procedures in a big complicated program, and for some reason you decide that you're going to read the hand from left to right instead of right to left. You want to change some of those butlasts to butfirst, but not others. So you have to find the ones that are about hands and not the ones that are about cards, and that makes things a little complicated.
Let me just show you that it works, if it does. So it's 3 plus 10 plus 4 is 17. Now what I'm going to do is rewrite it a little bit. This is basically exactly the same code, the same algorithm.
But what I've done is to find some synonyms for last and butlast. So here, instead of butlast, it says card-rank: the rank of the card, how many points is it worth? And this here, instead of last, is one-card, one card out of the hand. And here we have a recursive call on remaining-cards of hand. So not only is this much easier to read, but if I change my mind about how I'm going to go through hands or about how I'm going to represent cards, instead of having to go through all 1,000 procedures looking for every butlast and trying to decide which kind of butlast it is, I can just change these definitions, either the ones here that are about cards or the ones underneath that are about hands of cards.
The question is whether it would work if I made internal definitions of those things that are highlighted. Yes, it would, just like any internal definition. I wouldn't do that, though, because remember, we're imagining 1,000 more procedures that are all about cards and suits, and I want to use these consistently for all of them. So I wouldn't do it as an internal definition. I would do it this way.
All right. So I haven't really changed anything. I just made up some synonyms. And because of that, I have more readable code and I have more maintainable code, because I can change either of these two abstractions: the one for cards, dividing a card into a rank and a suit, or the one for hands, dividing a hand into one card and the rest of the cards.
Well, the next step. Oh, sorry. These procedures, card-rank and card-suit and so on, are called selectors. A selector for a data type is a procedure that takes an instance of that data type and pulls out some piece of it. So it selects one of the parts, like selecting the rank or selecting the suit of a card. An abstract data type is one that isn't built into the programming language. It exists only in the programmer's mind. So Scheme doesn't know anything about cards or hands. Scheme knows about words and sentences.
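The two versions described above can be sketched in plain Scheme roughly as follows. This is an illustration under stated assumptions, not the code shown on screen: the course's word and sentence procedures are replaced by stand-in definitions of last and butlast that treat strings as words and ordinary lists as sentences, and the names total-hand-raw, one-card, and remaining-cards are guesses based on how the procedures are spoken in the lecture.

```scheme
;; Stand-ins (assumed behavior) for the course library's last and butlast,
;; which work on both words and sentences. Here a word is a string such as
;; "10c" and a sentence is an ordinary list.
(define (last x)
  (cond ((string? x) (string (string-ref x (- (string-length x) 1))))
        ((null? (cdr x)) (car x))
        (else (last (cdr x)))))

(define (butlast x)
  (cond ((string? x)
         (let ((front (substring x 0 (- (string-length x) 1))))
           (or (string->number front) front)))   ; "10" becomes the number 10
        ((null? (cdr x)) '())
        (else (cons (car x) (butlast (cdr x))))))

;; First version: the one name butlast does two unrelated jobs.
(define (total-hand-raw hand)
  (if (null? hand)
      0
      (+ (butlast (last hand))                ; rank of the last card
         (total-hand-raw (butlast hand)))))   ; every card except the last

;; Second version: synonyms that say which abstraction is meant.
(define card-rank butlast)          ; selectors for cards
(define card-suit last)
(define one-card last)              ; selectors for hands
(define remaining-cards butlast)

(define (total-hand hand)
  (if (null? hand)
      0
      (+ (card-rank (one-card hand))
         (total-hand (remaining-cards hand)))))

(total-hand-raw '("3h" "10c" "4d"))   ; 17
(total-hand '("3h" "10c" "4d"))       ; 17
```

Both procedures compute the same thing; only the second tells the reader, and a future maintainer, which calls concern cards and which concern hands.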
I'm using Scheme's words and sentences to represent cards and hands. So card-rank and the others are selectors.
The next step is that if I'm going to use these selectors, it's really kind of cheating for me to use a quoted sentence like this as the argument to total-hand. Because if I now want to change what a hand of cards looks like, my input to total-hand isn't going to reflect the change that I made. So instead of that, what I should do is also make constructors. Here we have the constructor make-hand, which is just sentence. And here is make-card. I actually put a little frill into it: instead of just saying h, s, d, or c, you can say heart, spade, diamond, or club, and it takes just the first letter of that for the representation of the card. So now I can call total-hand this way. Let me actually go ahead and load this all up.
Now, this looks like a step backwards, because I had to do a lot more typing. Instead of just saying quote, 3h, 10c, 4d, I'm now saying make-card 3, quote heart, and so on. But again, it means that I can change the representation and my program will still work.
Some programming languages actually provide a feature to let you specify abstract data types in your program and then enforce them. So you can declare some variable to be of type card, and then it won't let you say first, or butfirst, or last, or butlast; you have to say card-rank, and card-suit, and so on. Scheme doesn't do that. (Knock on wood, there's a new revision in the works, and it may do something along those lines.) But Scheme doesn't do that. And for our purposes in this course, that's a good thing, because it kind of makes you pay attention to the abstraction. The language doesn't act as your conscience. You have to do it yourself.
OK, now just to make the point, here I'm changing the implementation of cards. I'm leaving hands alone. But now, instead of a word like 3d for the three of diamonds, I'm using a number between 0 and 51.
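A minimal sketch of constructors plus a numeric card representation, in plain Scheme. Several details are assumptions: suit-offset is a hypothetical helper, the offsets for diamonds and clubs are chosen so that the four of diamonds comes out as 30 (the value quoted in the lecture), hands are ordinary lists rather than sentences, and the scheme only works for ranks 1 through 12, which matches the simplification of leaving out picture cards.

```scheme
;; Hand abstraction: unchanged in spirit. A hand is a sequence of cards,
;; read from the right as in the lecture.
(define make-hand list)
(define (one-card hand)               ; the last card
  (if (null? (cdr hand)) (car hand) (one-card (cdr hand))))
(define (remaining-cards hand)        ; every card except the last
  (if (null? (cdr hand))
      '()
      (cons (car hand) (remaining-cards (cdr hand)))))

;; Card abstraction, numeric representation (assumed layout):
;; each suit owns a block of 13 consecutive numbers.
(define (suit-offset suit)
  (cond ((eq? suit 'heart) 0)
        ((eq? suit 'spade) 13)
        ((eq? suit 'diamond) 26)
        ((eq? suit 'club) 39)
        (else (error "unknown suit" suit))))

(define (make-card rank suit) (+ rank (suit-offset suit)))
(define (card-rank card) (remainder card 13))
(define (card-suit card)
  (list-ref '(heart spade diamond club) (quotient card 13)))

;; total-hand mentions only constructors' results and selectors,
;; so it does not care how a card is stored.
(define (total-hand hand)
  (if (null? hand)
      0
      (+ (card-rank (one-card hand))
         (total-hand (remaining-cards hand)))))

(make-card 4 'diamond)                               ; 30
(total-hand (make-hand (make-card 3 'heart)
                       (make-card 10 'club)
                       (make-card 4 'diamond)))      ; 17
```

Swapping make-card, card-rank, and card-suit for a different representation would leave total-hand untouched, which is the point of going through constructors and selectors.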
In that numbering, 0 to 12 are the hearts, and 13 to 25 are the spades, and so on. So if I copy these down, the important point is I didn't change total-hand at all. And yet, I can now run total-hand and still get the same answer, even though make-card 4, quote diamond, is now 30 instead of being 4d. So this is just to prove to you that I can change the representation. And as long as I consistently use the constructors and selectors for my abstract data types, changing the representation doesn't require me to change every single procedure in the program. That's abstract data types. Questions? Easy, right? Good.
OK, so before we go on and talk about pairs and lists, let me take a moment out to talk about the fact that a week from Wednesday we're having a midterm. Pairs and lists are more fun. So here are some things about the midterm that you need to know. It covers everything up through this week. So it's up to and including the homework that you're going to do this weekend and turn in on Monday. But it doesn't cover the lectures of Monday and Wednesday of next week. That's point number one.
Point number two, you get one of the questions ahead of time. This is question zero, worth one point. OK, so 1.4. Your name, that should be pretty easy, right? Your class login for this class; every so often we get a CS61C-something, and that won't do. Your TA's name; if you're one of those people who never goes to section, make an exception this week and learn your TA's name. And your lab section number, not what day and time it is, but a number that's somewhere between 11 and I think 21. So those are the things you have to know for question zero. And you get a point because if you do know those things and you fill out the form correctly, it really helps us a lot in entering grades and in getting your exam back to you afterwards. That's why we need to know your TA's name and your section number, so that we can put the exam in the right pile to be given back to you.
We are going to do our best to get this first midterm graded before Friday of next week because that's the drop deadline, which is one of the reasons we have an exam this early in the semester. Don't take that as a precedent for the later midterms, where there isn't any such pressure.
And one last thing, I think. This exam features a group component. You should by now be in a group of four people from your section, and you are going to take part of the exam in your group. The way that's going to work for this midterm is as follows. First, you're going to take the entire midterm individually. Then we're going to collect those. Then you're going to redo one or two of the questions in your group. And your grade is a weighted sum of the two scores, unless your individual score is better. That probably isn't going to happen, but if it does, you'll get your individual score. So don't panic. It would be a really good idea for you to get to know the people in your group. That'll help you in the exam. OK, I think that's it about exams.
OK. Every programming language has some way to build up data structures out of pieces. You may have learned before a programming language that uses arrays with indices and so on, for i equals 1 to 10 and x sub i, blah blah. In Scheme, we do have those, actually. You'll see later. But the main data aggregation mechanism that we have is something called a list. And lists are made up of pairs.
So let me start by showing you a pair. This is a pair. It typically has an arrow coming in and some arrows going out. The book uses the example of rational numbers. So if we wanted to represent the number three fourths, we might put a 3 here and a 4 here. OK, that's a pair. Even though this is not an abstract data type (it's, I guess, a concrete data type), it does have a constructor and selectors. The constructor for pairs is called cons, which stands for construct.
So you can see that this really is kind of the key data aggregation mechanism. And the two selectors are called left and right. No, they're not. That would be too sensible. They are called car and cdr.
Why are they called that, if you want to know? Well, the very first implementation of Lisp was on the IBM 704, I think, computer, which had a register that was divided into pieces. It had a little piece and then a big piece and then a little piece and then a big piece. The little ones were called the prefix and the tag, which you don't have to worry about. This one was called the decrement, and this one was called the address. So those names stand for the contents of the address part of the register and the contents of the decrement part of the register. The one on the left is the car, and the one on the right is the cdr. Don't ask me why. It's before my time.
The reason that these names are still in use, despite how stupid they are, is that we can build data structures by hooking pairs together. Let's say this pair is called p, and I want to know what's down here. Well, this is the cdr of the car of the cdr of p, otherwise known as the cdadr of p. And it's because you can do things like that that these otherwise stupid names have lasted. But having told you that, I should also tell you that you're never going to write cdadr, because if you do, you're probably breaking a data abstraction. Probably what this really is, is something like the y-coordinate of the endpoint of some line segment. And so you would say, instead of cdadr, y-coordinate of endpoint of line segment.
In the library is this helpful procedure that teaches you how to pronounce these names. So this one is "cudder-dadder", not to be confused with "cudder-daddar", and so on.
OK. What's in the top half of the screen here is an implementation of pairs, written as if we didn't have any kind of data aggregation in the language at all.
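As a rough illustration of cons, car, and cdr, and of why a chain of cars and cdrs is better hidden behind named selectors, here is a small sketch. The point and segment types, and all of their procedure names (make-point, y-coordinate, make-segment, end-point, and so on), are hypothetical; the lecture mentions only the idea of a line segment's endpoint.

```scheme
;; The rational number three fourths as a bare pair.
(define three-fourths (cons 3 4))
(car three-fourths)   ; 3
(cdr three-fourths)   ; 4

;; Hypothetical point and segment types built out of pairs.
(define (make-point x y) (cons x y))
(define (x-coordinate point) (car point))
(define (y-coordinate point) (cdr point))

(define (make-segment start end) (cons start (cons end '())))
(define (start-point segment) (car segment))
(define (end-point segment) (car (cdr segment)))

(define seg (make-segment (make-point 1 2) (make-point 3 4)))

;; Works, but only by knowing exactly how segments and points are stored.
;; This composition is what the name cdadr abbreviates.
(cdr (car (cdr seg)))            ; 4

;; Says what is meant, and keeps working if the representation changes.
(y-coordinate (end-point seg))   ; 4
```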
Surprisingly enough, a language with no built-in data aggregation would be OK. You don't need it. This code is in the book, except they haven't invented quote yet, and so they use 0 and 1 to represent car and cdr. But since we do know about quote, we can use the words car and cdr. So if I were to read in this file, it would define cons and car and cdr, representing a pair as a function, this function, that takes either the word car or the word cdr as an argument and returns either x or y, the car or the cdr of this pair. And then we can define car and cdr, the selectors, simply as a little syntactic wrapper around invoking the pair with the word car or the word cdr as its argument. I'm not going to spend a lot of time on this. It's in the book. It's just really, really neat that you can do this. And it's an illustration of the general point that if you have lambda in your language, that's all you need. In principle, you can derive everything else from that.
The main way that we use pairs, though, is not for things like rational numbers, but as the implementation strategy for a list. A list, abstractly, is just a sequence. In the book and in my notes, you will find a lot of pictures that look like this. This is an abstraction diagram. What goes above the line here is the abstract type that we're implementing, like, say, a sequence: a bunch of things in parentheses, looking suspiciously like a sentence, for reasons we'll talk about in a moment. And the way we're going to implement that is this way. A list, a sequence of n elements, is a set of n pairs. The car of each pair is an element of the sequence. And the cdr of each pair is the next pair along, except for the very last one, whose cdr is this special thing called the empty list. The empty list is not a pair, but it's the only list that isn't a pair. So this is kind of a recursive definition of lists. A list is either the empty list or a pair whose cdr is a list.
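Both ideas above, pairs made from nothing but lambda and lists as chains of pairs, can be sketched together. The procedures are named my-cons, my-car, and my-cdr here (an assumption made for this sketch) so that they do not clobber the built-in ones; my-length is likewise a hypothetical helper.

```scheme
;; A pair is a procedure that remembers x and y and answers to the
;; messages car and cdr.
(define (my-cons x y)
  (lambda (which)
    (cond ((eq? which 'car) x)
          ((eq? which 'cdr) y)
          (else (error "my-cons: unknown message" which)))))

;; The selectors are thin wrappers that send the pair a message.
(define (my-car pair) (pair 'car))
(define (my-cdr pair) (pair 'cdr))

(define p (my-cons 3 4))
(my-car p)   ; 3
(my-cdr p)   ; 4

;; A sequence of three elements is three pairs. Each car holds an element,
;; each cdr holds the next pair, and the last cdr is the empty list.
(define seq (my-cons 'a (my-cons 'b (my-cons 'c '()))))

;; Follows the recursive definition: the empty list, or a pair whose
;; cdr is a list.
(define (my-length lst)
  (if (null? lst)
      0
      (+ 1 (my-length (my-cdr lst)))))

(my-length seq)          ; 3
(my-car (my-cdr seq))    ; b
```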
Now, what goes in the middle of the diagram is the constructors and selectors that define this abstract type. And the reason they go there in the middle is that these are the only procedures in your program that are allowed to know both what abstract type we're implementing and how we're representing that type concretely. So when I have cards represented as a word with a rank and a suit, I have the constructor make-card and the selectors card-rank and card-suit. And those three procedures are the only ones that know what a card actually looks like in my program. All other procedures either are part of a lower-down abstraction, like words, or are things that use cards, like total-hand. Total-hand doesn't know what a card looks like. It only knows that cards have ranks and suits that are represented somehow or other. So the things in the middle are the implementation of the abstract data type.
Now, how about this abstract data type of list? Well, what there should be is a constructor called list and selectors called first-element-of-list and rest-of-list. Instead, and this is really the only thing in the language that's quite like this, we use cons, car, and cdr, which is what in other contexts we would call a data abstraction violation. You lose points if you do that, because you're not respecting the abstract data type. The reason we treat lists specially is that lists are only sort of half abstract. The language really does know about lists.
Let me show you what I mean. Suppose I want to do 3, 4. So I say cons 3 4. I get something that looks like this: open parenthesis, then the car, space, period, space, then the cdr, close parenthesis. This is not a list of three things. That period in the middle is a piece of punctuation, just like the parentheses. There are only two things here, the left thing and the right thing. So don't be confused about that. But now if I say this, I don't get open parenthesis 3 dot. Open parenthesis 4 dot. Open parenthesis 5 dot.
Open, close, close, close, close. I just get (3 4 5), because we have this special shorthand representation for pairs that are hooked together in this particular way to form a list. And since the language knows about lists, it's not as if we're going to change the way lists are represented. So we don't have to treat lists as an abstract data type. OK? We actually do have other constructors for lists.
Yes? How's a list different from a sentence? OK. A sentence is a list in which the elements are constrained to be only words. Whereas here, if I cons the list (3 4) onto the list (5 6), I get this interesting thing. I don't get a list of four elements. I get a list of three elements, the first of which is a list. Because if you look at the picture in the abstraction diagram of what a list looks like, the car of the pair is one element, even if it is itself a list. So one difference between lists and sentences is that you can have lists of lists. OK?
That also means that you can get in trouble. If I say cons of the list (3 4) and 5, I get this thing with a dot in the middle of it, which isn't what I wanted. I wanted to put 5 at the end of the list, but I failed at doing that, because the use of pairs to make lists is asymmetrical. The car of a list is a different kind of thing from the cdr of a list. The car is an element, and the cdr is a sub-list. OK?
Yeah. OK. He's asking, in this example (oops, this example), why didn't it come out open, open, 3, 4, close, open, 5, 6, close, close? That's what I'd get if I said this: list. So list is another constructor for lists. It takes any number of arguments. It's a little bit like sentence in that way. And it constructs a list in which each element of the list is one of the arguments. So if you give it six arguments, you get a list of length 6, OK? For cons, look at that picture again of how we make lists out of pairs, and in particular at this pair right here.
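The printed forms discussed above can be checked at a Scheme prompt. The exact expressions typed during the lecture are not in the transcript, so the ones below are reconstructions from the results that are described; the variable names are made up for this sketch.

```scheme
(define just-a-pair (cons 3 4))
;; prints as (3 . 4): two things, with the dot as punctuation

(define proper-list (cons 3 (cons 4 (cons 5 '()))))
;; prints as (3 4 5), shorthand for (3 . (4 . (5 . ())))

(define list-in-front (cons '(3 4) '(5 6)))
;; prints as ((3 4) 5 6): three elements, the first of which is a list

(define failed-add-at-end (cons '(3 4) 5))
;; prints as ((3 4) . 5): not a list, because the cdr is not a list

(define list-of-lists (list '(3 4) '(5 6)))
;; prints as ((3 4) (5 6)): one element per argument

(length list-in-front)       ; 3
(length list-of-lists)       ; 2
(list? just-a-pair)          ; #f
(list? failed-add-at-end)    ; #f
```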
When I say cons to produce that pair, what I'm doing is saying: stick one new element in front of this whole list, OK? So when I use cons to make a list, the first argument is an element. The second argument is an already existing list. So it just adds one element to the front.
There's also a third list constructor called append. This acts a little more like sentence. It makes a list of the elements of the arguments. Sentence is different because sentence will accept words as well as lists as arguments and puts them together. In fact, here's an implementation of sentence. At its core, the implementation of sentence uses append, OK? So the essence of it is right here. But since append only takes lists as arguments, first we say, is a a word? If so, make a list of length 1 out of it and call sentence again. Is b a word? If so, make a list of length 1 out of that and call sentence again. So by at most the third time through, we have all lists. And then we can append them, OK?
OK, so three constructors: cons, list, and append. And you're probably going to have a wrong intuition about which is the most useful. The one that seems the most obvious is list. I want a list of four elements. I say list, boom, boom, boom, boom. The problem with that is, mostly when you're writing a program, you're not making a list of a fixed number of elements. You want your program to work on any number of elements. List is only good if you know ahead of time how many elements there are going to be. So for example, you have an abstract data type which is a point in three-space, and you represent it as a list of x-coordinate, y-coordinate, z-coordinate. That's a situation where you might say list, number, number, number, or expressions whose values are numbers. But mostly you don't do that. So mostly list is the least useful of them.
Append is occasionally useful. Typically you did some calculation for each element of an existing list, and the result of those calculations in each case was a list.
In that situation you have a list of lists, and you really just wanted a list of the underlying data. So you flatten it using append. That's sometimes useful. But it's going to turn out that the most useful of all is cons, because you're writing a recursive program to go through a list structure. Just like what we did with squares, where we said sentence of the square of first of nums and squares of butfirst of nums, we'll do the same kind of thing, where it'll be cons of some function of car of the list onto a recursive call on cdr of the list. And that'll be a typical pattern for a recursive list procedure. So that's the one that's going to actually be useful most of the time.
OK. Now let me talk about why we bothered inventing sentences. Lists come with the language. Why did we make this whole extra data type? If you look at the textbook (forget about my lecture notes, just look at the book), chapter one is entirely about numbers. And so they have some pretty mathematically hairy examples about derivatives and iterative improvement of square roots and stuff like that. They do that because while you're just learning about the control mechanisms of the language, they don't want you simultaneously to be struggling with the representation of data, and in particular with the asymmetry of lists.
So one of the things you're going to do in lab is write a function to reverse a list. That's really easy for sentences, because you can take the word at the front of the sentence and stick it at the back. But if you try to do that straightforwardly for lists, you end up with something like this example where I tried to stick 5 at the end of the list (3 4) and I failed. So they wanted you not to have to see dotted pair notation by mistake at the very beginning of the course. So they get you all up to speed on control and then introduce data.
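The roles of the constructors described above can be sketched in plain Scheme. The lecture's own code is not in the transcript, so everything here is an assumption made for illustration: squares is rewritten for lists rather than sentences, flatten-one-level is a hypothetical helper, and sentence is a two-argument version that treats anything that is not a list as a word.

```scheme
;; The typical recursive pattern: cons some function of the car onto a
;; recursive call on the cdr.
(define (squares nums)
  (if (null? nums)
      '()
      (cons (* (car nums) (car nums))
            (squares (cdr nums)))))

;; Append turns a list of lists into a list of their elements,
;; one level only.
(define (flatten-one-level list-of-lists)
  (if (null? list-of-lists)
      '()
      (append (car list-of-lists)
              (flatten-one-level (cdr list-of-lists)))))

;; Sentence built on append: wrap any word in a one-element list first.
(define (sentence a b)
  (cond ((not (list? a)) (sentence (list a) b))   ; a is a word
        ((not (list? b)) (sentence a (list b)))   ; b is a word
        (else (append a b))))

(squares '(1 2 3))                          ; (1 4 9)
(flatten-one-level '((1 2) (3) ((4 5) 6)))  ; (1 2 3 (4 5) 6)
(sentence 'hello '(big world))              ; (hello big world)
(sentence 'a 'b)                            ; (a b)

;; Adding at the end: cons gets it wrong, append of a one-element list works.
(cons '(3 4) 5)                             ; ((3 4) . 5)
(append '(3 4) (list 5))                    ; (3 4 5)
```

The last two lines show the asymmetry that makes reversing a list harder than reversing a sentence: the second argument to cons has to be a list for the result to be a list.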
I thought, when I started teaching this class, that we should have some more interesting examples from the beginning, and so I wanted a foolproof data structure. So, like lists, but not lists, because they're not foolproof. The idea actually came from Logo, which is a programming language for kids that deals in words and sentences, and we just implemented those abstract data types based on the things that Scheme already has. So now that we know about lists, most of the time you'll just use cons and car and cdr if what you're dealing with is a sequence. Although we might ask you an exam question about lists of sentences to see if you can figure out when to say cons and when to say sentence.
So that's the idea: we wanted a really simplified data structure. And I think it's worth talking about that not just to satisfy your curiosity about it but to start you thinking about what's involved in designing a programming language, and how you might make different design decisions depending on what you're trying to do with the programming language. We're trying to teach a freshman class, and that means what we need is a little bit different from what we would do if we were programming in industry using basically the same language. Questions? I'm way ahead of time. That's good.
Okay, let me start talking a little bit about box and pointer diagrams. This thing is called a box and pointer diagram. The pairs are the boxes and the arrows are the pointers. One of the things we want you to be able to do is draw these diagrams, either explicitly in response to a question like "draw the box and pointer diagram for such and such", or to help you answer a question by really understanding what a data structure looks like. So here's how we draw a box and pointer diagram. Let's draw the box and pointer diagram for this list. Okay. First thing I'm gonna do is count elements. One, two, three, four, five, six elements.
Not elements of elements, just elements of the whole list. So since there are six elements, I'm gonna go ahead and draw the spine of the list. Here it is, okay? First step. Did I get it right? Okay.
Now I can start filling these in. It doesn't really matter now what order I do it in, so I'm gonna start with the easy ones. A and seven are elements of this list. How about the empty list? Well, when we have an empty list as a piece of a pair, we just put a slash through the corresponding half. Usually you see a slash, as in this pair, in the cdr of the last pair of a list, but this time it's in the car, because the empty list is an element of this list. Okay?
All right, now let's do (3 4). That is itself a list of length two. So I'm gonna go ahead and draw its spine, length two, and then fill in the elements, three and four. How about this one? How many elements? Two. The second element is just w. The first element is a list of length one. So here it is, a spine of length one, in which I'm gonna put a b. And last but not least, this is a list of length one, and its one element is this list of length one. Okay? That's how you do it. Basically start at the top, draw the spine first, and work your way in. Okay?
So, some crucial little details about box and pointer diagrams. The first crucial little detail is this thing. This is called a start arrow. It's really important to have one so that we can tell where the beginning of your diagram is, especially when diagrams get complicated and things kind of wrap around all over the place. So, start arrow: don't forget.
Secondly, and most important of all, there is no such thing as pointing to half of a pair. So this arrow, for example, is not a pointer to the car of this pair, even though it happens to come in at the left. It's a pointer to this entire pair. Okay? If you wanted a pointer to the car of the pair, that would be this. This is the car of that pair. It's the thing that this arrow points to. Okay?
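The drawing procedure has a direct counterpart in code. The list drawn in the lecture is not reproduced in the transcript, so the six-element list below is a hypothetical stand-in with the same kinds of elements, and count-spine-pairs is a made-up helper.

```scheme
;; Hypothetical six-element list: two atoms, the empty list, and three
;; sub-lists of various shapes.
(define example
  (list 'a 7 '() (list 3 4) (list (list 'b) 'w) (list (list 'x))))

;; The spine is the chain of pairs reached by following cdrs from the
;; start arrow. There is one spine pair per top-level element.
(define (count-spine-pairs lst)
  (if (pair? lst)
      (+ 1 (count-spine-pairs (cdr lst)))
      0))

(count-spine-pairs example)                ; 6
(null? (list-ref example 2))               ; #t, the empty list sits in a car
(count-spine-pairs (list-ref example 3))   ; 2, the sub-list (3 4) has its own spine
(car (car (list-ref example 4)))           ; b

;; An arrow always leads to a whole pair. The cdr of the first pair is the
;; second pair itself, and its car is then one more step away.
(define second-pair (cdr example))
(pair? second-pair)                        ; #t
(car second-pair)                          ; 7
```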
Again: no such thing as a pointer to half of a box. Yeah? He's asking, what if you're pointing to a box that has a slash on the right? There's no such thing as pointing to half of a box. This arrow points to this entire pair. Okay?
Okay, questions? Yeah. Great: what can go in a list? Answer: anything. Yes, procedures can be elements of lists. Lists can be elements of lists. True and false can be elements of lists. Right? And furthermore, even one particular list can have some of each of those things. So lists don't have to be homogeneous. Yeah, that's the nice thing about lists. They can have absolutely anything you want in them.
Okay, the difference between append, list, and cons again. Cons is for the situation in which you have a list and you want to add one new element at the front. So you can think of it as a join: new element, old list. List is for when you're making a fixed-length list and you're explicitly saying what each element is. Boom, boom, boom, boom. Append is for when you have a bunch of lists and you want to put them together, but you don't want a list of lists as the result. You want a list of the elements of the lists. What if those elements are actually lists? That's fine. It doesn't flatten all the way down. It just takes the elements of the lists, whatever they are.
Okay, remind me next time to talk about higher-order functions. And last week I lost a power supply to the laptop that lives in here, and if anybody happens to have seen it, let me know.