Okay, so I think we are ready to start with an announcement. Thank you for the proposals for final projects. The way we would like to give you feedback is that over the course of the next two weeks, I'd like to meet with each individual group. Of course, that means really short meetings, since there are many groups; spreading it over four office hours would give me about five minutes with everybody. I will read each proposal beforehand, of course, and annotate it, but the final discussion is much better in person, even if it is just five minutes. So I'll soon post sign-up sheets for that, and over two weeks, maybe less, we'll finish it all. Is there anything else we should talk about? When the scores will be out, I don't know yet, because I was out of town and returned yesterday, so I didn't check with the GSI. Realistically, I think early next week.
Okay, so let's start with some motivation. What I show here is the performance of several language implementations. C compiled with GCC is the fastest, and on the left you see performance normalized to it over a list of about 12 benchmarks. Pascal is a little bit slower, Java slower still, then there is SBCL, a Common Lisp implementation, and so on down the list. I skipped a few; not every language ever invented is here, but you can see the performance of these languages plotted left to right. And you can see a huge gap here, between roughly 3x and 10x. So why is that? Do we have any speculation about why all of a sudden you see this big jump? What potential reasons might there be? Okay, the first candidate is GC: perhaps all of these languages have a garbage collector, and that is what slows them down. Other candidates? Okay, so we are saying maybe statically versus dynamically typed, often referred to as static versus dynamic. What other candidates do we have? Interpreted versus compiled, excellent. More candidates? There often isn't just one culprit, of course; several things play together, but the difference is so huge that we would like to point to the key reason. Just a bad implementation, okay. You could also say the abstraction cost: the fact that these languages work at a higher level of abstraction could be one. Anything more? I cannot think of anything more either. So let's go back to the list and see what we have. Is it interpreted versus compiled? These have an interpreter. This one is a Lisp that is compiled, so I'm guessing it does have a compiler. This one would have a compiler. These, at that point, did not have a compiler, and neither did Ruby. So that division sort of works: if you look at these data, which are from 2007, then these have a compiler and these have an interpreter; I'm pretty sure that's the division. But now if you look at the data from 2012, the gap is smaller; you still see a gap, but between about 3x and 5x rather than 3x and 10x. And if you look at the slower group, there are languages in it that are actually compiled: JavaScript on V8; Ruby on JRuby, which translates Ruby into bytecode for the Java Virtual Machine; Go is also compiled. So it's not that a compiler alone makes the difference between excellent and bad performance, because we see compiled languages down here in the list. There are compilers on both sides.
So what could make such a difference in the quality of the code that you run? Some of these compile to high-level languages like Java, and there are pretty good Java compilers, as you can see: this Java here is only about 70% slower than C, or than Fortran. So compiling to Java probably does not explain the overhead. It could be that they compile into a language for which you don't have a good compiler, or it could be something inherent in the language from which you are compiling. That might be a potential explanation. So do we have static types here? These are statically typed; we'll go over what that means. These are languages where you can declare the types of variables in the source code, so the compiler always knows the type of the value stored in a variable. It knows, when there is a plus, whether it is a plus on integers, on floating point, or on strings. So let's look at the list. This one is statically typed. So is this. I don't know what ATS is, but it is typed. OCaml, typed. And so all of these are statically typed. So it seems that knowing the static types of variables gives you quite a bit of power in generating good code, running your program fast, and hopefully, therefore, adoption. And if you follow what's going on with JavaScript, which is more or less the language of the internet, a lot of the new developments are about selectively adding static types to your programs: where performance matters you annotate statically, hopefully helping the JavaScript virtual machine get good performance on your programs.
If you look at the data from 2012, the gap between interpreted and compiled went from 3x to 10x in 2007 down to 3x to maybe 5x in 2012. Perhaps the key reason for that is the browser wars: browser developers compete on performance, and that pushes the performance of JavaScript, so the JavaScript virtual machines, including their just-in-time compilers, keep getting better. That in turn motivates others, like the Python implementations, to do better. So these interpreted languages, thanks to running on just-in-time compilers, now get much better results than five years ago. But the key motivation for static types remains: you have so much more information about the program, because of these annotations, that your performance is much better.
Okay, so what are we going to do today? I'll show you two more motivating examples. We want to understand the rationale and the design behind static types, and we will look at the static types that exist in object-oriented languages, because this way you can compare them with the lecture from just before the midterm, where we did object orientation without any static types. You may remember that we just used the Lua mechanism for metaprogramming, the metatables, to redirect lookups of fields into superclasses, which we called prototypes, and that was sufficient to build an object-oriented language without any static types. So here we look at what static types mean in object-oriented languages. The plan: a little more motivation, then semantic analysis, then a comparison of static and dynamic typing, then understanding the benefits of static typing and why you sometimes still need dynamic checks even in statically typed languages. So here is an example.
Imagine this is coming from, I don't know, JavaScript, 164, or Python. And imagine you want to write a compiler that parses this code and, rather than interpreting it, translates it into C++ code. And imagine that I ask you to generate really efficient C++ code for this piece of program. How efficient can that C++ code be? What would the corresponding translation need to do? So imagine this is 164, since that's a language you know. What will the C++ code do when you compile it? Could it look the same, a = b + c? Clearly not; it needs to do a few more things. So what would it have to do? I'm not asking for C++ syntax, just the logical tasks we would need to perform. Okay. So we are talking about the types of b and c, but clearly 164 does not have type declarations, right? Or does it have types? Let's clarify what types we are talking about. If we talk about the types of b and c, these names b and c don't have any declared type in 164. So what are we really checking? I'm sure you have the answer; I just want to be very precise about what we mean by checking the type. Presumably this happens at runtime, since we are writing code that will check those types when executed, as opposed to at compile time. So which types are we checking: the types of the variables, or the types of the contents of the variables? Right, we are checking the types of the values of b and c, of the stuff that's stored inside. And this, by the way, is the big distinction: in static typing you are not working with any concrete values, because you are doing static checking at compile time, where you do not have the values of the variables. At runtime you do, and there you indeed check the types of the values.
All right. So once we have that... actually, maybe that's not so easy. How do we determine the types of the values of b and c? Think of the 164 interpreter. It also needs to check the types of the values of these variables, because it needs to decide whether to do a string concatenation or an integer plus. Where would you determine the types of these values? Okay, so we get the value by looking up b and c. At that point we have a value; we completely forget whether it came from b or c, it's just a value, a first-class value. And now you get its type by asking Python to tell you the type of that value: is it a string or an int? That works because Python does it for us. But if we didn't use Python as the underlying language for the interpreter and instead used C, you couldn't quite do that, right? Uh-huh, okay. So the variables would somehow point to a struct. So this is our variable; this would be our value, which would be a struct. What would be in that struct? There would be a type tag, and maybe that tag would say int. And then there would be the actual value, say 7. Okay, that's good. And indeed this is roughly the information that Python keeps inside. Has anybody heard of boxed values, boxed integers, for example? Can anybody say roughly what that is? So if you look here, this is the plain integer, and this would be the boxed integer. Could you say that the struct is a boxed integer? Right: the struct is essentially an object. You can store it anywhere, say in an array, and at runtime you can check its type, whereas the value inside is just the integer value.
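To make the picture concrete, here is a minimal sketch of such a boxed value, written in Java rather than in C (the class name, the tag enum, and the plus helper are invented for illustration; CPython's actual representation is a C struct along these same lines):

```java
// A boxed ("tagged") value: the type tag travels with the payload,
// so code can ask at runtime what kind of value is stored here.
final class Value {
    enum Tag { INT, STR }

    final Tag tag;
    final long i;      // payload used when tag == INT
    final String s;    // payload used when tag == STR

    private Value(Tag tag, long i, String s) { this.tag = tag; this.i = i; this.s = s; }
    static Value ofInt(long i)   { return new Value(Tag.INT, i, null); }
    static Value ofStr(String s) { return new Value(Tag.STR, 0, s); }

    // Roughly what generated code for "a = b + c" must do: inspect both tags,
    // then pick integer addition, string concatenation, or report an error.
    static Value plus(Value b, Value c) {
        if (b.tag == Tag.INT && c.tag == Tag.INT) return ofInt(b.i + c.i);
        if (b.tag == Tag.STR && c.tag == Tag.STR) return ofStr(b.s + c.s);
        throw new RuntimeException("unsupported operand types for +");
    }
}
```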
And you need to wrap integers, or for that matter any primitive types, in these boxes that carry the type information, in a language like Python, because at runtime you need to determine their types. Okay. So, based on the types of the values of b and c, what would our generated code do? We have a few choices, right? Among them is calling the right operation, say plus for ints versus plus for strings; let's call that one concatenate. What else could happen? What would be the third option? If the types don't match, say you try to concatenate an int and a string, perhaps you throw an error, right? And these errors are important. In fact, you could say that our entire discussion today will be about these errors, so keep in mind the question of why these errors matter so much.
So this is the third motivation I want to show you for using static types. Here is a piece of Python code; you may have written something like it yourself. It comes from the unit calculator that we started in lecture two. It contains a bug, or maybe two, I don't know. It looked perfectly fine when I read it, but then an error occurred at runtime. The question is what you could do, changing the program or perhaps the language you are using, so that the error would be discovered by the compiler, because for me the error as printed by the interpreter at runtime was a little cryptic. The Python compiler doesn't give you any errors except for syntax errors; all the remaining errors are delegated to runtime. So what error do I have here? Is it the indentation? PowerPoint may well have mangled the indentation. Okay, so that would be a syntactic error, which is one thing the Python compiler can catch. Actually, it wouldn't catch this one, because it would still parse... Aha. Okay. So let's assume the indentation is indeed as intended, and let's use this as an opportunity. So what is the error? This is intended to be part of the class, but it is outside the class, right? Well, could static typing help us fix this error? What was that? Yes, this is Python. No, it's just a variable name, that's right. So if self were implicit, then using self outside the class, here, would be flagged as an error. But that is not really using static types; we would eliminate this problem by making self implicit rather than explicit, which is completely orthogonal to static versus dynamic.
Okay, so one candidate was this equality, and then there was something about copying. Does anybody see other potential errors? Yes, it should be that, and this is indeed the error. The others may be errors in other settings; they were not errors here. The real error was that I was careless in naming this parameter. I named it u because it stands for a unit, but it is really supposed to be, let me use this color, an instance of class Unit; it's supposed to be another instance of this class. And here I'm defining a field u, and I really wanted to read that field u from here. So that was the error: I was happy to see, oh, I'm just using u, but it was not u the field, it was u the parameter. And if I had types, where would the error be discovered? If I had types, this would be annotated as you see here, that this u is of type Unit. And then where would the error show up? At compile time, but where exactly in this code?
Yes, great. Yeah, I'm listening, even though the machine is not; I'll be right there. So the error would be... let me be careful here. So it should be this. Okay, I'll leave this one uncorrected. This one would be annotated as type Unit, which is exactly what I intended, but by the time I got here I had forgotten that that was the intention. And here, the type of that would be what? So we would also need to declare the type... Does anybody see why declaring the type of this parameter as Unit is insufficient to fix the problem? By saying this u is of type Unit, which is this class here, I'm saying: this is a Unit, therefore this is a Unit, therefore this field u here is a Unit, because the right-hand side here defines the type of the field u, and therefore this is a Unit, and therefore this comparison here is just fine. So just constraining this parameter u to be a Unit would not quite be enough, because that type propagates to the type of the field, and then both the initialization here and this comparison look fine. We need to do a little more than say that this u is a Unit. What else do we need to do? What does Java do? Well, what do you propose? The solution is the same: we say this field is of a dictionary type, and now the mismatch shows up in this assignment, and similarly the mismatch shows up in this comparison, because now we are comparing a dictionary against a Unit, and that's an error. Similarly here we have an assignment from an object of a class into a dictionary, which again would be an error. So we could catch this error, but it comes at the price of annotating the types of parameters and the types of fields. But this is the direction most of computing is going: these languages are getting more static, and more type annotations are being added, because the benefit of these annotations is apparently large, especially in large software with exposed interfaces; when new programmers come in, the interfaces are much more readable if they are annotated with type information.
Okay, so quickly some terminology, so that we can get to the fun parts of static typing. If we are talking about static typing, there must be dynamic typing, and static and dynamic are synonymous with compile time versus runtime. Static reasoning is the reasoning done by the compiler without seeing the input; runtime reasoning is the reasoning done by the interpreter on a particular execution, with a particular input and program state. Static analysis always has to consider all program executions, because whatever it concludes must be correct for all possible inputs the program will ever see; runtime checking only deals with the particular program state at hand. There are implications of this that we'll soon see. Now, typesafe is something you often see as an attribute of programming languages; strictly typed is another term used for roughly the same thing. What does typesafe mean? It is not the same as statically typed, so don't think that just because a program contains annotations, like the type annotation we just added to the parameter u, it is typesafe; those are different things. Okay, in the back. Okay, it is related, yes: you could say protection against buffer overflows, against reading or writing past the end of an array. That's part of it; it's a special case, but a great example. It does not mean that there is no casting.
You could have a strictly typed, typesafe language with casting. Java is typesafe, at least if you take away the reflection part of Java, and it does have casts. So, does typesafe imply the absence of automatic type conversions, for example turning an int into a string? No, not quite. These conversions from one type to another, as long as they don't do anything bad, are perfectly fine; they exist in subtle forms in many languages and are handled just fine. Another answer was that it means error checking at compile time, and I wouldn't say so either; essentially you are saying it does static type checking, that it checks types at compile time. Well, is it the case that to be typesafe you need static checks? How about Python? It does not have any static checks. So let's push on this. Languages like JavaScript, 164, Lua, Python don't do static type checks. Are they typesafe or not? That's the key question. Clearly they do not allow you to do what you mentioned: they would not allow you to read memory past the end of an array. You could not write a Python program that reads the contents of memory just outside an array object, and the same goes for JavaScript and so on. So at least that one example of being typesafe is covered. But could you do some bad things in these languages? The mantra for typesafe is that things won't go wrong: nothing wrong will ever happen during execution. Of course, we need to clarify what that means, because if you introduce a bug into the program, a typesafe language is not going to eliminate the bug magically; there are still logical errors. So there is some class of things that won't happen. What might those be?
Okay, a good question, excellent. The question is, if in Python I divide an integer by a float, what could happen? An implicit conversion presumably happens from the int to a float, then you divide two floats, and you are fine. A more interesting example is concatenating integers and strings. In JavaScript, the integer is converted to a string, you concatenate, and everything is fine. In Python, you get an error back saying the concatenation operation is not defined for these two types. But could you ever get into a situation where you somehow... well, we should hear your answer. I see. All right, it looks like we are converging. For the others, if you are still thinking about this: this is an important part of being a well-informed programmer, because being able to catch errors and deal with them influences the choice of programming language, and therefore the vulnerabilities your programs will have against attacks. So the security angle here is pretty important; let's understand it well. It looks like we are converging to something. Okay. Yes, JavaScript will concatenate them. Well, would that be typesafe behavior? It is well-defined: the JavaScript semantics come with a description saying that if you concatenate an int and a string, then this conversion happens and a concatenated string comes out. The fact that the programmer didn't realize that this is the semantics is a separate story, and perhaps a language should never allow this implicit conversion, but that's a separate, much more two-sided debate.
So I would say it is still typesafe, because the language did the right thing: nothing crashed, you didn't leak any information, unlike in the array out-of-bounds case, where you could overwrite memory you shouldn't or read memory you shouldn't. This is still fine; it seems typesafe to me. The same goes for int divided by float. So the answer here, in case you didn't hear it, is this: it is typesafe if, every time you perform an operation and continue running the program, the result of that operation is well-defined. And in the case of an int concatenated with a string, in JavaScript it is defined. You may not like the definition, but it is defined. So when is something not typesafe? So far, almost all of our examples except the buffer overrun turned out to be typesafe. So let's find more examples of not typesafe. Okay, assembly is not typesafe. Let's get some concrete examples, and to make it more interesting, think as hackers: imagine I write a program that is not typesafe, I run it on my server, and I open it on a socket. Now you can connect to the socket and do something to the program by feeding it the right input, carefully constructed, of course. So what sort of mistakes in my programming, not protected against by a typesafe language, would you like to exploit? Okay, now we have hands up; the criminal aspect of this semantics debate always gets people going. All right, so where do we start? It looks like it's your turn to continue. Okay. All right. Okay, so that's interesting. Go ahead. Okay, overflow. So imagine a language where you overflow during the addition of two integers: you add two 64-bit integers and they overflow, so the true sum would need 65 bits. You know how it usually overflows, right? The extra bit is truncated and you go from a positive number to a negative number. I would say this is still typesafe in typical languages, because the arithmetic is defined such that overflows are silent and you just wrap into the negative range of the binary representation. But you are on the right track. The point is that, the way the arithmetic is done, you don't corrupt anything beyond the range of the integer, because you still have just your 64 bits. Okay? Uh-huh. Okay, so this is the buffer overrun example: you take an input, a string, and you copy it into a piece of memory, and if the program doesn't check whether there is enough memory for it, then yes, you can overwrite code, jump into it, and run it. Now, would that be possible in Python, JavaScript, Lua? No, right? Because there you would be copying into another array-like object that is sized appropriately, since the bounds checking and the growing of the memory are hidden inside the implementation of those data structures. But in a language where that check doesn't happen, you could exploit it exactly like that. More exploits? In fact, you could do it in C, right? In C you can say: imagine we have int x, and you give it some value. Now you declare a pointer to char, call it y, and you assign x to it. Now you have a pointer to a piece of memory whose address you initialized yourself, to an arbitrary address, and the compiler will happily compile this code into assembly that does exactly that.
It stores what it thinks is an integer, and then it uses it as a pointer, and by doing that it gives you control over whatever other data structures exist in the program, including ones that should never leak to the user who is using the program over the Internet. We'll talk more later about exactly what these exploits look like. But for now, let's leave the example here. So, the things that won't go wrong: how could we characterize them? If you look at the execution of, say, the Python or 164 interpreter, at every operation, and we have been insisting on this, you need to check a few things. For example, when you do a read or a write on a dictionary, what kind of checks do you need to do? The very first one is what? Whether p is null, right? Because if p is null, then you effectively have a pointer to an object that doesn't exist, somewhere near the beginning of memory, and the write could clobber a piece of memory that you must not touch. So these checks are important, and if you do all of them during the execution of the program and throw an exception, as opposed to silently continuing the execution, then the program is typesafe in the sense that the invariants of those types are never violated: whatever the variable points to really is, say, a dictionary with properly contained values. You could describe this as: no type checks are omitted. Yes, a program with a mistake in it will still run and will get an exception, but these exceptions are much better than silent failures, because silent failures typically lead to corrupted memory, maybe a crash, and potentially exploitation by hackers. So is Python typesafe in that sense? Yes. Is C typesafe in that sense? No.
So let's look at this, which is essentially what we just said, and at dynamic type checking. Now we can say what static type checking is: it is compiling the program and checking that variables will contain values that allow the operations performed on them. For example, we will check that if we do p.f, then the object pointed to by p actually contains a field f. Dynamic type checking does the same thing, except it does the check at runtime, when the program executes. The static check is a little more sophisticated in the sense that it must hold for all possible inputs to the program, without being able to consider each input separately. So, now how about this: we have a table here of safe versus unsafe, static versus dynamic. Give me examples of languages that fall into these quadrants. Okay, so here we would have C. Okay. It's not completely true, but it's really hard to find a language that is truly in that category. Why is it not completely true? Why is Java not quite completely statically typed? Even if you remove reflection, does it do all its checks at compile time? Yes? So, Java delegates some type checks to runtime. It has a mix of static checks and dynamic checks, and we'll see today why it is difficult to do all checks at compile time and why some of them have to be delegated. So if you think about it this way, languages like Java sit at the boundary, but we think of them as static, right? Could you express all those checks statically? You can in principle, but it's sometimes difficult to express things like the sizes of arrays without too much pain; it can make programming in such a language too hard.
So that's why language designers concede that being dynamic here and there is good, and that's why languages like Java sit on the boundary. And here we have languages like Python. I don't know whether we have anything here, in dynamic and unsafe; it's hard to say; I would say such a language has about as much typing as C. Okay. So let's talk about what C type annotations actually do. In Java, what do I say? In Java I say something like Foo p. What am I saying? I'm declaring a variable p that will store a reference to an object of type Foo, an instance of class Foo. So I'm really saying: p is a variable whose values will have some property, namely being instances of Foo. So when I declare a variable in Java, I declare properties of the values in that variable. When I say in C something like int x, what am I really saying? I'm really describing how the memory for x should look: I'm saying use four bytes. I'm not really saying anything about the values stored in those four bytes. Do you see the difference? So isn't that like assembly? In assembly you say: give me four bytes, give them a name, I'll use them however I want. Isn't the int declaration of x practically the same thing? Right, it is true that C does some type checking, but you also need to remember that the C you know and the C I grew up with are very different languages. It's good to look at old C, because it is a great example of a language that truly is almost a preprocessor for assembly language. It's a great one, a tasteful language, fantastic in its way, but its annotations describe the storage, you could say, not quite what values are in it. Of course, people later added more annotations to it. So here is assembly. And here, dynamic and unsafe, let's just say nothing can be here by definition and we are done with it.
All right. So how do you do static type checking? There are two phases, and there is not much more to it than that. Think about it this way: imagine you have a program with Foo p, and then somewhat later you have a use of p, say p.f. You get an AST that looks like this: a root, and somewhere here a declaration node that names the type, in this case the class, and the variable; and somewhere else in the tree an expression node, let's say this is the dot node, and here is the p and here is the f. This is a typical program. What do we need to do in symbol analysis? What do you think happens in the symbol analysis phase? Symbol analysis will be in green. Before we can check whether p.f is type safe, meaning that p is a variable that will point to some object and that object will contain an f field, what do we need to do first? We need to associate variable names with their types, right? So we'll associate this here. In symbol analysis we essentially just bind symbols to their declarations. And then, when I start doing type checking, which is the type analysis, I know that p is defined somewhere else in the tree and I can read off its type. That's it. You cannot do it in a single bottom-up pass, because you first need to collect those definitions; the definitions are mappings from names to types, or from names to declaration nodes, as we do here. So after this pass, what do we know? We know that p is mapped to this node.
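Here is a small sketch of that symbol-analysis step, assuming a toy AST with invented node classes and ignoring scoping (none of these names come from the actual 164 compiler; this is just the shape of the pass):

```java
import java.util.HashMap;
import java.util.Map;

// Toy AST nodes, invented for illustration.
class Decl { final String type, name; Decl(String t, String n) { type = t; name = n; } } // e.g. "Foo p;"
class Use  { final String name; Decl decl; Use(String n) { name = n; } }                 // e.g. the p in "p.f"

class SymbolAnalysis {
    // Symbol analysis: first collect all declarations, then bind every use of a
    // name to its declaration, so the later type-analysis pass can read off the type.
    static void bind(Iterable<Decl> decls, Iterable<Use> uses) {
        Map<String, Decl> table = new HashMap<>();
        for (Decl d : decls) table.put(d.name, d);        // pass 1: names -> declaration nodes
        for (Use u : uses)   u.decl = table.get(u.name);  // pass 2: resolve uses (null = undeclared)
    }
}
```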
And this information can then be used to associate this use of the variable p with that declaration. That's it. Now, what do we do in type analysis? This is a bottom-up propagation over the AST. Here we would propagate information such as: p is of type Foo. Remember that this left argument, p, could be arbitrarily deep in the tree, but as the type checking propagates up, it carries this information along, that p is of type Foo. What check do we do at this dot node? We check that p is an object, a reference variable, whatever the language calls it, and that the class of p contains f. What type do we propagate up? The type of f is propagated up, right? Notice that I do not need to propagate the knowledge that this is specifically p; I just need to propagate the fact that the value here is of type Foo. That's essentially it.
So, let's look at the more interesting parts. Some terminology first. The static type is the declared type of a variable. As we said, it constrains the types of the values that the variable can store. And the dynamic type is what? It is the actual type of the object that the variable points to at runtime. So in this program, if you look at the variable a, what is its static type? It is A. What is the dynamic type of a at runtime, after it is initialized? It is B. All right. So which of these assignments are legal? Let's start working towards type checking this program, assuming that a B was created here. Is this one legal? Yes. Is this one legal? Hmm. How about this one? This one clearly is. Well, we are not sure here; do we need another assumption? Okay, we assume that B is a subclass of A. If that's the case, then which of these are legal? This one is legal; this is essentially what we have here. And this one is not legal. More later on how the static type checker determines that, but first we need to define the notion of compatible types. It may seem complicated because we use all of three lines to define it, but essentially it says that a subclass is compatible with its superclass, and a class is compatible with itself. We say that type Y is compatible with type X when Y is either a subclass of X or the same class. We'll use that to define type checking. So this is legal because the type B is compatible with the type A, and this one is illegal because A is not compatible with B. That's it.
So here is an important invariant that we maintain with static type checking. It may seem like a trivial discussion, but soon we'll start talking about casts and arrays, and it will be good to come back to this and understand what we mean. Static types are interesting because of what they give you: a certain invariant on which the type checker relies. When we have a variable of static type A and later we assign to it a B, which was defined, say, this way, what do we know if this assignment passes the type checker? If the type checker approves the assignment, meaning it is type safe to perform it, what do we know? We have some important information, right? Exactly. So the type checker does not remember, later on as it goes through the program, that a B was assigned into a, and therefore it does not know that what a stores could specifically be a B object.
But what it does know for sure is that whatever a stores is an instance of A. Therefore, if an expression on a requires a field or a method that is in A, we will be okay. And so we say that we conservatively overestimate the dynamic types with the static types. Why overestimate? Because at this point a could store a B, a subclass, which has more fields and more methods. We lose that information, but we know that the value will definitely have everything that A has, because that is what the static type declares. If you like the theory view: by proving the program statically type safe, that it passes all the checks, we are essentially proving a theorem that when you run the program, a certain class of things will not go wrong. In particular, it cannot happen in a Java program that, when you do something like this, you get the error you could see in a Lua or a Python program. So what is that error? As a result of the program passing the static type checker, what kind of error could a Lua or Python program throw that you will never see in a Java program? Everybody should know the answer by now. Exactly: the object referenced by p, or to be more precise the object that p points to, has no method f. You could see an error of that kind in Python: the object referenced by p has no method f. So static type checking doesn't eliminate all errors; it will, for example, not be able to prove that p is not null, but it will prove the absence of this kind of error. And this is quite important if you want to put your software on a space probe and send it to Mars, where you have no way to just reboot it when a crash happens. It's pretty attractive to be able to prove that certain classes of errors won't happen, and static types are what does that for you.
Now, since you mentioned the null checks: could we design our type system, or extend the Java static types, in such a way that we catch null dereference errors at compile time too? All right, excellent. So maybe you would restrict the language so that references cannot be set to null, and you would also require that each time you create an object, any fields inside it that are references to other objects have to be set to non-null values. Sounds good to me. Can somebody foresee the problems? Otherwise it looks like: hey, that sounds good, let's just do it. Or I could invent a qualifier for types, say non-null Foo, and then whenever I write non-null Foo p, I know that p is never null, and therefore whenever I use p I never get a null pointer exception; plus, I don't have to check for it, so it's cheaper. So what problems do you see with insisting that these reference variables can never be null? We are now starting to talk, by the way, about funny interactions of language features that are sometimes hard to foresee. So I would like you to think about what may go wrong if we insist on no nulls in pointer variables. How about the people who haven't spoken yet? Uh-huh, yes, that's essentially the answer: the garbage collector collects garbage, which is the objects that are not reachable by any pointer chain, right? And often you disconnect objects that are to be collected by saying: I no longer need what this reference variable points to, and you set it to null. But now you would not have the ability to do that, unless we find some other workaround.
Perhaps we set it to some dummy object that is meant to serve as a null. It's still an object; you could still go and dereference it. Or you could do manual memory management, malloc and free as in C, but then you have another problem: you would have trouble guaranteeing type safety, because if I free an object and somebody still has a pointer to it, I can no longer guarantee anything about what that pointer refers to. Think of it this way: I have two variables, p and r, both pointing to the same object. Now I free r. A few seconds later I say p.f, but the memory for the object pointed to by p might have been handed to another object, and that leads to problems like: there was a social security number stored there before, and now you can read it even though you are not supposed to, because you have two pointer variables reading the same memory. We'll talk later in the semester about a fun way of exploiting this. In a typesafe language, you cannot free an object, keep a dangling pointer to it, and use that to read out memory. But if you flip a bit with a heat lamp, or some other way, say a cosmic particle, all of a sudden you can modify a pointer just enough to subvert the security of a virtual machine and read, or even write, contents that you shouldn't. I'll show you later in the semester how static types can be subverted with a little bit of hardware trickery. So you see how memory deallocation, the fact that you freed some memory, essentially breaks type safety. So we could use a dummy object instead of null, but then we have other problems: you no longer get a null pointer error, but you have a dummy object standing in for all these null references, and what should the values of its fields be if you happen to dereference it? You're only delaying the problem by one step.
Okay, it should be clear by now that if we are conservatively overestimating the types of variables, then the static type checker is going to reject some programs that are otherwise correct. What I mean is: if you took the program and ran it in Python or another dynamic language, it would be a fine program, but the static type checker rejects it. Here is one example: again the same assignment into a, of a value whose dynamic type is B. This is an error as far as the type checker is concerned, because the only thing the type checker knows is the static type of a, which is A. The type checker does not keep track of the dynamic type of a, and therefore it will say: as far as I'm concerned, A does not contain this field, and therefore this is an error. You can probably invent a bunch of similar examples of how type checkers are conservative and reject programs that would be fine in a dynamically typed language. What you want to remember is that the type checker only remembers the declared types of variables, so it knows that a has static type A, and it checks each operation independently of the others, so it does not track this flow of a B into a. There are more advanced type systems that do track it, but the one in Java, and most others, do not.
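A tiny Java illustration of this conservatism (the classes and members are made up):

```java
class A { int f; }
class B extends A { int g; }   // B is a subclass of A, so B is compatible with A

class Conservative {
    static void demo() {
        A a = new B();   // fine: static type of a is A, dynamic type of the value is B
        int x = a.f;     // fine: anything a may ever hold is at least an A, and A has f
        // B b = a;      // rejected: A is not compatible with B (a might hold a plain A)
        // int y = a.g;  // rejected: the checker only remembers a's static type A,
                         // and A has no g, even though the value stored here is in fact a B
    }
}
```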
Okay, so now, to test our understanding of what we are doing, let's look at this program and find all the statements in it that will fail the static type checker. So find where the error happens, and also tell me why the type checker rejects the program; it has a reason for doing so. The reason might not be obvious, but if the type checker said, hey, it looks good, let me compile it, then at runtime certain things could go wrong, potentially including hackers taking control of your program and executing code that they shipped in through input strings and the like. The first easy error is where? Bar of a, here, you mean? I'm not sure what... a Bar array, I see, okay. Let me change the color. Okay, so if we change b to an array of Bars, is that assignment now legal? Let's draw a picture. We have an a, which is an array of what? Of Foos, right? We have a b, which is an array that contains what? Bars, right? This is the static picture we have: a will store arrays of Foos and b will store arrays of Bars. And you could say, all right, let me read it this way: Foo is a subclass of Bar, we see that here, and therefore it may seem that an array of Foos should be a subclass of an array of Bars. After all, if I read a sub i, what do I get? I get a Foo. If I read b sub i, I get a Bar. So it seems safe to use an array of Foos where an array of Bars is expected, because the value of a sub i is going to be of a compatible type, a type lower in the hierarchy. So why is it that an array of Foos is not a subclass of an array of Bars, even though Foo is a subclass of Bar? What could go wrong? I see. The point you are making is interesting, but that's not the reason. If I understand correctly, you are saying that maybe Bar objects are smaller than Foo objects, and if you stack them into an array, the offsets of the individual elements come out different. Except that in Java these objects are essentially boxed; they live outside the array. The arrays are really just arrays of references to the objects, and references are always the same size. So that is not the issue. Well, let's see. So if we now do this, is this the problematic statement, where I modify an element of the array pointed to by b to point to an object of type Bar? Yes. So do people see why this is a problem? After this assignment here, what do we have? We have this situation: a and b point to the same array, because here we copied the reference to the array stored in a into b, and now if I assign into some element of b, this is what I get, right? Now, would you expect this next statement to succeed? Based on the static type of a, you would, right? Because a is an array, so I can access its elements with a sub i; it points to Foos, so you expect to find the field f there. So a programmer who looks at the type of a says this is a perfectly legal statement, and according to the invariant that the static type checker must maintain, it has to be legal, because a sub i gives you a Foo. Yet this element here, does it contain a Foo? It contains a Bar, the superclass, which does not contain f. So the reason this statement is made to fail by the static type checker is that otherwise you could have two variables pointing to the same array, where this one, a, expects it to hold only Foos, and b expects it to hold Bars; you could put a Bar object in through b, and then a statement like this one would fail. The error "cannot find field f" would suddenly be possible in a language that, thanks to type safety, is not supposed to make it possible.
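Here is the scenario in real Java syntax, with Foo and Bar as on the slide. One caveat: in the hypothetical checker we just discussed, the assignment b = a is rejected outright; actual Java accepts it and instead guards the invariant with a runtime check on array writes, which is exactly the design choice discussed next.

```java
class Bar { }
class Foo extends Bar { int f; }   // Foo is a subclass of Bar

class ArrayCovariance {
    static void demo() {
        Foo[] a = { new Foo() };
        Bar[] b = a;          // Java accepts this: Foo[] is treated as a subtype of Bar[]
        b[0] = new Bar();     // compiles, but throws ArrayStoreException at runtime,
                              // because the array object really is a Foo[]
        int x = a[0].f;       // if the store above were allowed to go through, this read
                              // would find a Bar here, which has no field f
    }
}
```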
So what I told you is not the complete story; it leads to different design decisions. You don't have to reject the program right at this point. What I showed is a scenario where, if we continue with the program and perform this assignment, we have broken the invariant that this array here is an array of Foos. So one way to protect against the danger is to reject the program right here. Another option is what? To be a little more dynamic. Remember, at the beginning we said that Java is not a completely static language, because it does not do all its checks at compile time. This program, in particular, would not be rejected here: the static type checker allows the program to proceed and lets the two variables point to this array. And what does it do then? Does it pray that nothing bad happens at runtime? Of course not. It puts a runtime check where? On every write into an array. When you read, it's okay, because everything in memory is already guaranteed to be type safe: arrays of Foos contain Foos. But if you want to modify an element of the array, the value you store should be a Foo, so the check is: am I writing a Foo into an array of Foos? So effectively we have delayed the danger, which would otherwise be prevented by statically rejecting the program here, to runtime, by placing runtime checks on writes into arrays. That of course slows down array writes a little, but it allows you to write more flexible programs that are not rejected as easily. This is essentially the discussion we saw before.
So let's close with a discussion of downcasts. Why would downcasts be needed in a language like Java? Can you think of programs you would not be able to write if downcasts were removed from the language? Okay, so you could still convert a double into, say, a single-precision float by calling a library procedure that receives a double and gives you back a float. Your library would need to contain these conversions, but you could still do it at reasonable cost; it's more elegant with a cast, but either way you are doing the conversion explicitly, whether with a cast or a library call. What might be a more serious limitation if we removed casts? Essentially, what we are saying is that programming with collections like hash tables is convenient because you do not need to re-implement those data structures for every element type. The Java library is full of sets and hash maps that work for values of any type; usually they are declared as storing Objects. (It's different with generics, although those have their own problems.) So when you do something like a hash table get with some key, the type you get back from this expression is Object, because the collection, the hash table, is defined to work with objects of any type. And then, when you get the value out, you need to convert it down to Foo. So what does the static checker know? It knows that the static type of this call is what? It's Object. And the static type of this, the cast expression? It would be Foo. So the static type checker says: fine, you are casting this value of type Object to the same value, but with more guarantees about it. The static checker knows that the value that is the result of the entire expression will be a Foo, whereas at this inner point it knows less: it only knows that this is some kind of object. It still knows something; it knows, say, that it is not a plain integer.
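A sketch of that hash-table pattern, written pre-generics style with a raw Hashtable and an invented class Foo (how the cast is actually made safe is the question we turn to next):

```java
import java.util.Hashtable;

class Foo { int f; }

class DowncastDemo {
    static void demo() {
        Hashtable table = new Hashtable();   // declared to store Objects of any type
        table.put("key", new Foo());

        Object o = table.get("key");         // static type of this expression: Object
        Foo p = (Foo) o;                     // static type of the cast expression: Foo
        int x = p.f;                         // accepted: the checker now knows p holds a Foo
    }
}
```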
How will the static type checker ensure that the result of the entire expression, with the cast in front of it, is indeed a Foo? It doesn't actually do any conversion of the value, right? What does it do? It inserts a dynamic check. That is how these casts are compiled: essentially, you materialize the cast into a little dynamic check. What does that check do? It looks at the tag of the value, and if it is a Foo, everything is fine; otherwise it throws an exception. But now it allows you to prove that p is indeed of type Foo, and when we then do p.f, are we doing more checks down there? No, because we have already checked, each time p was assigned, that p has the right type, and at that point this access is free of any further dynamic checks. Okay? And now you can see how, with casts, the static type checker could again reject some correct programs, or could it not? Well, what if we have, again, Bar, and a subclass Foo. If I cast to Bar and store the result into a variable b, and now I do b.f, where f is in Foo but not in Bar, then again the static type checker rejects this, because the dynamic check only proved that b is a Bar; it didn't prove that it is a Foo, even though it might have been. So again: I create an object of type Foo, which contains an f field; I insert a dynamic check, checking that it is a Bar; b now has static type Bar; and here this will fail, even though b will actually contain, at runtime, a value of type Foo. The static type checker could do one optimization here, which is to remove this dynamic check, because it can see that the right-hand side is definitely of type Foo, so it already knows the check will succeed. So we'll continue with static checks in the next lecture, looking at static checks in different kinds of programs, where more inference will happen. We'll use our Prolog to do some type inference, so that you understand how inference works. And later in the semester we'll get to the exploits of static type systems via hardware failures. Thank you.