Thanks very much for coming to this workshop. I myself wanted to go to the PureScript React one, but what are you going to do?

Okay, so where shall I begin? Where I should begin is by telling you that this entire talk is online, and I'm doing this rather weird thing where you can actually edit the slides as we go along. Okay, not my slides; you will not be editing my slides. But if you point your browser to that URL at the top, you can see all these slides, and as we go along there will be little code windows that appear, and we'll be filling in code as we go. So I strongly encourage you to go to that URL and follow along as I do the talk. Also, unlike Richard's lovely workshop, which I ought to have learned a lot from, except I was too busy fiddling with my CSS to get everything right, there are going to be questions as we go along; I'm just going to be throwing out exercises as we go. So, TL;DR: just go to that URL and scroll down with me as we go along.

Okay, so what exactly are refinement types, or what is the purpose of refinement types? At a high level, what we want to do is the kind of thing that you'd want to do with something like Idris, but we want to make it a lot more automatic and a lot less... Yes? It's not working? Oh, you lost the URL, you are right. Sorry, my bad. I should just keep that thing up there. Okay: ucsd-progsys.github.io. It's also on the LambdaConf... Yes?

There is a branch, but we're having some trouble upgrading, so unfortunately the short answer is no. You'd go to the GitHub and switch your branches, and it's a bit convoluted, so I wouldn't recommend it just yet. It's something we're actively trying to do; I couldn't get it done in time for today. Okay. Can I switch the URL? Okay?
No, no worries. Okay, good. I'm just going to do this locally, because it'll be a little faster and I don't want to have to deal with any concurrency issues arising. Are you guys okay? Fantastic.

So here's the general plan. As we all know, well-typed programs can go very, very wrong. Here are some of the favorite examples. Here's the dinkiest example that we've all had, and really, I'm not so excited about division, except that it's an easy way to get the ball rolling. If I write a little average function, I try to divide by the length of the list. It's all very well if the list is not empty, but God help me if the list is empty: I get this nasty runtime error.

Most of you probably haven't seen that kind of thing, but I bet you have seen this kind of thing. This is the one thing that I get really irritated by, the one source of null-dereference-style errors in nice languages like Haskell and OCaml: you create a little map, and then you look up the map with various keys. For example, here I create a map with "haskell" and "javascript". If I look up the map with the key "haskell", I win. But if I look up the map with some key that's not in it, then it blows up, right?
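For readers without the slides, here is a minimal sketch of the two failure modes just described. The map values and the name `safeFind` are made up for illustration; only the keys "haskell" and "javascript" come from the talk.

```haskell
import qualified Data.Map as M

-- Integer average. `div` raises a divide-by-zero exception at runtime
-- when the list is empty, even though the function type checks.
average :: [Int] -> Int
average xs = sum xs `div` length xs

-- A small map with two keys (the values are placeholders).
langs :: M.Map String String
langs = M.fromList [("haskell", "lazy"), ("javascript", "eager")]

-- `langs M.! "python"` would raise a runtime error for the missing key.
-- M.lookup is the total alternative: it returns Nothing instead.
safeFind :: String -> Maybe String
safeFind k = M.lookup k langs
```

Both `average []` and `langs M.! "python"` are accepted by the ordinary type checker and only fail when run, which is the gap the rest of the talk sets out to close.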
So here's yet another; it's a very small example of something where it's easy to make mistakes and have things go wrong.

Here's another example that is more of the same kind, but where the consequences are more drastic. You can actually try this: if you fire up GHCi, you can load up Data.Vector, and then you can create a little vector with two elements, "haskell" and "javascript". You can index it at position zero and it works out fine. You can even index it at position one and it works out fine. But if you index it at position three, then you get a nasty segfault. So it's a high-level language and what have you, but in fact you get a segfault.

And of course, as you know (I give this talk elsewhere), in these kinds of situations you're actually lucky when you get a segfault, because then your machine stops running the program, it crashes, and life is good. The more pernicious situation is when you don't get the segfault, and the thing just keeps happily running, violates all your nice type-safety guarantees, and can possibly leak various bits of information. Again, here's something you could try at home on your machines: load up Data.Text.Unsafe, create a little string, and then you can suck out prefixes from that string. For example, if I pull out the first five characters, I get "Lambda"... wait, that's not right, I should only get "Lambd". There you go.
But at any rate, if I try to pull out 20 characters, then I die, right? Well, actually, I don't die. What happens is it works just fine, because at its lowest levels Data.Text is built on top of C. It's manipulating a bunch of pointers, and what's going to happen is it just happily goes and reads the next, whatever, seven bytes that sit after the buffer where "LambdaConf" was placed in memory. And this is how you could very well get your Heartbleed-type errors in your high-level languages.

So really, one of the things we want to do with dependent types, and for example the talk we saw this morning, is precisely to prevent these kinds of errors. As Richard put it, the more information you pack into your type system, the greater the confidence you have about things not going wrong. But what we'd really like is to not have to constantly coach the type checker about very simple facts about arithmetic and about comparisons; for example, that if x equals y and y equals z, then x equals z. We would like to automate all these boring bits as much as possible, and that's really what our goal is. What we want to do is develop a dependent type system that will let you catch all these kinds of things at compile time, but without having to break everything down into tiny steps for the type checker.

For those of you who were at the Idris talk: the reason you have to spell all these things out to Idris is that, primarily, the underlying decision procedure is unification. That's the way I think about it. And unification is fantastic for many things; for example, the kinds of things Richard demonstrated with GADTs, unification is really, really good for. But there are a lot of other theories for which unification is really not the best idea. So we're going to use a somewhat different technology to make all of our machinery work, and this
has its pros and cons, and I want to walk you through what you can do with it.

Okay, so here's the plan of the talk. I just gave you some very high-level motivation. I have no idea how much of this we're going to get through, but we shall see how this goes. What I'm first going to do is give you a very high-level overview of what exactly refinement types are anyway. How does one write a refinement? What does it even mean? Then I'll first describe very simple refinement types, so things like integers, and then we'll look at more interesting data structures like lists and maps and so on. Having given you these very basics, I'll illustrate how one works with LiquidHaskell, with three increasingly sophisticated examples.

Okay, so let's jump right in. I imagine there are no questions just yet, but there will be, no doubt, in just a second. So I click, click, click, and here we go: simple refinement types. For those of you who just joined, by the way, you should go to this particular URL, which is in the LambdaConf GitHub repo that John sent a link out to, and you should just follow along, because we'll be doing the exercises as we do the talk.

So what is a refinement type? Quite simply, what we're going to do is take plain old Haskell types and dress them up with logical predicates. And the idea is that these predicates are not arbitrary predicates. They're not arbitrary user-defined functions.
They're going to be drawn from well-behaved logics, and when I say well-behaved logics, what I mean is that these are logics that have been studied for really nearly a hundred years now, for which we have completely automatic decision procedures that work very, very fast. Every so often I'm going to say this word, "SMT solver", and in your head you should just think of an SMT solver as a black box that can answer questions in these specific logics. There's no black magic: all the questions we're going to ask the SMT solver are things that are completely decidable. There's nothing undecidable going on. We're just going to constrain the logic so that you can prove properties about your programs while remaining automatic.

Okay, so for those of you who like this sort of thing, let me give you a high-level description of what I meant. Here's a little grammar. We start with just base types; for now just imagine Int, Bool, String, what have you (well, not String, actually), and a, b, c are just type variables. I'm going to write B for basic types. And this is the thing you want to keep your eye on: a refinement type is going to essentially look like this. We either have a basic type which is refined with a predicate p, and I'll tell you what a predicate is in just a second, or it's going to be a function type, where we say that you take as input some variable x which is of type t, and you produce as output some other type t.

These predicates are from decidable logics. Really, the purpose of this talk is to give you a whole bunch of examples, but for the sake of completeness, let me describe what I mean by predicates. Predicates are going to be things like: an expression equals an expression, or an expression is less than an expression, or "and", "or", "not", et cetera, of predicates.
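For readers without the slide, the grammar just described can be written out roughly as follows. This is a reconstruction from the spoken description rather than the slide itself; the metavariable names $B$, $t$, $p$ and $e$ follow the talk where it names them, and $e$ stands for the expressions of the refinement logic.

```latex
\begin{align*}
B &::= \texttt{Int} \mid \texttt{Bool} \mid \dots \mid a,\, b,\, c
  && \text{base types and type variables} \\
t &::= \{x : B \mid p\} \;\mid\; x : t \to t
  && \text{refined base type, or function type} \\
p &::= e = e \;\mid\; e < e \;\mid\; p \wedge p \;\mid\; p \vee p \;\mid\; \neg p \;\mid\; \dots
  && \text{predicates}
\end{align*}
```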
What do I mean by expressions? Well, here's what I mean. Variables, so these are literally program variables x, y, z. Numeric constants: zero, one, two, three. You can add expressions, multiply them, and this last one is something to keep an eye on, because this is where we're going to pack a whole lot of power into the type system. It's what is called an uninterpreted function. Don't worry about it; we'll look at lots of examples as we go along.

The high-level bit is that the logic we're in is decidable, in the sense that if p is a predicate in my logic and q is another predicate in my logic, then there are very efficient algorithms that will decide whether p implies q. That's the high-level bit, and these algorithms have been known for a long time.

Okay, good. So let's jump in and look at a very simple example. This is the simplest possible refinement type that I could think of. It's a type that I'm calling Zero, and the reason I'm calling it Zero is that it describes the subset of integers that are equal to zero. Let's look at the notation. It says v : Int, so it's a value v which is of type Int, such that some predicate holds, and the predicate here is just v equals zero. There's exactly one value that is equal to zero, and that's the number zero. So what's going to happen if I push play over here (which those of you who are following along can do too) is that it's going to grind along and say: yes, that's right, zero is in fact of type Zero. Now, if I were to tweak this, for example if I were to replace zero with 20, then it would not be so sanguine. It would say: certainly not, 20 does not satisfy the predicate v equals zero, so please go away.

Okay, so far so good. This does nothing. There's no magic here.
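As a sketch of what the slide's code window likely contains, the Zero example can be written like this. The binding name `zero` is an assumption; the refinement lives in a `{-@ ... @-}` annotation, which GHC treats as an ordinary comment, so this compiles as plain Haskell whether or not LiquidHaskell is run on it.

```haskell
-- Refinement type alias: the Ints that are equal to zero.
{-@ type Zero = {v:Int | v == 0} @-}

-- LiquidHaskell is expected to accept this binding at type Zero;
-- replacing the 0 below with 20 should make it report an error.
{-@ zero :: Zero @-}
zero :: Int
zero = 0
```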
It's totally straightforward.

The other thing to note is that I'm going to write refinements in these... what are these? I guess these are Haskell comments. This was a design decision, where we intend this to be what I think people these days call an optional type system. If you're not using our stuff, it's just plain old Haskell code. So all the stuff that LiquidHaskell looks at is in comments.

Okay, fantastic. So a lot of things are of type Zero. (I promise you we will move along.) For example, one minus one is also of type Zero. This is because of what happens underneath the hood: I'm going to get an expression, one minus one. There's no unification. We just throw it at the SMT solver, and we ask it a question like: hey, I have in my hand a value that is equal to one minus one; does that imply that the value is equal to zero? And the SMT solver is going to say: yes, of course, I can do that. And so we don't get that little pink thing.

Okay, good. So here's a slightly more interesting example. I might define the set of natural numbers. I would say type Nat (you can call it whatever you like, but Nat is a good name) is equal to the integers v such that v is greater than or equal to zero. And now I can say something like this. I said we're only going to look at simple types, so I'm cheating, but you get the idea: I'm going to say nats is of type list of Nat, and I have zero, one, two, three over there. All of those are in fact Nats. Totally straightforward, so far so good.

Okay, let's move along. How about this? This is the first exercise.
So I hope some of you are following along. Here's a type called Pos, which actually looks just like Nat: v : Int such that zero is less than or equal to v. The first question is: can you fix Pos so that this variable poss is rejected? Because I want this to be only the positive numbers, and clearly zero is not a positive number. So how should we fix the definition of Pos so that this first thing gets rejected? What's that? Remove the equals. Thank you very much. So let's remove the equals. If I nuke that and run it again, it's going to say: okay, I don't like that list anymore, because that's certainly not a list of positive numbers.

If you're following along, you can also put your mouse over this little yellow check and it'll give you something like this. This may be too much information at this point, but it's going to say: well, the type I have inferred for poss is, blah blah blah, a list of things that are greater than or equal to zero, which is not a subtype of strictly greater than zero. We'll look at what this means in just a second.

At any rate, the next part of the exercise was: can you modify poss so that it is now accepted using this definition of Pos? Perhaps you can suggest what we would do, or anyone else for that matter. It's pretty obvious: remove the zero. Thank you. It's pretty straightforward; we don't like zero, that was the purpose of this whole thing. So let's remove our friend zero, and with any luck everything is happy now. Fairly straightforward, right? Yes? Sorry? To put a one over here? Absolutely. So we might have said this, if that's what you're wondering, and that would have had exactly the same result. And in fact, clickity-click, there we go.

Okay, fantastic. So how exactly did this work? What just happened?
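The Nat and Pos exercise can be sketched as below, in its fixed form. This is a reconstruction from the spoken description: the element values of `nats` are the ones read out in the talk, while the exact contents of `poss` after removing the zero are assumed. As before, the annotations are comments to GHC.

```haskell
-- Natural numbers: Ints greater than or equal to zero.
{-@ type Nat = {v:Int | 0 <= v} @-}

-- Positive numbers: the exercise's fix is to use a strict inequality.
{-@ type Pos = {v:Int | 0 < v} @-}

{-@ nats :: [Nat] @-}
nats :: [Int]
nats = [0, 1, 2, 3]

-- With the strict Pos, a list starting with 0 would be rejected,
-- so the zero has been removed here.
{-@ poss :: [Pos] @-}
poss :: [Int]
poss = [1, 2, 3]
```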
Okay, so I don't want to get into too much detail about how this works, but I still want to give you a rough idea of what's going on. So let me give you a very high-level overview of how refinement type checking works. The key thing to bear in mind is that, unlike in, let's just say, usual type systems, a term can have many, many different types. This is actually a very important thing, in my opinion, once you get into fancier types like this.

So what do I mean? Let's go back to our friend zero. It already has two different types. The number zero satisfies the predicate v equals zero. It also satisfies the predicate v greater than or equal to zero. And maybe it satisfies a predicate that says it is even, and it satisfies a predicate that says it is between minus five and plus five, et cetera, et cetera. So a term has many, many different types, and I don't want to lock it in and give it a single type.

The way that we're going to connect all these different types is via a notion of subtyping, specifically what is called predicate subtyping. So here's what it means. (I promise you there will be no more Greek after two more slides. It'll just be Haskell.) We work in an environment gamma, where an environment is just the list of variables currently in scope; say I have variables x, y, z currently in scope. In that environment, I say that type t1 is a subtype of t2. That's just the notation. So when is a type t1 a subtype of t2? If the following holds. Imagine I have a bunch of definitions currently in scope: I have variables x1, x2, x3, each of which satisfies some predicate, or refinement, p1, p2 and p3. Then the type "something such that q" is a subtype of "something such that r" if, when you conjoin all these p's (you take their "and") and you take q, that implies r.

(Where did that piece of chalk go?)
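Written as an inference rule, the predicate-subtyping condition described above looks roughly like this. It is a reconstruction from the spoken description, not the slide; each $p_i$ is to be read as the fact known about the in-scope variable $x_i$, and $B$ is the shared base type.

```latex
\[
\frac{(p_1 \wedge \dots \wedge p_n \wedge q) \;\Rightarrow\; r}
     {x_1 : \{v \mid p_1\},\ \dots,\ x_n : \{v \mid p_n\}
      \;\vdash\; \{v : B \mid q\} \;\preceq\; \{v : B \mid r\}}
\]
```

For example, with an empty environment, $\{v : \texttt{Int} \mid v = 0\} \preceq \{v : \texttt{Int} \mid 0 \le v\}$ holds because $v = 0 \Rightarrow 0 \le v$ is valid.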
We know that "v equals one minus one" is a subtype of "v equals zero", because v equals one minus one implies v equals zero. Pretty straightforward; there's no terrific magic. So let's look at why zero has all these different types. The reason zero is of type Nat is that zero satisfies v equals zero, which implies v is greater than or equal to zero. And so for this reason we can end up viewing the same value at many different types.

The nice thing about using SMT solvers is that we aren't going to have to fiddle around with these implications, and this is in a sense where all the heavy lifting happens. You and I, that is to say the programmer, are not going to have to worry about: does this imply that? Do I have to cast this into that? Do I have to convert something to the other? No. The SMT solver is just going to kick in and do all these implication queries for us whenever we need them. So we can pretty much forget about when those things happen.

Okay, so let's move on. I just showed you these really simple things, like the number zero and a list of integers. Let's look at functions, and here's an interesting-looking type. Let's stare at this for a second, because it's slightly counterintuitive. What is going on here? I'm defining a function called impossible, and impossible takes as input a string and just calls error on that string. The purpose of impossible is to be a function that denotes all those cases where I want to say: this will never, ever happen at runtime. So what I want to say is that this function should never, ever be called, and the way I specify that is I give impossible an input type of "v such that false". By the way, just as Richard pointed out, GHC has these nice typed holes. We've also had them for a while.
It's pretty convenient: we get to not have to write the String. So here's what I'm saying: impossible takes as input a string such that false, and returns any a. And here's the deal: what satisfies false? Nothing satisfies false. What that means is that you cannot call the function impossible. So if your program type checks and it has impossible in it, you know that that particular thing is dead code, and it will never happen.

Okay, so let's use impossible. Here's a quick exercise. I want to write a safe division function, which takes two integers and returns an integer, and what I want to say is that safeDiv is never called with the input zero. So can someone tell me how I would tweak the type of safeDiv to promise that I will never call safeDiv with a zero denominator? Excellent. So all I have to say is that the second Int cannot be zero, and there are various ways I might do that. The simplest one is to say something like: v : Int such that v is not equal to zero. That would certainly work, and now we are guaranteed that it will never be called with zero. So what it knows is that, essentially, that zero case never happens.

And the reason it all type checks is that what you know is that d is not equal to zero, and when you do the pattern matching, what you get is that d equals zero. Together, "d is not zero and d equals zero" implies false, because that's a contradiction, which is why it says: okay, you can call impossible, because in fact that's never going to happen. The only thing that implies false is false itself, and we've established a little contradiction there. So this might be slightly counterintuitive, but it kind of makes sense: you want to say that safeDiv is not
called with zero. Okay, fantastic.

As a consequence, these things work out quite nicely. Imagine I want to compute the average of two numbers. I would say avg2 x y is equal to safeDiv of x plus y by two. This is going to work out because two satisfies the predicate "not equal to zero". The reason it works out is that v equals two implies v is not equal to zero, which means the type "v equals two" is a subtype of "v not equal to zero", and so the type checker is happy about this. Does it make sense?

Okay, let me ask you this; here's a quick exercise. At this point people always say: yes, yes, but where do you get these values from? In the real world, numbers come in from the user, right? So here's a little program. You could actually run this (not that I will; it's not terribly exciting). What it does is ask the user to enter a numerator n and a denominator d. So n and d are arbitrary numbers, because the user is going to punch them in, and then it calls safeDiv n d. And notice over here that the thing is not happy. Why is it not happy? If we go and look at the error message, it'll say: well, here's the deal, the only thing I know about that d is that it's of the type Int such that v equals d, and I need it to be of type "v not equal to zero", and there's no reason for me to think that d is not equal to zero. I can't prove that this implication holds. So what gives? Can anybody suggest how we would tweak calc so that this type checks?

A function that is going to return something? Okay, and how would we write such a function? Fine. So you propose that we write a function, let's just call it foo, that takes as input some sort of integer or whatever, and then does what? It returns a Maybe
I think it would just make it Ah, it would have to return maybe of an int that's definitely not equal zero Okay, and then we should only call safe div if that if this function returned What do you call it right a non-zero value? Okay fine So how we write such a function so foo of zero would be nothing presumably right and foo of Let's just say n or whatever D. Let's call stick with D is just of D right and now maybe what I would do is I would write something like Pop up up up. Here we go Foo has the type it takes as input an integer and it returns as output an integer that is not equal to zero Oh, and a maybe of that Like so right. Oh, and then of course we have to do some pattern matching and whatnot up here How should we do that pattern matching? This is getting complicated, but you get the idea. Let's do it case. Oh my goodness foo D off Nothing, so we want to say something like put sterlin shame on you Okay, and if I get just of D prime or whatever now I can do put sterlin of Just of D prime Okay, fantastic. Let's see if that works Sweet. Okay. The pinkness has gone away. So that should certainly make you quite happy or makes me quite happy There's also something simpler one can do which has the same effect more or less Which is to just put in an old-fashioned if statement, right? Oh, did I say statement? Sorry. I meant an expression I've been teaching undergraduates. Forgive me. Okay You might just do something like if D is not is equal to zero if this has exactly the same It has exactly the same effect right then I would just say put sterlin shame on you Frankly, I prefer your solution, but I may as well show you this as put sterlin the real thing There we did I get that wrong oh right there you go. 
It's grumbling that it doesn't know what d' is. Exactly. Well, to cut a long story short, if you know a little bit about how GHC works, it essentially converts the thing that we have here into something that looks like a case-of. The way GHC works, all ifs actually get converted into "case blah of True or False", and that's how it all works out. Okay, fantastic; either way, it works.

But now here's a bit of a bummer. Here's another version. Here's average, but I want to compute the average for any list. I have a list xs, and I just want to compute the average of all of those. So I get total, which is the sum of all the elements in the list, and n is the length of xs, and now it doesn't like it. Why is that? We can go and stare at the error message, but essentially it says: well, n is the length of the list xs, but I need it to be non-zero, and of course it could be zero. There's no reason it couldn't be zero. So can anyone make a suggestion? How can we appease this? Is there any way you can think of? Handle the empty list specially; we somehow special-case the empty list. Excellent.

So in fact, that's what we'll do. We'll show you the real way to do this, but for now let's try this approach, which is essentially what Gershom suggests. What I'm going to do is write a size function that is guaranteed to return a positive number. And the reason it's guaranteed to return a positive number is that we are pattern matching on the empty list, and we just somehow pretend it doesn't happen: trust me, I'm not going to call it. Of course, we really don't want to do that. So let me uncomment this. Okay, let's just uncomment it.
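A minimal sketch of the size function described here, together with the list average that the speaker goes on to repair by calling size instead of length. It is a reconstruction: the equations for `size` and the error messages are assumptions, and the annotations are unchecked comments as far as GHC is concerned.

```haskell
{-@ impossible :: {v:String | false} -> a @-}
impossible :: String -> a
impossible msg = error msg

{-@ safeDiv :: Int -> {v:Int | v /= 0} -> Int @-}
safeDiv :: Int -> Int -> Int
safeDiv _ 0 = impossible "divide by zero"
safeDiv x n = x `div` n

-- Promises a strictly positive result. The empty-list equation is the
-- one the checker cannot yet justify: nothing stops a caller passing [].
{-@ size :: [a] -> {v:Int | v > 0} @-}
size :: [a] -> Int
size []       = impossible "size: empty list"
size [_]      = 1
size (_ : xs) = 1 + size xs

-- With size in place of length, the safeDiv call itself is safe,
-- but the obligation has moved to the impossible case inside size.
average :: [Int] -> Int
average xs = safeDiv total n
  where
    total = sum xs
    n     = size xs
```

This only shifts the problem: the remaining step is a way to state that average is called only with non-empty lists.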
Now, if we uncomment this, what it's going to say is: well, sure, you'll only get back a positive number, but I can't make any bets about this impossible. But if we do this (let's leave it uncommented, and we'll come back and fix this in a second), then one of the nice things that happens is I can now tweak this to be what? How shall I modify this code so that it works out? Remember, size is guaranteed to return a positive number. So how shall I fix average? It might crash, but let's assume that won't happen. Yes? Excellent: I simply change length to size, and now it's all good.

But of course, this is slightly sketchy, because, yeah, sure, it's not going to crash over here. This safeDiv call is certainly safe; the value n is definitely non-zero. But we might end up with a problem over here. So really, what I want at this point is some way of saying: trust me, I will call average only with lists that are not empty. And if I call average only with lists that are not empty, then I can call size and be guaranteed that this horrible thing won't happen. So let's go on and see how that happens.

Okay, fantastic. Details, details, details, blah blah blah. But here's a quick recap of this first section, which is that I just showed you what a refinement type is; a very simple kind of thing, right?
It's just a type with this little predicate slapped on. Then, the way we specify properties is that we just write functions with either input types or output types that are refined. If the input type is refined, it means you must call the function only with values that satisfy that property. If the output type is refined, it means that the function is guaranteed to produce values that satisfy that property. And finally, the machinery of actually establishing that this satisfies that is all handled by the SMT solver.

Okay, so let's move on. Now, I know for a fact what the next file is called... actually, I don't, so let's just go back over here and click on data types. For those of you who are following along, let's hop on to the data types part of the show.

So the problem we had was this. Let's do everything from first principles, as it were. You could do this with GHC's built-in list, but for the purpose of this workshop let's do everything from scratch. What I'm going to do is define a list type which has either the empty list, called Emp, or cons, for which, for better or worse, I've chosen a sequence of three colons. A single colon is taken by GHC for its own cons, a double colon is taken for types, and so, well, we're stuck with three colons.
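The list type just described can be sketched as follows. The constructor names Emp and `:::` come from the talk; the fixity declaration and the derived instances are assumptions added so the example is convenient to use.

```haskell
-- Right-associative cons, so 1 ::: 2 ::: Emp needs no parentheses.
-- (The fixity level is an assumption, not taken from the talk.)
infixr 5 :::

-- A list built from scratch: either empty, or a head consed onto a tail.
data List a
  = Emp
  | a ::: List a
  deriving (Eq, Show)
```

For example, `1 ::: 2 ::: Emp` is the two-element list containing 1 and 2.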
So I'm assuming everyone's comfortable with this: cons takes an a and a tail, which is a list of a's, and so on.

Remember, the problem that we have is that we want to somehow say that a given list is non-empty. Now, those of you who saw the Idris talk, or who know a little about Idris or dependent types, know that the usual trick for dealing with this is to index the type of a list with its length. That is not the approach I advocate, and that's not what we're going to do. We're going to do something slightly different, and it's different for an important reason, but we'll get to the reason later.

Let me just show you what we do. What I'm going to do is let you write Haskell functions that have a very specific structure, and that structure is this: they are Haskell functions with a single equation per data constructor. So, for example, Emp and cons each get a single equation. The second important part of the structure is that all the things on the right-hand side, that is, the definition of the function, belong in the refinement logic. For example, zero is in the logic, plus is in the logic, one is in the logic, and length of xs is also in the logic, because, remember those uninterpreted functions: if xs is in the logic, then length of xs is also in the logic. We're going to call these things measures. It doesn't really matter what you call them. The important thing is that, having defined these measures, we can now use them inside our function definitions.

So here's what's going to happen, and it's actually somewhat related to what Richard talked about. When you define a measure, what happens is that it automatically changes the types of the data constructors under the hood. Internally, LiquidHaskell knows that the type of Emp is a list of a, as before, but where the length is equal to zero, just from that definition. And for cons, what it knows is that cons takes an a
and a list of a's, and it produces as output a list of a's, but whose length is equal to one plus the length of xs, okay? So it's kind of like the indexing, but not quite, and we'll discuss why that matters later. Let's just see what you do with these lengths. Okay, so here's how you use a measure, and rather than me telling you how to use it, why don't you tell me? So here's head and tail, which from birth we are trained not to use because they're notoriously partial. But let's see, can you tell me how we would tweak the signatures for head and tail so that these impossibles wouldn't happen? Right now this is just a Haskell signature, right: List a goes to a, List a goes to List a. If I push play, it'll say that's no good, because both of those cases might fire. So how can we tweak the signatures for head and tail to say: trust me, I will only call head with good lists and I will only call tail with good lists? What would those good lists look like? What's that? Go on... length of something? Fantastic, there we go. That's exactly what we do. We would just tweak this to say... well, c is as good as anything else, but why would we call it c?
It's some list xs such that the length of xs is greater than zero. You could put in not-equals as well, but, you know, it's got to be greater than. And over here you would do the same thing, la la la: a list of a such that length of xs is greater than zero. But now you notice we've typed almost the same thing two times, so instead we might as well define this thing as a separate alias, much like Haskell's aliases. So I would say type List, let's just say NotEmpty, of a is equal to List of a... okay, so a List NotEmpty is a list of a's such that, and I'm going to just copy and paste this, length xs... this should not be xs, this should be v. And there we go, and now I can change this to just be List NotEmpty, and this can also be List NotEmpty. Oh, thank you, thank you. There we go. I really like these things to be lined up, and I'm kind of a whitespace bigot. Okay, so now those are all good, right? I promise that I will only call head and tail with non-empty lists. And so now, of course, what I've done is I've shifted the burden of proof onto those functions that call head and tail, and they must now guarantee, when they call head and tail (should they; why would you call head and tail, really?), that the lists are not empty. Okay, so on this next slide I've actually defined this alias, ListNE, and that's what we're going to use from now on. I like NE for not empty. And here we go, here's a function that's not as lame as head and tail, but which is nevertheless pretty useful, in my opinion, on occasion.
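For reference, the length measure, the non-empty alias, and the refined head and tail discussed above might be written roughly as follows. This is a sketch, not the speaker's slide: the spellings `Emp`, `:::`, `ListNE`, and the helper `impossible` are assumptions reconstructed from the audio, and the exact annotation syntax can differ between LiquidHaskell versions. The `{-@ ... @-}` annotations are ordinary comments to GHC, so the file compiles as plain Haskell too.

```haskell
import Prelude hiding (head, tail, length)

-- The talk's list type; constructor names are assumed spellings.
data List a = Emp | a ::: (List a)
infixr 5 :::

-- A measure: one equation per constructor, right-hand sides in the logic.
{-@ measure length @-}
{-@ length :: List a -> Nat @-}
length :: List a -> Int
length Emp        = 0
length (_ ::: xs) = 1 + length xs

-- Alias for non-empty lists, written once and reused below.
{-@ type ListNE a = {v:List a | 0 < length v} @-}

{-@ head :: ListNE a -> a @-}
head :: List a -> a
head (x ::: _) = x
head Emp       = impossible "head: empty list"

{-@ tail :: ListNE a -> List a @-}
tail :: List a -> List a
tail (_ ::: xs) = xs
tail Emp        = impossible "tail: empty list"

-- Hypothetical helper: its refined type has an unsatisfiable precondition,
-- so the checker must prove every call site is dead code.
{-@ impossible :: {v:String | false} -> a @-}
impossible :: String -> a
impossible msg = error msg
```

With these signatures the burden of proof moves to the callers: a call such as `head Emp` should be rejected by the checker, while plain GHC would only fail at runtime.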
I've used it. I will be the first to admit I've used this, right: it's foldr1. Okay, so foldr1 is foldr's little brother, where foldr1 assumes that the list actually has elements inside it. So foldr is good old foldr, right: it takes an a to b to b, an initial b, and a list of a's, and crunches it down to a single b. foldr1 is a sort of simpler thing. It assumes that the list actually has at least one element in it, and it just uses that first element as the thing to start doing the folding from. So foldr1 has this very nice type: a to a to a, a ListNE, and you get back an a. So far so good? Okay, fantastic. Let's move along. So now let's see, how would we tweak the type of average so that it type checks? There's a bunch of errors now, right? There's n, and n calls length, so that may be zero. I call foldr1 with xs, but foldr1 also expects a non-empty list. Let's see why it's grumbling over here. What is the problem? Same deal, right: blah blah blah is not a subtype of zero less than length. It requires a non-empty list. And that up there, we know what it is, it's the old story for div: I need a denominator that's non-zero. So what is the simple fix? How do I tweak average to make it happy? ListNE, right. Sweet, so I just make this ListNE, and now we can go home. Okay, so there we go. Now let me make a brief digression, very brief, right?
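A minimal sketch of foldr1 and the fixed average, under the same assumed names as before (`Emp`, `:::`, `ListNE`). The bodies are a plausible reconstruction rather than the code shown on screen; the point is only that requiring `ListNE` on average discharges both obligations at once, the non-empty argument to foldr1 and the non-zero divisor.

```haskell
import Prelude hiding (foldr, foldr1, length)

data List a = Emp | a ::: (List a)
infixr 5 :::

{-@ measure length @-}
{-@ length :: List a -> Nat @-}
length :: List a -> Int
length Emp        = 0
length (_ ::: xs) = 1 + length xs

{-@ type ListNE a = {v:List a | 0 < length v} @-}

-- Ordinary right fold over the talk's list type.
foldr :: (a -> b -> b) -> b -> List a -> b
foldr _ acc Emp        = acc
foldr f acc (x ::: xs) = f x (foldr f acc xs)

-- foldr1 seeds the fold with the first element, so it needs a non-empty list.
{-@ foldr1 :: (a -> a -> a) -> ListNE a -> a @-}
foldr1 :: (a -> a -> a) -> List a -> a
foldr1 f (x ::: xs) = foldr f x xs
foldr1 _ Emp        = error "foldr1: empty list (ruled out by the refined type)"

-- With ListNE on the input, n is provably positive, so `div` is safe.
{-@ average :: ListNE Int -> Int @-}
average :: List Int -> Int
average xs = total `div` n
  where
    total = foldr1 (+) xs
    n     = length xs
```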
So I just showed you the thing about ListNE. What we did is I said you can have lists, you can define their length and so on, and I can say certain functions take lists that are empty or non-empty, and so on and so forth. Often, what we want to do is... there's this very nice slogan, I like it a lot: making illegal states unrepresentable. So often you have invariants on your data, and you would like to define your type in a way that makes it essentially impossible to construct invalid values of that data type, where invalid means those which do not satisfy some desired invariant. For example, and this is not the most compelling example, we'll see more interesting ones later in the talk, but I want to throw this out there: imagine I wanted a data type that represents a sequence of values, one for every month of the year. Right, so I would say data Year of a is a Year of a list of a's. But since it's supposed to be a year (humor me in this contrived example), I want to rig it so that that list has size exactly 12. Right, so I don't want you to be able to say just a list with two elements; it has to have all 12 elements. Okay, so here's what you can do: you can actually create a refined data definition. That's the thing up on the top over here. Here what I'm doing is I'm defining a type alias called ListN that takes a, which is the thing it's a list of, and n, which is the size I want: it's a list of a such that length v is equal to n. And now what I'm going to say is data Year a is simply a Year of ListN of a of 12. It's pretty obvious, right? But now the nice thing that I get is that illegal states have become irreparably unrepresentable. For example, this is a year that has just one element in it. That's no good, right? So the only way to fix this would be to actually put in 12 different elements. Does that make sense?
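The refined Year definition described above can be sketched like this. The names `ListN`, `Year`, `goodYear`, and the sample values are assumptions for illustration; the refined `data` annotation is a comment as far as GHC is concerned, so only LiquidHaskell would enforce the 12-element invariant.

```haskell
import Prelude hiding (length)

data List a = Emp | a ::: (List a)
infixr 5 :::

{-@ measure length @-}
{-@ length :: List a -> Nat @-}
length :: List a -> Int
length Emp        = 0
length (_ ::: xs) = 1 + length xs

-- Lists of exactly N elements; value parameters of aliases are upper-case.
{-@ type ListN a N = {v:List a | length v == N} @-}

-- Refined data definition: the constructor only accepts 12-element lists.
{-@ data Year a = Year (ListN a 12) @-}
data Year a = Year (List a)

-- Accepted: exactly twelve entries (made-up sample values).
goodYear :: Year Int
goodYear = Year (1 ::: 2 ::: 3 ::: 4 ::: 5 ::: 6 ::: 7 ::: 8 ::: 9 ::: 10 ::: 11 ::: 12 ::: Emp)

-- The checker would be expected to reject this one-element year:
-- badYear :: Year Int
-- badYear = Year (1 ::: Emp)
```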
Okay, so let's just work with these years for a little bit. This is a slightly longer example. I'm defining a data type called Weather; Weather has two fields, temperature and rain. And what I want to do is write a function called tempAverage that takes as input a year of weathers and returns a single integer, which is going to be the average temperature over all the months of the year. Pretty straightforward, right? So what I'm going to do is: I have a Year, which is a collection of months, ms, and I just call average on months. Why ms and months? The idea is I call the map function to suck out the temp from each month. Does that make sense? Yes, I'm just running map on each of these weathers to suck out the temp, and then I want to call average on that. So first of all, can someone tell me what's the problem here? What's up with this pinkness? What's up with the error? Fantastic. So remember our old friend average. Well, I may as well show you this; I was very proud of it when I got it to work. So remember our friend average: we gave it the type that says it must take a list of integers whose length is greater than zero, and then it'll give you back an Int, right, because it was calling foldr1 and whatnot. So this is one of those cases: we're now at the call site, the place where we call average, and we need the list to be non-empty. Very well. And right now we don't know that months is non-empty. Well, we know that ms has length 12, so what gives? What's the problem? Exactly right: map lost that information. And what is the type we want? We want to say that if you give map a list with a hundred elements, then the output also has a hundred elements, right?
So, well, here we go. The way I'm going to tweak the type of map is: it takes xs, which is a list of a's, and you get back a list of b's whose length is... what? What shall I put in here? Exactly, it's in fact exactly equal to the length of this list xs, right: length xs. Okay, so there we go. So now the type of map is going to change, because we've given it this nicer type. Okay, so here we... oh, it's scrolled off the top. Oh, it hasn't, it actually fits: a to b, xs, which is a list of a's, and you get as output a list of b's whose length is equal to the length of xs. And so this is in fact a dependent type, in the sense that the output refers to the value of the input. Yes? Is there a way to reuse...? I'm not sure I follow your question. Oh, can you later add on the length fact, is that what you're saying? We would like to do that, but not yet; that's the short answer. So right now you give it a type, and once you lose that information, it's gone. So for example, if I... yeah, the short answer to your question, as I understand it, is not yet. Can you define a map? Yes, oh, for sure, but that won't type check. The way to think of it is that the type system currently works the way all type systems work, in the sense that the only information it knows about map at this call site is the information that's in the type of map. It has no idea how map was implemented, other than its type signature. So yes, let's do exactly your experiment, right? So this is fine, in that I've defined map', it uses map, and so on. This is great. But if I go back over here and I remove the fancy type for map, then that's all going to be no good. There's nothing more.
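To make the exchange above concrete, here is a sketch of the length-preserving type for map together with tempAverage. The names `Weather`, `W`, `temp`, `rain`, and the local helper `total` are assumptions; the talk's average used foldr1, which is replaced here by a small local sum to keep the example short.

```haskell
import Prelude hiding (map, length)

data List a = Emp | a ::: (List a)
infixr 5 :::

{-@ measure length @-}
{-@ length :: List a -> Nat @-}
length :: List a -> Int
length Emp        = 0
length (_ ::: xs) = 1 + length xs

{-@ type ListNE a  = {v:List a | 0 < length v} @-}
{-@ type ListN a N = {v:List a | length v == N} @-}

-- The output type mentions the input value xs: the length is preserved.
{-@ map :: (a -> b) -> xs:List a -> ListN b {length xs} @-}
map :: (a -> b) -> List a -> List b
map _ Emp        = Emp
map f (x ::: xs) = f x ::: map f xs

data Weather = W { temp :: Int, rain :: Int }

{-@ data Year a = Year (ListN a 12) @-}
data Year a = Year (List a)

{-@ average :: ListNE Int -> Int @-}
average :: List Int -> Int
average xs = total xs `div` length xs
  where
    total Emp        = 0
    total (y ::: ys) = y + total ys

-- ms has length 12, so months has length 12, so it is non-empty.
tempAverage :: Year Weather -> Int
tempAverage (Year ms) = average months
  where
    months = map temp ms
```

If the refined signature for map is removed, the checker only knows `map :: (a -> b) -> List a -> List b` at the call site, and the call to average can no longer be justified, which is the experiment run in the talk.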
I can't prove that, because the only information I have for map is whatever's in the type for map. Yeah? Yes, it absolutely does. I haven't described any of this; there's a whole prelude. Actually, it's a good thing you mentioned this, because I may as well show you: a whole bunch of stuff is in the prelude. Let's just scroll up and see if we can find something that's in the prelude. Ah, here is our friend plus, for example. Things like plus are all just refinements of the GHC Prelude's plus. Let me just do this, and this is because Ace doesn't like infix operators: I'm going to write blah, put plus over here, and replace this with plus, right? Grind, grind. But now if you look at the type for plus, it looks something like this, and this is pulled out of the prelude. This is because the way plus is implemented is GHC prim mumble mumble, and we have a prelude where we just say: assume this type for plus. Okay, let's look at another example. This is an easy one: init. I didn't want to type out all 12 months; I wanted to create an example of a year. So here's the San Diego weather. It's really easy, because the average weather every month is more or less around 72. So I'm just going to say Year of init of const 72 and 12, right? It just creates a list. The init function is the usual init function; you can google it, maybe it's slightly different. I give it a function from Int to a, and I give it a Nat, because I want to get back lists of that size. init of zero is Emp, and init of f and n just creates the list, where the value that it creates for each n is f of n. So what is the right type for init over here?
We want to capture the fact that if I give you some number n, it actually produces a list of size n. Right, so I would just say n is a Nat. This maybe makes you slightly happier: here I actually have an n in scope, right? So we don't have indices; any time you want to write refinements, they're going to be over values that are in scope. Okay, and blah blah blah, this just becomes n. That's totally straightforward, right? So now here's a slightly trickier version of that same thing, and this is the kind of thing that's actually really tedious to do if unification is your only type inference engine. So for example, try doing something like this in Idris. It's not impossible, it's just kind of irritating, because you have to teach it a little bit of math, right? So here's init'. It looks almost like init, except that instead of counting down from n to zero, it counts up from zero to n. So I have this little loop called go, which starts off at zero, and go of i is: if i is less than n, f of i followed by, mumble, go of i plus one, and otherwise Emp, right? So this is actually going to give me the months one, two, three, four, five, which is probably what I want, instead of backwards. But anyway, now I would like to get exactly the same signature over here. I would like to say that if you give me an n, then in fact this output list has size n. But it's not so happy. It's like: no, I can't prove that, I don't know that go of zero in fact returns a list of size n. What happened? Can someone explain how we would tweak this? How shall we make it happy? You and I know that... what's that? And what should that signature be? How is go behaving, do you think? So, go of i... the way I think of it is: imagine n is 100. go of 25 is going to return what?
A list of size... exactly, so you need to use subtraction in your signature. So go of 25 is going to generate the list from 26 through 100, right, so 75 elements. So in general, what size of list is go of i going to return? go is going to take as input i, I don't know what i is, so I'll just say some i, and it returns as output a ListN of a's, I imagine, and how long is this? What's that? Of size n minus i. Thank you. Oops. Okay, so there you go: in fact go is going to give you back a list of size n minus i. And it's one of those things where, you know, if there was an off-by-one, like if I put this as n minus i plus one, it's just like: no, I don't think so. Right, so you need to get it just right. Okay, fantastic. I'm showing you this because it turns out that, well, in the largest of the case studies we'll see that this kind of thing happens again and again, right? You start filling in buffers from left to right, right to left, up to down, and so on and so forth. So it's nice to be able to just do structural recursion on Nats, starting from n and going down, but it's also nice to be able to go up sometimes. And that's another handy thing that you want to just throw at the SMT solver, because to it, it doesn't really matter which way you're going. Okay, fantastic. So this is it as far as the first part of the show is concerned: I've shown you basic refinements. Now what I'm going to do is show you three, let me just call them, quote-unquote, case studies, which use these for not quite so contrived examples. The first one is not contrived, really: it's insertion sort. Okay, it is contrived. The second one is slightly less contrived, and finally I'll show you one of my favorites, which is low-level memory. Okay, so let's get started. Here's the first of the little case studies: insertion sort. So what exactly is insertion sort?
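Before the case studies, the two init functions from the first part can be sketched as follows. The code is reconstructed from the spoken description, so the exact bodies are assumptions; in particular the bound `i <= n` on the index and the termination metric `/ [n - i]` on `go` are additions that a checker would likely need for a loop that counts upwards, not something stated in the talk.

```haskell
import Prelude hiding (init, length)

data List a = Emp | a ::: (List a)
infixr 5 :::

{-@ measure length @-}
{-@ length :: List a -> Nat @-}
length :: List a -> Int
length Emp        = 0
length (_ ::: xs) = 1 + length xs

{-@ type ListN a N = {v:List a | length v == N} @-}

-- Counts down from n: plain structural recursion on the Nat.
{-@ init :: (Int -> a) -> n:Nat -> ListN a n @-}
init :: (Int -> a) -> Int -> List a
init _ 0 = Emp
init f n = f n ::: init f (n - 1)

-- Counts up from 0: the helper needs its own signature with subtraction.
{-@ init' :: (Int -> a) -> n:Nat -> ListN a n @-}
init' :: (Int -> a) -> Int -> List a
init' f n = go 0
  where
    {-@ go :: i:{Nat | i <= n} -> ListN a {n - i} / [n - i] @-}
    go i
      | i < n     = f i ::: go (i + 1)
      | otherwise = Emp
```

Since `go 0` has length `n - 0`, the outer signature follows by arithmetic alone, which is the kind of fact an SMT solver handles without help. An off-by-one such as `n - i + 1` in the helper's type would be expected to fail.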
I'm hoping everyone knows, but if not, just to refresh your memory, this is how insertion sort works, right? Let's just say I'm going to call the function sort. It takes a list and it returns a list. Ignore those type signatures, because they obviously don't type check... oh, they do type check, I should say: List a to List a. And the insert function is where all the heavy lifting happens. insert takes as input a single element and a list of the elements that have already been sorted, and it takes the element that you gave it and sort of plonks it in just the right place inside the list that you supplied. Okay, fantastic. And the way it works is like so: if you have x and you want to insert it into the empty list, the result is just x cons Emp; just stick it in. If you have x and y cons ys, then you compare x and y: you check, is x less than y? If so, ah, x goes at the very beginning and the rest is y cons ys. And otherwise, we know that x goes somewhere inside the list, and so I put y as the head and then I recursively insert x into ys. Okay, fantastic. So what do we want to do with insert? What is the quote-unquote case study? Let's do three different things. The first thing I want to say is that the output has the same size as the input. That's going to be pretty easy; it's pretty much the same thing we've seen before. The second thing I want to say is slightly more interesting: the output list has the same set of elements as the input. And finally, what I want to say is that it is in fact a sorted list: the values are produced in increasing order. So let's see how these three things work, one by one. Okay, so the first property is size. So let's see. Can we fix the type of insert?
The insert function, that is, so that this actually type checks? If I push play here, it's not going to type check; it's going to grumble about one thing or the other. So here what I want to say is that the sort function takes as input a list xs and produces as output a list whose length is exactly equal to that of xs. But right now it says: no, no, no, I don't think so, I can't prove that over there. And it's not impossible to see that the problem probably has to do with what insert returns, because if you look at the type of insert, it says: you give me an a, you give me a list of a's, I give you back some random list. Right, so what do you think we should say for the type of insert, so that we would be guaranteed that sort in fact produces a list whose length is exactly equal to that of xs? What's going on? Thank you. Okay, so it sticks it in: it takes the old list and drops x somewhere in the middle. So it's whatever the old list xs was, plus one more element. Okay, so that's pretty easy, right: ListN of one plus length xs. Over here? So that's a very, very interesting question. The reason I can't say anything over here is somewhat of a technical point: I can't refer to values in the output, because technically they're not really in scope. You see what I'm saying? So I can't say the input is one less than the output. There's no fundamentally important reason why you can't do that, but right now you can't, right? I allow the outputs to refer to the inputs, but right now we don't allow the inputs to refer to the outputs. Okay, was there... there was no hand. Okay, so that was super easy, right? So let's move on and look at this more interesting question of the elements. How do I guarantee that in fact the output list has the same set of elements?
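The size-preserving version of insertion sort just discussed can be sketched as below. The function bodies follow the spoken description; the comparison operator (`<=` here) and the brace syntax for alias arguments are assumptions.

```haskell
import Prelude hiding (length)

data List a = Emp | a ::: (List a)
infixr 5 :::

{-@ measure length @-}
{-@ length :: List a -> Nat @-}
length :: List a -> Int
length Emp        = 0
length (_ ::: xs) = 1 + length xs

{-@ type ListN a N = {v:List a | length v == N} @-}

-- The output has exactly as many elements as the input.
{-@ sort :: (Ord a) => xs:List a -> ListN a {length xs} @-}
sort :: Ord a => List a -> List a
sort Emp        = Emp
sort (x ::: xs) = insert x (sort xs)

-- insert adds exactly one element; note the output refers to the input xs.
{-@ insert :: (Ord a) => a -> xs:List a -> ListN a {1 + length xs} @-}
insert :: Ord a => a -> List a -> List a
insert x Emp = x ::: Emp
insert x (y ::: ys)
  | x <= y    = x ::: y ::: ys
  | otherwise = y ::: insert x ys
```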
And here I'm going to actually tell you a little bit more about these SMT solvers. The main thing I'm going to tell you is that just like they can reason with integers, they can also reason about sets. Okay, so what do I mean by the theory of sets? I mean with the operators empty, union, intersection, complement, blah blah blah. All of this is completely decidable, and has been since like 1922. Modern SMT solvers implement these theories, and so here's what we're going to do: I am going to let you write set-valued measures. So just like we said, here's a list and you can write a function that tells you how long it is, you can also write functions for what is in the list, as a set. And just to make this convenient, we sort of expose this with refined signatures for the functions in Data.Set. Okay, so how does one do this? Well, let me just demonstrate. Here's how you specify the elements of a list. Just like we said measure length, I can say measure elems of a list, and the measure elems is a function: on the empty list it returns S.empty, Set.empty, right, and for cons I return addElem x xs, where addElem is simply... how do I add the elem x to xs?
Well, take the singleton x and union it with the elems, recursively, in xs. And all of this happens to belong, again, in the refinement logic, because union, singleton, blah blah blah, all of these are in the theory of sets; SMT solvers can chew this up for breakfast. Okay, so inline just lets us write this as Haskell code, and now let's see how we would use these inside our insertion sort. So here's a variant of insert sort where I now want to check the elements. Just like I defined ListN, a list whose length is equal to n, I'm defining ListE, which is a list whose elements are exactly equal to whatever s is. So now what I want to say is that sortE takes as input a list xs and returns as output a list whose elements are equal to those in xs, right? But of course I can't, because insertE doesn't have the right type. insertE, as before, is lame: you give me an x, you give me an xs, and I give you back a random list. So can somebody tell me, perhaps you (I'm just picking on people), how would we tweak the type of insertE so that it lets us prove that sortE in fact does the right thing? Just like a natural. It's the same idea: I'd use union. So what am I unioning? I'm unioning x to the elements of xs, right? So basically the output elements of insert are whatever's in xs, plus the element x. Which is to say, this is why I conveniently wrote it as a separate function, addElem, that we might reuse. So here what I would say is the output is simply ListE of addElem x xs. Okay, so it's the equivalent of one plus length xs. Whoops. Grind, grind, grind, and there we go. Okay, so now we have that the output list of sort in fact has exactly the same set of elements as the input. But we said sorting, and I've said nothing in the types about how we might actually order the elements in the list, right?
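The set-valued measure and the element-preserving sort described above might look like the following sketch. The identifiers `elems`, `addElem`, `ListE`, `insertE`, and `sortE` are spellings reconstructed from the audio, and the `measure`/`inline` annotations are written as I understand LiquidHaskell's conventions; treat the details as assumptions.

```haskell
import qualified Data.Set as S

data List a = Emp | a ::: (List a)
infixr 5 :::

-- Set-valued measure: the set of elements stored in the list.
{-@ measure elems @-}
elems :: Ord a => List a -> S.Set a
elems Emp        = S.empty
elems (x ::: xs) = addElem x xs

-- Written separately so it can be reused in the type of insertE.
{-@ inline addElem @-}
addElem :: Ord a => a -> List a -> S.Set a
addElem x xs = S.union (S.singleton x) (elems xs)

-- Lists whose set of elements is exactly S.
{-@ type ListE a S = {v:List a | elems v = S} @-}

{-@ sortE :: (Ord a) => xs:List a -> ListE a {elems xs} @-}
sortE :: Ord a => List a -> List a
sortE Emp        = Emp
sortE (x ::: xs) = insertE x (sortE xs)

-- The set analogue of "1 + length xs": the old elements plus x.
{-@ insertE :: (Ord a) => x:a -> xs:List a -> ListE a {addElem x xs} @-}
insertE :: Ord a => a -> List a -> List a
insertE x Emp = x ::: Emp
insertE x (y ::: ys)
  | x <= y    = x ::: y ::: ys
  | otherwise = y ::: insertE x ys
```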
So how do I specify that the output list is in fact ordered? As it happens, I've told you the answer earlier, but let's just see how we get there, right? Often you have to be a little creative in how you use these primitives. So let's recall our old friends, the refined data types: the whole thing about Year, where I said a year must have 12 months. Let's just go and revisit that idea for a second. Let's do the following exercise. Imagine I want to write an ordered pair type. An ordered pair has an opX and an opY, both of which are integers, and what I want to specify, for no good reason, is that the legal values are such that opX is strictly less than opY. So I want to say that OP 2 4 is legal, but OP 4 2 is not allowed, because I want the x to be strictly less than the y. Okay, very well. How would you do this? Can you think of how we might tweak the data definition, or write a refined data definition, that would tell us that opY must be less than opX? Sorry, greater than opX. Sorry, say that again? Can we put that in the type? Yes, and what is your suggestion? Was it the same? Fantastic. Okay, so what I'm going to do is just say opY must be some integer v such that v is strictly greater than opX. If we put it on opX, it's kind of the same thing, and the reason why is... let's see if I can explain why. Here's the deal with these refined data constructors; what really happens is the following. Let's go and see what the type of OP is. Oh, it's not working. Damn it. Why not? Oh, JavaScript. This is why I wanted to be at that PureScript thing, by the way. But anyway, what's really going on... I need the eraser.
I can never find erasers. Underneath the covers, each of these refined data definitions is actually... remember, in Haskell, when you create a data type like that, OP becomes a function, the constructor function, that is of type Int to Int to whatever, OrdPair or whatnot. And so when we have these refined data types, what's really going on is we just get a refined function, right? So this is now called opX, and this becomes some opY such that opY must be greater than opX. And this is the way we enforce that you can only create legal values, because whatever you pass into the constructor must satisfy the invariants. And as a consequence, to get back to your question, later things can refer to earlier things. This is one of the recurring themes in this sort of dependent setting: the output can refer to this input, and this fellow can refer to this one, and so on and so forth, right? But I can't get this to refer to things in the future. Again, there's no fundamental reason for this; it's just the way it's set up right now, and it turns out to be the easiest. Okay, fantastic. So this is also illegal, right? So this is how we have ordered pairs. Here's another example of the same thing: CSV, right? I want to create a CSV type. I'm currently in the middle of making some exams, so this is what popped into my head. I have a bunch of headers, which are strings, for example student ID, midterm score, final score, and I have a bunch of values, one row for each student: so student 125, and then 88, et cetera, et cetera. But what I would like is for my legal CSV values to not have any holes in them. So I don't want, oh, I forgot the midterm score for one student, and so on; I somehow want those to be flagged. Okay, so how can we tweak this? Can you tell me?
So here's a CSV value that has a hole in it. For some reason, I forgot or something happened, and I've left this out. So how can I modify the CSV type so that it enforces this legality, that if I have five headers, each row has five things inside it? Exactly, right: I just say each of these is a list of integers whose length is equal to the length of headers. Okay, so now when I do this, you get the idea, it's going to grumble, and this whole thing will turn pink when it comes back. Come back, come back. Let me blame the network for this; I don't know what's going on here. Oh, there we go. Okay, so the whole thing is rejected. Good. So now, with this little refresher on ordered pairs, let's see how we can tweak the list data type to ensure that you only create increasingly ordered lists. Here's the deal: I'm going to create a slightly different data type for lists. I'll just call it OList, for ordered list, with OEmp, and I have a cons, and I put this little frowny face over there just to indicate that it's going upwards from left to right. And there's an o-head and an o-tail: the o-head is an a, the o-tail is an ordered list. And here's the trick: I'm going to now make this ordering a legality constraint, like so. Check it out. What I'm going to say is that the head is whatever, an a, but inside the tail I only have elements all of whom are greater than the head. So the tail is an OList of values of type a such that each value is greater than the o-head. It's the same machinery as before, right, the same as the CSV, the same as the ordered pair. All I'm saying is that, recursively, the tail elements are all greater than the head. And as a consequence, here we go: this is just fine, 1 2 3, that's ordered, but this is not so good, 1 3 2.
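The three refined data definitions from this stretch of the talk (ordered pairs, CSV tables without holes, and ordered lists) can be sketched together as follows. All field and constructor names (`opX`, `opY`, `hdrs`, `vals`, `OEmp`, `:<:`, `oHd`, `oTl`) are assumed spellings, the sample data is made up, and the CSV example uses built-in lists with LiquidHaskell's `len` measure rather than the talk's own list type. The talk says "greater than" for the tail elements; using `<=` instead would allow duplicates.

```haskell
-- Later fields may refer to earlier ones: opY must exceed opX.
{-@ data OrdPair = OP { opX :: Int, opY :: {v:Int | opX < v} } @-}
data OrdPair = OP { opX :: Int, opY :: Int }

okPair :: OrdPair
okPair = OP 2 4
-- badPair = OP 4 2   -- expected to be rejected by the checker

-- Every row must have as many entries as there are headers.
{-@ data Csv = Csv { hdrs :: [String], vals :: [{v:[Int] | len v == len hdrs}] } @-}
data Csv = Csv { hdrs :: [String], vals :: [[Int]] }

okCsv :: Csv
okCsv = Csv ["id", "midterm", "final"] [[125, 88, 92], [126, 75, 81]]
-- holeCsv = Csv ["id", "midterm", "final"] [[125, 88]]   -- a row with a hole

-- Ordered lists: everything in the tail is greater than the head.
{-@ data OList a = OEmp | (:<:) { oHd :: a, oTl :: OList {v:a | oHd < v} } @-}
data OList a = OEmp | (:<:) { oHd :: a, oTl :: OList a }
infixr 5 :<:

okList :: OList Int
okList = 1 :<: 2 :<: 3 :<: OEmp
-- badList = 1 :<: 3 :<: 2 :<: OEmp   -- 2 is not greater than 3
```

In each case the refinement lives on the constructor's arguments, so an illegal value cannot be built in the first place; no separate validation function is involved.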
Nope, can't do that. And of course, if I now change this from 1 3 2 to 1 3, I don't know, 25, then it's all good, because at each level the numbers are going up, as and when this comes back. I don't know why things have suddenly become so sluggish; there's no telling. It'll come back, it'll come back. There you go. Okay, it's gone. And so here we go, we're back to our insertion sort: sortO, so this now actually has order as well. sortO takes as input a list xs and produces as output a list which is an ordered list of elements, and which has the same elements as xs. And so here the element part is fine, but it doesn't like this. Can somebody tell me what's the problem? I want to say that insert produces an ordered list, but it doesn't like me. What's going on? Well, you want to look closely at the code here. If x is less than y, then who should be at the beginning of the list? Okay, so you actually want x over here and you want y over here, and now when you do that, it's going to be all fine. Grind, grind, come on. You can just blame the regents of the University of Colorado. Okay, so here you go. Long story short, here is insert sort. It looks pretty much like the insert sort we had; it is the same insert sort that we had at the beginning, except that now I have the guarantee that it's the same set of elements and that they are in fact in increasing order. Now, since Richard has stepped out, I can rain a little bit on singletons. I don't want to rain on singletons. I mean, somebody just asked me this question: what is really the difference between this and something like Idris? The main difference is this: I think languages like Idris and Coq and so on are infinitely, literally, more expressive, because you aren't limited to this dinky little refinement logic, right?
So the trouble here is that you're limited to the refinement logic. But on the plus side, by virtue of being limited to the refinement logic, there's a whole bunch of stuff you don't have to write. So for example, this is the insertion sort in Haskell, right, and you get all those guarantees. Here is an insertion sort in Idris, which is significantly more involved. Some of this is just boilerplate, because you have to describe length and sets and so on. But even if you throw that aside, you have to do a whole bunch of work just for, you know: if x is less than y and y is less than z, then things are transitive; the elements are the same; flip... I'm not really sure what's going on. And maybe this person is not the best Idris programmer, but I think he's pretty reasonable. I think it's pretty clear that the code here is rather more complicated than what we had here, right? And my purpose there is not to rain on Idris, but just to demonstrate that there's a whole bunch of stuff we're getting for free from the SMT solver. Really, what's happening in that Idris code is that the programmer is essentially writing these proofs and manipulating these proofs in their program, which we don't have to do, because they're all in this nice theory and we just get those for free. And so I guess my point is, the way I view the world, and I was just explaining this to someone, is that I'm sort of down in this space where I assume I want everything to be automatic, and then I want to figure out how I can improve my expressiveness: how much more can I pack in while remaining completely automatic? Whereas in the Idris world, they start with always maintaining arbitrary expressiveness, and what they're trying to figure out is how to improve the automation.
Right, and hopefully there's some point at which we'll meet. Okay, I do want to point out one more thing, which I mentioned earlier: this business of multiple measures. So notice we did this thing which is, again, slightly different from what one does in the classical dependent style, where you index a list and you say this for all n. I took the same List a, and I gave you a function length, and I gave you another function elems, right? So these are two different measures that I wrote for it. And the way it works out is quite nice, actually, because we're in this nice refinement land: you can write down five measures if you like. Remember how I said length just gives you this extra predicate that goes into the data constructor? Well, you can have five extra predicates. They just all get conjoined; you just and, and, and, and. And the nice thing about this is that I don't have to bake any invariants a priori into my data structure. So I can have lists, and I can care about lengths, I can care about sets, I can care about various different properties of that same data type, and this actually lets me reuse a whole bunch of code. Okay, let's move on to a slightly more interesting example, which is the well-scoped eval. So let's look at a second example. What time is it? 25? Okay. Let me tell you about associative maps. So here's the problem; I alluded to it at the very start of the talk. Imagine you have a map with Haskell, lazy, JavaScript... if I look up Python, I get this nasty exception, right? So can we, at compile time, ensure that these kinds of things will not happen? So here's what we're going to do.
We're going to use the same set business to describe the set of keys that are in fact defined in the map. And then we're going to use measures to ensure that every time you look up an associative map, the key is in fact present in that set of keys. I'll describe how that works, and then I'll show you a little evaluator, quote unquote. Think of it as Richard's evaluator, but a really, really Mickey Mouse version of it, where all I'm going to guarantee is that it's well scoped: that you don't have unbound-variable errors at runtime.
Okay, so here's a very simple associative map definition. It looks just like the list. You could implement this with binary search trees and so on, but just to keep things quick, I'm going to implement it as a list. So a Map k v is either empty, or you bind the key k to a value v and then you recursively have a tail Map k v, right? Fantastic.
So the way we are going to ensure that we don't get any of these key-not-found-in-the-map errors is by writing a measure called keys, which describes exactly the set of keys that are defined in the map. The empty map is going to have the empty set of keys. The lookup function is essentially going to say: you give me a key k, but the map you give me to do the lookup must have that key k, and I'll describe what this has looks like in a second. The insert function is going to take a key and a value and a map, and it's going to give us as output a new map whose keys are equal to the old map's keys plus the key k. So we very precisely track the set of keys in a map inside the type system, such that any time we do a lookup, it's guaranteed to succeed. Okay, so let's see how we do that.
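The map and its keys measure described above can be sketched roughly as follows. This is a minimal sketch, not the code from the slides: the constructor names Emp and Bind and the helper add follow the talk, but the refinement annotations in the {-@ ... @-} comments are written from memory in LiquidHaskell's style and have not been run through the checker. Plain GHC treats them as ordinary comments, so the block compiles without LiquidHaskell installed. The langs value is an assumed example.

```haskell
import qualified Data.Set as S

-- An association list standing in for a real map, as in the talk.
data Map k v = Emp | Bind k v (Map k v)

-- The set of keys defined in a map. The measure annotation asks
-- LiquidHaskell to lift this function into the refinement logic.
{-@ measure keys @-}
keys :: Ord k => Map k v -> S.Set k
keys Emp          = S.empty
keys (Bind k _ m) = add k m

-- Add one key to the keys already bound in a map.
{-@ inline add @-}
add :: Ord k => k -> Map k v -> S.Set k
add k m = S.union (S.singleton k) (keys m)

-- The empty map has no keys (assumed spelling of the spec).
{-@ emp :: {m:Map k v | Set_emp (keys m)} @-}
emp :: Map k v
emp = Emp

-- Inserting k yields a map whose keys are the old keys plus k.
{-@ insert :: k:k -> v -> m:Map k v -> {n:Map k v | keys n = add k m} @-}
insert :: k -> v -> Map k v -> Map k v
insert = Bind

-- Assumed example data in the spirit of the talk's langs map.
langs :: Map String String
langs = insert "haskell" "lazy" (insert "javascript" "eager" emp)

main :: IO ()
main = print (S.toList (keys langs))
```

At runtime keys is just an ordinary function; the point of the measure is that the checker can also reason about it statically.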
We've pretty much already seen this, right? The way I can define the set of keys of a map is quite straightforward. The empty map has the empty set of keys. If I have a binding from key k to whatever value, I don't care, I just add the key k to the bindings of m, where add is just like the add for elems: recursively compute the bindings that are in m and add k to that set, as the singleton k. The empty map just looks like this: the keys of m is empty. Set_emp is a predicate that just says the keys of that particular map are empty.
Okay, how do we add a key to a map? It looks like this. The insert function just sticks that key in front of all the other bindings. So if I want to insert a key k with a value v into the map m, I just say Bind k v m; I just stick it in front. And the type that I have for insert is something like: given a key and a value and a map, the set of keys in the output map is equal to add k m, where add k m is just this thing: take the singleton k and union it with the keys already in kvs. So it's just like the thing we saw for lists earlier.
Okay, this is the interesting one, so I'm going to turn it into an exercise. Here's the lookup function. The lookup function takes as input a key k' and a map, which is either Bind k v m or Emp. And what I want to guarantee is that the impossible never happens. So I want to say that any time you call lookup, you should only do so with keys that are guaranteed to be in the map. The way I'm doing that is I'm going to write a predicate called has k m that says that the key k is in the map m. What I've done over here is I've given you a bogus definition of has, which is just True. So has is a predicate that takes a key and a map and returns a Boolean, and right now it's bogus: I've just written True. I could have written undefined. So let me ask you a question.
How should we modify has so that we are guaranteed that the impossible will not actually happen? What do you think is a valid definition of has? Fantastic, there we go. So I'm just going to tweak this and I'm going to say S.member k: k must be a member of what? Of keys of m. End of story, right? Pretty straightforward. So now that I've defined has this way, the lookup is guaranteed to never fail. And why is that? It's the same thing as before. If we know that k is in the map m, we already know... well, when I say we already know, it's the SMT solver that knows this, just from the definition of sets. Remember, keys of the empty map is equal to the empty set. Nothing is in the empty set. There's your contradiction. So if you know for a fact that has k m holds, that this predicate is true, then there's no way that this last case can trigger. Does it make sense? Okay, so again, there's a whole bunch of stuff that's just happening under the hood, but it's really just math. There's nothing complicated about this.
Yes? The short answer is yes. What the inline does... I can give you a bit of a history lesson. Earlier, all these little expressions like S.member and so on were all inside the green stuff. They were all just little predicates that are in the logic, and you had to define them in the green stuff. For various reasons,
We would like all these things to just be Haskell code. And so here what we're saying is: here's just a raw Haskell function, and I intend to use this inside my specifications, and hence we write this little inline. Exactly, and if it's not, then it'll grumble and curse. Exactly: the inline has is going to make sure that yes, you may use this, there's nothing weird going on. Sorry, was there a question? Yeah, so that's the purpose of the inline has, but I will say this is a relatively recent feature.
Oh, thank you. I don't at all have to include it; I can just delete it. I'm including it purely for exposition. There's a --totality flag which, even if you delete them, would actually check that you've covered all the cases, and would grumble otherwise.
So here, by the way: I promised that these errors would go away, now at compile time. So I'm creating a little map called langs, which binds Haskell to lazy and binds JavaScript to eager, and then Emp. And now I'm looking up Haskell in langs. That's fine; there's no pinkness there. But if I call lookup Python on langs, then it's not allowed, because remember, py is not in the keys. It's actually interesting: what is in the keys? What exactly is the type of langs? Let's just see; it's probably something gross. But yeah, it's not terribly gross. Here you go. What you get is Set_mem hs in keys of the value, and Set_mem js in keys of the value, and something something something, and the value is equal to langs. So this is just a simple example, because in reality, of course, who knows what's inside a map, right?
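The lookup exercise discussed above can be sketched like this. It is a hedged reconstruction rather than the slide code: has and lookup follow the talk's description, the refinement annotations in {-@ ... @-} are my best rendering of LiquidHaskell syntax and are unchecked here, and the langs contents are an assumed example. Under plain GHC the annotations are comments, so the "impossible" branch is only ruled out when the file is actually verified.

```haskell
import Prelude hiding (lookup)
import qualified Data.Set as S

data Map k v = Emp | Bind k v (Map k v)

{-@ measure keys @-}
keys :: Ord k => Map k v -> S.Set k
keys Emp          = S.empty
keys (Bind k _ m) = S.union (S.singleton k) (keys m)

-- has k m holds exactly when k is one of the keys of m.
-- The inline annotation lets the specification below mention it.
{-@ inline has @-}
has :: Ord k => k -> Map k v -> Bool
has k m = S.member k (keys m)

-- Callers must supply a map that is known to contain the key, so
-- a verified program can never reach the Emp case.
{-@ lookup :: k:k -> {m:Map k v | has k m} -> v @-}
lookup :: Eq k => k -> Map k v -> v
lookup k' (Bind k v m)
  | k' == k   = v
  | otherwise = lookup k' m
lookup _ Emp  = error "impossible: key not in map"

-- Assumed example data in the spirit of the talk.
langs :: Map String String
langs = Bind "haskell" "lazy" (Bind "javascript" "eager" Emp)

main :: IO ()
main = do
  putStrLn (lookup "haskell" langs)  -- key is present, so this is accepted
  print (has "python" langs)         -- False: lookup "python" langs would be rejected
```

The design choice worth noticing is that the precondition lives on the map argument, not on the result: lookup returns a plain v rather than a Maybe v.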
Here it's a silly example because I'm just putting two things in, so it's not unreasonable that the type system can catch it. But next I'll show you an example where we're going to use exactly this map, but it's not like there's one or two things in the map; it could be some unbounded set. But how would we fix this? So the error over here is, well, Python is not there, right? So if I just change this, for example, to js, then it's going to be all fine, because now the lookup function is appeased: we know that js is in fact inside langs.
Okay, so now let's look at a slightly more interesting application of this same associative map trick. What I'm going to do is try and write a well-scoped evaluator. So Richard had this nice thing where he was using de Bruijn indices. Instead of using de Bruijn indices, we're just going to do it the old-fashioned way: we'll have names for variables, and we're going to use these associative maps to make sure that we don't have variable-not-found errors. As I said, it's a really Mickey Mouse version of what he had; I was not lying. So a variable is just going to be Var, V of String. And an expression is either a value, which is an Int, or a Var of a Var, so a string, or it's a Plus of two expressions. And this is the interesting case: a Let binding that binds a variable to some expression in some other subexpression.
Okay, so first of all, this is just a little digression. When you have ADTs, you can define subsets of those ADTs by using measures. For example, I can define a type called Val, which corresponds to those expressions that are in fact values, that is to say, fully reduced. A fully reduced expression is simply one that is a Val of some number. So we just write a predicate isVal, from Expr to Bool: isVal of a Val is true, and isVal of everything else is false. Nothing else, if it's a variable, if it's a let-in, is a value. And now I can define Val to be just all those expressions v such that isVal of v is true.
Okay, so let's see how you would use that. Here's another simple example of this kind of totality checking. I'm writing a plus function. The plus function should only be called with two Vals, so Val i and Val j, and it should return a Val which is i plus j. And you can see you get this nasty red thing at the impossible case. Of course, if I call plus with two arbitrary expressions, the impossible might very well happen. But if I promise to only call it with Vals, then we know for sure that these other cases are not going to fire, and the output is also just a Val.
Okay, so that's a simple one. Now what I can do is describe environments. I'm going to build an evaluator for the expressions, and the way the evaluator is going to work is: every time I say let x equals expression, let y equals expression, I'll keep adding to my environment the values of x and y. And the environment therefore is going to be a map from variables to values: x got evaluated to 10, y got evaluated to 12, etc. And so here is my evaluator; it looks quite straightforward. If I have a value, you just return i. If I have a Var of x, you look up x in the environment g. If I have a Plus, you can just call plus. If I have a let x equals e1 in e2, I'm going
to first evaluate e1 using the environment g; call that value v1. I'm going to insert x v1 into g, where insert is the same insert for our map, and then I'm going to use this new map gx to evaluate e2. This is pretty much the obvious way you would write an evaluator for this. The only thing is that this lookup is complaining: remember, it's that same lookup function from before, and it's asking, how do I know x is really in the environment? And in fact it may not be in the environment. Suppose I just write an expression x plus five. I don't know what the value of x is. So how can I guarantee that the bad thing won't happen? When I see x plus five, I'll try and evaluate x, and I have no idea what x is. So somehow I want to rig it so that my evaluator is only called on expressions that are well scoped: I don't use variables that are not defined. So it's a pretty non-trivial invariant, and let's see how we would express that.
So check it out. How many people here know what a free variable is? Okay, a non-trivial number. I'll give a quick explanation. The high-level bit is, for each expression, I just want to compute which are the variables that are not in fact defined inside the expression, whose values need to come from the outside. And this is pretty much the textbook definition of the free variables of an expression, right?
So free is a function from an expression to a set of variables. The free variables of a value is empty, because there are no variables in that. The free variables of Var x is just the singleton x. For the free variables of e1 plus e2, compute the variables of e1 that are free, compute those of e2 that are free, and take their union. The free variables of let x equals e1 in e2: recursively compute the free variables of e1, compute the free variables of e2, and then it's the free variables of e2 minus the variable x, because x goes out of scope right after you evaluate e2, and union that with the free variables of e1.
Okay, so this is how you define the set of free variables of an expression. And so here's an example, if that made no sense. Imagine I had an expression let x equals 10 in x plus y. Then x is defined, because I say let x equals 10, but that y is free; y is not actually defined anywhere. So I would represent that expression something like this, and if I compute free of e1, I would get this y. Yeah, did you have a question? It's all good so far, right?
And so really what I want to say is that whenever I call this evaluator, I must have in my environment values for all the variables that appear free inside that expression. So I would define that like so. I would say a ScopedExpr in g, where g is an environment, is an expression e such that e is well scoped under the environment g. And what is this wellScoped? wellScoped is simply a predicate that takes an environment and an expression and returns a Boolean, and it says that wellScoped g e is true if the free variables of e are contained inside the keys of g. So all the variables that appear free inside e are in fact defined in the environment. That's all it's saying. And we say that e is a scoped expression in the environment g if wellScoped g e holds. Pretty straightforward, right?
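The free-variable computation and the wellScoped predicate described above might look like the following. This is a sketch under assumptions: the constructor names (V, Val, Var, Plus, Let, Emp, Bind) are my reading of the talk, the Env alias is assumed, and the {-@ ... @-} annotations are unchecked LiquidHaskell-style comments that plain GHC ignores.

```haskell
import qualified Data.Set as S

data Var = V String deriving (Eq, Ord, Show)

data Expr
  = Val  Int             -- a fully reduced value
  | Var  Var             -- a variable occurrence
  | Plus Expr Expr       -- addition
  | Let  Var Expr Expr   -- let x = e1 in e2

data Map k v = Emp | Bind k v (Map k v)

type Env = Map Var Expr  -- assumed: environments map variables to values

{-@ measure keys @-}
keys :: Ord k => Map k v -> S.Set k
keys Emp          = S.empty
keys (Bind k _ m) = S.union (S.singleton k) (keys m)

-- Variables whose values must come from outside the expression.
{-@ measure free @-}
free :: Expr -> S.Set Var
free (Val _)       = S.empty
free (Var x)       = S.singleton x
free (Plus e1 e2)  = S.union (free e1) (free e2)
free (Let x e1 e2) = S.union (free e1) (S.difference (free e2) (S.singleton x))

-- An expression is well scoped in g if g defines all its free variables.
{-@ inline wellScoped @-}
wellScoped :: Env -> Expr -> Bool
wellScoped g e = free e `S.isSubsetOf` keys g

-- let x = 10 in x + y: x is bound, y is free.
e1 :: Expr
e1 = Let (V "x") (Val 10) (Plus (Var (V "x")) (Var (V "y")))

main :: IO ()
main = do
  print (S.toList (free e1))                       -- [V "y"]
  print (wellScoped Emp e1)                        -- False
  print (wellScoped (Bind (V "y") (Val 12) Emp) e1) -- True
```

Note that x does not show up in free e1 even though it occurs in the body, because the Let case subtracts it.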
Okay, so now that you have all this, can somebody tell me how we should fix the type of eval so that that lookup succeeds? How would I modify this type of eval? What kind of expressions can I call it with? Yes: in the environment I pass in, right? Pretty straightforward. So I would name this environment; let me call it g. And this would not be any random expression; it must be an expression that is a scoped expression in g. And so in this case I'm not putting two variables inside it. I don't know what variables are inside the environment, but the expression I'm going to give you, all its free variables must be contained inside g.
Okay, so let's look at another exercise. Same deal, scoped expression. How about this? Suppose I want to write a function that works with any expression, just like the thing we had at the beginning for the calculator. Yes? Which error? You still get an error? Shall I look at your laptop? Let's see. Plus... oh, that's a different error. You probably didn't fix the type of plus. You see, you have Val, Val, Expr. Why don't you meditate on that: the output type is also a Val. So plus takes two Vals as input and returns as output a Val. That's the thing that you're missing. So you should change the output type not to be an arbitrary Expr, but in fact a Val. Okay, sorry about that.
Where was I? Okay, let's go back to this one. So here I want to write a safeEval that's going to work for any expression and for any environment g, but as you suggested at the beginning, we want to do some kind of Option trickery. So I want to say: look, you give me a g and an e, and I'm going to check if e is okay. If it's okay, then I'll evaluate e in g, and otherwise I'll say Nothing: no, you can't run this, not allowed.
So what do you think is a good definition of ok over there? What should we plug in? What's a good runtime check to put in? What else, right: wellScoped. All we say is that you must be wellScoped g e. So if in fact you are well scoped, then you may call eval, and then everything is great. Fantastic.
So this was a quick exercise where we used sets to track the set of keys inside a map. I'm using this kind of lame associative map where you just add bindings in front. You can imagine implementing the same map as is done in, say, Data.Map, as a balanced binary search tree, and you could use all this machinery to check that it satisfies this set interface in pretty much the same way, but you have to deal with the ordering constraints and so on.
Okay, so we're on to the very last part of the show. I feel a little break is called for at this point, or shall we...? Okay, so let's do the last of these. The first two were pretty academic exercises, but the last one, I think, is something one would actually want to do in practice. So what am I referring to? What we did is we took LiquidHaskell and we tried to use it to show that widely used libraries like ByteString and Text don't have buffer overflows. And what does that boil down to?
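To recap the evaluator exercises just finished, here is one way the pieces could fit together. Treat it as a sketch assembled from the talk's description, not the slide code: names such as Env, ScopedExpr, and safeEval follow the talk, but the {-@ ... @-} annotations are unchecked and are plain comments to GHC. Without LiquidHaskell, the only protection at runtime is the wellScoped test inside safeEval.

```haskell
import Prelude hiding (lookup)
import qualified Data.Set as S

data Var  = V String deriving (Eq, Ord, Show)
data Expr = Val Int | Var Var | Plus Expr Expr | Let Var Expr Expr
  deriving (Eq, Show)

data Map k v = Emp | Bind k v (Map k v)
type Env = Map Var Expr

keys :: Ord k => Map k v -> S.Set k
keys Emp          = S.empty
keys (Bind k _ m) = S.union (S.singleton k) (keys m)

-- Precondition (checked statically in the talk): k' is a key of the map.
lookup :: Eq k => k -> Map k v -> v
lookup k' (Bind k v m) = if k' == k then v else lookup k' m
lookup _  Emp          = error "impossible: unbound variable"

free :: Expr -> S.Set Var
free (Val _)       = S.empty
free (Var x)       = S.singleton x
free (Plus e1 e2)  = S.union (free e1) (free e2)
free (Let x e1 e2) = S.union (free e1) (S.difference (free e2) (S.singleton x))

wellScoped :: Env -> Expr -> Bool
wellScoped g e = free e `S.isSubsetOf` keys g

-- Both inputs and the output are values (Val i), never arbitrary Exprs.
{-@ plus :: Val -> Val -> Val @-}
plus :: Expr -> Expr -> Expr
plus (Val i) (Val j) = Val (i + j)
plus _       _       = error "impossible: plus on non-values"

-- The expression must be scoped in the environment that is passed in.
{-@ eval :: g:Env -> ScopedExpr g -> Val @-}
eval :: Env -> Expr -> Expr
eval _ i@(Val _)     = i
eval g (Var x)       = lookup x g
eval g (Plus e1 e2)  = plus (eval g e1) (eval g e2)
eval g (Let x e1 e2) = let gx = Bind x (eval g e1) g in eval gx e2

-- Works for any expression: the runtime check establishes the precondition.
safeEval :: Env -> Expr -> Maybe Expr
safeEval g e
  | wellScoped g e = Just (eval g e)
  | otherwise      = Nothing

main :: IO ()
main = do
  print (safeEval Emp (Let (V "x") (Val 10) (Plus (Var (V "x")) (Val 5))))
  print (safeEval Emp (Plus (Var (V "x")) (Val 5)))
```

The first call prints Just (Val 15); the second prints Nothing, because x is free and the empty environment does not define it.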
So I alluded to this at the very beginning: it's perfectly possible to have things like Heartbleed in Haskell. And in fact, you could try this at home. So here's a function you might write on top of... which one was this, Text? No, ByteString. It's a function that takes a string and an integer and gives you back a string; the idea is to suck out the first, say, 10 characters. chop s n: what it does is it takes the string s and packs it into a ByteString, let's call it b. And then what you do is you call this unsafeTake function, which is a fast take function if you wish, that sucks out n characters from b and gives you a ByteString b'. And then you unpack b' to get back s', and you return it at the top, right? And as it happens, if you try and run this, if you pass in the right values, you win: you get the substring. But if you don't, bad things can happen. So for example, if I try something like let x equal "Ranjit loves burritos" and I say chop x 10, then I get "Ranjit lov", which is a prefix. But if instead of 10 I put something that's much, much longer, then weird things start to happen. It won't crash.
It'll happily tell you what comes after "Ranjit loves burritos". And this is the kind of thing you would want to make sure doesn't happen if you're using a nice language like Haskell.
Okay, so here's going to be our plan. This is the kind of thing where it's quite difficult to use just Haskell's native type system to prevent this kind of overflow. But what we're going to do is use refinement types to dress up the type system to prevent this sort of thing from happening. And here's the general strategy; this is really how one uses something like LiquidHaskell, or actually how you would design any type system to solve a problem like this. With something like ByteString, there's a whole layering. ByteString itself sits in the middle. Below it is GHC's low-level interface to pointers and memory, which is where you tinker with that in C; so there's a very low-level pointer API. ByteString builds on top of that pointer API, and things like chop build on top of ByteString. So what we're going to do is start at the lowest level, and at each level we'll write interesting types, and we'll check those types using the types at the lower levels.
Okay, so let's start at the very beginning, which is the low-level pointer API. Some of you might be all over this, but for those of you who aren't, here's a quick refresher. This is how GHC's low-level pointer API works. There's something called data Ptr a, the pointer to things of type a, and there's something else called a ForeignPtr. It's not terribly important what the distinction is; the high-level bit is a ForeignPtr wraps around a pointer so it can be exported to and from C. There are various details about how this works, but the high-level bit is that there are these two pointer
types. And the way you interact with these pointers is that GHC gives you this kind of interface; I'm simplifying things dramatically. So there's a function to read the value of a pointer, peek, that takes a pointer and returns an IO a. There's a version that writes to a pointer, called poke, that takes a pointer and the new value you want to write, and does the IO action. And the way you actually work with these pointers is you do a whole bunch of pointer arithmetic. So there's something, let's just call it plusPtr, that takes a pointer and some offset and moves the pointer along by that particular offset. And finally, the way you actually create pointers in the first place is with functions that are the equivalent of malloc. There's definitely a missing IO there, but never mind. So malloc takes some number, 45, and gives you a pointer to a buffer of 45 bytes. And finally, the way you actually do things with foreign pointers: there's a function called withForeignPtr. You give it a foreign pointer from your application, and you give it some sort of callback or continuation, or whatever you want to call it. This function is going to unwrap that foreign pointer into a pointer, run the callback on that particular pointer, and give you back an IO b. So this is, roughly speaking, the lowest-level API that is exposed for manipulating pointers.
That was a lot of detail, so here's an example. What this does is allocate a block of four bytes and write four zeros into it. So here's zero, which is a Word8. I create a foreign pointer fp by calling malloc of four, and then with withForeignPtr fp I call this action. What does that action do? It's a lambda p.
So p is the unwrapped pointer. Do: poke at p plusPtr 0, so at position zero write a zero; at position one, write a zero; position two, write a zero; position three, write a zero. So this is fine, because I created a pointer of size four, and so I should be able to write these things to it. But if instead of three over there I had 30, then that would not be so good, because then who knows which part of memory I'm mucking with over there. So what I'd really like to do is set up my types so that if I said the block has four bytes, then you can only write at positions zero to three. You can't just write at position 30. Okay, so let's see how we might do that.
The high-level bit is very similar to what we've been doing so far. It's just slightly more complicated because there are a few more moving parts. What we're going to do is refine the pointers. So each pointer is now going to carry with it information that says how many bytes are remaining in the buffer that it points to. Okay, and then what we'll do is in the various pointer operations.
We'll make sure we track exactly what those sizes are. Again, these are all things that are implemented in C, so we just have to assume these types. So for example, to track the size, I'm going to create two measures, plen and fplen, that tell me how much space remains in a pointer and in a foreign pointer, so how many bytes remain. And I'm going to say that a PtrN of n is a pointer that points to a block that contains n remaining bytes, and the same for foreign pointers.
Now what I'm going to assume is that when I create a pointer with malloc, I must pass in a Nat. So I can't create a pointer with, like, minus five bytes in it; I must pass in a nice Nat, and what I get as output is a foreign pointer that has exactly n bytes remaining. And what the withForeignPtr operator does is: you give me a foreign pointer, and I will unwrap it and give you a pointer whose size is exactly equal to that of the foreign pointer. And whatever you run, you get this IO b as the output. Okay, so we're just tracking sizes.
And so now once you do that, we can also modify the type of plusPtr, which is the pointer arithmetic operation. It's going to take a pointer p and an offset, but this can't be some random offset. If I had five bytes, you can't just move the pointer by 25 bytes. So what I'm going to say is that if you give me a pointer p, you must give me an offset which is a natural number, so it's non-negative, and it's at most n, where n is the number of bytes remaining in that buffer. Okay, so that buffer has 10 bytes remaining.
You can only bump it up by at most 10. And what you get as output is a new pointer that has that many bytes minus the offset, because you just used up five bytes, or whatever the offset was. So all of these are in our prelude, as it were, where we just assume that GHC's foreign pointer modules export functions with these signatures. Fantastic.
Now, to prevent buffer overflows, all I have to do is define a type for valid pointers. So a PtrNE of a, pointer-not-empty, is simply a pointer such that plen, the space remaining, is greater than zero. And I'm going to rig it so that peek and poke take a PtrNE. You can't give a random pointer; you must give a pointer that has some space remaining. And this is exactly why, once you specify all these types, then bang, over here, what it knows is that the size of p is equal to the size of fp, the size of fp is equal to four, and I'm bumping it up by five. Not nice. Pretty easy, right? Okay, fantastic.
Next let's look at the ByteString API. So we're going to use this very low-level pointer API to build a higher-level structure, which is the ByteString. This is what a ByteString looks like. For those of you who have not had the pleasure of staring at the ByteString code, here's what happens; it's actually very clean. I'm laying it out in memory. So a ByteString has three things inside it. It's essentially a block of memory that starts at a given pointer; I'm going to call it bPtr. It's just a foreign pointer to a bunch of Word8s. There's an offset that tells you where the ByteString begins, and this makes it very easy to compute suffixes of a ByteString. You want to drop the first five characters? Just bump up the pointer by five. Want to drop another four? Bump it up by four, right?
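As a concrete reference for the pointer-level discussion above, here is a small runnable version of the four-zeros example using the standard Foreign modules from base. The function names zero4 and readBack are mine. The refined signatures appear only as ordinary comments that paraphrase the talk (PtrN, PtrNE, plen, and fplen are the talk's names, and the exact LiquidHaskell spelling is an assumption); nothing here is verified, so changing an offset to 30 would still compile under plain GHC and silently corrupt memory.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (plusPtr)
import Foreign.Storable (peek, poke)

-- Refined types as described in the talk (paraphrased, unchecked):
--   mallocForeignPtrBytes :: n:Nat -> IO (ForeignPtrN a n)
--   withForeignPtr :: fp:ForeignPtr a -> (PtrN a (fplen fp) -> IO b) -> IO b
--   plusPtr :: p:Ptr a -> off:{Nat | off <= plen p} -> PtrN b (plen p - off)
--   peek, poke: only on a PtrNE, that is, a pointer with plen > 0

-- Allocate a block of four bytes and write a zero at offsets 0 to 3.
zero4 :: IO (ForeignPtr Word8)
zero4 = do
  fp <- mallocForeignPtrBytes 4
  withForeignPtr fp $ \p -> do
    poke (p `plusPtr` 0) zero
    poke (p `plusPtr` 1) zero
    poke (p `plusPtr` 2) zero
    poke (p `plusPtr` 3) zero   -- offset 3 is the last valid position
  return fp
  where
    zero = 0 :: Word8

-- Read the four bytes back so the effect is observable.
readBack :: IO [Word8]
readBack = do
  fp <- zero4
  withForeignPtr fp $ \p -> mapM (\i -> peek (p `plusPtr` i)) [0 .. 3]

main :: IO ()
main = readBack >>= print
```

Running it prints [0,0,0,0]. The refinement types in the comments are what would turn an out-of-range offset into a compile-time error.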
It makes it really, really fast. So there's an offset parameter that tells you where the ByteString actually begins, and then there's a length parameter that tells you how many more bytes there are starting at that offset. So in this picture, the valid part of the ByteString is this green thing over here, and the red stuff is do-not-touch. And in fact, even the white stuff is not part of the ByteString. So it's only the green bit that's in the ByteString.
Now this seems like it's crying out for some make-illegal-states-unrepresentable, because there's a whole bunch of information about what is legal and what is not. But here it is; it's quite straightforward. We can just write a refined type that says: okay, bPtr is whatever. The offset is some number which is a natural number and must be no more than the size of the buffer. If the buffer had a hundred elements, the offset can't be 2500, right? The offset must be between zero and a hundred. And the length that's remaining must be some number such that if you take that number plus the offset, you're still within the boundaries of the buffer. So if you just think about that green thing over there: if you had a hundred bytes and your offset is at four, then the length can't be more than 96. I'm not making up these invariants; this is just how it is. Okay, so far so good? Fantastic.
So as before, I'm just going to create a little alias for ByteStrings of size n, which is just a ByteString such that bLen is equal to n. Okay, so here are some good ByteStrings. I'm going to say malloc five, and then I'm going to create one of these ByteStrings with the foreign pointer fp, starting at offset zero, with five characters. That's the first one, and I can say yes, this ByteString has size five. The second one, I can say it has size three. Wait, that's totally not right. This is no good.
Oh no, it is. It does in fact have size three, right? You see why: because this length is in fact three. If I was to change this to 30, for example, that would not be so good. I can't claim that this ByteString has size 30, because I only allocated five bytes, so I can't now suddenly magically pretend that the ByteString has 30 characters. Okay, so far so good, right?
So here's a quick exercise. Can someone tell me how we would fix this? Here are two bad ByteStrings. I want to pretend that this ByteString has 10 characters, but LiquidHaskell says no, it does not. What's the problem here? I just allocated three, so that's obviously shady; I want to make that 10. And what's wrong with this one? Why is it that this ByteString does not have size two? I allocated three. Exactly, but the offset's two, so once I'm at offset two there's not that much space left. So I need to tweak this, and I want to make this zero. Okay, fantastic. So grind, grind, grind; it'll come back, and life is good.
Now, of course, this is not how one really creates ByteStrings. One does not call malloc and then do it in this shady style. The real way that ByteStrings are built is something like this, and this is in fact how it's built in the library. There's a little function called create. It looks roughly like this: create takes as input an integer and a function from pointers to an IO action that essentially fills up that particular pointer, and then returns a ByteString. And here's how it works. unsafePerformIO: I guess you never thought you'd see a talk with dependent types and unsafePerformIO all at the same time, but here you are. This is the real world, people. Okay, create n fill: unsafePerformIO.
That's why we get this nice pure interface. Do: you create the foreign pointer fp, you call malloc of n, and then with withForeignPtr fp you call this fill function, and then you return the ByteString with the foreign pointer, starting at offset zero, with size n. So what's the problem here? It doesn't like the n. Why? Well, let's stare at the error message; it's pretty obvious. What does it say? The value n is not a subtype of greater than zero. Remember how we said malloc must not be called with negative numbers? So how should we tweak it? We just want to turn this Int into a Nat. Pretty straightforward, right? Well, we will come back to this in just a second. Come on, come on, ByteString. There we go.
Okay, so we thought we'd fixed that particular problem, but not quite, because here is another function that calls create inside the ByteString library. It's a function called pack, and what pack does is it takes a String, the old-fashioned Haskell-style list of characters, and returns a ByteString whose length is exactly equal to that of s. So if I give you the string "burrito", then you want to get back a ByteString "burrito", with exactly that many characters. And now there's just a sea of redness over here. It's like, I don't like this at all. And what do I not like? Well, first of all, I don't know if this poke is legit. I don't know how much space there is inside that pointer p. You're trying to write something there, but is there space there? I have no idea. Am I allowed to increment that pointer by one? And I'm calling create n: how on earth am I to guarantee that the ByteString has the same length as s? So remember, the type that we had for create was not a terribly exciting type. The type we have for create just says you give it a Nat, you get back a ByteString. So can somebody tell me how we would tweak the type of create?
There's a lot more that's happening inside create. There are a lot more invariants. Yes? Fantastic. So that is fact one. What I want to say is that n is a Nat and this is a ByteStringN n, a byte string of size n. But as they say, wait, there's more. You see how you have this create function and I'm passing into it an action, right? And all this stuff where I'm actually filling in the contents depends pretty crucially on what's going on with that pointer p. So the reason this is not working out is that I have no idea how big that pointer is. Like, what is the size of that pointer? And so can you think of a way I might capture that information also in the type of create? Exactly. Okay, so in fact it is a PtrN n. It is not any random pointer; I promise you it will be the same pointer that you created with the Nat, and hence it'll be a pointer with n bytes in it. Okay, and now when I tweak the type of create down there, both of these will work out, right? Or so I hope. Blah blah blah. Okay, fantastic. So pinkness has disappeared from here and it has also disappeared from here. Okay, so it's a more precise type, and it's one of these cases where sometimes you have to iterate a little bit just to figure out what exactly is the right information that you need. Okay, so there we go for pack.
How about this? This one's a really easy one. Here's the unsafeTake function. Remember, what this does is: I give you a byte string and say, just give me the first five bytes of this. But it doesn't like it. I give it an n, I give it a byte string. And here's why byte strings are so fast, right? You want to take the first five characters: I just replace l with n. You want to drop off the first five, you want to leave, you know, a bunch of characters: you just change the length parameter. That's it, super fast. No crazy recursion, none of that stuff. But nevertheless Liquid Haskell is not happy with our function.
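The refined type of create reached above, together with the pack function it unblocks, could be sketched as follows. The aliases `ByteStringN` and `PtrN` and the measures `plen` and `len` are written from memory of the LiquidHaskell tutorial and should be read as assumptions; whether LiquidHaskell accepts this exact text has not been checked here. GHC ignores the annotations and runs the code as is.

```haskell
import Data.Char (ord)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr, plusPtr)
import Foreign.Storable (poke)
import System.IO.Unsafe (unsafePerformIO)

data ByteString = BS { bPtr :: ForeignPtr Word8, bOff :: !Int, bLen :: !Int }

{-@ type ByteStringN N = {v:ByteString | bLen v = N} @-}
{-@ type PtrN a N      = {v:Ptr a | plen v = N} @-}

-- Fact one: the result has size n.
-- Fact two: the filler is handed a pointer with exactly n bytes behind it.
{-@ create :: n:Nat -> (PtrN Word8 n -> IO ()) -> ByteStringN n @-}
create :: Int -> (Ptr Word8 -> IO ()) -> ByteString
create n fill = unsafePerformIO $ do
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp fill
  return (BS fp 0 n)

-- pack writes one byte per character and advances the pointer by one each time.
{-@ pack :: s:String -> ByteStringN (len s) @-}
pack :: String -> ByteString
pack str = create n (\p -> go p bytes)
  where
    n     = length str
    bytes = map (fromIntegral . ord) str
    go :: Ptr Word8 -> [Word8] -> IO ()
    go p (x:xs) = poke p x >> go (p `plusPtr` 1) xs
    go _ []     = return ()
```

The second fact is what justifies each poke: the checker can relate the bytes remaining behind the pointer to the characters still left in the list.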
What's the problem? l needs to be... Yes, exactly. I can't take five characters if there's only two characters in the byte string, right? There's some relationship between the number of things I want to take and the actual byte string size. And so what I want to say is that this Nat n is in fact no bigger than the size of the byte string. So I would do that by saying it's a byte string v such that n is less than or equal to the bLen of v. Okay, so now... because again, here we were creating one of those illegal byte strings, not so good. And so now it's quite happy. It's like, okay, that's good, as long as you promise to call unsafeTake that way.
Okay, and here's a more involved one. The unpack function is kind of like pack, except that there's a lot more stuff going on, but the high-level bit is it takes as input a byte string and returns a list of characters whose length is exactly equal to that of the byte string. And there's a whole bunch of stuff that's happening over here that the SMT solver just kind of chews through, right? It all works out quite nicely. No proofs needed. Okay, so that's pretty much it. That was our second level, the byte string API, and I promise I will finish in 60 seconds.
Okay, so here is the application API. This was that same chop function at the very beginning that I was so paranoid about. Well, here's the deal: now you see there's this b, and it's marked off in pink. It's like, no, you can't just call unsafeTake over there. That's no good. So how can we tweak this? What was the problem? Remember, you can't create a byte string with 10 characters and then take 25 characters from it, because that's where you're reading into the overflow buffer. So how shall we fix this? How big can n be? Or s, for that matter? No bigger than the size of the list, right? So it's a Nat that is less than or equal to the length of s. And there we go.
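Putting the layers together, unsafeTake and chop with the bounds just described might be sketched like this. The pack and unpack bodies here are simplified stand-ins written with indexed reads and writes, not the versions on the slides, and the `len` measure in the annotations is an assumption.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import System.IO.Unsafe (unsafePerformIO)

data ByteString = BS { bPtr :: ForeignPtr Word8, bOff :: !Int, bLen :: !Int }

create :: Int -> (Ptr Word8 -> IO ()) -> ByteString
create n fill = unsafePerformIO $ do
  fp <- mallocForeignPtrBytes n
  withForeignPtr fp fill
  return (BS fp 0 n)

-- Simplified pack: write the i-th character at byte offset i.
pack :: String -> ByteString
pack s = create (length s) $ \p ->
  mapM_ (\(i, c) -> pokeByteOff p i (fromIntegral (ord c) :: Word8)) (zip [0 ..] s)

-- Simplified unpack: read bLen bytes starting at bOff.
unpack :: ByteString -> String
unpack (BS fp off n) = unsafePerformIO $ withForeignPtr fp $ \p ->
  mapM (\i -> (chr . fromIntegral) <$> (peekByteOff p (off + i) :: IO Word8))
       [0 .. n - 1]

-- Constant time: only the length field changes, so n must not exceed bLen.
{-@ unsafeTake :: n:Nat -> b:{v:ByteString | n <= bLen v}
               -> {v:ByteString | bLen v = n} @-}
unsafeTake :: Int -> ByteString -> ByteString
unsafeTake n (BS fp off _) = BS fp off n

-- Application level: n may be no bigger than the length of s.
{-@ chop :: s:String -> {n:Nat | n <= len s} -> String @-}
chop :: String -> Int -> String
chop s n = unpack (unsafeTake n (pack s))
```

Under these annotations a call such as `chop "burrito" 3` is within bounds, while asking for 30 characters is the kind of call the checker is meant to reject. Plain GHC would compile both, and the second would read past the allocated buffer at run time.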
Well, not just yet, but you get the idea. Okay, and so now if I try and do this shady thing where I create, say, "lambda", I can suck out six characters. That's fine. But if I try and pull out 30 characters, that's a type error, right? So it's just rejected at compile time. Okay, fantastic. So that's pretty much it. This was just a super fast overview of how you would actually use something like this for, you know, complicated libraries. You just slice it into a bunch of layers, and really the only thing we're trusting over here is the lowest-level pointer API, right? So our memory safety policy or property is encoded inside those types, and we're kind of trusting that that's okay. But everything after that is being checked by Liquid Haskell. Okay, I said I would finish and I will finish, as soon as I can just pop back right here. This is... ah, no, no. Sorry. Here we go. Okay. So we're done with the case studies, and see, I said conclusion. Here we go.
Refinement types. What are refinement types? You take your types and you dress them up with predicates. You get your proofs done automatically by the SMT solver, which is fantastic. There's a whole bunch of machinery on how these types get inferred that I have completely not talked about, but that's okay. You don't really need to know about it. And the upshot is you get super automated dependent typing that you can do a bunch of things with. There's a bunch of work we're still doing; if you're interested, come and talk to me. I guess the main thing that I would really like to do is, as somebody put it, have GHC have a -XRefinementTypes option. That seems like a fun thing to do.
So what I'd really like to do is to somehow figure out how me and Richard can put our technologies together, because that'd be pretty sweet. And blah blah blah. If you like this material, there is a ton more of it. There's a whole book that I wrote that has far more involved and interesting examples. It's at this particular URL, and with that I must stop. Thank you.