I am Adam Foltzer. I work at Galois in Portland, Oregon. We're kind of renowned as a place that has been using Haskell in industry for a really long time, all the way back to 1999. What we're going to be talking about today is Cryptol, which was born very shortly after the company itself, so this project has been around for a while. For most of its life it was closed source but sort of freely available: we would send it to academics and whatnot. But it turns out security researchers don't like using closed-source tools to do their research. So about a year and a half ago we did a ground-up rewrite, a much simpler, cleaner implementation, and released it as open source. That is what I am excited to share with you today.

It's our domain-specific language for specifying cryptographic algorithms as an executable specification, so that you can actually run it and see how it works. That can be done with LaTeX, but I wouldn't recommend it, and LaTeX is the typical format for cryptographic specifications. So Cryptol lets you be precise about what you mean when defining an algorithm, but then actually go do stuff with it: run it, build a formal model of it, prove it equivalent to some other thing, prove correctness properties about your algorithm.

These are a few of the things that I want to talk about. We probably won't be able to do justice to everything in the amount of time that we have, but I'm hoping to get through as much as possible. And yes, plans encountering reality and all: this is what I have in mind for what we can cover in this time, but of course let's be flexible, and if folks have more questions, I'm not going to stop answering them just because I want to adhere to a schedule.

The idea of formal methods is motivated in large part by wanting to make software that is correct, and there are many different definitions of what correct is.
For the purposes of Cryptol, correct is a functional notion: if I have a specification that is supposed to take this value and give you back this other value that is related to it somehow, it is going to do that every time, and there's not some secret value that you can pass to it that suddenly gives some bogus value and is a backdoor flaw in your security.

We also want to apply formal methods to help improve the performance of things. When a lot of people think about formal methods, they think of Coq or Agda, languages that aren't exactly renowned for their runtime performance. But you can use formal techniques to, say, have your nice, easy-to-verify specification over here, move it down to a highly performing one, and then prove that those two are equivalent. So you're using formal methods, but you're not bound by the shackles of having to use a research tool in your production code. You can prove equivalence with C that's been optimized to hell and back.

The standard question about software correctness is: if you hand me an implementation of an algorithm, how do I know that it's actually an implementation of that algorithm? Again, this is that type of equivalence checking between implementations that we want to answer.

Formal methods are not a replacement for testing, particularly given the scope limiting that I did a moment ago, where I said that we're concerned with functional correctness. If you're implementing crypto, you have all sorts of other things to worry about, like power and timing, that don't necessarily show up in the semantics of a programming language like Cryptol. You could imagine a programming language that does have those semantics integrated with it, but Cryptol is not that language, at least not yet.
So you want to use testing to verify these other properties, and also to verify, or validate rather, any integration and glue code that might not be amenable to the formal techniques. You write a C module and you prove it equivalent, but then the wrapper function around the call from your website to that C function to run the crypto might have something wrong in it.

At Galois, generally speaking, we want software to be trustworthy primarily through correctness by construction, so that you get that correctness property because the software was built in such a way that it has essentially no other choice but to be correct. And depending on what language you use, you can get stronger and stronger notions of what correct means just by the construction. With Cryptol, it's sort of a balance: you get the standard benefits of a typed, garbage-collected programming language, where you're not going to try casting an int to a float or, good Lord, dereferencing a pointer as an integer or something like that. And you're going to have memory safety, so there's no array out of bounds that can lead you into trouble without raising an exception. And we use this technique of building mathematical models of things in many more domains than just cryptography.

We try to be as open as we can, but Cryptol 1 was a hairy mess of a code base, and honestly the reason it wasn't open source sooner is that if you have 13 years of development, you've got to go through and make sure that all that stuff is ready to go out, and we're only about 40 engineers. So Cryptol 2 was born, and it is much nicer and more maintainable. So this is kind of what I was talking about.
You've got your LaTeX specification of some algorithm, and then cryptographers know just enough C preprocessor to write reference implementations that look like this. So you'll get a paper where this code is in the appendix, and you can compile it and run it, and it's like: okay, it agrees with the tests that they wrote down in the paper, but I have no idea if this is actually a reasonable implementation.

So I'm going to break with my schedule already and show you an implementation of a cryptographic algorithm. This is the ZUC cipher. It's one of the stream ciphers used in 3G, which I think extends to LTE as well, for phone communication. You have diagrams like this, you have lots and lots of subscripts and superscripts, and confusion about whether they're zero-based or one-based and what all of these things mean. This is definitely much better than a lot of crypto specifications, but it's not really something that is trivial to sit down and type into C. Like, okay, I'm going to squish the 15 high bits with the 14 low bits of this word and call that X0: that's something that's kind of easy to screw up.

And so this is the Cryptol specification. I'm not going to go into detail because I haven't properly introduced the language yet, but I just want to give you a feel for it. It looks a lot like Haskell. It has one colon instead of two for the type signatures, as the Lord intended, but otherwise things look kind of familiar. And sorry about the syntax highlighting: it thinks it's Python, so when it sees the concatenation operator it's like, oh, I found a comment. But you have your enumerations, you have lots and lots of list comprehensions, and one little detail here is that you have parallel list comprehensions by default; that's a syntax extension in Haskell. But we'll get to more of that stuff later.

Here is this function that I just showed you over here. Bit reorganization is sticking these things together in this particular way, and so we are just sticking them
together in that particular way, and then defining what it means to be the low bits of a word in this situation. So it gives you an idea of how the LaTeX doesn't have to change too much to become Cryptol.

And so we have S-boxes. You see them in all kinds of stream ciphers, because you don't want your cipher to be linear, and so they cook up these magic numbers that accomplish that for you. I took a crypto class once and I understood it at the time; that's about all I can say for it. Thank you, Évariste Galois, for inventing all of these things for us.

Let's see, here's a quick example. As you're going through one of these specifications, it'll occasionally have examples, and so here is 12345678 as an example input to this function, and, wow, is the answer... oh yeah, thank you. So we can basically just copy and paste this into the executable Cryptol code and make sure that this evaluates to True. I guess I could actually go ahead with this. It skipped me way ahead, and, oh boy, this is running slowly. Got Q.E.D. down there. There we go. All right, so we can just evaluate example 8, because that's a Boolean. So you can write little things like this to check your work along the way.

This cipher is notable because in version 1.4 it had a bug, and that bug significantly compromised the secrecy that the cipher provided. Basically, as it's expanding the key out into a sequence that it can XOR with all of your message, and therefore provide you with secrecy, it would sometimes take two different keys to the same key sequence, so effectively each of those pairs of keys is just one key, and the fewer keys you have available in your crypto system, the less secure it is. That output is not supposed to happen; let's restart.

So we can express the collision as a property in Cryptol and say that when you initialize ZUC, the first (this is going to keep scrolling on me) the first key that you get out in this key stream is not equal to the first key that you get out
from the key stream when you use a different initialization vector. So as long as for each key I get a unique thing that comes after that, then we're fine. And of course, there we go: Z3 can tell us that we are safe. But we can also set version 1.5 to False, and this not only tells us that we are vulnerable, it finds the initialization vectors that expose this flaw, or at least a couple of examples of them. We can ask it for more if we want to; that's probably not easy to do in the notebook.

Historically, Cryptol was supported by the Trusted Systems Research Group at NSA, and when they saw this example their socks flew off: the idea that you can use formal methods tools to actually test cryptographic properties of an algorithm, and not only test them but get out useful example data that you can then use to go in and fix the algorithm. So that is the kind of power that these approaches deliver in this domain.

I'm going to try not to keep talking too much longer; I've already gone over a lot of this stuff. It's a statically typed language, and the weirdest thing about its type system is that everything, or rather sequences, are polymorphic in their size. At first glance this looks a whole lot like dependent types, because people who give dependent types intros (I'm sorry, Brian) use vectors with a length. I know, I saw your Strange Loop talk, and you very wisely demurred from that, but the canonical "hello world, I'm dependent types" is a vector with a length field. So people see Cryptol, see lengths and sequences, and think dependent types. It's actually not. It's more like, if you're living on the bleeding edge of GHC, the type-level Nats that are there, where your type system has two kinds: one is the kind that we are used to dealing with, like functions and values, and the other kind is natural numbers. The type checker can reason about inequalities and equalities with natural numbers to make sure that everything has the right length. And this is
important because when you're dealing with crypto, you're dealing with a sequence of 32 bits; you're not dealing with just a generic bit vector that could have any length. So that is the weirdest thing about it, and it's probably the thing that you will stub your toe on the most once the syntax feels familiar: many programs that are easy to write in other languages are very difficult to write in Cryptol, but cryptographic programs are pretty easy to write in Cryptol. That's your classic domain-specific language trade-off: you can write merge sort, but God help you. Instead, you're able to reason about these bit vectors at a level of precision that you can't get without having a heavier-duty type system like this.

So the question is how related the Cryptol type system is to LiquidHaskell and refinement types. They are definitely cousins. We call out to SMT solvers under the hood in order to resolve the constraints that come up when we type check a Cryptol program. But the cool part about LiquidHaskell is that you can write these more complicated predicates in your refinement types to talk about different kinds of constraints other than just natural number inequalities. So we use just enough of that power to get what we need, but we don't try to make a more expressive language there. Right, and I would say that it's more about having the ability to take the typical case for cryptography and make that the default case in your syntax, where I don't have to specify that I'm talking about natural numbers; I'm just writing a number and it's using that theory, and there's none of the overhead that would be required if it was more general purpose.

All right, so some more things about Cryptol specs. Functions are pure; there's no I/O. So unfortunately you can't call out directly to Cryptol and have it back your web server or something. Please just don't do that. Our approach here, and what we intend for Cryptol, is that the interpreter is all about
the reference specification that happens to be executable and can be transformed into these other forms that you might then actually be able to run in a real system. You might generate C code, but you're not going to be running that directly in the Cryptol interpreter.

Let's see. If you're using the Cryptol REPL, so just the command-line read-eval-print loop, you get these few nice commands. I typeset them in Markdown because they don't work in the notebook, and now that I am sharing my VPS with all of you fine people, I am happy that you can't load all my files, but it would be nice to support that safely in the future. You can browse, kind of like in Haskell. The pretty printer doesn't like this width, but this is basically showing the prelude functions that are in scope. Like the Haskell REPL, you can ask for the type of something, and this is how we write a forall in Cryptol. So these braces mean that for all A, as long as A is finite and A is greater than or equal to 2, then this is... somebody's editing my notebook; I'm going to overwrite them, sorry. If you go back to this screen and duplicate the original notebook, then you can have a copy to work on.

We have this built-in Arith type class to limit the types in the language that arithmetic makes sense for. You can set your base, so suddenly I'm printing everything out in octal. You can turn ASCII on, and all that does is interpret bytes as ASCII for strings. This help output is going to be a lot more comprehensive at the REPL because you will have... oh, I guess that's a bug; it shouldn't have all these things. And you can look at the available user settings; some of those I've already shown you, like setting the base to something else and setting ASCII on.

The basic type that makes up pretty much everything in Cryptol is a bit, and it's either True or False. Not terribly interesting. What gets interesting is when you start combining those things together. So this is a sequence of length 7, and it contains bits, and
you can index into the front of it, and you can index off the end of it, because both of those are often how people describe their algorithms. We have enumerations, concatenation, shifts and rotates, among others. You can nest sequences, so basically each sequence is parameterized by the thing on its right. So this is a sequence of length 2 that contains sequences of length 4 that contain 4-bit words, essentially. You can index into that, and there's also this, which is like a projection, where you can pass it some indices and get that set of indices out of a sequence. You can also ask for the width of a sequence, which is its length, but when you're dealing with cryptographers, they like talking about widths.

You have sequence comprehensions, which should be familiar to Haskell or Python programmers. And words are kind of treated specially, so down here I'm dropping the conceit of putting Bit there. If you just have the sequence notation with nothing else to the right of it, it assumes that it's a bit, just because that's so common. And you can enter your literals in various bases.

The classic hello world of Cryptol is actually 1 plus 1. What do you suppose it equals?
0, I heard it. So arithmetic is modular in the length of the words that you're doing the arithmetic over, and so if we give the 1 a type signature and say that it is a 2-bit word, then we get 2, because there's actually enough room to not wrap around.

We've got tuples, and they are 0-based because we had to please Python programmers; it was a great debate in the office. Those can be heterogeneous, as opposed to the sequences, which must all contain one type. We have records, which are basically tuples with labels, and we can also define type aliases, so you can use type for other things than records, and then use the dot syntax to project out from the records.

You've got Boolean operators, which work in exactly the way you would expect on bits, but they are also defined on other types, so you can do bitwise operations on words, and if you want to, you can do bitwise operations pointwise across larger structures. So I'll type in these definitions and you can see what comes out. We've got or and xor, and tilde is the complement.

We have if-then-else, but you almost never want to use it, and if you want to know why, try writing a recursive function like factorial in the way you would expect to write it. It gets difficult. Well, maybe factorial is not the best example, but take a recursive function over a sequence: write the function that adds one to each element of a sequence, and you'll see why if-then-else is a problem.

We have where clauses for local bindings, so this is just like in Haskell and whatnot. Functions can take multiple values, separated by arrows, and if you want to return multiple values you can tuple them up. And we have anonymous functions with lambda. That's kind of the basics.

So I wanted to stop here and give you all an opportunity to play around with the interpreter. If you've downloaded the PDF of the book (there's a link at the top of this notebook), chapter two has a tutorial walkthrough of these features and some exercises
scattered throughout. So let's see how we're doing against... it looks like I'm behind schedule, so maybe we can just go through if-then-else as a quick exercise together. Let's add a new cell, and let's write a function, add ones, so that if we give it this sequence it will give us back this sequence. Sorry, C-style comments, for legacy reasons.

Okay, so what is the type? Or let's try writing it without the type signature first. I'm sorry? Right, so we have A is finite, and we're taking a sequence of A's, and actually let's make sure that it can't be a sequence of anything; it has to be a sequence of finite-width words. Thanks.

Okay, so the first thing that you'll run into is that you can pattern match on sequences, but it doesn't work in the way that you would expect, where you could have one line giving the base case and then another line giving the recursive case. Right, well, I'm not sure what direction you're going in with that, but yeah. In a language like Haskell you might write something like this if you were doing the explicit recursion. This is totally Haskell syntax, so don't actually try to type this into Cryptol. But the way that you want to go about attacking a problem like this in Cryptol is to think, instead of pattern matching, how can I write a sequence comprehension that will accomplish this? And the sequence comprehension winds up looking like a fold very often. So instead of this (it really thinks it's Python) we are going to say it is x plus one, where x is coming from xs. Uh oh. All right, so there we go: I had to put this constraint on the width of B so that we aren't trying to add zero-bit words together.

Okay, so let's go over this again. What we have here, well, we can actually use this as an example. What we have here is a sequence with five elements in it, and each element, let's say, is a four-bit word. So the thing that comes first is sort of like the
outermost level of the sequence, like how many things are in that outermost level, and then what comes next are the contents of that sequence, which happens to be another sequence that contains four bits. I'm abusing the implicit word syntax there. So we have a sequence with five things, containing sequences with four things that contain bits. If we want, we can add ones of the empty sequence, and it's fine with that, but we can also add ones with our example (oops, I haven't re-evaluated that) and get out the result that we want. Oops, did I just kill my audio?

Okay, so it's not actually talking about this A; it's talking about a different A that is the contents of this sequence. Yes, and on our issue tracker is an item to make these type variable names more informative and less likely to clash with definitions that might be in scope. Yes, it's just coming up with a name for this unification variable, and it's picking A. And yeah, I think so.

The defaulting is a little tricky, because there are obviously many choices that you could make for B there, but it uses heuristics in various cases to try to come up with something that seems sensible. That's particularly tricky if you are using decimal literals. So if I just evaluate 5 here, it's going to assume that A equals 3, and... all right, for some reason the short form of the commands doesn't work in the notebook. So the literal 5 has this polymorphic type where the width has to be greater than or equal to 3, because we need at least three bits to represent it. This is as opposed to hex literals, which are basically four bits for each digit, so each nibble. So, any more questions before we go on?
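The add ones exercise and the literal-width discussion can be modelled roughly in Python. This is only a sketch of the semantics described in the talk, not Cryptol code: the names `add_ones`, `min_width`, and `hex_width` are made up for illustration, and the "at least one bit" check stands in for the width constraint mentioned above, whose exact form is an assumption.

```python
def add_ones(xs, width):
    """Model of 'x plus one for each x in xs' over fixed-width words.

    Arithmetic wraps modulo 2**width, as it does for fixed-width words.
    The width check is an assumed stand-in for the talk's constraint
    that rules out zero-bit words.
    """
    if width < 1:
        raise ValueError("width must be at least 1 bit")
    modulus = 1 << width
    return [(x + 1) % modulus for x in xs]


def min_width(n):
    """Fewest bits that can hold the non-negative decimal literal n."""
    return n.bit_length()


def hex_width(digits):
    """A hex literal takes four bits per digit (one nibble each)."""
    return 4 * len(digits)


print(add_ones([1, 2, 3, 14, 15], 4))  # the last element wraps around in 4 bits
print(add_ones([], 4))                 # the empty sequence is fine
print(min_width(5))                    # bits needed for the literal 5
print(hex_width("ff"))                 # bits in a two-digit hex literal
```

The design point the sketch tries to capture is that the width is part of the data's type, so wrap-around is a property of the word size rather than of the function.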
Hopefully this is mostly familiar stuff so far. We try not to get too different from the expectations of people who are familiar with other programming languages, but by necessity some things are a bit strange, like the fact that these two are the most used control structures in Cryptol. The first one, where you just have a single bar here, is the list comprehension that you've probably encountered the most, where it's basically the Cartesian product: I'm going to combine everything from this list with everything from this list, and so the size of the thing that I produce is going to be that length times this length. Whereas a parallel comprehension is really like zip or zipWith, where you're going to march pairwise up the sequences and use some combining function at each point, and as soon as one sequence runs out of elements, you're done. So even though this y is coming from all of these, we're only going to get three in the output.

Yeah. No, there are no monads, there's no I/O, so it's just expressions. Yeah, so you can put any expression here for the generators as long as it evaluates to a sequence, and so you can nest Cartesian inside parallel and vice versa to your heart's content.

All right, so a bit more about the types. We have both monomorphic and polymorphic types. The curly braces are a giveaway that you're dealing with a polymorphic type, and you saw in the add ones example that you can also put constraints on the polymorphic variables. So here we're saying that A and B both have to be finite and B has to be greater than or equal to two, and then you have your fat arrow to end your context, like in Haskell. Finite means not infinite, so you can't pass this an infinite sequence. Well, you can have one, but that definition would not be able to operate on it. The semantics of the interpreter are lazy, so you can write something like this, and of course, never use decimal literals. Really?
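The difference between the two comprehension forms described above can be sketched in Python. This is an analogy rather than Cryptol syntax: `itertools.product` plays the role of the Cartesian (single-bar) comprehension and `zip` plays the role of the parallel one, and the sample sequences `xs` and `ys` are made up for illustration.

```python
from itertools import product

xs = [1, 2, 3]
ys = [10, 20, 30, 40, 50]

# Cartesian comprehension: every x is combined with every y,
# so the result has len(xs) * len(ys) elements.
cartesian = [x + y for x, y in product(xs, ys)]

# Parallel comprehension: walk both sequences in step and stop
# as soon as the shorter one runs out.
parallel = [x + y for x, y in zip(xs, ys)]

print(len(cartesian))
print(parallel)
```

Here the parallel form yields only three results even though `ys` has five elements, which matches the behaviour described in the talk.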
That's right, that was a recent change; I'm still not entirely clear on it. But you can use the dots for the infinite case (it's like "from enum to inf" in the AST), so you can build an infinite sequence like this, and it's going to wrap around, because you're still dealing with elements of a finite bit width, but you can keep things going forever. So if you're defining a cipher or a hash function that can operate on things of any length, the type signature is that it takes an infinite number of 256-bit blocks. So inf is this special element of the Nat kind, so it's not just the naturals, it's the Nats plus... I'm sorry? Pragmatism? No, it's just, I really need to fix that bug. So let's just say that we have... why is that polymorphic? That's probably also a bug. Anyway, all that's happening is that instead of having a finite natural number here, you just have this inf; that's the extra natural number, in a way. Yes. Yep.

So there is a rudimentary module system. I cringe to actually call them modules with John in the audience, but they kind of work like Haskell modules, so it's more of a namespacing thing than an abstraction thing, and you've got syntax for qualified names to talk about things in other modules. They don't work right now in the notebook, because we don't trust the interpreter to go off chasing after other file paths when running in a web browser at the moment.

Recurrences are a really common pattern that comes up in crypto algorithms, and so you can start to think about the combination of these sequences and laziness as a way to describe circuits. Here we have a circuit that has, you can think about it as, a bit of state that starts at zero, and every time through this loop it adds one. We do that by putting the initial state here and then appending to it a sequence comprehension that recursively references our nats. This is very much like the Fibonacci stunt that people will pull with Haskell, and so you are able to get just enough information from the
beginning of this sequence in order to have that recursive reference actually give you a value back. And so if we evaluate this and see what it is, it's of course assuming that all we need is a one-bit word in there, and so it's just going back and forth between one and zero. That's not very good natural numbers. But the compiler would not warn you; you would diverge, though. Or, really? That's surprising. Maybe it's like asking, hey, what's the length of nats, and it's saying zero. Okay, that's kind of a surprise to me, but Iavor has been working very diligently on the constraint solver and type checker in the past few months, so it does not surprise me that I am surprised. I guess that's about a solution, yeah. Now, if we then gave this thing a type signature, though, and we said that nats is an infinite stream of 32-bit words, for example, now we go off the rails, and I have to kill that.

So the comment was about being able to check productivity of the recurrences, and that is something that we are looking at as part of a larger scheme to come up with an easily compilable subset of Cryptol.

So if I leave this type signature out, the question was (let's actually fix this first) nats is assuming that A equals one, and the question is, is there a way to tell it to do something different? Well, we can just give it an explicit type signature here when we evaluate it. I wonder if I could give it a finite length, something like that. Oh, this is... sorry, yeah, you have to put parens around it just so that the notebook doesn't get confused and think that I am trying to do a standalone type declaration. So if you give expressions explicit type signatures, you can force the polymorphic variables to be particular types. We'll see another way of doing that later; I'll be sure to point it out.

All right, so this is not a very useful circuit here, but you very often see patterns like this. If you've ever seen a linear feedback shift register, it's
like a building block of a lot of hardware and crypto algorithms, and in math it's usually expressed as this polynomial, but you can think about it in circuit form as well, as another one of these infinite streams, but where you're tapping back in a handful of steps, potentially. So you're not just limited to looking at the last thing that came out of the sequence; you can actually go further back.

And we're actually already here at the point where here is another way to instantiate type variables. This backtick, well, backtick curly brace, lets you give explicit arguments to the types of things, and to make that more clear, let's look at the type of drop. So drop has three type variables: how long is the front, like how many things do I want to drop off; how long is the back; and what are the types of the elements. In this case, all we really care about specifying is, well, I want to drop one thing off of this. It's either positional, or you can give the explicit names, like front equals one, back equals whatever, and so on.

In this one we are still doing this recursive sequence, but we have to start off with enough initial values that when we drop three elements from it, there's actually still something there for us to include in the recurrence. But that's all we need: as long as we have enough base cases, we can essentially build this circuit, XORing these things together as it produces values. So, questions about this? Because this is probably about as subtle as it gets for this talk. All right, cool.

Oh yeah, so speaking of take and drop, they work kind of how you expect, but the weird thing is that your number argument is actually a type variable, rather than being some value that you can compute and pass to the function, as in Haskell, to get that many elements. So that is coming from the fact that I am using decimal literals here, and as we saw before, if you put in a decimal literal, it doesn't know exactly how wide to make that word, so it just
makes it polymorphic. And so if I instead said this is a 2 that is 32 bits... oh yeah, that's right. I don't think I should try to explain that right now. Let's see, this is like ten 32-bit words. The short form of that error... yeah, thank you. The short version of that error that just happened is that when you're building these enumerations, these things are actually Nats in the type system, because the length of an enumeration depends on what values you're enumerating between, and since we need to know the length of the sequence, they have to be types. It's another thing that you'll tend to stub your toe on when you're learning how to use Cryptol. But here, since we gave them an explicit width, we don't get any "assuming". Well, that warning says that the definition that you entered is polymorphic, but I'm going to choose a monomorphic type for you, and that might not always give you what you want or what you expect. That's why this warning is on by default: to let you know, hey, there is something here, and it's this, but you might want to go back in and double-check it. And then the... yeah, that's just, like, this is the only variable that I have left unresolved.

Yeah, so groupBy is an interesting one. It sort of generalizes, like, split and join. If we take the sequence from 1 to 100 and group by 2s, then we get all of these two-element sequences inside that larger sequence, and likewise we can split, which does it on the other axis: we wind up with a two-element sequence, each element with 50 elements in it.

You can put underscores in types, and it will fill in the blank if it can. So here it's like, all right, well, maybe I don't want to think very hard about this enumeration and know that there are 100 things here, and then have to divide 100 by 2 to put the correct type here. Oh look, I messed it up, didn't I? I had to provide the type of the contents. So rather than having to compute this, you can just say, all right, put whatever there. The question is, can you put a hole in the
middle? Yeah, that doesn't parse. So I think it's only... okay, so yeah, you can put it where you would expect a type to be, but the underscore itself doesn't have rules in the parser like the brackets do, so it doesn't know to expect another type after it. That would be kind of interesting, though.
There is zero, as you might expect, but zero is polymorphic. Oh, I haven't evaluated Point3D. You know what, let's just run everything. So zero of type sequence of two containing sequences of two Point3Ds gives you back that structure with everything set to zero. So even on larger, complicated types, it basically goes out to the leaves and makes all the bits false. And if you negate zero, that is a convenient way to get something with everything set to one, because the Boolean operators work on the larger structures in the same way zero does.
All right, so let's get to some real cryptography. So ROT13 is the hello world of crypto. It's a very straightforward cipher, and it's a specific instance of a substitution cipher, where every character in the plaintext is replaced by some character in the ciphertext according to some dictionary. It has the property that ROT13 of ROT13 of some message equals that original message, because there are 26 letters in our dictionary and so you wind up wrapping around. And so here is the Cryptol definition of this most famous crypto algorithm. And just to walk you through this a little bit, what we're doing is kind of the same pattern that we had with the add-ones example that we did earlier, where we're going through the message and running this shift function on each character that we're getting out of the message. The shift of a character is (and if you forgot, this is the sequence indexing operator) looking into this map at position character minus capital A. So if you think about it... well, let's turn ASCII off and just evaluate the character A. So A in ASCII is 65, and so if we take any character, say C, and subtract 65 from it, that gives us an
index that we can then plug into this map. And so if we say map at C minus A, with the appropriate parens, then we get 80, and if we set ASCII back on so that that turns into a character, then it turns out that C maps to P in ROT13. And so all we're doing is converting into an index that we're then looking up in this sequence that we have precomputed by enumerating all of the capital letters and then rotating them over 13 places. So, literally, ROT13. So, any questions about this definition? Your first cipher. You can encode all sorts of messages with it, very securely.
I'm sorry, can you repeat the question? Right, so the question is, why does this compile, because the map is only 26 characters long and C minus A could be greater than 26. So let's try little c, which is 99, and so little c minus capital A is 34, and if we index into map at 34, then we get this invalid sequence index. So Cryptol is not a total language. It does not check that for every input to a function you are necessarily going to get an answer. Things could diverge, or they could throw an exception like this. That is different from the concept of memory safety. Cryptol is memory safe, but all that means is that you can't take some invalid index like this and then have shenanigans happen, where you're using that invalid index and the program isn't crashing, it's actually letting you mess with memory that you're not supposed to touch. So there are exceptions and there are user errors.
So, no. So is the question, does Cryptol check whether a particular value is in the range of a function? I mean, like, when you say map at, it knows the length of the map? No. So basically, as long as that index is of the right type, it's fine. There is a mode in the formal methods back end that can be used to check the safety of functions, and we're working on getting that hooked up again. That was a feature that we had in Cryptol 1, where you could type :safe of a function and it would tell you whether or not that function was safe
and that's something that we would like to get back, but there's a pretty different technique involved with that versus a totality checker that you might come across in a language like Idris.
Good question. So the fact that the list is indexed, is it only for adding different lists together? When does the compiler use the notion that there's 26 elements in them? So, arithmetic... sorry, the question was, when do you actually make use of the information that your sequences are of a particular length? And the answer is that since we're doing arithmetic on unsigned integers, essentially, we need to know when to wrap around. We need to have some semantics of what plus means. And so if we didn't have the length associated there, it would sort of be up to the implementation to just decide that, well, in this case let's use bignums, and then it only overflows if you run out of memory on your computer, or let's use machine words, and then it wraps around at 32 or 64 bits. So when you're writing crypto algorithms, you need to be very precise about overflow behavior and what the exact semantics of your arithmetic operators are. So it comes into play there. It also comes into play when you're using the formal methods tools, because those tools need to also be precise. So when we run symbolic simulation on the Cryptol program, it's producing essentially SMT-LIB output, which is very specific that all bit vectors have to have a length. So it's necessary in order to take advantage of the tools in the way that we need to, but it also is very much a domain-specific thing, to more accurately reflect how crypto specifications are written, because they're always written with explicit lengths.
Yep, that's a good question. So this question was, what happens when you try and verify the property that ROT13 of ROT13 of x equals x? Well, let's just add to our definition here and say that property, ROT13... that's involutivity, right? Bring our x into scope.
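For readers following along without the notebook, here is a minimal sketch of the table-lookup ROT13 and its involution property as described so far. It is a Python stand-in, not the Cryptol shown in the session; the names TABLE, shift, rot13 and rot13_involutive are made up for illustration, and the explicit range check is an assumption added to mimic the invalid sequence index error, since Python would otherwise accept negative indices silently.

```python
import string

# Hypothetical Python model of the table-lookup ROT13 described in the talk.
UPPER = string.ascii_uppercase        # 'A'..'Z'
TABLE = UPPER[13:] + UPPER[:13]       # the capital letters rotated over 13 places


def shift(c):
    """Look up c in the table at index (c - 'A')."""
    i = ord(c) - ord("A")
    # Assumption: fail loudly on an out-of-range index, like the
    # "invalid sequence index" exception seen in the session.
    if not 0 <= i < len(TABLE):
        raise IndexError("invalid sequence index: %d" % i)
    return TABLE[i]


def rot13(msg):
    """Apply shift to every character of the message."""
    return "".join(shift(c) for c in msg)


def rot13_involutive(msg):
    """Guarded property: only claims anything about all-uppercase messages."""
    if not all(c in UPPER for c in msg):
        return True                   # vacuously true outside A-Z
    return rot13(rot13(msg)) == msg
```

Calling shift("c") here fails with index 34, the same out-of-range lookup discussed above, and the guard in rot13_involutive is one way to rule such inputs out before stating the property.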
Let's get a new cell here. So first of all, what is the type of ROT13 invol? So it's polymorphic. So the first thing that we need to do is actually try this out at a monomorphic type, because if we try and do :prove of ROT13 invol, then it complains that it's not a monomorphic type. So what we can do instead is say, all right, let's check the involutivity at a particular length of message, because our ROT13 is polymorphic in the length of the message that we're processing. And so let's say that we'll do a message of 32 bits, or 32 letters rather. And it's false. And this goes back to the question over here, which is what happens when you have an index that's out of range. And so in order to actually state this theorem more precisely, we need to do some checking on the message coming in to make sure that it's actually all uppercase letters. And being the prepared person that I am, let's just copy this whole thing. I'll leave that one up as a cautionary tale. Get another cell here, and again let's do it on our 32-letter size. This is a lot slower on the VPS. Okay, can we interrupt this? Yep, it died. Let's just rerun these cells. Make sure our prover is Z3. I had you install CVC4 for this because that's the theorem prover that the type system in the currently released version depends on, but Z3 is a lot better for most things. And so you can also download Z3, have it alongside CVC4, and use it as your prover for theorems. The type checker will still use CVC4 because we've written stuff directly binding to that. I wonder if that was just... yeah, so CVC4 ran off the rails; Z3 comes back immediately. Z3 is also now liberally licensed, so you can just get that and use it for whatever you want, whereas before we did CVC4 because Microsoft had a weird proprietary-but-academic license on Z3 and so it wasn't really compatible.
So what we've written down here is a statement of the theorem that, essentially, for all x, if all isUpper x, so this is going to traverse the entire message and make sure
that all of the letters are uppercase, then ROT13 of ROT13 of x is that original message. And so what's happening here behind the scenes is a process called symbolic simulation, where rather than evaluating a function for a concrete output value like true or false (because this thing returns a bit), it's generating essentially a circuit representation of the function that would compute that bit. And once it has generated that circuit representation, it can pass it off to formal methods tools like Z3 that specialize in taking what is essentially an NP-complete problem of checking all the possible inputs to this function, and they use incredibly clever heuristics and techniques to make that go much faster in the usual case, and therefore provide a proof that this theorem always evaluates to true.
Well, it's saying, for any possible inputs, is it going to evaluate to true? It's asking that question. Later on we'll talk about satisfiability, which is maybe more of what you just mentioned, where I sort of know the answer that I want to get, give me the input that will get me there.
I think we were trying to prove that, but now we're proving the inverse? I'm sorry, that's just the name. So the property here is that it is its own inverse. But you didn't write that code? That was code that I wrote before and used here because I was anticipating this question.
So that's not going to work, because it's possible that some of these inputs here are not going to be uppercase letters, and so, quite rightly, the prover says, hey, for this input your theorem doesn't hold. So if we go back here and do this other one, let's set this back to base 10. So it's got a bunch of Cs in it. For some reason Z3 chose Cs, and then it has 128, which I don't think is a printable character, but I don't know what it is. And so that's outside of the map, and so that would result in an error. And so, right, in the formal model, yeah, it's encoded in such a way that the solver is going to give back unsat
essentially, or sat actually, for the way we encode these. It's going to give it back in such a way that indicates that it doesn't hold, like this formula just does not hold for this particular input value. It doesn't necessarily explain exactly what happened, but it's going to find some inconsistency in the formula that it sends off to the prover.
So let's see, we're skipping ahead, but this seems like a perfectly good direction to go in. Instead of prove you can put check here, and really, like QuickCheck, the interpreter is going to come up with a bunch of random values to throw at it. And it found an error for this input sequence, and since it is actually running the Cryptol concrete interpreter for this, now we see that it's invalid sequence index. And what would you call that? Yeah, so this is an exception that just happened, just as if we were to take this theorem, call ROT13 invol of this sequence. Of course, so I want to call the theorem... I guess I could just call ROT13 directly. Okay, there are newlines in there. There we go. So if you call, oh dear, if you call the function directly with these counterexamples... essentially you can do the same thing for the counterexamples that come out of the prover. And so if you're like, all right, the theorem is false, so why? Then you can take this counterexample and run it in the concrete interpreter, and then the concrete interpreter will say, oh yeah, there's an exception here, this sequence index is out of range.
Yeah, right, if you choose... so the question is, this theorem right here, instantiated at this particular type, is making a much weaker claim than this theorem, which is saying for any size, and for all x of any size, this property holds. Right now the automated tools don't support reasoning about polymorphic theorems like that. That's where you start to get into very often needing interactive techniques like you would have in Idris or Coq. But we're not set on having it be that way forever, so it's not like it's illegal to send something polymorphic to the prove
command; it's just that right now it doesn't happen to work, because we're translating eventually to the bit vector theory in the SMT solver, and that theory can't reason about bit vectors of symbolic lengths. So due to the needs of the underlying solver, we have to be monomorphic about it. And a way to mitigate that, depending on what you're specifically trying to prove, is to basically take the same definition... like, say we have this property that plus with zero on the left is the identity function. That's monomorphic, but we can run it at all these different sizes, and it's like, okay, well, I maybe trust it more now. I don't trust it completely, but this is sort of going down the path of providing more evidence. And another nice thing, going with the drum I've been beating about crypto specs always being very explicit about what lengths everything is, is that properties that you want to prove while developing a crypto algorithm are monomorphic. And if they're not monomorphic, it's just because you have something like a hash function, where you can run it for an unbounded number of blocks, and so doing the proof that one block hashes correctly is a monomorphic thing, but you have to make a larger argument that the hash function works as expected over arbitrarily many blocks.
So the question is, you can quick check or you can prove, and why would you want to do one or the other? Symbolic simulation does not work for all programs that you can express in Cryptol. So if you have essentially a loop that is not statically bounded, where you don't know how many times you are going to loop through this recurrence relation, then you can't symbolically simulate it, because it just will keep finding more branches to go off of as it's doing the simulation. It's like, well, it could go in this direction, so I do need to come up with more terms here, and it will just never stop. But even with things that do symbolically simulate, not all of them work that great with SMT solvers, and so it
might take a really long time to prove something. But you might just want to have :check run every time you load your file, and you don't want a long proof to happen every time, so you just want something that's a bit faster, and then maybe you run the proof overnight. That's usually for things like checking properties about an entire actual crypto algorithm that's not ROT13, which can take a while, or doing equivalence checking between non-trivial functions.
It doesn't have too many smarts right now. I think it tries to do a little bit of that, but mostly it's just random. That's definitely something that would be nice to have. But prove is really the bread and butter. Any other questions about that stuff?
Yeah, so we're actually well caught up in terms of time, so let's just take a, let's say, 20-minute break and/or exercise. We've already covered a whole lot of this thanks to the excellent questions so far.
So property-driven development is sort of the methodology that you wind up wanting to use once you have tools like :check and :prove by your side. Basically, instead of writing your program once and then trying to prove that it is correct after the fact, when you're writing crypto algorithms you're very often writing properties along with the algorithm. It's very common in a spec to say, okay, here is function f and here is function g, and g is f's inverse. And so you can encode that g is f's inverse and prove that as you go, to make sure that what you're implementing is correct in small pieces, and then either have the proof for the correctness of the larger thing, or maybe it's too large for the SMT solvers to tackle directly, so you use the small pieces to compositionally build up the assurance that the entire thing that you've built is correct. This can also be used, once you have a completed specification, to evolve that specification into something that is maybe more useful for your target environment. And so we have this picture here
where you start off with a reference spec, and you refine it and optimize it, and then finally compile it to your particular target, and each step along the way you are building a formal model and making sure that all the way across, what you're building is equivalent, even though the specific algorithms or techniques that you're using might be slightly different along the way. A classic example of this is the AES cipher. The straightforward definition has algebraic expressions that define how to do, I believe it's the mix round step of the cipher, and certain implementations basically will just precompute all possible outputs for that function and put them in a big table, it's like a 32K table or something like that, and then just look up in that table whenever they need to do that step. And it's not necessarily obvious that an implementation that uses that is equivalent to an implementation that looks a lot more like the spec. So then you prove that those two are equivalent, and you wind up with a more optimized version of your algorithm for which you still have all the same assurance from the original specification that it's correct.
So, we already talked a lot about properties in Cryptol. Basically the fine print on it is that anything that is either of type Bit, or a function that eventually returns type Bit, is a perfectly good property. So you can have a property that is just a bit, that is sort of like test vectors, essentially. You're saying, well, just check that 2 plus 2 is 4, so you can do proof by evaluation and be done with it. Or you can have these functions or properties that are more polymorphic, have more possible input values, and then you wind up having to decide whether you want to do :check for the quick version or :prove for an expensive version, and how much you need to monomorphize it to get the assurance that you want.
Yeah, so we've already seen all this stuff. If you want to just play around with the REPL, you can like build
lambdas. I mean, any value that returns a bit is fair game as a property. So Brian earlier asked when you would want to use check and when you would want to use prove, and I gave an answer that was mostly when you want to use check. Well, when you want to use prove is when the size of your input space is so vastly huge that it would be intractable to prove it by checking all the possible inputs. So here is a haystack function, and haystack just checks x to make sure that x is not 0xdeadbeef. And so this is a function from 32-bit words to bits, and so there's 4 billion of these and we're checking 100, and our coverage is like a rounding error for this function. But that was fast. So these SMT solvers are really, really good at taking this kind of problem, where the worst case is I have to check them all, and coming back with an answer much, much quicker. And the common case when you're dealing with cryptographic algorithms is that you're dealing with functions that are too big to exhaustively check. So that's why prove is so important. Check is like a nice design aid when prove is a little bit too heavy to be running all the time.
We already went over these. Yeah, so if you do :prove without any arguments, it'll just automatically try to prove every property in your file, and likewise with :check. It's just like, all right, I want all my properties to pass. If you were setting up continuous integration, you would just load the file, do :check, :prove, and there are your test cases.
So satisfiability is sort of the, I don't want to say dual in a room with category theorists in it, but in many ways it's like a mirror image of prove, and in fact we implement prove by turning it into a satisfiability problem. So prove asks, is this property going to return true for all possible inputs, and sat asks, what input do I have to pass to this function in order to get true out the other side? So it might not be a property that is true for all inputs,
but there might be one in there, and so we would like to find out what that is. So just to throw in a quick example here, we can do sat of haystack, and it finds that right away, whereas when we were trying to prove haystack, it says, oh well, this thing is false. And so here you can kind of see how these are mirror images of each other. It's like, prove is giving me something that is not 0xdeadbeef, and sat, if I set base 16... what? Oh, sorry, right. So let's sat lambda x, x equals 0xdeadbeef. There you go.
So here is a stupid Cryptol trick. You can define matrix multiplication straightforwardly using sequence comprehensions and such. I'm not going to go through the details of the definition, but it's doing exactly what you would expect, like taking the dot products of the rows and columns. Huh? Yes. And so here is a 3 by 3 identity matrix, so ones on the diagonal, and we can use sat to invert a matrix by taking an example matrix here and saying, all right, find an x such that mmult of this matrix and x is the identity matrix, which is like the definition of... great timing. And there's the inverted matrix, poorly formatted. So, ta-da.
There are a few more exercises that you might want to check out offline that are good introductions to how these sorts of properties are written and how to check them, but I just wanted to close by pointing you toward the various open source resources. Wow, did I really write homepage? Why did it not show the link? That's annoying. All right, so IPython Notebook is broken in the way that it renders this Markdown. So we have cryptol.net, which has got links to the book and to the downloads and the mailing list and everything. All the development is on GitHub, and all of our releases are there. It's up on Hackage as of a couple months ago, and I managed to get it into Homebrew, and some kind soul has put it into nixpkgs as well and has been submitting bug reports to me because nixpkgs breaks. So that's awesome. It helps us fix things that we didn't know
were broken. The mailing list is pretty low traffic. It's mostly me announcing new Cryptol releases, but people will also chime in and ask for help getting something to type check, which is almost always the problem when writing Cryptol. And then if you come up with a neat algorithm, or a Cryptol implementation of your favorite bit-manipulating algorithm, we are very likely to want to put it in our contrib directory. There are a bunch of great crypto implementations there from members of the community. Definitely check it out. We really, really love seeing people using Cryptol and getting value from it, and we aren't getting any money from that, we just like it. So please don't hesitate to email me, email the mailing list, just get in touch. We'd love to support you as you try messing around with this stuff. Yeah, so that's the Cryptol workshop. Thanks.
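As a footnote to the check-versus-prove discussion in the session, the haystack example can be modeled in a few lines of Python to show why 100 random samples are a rounding error against a 32-bit input space. This is only a sketch under stated assumptions: random_check is a hypothetical stand-in for random testing in general, not Cryptol's actual check command, and nothing here models what an SMT solver does.

```python
import random

NEEDLE = 0xDEADBEEF            # the one input for which the property fails
INPUT_SPACE = 2 ** 32          # number of distinct 32-bit words


def haystack(x):
    """Property from the talk: true for every word except the needle."""
    return x != NEEDLE


def random_check(prop, trials=100, seed=0):
    """Hypothetical random tester: returns a counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.getrandbits(32)
        if not prop(x):
            return x
    return None


# Fraction of the input space that 100 samples can cover, at best.
coverage = 100 / INPUT_SPACE

# A directed search, by contrast, only has to name the one failing input.
counterexample = NEEDLE
assert not haystack(counterexample)
```

With 100 trials the tester will almost certainly return None even though the property is false, which is the gap that prove and sat close by reasoning about the function symbolically instead of sampling it.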