Hey there, welcome to today's Protocol Labs research seminar. Today we are joined by John Burnham, co-founder and CEO of Yatima Inc., a Y Combinator W22 startup and Filecoin Foundation grantee. John has been using and building functional programming languages for nearly a decade, helped create the Morley framework for verifying Tezos smart contracts, and also helped create the Formality proof language. Today John will be talking to us about creating a trustless on-chain marketplace using Lean, Lurk, and a smart contract platform such as the Filecoin Virtual Machine. John, I'll let you take it from here.

Awesome, thank you so much. So today I'm going to talk about the design of a trustless software marketplace, and I have a little asterisk on "trustless," because nothing is completely trustless, but it's trustless relative to the existing ways people transact when buying and selling software. As is traditional, I have a cast of characters: Alice and Bob, whom some of you may have met in other contexts, and some additional characters. We have a malicious third party and a trustworthy third party. We have Phil the Robot and Peggy the Prover. I'm going to go through some diagrams, a little drama around what Alice and Bob might want to do with one another when they're buying and selling software. So, a really simple example: Alice has some software she wants to sell, and Bob wants to buy it. Bob says, "I'll pay 10 coins." Maybe that's Bobcoin, maybe that's Filecoin, maybe it's Bitcoin; it doesn't really matter for this simple example. But he has some specification, and Alice says, "OK, I have a program that fits that specification." Bob says, "OK, here's some money," and Alice gives him the program. There are some problems. How does Bob know that the program he gets actually does what he wants? Even if it does, how does he know he's actually going to get the program? And how does Alice know that she's going to get paid?
If both parties trust one another, then there's no problem: they can meet in a parking lot and swap two USB keys, one with the Bitcoin and one with the program. That's fine. They can send an email. But if they don't trust one another, the usual way to solve this problem is to add a trusted third-party intermediary. So we have Trent, the trusted intermediary. If Alice and Bob both trust Trent, Trent can escrow the payment, store the program, and resolve any disputes about whether Alice has correctly met Bob's specification. This is kind of like what you would see on Fiverr, Upwork, or other freelance marketplaces. But in some sense it's a general model of software development work. Trent could be management at a company, Bob could be Alice's boss, and Alice could be a developer, and Bob wants Alice to write some program that does something in the course of the daily work of some business. So I think this idea of people buying and selling software is a little more general than the Upwork model, but Upwork is a pretty good concrete place to think about it. OK, but there are also some problems with this model. If the third party is untrustworthy, what guarantee do Alice and Bob have that their trusted third party, whether that's the Upwork service or the company they're both working with, isn't going to take the money and run? What guarantee do they have that it won't hike its fees? And what guarantee does the third party have that Alice and Bob aren't going to collude? Interestingly, there have been cases where buyers and sellers meet on a software marketplace platform, get to know and trust one another, and then cut out the intermediary and transact directly. Third parties don't like that.
So there are a lot of interesting governance questions around how platforms like Upwork manage that. Finally, and most relevantly for our purposes, there's this claim AliceProg : BobSpec, the idea that Alice's program fulfills or fits Bob's specification. What if that's subjective? Suppose Bob writes up a specification document that says: I want my program to do this; I have the following business logic; I want it to send HTTP requests; I want it to process data; and so on. That is an English-language document, so how do we know that the program actually does what it describes? Specifications can be very ambiguous. Even if Alice tries in good faith to fulfill one, what if Bob is unhappy with the result? What if Bob doesn't want to pay because, OK, the program fits the specification, but the specification was bad? I think this ends up being a fairly big, general problem with software transactions: we can send money to one another and we can send programs to one another, but describing what the programs actually do is very challenging and tricky in anything other than subjective, human-interpreted language. Fortunately, we have a solution: types. This is not really a type theory talk, but it requires a little background, and I've tried to keep this as high-level an overview as possible. So, a 10,000-foot overview of type theory. Programs: think of them as recipes for computation. That's hand-wavy and not totally precise, but a computer takes recipes and does stuff, and we can treat programs at that level of specificity for this talk. A type is some set or space of possible programs that share some description. That's also really general; there are a lot of different type systems and a lot of different theories.
There are a lot of different ways of talking about types, but for our purposes, let's just think of a type as a very general set of programs that share a property, and we can notate membership as p : P (lowercase p has type capital P), which is what I was using earlier. Most critically, the thing that really makes a type a type is that it's not just an arbitrary set. It's a set with an associated computation that determines whether a particular program is actually in the set we're talking about. There has to be some mechanical way of checking whether this program fits this type, whether this program meets this specification. Some examples: a program add adds natural numbers. Everyone knows addition. How do we notate its type? We can use function arrows, which are pretty common in typed programming languages: Rust has arrows, functional programming languages have arrows. They just mean: I take the inputs on the left side of the arrow and produce the output on the right side. So add : Nat -> Nat -> Nat takes two natural numbers and returns another natural number. Multiplication, mul, has the same type signature, taking two natural numbers and returning a natural number, but has a different implementation. In Lisp or functional notation, (add 1 2) returns 3 and (mul 1 2) returns 2, which corresponds to the more classic mathematical or imperative notation add(1, 2) with parentheses and commas. Then, critically, we have this check function, which is the type checker. At a very simple level, the type checker takes claims, that is, type signatures. I put these in quotes; I'm not intending them to be strings, though they could be. In this pseudo-notation, check takes a type signature, a claim that a program has a type, and returns either true or false.
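To make the idea of check concrete, here is a minimal Python sketch of a type checker for a toy language with one base type, Nat, and function types A -> B. The term representation, the CONTEXT table, and the names infer and check are illustrative assumptions, not Lean's checker or anything specified in the talk; it decides claims about type signatures only, not about behavior.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Nat:
    """The base type of natural numbers."""


@dataclass(frozen=True)
class Arrow:
    """The function type dom -> cod."""
    dom: "Ty"
    cod: "Ty"


Ty = Union[Nat, Arrow]


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Lit, Var, App]

NAT = Nat()
# Hypothetical context: add and mul share one signature, Nat -> Nat -> Nat.
CONTEXT: Dict[str, Ty] = {
    "add": Arrow(NAT, Arrow(NAT, NAT)),
    "mul": Arrow(NAT, Arrow(NAT, NAT)),
}


def infer(term: Term, ctx: Dict[str, Ty] = CONTEXT) -> Ty:
    """Compute the type of term, raising TypeError if it is ill-typed."""
    if isinstance(term, Lit):
        if term.value < 0:
            raise TypeError("negative literal is not a Nat")
        return NAT
    if isinstance(term, Var):
        if term.name not in ctx:
            raise TypeError(f"unbound name: {term.name}")
        return ctx[term.name]
    if isinstance(term, App):
        fn_ty = infer(term.fn, ctx)
        if not isinstance(fn_ty, Arrow):
            raise TypeError("applied a non-function")
        if infer(term.arg, ctx) != fn_ty.dom:
            raise TypeError("argument type mismatch")
        return fn_ty.cod
    raise TypeError(f"unknown term: {term!r}")


def check(term: Term, expected: Ty, ctx: Dict[str, Ty] = CONTEXT) -> bool:
    """Decide the claim 'term : expected': True iff it type-checks."""
    try:
        return infer(term, ctx) == expected
    except TypeError:
        return False


# The claim "(add 1 2) : Nat" holds; "(1 2) : Nat" does not.
print(check(App(App(Var("add"), Lit(1)), Lit(2)), NAT))  # True
print(check(App(Lit(1), Lit(2)), NAT))  # False
```

Note that add and mul pass exactly the same check, which is the point made above: the type Nat -> Nat -> Nat says nothing about which arithmetic operation a program performs, so richer specifications need more expressive types.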
In reality, check would return a success condition, a result type, or errors. The design of type checkers is very complicated and interesting and has consumed a lot of my evenings over the past few years, but I'm not going to go into it in much depth. Types are really general, though: we can use them to describe things that are very concrete and very abstract. For example, we can use them to describe the abstract property of being a sort function for a list. Again in pseudo-notation, sort is parametric over some other type: it takes a list of that type and produces a SortedList. What is SortedList? It's an opaque thing I made up that would have an inner definition saying that, going pairwise, each element of the list has to be no greater than the subsequent element according to some comparison function. The cool thing about types, especially in dependently typed languages, is that these properties can be encoded statically in the type system. With dependent types, or a very expressive type system in general, you can even encode arbitrary mathematical theorems and proofs. So here is a pseudo-notation representation of the Goldbach conjecture, which says that every even natural number greater than two is the sum of two prime numbers. And what is Prime? Again, an opaque thing here. This looks a little like code you would see in a theorem prover or a dependently typed language, though not exactly.
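For readers who want to see the pseudo-notation in a real system, here is one way the sort specification and the Goldbach conjecture might be stated in Lean 4 with Mathlib. This is a sketch, not the slide's code: the namespace and the names SortSpec and Goldbach are chosen here, the sort is specialized to natural numbers, and it uses ≤ rather than a strict < so that lists with duplicates can be sorted. It also requires the output to be a permutation of the input; without that clause, a function that always returns the empty list would satisfy the specification.

```lean
import Mathlib

namespace BobSpecs

/-- A specification for sorting lists of naturals: the output is a
permutation of the input, and every element is ≤ each element after it. -/
def SortSpec (sort : List ℕ → List ℕ) : Prop :=
  ∀ l : List ℕ, List.Perm (sort l) l ∧ List.Pairwise (· ≤ ·) (sort l)

/-- Goldbach's conjecture: every even natural number greater than 2 is
the sum of two primes. -/
def Goldbach : Prop :=
  ∀ n : ℕ, 2 < n → Even n → ∃ p q : ℕ, p.Prime ∧ q.Prime ∧ p + q = n

end BobSpecs
```

Stating Goldbach as a Prop is easy; a term of type Goldbach would be a proof of the conjecture, which nobody has.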
I tried to condense the Goldbach statement to fit on the slide. Probably most relevant for this audience is sumColumn: you can express with types that this sumColumn function takes a spreadsheet, let's say a matrix, and a column number, and produces a natural number equal to the result of folding addition over the list of natural numbers in that column. That's important because, if you think back to our software transaction example, this could be something Bob wants. When buying software, Bob could legitimately say, "OK, I have this spreadsheet with 100,000 rows in it. I don't have a function that adds up everything in a column, and I want that function." He can write it himself, he can ask someone to write it, or he can go look for it online. But how does he know he's getting the right thing? Types give Bob a way to describe the computation he wants, without having to implement it, in a form that is automatically machine-checkable. So the idea here is that types can encode arbitrary static properties, and the big idea is that a type checker is a program for verifying programs. Under our very loose notion of types, if you have some static properties of your programs and a program for verifying those properties, then in some sense you've created a type system, or a type checker. I'm using the word "type" a little loosely, but I think it's OK for the purposes of this talk. So let's think about what that looks like in the context of our software transactions. BobSpec is a type, and check is a program that can check whether AliceProg inhabits BobSpec, that is, whether AliceProg : BobSpec.
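A sketch of how BobSpec for the column-summing example might look in Lean 4 (core only, no Mathlib), assuming a matrix is represented as a list of rows of natural numbers and that a missing cell counts as 0; both representation choices are assumptions, not from the talk. bobColumn, SumColumnSpec, and aliceSumColumn are hypothetical names.

```lean
namespace BobSpecs

/-- Column `c` of a matrix given as a list of rows; missing cells count as 0. -/
def bobColumn (m : List (List Nat)) (c : Nat) : List Nat :=
  m.map (fun row => row.getD c 0)

/-- BobSpec: `sumColumn m c` equals folding `+` over column `c`. -/
def SumColumnSpec (sumColumn : List (List Nat) → Nat → Nat) : Prop :=
  ∀ (m : List (List Nat)) (c : Nat), sumColumn m c = (bobColumn m c).foldl (· + ·) 0

/-- One possible submission from Alice. -/
def aliceSumColumn (m : List (List Nat)) (c : Nat) : Nat :=
  (bobColumn m c).foldl (· + ·) 0

/-- Alice's proof that her program inhabits Bob's specification. -/
theorem alice_meets_spec : SumColumnSpec aliceSumColumn :=
  fun _ _ => rfl

end BobSpecs
```

Here the specification is literally the fold, so Alice's proof is just rfl. A more interesting submission would be an optimized implementation (say, one that streams rows) whose proof of SumColumnSpec takes real work; Bob's specification stays the same either way.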
So BobSpec could be the Goldbach conjecture and Alice could be proving it, or BobSpec could be some operation over Bob's spreadsheets; maybe Bob is an accountant, or a big financial institution that wants to do some processing over data. If we have this constraint, this specification, encoded using types, then Alice, Trent, and Bob (the software developer, the intermediary, and the purchaser) can all use a program to verify whether the software in question does what it was supposed to do. So Bob says, "I will pay 10 Bobcoin for a program that sums columns." Alice sees the job posting, or the bounty, and says, "Oh, I know how to do that." She writes AliceProg and sends it to Trent. Trent checks whether the program matches the specification, which he can do because he has this type checker program and BobSpec was written in a way that is compatible with it. So they all have an objective way of determining whether the transaction was correct: Bob asked for a thing, Alice provided the thing, and therefore she deserves to get paid. There are some problems. Check might be an expensive operation, and not just in running the program itself: it might also make writing the software in question harder, and it might be complicated to write the specification in the first place. There are a lot of different type systems out there with a lot of different properties. Learning to program is fairly complicated, and learning to program with types is, for a lot of people, an additional challenge. It takes knowledge and experience, and there are a lot of different tools. Then there's the question of how we actually know that check is correct. We're still trusting that our type checking operation actually corresponds to some consistent, real system.
Obviously, if our check function is wrong, then Alice, Trent, and Bob can all be fooled. Alice could create an AliceProg that doesn't actually fit BobSpec but still passes the broken check, forging a match. And of course, adding types doesn't solve any of the collusion problems, the escrow problems, or other transaction-related issues. I want to dwell on check being expensive, though, because I think it's actually one of the big barriers to people using types for this. Even setting aside the extra work of writing types, Alice, Trent, and Bob all have to agree on this check function. They don't just have to agree on some type system, language, or tooling; they have to agree on the specific version. There's a synchronization problem, and there might be a compute cost problem. One interesting technology for getting everyone to agree on the result of a computation is zk-SNARKs. I'm not going to go into detail about how zk-SNARKs work; I'm basically going to treat them as a black box for this discussion. Very broadly, a zk-SNARK protocol lets you verify the result of a computation quickly without leaking information about that computation: think witnesses on the order of kilobytes that can be verified in milliseconds, regardless of how complicated the computation that generated them was. zk-SNARKs are often called zero-knowledge proofs, but they're not proofs in the same sense as a formal proof, the way a proof in a dependently typed language or theorem prover is a proof. They're a probabilistic argument, but a very good one, probabilistic in the same way that a secure hash function is.
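To put a number on "so low that we can ignore them," here is a small Python calculation of the standard birthday (union) bound on hash collisions: among n uniformly random b-bit digests, the probability that any two collide is at most n(n-1)/2 divided by 2^b. This illustrates only the hash-function analogy; the soundness error of a particular SNARK is analyzed differently and depends on the protocol and its parameters.

```python
def collision_bound(num_hashes: int, output_bits: int) -> float:
    """Union (birthday) bound: P(any collision among num_hashes uniformly
    random output_bits-bit digests) <= C(num_hashes, 2) / 2**output_bits."""
    pairs = num_hashes * (num_hashes - 1) // 2
    return pairs / 2 ** output_bits


# Even after 2**64 SHA-256 evaluations, the bound is about 2**-129.
p = collision_bound(2 ** 64, 256)
print(f"{p:.2e}")  # prints 1.47e-39
```

Python's exact integer arithmetic keeps the pair count exact before the final division, so the bound does not overflow even for very large inputs.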
With a hash function, we know mathematically there are collisions, but the odds of finding one are so low that we can ignore them. It's the same kind of thing. The inner workings of zk-SNARKs are very interesting, and they have a reputation for being semi-mystical; I think the expression that has been used in Ethereum is "magic moon math," which seems kind of accurate. For this talk, the basic idea is that a SNARK is a certificate that lets someone, in some sense, sign the result of a computation so that other people can verify it. I compute f(x), and I want you to know that it's y, so I create a SNARK, a witness that f(x) = y, and you can verify that quickly. So there are two operations. We have prove, which takes a program and generates a witness, and can do that for any computation in a Turing-complete protocol; this might be an expensive operation. And we have verify, which takes a witness and returns a Bool. Remember that our type checker is a program: that was how we began talking about types, as sets for which we have an automated way of checking whether a program is in the set defined by some property. So we have this check program, and prove takes programs. What happens if we feed a type checking operation, checking that program p is of type capital P, into prove? Assuming we can represent our type checker in the form that prove needs, we get a witness. In the real world this process is actually pretty complicated, but in theory we can pass a type checking program into our zk-SNARK prover, or witness generator, and get a witness that verifies as true if and only if our claim is true. Remember that check takes claims (type signatures) and returns Booleans, and verify also returns Booleans: if our check function returns true, verify will also return true.
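The prove/verify interface described above can be mocked in Python to show the data flow of embedding check inside prove. To be explicit: MockAttestingProver is NOT a zk-SNARK. It attests to check(claim) with an HMAC tag under a secret key, so it requires trusting the key holder (exactly the trust a real SNARK removes) and it provides no zero-knowledge property. What it shares with the SNARK picture is only the shape: prove runs the possibly expensive check once and emits a small, constant-size witness, and verify accepts that witness without re-running check. toy_check and its lookup table are hypothetical stand-ins for a real type checker.

```python
import hashlib
import hmac
import json
import secrets
from typing import Callable, Optional

Claim = dict  # e.g. {"program": "add", "type": "Nat -> Nat -> Nat"}


class MockAttestingProver:
    """NOT a zk-SNARK: an HMAC-based stand-in that only mimics the
    prove/verify interface. Anyone holding the key could forge witnesses."""

    def __init__(self, check: Callable[[Claim], bool]) -> None:
        self._check = check
        self._key = secrets.token_bytes(32)

    @staticmethod
    def _digest(claim: Claim) -> bytes:
        # Canonical JSON so that equal claims always encode identically.
        return hashlib.sha256(json.dumps(claim, sort_keys=True).encode()).digest()

    def prove(self, claim: Claim) -> Optional[bytes]:
        """Run the (possibly expensive) checker once; emit a 32-byte
        witness only if the claim type-checks."""
        if not self._check(claim):
            return None
        return hmac.new(self._key, self._digest(claim), hashlib.sha256).digest()

    def verify(self, claim: Claim, witness: bytes) -> bool:
        """Cheap: compares tags and never re-runs the checker."""
        expected = hmac.new(self._key, self._digest(claim), hashlib.sha256).digest()
        return hmac.compare_digest(expected, witness)


# Hypothetical stand-in for a real type checker.
KNOWN_TYPINGS = {("add", "Nat -> Nat -> Nat"), ("mul", "Nat -> Nat -> Nat")}


def toy_check(claim: Claim) -> bool:
    return (claim["program"], claim["type"]) in KNOWN_TYPINGS


prover = MockAttestingProver(toy_check)
claim = {"program": "add", "type": "Nat -> Nat -> Nat"}
witness = prover.prove(claim)
print(witness is not None and prover.verify(claim, witness))  # True
```

A real backend would replace the HMAC with a SNARK whose verifier needs only public parameters, so Bob and Trent could verify without sharing any secret with Alice.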
And if check returns false, verify won't return true either. (I think I actually should have "check claim equals true" on that slide.) I have a little asterisk on "if and only if," because it's not technically true: if there are flaws in the zk-SNARK protocol or in the type checker, you could forge these proofs, and even without flaws there is a negligible probability of doing so, because a zk-SNARK is an argument of knowledge. There's always this very small probability that you just got phenomenally, unbelievably unlucky, in the same way as with hash collisions. So the idea here is that by embedding a type checker into a zero-knowledge proof, we can generate succinct witnesses, verifiable in constant time, for arbitrary properties that we can express with that type checker. So what does that imply? We now have a slightly more elaborate protocol that I'm going to call ZKTC, a zero-knowledge type-checked software transaction. Bob says, "I bid 10 Bobcoin for a program of type BobType," and sends that to Trent, who broadcasts it. Alice says, "AliceProg is of type BobType, and here's a witness, WAB, that this is so." Trent runs the verification operation, it returns true, and he thereby knows that Alice correctly fulfilled BobType, so he sends the funds to Alice. Trent sends the program and the proof to Bob, and Bob can confirm that WAB verifies. So now all our participants agree on the correctness of this transaction without a potentially unbounded type checking operation: we have constant-time verification, with this WAB certificate, that AliceProg does actually inhabit BobType. This is the core idea of how you get a software marketplace with some notion of trustlessness, because in this little system, no one is trusting any of the intermediaries about the correctness of the result.
If they all share the same stack (this prove and check, this zk-SNARK protocol, and this type system, whether a theorem prover, a dependently typed language, or whatever), they can all agree with a cheap operation that AliceProg correctly fulfills BobType, that this transaction was correct. However, in this example we still have some trust in terms of funds, and in terms of storing AliceProg, the WAB witness, and BobSpec securely. What happens if Alice stores AliceProg on some other data host and sends a link to Trent, Trent sends a link to Bob, Bob tries to go look at it, and AliceProg has disappeared, but Alice got the money? So there's a data availability question and a transaction finality question. Those sound like pretty familiar things to an audience of people interested in Protocol Labs, Filecoin, and IPFS. The practical stack for this is as follows. Our type system is a really cool theorem prover and programming language called Lean, which is being developed at Microsoft Research. What's really exciting about Lean is that a lot of academic mathematicians have been using it to formalize real proofs, actual mathematics research. Previous theorem provers were mostly used by formal methods engineers and never got much buy-in from professional mathematicians creating proofs that are interesting to them. Lean has also been adopted by people interested in it as a practical programming language: a lot of people from the Haskell functional programming community and from the Rust community have been contributing to it and using it to write real applications. So this is really exciting.
I wish I had more time to go into all the ways in which Lean is amazing, but the idea is that it's a very general, expressive language for describing the kinds of types we've been talking about. Then we have Lurk. Lurk is a really awesome project out of Protocol Labs, led by porcuquine: a Turing-complete Lisp for generating zk-SNARKs. We've been contributing to it, which has been really fun. So Lean is our theorem prover, and Lurk is the zk-SNARK portion of the stack. And then we have Filecoin, which can act as the reliable host for all the different pieces of data we've been talking about (the types, the programs, the witnesses), and it's content addressed, which is really cool. And with the FVM, we can actually do these transactions directly on chain, removing the trust component of the buying and selling: has Alice actually received the money, has Bob actually sent the money, and so on. One way to think about this is that these three technologies all do verification in different ways, and it's very interesting to combine them. In Filecoin, in the Filecoin Virtual Machine, we're essentially doing verification by replication. We have some state machine (in the FVM it can be either EVM or Wasm) with defined inputs, the transactions, identified by their hashes. If we know the state machine and its inputs, anyone who wants to synchronize with the chain can replicate the computation deterministically. In Lurk, we have verification by argument of knowledge, which I'm not really going to get into, but which basically involves a lot of polynomials and is cool and interesting.
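Verification by replication can be sketched in a few lines of Python: if the state transition function is deterministic and everyone agrees on the ordered inputs, every replica that replays the same transactions arrives at the same state digest, so nobody has to trust anyone else's result. The balance-transfer state machine, the transaction fields, and the JSON/SHA-256 state_root are illustrative assumptions; the FVM's actual state tree and message format are different.

```python
import hashlib
import json
from typing import Dict, Iterable

State = Dict[str, int]


def apply_tx(state: State, tx: dict) -> State:
    """Deterministic transition: move tx["amount"] from src to dst.
    An invalid transfer (non-positive or overdrawn) leaves state unchanged."""
    new = dict(state)
    if new.get(tx["src"], 0) >= tx["amount"] > 0:
        new[tx["src"]] -= tx["amount"]
        new[tx["dst"]] = new.get(tx["dst"], 0) + tx["amount"]
    return new


def state_root(state: State) -> str:
    """Digest of a canonical (sorted-key) encoding of the state."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def replay(genesis: State, txs: Iterable[dict]) -> str:
    """What every replica does: start from genesis, apply txs in order."""
    state = genesis
    for tx in txs:
        state = apply_tx(state, tx)
    return state_root(state)


genesis = {"bob": 10}
txs = [{"src": "bob", "dst": "alice", "amount": 10}]
# Two independent replicas agree without trusting each other.
print(replay(genesis, txs) == replay(dict(genesis), list(txs)))  # True
```

The canonical encoding in state_root matters: without sorted keys, two replicas holding identical balances could still produce different digests.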
And for theorem provers, we have verification using a formal system, using induction, using something like what you would see in a computability theory class: if a function has the function type A -> B and I pass it an argument of type A, then it's going to return something of type B. There's a whole universe of these formal systems for talking about software, and Lean 4 is a particularly interesting, expressive one. But these are all trying to solve a similar kind of problem: we have programs, and we'd like some way to agree with one another on what those programs actually do. So this is our practical stack for how this actually works. We have Mathlib and applications written in Lean. Lean is not just a theorem prover; it's a general-purpose functional programming language like Haskell. Much of this stack is written in Lean (obviously not Lurk and the zk-SNARK witnesses at the bottom), and the Yatima IR and our compiler are substantially in Lean. You can run Lean on Wasm, and our work has been to compile Lean to the Yatima IR, a content-addressed intermediate representation using IPLD that has some interesting properties, particularly around anonymizing the proofs and programs from Lean. I won't go too much into that, but the idea is that if you have two programs that are structurally the same but have different variable names, we'd like them to hash to the same thing, with the names kept as optional metadata. Then we compile that IR to Lurk, which is a Lisp. Compiling a functional programming language to Lisp is fairly well understood, though there are some wrinkles when doing it in zk-SNARK land. This work is well underway, and we're pretty happy with how it's going.
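One standard way to make structurally identical programs hash the same regardless of variable names is to replace bound names with de Bruijn indices before hashing. The Python sketch below does this for untyped lambda terms encoded as tuples and hashes a JSON encoding with SHA-256. It illustrates the general idea only: the tuple encoding, to_debruijn, and content_hash are hypothetical, and the actual Yatima IR uses IPLD rather than JSON. Free names, such as references to other definitions, remain part of the hash in this sketch.

```python
import hashlib
import json
from typing import Any, Tuple

# Terms: ("var", name) | ("lam", name, body) | ("app", fn, arg)
Term = Tuple[Any, ...]


def to_debruijn(term: Term, bound: Tuple[str, ...] = ()) -> list:
    """Replace bound names by de Bruijn indices (0 = innermost binder).
    Binder names are dropped; free names are kept."""
    tag = term[0]
    if tag == "var":
        name = term[1]
        if name in bound:
            # tuple.index finds the innermost binder, so shadowing works.
            return ["bvar", bound.index(name)]
        return ["fvar", name]
    if tag == "lam":
        _, name, body = term
        return ["lam", to_debruijn(body, (name,) + bound)]
    if tag == "app":
        _, fn, arg = term
        return ["app", to_debruijn(fn, bound), to_debruijn(arg, bound)]
    raise ValueError(f"unknown term tag: {tag!r}")


def content_hash(term: Term) -> str:
    """Hash of the name-free form, so alpha-equivalent terms collide."""
    encoded = json.dumps(to_debruijn(term), separators=(",", ":"))
    return hashlib.sha256(encoded.encode()).hexdigest()


k_xy = ("lam", "x", ("lam", "y", ("var", "x")))  # fun x => fun y => x
k_ab = ("lam", "a", ("lam", "b", ("var", "a")))  # fun a => fun b => a
print(content_hash(k_xy) == content_hash(k_ab))  # True
```

Because binder names never reach the hash, they can be stored alongside it as optional metadata for pretty-printing, which matches the design goal described above.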
Lurk then compiles to a zk-SNARK witness, and eventually we will have an FVM actor that verifies those witnesses. What's cool about Filecoin is that we can do both the storage of the data we care about and the execution of the operations involved in these software transactions. That means that, using just Filecoin, we can have a software marketplace where software purchasers bid Filecoin: they send a transaction, or an API call, to a marketplace FVM actor saying, "I want to bid this amount of FIL for a program of this type." Then the software developer can say, "OK, I'm going to fill that bid. This is the CID, the content address, of my program, and here is a zero-knowledge proof that my program actually fills that bid." You could think of the marketplace as a code bounty: Bob posts a bounty that has some associated Filecoin (it could be any token, but on Filecoin it makes sense for it to be FIL) and an associated type. If we have a Filecoin actor that verifies these SNARKs, then we've removed the need for Alice, the intermediary, and Bob all to run and agree on the same software stack for prove and check. Instead, we have a defined prover and type checker pinned by this FVM actor, the marketplace, and by the Lurk verifier. And that's substantially it: we embed a type checker in a zk-SNARK protocol and put its verifier in an FVM actor, on a blockchain that, critically, has relatively cheap storage. If you were going to do something like this on Ethereum, the cost of storing anything on chain would quickly become prohibitive; you could store it on IPFS, obviously, but then you have the same data availability question.
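A plain-Python model of the code-bounty actor can make the flow concrete: Bob escrows funds against a type's content address, and the first submission whose witness verifies is paid exactly once. Everything here is a hypothetical sketch: BountyMarket, Bounty, and cid are names chosen for illustration, cid is a bare SHA-256 hex digest rather than a real Filecoin CID, and the verify callable stands in for the pinned Lurk verifier. A real FVM actor would be compiled to Wasm or EVM bytecode and use the chain's own state and payment mechanisms.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def cid(data: bytes) -> str:
    """Stand-in content address: a bare SHA-256 hex digest. Real Filecoin
    CIDs also encode version, codec, and multihash metadata."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Bounty:
    bidder: str
    type_cid: str
    amount: int
    filled_by: Optional[str] = None
    program_cid: Optional[str] = None


@dataclass
class BountyMarket:
    """Hypothetical model of a marketplace actor's state and methods."""
    # verify(program_cid, type_cid, witness) stands in for the pinned verifier.
    verify: Callable[[str, str, bytes], bool]
    balances: Dict[str, int] = field(default_factory=dict)
    bounties: Dict[int, Bounty] = field(default_factory=dict)
    escrow: int = 0

    def post(self, bidder: str, type_cid: str, amount: int) -> int:
        """Bob bids `amount` for any program of the type at `type_cid`."""
        if amount <= 0 or self.balances.get(bidder, 0) < amount:
            raise ValueError("invalid amount or insufficient funds")
        self.balances[bidder] -= amount
        self.escrow += amount
        bounty_id = len(self.bounties)
        self.bounties[bounty_id] = Bounty(bidder, type_cid, amount)
        return bounty_id

    def fill(self, bounty_id: int, seller: str, program_cid: str, witness: bytes) -> bool:
        """Pay the seller iff the witness verifies; each bounty pays once."""
        bounty = self.bounties[bounty_id]
        if bounty.filled_by is not None:
            return False
        if not self.verify(program_cid, bounty.type_cid, witness):
            return False
        bounty.filled_by, bounty.program_cid = seller, program_cid
        self.escrow -= bounty.amount
        self.balances[seller] = self.balances.get(seller, 0) + bounty.amount
        return True


# Toy run with a verifier that accepts only the witness b"ok".
market = BountyMarket(verify=lambda p, t, w: w == b"ok", balances={"bob": 10})
bob_type = cid(b"sumColumn : Matrix -> Nat -> Nat")
alice_prog = cid(b"alice's sumColumn implementation")
bounty_id = market.post("bob", bob_type, 10)
print(market.fill(bounty_id, "alice", alice_prog, b"ok"), market.balances)
```

The actor stores only content addresses; the program, type, and witness bytes themselves would live in Filecoin storage, which is the data availability point made above.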
So by doing it on Filecoin, we have a single platform that can do both the storage and the verification. My claim is that this is a minimal example of what a trustless software market, or at least a trustless transaction for a piece of typed software, would look like. We can elaborate on this a little. Maybe this prove operation is expensive and Alice doesn't want to run the zk-SNARK generation herself. Maybe she wants to use an external proving service and ask Peggy, who has a lot of servers, to generate the proof. This does involve some trust, in that Peggy now knows what AliceProg and BobType are, but there isn't really any semantic or computational trust; it's just data privacy trust. In principle, all Peggy could do is leak or publish AliceProg and BobType. As long as you trust Peggy to keep them secret, you can farm out this potentially expensive prove operation. Another variation is zero-knowledge software as a service, which is pretty similar, but instead of Bob posting a bid, Alice commits to a specific function, which we can call AliceSaaS, that takes inputs of type A and returns outputs of type B. She can post to an FVM actor, or deploy one, that holds a functional commitment, FCom. She can publicize: "OK, I have this function. I'm not going to tell you what it is, but here's its hash. You can send me inputs, either directly or mediated by this actor, and I will run my function against your inputs in exchange for payment." So: pay me 100 FIL and I'll run my function against your inputs, and I'll post a proof, WAlice, that I correctly ran your input through my function and that this is the output you got. So Alice never has to publicize the AliceSaaS function.
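The "here's its hash" step of the functional commitment can be sketched as a salted hash commitment in Python. The random salt keeps a guessable function from being recovered by hashing candidate sources (hiding), and SHA-256's collision resistance means Alice cannot later open the commitment to a different function (binding). commit and open_commitment are hypothetical names, and this covers only the commitment itself: in the scheme described above, Alice never opens it, and each per-call SNARK would instead prove that the output came from the function matching the published digest.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit(function_source: bytes) -> Tuple[bytes, bytes]:
    """Salted hash commitment: publish the digest (e.g. in the FCom actor),
    keep the 32-byte salt private."""
    salt = secrets.token_bytes(32)
    digest = hashlib.sha256(salt + function_source).digest()
    return digest, salt


def open_commitment(digest: bytes, function_source: bytes, salt: bytes) -> bool:
    """Check that (function_source, salt) matches a published digest."""
    recomputed = hashlib.sha256(salt + function_source).digest()
    return hmac.compare_digest(recomputed, digest)


source = b"def alice_saas(x): return 2 * x"
digest, salt = commit(source)
print(digest.hex())  # publishable; the random salt keeps it hiding
```

Because the salt has a fixed length, concatenating it with the source is unambiguous; a variable-length salt would need explicit length-prefixing.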
Bob can send inputs to run against Alice's function. This is a software-as-a-service business model mediated entirely on chain, with verification that the computation was done correctly. In a traditional SaaS model, you don't get a proof that your API call was computed correctly. It would be nice if you did, and here you do. So Bob has a very strong set of guarantees about the outputs he's getting. One way to look at this is to think of Alice as a kind of computational oracle, and I think there are some interesting ways in which the idea of oracles could be applied to this zero-knowledge type checking, ZKTC, concept. At a high level, one really cool, exciting thing about this architecture is that it brings together three really exciting things happening in computing right now. Obviously, blockchains, Filecoin, and decentralized protocols are one. Zero-knowledge proofs are a little separate, but I think very connected; a lot of people have been exploring the connection between zero-knowledge proofs and decentralized systems. The third, which I mentioned earlier in the talk, is that people are using theorem provers to formalize the entire corpus of mathematics. It's still really early days. There are projects for formalizing the entire undergraduate curriculum, and there's been some really interesting work on formalizing cutting-edge pure mathematics research. My belief is that in the next 10, 15, 20 years, most of mathematics is going to be moved onto these systems. Lean has also been used by OpenAI to create some really interesting automatic theorem provers, or automatic proof search techniques. And I think this is a really cool application for Filecoin.
Imagine having the entire universe of mathematics living on Filecoin in a content-addressed way, with proofs that all of these hosted theorems or typed software programs are correct. If we have that in the same place as the Filecoin Virtual Machine, with a crypto-economic layer, we've potentially monetized or gamified a lot of the business of writing proofs, of writing mathematics. It ends up with the practical work of doing math looking a lot like software development and vice versa, because with these tools, software development gets enhanced by types, by these more mathematical constructs. OK, so a little bit about us: Yatima Inc. We were founded in 2019 with the mission of helping along this process. We have computing, and we have math, with these typed theorem provers and dependently typed languages, and it seems like those two worlds are substantially coming together. In a sense, software developers are constructive mathematicians, and it's really just been a tooling and language question why we're constructive mathematicians in a very obscure niche of discrete mathematics, using very, very specialized systems, so much so that we don't even conceive of ourselves as mathematicians anymore. And mathematicians are in some sense software developers. OK, they're writing a lot of LaTeX, but what they produce really amounts to programs in an ad hoc, incompletely specified, wetware-executed programming language. And I think that with dependent types, and particularly with the possibility of using zero-knowledge proofs to succinctly and securely verify that these types are correct, those two worlds are going to come together.
So we have a team of both engineers and mathematicians, and it's been really exciting to see people with more of an engineering background learning mathematics, and mathematicians learning how to write software, through the Lean theorem prover. We were in Y Combinator, and we're a grantee of the Filecoin Foundation for a lot of the work I described in this talk. We're active contributors to Lurk and Lean, and in a minor way to some of the IPLD repositories. We have a GitHub with a whole bunch of repositories you can look at if that's of interest. My email is john at yatima.io, so if anyone has any questions after the talk, feel free to shoot me a message. This has been really fun.

Well, thank you so much again, John, for taking the time to talk with us. I really appreciate it.

Yeah, thanks for hosting.