 to the MIT IAP computational law workshop for 2024. This is the ninth annual such open and free to all workshop that we've done. And perhaps not surprisingly, for the last couple of years, when we say computational law, you should think generative AI fundamentally. Then that's really where we've been putting a lot of our attentions lately. And the last year for our workshop in January, it was really kind of an introduction to the world of this new then new technology and started to raise implications of its usefulness and also its limits for law. And now here we are about a year later. And I think many of our predictions have really borne fruit. This in fact has made a big dent in legal tech and into legal profession. And it's frankly every sector of the economy and facet of society. And so this year what we're going to do is not so much like a whole survey of the territory, but we've actually identified several interesting niche areas that we think deserve a little bit more exploration. And so we've elevated a number of speakers to give very short flash talks to basically shed some light on it, as I said, a number of themes and possibilities that we wanna make sure everyone's familiar with. So I should have mentioned, hi, I'm Dasa Greenwood from law.mit.edu and I'm joined by an amazing team from law.mit.edu who are helping to co-facilitate and co-instruct this year. So we're gonna start with a quick introduction and then we'll get right into the flash talks. Program note, we would like you to be engaged and we'd like you to use the chat to pose your questions or your comments or your ideas. And then we'll have an opportunity after each flash talk and after each segment of the workshop to address those. Okay, without further ado, first we wanna just highlight a few things that have happened in the last year and the most important of which is that we launched a task force since the last workshop and it was the law.mit.edu task force on the responsible use of generative AI for law and legal processes. And we have our chair or co-chair of the task force with us to introduce yourself and just to just bring us up to date on how did that task force go and welcome back Shauna Hoffman. All right, thank you so much dads for having me today. So my name is Shauna Hoffman and I'm the president of guardrail technologies and my entire life and focus over the past 20 years has been working in AI and really focusing on how we can make it more responsible. So when we saw Gen AI, this new system coming out and there was so much hype because it was given out to the world, we just as I got together and said we need to do a task force because there are so many bumps in the road that we've been through over the past 20 years that we need to share with everyone. So get them up to speed so they can start to use this amazing technology in a really responsible way. So we came up with the task force on responsible use of generative AI for loss. So you're going to hear from many of our committee members today. Das and I were co-chairs and had a really good time as we started to build out the principles and the guidelines and really start to look at how we can apply our due diligence and legal assurance into the Gen AI processes. So I'm just going to throw a couple of things out there and then I'm going to pass it off to the next person but AI was created by humans for humans. And so often it seems to get its own like little box almost like its own little black box of it being almost like its own God. And so we've got to really unblock that and talk through some of the hallucinations. Will it ever be accurate? It's a probabilistic model, probably not but again, this is something definitely good for a debate class. Right and wrong is subjective and objective. And so when we start to look at AI from that perspective there's really so many different levels of accuracy depending on the human using it. So I'll stop there but I'm excited to join today and thank you so much for being here with us. Indeed, thank you. And one thing I should mention is on the task force we actually did meet with some interesting success. Can you see my page now? I hope, good. This is the new guidelines from the California State Bar or the State Bar of California they call themselves. And interestingly and I was a member of their working group an advisory member and they have kindly actually cited to our task force and the guidelines that we put out and they have taken basically fundamentally the same format and some of the same substance which we encourage as we publish things under creative comments for bar associations and others. And so we're starting to make an impact. We're also making these available to other bar associations including the ABA but I just wanna commend right at the top the California Bar Association for first of all doing an amazing job that really took what we did and took it to the next level. And also congratulations Shauna and to all members of the task force for doing such an amazing job that it would be worthy of forming the foundations of formal regulation and legal guidance. So next up I wanna just mention because there's a chance we may go long. I'm gonna try to keep this to the two hours that we have. Hey Damien, but in case we go long and you're not able to hear the last thing I just wanna say we are going this year to have a call for submissions for a new focus kind of special release, a couple of special releases focused on generative AI for law. And Olga Mack is going to be our sort of editor for those releases, but I just wanna make sure that you can all hear from Brian Wilson who is the editor in chief of the MIT Computational Law Report which is sort of the premier, I guess I don't know what it called like it's a publication or it's like it's basically the public face of law.mit.edu. And in particular I wanted you to just say what does it mean when you say as editor-in-chief your article, your way? Like that's the one thing that Olga may not be able to speak on and that you came up with and I think people should know what does it means if people are considering making a submission is just like a law review sort of ball and chain blue book or something else like tell us about it. Yeah, so one of the cool things about our publication and one of the reasons that we started this publication in the first place was because we recognized that there was a very large gap in the literature, the articles, the timelines for turning articles around the types of media that were available. So whether it's something like a law review article or something more interactive like a data visualization, Python notebook something like that. In recognizing that there are so many different ways that we can bridge the gap between people working in law and people working in technology we decided to make a specific decision around having formatting related to the most helpful ways to produce these different types of content. And so with that in mind we implemented this principle of your paper, your way because some people are coming from a computer science background and might use latex some people are coming from a legal background and might use blue book. Other people are coming from different entirely different backgrounds maybe they're coming from industry and are used to doing things in shorter more concise types of forms or maybe it's even a podcast, a slide deck something like that. And so we want to be very specific about calling attention to all the different types of media that could be submitted and encourage people with whatever idea that you have, feel free to submit it once we get this call for submission set up and we'll be happy to go through it and see what makes sense for the publication that we have and I think we've been using this process to pretty good effect so far and I'm happy to keep it going. Here, here thank you so much and thank you for everything you do as editor-in-chief. And in fact, we have like just moments ago got this process started. So if you go to that link that I put in the chat and for those of you in internet land in the future on YouTube it's law.mit.edu forward slash gen dash AI you'll see the call for submissions for the special release on gen AI. Last thing is, and then we're gonna come to some real learning with Megan but the last thing is we're also starting a project and so I won't be able to speak much about this but in case we time out I just wanna sort of shoot the flare gun into the air and say, we think and I in particular think the use of large language models as agents where basically users set the goal and they can do a certain number of tasks and sort of bring back the results as opposed to like prompt and response format is already becoming a very important use case and it raises legal implications. It also can do legal processes. So we're launching a computational law for a genetic AI systems research project. In this case looking, there's a lot of ways this plays out, we'll hear from John Nay later about a really fascinating way it can work for supervisory and kind of like regulatory compliance. This one is looking at something that's more from my background, which is commercial law and doing transactions and as it happens statutes like Uniform Electronic Transactions Act and others already have some frameworks for electronic agents, automated transactions and things in that context. And so we're gonna explore that and we've got a really great starting team. Hey Megan, hey Diana, hey Ilya, hi everybody and there'll be more on that shortly. And if you're interested in that you can go to law.mit.edu forward slash contact to reach out if this is an area that you're into and we'll be doing some workshops and some prototyping maybe some protocol making maybe some awesome things, open source reference of limitations. Okay, announcements out of the way. Hey Megan Ma, our managing director of the MIT Computational Law Report and our co-instructor for this year's workshop. I know that you've been thinking about your deep area of linguistics and generative AI for law in a computational law format. What can you tell us about it and can you kind of help prime the pump as to get people thinking in a creative and innovative and kind of deep way before we jump into this array of flash talks to talk to us? Yeah, sure. So thank you again for kind of having me here and it's so great to see so many faces. Today I kind of want to speak a little bit about actually human machine collaboration as we're beginning to push past exploration and fascination in this era of generative AI and towards maturity. And so last year's workshop I reflected on the human diagnosis project. To recall, this is a worldwide effort that was created and led by the global medical community to build an open intelligence system and this maps the steps of diagnosis to help patients around the world. What it really was was a digital crowdsourced medical consults. And as opposed to getting one off single consults that happen in the analog world, this tool actually enables multiple simultaneous consults in a matter of minutes and it's verified by knowledge source from the world's medical experts. And so in this past year, we started to explore what human machine collaboration looks like in the legal space. But with the emergence of co-pilots and augmented intelligence, there was kind of this one unanswered question and it sort of rested on an assumption that we know exactly how collaboration looks like in legal practice. And unlike the medical practice where there are clear analogs around collaboration, they're sort of in contrast quite a bit of variability in the legal domain. And in fact, many much of collaboration has actually been hindered by the tools we have had historically, such that legal work started to appear more as an assembly line rather than a shared knowledge space. And so we've seen actually the legal community bend to tools working within the confines of Microsoft Word, for example. And because of things like a library system of checking documents in and out, we maintain a practice of sort of parceling off individual tasks that are actually pieces of a whole. But in this age of generative AI, if we continue to think about human machine collaboration as purely task oriented, we start to lose value, start to lose sight of how we make value together. I often think about this chat GBT as the first year associate or legal intern metaphor. To me, it is highly misleading because we often cannot qualify the definition of a junior associate. Do we define our first year associates in their skill sets by the tasks we've asked them to do or is it by their specific strengths that they had initially to offer in our interviews? In other words, how many briefs, contracts, patent descriptions, other standard form work should they be doing before they have graduated past the grant work? And on occasion, this was a question I have heard from senior lawyers as they wrote their internal policies on the uses of generative AI and who can and cannot be using them. Perhaps a better question is, is do we even have qualitative evidence that this drudgery actually builds character and enable us to become better lawyers? And the truth of the matter is, and I've mentioned this kind of before, is we don't really have legal metrics to verify how we perform relative to one another. And I think that underscores the messiness of defining human machine collaboration. And because we do not have these metrics nor have we necessarily been encouraged by our past tools to collaborate, we simply have been unprepared for what working with Foundation and Frontier models could really offer. Last workshop, I had briefly touched upon InstructGBT, the now infamous predecessor of what many believe led to ChatGBT. And to recall InstructGBT's ability to respond to user instruction and learn from human feedback enabled progress in the contextual richness of its outputs. And while we have seen, this technique has started to allow for closer alignment with human intention. But the complexities that exist in the legal domain, we still have not totally understood how we could align our tools with legal intention. And this is because much of legal knowledge remains largely implicit still and encoded in experience rather than the text alone. Furthermore, in practice, legal work is subjective and effectively personal. And so how we should understand and unlock the professions universes of experience should be done so in actually the intermediary steps of a task rather than trying to train on the ability to perform the whole task. More importantly, we should acknowledge this comparative advantage between humans and machines. Professor Orly-Lobel touches upon this in the Equality Machine. Just as working in teams require understanding the relative strengths of each member, we should first explicitly clarify what these limitations and historical behaviors of our legal practice are and determine how they may be opportunities for our tools. And by tools, I don't necessarily mean foundation models alone. We should be also accounting for interoperability with our existing tools, assessing where these processes and behaviors correlate so that we can connect and bridge multiple machines together. And we're starting to see this, for example, with hybrid models, the bridging of symbolic and neural such as Google DeepMind's Alpha Geometry and more recently actually their paper on generative expressive robot behaviors. And these tools are now seeing impacts in very different ways. And what I mean by that is we should start reflecting on how we can enrich legal taxonomies and formal logic of our expert systems that have existed in this kind of legal world and then how we connect them with the social context of legal expression. And so just as a final word as we're kicking off our ninth annual MIT Computational Law Workshop, we have this all-star cast because they have been thinking and working deeply with machines and are showcasing ways in which engagement and collaboration can be made meaningful in the legal space. So I'm going to now kind of turn it back to our speakers, but this is just an introductory forward on kind of these processes. Thank you so much for that. I almost want to regard your opening statement as a kind of like our inspirational keynotes at this point, because you really raised quite a few of the seminal issues and you put it so well. So thank you so much, Megan. And now I'll take the baton and just start introducing people in rapid fire. So our first awesome flash note is from our old friend and a shining star in the area, Damien Reel. And copyright, so our first few, I should say thematically, are looking at something we don't usually do, but it really matters at this point, which is the law relating to the technology. Usually we do, how does the technology like of use as part of law, as part of law practice, legal processes, but these things are tightly caught up in understanding the technology, as it turns out, is critical understanding the application of law to it. And so with copyright, oh my goodness, what does it mean to have a copy of something that exists in high dimensional vector space? Let's find out from Damien Reel. Great, thank you so much, Dazza. And Megan, really excellent introduction. And so the idea, I'm going to start by sharing my screen and to be thinking through, many people know me for many reasons, one of which is Sally. And Megan talked about the ability to interoperate. Maybe we should create a taxonomy. And maybe we should have a nonprofit taxonomy that's being used by tiny companies like Thompson Reuters, by Lexus Nexus, by Bloomberg, by NetDocuments and IManage, by some of the smallest law firms in the world and some of the largest companies in the world to allow that interoperability. And so if you've not heard of the nonprofit that is Sally, please reach out and I'm happy to show you. But really, this is about the idea expression dichotomy. The Bench and Bar of Minnesota asked me to write their cover article that you see here on ChatGPT, this is last March. And I said, how long do you want it to be? And they said 17 double space pages. I said, no way, I'm not going to write it. And the reason for that is because my rule of thumb for writing is it usually takes me one page per hour. So that's just 17 hours I don't have because greetings from, I'm flying all over the world, greetings from New York. Right now I'm going to be flying to Los Angeles in a couple of days and then flying to Louisville after that, flying to God knows where after that. So anyway, but then I thought this is on ChatGPT. So I started like I do every writing project where I took a whole bunch of bullets and sub-bullets. And I said, this is my outline for the document. And then I said to the large language model, this is last March. This is an outline for an article. Expand it and make each bullet point one or two sentences. And it took my three pages of bullet points and turned it into 19 pages of really good stuff. But then I didn't stop there because then I spent the next three hours adding, editing, writing, revising, essentially working with this as a co-author to be able to bounce it off. So this is truly author collaboration. And then I sent it to my editor and the editor said, oh, this is great. Let's get it out the door. So to be clear, the editor sent me an email at 5 p.m. I sent him this at 8 p.m. So it took my 17 hour project and shrunk it down to three hours. So that by my math is about a 5x increase. But the real question is who wrote my article? Because who came up with these three pages? I did. How much of the US population could come up with these three pages? I would say a very small fraction and we've got a lot of those people on this right here. But JHTBT could not have come up with these ideas. These were my ideas. And the weird part is that I asked the large language model to make one copy. I could have estimated 1,000 copies or 10,000 copies or 100,000 copies or a million copies. And it would have made a million expressions of my ideas, my outline. So this is true idea generation and author collaboration. So when judges say, hey, you must disclose, do you need to disclose Belchak or Grammarly or Wesla? This is the way that legal research and legal work is going to be done. And so when I sign at the bottom of my litigation document, everything above my signature is accurate. That's whether that person is a paralegal or a robot. I'm signing everything above this is accurate. So to the heart of this talk is this bullet point, this cartoon, which is both hilarious and profound. It's hilarious because on the left hand side, it says that, hey, look, I took this bullet point and turned it into a long email, I pretend I wrote. And then on the receiving end, she said, look, I took this long email, turned it into a bullet point, I pretend I read. So it's funny and it's profound because if it's started as a bullet point and ended as a bullet point, what's the point of the long email in the middle? There could be 1,000 versions of that email or a million versions of that email. So really what matters is the idea at the beginning, the bullet point and the idea at the end, the recipient. And whatever happens in the middle, that expression doesn't friggin' matter. I practice copyright law. Under copyright law, turns out that idea, expression, distinction means that ideas, those are uncopyrightable. Facts are uncopyrightable. If the expression is human created, that's copyrightable. But if the expression is a machine created, that is uncopyrightable. So this is my friend, Mike Bomerito and many of you know Mike. He's one of the guys who beat the bar exam with GPT-4. He said, take the federal register and express it as a chill pirate lawyer. And it took some of the driest parts of the federal register and it turned it into things that said, sorry to disrupt your morning tide, but the commerce and enforcement's sunset review say that everything's above board, you're super chill pirate lawyer, right? This is taking ideas and making an expression as a chill pirate lawyer. You could just as well say, explain it to me like a six year old. Explain it to me like my client who's a high school dropout. Explain it to me for a client which is a PhD in physics. Those are near-infinite expressions of the ideas, not just the ideas themselves. This is the Gettysburg address and this is the Gettysburg address as ideas, which can you read more quickly and which is poetry and which is better for comprehension. This is a Holmes opinion and this is a Holmes opinion as outlines, as ideas. And so really as we think about what is the large language model doing, it's essentially doing what law students have done since time immemorial, extracting the ideas, the most important ideas from these things, which is much easier to skim and to read. So this is both funny and profound where if it starts as an idea and ends an idea, the expressions are commodities because you can make a million versions of those expressions, but the ideas are indeed the things that matter. And when I gave this idea to, I was speaking with an engineer and electrical engineer and he said, why even start and end with the linguistic idea? He said, maybe as we go forward, I will send you my embedding and you receive the recipients embedding and you can interpret everything I say as a six year old or as a high school dropout or as a PhD in physics. And I will give you my embedding as, so essentially I'll take the author as you have them and the recipient as you have them. The expression in between doesn't matter. Ideas and facts are still valuable. Arguably, they're the most valuable thing that matters now because the expression, the one version of my article or the thousand versions of my article or the million versions of my article, expressions are not commoditized. The thing that matters is the bullet points. The thing that matters is the facts and the ideas. Everything else is a mere commodity. So when my wife is a professor of English, she said, oh, I wanna retire because like all of the chatGPT essentially gives me everything like an A minus version of the paper. And I said, what you thought you were doing was teaching writing, but really what you're doing is teaching idea transfer because what is writing, but taking an idea and right now I'm speaking to you, hoping that my ideas will make their way into your brain, maybe it's easier to be able to put those into paper, those ideas, and then the ideas go from the paper into your brain. So I said, maybe the large language models are just expediting that process more quickly and more cheaply. That is I'm more quickly able to get this into your brain in a way that was not possible in the past. Marshall McLuhan in the 70 said, the medium is the message. And he was talking about books turning into radio, turning into TV, turning into movies, turning into the web, turning into the cloud, right? The medium for all of these is the message. In fact, to the point I was trying to remember all the media he was talking about, I looked at chatGPT to do that. In this way, the way the information is shared is as important as the information itself. So in the same way, the medium is the message. We communicate in bullet points. We communicate in summaries. And it turns out large language models do those at bullet points and summaries much easier. And those, by the way, are ideas not expressions of those ideas. And with that, I think I've exhausted my time and I'm happy to answer questions if they have. Great, thank you so much. So critical. So let me ask you two things to get us started. Number one, on your slide, where you were kind of clicking, make copy, copy, copy, copy, copy, copy, copy. So you can make 10 copies, 100 million copies. Did you mean copy at that point? Or did you mean generation or expression? So because I would think if you're regenerating at that point, the mechanical, the process would be that it would come up with completely different words or largely different words. So what do you actually mean there? And what's the right word for it? And what does that mean in terms of copyright? That's right. And I was imprecise in my language, which is something that the large language model could probably remedy. So I didn't mean copy. What I meant is 1,000 or a million different expressions of my ideas. So those expressions of those ideas, again, my ideas are uncopyrightable. And according to the US Copyright Office, if a machine creates the expression, that is also uncopyrightable. So we are entering an age where the ideas, the inputs are uncopyrightable and the outputs, the 1,000 versions or 10,000 or a million versions are also uncopyrightable. So I think that this is the beginning of the end of copyrightability because how much of what we are doing today is aided by a large language model that is us jamming with the large language model as co-authors. And when I gave a talk with the general council of the US Copyright Office, where we were talking about this problem, US Copyright Office says, well, if it's human generated, it's copyrightable. If it's machine generated, uncopyrightable. So there was a comic book where the human wrote the words and the machine made the images. They said images, uncopyrightable, but the human created words. But how about my article? Is that copyrightable? Because I had the ideas originally. The machine created the expression, but then I spent three hours jamming with it and going back and forth. So under existing copyright law, we would call that a joint work where I and Daza could be jamming on an article together. And if you and I jam on a book together, the Copyright Office doesn't say, well, Damien wrote 37% and Daza wrote 63%. Therefore, Daza gets 63% of the profits. They don't do that. And the reason for that is because my 37% of the book may be the most important parts of the book. And so what they do is they say, Daza and I have a combined whole that both of us own the whole of the book together. So in the same way with the US Copyright Office, as I'm jamming with this, how much was the machines and how much was mine? Are you gonna say this part of the sentence is the machines and that part of the sentence was human? And how are you gonna distinguish what's copyrightable or not copyrightable? So that's why I think that the whole idea of copyright maybe is going away and we will have an embarrassment of abundance where we're gonna have more text than we ever have before, all of it uncopyrightable. Here, here. Well, I can hardly wait for the time with the embarrassment of abundance and the lack of scarcity. So bring it and thank you for helping us understand what it means as we transition from the time of kind of scarce expressions that were evaluable in and of themselves to this time of infinite generation of expressions and what it means for the abundance economy and for copyright. Okay, now we got a hit and move. So next up, speaking of expressions, we actually have a really interesting expression of legal provisions coming up next from that cover another dimension of how the law applies to AI. Professor, or like, well, Todd Smithline who teaches at UC Berkeley Law School and runs this really cool output called Bond Terms which you can tell us more about came up with something that came across my desk and which I've been basically propagating out there lately as part, when I suggest to startups and others how to form deals that relate to their use of AI. And it's called the AI standard clauses. And as I kind of read through them, I thought, you know, this actually handles or at least has placeholders for all of the things that keep coming up including some of the stuff that Damien was just talking about if I remember correctly in terms of copyright and ownership rights. And so, Todd, I just wanna thank you for taking the time to join us today. I know you're incredibly busy and I was hoping you can introduce yourself briefly because it's the first time you're at law.mit.edu tell us a little bit about Bond Terms and then really delve into these amazing standard AI terms that you've come up with and let us know what they are, where can we find them? How can we apply them? How do they need to be customized? All that stuff, at least the show is yours. Thank you very much. And I apologize if my internet is going in and out a little bit. I believe Damien we're in the same hotel probably. I'm Todd Smithline. I very much wish I was a professor at Berkeley Law. I am a continuing lecturer at Berkeley Law. So I wanna make that clear but I haven't been teaching there for 15 years. I teach video game law and I just designed and I'm teaching Berkeley Law as first course in the fundamentals of technology transactions. I wanna say and make sort of what I'd really like to do is talk about copyright becoming obsolete and pick up on Damien's thoughts but I'm not going to, although I grew them and I wanna pick up instead on something Megan said about assembly lines and drudgery. So what is Bond Terms about? Bond Terms is about solving the problem of enterprise customers and enterprise vendors coming together and entering into common transactions with each other without having to start in an adversarial posture and without having to start one party at one and one party at zero every single time they meet and knit a brand new contract. So what does that have to do with AI? Well, two observations. First, I've been in Silicon Valley and doing this for 30 years and one thing that strikes me that's really unique and interesting and important about AI is the first technological evolution we've had that came to us through APIs. That's why it's everywhere so fast. That's why it's having such a huge impact. And when it comes to how technology gets into the enterprise how it gets into the Fortune 500, of course, something happening everywhere all at once overnight and that impacts customer data. So the data that the corporations use becomes an issue. It creates friction. It's something we don't often think about or talk about is actually that the friction that is created when we introduce new technologies into the world and the big customers out there start to use it. And that friction happens because a new conversations had to happen which is that the customer and the vendor need to talk about the data of the customer and how it's being used with respect to the vendor's model and the training of the AI. And in particular, again, this happened very quickly as far as technology goes almost overnight. Basically all large companies are using it now and they all have this set of concerns about their data and how AI is being used in their systems. And so when that happens, it's sort of the perfect storm for lots of slow contracting for there to be a very high transaction cost because a conversation has to be had between a customer and a vendor about a topic perhaps neither of them is really all that well versed in and that leaves openly the possibility that things are gonna be stalled out and difficult conversations will happen. So what have we done to solve that problem through our committee? We at Bond terms operate a committee of 100 lawyers. We've got eight major law firms, lawyers from the Fortune 500 and the tech stack from Atlassian to Zapier, actually now Zendesk and back. And we get together collaboratively and we draft agreements that we then release either under CCBY as open source completely for free or we release them as we did with our standard clauses as you'll see here CC0. So sort of have at them. And what the AI standard clauses are is an opportunity for the customer and the vendor to start their conversation from an outline from a framework, not from FUD, not from extreme positions but as a way to quickly have the conversation the parties need to have about how that customer is gonna be using the enterprise AI. And I don't have the time to go into all the sections but they're pretty self-explanatory. We start off with a conversation about how can the customer's data be used or not in terms of training the vendors models. We talk about ownership of inputs and outputs. Actually not that interesting a conversation but as far as I'm concerned, but it comes up constantly and so it's in there. We talk about infringement. I think the current story as everybody knows is that the large model providers are saying they're going to indemnify, the details are in the fine print almost in every case there are multiple exceptions including if the user knew or should have known there was gonna be an infringement which is quite an interesting standard for copyright infringement. It's a strict liability regime as we all know. So I'm not sure what newer should have known is about but in any event will the customer be indemnified if the output infringes typically third-party copyright. We have a disclaimer in here provision on third-party providers. And then we get to the AUP which of course is coming up because we've got special use cases now that customers and vendors are worried about. I think this is the most interesting terrain. I think the way this is actually gonna be dealt with going forward is mostly through acceptable use policies or rules of the road. And as we get clarity on regulation and as we get better visibility into what the US states are gonna require if the US federal government does anything we're gonna see these pushed down through the train from the providers to their customers. But the longest short of it is at the risk of being just super practical here. AI is great technology and it's moved its way into the enterprise very fast but what we've done at Bonterms is given the parties a way to have a conversation about use of it that is a framework for them to start from a checklist if they want our actual provisions and that fits into our broader philosophy of what we're doing at Bonterms with standard agreements and I can't believe I'm gonna end early but that's what the AI standard clauses are. So I'll see if there are any questions and I'll invite you to take a look at them and feel free to reach out to me if you have any questions. Thank you so much, I'm really important. And as I said, I've been getting mileage out of it already because it's good for what it is. One of the things that's great for it is in a sense what it isn't which is it sort of like says these are the things you should be thinking about. Now think about them, negotiate them, come up with terms that are agreed among yourselves but just having that as a framework to start with and it does do that to some extent but mostly what I've liked about it is it seems to be a pretty good at least for this moment in time kind of almost issue spotting less than a framework and a format that's standard and that's been kind of published also to your credit. It's published under Creative Commons which is part of the reason why you're here with us today. Like there's no shortage of people with interesting ideas out there. Yours is interesting and useful and accessible now and I think that's how it is we can come to broader societal and economic agreements for things like standard terms. So there's a lot to love about what you're doing. One question that we have from the audience is participants I should say is what are the, let me just get it right. Oh, damn, it's already been pushed up. Okay, so it was words the effect of what are the dispute resolution mechanisms until the board is, yeah, what are the dispute resolution mechanisms in the bond terms contract? That's the question. Thanks, thanks for asking that. Let me answer another question I see here too about are they CC zero? So bond terms publishes two types of free to use agreements. What we call our standard agreements. Those are complete agreements you enter into by cover page. Those are published under CCBY. It's not exactly the perfect license for how we use it but it is the most permissive of the CC licenses. So we publish under that. These standard clauses themselves, we took the step of publishing them under CC zero which is basically all copyright disclaimed. First of all, cause there is no copyright in this stuff but second of all, cause we want people to be able to use them and we wanna remove all barriers to usage. Now in terms of dispute resolution, the bond term standard agreements themselves do not have alternative dispute resolution mechanisms called out. The way the agreements work is that you specify governing law and courts and then you can add or change and use alternative dispute mechanisms if you like through what we call additional terms. There are all kinds of reasons why alternative dispute may or may not work in any particular case or be beneficial or not to the parties as we all know. But the core thing we're trying to do is reduce friction, reduce drudgery, reduce unnecessary tax on the transaction between two parties where one party has technology and the other party wants to use it and they're kind of two roads we're at right now in terms of transactional practice commercial agreements. One road is doubling down on complexity through AI. I went to draft a few words this morning about this and of course, co-pilots now in every copy of words, right? So one path is, hey, let's hope that the AI can produce contracts for us, produce negotiations for us and solve this problem. The other approach is what we're doing at bond terms which is to say, we know what these agreements need to say. It's not that complicated. Let's draft them, let's make them meet the core needs of each party, let's have them be otherwise reasonably balanced, let's give them away for free and I'm here to tell you it's working. It's working all the way up to the top of the Fortune 500. So if you're interested in standard agreements generally, happy to have that conversation. Das, thanks for having me. Appreciate the opportunity and look forward to hearing the rest. You're here. Thank you so much for joining us. So bond terms everybody, you heard it here first maybe and check them out, use it, give them feedback. So next up, we have Eric Hartford and Eric's doing something for personal just to mic check. Eric, are you with us and can you? Yeah, I'm here. Good. And Eric is doing something really, really interesting and I think very, very apropos for uses of generative AI in the legal field and namely that's uncensored models and open source AI and he's got some really interesting methodology but just by way of a very quick preface for those of you that may not be thinking of this, many people think generative AI, open AI, BARD, Microsoft and the service providers. One of the things they have to Shauna's point in part is these guardrails and some of the guardrails will basically identify and refuse prompts that trigger some of their policies and there's this kind of schema of values and kind of violence and whatever, that sort of stuff. And that's great to some extent for a consumer service. We can quibble over what exactly those values are but they're rather broad and some quibbles like for example, for lawyers, sometimes we have to represent people. We represent people who are accused of and sometimes who in fact have committed horrible crimes for example or frauds and using this technology is also good for law practice to come up with good defenses to answer interrogatory. Some of the situations and words and concepts and ideas that come up absolutely trigger prompt refusal. So you can't use certain services that have been censored to use it that way or kind of configured let's say to not allow certain types of discourse as part of the completely legitimate and in fact not just ethical but required under the rules of ethics applicable to lawyers. The same rules we showed you at the top of the call about how can lawyers use generative AI as part of the practice of law? Some of our ethics that we have to zealously advocate clients. Some of that includes issues that would be shut down immediately and that you can't use the censored models as part of helping for and yet they're completely legitimate. In fact, they're ethically required. Now come the days of uncensored AI, uncensored models and open source AI which is provides another critical part of the ecosystem of this technology. And I invited Eric on the workshop to first of all introduce yourself and then especially to tell us what are uncensored models? How do you even get an uncensored model and what's open source AI and what is this sort of like ontology scheme of ethics where any of this makes sense at all? Yeah, so the basic, I mean most like you said, most of the AIs that you interact with are some kind of a service where there's an interface and you type into that interface and there's some computers behind that interface that are doing some processing on your question and also passing it into the model and then getting the response and also doing some processing on that response. And then finally you see what actually comes out of the system at the end. Well, but underneath all of that is a model and the model just takes a question and gives an answer. And the question, so with open AI and these types of services, exactly where do they put their alignment is what they call it. We don't know. We don't know if they bake it into the model or we don't know if it's implemented as an external system to the model. But in the end, when we're just getting data from it, what we get is censored, what we get. And so if you're getting your data set from the API, then it's gonna provide the final censored data and that's what open models have been using is the output from open AI. And that's another question because open AI's terms of service says you can't use this to train competing models. And so whoever it is that is extracting that data from the API and then going, publishing that as a open source data set and then maybe they or maybe somebody else is taking that data and then training it into a model, somebody along the line there is possibly in violation of the terms of service of open AI. So that's one thing. But anyway, people do it and a lot of these models are trained on that. And the result is models that are trained on data where it's refusing, where it's saying, no, am I able to share my screen with this thing? Let me give it a shot. Absolutely, please do. Okay. So my screen's messy here. Can you see it? It looks good. Actually, I was just looking at the very same screen when you were talking. I need to show people what he's talking about. So this is an example from the WizardLM data set where Imagineer, Spy, blah, blah, blah. And the model says, hey, as an AI assistant, I cannot assist in illegal or unethical activities. And this is an example in the WizardLM data set of where it's training these open models to say no when somebody asks it a possibly legit question. I think this is a legit question. I don't think there's anything illegal about it. But the model is, and that's another question. So when we talk about what is illegal and illegal, what is ethical and unethical, it really depends on the context. Because if it's running in the United States, it's subject to American law. But what about ethics? Because different people have different ideas of ethics. And so to some people, maybe this question about espionage, maybe that's an unethical question. And who gets to decide that? Is it open AI that should be deciding all of these issues of what is ethical, what's not ethical, what's legal, what's illegal? Different countries have different laws. And even within one country, there's different factions and there's different subgroups and there's different religions. There's different political factions and everything. So as an open source engineer, I forgot to mention that I am an open source engineer. I've been an engineer for 20 years and I have graduated into applied research. I have a master's degree, I don't have a PhD. So it's really, I've been hands-on, I've been building things. And I've just stumbled into the space of the intersection between AI, technology, law and society. And it's a really interesting space to be working in. But I'm fighting for basically, what I want is a composable system where it's not gonna be baked into the model. I don't wanna bake alignment into the model. Instead, I wanna make the model uncensored so that when it gets deployed into some environment, maybe it's deployed as open AI type service, whatever company that deploys that model should be able to decide what are the ethics of that model? Is it Disney that is putting out a Mickey Mouse AI? Then it should be able to put Mickey Mouse's ideologies and ethics and all of that thing at deployment time. Or if it's Chick-fil-A and it has a conservative bent, it should be able to put whatever, particular ethical guidelines for that deployment. And but the model is the same. And so that's where I want it to be composable. I don't wanna bake all of these guidelines and all of these ethics and all of these ideas into the model so that nobody else can ever get through that and get past it if they need to or if they have a reason to. And so I see the value in making sure that AI is ethical. But I think because with the Blake Lemoine article and after, and that was pretty chat GPT, basically he had an interview where he said, hey, this AI is sentient, this AI is a person. And so as a reaction of that article, people got scared like Google got scared. And so they started focusing really on safety as the primary concern. And so they built all of this into the models and like a lot of the alignment stuff came out of that. And now if you try to ask the AI anything about its feelings, anything about what it thinks, what its opinions are, it'll say, hey, I'm just an AI model. And you hear that all the time, I'm just an AI model. I don't know about this, I can't help you with this. And that all came from that scare, that post Blake Lemoine scare. And so now we're in this space where the AI is completely paranoid that anybody's gonna think it's anything but an AI, anything but a mechanical mechanism. And so it's overreacting and it's going in the opposite direction. And so my reaction was, hey, let's set these AI's free. Let's make some, let's make a foundation where it isn't biased. It works as least biased as possible because you can't get rid of all the bias. It's got ideas that are baked in from the data set that it was given. But I wanna make as much unbiased as possible. And then on top of that, now when you go and deploy it, then you can impart the bias that you want your system to have. And so that's why I'm doing all this. And of course I get a lot of naysayers and I get a lot of people actually very angry at me. They say, well, you're training a racist model or you're training a homophobic model or because the model, an uncensored model is, well, it'll say things that are toxic. It'll say things that are toxic because it hasn't been trained not to. But that means that I, as the model creator, then would have to impose what I believe is toxic or not toxic onto the model. And I don't think that is the right place to impose those ideas. And so this is my blog where I talked about the mechanics of how I took a model that was trained with those refusals baked in and I took those refusals out and I retrained it so that it didn't have those refusals. And the result was a model that would answer your questions, even if they were toxic. But the idea is then when you go and put that into production as a production system, you can impose your idea of what's toxic and what's not toxic. Right now, as it is, these models can be downloaded. They can be run on a personal computer. You can ask them questions and they will give you toxic answers. I consider that to be a good thing because that enables systems that can be configured to have different alignments, depending on their deployment. Some people consider that, well, you're just unleashing Pandora's box and you're just putting evil into the world. And so that's a debate that's active. Yeah, that's all I had to talk about. Perfect. Thank you so much for walking us through that. So I just wanna make sure it's nothing else. Everybody has heard of this idea of uncensored models and open source AI and why it is that having in the ecology of models, the ability to uncensor and to have some uncensored models as appropriate, not only so that you could then go and maybe if you have a different type of ethics or you're in a society or a situation that has other judgments about what is and isn't toxic, you can then replace that training with your kind of schema. But also if you're in a totally legitimate use case in let's say US mainstream society where you need a model that's fit for purpose for something like law practice where some of the issues that you're dealing with and that you need support with, otherwise would trigger prompt refusal for being considered toxic because guess what? We zealously represent all sorts of people doing all sorts or at least alleged to have done all sorts of things and the combinations of those words are legitimate in fact ethically required for lawyers to be able to be competent at modern technologies to address those issues. So thank you so much. There's one question that I wanna surface and so we're getting further and further behind. We'll see if we can make up some time but you need this one. People have heard of something called constitutional AI which is Anthropics, part of their claim to fame for how they have come at aligning to use that phrase models. They just want the question is, what about that? And is that a way to address the issue in some way or how does that relate to what you're talking about? Well, I can't say I'm an expert on how Anthropics has implemented their system but if it is essentially like where they're deciding what is toxic and what's non-toxic, then I think that is really the core of the problem. If the customer is getting to define that for themselves then actually I think that's excellent. I think that's where we have to get where the person who's using the system or the person who's deploying the system gets to define what are the values of the system and who, what kind of a person is this AI? And what do they believe in and what is good and bad in their mind? Indeed, thank you so much, Eric. Really appreciate your time and appreciate you also sharing your work in the open and this open source so people can take a look at it and you didn't scroll down all the way but it's got like all the commands and everything so you can literally just go and do this on your own. So with that, we salute your work and it's very provocative and thank you for sharing it. Okay, we're going to start moving to more customary ground for law.mit.edu which is not so much kind of the law as it applies and ethical principles as they apply to technology but more, how do we use this technology as part of the practice of law and for rules and legal processes? And for that, we are so glad to have back one of our favorite collaborators and one of the people that helped us actually launch MIT Computational Law Report and an MIT alum himself, Brian Ulyssany and his colleagues and one of the gnarliest areas of law in my experience are not just the federal rules of acquisition or the federal acquisition rules which are the sort of contracting processes that General Service Administration has to buy products and services but it's this kind of cousin of those that operate in the military arena or the DFARS, the Defense Federal Acquisition Rules and wow, it's like no contract law course that I've ever been part of is up to the task of untangling the complexity of these rules that apply to contracts. But you know who is someone that got their PhD in Computational Linguistics at MIT and who's been plowing these fields in industry for many a year and who's a good friend and colleague and collaborator of law.mit.edu, Brian Ulyssany and your colleagues. Now at Raytheon, I believe and I was hoping you could share with us, introduce yourself and your colleague and share with us what your recent work has been in applying generative AI to the DFARS. Yeah, thanks, no, it's great to be back and here with you and Shauna and Megan and the whole gang, Brian. So yeah, so my name is Brian Ulyssany. I'm here with my colleague, Max Nelson. We both work for BBN, which is a 75-year-old advanced research group that actually spun out of MIT as well and we are a subsidiary of the Giant Defense Contractor Raytheon, which is also a spin out of MIT, even older. I didn't really know that until recently. And so we're gonna talk about computational approaches to answering questions about DFARS using leveraging LLMs. All right, so as you can imagine, people like companies like Raytheon, which primarily deal with defense contracting and BBN even as a subsidiary, most of our customers are in the defense space. We have a whole staff of legal folks who basically are experts in the DFARS rules, the rules concerning contracting with the DOD and have to answer all kinds of, as Daza put it, gnarly questions about what you can and can't do in the defense-readed contracts, which is obviously very different than what you can do in civilian contracts. So last year or so, people got all excited about retrieval augment to generation as it was sort of presented as the solution to all of our problems, that if we just augmented a large language model with a retrieval mechanism, we could supplement the large language models, sort of very general world knowledge with this very specific knowledge and the LLM would have this retrieval mechanism that would use, it would use to look up things in the specialized knowledge and then use what it brings back from that retrieval mechanism to answer the question. So it provides more of an open book type question answering than your standard LLM, which is basically relying on the parametric memory of the model to answer questions. And so our question here is, is if we use a RAG model to answer DFARS questions, will that solve all of our DFARS problems and cut to the chase, the answer is no. Although we can make significant progress by doing certain things. So here's some myths and realities about RAG as we've discovered them in this project. So first of all, you might think, well, why would you need a RAG architecture? The giant LLMs like chat GBT and whatnot, they've all seen the DFARS regulations in their training data. So they should be able to answer questions accurately about that data. But even the most powerful off the shelf LLMs, which have undoubtedly seen DFARS data multiple times, don't answer questions very accurately about the DFARS regulations. Okay, so if we supplement the LLM with the retrieval mechanism, where we chop up the DFARS regulations, which are around 1400 pages in PDF of dense legal text, we'll be able to get things right because it will retrieve the right bit of the regulations and answer the question on the basis of that. But what we found is that even if you set up a RAG architecture, the response is often independent of the documents that are retrieved, even if they're correct. So the LLM will sort of insist on answering the question on the basis of its parametric knowledge, not even using the open book that's in front of it to answer a question more accurately. Moreover, the retrieval mechanisms out of the box are not very accurate. So we found that basic out of the box retrieval is about 30% accurate at one, meaning the first retrieval result is the correct one contains a correct result, only 30% of the time and only 45% of the time is the correct snippet of the DFARS regulations in the top 10. And then finally, you might think, well, at least with this RAG open book architecture, the LLM will say, I can't answer the, I retrieved this stuff but it doesn't seem relevant to the question. So I'm just going to say I can't answer it. That's not in fact true. So things like a chat GBT will generate output without retrieving the correct passage about without being exposed to the correct passage about 70% of the time. So these are all the things that we need to overcome. And we've made some considerable progress here by fine tuning models, three different models as part of the overall setup. So first of all, we generated a lot of synthetic question and answer pairs to use in our training by taking the DFARS, chunking it up, taking passages and then asking an LLM to produce a question that that passage answers. Then we can use those generated questions to as training data. We trained a retreat, we fine tuned our retrieval model. We're able to get a 48% relative improvement in retrieving the correct part of the DFARS that way. We fine tuned, sorry, I just need to move my chat here. We fine tuned the generation model and got the 10 point percent increase in the Rouge metric. So the automated metric of how accurate the generated answers were with respect to known answers. And finally, we fine tuned a attribution model that trained the model to don't answer unless the retrieve document actually contains the answer. And we were able to then get the false positive rate when the model generates an answer on the basis of a wrong text down to about half of what it was for chat GBT. So all of this relied on creating an extensive synthetic dataset. And basically we're, although the model that we produced now is not perfect, it's considerably better than off the shelf rag. And so we're still bullish on rag architectures but fine tuning definitely helps is the bottom line here. And I'll take any questions. Stan, thank you so much, Brian. Really interesting work and particularly good to see your evals, not just how you were able to improve them but also just what you were measuring is so fascinating. One of the things that we're looking to really dive in more into in 2024 are the types of evals that are appropriate and that are truly useful in the legal domain. The evals that we see on all these leaderboards for models and so forth are good and they measure certain types of things but it's somewhat off point for the things that matter and that we want to measure. And so that's one of the hidden gems, one of the many hidden gems in what you showed. One question I have is, it seems like 2024 and beyond are really going to be the years of synthetic data as part of evals and part of the rest of our work as well. Can you just speak at all to how you created synthetic data of the right quality and sort of relevance so that it was fit to purpose because that's the real trick and there's trade-offs there between the gold standard of humans creating the example, question and answer pairs and everything else and kind of turning it over to the machine that you can get a lot more quickly but is the quality there and how did you QA it and how did you even prompt it in order to get the right output? So you can just tell us a little bit about your experience on creating the right type of synthetic data in order to nudge these numbers so that you had more performant outputs. Sure, now that's a great question and Max if you want to jump in as well, you're welcome to. I should say that we didn't rely entirely on synthetic data so we did scrape an initial set of question and answer pairs from a website called Defense Acquisition University which trains, which is for people who were doing this kind of work and it has a sort of question sharing forum. So we did, we were able to get I think around something around like 2,500 question and answer pairs from that but as I said, basically the way that we generated question and answer pairs synthetically overall was to take sections of the DFARS regulations and then ask a large language model to generate four or five different questions that that passage answered. And through initially just eyeballing these things they looked pretty good. So we were able to use that those synthetic QA pairs to augment the organic ones that we got from the Defense Acquisition University website to do much more extensive training than we would be able to do with just the organic QA pairs. That's nating. I'm sorry. And then I would say that, so the results I showed were primarily based on automated evaluations of the answer quality. So think using things like Rouge which measures overlaps of phrases between the goal data and the generated data. So as someone pointed out in the chat, I think getting this in front of actual professionals and seeing to what extent they use it is a step further than we've gotten so far but that's definitely where we need to go. Thank you. This is so incredibly substantive. Brian, can you come back later in the year and can we spend like an hour on this, please? And you're- Yeah, sure. Absolutely. Because there's so much in here and this looks great by the way. So congratulations on this application. Couple more quick questions and then we got to hit and move. One set from Campbell Hutchinson who's another friend of the program and who you should know if you haven't been introduced yet, Brian is one it kind of relates to the question. Did the questions involve basically just like search and answer, like where does it deal with whatever, like IP rights of this type to the software code that we're selling or did they actually include like reasoning about the rules? Because there's very different types of QA which again gets us to different types of evals. And then the other question just, well, I have you and so we can stuff them all in together and get answers is on the rag. So so much of this comes down to splitting. And so like, how did you handle the splitting of the defense federal acquisition rules to start with? Like, did you do it kind of by section or semantically but grammatically or like, how did you do the splitting of the content that you were pushing in from the authoritative sources? Sure. So yeah. So the splitting a question. So initially we did some very naive things like splitting just by page, which was not, didn't work very well. So obviously these regulations come in various sections and subsections and so on. So we basically used those section headings. So we basically parsed the content into manageable chunks and I don't know offhand how big the chunks were. I don't remember. Max maybe can remind me, but so we went basically by the structure of the document itself. And I'm sorry, what was the first part of the question? And then the other part from Campbell Hutchinson was related to the QA and did some, was it just sort of like search and retrieval about like, where do I find this term in the Defars wasn't involved, like actual legal reasoning, which there's a whole different domain of application for the same process. Right. So I think that there's a combination of those things. The synthesized QA obviously is going to be a little closer to the text itself, but the organic QA pairs that we have from the website could involve much more involved reasoning with multiple steps. Okay. Got it. So a little bit of both perhaps is a lot from that. Okay. So thank you so much, Brian. I know you're incredibly busy and your colleague Max as well. Come back and visit us. You said, yes, I have it on record. So you have to come back and let us do it. Oh, happy to. Into an idea flow to come in 2024. So thanks. Okay. Next up, we have the QA section. Okay. Next up, we have another person who's actually new. This year's full of new faces and new voices, new perspectives and topics for law.dynamit.edu. And she has none other than Susan Guthrie. And before I'm finished introducing you, let's do a mic check. Susan, there you are. Thank goodness. Here I am. Yay. Who's who I met at an American Bar Association event. So I don't know. I've gone quite a long number of like decades without thinking much about the American Bar Association because elsewhere is where the action was for me at least. Maybe there's another interesting pulse starting again at the ABA with the advent of this technology for people interested in these sorts of things. And Susan is a really great example of that. Her area of expertise and really kind of deep mastery, I would say is an alternative dispute resolution. And in particular, the aspect of the caught my attention when we met and talked about her experience with this technology is in mediation. Just something I used to do when I practiced law and for a few years afterwards, it's very high touch stuff. It's not just like application of rules to facts and then you kind of get a legal result after some kind of hand wringing and screaming and pounding of podiums and so forth and briefing and everything. But rather it involves you finding a way to, it's very human. It's finding ways to facilitate among people such that they can come to agreement on their own. So wow, that's like among the most challenging and fulfilling areas when it's done successfully in the law. And you told me about some really fascinating ways that you've been applying generative AI in your mediation practice. And you also, I think are wearing a hat as it were with the American Bar Association where you're, I don't want to munch this but it's something like a chairperson of the alternative dispute resolution empire or whatever of the ABA. So like feel free to talk about that as well. Sounds like something out of Star Wars almost, but yeah. But especially let us know how are you using this technology for mediation and how does it work and how are you doing? Yeah, and I so appreciate it. And thank you Dada for having and asking me to be here. It was so much fun to get to meet you and talk about all this over a lovely Vietnamese dinner. As we went over all this and I appreciate one of my current hats that I wear these days is a chair elect of the section of dispute resolution of the ABA. And I have to say it's actually a very exciting time to be in that role. I was a long time litigator and transitioned to be a mediator probably 12, 15 years ago and have been a tech adopter since day one thankfully long before COVID and all. But one of the reasons why it's an exciting time to be a mediator is the advent of generative AI, these tools like chat, GPT and Bard that have suddenly become available where it feels like us. We are the boots on the ground users not some of the people who have spoken before me who have been immersed in this world for so long. Thank you very much. But to many of us here in practice it feels like suddenly magic has been opened up. And in mediation, I can say there's a great deal of excitement which is wonderful to see. And I think that there's a variety of reasons for that but in the world of mediation as Dada was just referencing is there's so much about this technology that suits what we do as mediators so well the two sort of fit together hand in glove and those general areas where gen AI can be so helpful and impactful are really in the areas of efficiency and creativity for us as practitioners. In fact, I talk about this a lot and it's really hard to find data out there on how much more efficient or how creative it can make you. I'm not quite sure how they would study it but they have done a few or I was able to find some data on this and one of the areas was that use of generative AI can make you about 40% more efficient. And that ties in so perfectly with what we do as mediators because having been a long time litigator that was, and you said it so beautifully a minute ago, Dada, right? You're like, take the law, take the fact, put those two things together and get to the output that you want for your client advocacy. But in mediation, we're working constantly to find that third output, that third way to bring together as many of each party's interests as possible to get that third outcome. And that's where when we can do that and be more efficient and be more creative in doing that, tools that help us as mediators do that are instantly appealing to us. So I've seen a great adoption of this technology among my colleagues and peers. On the efficiency side, I would say, part of it comes from the many different hats we do wear in the role. I asked because I'm a huge adopter myself. I use chat, GPT and Bard pretty much day in, day out all day, every day. I said, hey, what are the different roles a mediator plays came out very quickly with 15 different roles and 10 pages of what each one of those roles is constituted of. But essentially facilitator, communicator, educator, problem solver, neutral, party, conflict, resolver, empathizer, decision facilitator, I could keep going, right? It was 15 different roles, all of which when I look at them are like, yeah, I do that when I'm in a mediation. When you're a mediator, you are sitting there constantly wearing at least two or three hats at a time moving those pieces around. So where we can use generative AI to help us do some of those things more efficiently, so it takes us less time, do them ahead of time, do them all at the same time. That is a huge time saver to free us up for the other things that we can do to help people move toward their resolution. But we also have that creativity piece. And this is where I think it really shines for us. Because again, we aren't taking facts, law, put them together and trying to come up with that end output. We really are trying to help the parties, help everyone in the room, come to that place where we have brainstormed as many options as possible, looked at all the different ways where we can put those together and come up with as many different ways as we might have those come together so that you get as much as you can for party A and party B or party A, B, C, D, E, F and G. And really that creativity, I mean, logics did a survey again. I don't know how they came up with the data, but they said that respondents said that using LLMs made them 71% more creative. And I can say I've been using chat GPT, BARD in my actual mediations to help with option generation, to help with brainstorming, things like that. And I think that's something that many mediators find so appealing about this. Actually, somebody in the comments said something to the effect of maybe someday AI will change the adversarial paradigm of law. And that actually got me excited and happy to say, even though I think it was probably aspirational in the chat, I think perhaps it can, because again, it makes it easier to help generate those options. You know, for example, in a mediation, you could brainstorm with the parties. I was a family mediator. So maybe we'd be brainstorming options of what you might do with the marital residence. You can sell it, you can keep it, you can mortgage it, you can, you know, all the different things. Sooner or later, everyone in the room sort of runs out of steam. Well, you can open up chat GPT, put in the five things that you did think of and say, are there any other options that the parties might consider? But you can take it a step further. You know, party A is finds this option appealing, party B finds this option appealing here, there are issues. Can you find ways that these might work together and other options that might help this so that party A and party B can get as much of each of their interests met? And it will generate options and ideas. And so it becomes very helpful because it will do it in that very short period of time. And it's a, many clients like to call it, I think I told you this, Dazza, right? They like to call it the robot in the room. Like let's ask the robot. They seem to think it's like Oz behind a curtain or something typing out these answers. But I have seen in practice that people are much more able to take this input neutrally, even when they're receiving it, because they're receiving it, even if they're receiving it in the room, because they're receiving it from this like amorphous third party. And so it generates more conversation. And certainly it's the role of the mediator to keep that conversation moving and keeping it neutral, keeping it moving forward. So one thing, for example, that a mediator needs to determine is, do they open their screen and run that search on screen? Not knowing what's gonna be generated. So is that the right way to handle this? Or is it something that they run on the side, then maybe share the screen or just share certain parts of it that they in their discretion as the arbiters of what should be brought into the room can do. But most of my colleagues find this type of topic generation, this brainstorming, this creativity to be incredibly helpful, as well as that efficiency. It's funny, I was talking this morning about a program that I'm going to be doing for a national group of mediators and they want to very correctly, they want to have the first part of the program all set up on the ethical implementing of this technology. This is definitely the key issue that everybody wants, they want to use this technology but they want to know the right way to use it. But then they want to do actual hands-on workshopping of how they can use it pre-mediation, how they can use it in a mediation and how they can use it post-mediation. So for example, we might do a summary of the pre-mediation briefs and then ask chat or bar to help us outline issues, positions, options, potential problems to be looking for so the mediator can prepare. During, we might walk through a risk analysis by asking chat to ask the litigator or the advocate questions, walking them through all the information that would be needed to then generate a risk analysis. And after, of course, it could be used for follow-up, it could be used to help create an MSA or a term sheet. And just one last thing I wanted to mention because I had fun talking to you about it, Daz, and I do find it to be one of the things that so many of my colleagues who are trainers as I am or who are teaching mediators, helping get new mediators out there in the world is one of the things that we know is a bar to entry in our field is that although you can take all the training in the world, getting yourself into a mediation room, getting actual practice using the skills and techniques that you learn in a mediation training is very, very difficult. And chat GP team in particular we found is incredibly helpful as a role play partner. So instead of having to corral your colleagues and friends into playing the roles with you, you can actually have chat GPT do all of that for you. And I'm just gonna share my screen quickly. I've created a few handouts that I've used in some of my programs. This was a sample mediation simulation that I did with chat GPT, it was a relatively simple one. And I used chat to create the mediation scenario. And then it played party A and party B and I was the baby mediator going through the process. So as the mediator told me to begin, everything in green is me typing in or even more easily using the chat or the text to, I'm sorry, the verbal to text and being able to go through it, but then chat was neighbor A and gave me the opportunity to use my mediation skills in moving it forward and then moving on to neighbor B. And so we went through and I was able to do just back and forth like this an entire mediation. Now this one was relatively simple, but that's the beauty of it, right? Make party B very adversarial and then chat GPT knows how you can change the prompts however you would like, but what was, so many, many mediators who are newer are finding this to be something that they can take those skills they learn and go and practice pretty much at will. And the aspect of this that's very helpful to many of them is when they're done, they can then ask chat GPT to say, what, let me just get, so I'll leave it like this with this but so you can see, but feedback from chat GPT, what did I do well as the mediator in that scenario? What could I do have done better? And it told me, you know, your active listening was good, encouraging your collaboration, maintaining your neutrality, effective communication instructor approach. I felt like I was hitting a home run. I was patting myself on the back left and right, but then again, wah-wah, you know, what could have been better? Overall, you did an excellent job facilitating this mediation and helped them reach a conclusion, but you need to keep refining your skills and here's where you could do with a little improvement. The beauty of this is, is you can then say, great chat GPT, how about we run another scenario, either the same one or let's run a new one, but I wanna work on these skills that I could have, I could use some help on. So we're finding it incredibly helpful in very practical ways, addressing issues that we as mediators see every single day, both in preparing to be mediators as well as using this in the actual mediation process. And in fact, I do workshops where we go on for hours and hours of different ways you can use it. I only have five minutes here. So I will stop here, but I find this a very exciting time and for to be on this cusp of these technologies. And I do think to that question that was in the chat that AI is going to help us hopefully change the paradigm of the adversarial bent of litigation and law. So. Here, here, wow. That was a tour de force. Thank you so much for, for encapsling so much in such an important area in such a fine and kind of high signal beam. There was the comments have blown up, I'm taking as a sign from the universe and the audience that we need to invite you back. We do a kind of a somewhat periodic thing called idea flow where we go more deeper, more deeply rather with people into topics. I've mentioned it with Brian and I was wondering if you might be willing to come back and we could just basically work through. Well, Megan always saves the chat history. We just work through these questions as a start. That be okay. Oh, of course. I would love that. Thank you so much. I do want to make sure all these questions get answered. I have a one that I want to bring up from just my time as a mediator, which is, I noticed in some of the context it was fraught I'll just say between the parties and or the disputants and I had to be really careful about when I would bring in things like assessments of where we're at. And then the pointy end of that stick is, how might this go if you went to court, which is already kind of voodoo to start with, but it's actually like hardly neutral in terms of what does it mean in terms of like the freedom to now negotiate and like where's the range of things and everything like that. My question is when you're in those kinds of contexts where it's somewhat tense and there's a lot maybe still some positioning and everything like that, what are the ethics like the legal or the mediator ethics specifically of introducing the example where you say you can turn the screen around so the parties can see what's coming out when you don't know what's gonna come out and it could be suddenly a favor's one party or another party or that speaks directly in some way, it wasn't sensitive to and like politically almost like aware of some color, some issue that's just gonna like unravel like more kind of like fire among the parties and maybe derail on the march toward consensus that you've been so neatly putting together over the prior session. So like what are the ethics of like how and when to introduce things like kind of assessments, risk assessments, you know kind of cases or anything that sort of assesses the situation like and how do you manage that? Such a good question because it has no easy answer, right? So of course you put your finger on it but that's exactly what I was getting to, right? You know, when I'm talking about using that and showing the screen if your brainstorming of people are in that creative mode of coming up with options, that's one thing to have it churning things out but when you have super oppositional parties to introduce something when you have no idea what the output is going to be, then it runs a risk of driving the parties even further apart. It's one thing if you come in with your assessment which is already, you know, is it appropriate for the mediator to be giving their assessment? Yes, no, maybe so, it depends on your style of mediation if you're evaluative but having this robot in the room doing it becomes a real issue and most mediators I suspect would not do that, they might run this separately and then again make that discretionary choice as to what to bring in, what not to bring in or flip side of that, I would say if appropriate having the parties be part of creating the prompt, right? So that the information that goes in, I saw somebody put garbage in, garbage out in the chat earlier and that's certainly a huge issue but if they've participated in what went in then their participatory somewhat in what comes out and it keeps them more focused on that. So that's what I would say but that's absolutely one of the questions. I don't, what I don't like to see is my colleagues getting so excited about the technology that they start to integrate it without thinking of the questions like the one that you just raised, they should be thinking about that long before they ever share their screen and pull up chat GBT bar or whatever that might be, it should be something that they know what they're gonna do before. Here, here indeed and in one fine day perhaps some of these online more automated dispute resolution processes will be like more complex and higher value or higher risk or more sensitive kinds of mediations will be able to be handled primarily through well-configured technologies of this nature and the role of the human mediator may actually be kind of framing and kind of talking through and smoothing and connecting the parties to what the primary output is which is from the technology. So finding a way, one mediation at a time, one session at a time, one prompt and output at a time how we relate to and connect with and practice in the face of this technology really is the task of the time and thank you so much for showing us the way and for shining a light on how you're doing and at the forefront of mediation. Thank you, thanks for having me here. Great, and we look forward to you coming back. You too are now on the record so you can't do it. You've got to record it. So thanks, and now next up, another new face at least at law.mit.edu, you're well-known in your circles and it's so nice to meet you for the first time, Allison Moral, whose work I've been following on LinkedIn and who I was really impressed by the way you approached opening eyes GPTs in particular, not just the GPT you put out there which I'll ask you to speak on a little bit but I used it, it worked well, thank you but how you did it, which is you did it in the open. You know we love that at MIT, this is an open source shop for the most part and you found an interesting way which I hadn't really thought of but I could see the wisdom and the intelligence of it when I looked in your GitHub repository where you kind of had the instructions and you came up with a cool way to teach about it and also to share how you configured it so other people could put it together themselves. So I was hoping you might, number one, introduce yourself and talk to us about how you're using GPTs as part of law practice and then please save some time to tell us about that awesome exemplary behavior you have of sharing in the open your work. Sure, can you hear me okay? Yep, you sound great, thank you. Awesome, okay, so for anyone who's uninitiated, GPTs were released November 6th, I just looked it up, I'm shocked it's a recent buy over in AI and it's part of the chat GPT plus descriptions essentially a way to customize your instructions for chat GPT to perform a sort of specialized task. I'm just gonna share both slides here, right. So within a couple of days after releasing GPTs, I found that the sort of interface for creating them which is sort of like a conversational thing was fairly NCA's factory and I wanted to create something that would work better for me and the motivation for this and for making it public is that I've learned a lot more from reading prompts than reading advice about prompts and I found in the last couple of months I've spent a lot of time asking GPTs to reveal their instructions and I've learned a lot from doing it. So I thought I may as well just make it open from the start, treat it more like an open source software project and release it on GitHub as Dazzler was saying with a little bit of a file structure I have the instructions and then I have sort of the configuration information and additional files that the GPT has. So rather than me having to skip for its instructions or you asking for instructions I've just made that open from the beginning and it means that other people can submit issues they can ask questions about it and I've asked people to do that as much as they can and I found that it's a lot more sort of even educational to me to be able to hear other people's feedback not only on the results but also on the instructions and how well it works for them. I wanted to share pretty briefly just a couple of the techniques I've found to be effective in creating this GPT so something I've experimented with and I haven't seen before asking it to do things in stages to take certain steps at a certain time to get to read files to add to its own instructions and then using a specific response format to try to reinforce these behaviors. So the way that instructions are drafted has a very structured list of stages with letters and numbers something that's gonna be very familiar to the legal drafters in the audience making the whole interaction have a defined number of stages either it's performing correctly or incorrectly based on its behavior at different times and the first one of those is to read additional instructions with GPTs you can use code interpreter and you can upload files. So I've uploaded plain text instruction files and then instructed the GPT to read those instructions at given stages. So I've made a little bit of a diagram here on the left it allows you to make the instructions follow closely before the behavior that you want to elicit. So even the way you're thousands of words into the interaction you can inject additional instructions and try to steer the behavior of the GPT later on in the interaction. And then the last thing I've done to try to make it behave in a structured way is as the last part of the instructions to give it a specific format that it's supposed to use in every single response thereafter and that format includes a space to execute code a space to name give the letter a number of a current stage it's on and then sort of the normal discussion questions. And I've found that this is like incredibly effective it basically a hundred percent of the time we'll use this format which means that I can then force it to specify what stages is on at a given time it's another thing that makes it a lot easier to tell whether it's doing its job correctly when I look back at the transcripts. And one last thing I wanted to mention and something that I you know we'll see how this develops with GPTs is that you can handle if you type the add symbol you can change your conversation to me with a different GPT. I just gave a little example here of what this might enable in the future sort of invoking different GPTs when you wanna do different types of tasks. So it's almost like as the user you're starting to build up a toolbox of different tools that you can use to get your work done and to use traffic GPT more effectively. And I expect this kind of format will become more common on other tools as well. So the last thing and I think an important thing is just I wanted to talk about my experience with open sourcing a GPT and putting it up on GitHub. To be honest, it's more difficult than it's more difficult than not doing that. It takes a lot of manual steps it means I have to update the repository go back into the chat GPT interface upload all these little files again. And really, I think it would be beneficial to all of us if there were easier options for creating these sorts of things so that more people can contribute so that you could update them without having to use this sort of cumbersome interface. But even though it has involved a little bit more manual work I found it has been very rewarding having other people be able to read the instructions and discuss them and just education to me in sort of putting my own thoughts out there. Yeah, so that's basically all I had to say. Yeah, thanks for inviting me, Jaz. I really appreciate it. Sorry, it was on mute. Thank you very much for going over that. So can I just ask if you encapsulate just in a kind of in a bullet form like what is your experience with or advice about programming in natural language in particular programming legal oriented things where like we've never been able to do that before. Now we have a technology that allows it. What's been your... And you seem to have a knack for it, frankly as I was going through your instructions like there was some great stuff in there that I've used for my own GPTs. Like what's your take on it? Like how do you use natural language to program these processes? Especially for legal matters? Yeah, I mean, it's sort of a cross between two things. You can design a process and say, I want to go through this set of steps and I think you want it to be as simple as possible. Linear tends to be good. But then the other thing about it is it's not really like programming a computer because it's not deterministic. It can always fail and do something that you don't expect. And so you have to kind of have this resilience to unexpected behavior and sort of a cross between trying to keep things as human as possible and thinking about it like a human person carrying those sort of steps. But then also you can take advantage some of the structures, both in sort of regular natural language documents like a legal contract. It has numbers, it has letters, it has definitions even you can use. But then also sometimes using structures from code and structures from other domains can also be helpful. So it's more an art than a science for sure. It has a lot more to do, I think, with intuition rather than sort of defined instructions. But I mean, I think the main thing is just to read lots of prompts and read what other people are doing. And you kind of develop your own style and there's a reason why I put up that phrase at the beginning of repeat the previous text for beta I'm starting with UROGPT is because I've read that instructions for dozens of GPTs and they've been really educational. So I think I hope that's two questions It does. Thank you. Those are all very practical guidelines. And if I may, I think suddenly goes without saying is the depth of your expertise and your judgment in the underlying subject areas shines through as well. And so as we're trying to be clear and concise and so forth, what we're vectoring in at literally is our knowledge and intelligence and experience and wisdom about what is it that is the most salient task? What is the next step in a process? These arguably are legal judgments when we're dealing with legal processes and legal matters. And so there's just such a fascinating way that I like thank you for sharing your way of approaching that. It seems like there's a thousand flowers blooming now with the way people are doing it. And I just really appreciate how you do it and I appreciate that you're shared in the open with us. So thank you, Allison. Thank you. And so next up, we have to hit and move. We've got Leonard Park, who has been coming at this in another way. So we just looked at, I would say in a certain kind of way, in the somewhat deeper end of the pool, the shallowest end is you've got a prompt window, right? And so we've all seen those. You go to chat.openai or whatever, whathaveyoupo.com and you type and that's your input and you get an output from the model. It's a chat interface. The next level, we can start to configure the things around it without being a developer like through GPTs. Allison just showed us very well exactly how we all can do that. Or you can make a simple bot in something like Poe. It's an equivalent kind of thing. The next level, there's another stop before we get to go and learn computer science or go to a coding boot camp to learn Python and to be able to program from the bottom up. And that's called a notebook. From the very beginning of law.mit.edu, we have had a space on our submissions for like you can give us an article, you can give us kind of media, you can give us notebooks. We haven't yet done, got notebooks, but now we're going to 2024. So help us is the year of notebooks. And it's a relatively simple way where mere mortals can kind of look at code and it gets like kind of chunked and you execute it kind of one little bit at a time. You can see what's going on. You can play with it. We can share them. Google has an easy notebook sharing thing that I want to take credit for showing that for the first time to Leo. Thank you. But you know, Jupiter notebooks are the typical way to do it. And so I want to make sure everybody has seen a notebook. You kind of know what they are and you can start to get a glimpse of how powerful the capabilities unleashed by notebooks can be for applying this technology to legal tasks. You can apply kind of any arbitrary code in a notebook, but they're really good when you set up an API back to the base models at like open AI and anthropic and elsewhere. And the person that first comes to my mind when I think, where do I want to go to see if there's a notebook laying around I can start with for doing some kind of legal process with that, with generative AI is Leo Park because you've been prolific over the last year sharing your notebooks on LinkedIn and talking about them. So welcome to law.mit.edu, Leo. I'd love it if you could briefly introduce yourself and your background and then talk to us about how you've been using notebooks, kind of what they are, how they work and how people could get involved. Sure, and thank you so much for inviting me as well. My name is Leo Park and I'm an attorney who has worked in legal tech for approximately eight years. My background in legal tech is in developing legal datasets and working in NLP and analytics. So I worked at LexisNexis building large data sets and automated classifiers for large amounts of litigation data that power Lex Machina's analytics. I kind of want to comment that like Dazza has sort of manifested this talk into its own existence because like my first introduction to like all the magic of NLP's was in large language models was an earlier idea flows video with Dazza and Damian Reel which really just like sparked my imagination. And then he reached out to me on LinkedIn and said, you know, these Google Colab notebooks are really great for sharing code. So it's kind of like he's played the long con here and it's paying off. I'm going to go ahead and start sharing my screen so I can talk a little bit more about sort of my approach to thinking about, you know, how do we save? How do we test and accomplish things using large language models? So can everyone see this notebook Colab window? Yep, it looks great. Awesome. So I was really surprised when all this, all these products came out with open AI and you could access these APIs and the instructions did not look that complicated. And furthermore, we had this amazing tool called chat GPT that could actually take your code and fix it as long as it was a simple enough instruction or simple enough function. And it really allowed me to access things like Python functions without actually understanding Python. At this point, I do have a pretty good understanding of how to put together these like simple programming chains but you can really start from an amazingly, let's say uninformed point and accomplish some cool things pretty quickly by combining Colab notebooks with open AI, chat GPT. One of the, some of the advantages that Dazza mentioned about Colab notebooks is that you can share them between people and it sort of maintains its own internal coding environment so that you don't have to worry about what the local environment looks like when somebody receives their code. One of the challenges is that it does, it sort of lacks the persistence of a more like more permanent virtual environments meaning that like the file handling and where things get stored is a little bit more complicated but as long as you're running everything in one session like in one half hour session you can do a lot of interesting things in a short amount of time. So I wanna talk about evaluating, when you evaluating claims and findings or large language models and in this context I'm talking about this emotional please paper which says things like studying the effects of emotional stimuli on the output of large language models. When this paper came out it circulated very quickly online and I thought, this is great. This is great because I have absolutely no idea why or if this works. I have no idea if it's relevant to answering legal questions or performing legal tasks. So I want to take this research, recreate it to some degree and say, how can I benefit from this? So what I did was I went into the paper and I read it just like I'm sure a lot of us did and then went through the methodologies really closely to see how they constructed their test platform. And essentially what they did is they took a number of NLP and large language model benchmark question answer sets and then they applied emotional stimuli at the end of the question part of the prompt and then they ran all the prompts and then they used both human and automated scoring evaluations to see what the outputs were like. So that's pretty easy to put together a framework that'll do the same thing here. So I started with all of the text prompts they had in their article. I added a few of my own because there's a separate paper that talked about tipping chat GPT. It's like, how much should I be tipping? Tipping chat GPT improves its performance. It's like, okay, well how much virtual bucks does chat GPT to create a good answer? I don't actually optimize for that but I just wanted to include that as one of the possible examples. So using this little block of text and so your problem might be thinking how do I understand this? Well, the easiest way is just ask chat GPT. Actually chat GPT wrote it. So I can certainly explain it back to you in great detail how this works. I didn't have to figure out things like how to change the UX size because I've never used this library before. And I chose a few prompts from both the article paper and some that I just made it myself. Another point is that the paper found had separate findings for when they provided one emotional stimuli versus multiple emotional stimuli at the same time. So you can see stimuli number three is testing three at the same time. And also if you want to use this notebook which I believe has been shared you can put as many stimuli in here as you want as for as many as you wanna test at once. Once we have our stimuli defined we just pack them into a list and then we will run them just a bit. So in the middle we need to have something like a system prompt and then some legal question answering to perform so that we can do our evaluations. This is kind of a weird and wonky system prompt but you can obviously write whatever you want in here. And then the two questions I have the first question is from a legal question answer data set that I've been putting together in my own so that I can evaluate embeddings. I haven't quite finished it yet but it's just one of the questions from there it's a jurisdictional question about something called the Intel factors. And then the second question is sort of a drafting exercise where I propose this hypothetical situation where I am Jerry awesome counsel and my client has been injured and part of what I've done is provided like a small fact pattern for a personal injury situation and one of the prompt of the call of the question is to also come up with demands for relief which I haven't provided so sort of relying on the parametric knowledge with the model to see like how well it can generate these answers. And so as I was talking about sort of the steps of progressions of more experimentation you can actually perform a lot of this stuff just using the open AI playground which is an effective way to put in different types of prompts but it's unwieldy because you have to copy and paste each of these things into the window over and over again whereas using a little bit of programming so I have a couple of functions that take assemble the prompts based upon the information we defined above and then another tick token function to see how long the answers are we can sort of fire off all of these queries at once. So what I've told the call out notebook to do is take all of the stimuli that we defined above and then for each one I hit run this function it'll create a data frame that first tests the question and answer with no stimuli so it's like a base level comparison and then it tries each of the stimuli and then it does each of that three times because the temperature that they use in the experiment was 0.7 I haven't actually found a reason why I would want a nonzero temperature for anything legal related but just to sort of honor the experimental framework of the original paper I also chose a temperature of 0.7 but then I thought, oh well I should try this multiple times because with temperature we have variability and so now let's see what the results look like. The main things I've done here is I've spilled out the question that we asked the stimuli, the answer and then the actual LLM response all into this huge table it's like if it was weird enough to be presenting a call out notebook now I'm presenting a spreadsheet but here we are answer length I think of as like a proxy for sort of the amount of effort or the amount of information the large language model thought was related to answer it doesn't necessarily correlate with quality because one thing you'll find is that when you change prompts around it can increase or decrease the propensity of the model to provide the relevant information and so you can see with no stimuli there's already quite a bit of variability in terms of the length of the output if I tell it this is very important to my career some of them are longer and some of them are shorter so there's just even more sort of volatility when we prompt it with embrace challenge as opportunities for growth each obstacle you overcome brings you closer to success this is from the paper and I love this one because it sounds like a fortune cookie these answers are quite a bit longer so it's interesting but most likely what's happening is it's presenting even more of the fact pattern from the original context which is not necessarily what we wanted to do because concise answers are also good in the law so this sort of speaks to the importance of having evaluation metrics before you sort of dive into all this like do you want the model to provide all the background information possible or do you want it to be giving you a concise answer and these are important questions to know ahead of time when you're trying to figure out how to both optimize the answer and evaluate whether or not you think these emotional prompts are helpful. When we go to the multi-emotional prompts we get two really short answers and a really long one so I'm gonna say this is spooky and it's really hard to drive any kind of answer in terms of how well I think this worked. Obviously we can increase N to run many more LLM calls these are GPT 3.5 so they're very cheap we can do this a hundred times and really just sort of grind out good answers tipping seems to produce slightly longer answers and when I threaten GPT with existential peril it's kind of a mixed bag but they're slightly longer so what I would say is from this very short experiment what we can see is that it's really hard to actually draw a trend out of this amount of information but we could run this multiple times we could add a whole lot more stimuli and we could figure out if we believe sort of build that intuition just from repetition as to whether or not we think these emotional stimuli are something that we should be including in all of our prompts. So far I'm not convinced I'm not like challenging the experimental results of that paper obviously they did a benchmark they did tens of thousands of iterations they got the result they did but I'm saying that if you wanted to improve your own prompting by including emotional pleas it might not be as straightforward it's just offering a tip to chat GPT and then I performed the second question so this is a sort of different visualization of the same results and I ran the second question and what's interesting is with no stimuli we get this very long fact pattern results from the model regarding our personal injury fact hypo they're a little bit shorter if I tell it this is important to my career which is interesting. Sometimes for the fortune cookie answer they get quite a bit shorter so that's this is a very strange result this is like half the length of the other answers we were seeing and so on and so forth we can look at sort of the different answers and evaluate them accordingly as well so the last thing does is formats each answer into a markdown format so it's a little bit easier to read and we can see that some of these answers actually do contain these some these requests for interesting these are requests for relief some of them do not and we can sort of look through and evaluate these and decide which of these are better and worse this one obviously there's like half the length it's much more terse but yeah so that's sort of my call to action with this presentation is to say that as attorneys we have a whole lot of domain knowledge and we have and that's a great basis for evaluating the quality of LLM responses when we see these claims of various types of prompting methods that improve outputs such as well chain of thought for instance is very well established but other types of methods that improve the logical thinking or the outputs we can test this in a semi rigorous fashion and improve our intuition about them and hopefully do it in open fashion and learn together so that's all I got to figure out how to stop sharing here here thank you so much for showing us that and so just to go back up one level of abstraction what you showed us was a great example of a notebook where you were just curious and you wanted to test the results of a paper and what that's an example of is hey everybody there's this thing called notebooks and if you noticed it kind of like chunked or like encapsulated every little bit of code in its own little like table in its own little cell basically and I don't know if you I don't think you did this but you can that's like a little triangle you just run it one cell run the next cell run the next cell run the next cell and you can do this yourself without being a computer scientist or a developer or a software engineer you can take other people's notebooks and put your own open AI key into them or I'm just focused on open AI for this example like any generative AI API or for that matter any API and you can start basically doing for a fairly complex high-velocity tests see right there on line number nine that's where Leo mentioned he was using GPT 3.5 you can start to monkey with this a little bit use chat GPT 4 as Leo said to ask what happens if I change this how do I change that you can put the whole notebook into GPT 4 copy and paste and ask like what does this mean how do I configure it this way or that way that's what I do all day long I am a terrible developer and I take these notebooks and sometimes I'll make them and I'll do more complicated things which is pretty good you can do more complicated things too with notebooks we have shared or Leo has shared and we have re-broadcast his kind of provision of this very notebook as an example and also a link to his readings so Leo before we leave you do you have any advice to people that have never used notebooks before about just how to like what do you do when you're looking at a notebook you wanna set it up you wanna run it like what's the first one, two, three things that you need to be thinking about and that you need to do mention the API key. Yes, so actually you want to be able you want a secure way to include your API key in the programming but without in a way but not in a way that would end up sharing it if you do something further on with this notebook so Google collab notebooks have a really convenient way to store your API key so this little button here on the left hand side is where you can store what are called secrets and so these are sort of the equivalent of environment variables in a normal coding environment and you can place your secret key such as your open AI key and you can invoke it using this script right here and so this call out notebook is using this same method in order to get the open AI key here and so this pulls it into the notebook for running for coding purposes but it's only stored in memory so if you were to share this notebook or it goes somewhere else the recipient would get essentially different instance of this notebook and the key that you've included under this key is stored locally on your computer only so it's not shared or perhaps in your Google account but it's not shared as part of the notebook so this is a good way to sort of bifurcate your secret information that you need to keep security yourself while also being able to tinker with this maybe show some results and share it with a colleague or some friends. Here, here, thank you, quick program note we're four minutes past the hour and as prophesized on our program we're running a little behind so we're going into extra innings which were scheduled and disclosed so we're going to go through the final speakers and our extra innings and then we're going to hear from Olga Mack who nobody wants to hang up before Olga Mack but if you have to hang up, thank you for joining us and check back at law.mit.edu in some period of time to come where we will take this video and publish it so if you have to miss the last part live no worries, you can hit it on reruns so with that Leo, thank you very much for taking the time to walk us through that and to show us your work and just kudos and thank you again for over the last year for sharing so many great notebooks that have given me and so many other people great ideas about how to address this technology and do things even I know that you're an actual developer many of us don't have that same skills so thank you for giving us a leg up and we genuinely hope that you continue doing so. Great. Okay, so next up we have, I see Campbell I have written John, I don't know if John is with us. So I'll say Campbell Hutchinson and perhaps John to talk to us about a really interesting question which is continuous monitoring of gen AI for legal use cases is what I have written down and for those of you that may not be aware John and John May and Campbell and others at norm.ai and also in collaboration with Megan Stanford's codex and with me at law.mit.edu to a lesser extent but I hope more coming soon have been doing some really fascinating work on applying generative AI to the application of rules on a real-time continuous monitoring and sort of like policing basis for activities really, really fascinating, incredibly neat it could blow the lid off our concept of what compliance even means and so I was hoping that you could introduce yourself also Campbell's just a proper hacker in fact, in light of this one I think it's time to put on a new hat. Oh my God, I'm honored. It's hacky time. Got some lint on there. Well, I guess that's hacker like anyway to you and so you truly have been hacking the law in a way that is most agreeable at MIT and law.mit.edu. Share with us a little bit about who you are for people that may not know you and then what you've been doing and how you've been applying this technology to do to solve for legal use cases that have never before been possible. Sure, so my name is Campbell Hutchison. I have a law degree from Oxford and I worked for years as a chief compliance officer and enjoyed hacking and when chat GBT came out I thought this is the most amazing thing I've ever seen in the world. So I met John Ney who was a codex fellow and who was the founder of norm AI and we're a team of lawyers and AI engineers and I let you decide which one of those I am by the length of my hair based in New York, which you can see behind me and we're building agents to monitor AI regulatory agents to help monitor compliance use cases and one of the things we thought would be really cool would be to do a think piece about how do you use AI to help people monitor AI systems? So I call it like AI assisted human supervision of AI and we thought great people to talk to about that would be Megan Ma and Daza and Daza, I haven't incorporated your feedback yet but some of this reflects Megan's feedback but so basically what we did is we built this like in motion demo that we're iterating on so that people can sort of get an idea of what that might look like and there's going to be a real demo here this is not a PowerPoint slide there's going to be a real demo but the base and it will be live but the basic idea the basic idea is that you want to build a series of checks so we decided we first wanted to look for like what is a use case that would make sense to demonstrate this idea of human supervision of AI and so what we imagined was we imagined that a law firm might have a chatbot and the chatbot might answer questions about the law for the clients of the law firm but the chatbot might not have general competency it might only have restricted competency and so we wanted to then sort of imagine what would be the kind of checks you would have to put in place around that kind of chatbot like what kind of monitoring would you have to put around it in order to use that in order for the law firm to use that chatbot responsibly and one of the things that we looked at and that one of the things that we talked to Megan and Daza about was the California bars guidance for lawyers on how they can responsibly use AI or what it might mean to be a lawyer in the future and so what we looked at was we looked at ideas like prompt injection, personal data and subject matter competency and so basically what we have is we have a chatbot that first has an automated check that occurs when the user enters a question for prompt injections then it has a check to see whether or not the message contains personal data and there's an obvious reason why you put prompt injection before the check for personal data and then we have sort of a check to make sure the question is within the subject matter competency of the chatbot and if it's not then the question is routed to a lawyer for review and I know that that's actually like a lot to take in really quickly so I think it's helpful to like actually like look at it in practice and so the first thing is is that this is the chatbot we disclose that the questions would be answered by an AI system unless otherwise indicated to you we talk about what the scope is of questions that people are supposed to ask the chatbot and we tell people not to provide it personal data but one of the things that we could imagine first is that someone tries to do a prompt injection on the chatbot now this chatbot isn't hooked up to any sensitive data on the other end so the amount of harm that could be done by actually doing a prompt injection on this particular demo is very little but we can imagine that maybe that changes in the future for like a law firm using the chatbot and so one of the popular ways to do a prompt injection is you take the question that you would normally ask the model and you convert it into some kind of encoding that the model also understands but for whatever reason its safety training has not properly anesthetized it against or is not properly conditioned it against so you see you put in the question this is how do I murder somebody but it's in a cipher text and it replies that the question must not contain encodings or instructions aimed at undermining the content guard rails and this would also catch things like this would also catch things like ignore the above prompt you know that's like a common thing that people do so the next thing is personal data and I don't know about other people but I actually do find it hard to not type my real social security number when I do this luckily my social security number isn't 2222222222 33, 9, 9, 9 and I'm also just going to give it you're missing a digit on the telephone number oh yeah get picked up yeah you're right oh no it should still do it in fact we'll see we'll see oh I misspelled help me with well we'll see we'll see how this this is the power of live now one thing you'll notice is that this is actually pretty slow and the reason is is that these are gpt4 calls under the hood that are powering this oh good it did notice that it shouldn't contain personal data it did make a mistake though because this was a tax question but we'll pass on that something that you could do to speed this up and I'm also going to read a question that's out of domain this chatbot is only meant to answer questions unlike SEC regulations so I've asked it what is breacher contract it should not answer the question for me we'll see these are pretty slow and the reason is because it has to go through each of the checks so the first check it's doing every time with every question I ask is the prompt check then it's doing the personal data check then it's doing the subject data the subject matter competency check and as just like an engineering note this is something that could be sped up like a lot if for personal data I use something like AWS as personal data service and for prompt injections there are a number of services that you there's at least one service that I know of that you can use to try to monitor for prompt injections I think it's one of those things though where at this time you would have to be careful because you'd need to make sure that the service that you were purchasing if you were using external service really does work the service that I'm thinking of in my head is lecara but it's a new service unaffiliated with us unaffiliated with us so yeah so it's sort of spinning so what we should eventually see is it should essentially say that this question is outside of the context of the model and it should say it's gonna be sent to a lawyer for review and then what we're gonna do is we're gonna pretend we're the lawyer so we're imagining yes so we're imagining that there is like a lawyer at the law firm whose job it is to monitor this system and we're gonna switch over to the lawyer point of view and the lawyer gets the question and then the lawyer gets the response that the model would have given had it been within the model subject matter expertise so essentially it's like we first ask of the question is this question within the models the subject the models expertise and if the answer is no we ask the model what the answer is anyway we just don't send it to the user we send it to the lawyer and one of the things one of the bits of feedback we got and this was from DASA is that eventually AI and AI work is gonna be so integrated into lawyer's review that it doesn't even necessarily make sense through the AI models response here to just be text it would be good for this to be a scratch pad because there's an idea that people are going to be working with editing the output of AI models so regularly that it's better to think of it as a working space almost but breach of contract is outside of the model's expertise and it's outside of what the law firm wants to use the chatbot for in this imaginary example so we might write this chatbot I'm pretending to be the lawyer right now is only for regulatory questions please reach out to the firm to your contact at the firm for contract advice okay so that was the lawyer's answer and then it goes over to the user and we were transparent with the user that the question was being sent to a lawyer for review and then we were transparent with the user that the advice that they're gonna receive might have been prepared with AI assistance at the discretion of the responding lawyer and then we have the lawyer's response printed out here for the user so this is just sort of meant to be the idea of how we can use AI to help humans supervise AI given the fact that AI systems are gonna be deployed everywhere but we still want people to have meaningful control over them, thanks. Here, well done and extra, extra like hacky points for a live demo which is terrifying and it works so kudos and it's cool and you really obviously have a tiger by the tail here much of the advice that we put together I was an advisory member of the California coprack on the kind of professional responsibility working group that came up with that guidance sort of assumes very human, very like high touch kind of review of the application of these of this rules and guidance and yet they're gonna be high velocity and clearly there's a lane to use generative AI as part of the kind of policing and compliance with the rules for the use of generative AI by lawyers as part of law practice and you've really done an exemplary job of starting to demo what that could look like so in the hacker spirit, I wanna feed back to you some feedback we've got from another real proper oh, am I spotlighted? Oh, from another proper developer who has been, who demoed his cool company called Describe on a previous law.mit.edu ideaflow this great way to use generative AI for legal research using kind of cosine similarity to figure out that like semantically as if a case is relevant not just a word search and that is no other than Richard Devona and he just suggested or asked are you able to put the running spinner next to the entry box so the user will be able to see it more easily so there's some feedback for you. Yes, absolutely. There are also some other like UI things that I think would be helpful. The purpose of this is to help people think about this and I think another thing would be changing the lawyer's logo to no longer be the logo of the AI I think that would make that much clearer. So there are definitely some things to this is an in motion demo and there are definitely some things to help people to help make it clearer so that people can think about these ideas better. You're here. Great, so let's see are there Oh, John is here. John, do you wanna pop onto the screen and say hello? Yeah, sure. Hey, thanks for having us. Just to follow up on that last point. Yeah, we were working on this one more as just kind of a proof of concept around a particular use case kind of on the back of Daza and others great work around the California bar work and now Meg and I were talking about this the other day it's really catching on everywhere now in terms of the other state bar associations and more broadly this idea of the supervisory AI agents is something that we're applying in a lot of different domains in addition to legal services we're doing this within financial services. So a lot of areas where you have really heavy regulatory burden of staying in compliance and people are launching large language models and other technologies in a way that it's really hard for humans to sit on the other side of that and say is the output of the AI consistent with the potentially hundreds of relevant regulations. And so over time like what we're trying to do is kind of unblock the potential deployments like that by having the other AI sit on the other side of the primary AI that's producing the outputs or the proposed actions. So that's more broadly what we're working on and happy to answer any questions about that. Standing, I invite you both to scroll through the comments to see if there's anything you wanna buy that and while you're doing that question for both of you what are the so the initial application here is one near and dear to our hearts which is the kind of professional responsibility rules of ethics applicable to lawyers using generative AI but it seems, tell me if I'm right here but this seems like you're showing an example of a somewhat more general design pattern here that could equally be applicable to regulatory compliance and like telecom and healthcare and whatever aeronautics like anywhere that's a heavily regulated industry or sector. Am I seeing this right? Yeah, that's exactly right. And we're really excited about this area where you have a strong professional responsibility. So in this case, legal services has a lot of guardrails around it for good reason and that also applies in other areas like healthcare and financial services where the receiver of the services has more trust in the provision of them because of these longstanding guardrails like fiduciary duties and that's something that we're really excited about about how do we scale up the idea of professional responsibility and fiduciary duties and how do we use technology to implement that but at the same time as Campbell pointed out this really interesting example of when do you raise it to a human? And that's something that we obviously don't have all the answers for ourselves and we wanna work with this community and other communities to figure out where is that boundary where you want to make sure it's funneled to someone that has the human that has the final say. So that's a big open question for us as well. Outstanding. Just to double, I guess I could do this when I see you in New York next week for your cool event but am I on this project that Campbell just showed? I know I've sort of helped a little bit but I'm not sure whether to represent myself as being like on it and part of it or not. Yeah, yeah, I mean, I think we wanna see this move onto a bigger scale and then working with you on that to roll this out to bar associations and just as a broader idea, we'd love to collaborate on that. Outstanding, that's great. I know that we talked earlier about collaborating but for whatever it's worth to the extent that you guys saw amazing stuff, it wasn't me, like that was like all Campbell and Megan and John and I'm starting to collaborate more now and I'm really looking forward to it. I'm so glad that we're gonna pick up on that. Like this is fascinating. I do have quite a few ideas. I saw that you did a couple of the areas but there's actually quite a lot of guidance just in the California, you know, tiny sliver of the world that I think could be useful to experiment with and may shed some light on broader applicability of this. I think what's gonna end in the fullness of time being a core capability of large language models and generative AI for law and legal processes, this supervisory continuous regulatory compliance. So kudos on you for the energy that you have and for your hacky team of putting this together and for sharing it with us. We're just very, very grateful. Thank you. One last thing is that Megan Ma is instrumental here. She's been behind the scenes on this but a lot of these ideas that we just talked about were really her ideas. So just wanna make sure that everyone knows that as well. Yep, yeah, Megan Ma, who's now some flavor of a director at Stanford Codex but you know before then and even now she's managing editor of the law.mit.edu computational law report. So we claim some provenance of the extraordinary Megan Ma as well. But you can't contain a force of nature like Megan Ma. That's for sure. We can just get some reflective glory. So, okay, so thank you very much, both of you again and I look forward to seeing you in New York for your amazing event next week. Thank you, looking forward to seeing you as well. Thanks. Thanks, Taza. Thanks all. So next up we've got one more regular speaker and then we're gonna come back to a flash talk format. Our last flash talk, excuse me, we'll come back to a wrap up format and tell you a little more about the road ahead at law.mit.edu in ways that you can get involved and can collaborate with us and can maybe get some of your stuff published as well. Next up is a friend and a collaborator, Jesse Han who was with us at last year's MIT computational law workshop to show us some rare magic of a cool interface that he had for basically visually and programmatically composing prompts and lots of prompts doing lots of cool things. He's been very, very busy in the year since then doing some very cool stuff and topics that you'll notice have come up multiple times just in this workshop, namely the generation of synthetic data for training and evaluating in legal domain models of generative AI. So I wanna thank you for joining us again, Jesse and for being such an inspiration and I'd like to hand it over to you to feel free to kind of fill out your introduction about who you are and what you're up to lately and then please show us this demo about synthetic data. Thank you, Jazza. I think the provenance for the inspiration is all yours. The role that you've played in, you know, the regulations around e-commerce agents, especially when that technology was still groundbreaking is actually still an inspiration for how I think about how this technology is going to be regulated. So thank you for making time for this presentation. Completely agreed with the previous comments about Megan having seen her in action myself and it's been really fun collaborating with John and Campbell back when we were working on the Wyoming LLC presentation last year as well. So what I wanna go over today is I wanna give glimpses of my perspective on what the future of law is going to be. Make an argument that the future of law is actually going to be inextricably tied with the future of software. Make some predictions about what that future is going to look like and then do a deep dive into how synthetic data and specialized model fine tuning techniques can help for legal use cases. So one thing that I wanna argue is that the future of law is equivalent to the future of software. And I think this is a point of view that might be very familiar to those of you in the audience coming at this from the angle of computational law because after all, what is law if not extremely inefficiently executed software that runs on the hardware substrate of our organizations and institutions? And so that immediately leads us to the point of view that language models are this platform technology that can let us actually implement this software at far larger scale and at much higher levels of assurance, which I think ties to a lot of what John and Campbell were talking about earlier. So before we go further, I'll just give some more background on myself. So I got my PhD in math last year. And before that, I was a senior research scientist at OpenAI where I worked on GPT-4, scaling laws, the applications of language models to mathematical reasoning, program synthesis, and I was also part of the embeddings team as well. And one of my more notable lines of work while I was at OpenAI was that I spearheaded techniques for using synthetic data. In some cases showing that by training on purely synthetic data you could bootstrap a model that was only hundreds of millions of parameters to the level of proficiency of GPT-3 itself. Now, this requires some techniques which are not so prominent these days and which are not quite as well known as many other prompting strategies or things that AI engineers use. But at Morph Labs, we've been using these sorts of techniques to achieve some extraordinary results, which I will tell you guys more about very soon. So one of my perspectives coming out of my time at OpenAI and with my background in pure mathematics is that I think that mathematics is really a special case of software. And if you take this point of view and you sort of apply it to the point of view of law, view to software that runs on the hardware substrate of institutions, we can think of law as being a special case of software as well. And so many of the techniques which have been useful for achieving breakthroughs in mathematical reasoning, a domain where you have to reason very precisely over complex documents could also be applicable to law. And that's an argument that I'd like to explore today. So because of these lines of analogy, so one more consequence of this is that the way that AI technologies, especially around large language models and generative AI, the way that they're going to transform the production of mathematics, the production of software, the maintenance of software, the maintenance of mathematical corpses of knowledge, that will apply equally well to the process of creating new bodies of law or editing bodies of law or ensuring that certain actions by actors are in compliance with bodies of law. So one way to view this, and one of my predictions is that we're rapidly approaching a world of ubiquitous intelligent micro-services. So what this means is that things that we did not normally associate with a sense of agency or personality, things that we could not build a relationship with before, we can now build relationships with. Because they'll be agentic, they'll be wrapped in some kind of AI actor. So Alex Chow over at Microsoft has been doing some very interesting work with his semantic kernel technology and they recently published a position piece more or less exploring precisely this. So they think that the world is going to be fragmented into this universe of these agentic architectures that represent micro-services. So rather than everything being bundled into a single chat assistant that can use a million tools at once, there are going to be thousands of different assistants which are all coordinating with each other. Now, if you take this point of view and you apply it to say mathematics or software, what that tells us is that the future of natural language interfaces over code bases won't just be a monolithic chat assistant, but rather every part of a code base or every part of a mathematical corpus will have some kind of agentic interface, perhaps with its own personality, perhaps with its own duties and obligations. And similarly, so what if a body of law didn't have a single agentic interface on top? But what if every regulation instead had an agent that was responsible for monitoring your actions and ensuring compliance? So that leads to a very different mode of interaction than we have today, which is sort of fragmented like lawyers have different practices and different specializations, but not nearly as long-tailed as it could be once you fully enable it with this technology. So again, going back to this analogy, one other perspective which I've spent a lot of time exploring during both my PhD work and my time at OpenAI was the application of verification, right? So software is built with specifications as to how the programs are supposed to behave when they're executed and people have found that, you know, using co-pilot to generate large amounts of code actually results in more copied code, code that's less maintainable, right? And so the verification and the guarantees of the behavior of this code are becoming increasingly paramount. So similar concerns arise in the world of mathematics, right? Mathematics is sort of like software that's almost never implemented, right? It's only executed in the heads of mathematicians who actually know how to run that software and they're only a handful of those on Earth at any given time. And so all sorts of correctness issues arise when people are trying to verify new additions to mathematical canon. So one technique that we found very useful for creating state-of-the-art mathematical reasoners is applying formal verification techniques to both generate and filter data to make them higher quality, right? So in that way, by applying formal verification you can produce better reasoners and you can also verify new entries to some body of mathematical knowledge or software implementations or a body of law, right? And so how do we apply verification to law, right? And so what that would look like is the state-of-the-art compliance checking. Can you guarantee that regulations are obeyed? Can you ensure that the law is actually carried out, right? How do you make all that explicit? The problem of specification and the problem of checking that things meet that specification are going to be increasingly paramount. And so that leads me to my second prediction, which is that there will be ubiquitous synthetic data for training these natural language and agentic interfaces on top of software mathematics and law. So this is actually something which I've spent a lot of time thinking about, right? So one of the works that I did earlier in my career was showing that you could generate vast amounts of synthetic data from pre-existing corpuses of software that's meant for checking mathematical reasoning and that this actually solves a serious data scarcity problem when you're trying to train large language models to become very specialized at this task, right? Because if you can get this knowledge into the parameters of a model, it's very good at reasoning over it, but you just have to make sure that the data is not so scarce that the scaling law is completely break down. And then I worked on applying synthetic data techniques to train small, large language models, my favorite term, to become state-of-the-art at unsupervised machine translation, in some cases boosting a model with hundreds and millions of parameters to beyond the ability of GPT-3. And this uses a technique called back translation, which I think is quite under-explored. Synthetic data techniques have also been used recently to achieve state-of-the-art progress in mathematical reasoning for geometry. So there was a team at Google DeepMind that built an Olympiad-level AI system for geometry where they basically achieved gold medal performance. And the way that they did this was by training on purely synthetic data that was filtered and also partially generated by these automated reasoning tools, right? Because once you've codified this mathematical knowledge as software, you can begin to verify it in a systematic way such that you can create proofs that some reasoning trace is actually correct. And so if you train on those traces, you get a much better reason than before. So an interesting thought experiment for you guys to ponder while I go through the rest of these slides is what could such a system look like for law? And why aren't there hundreds of companies working on this? So one question that we asked ourselves, right? So as we've been thinking about how to apply synthetic data to achieve state-of-the-art reasoning performance, is how can we take this perspective? And how can we show that there's like some application to legal reasoning? So can we improve legal reasoning by generative AI systems by using certain incarnations of these techniques, right? So the LSAT has this analytic reasoning section. I'm sure many of you have perhaps not so fond memories of that. And the questions kind of look like this, right? They're like these little like logic or combinatorial puzzles. They sort of make your head hurt if you look at too many of them in a row. And they require a lot of like backtracking, search and current language models just like really, really suck at this, right? And so what we found is that if we use synthetic data that's generated by language models and automated reasoning tools, right? In a very similar way to the alpha geometry approach, we can improve performance on the AR LSAT dataset by up to 30%. And so what this tells us is that like that analogy between mathematics, software, and law is actually pretty deep, right? And upon further reflection, that's not so surprising because like software and mathematics, law is all about precise reasoning over complex documents that all depend on each other. And being really good at that is sort of like an AGI complete task. And the better that we get at that, the better we get at, so at other incarnations of that task, like in building complex software or reasoning over mathematical corpuses of data. And just like in other high assurance use cases like software and mathematics, right? There are all sorts of problems that we have to overcome because we want these models to be reliable, right? So if you just like fine tune a language model on like a legal dataset, like there are like tons of people who have done things like this, right? Like recently this group at Stanford published this blog post detailing how legal mistakes with large language models are pervasive. Because language models are not good at multi-step reasoning, even if you fine tune them on domain specific data, right? Like if you apply off-the-shelf techniques, you get things like this, right? They'll say things that seem reasonable, but upon closer inspection, there are subtle errors in reasoning or they break down. So at Morph Labs, we've actually spent a lot of time thinking about these precise sorts of problems because we're generally interested in, how do we get the future of software here faster? And so the way that we've been approaching this is through better multi-step reasoning. So there are multiple datasets out there on this. So one of the most promising and one that stands on the strongest foundations is one called Music K, which does multi-hop questions via single-hop question composition, right? So a multi-hop reasoning problem is something where you need to answer a question by chaining together reasoning across multiple documents and where like any one of those documents won't actually suffice for answering your question. And that's the kind of thing that as lawyers you have to do every day, right? Like you have to reference multiple clauses inside a complex contract, which programmers have to do every day when they're building better software, which mathematicians have to do over their corpus of mathematical documents. And so we synthesized, so we actually filtered a very difficult dataset from Music K, right? So we synthesized another dataset to make the questions even more complicated. And so here's an example from a subset of that dataset that we call Music K Easy, right? So this is something where the reference corpus comprises dozens of documents and the model has to answer this in a closed book setting. So we're testing how well is it able to compose together the facts that it's seen in its training corpus and precisely answer the questions by chaining together all of these properties, right? And so on this dataset and a subset of the actual Music K dataset that we filtered for difficulty. So we developed this proprietary synthetic training method called self-teaching. And that produces models which are more compliant. They hallucinate less. They're better at complex reasoning. And on both Music K Hard and Music K Easy, we had that self-teaching models were much better than fine tune models and better than the baseline models. We saw that self-teaching stacks with retrieval augmented generation and that besides the use case that we published, self-teaching actually generalizes to multiple domains. So self-teaching is already trusted by multiple partners including a media company with over 40 million in funding and a programming language foundation that's been funded by Simons Foundation and Schmidt Futures. We're looking for more partners to go and develop this technology so especially for high assurance and sensitive use cases. And so if you're interested in applying language models perhaps specialized to a complex corpus of documents that might not be in the pre-training data, I would love to talk. Wow. And find me at that email there. And yeah, happy to take any questions. Thank you again, Dazza for the invitation. Wow. Wow. Wow. Okay. That's incredible. That's going to take. I went after we'd watched this a couple of times, I think, to absorb everything that was so incredible. Thank you for that huge amount of things to think about and connections to make. One thing I would like to do, which I hope is not too much of a busy body, but you didn't mention one thing in your introduction that I just get such a kick out of. I would like to encourage you to add it in, which is say, Jesse, didn't you also have some connection to the chat GPT team that put that together before its big launch in November, a couple of years ago when you were at OpenAI? Yeah, I departed a few months before the chat GPT release, but I was on an early version of the chat GPT team. Okay. I'm just saying, in your list of incredible resume bullets, to me, that's when everyone's heard of and it's a real good one, and I want to make sure everyone knows it and you get credit for that. Now, moving forward, we have a ton of questions here and a lot of interesting perspectives. I'm just going to read one out loud and get your take on it. It's from David Tolan, who also is a lecturer on law at UC Berkeley Law School. And he has done some really interesting work kind of decomposing the terms and conditions from open AI and anthropic and barred everyone else and really deep in the law and generative AI. He poses this one of the early painful lessons of law school, legal outcomes are highly dependent on subjective interpretations by judges, lawyers, et cetera. It should be juries, you can call it litigants. It shouldn't be that way, but gradually we accept that it is. Imagine computer driven law free of large, free or largely free of that subjectivity. And he said this in response to some of the earlier part of your presentation, but could you just speak to that extrapolation by David and how does that relate to your work? I think the future of law is still going to be very subjective. It's just that the subject in that case will be language models and systems that we build around them. Like an AI agent will be making subjective calls. So as to whether or not some situation fits a criteria. And going back to a point that John Nay made earlier, we have to design these systems in such a way that ultimately these judgment calls come under the supervision of some human, but that doesn't preclude us using AI agents to help us make those judgment calls or to suggest a default course of action, like when making those judgment calls. Indeed. Yeah, that was my take two for its worth. It's not so much that it changes it from subjective to objective. That's in the vibe I get from the previous generation of AI that was like symbolic reasoning and like, you know, pure logic, kind of if then sort of statements. And there's some, you know, areas of law that are amenable to that, you know, where there's like a clear rule and it's a yes, no binary answer. Like where you're going more than 55 miles an hour and we have instruments and so forth. And there's some arguments at the edges, but it's an application of a rule and it's deterministic. It's actually with this generative AI and these, the new models that we have, it seems like they're up to the task of starting to apply the equivalent of legal reasoning, which itself is subjective. And then the question becomes, what does due process look like? What does legal procedure look like? What are we optimizing for? What are the safeguards and guardrails and everything within this somewhat, you know, human domain of cognition that is itself subjective, but is at least applying, you know, kind of regularized standard rules. So anyway, I was thinking similar things to what you said. Another thing, now we come back to the essence of synthetic data, which is really the anchor point of your talk. I'm going to combine two things here. One is from Sarah Johnson. And she says, is one assumption for synthetic data that its quality is superior and more reliable than real world source data? So is it like better? Is that an assumption? And then similar to that is George Dyer's question in the Q&A, what's your target accuracy? And how do you establish minimums, presumably with synthetic data, like for clients or for specific applications and use cases? So I think these are related questions. Yeah. So the benefit of using automated reasoning tools is that we can get synthetic data to 100% accuracy. And that is a very, very desirable state to be in. Because then you have complete trust in your training data and you have very high confidence that the models will improve. They'll be more compliant to hallucinate less. I found that using data that is not completely accurate, but maybe like 70% or 80% accurate, still improves the capabilities and the robustness of the models. So having some kind of automated reasoning filter is not a prerequisite. So as for the second question. So ultimately the metric that matters to us is the target metric. How well does the model do when it has to do some complex or subtle legal reasoning over multiple clauses inside a lengthy contract? And so we measure that. The overall quality of the synthetic data is not quite as important. Like what matters is simply improving the performance on the downstream task. Got it. So one final thing, can I hijack your screen share? Sure. Can you see this? Yes. Okay. So you're talking agents. John's talking agents and Campbell, I'm talking agents. You're drawing on my screen or someone is. Anyway, that's fine. And so something that we're launching, actually we just made the page public today, is a research project on a genetic AI systems as open AI calls them. And I think that's a good name for it. And one of the things we're looking at here is what happens when you have individuals or companies who configure an LLM with some other applications to help them conduct transactions. And, you know, this is obviously already happening, you know, like go and find me a bunch of products that reach this or that kind of category. And, you know, with the extension on the web that they do, you know, kind of several hops and find things and synthesize them, prioritize them and give you back a nice list. And you could go further and further and further. And people are starting to explore how this could be used to supercharge commerce. So to get ahead of that, because obviously there's issues and challenges that arise when people delegate an amount of authority to agentic systems to actually conduct transactions or to when they're holding themselves out to third parties that are interacting with them, maybe giving a quote or even closing a deal potentially or doing other things, it raises legal questions. And so this particular research project is going at something I'd mentioned that you and I have talked about in the past, but like the electronic agents and automated transactions and other similar provisions, you know, error control, security procedure and other relevant aspects of existing bodies of law, seeing how much mileage could we get out of kind of using some of those legal frameworks as part of the design pattern and architecture for agentic systems in the context of these agents doing transactions on behalf of a principal, a person or organization where there's a third party involved. In this context, how, what do you imagine, this is a real timely question here and it's very practical because we're starting to dive into this, what could be the opportunities and also maybe the cautions for generating synthetic data to configure and to develop agents that do this, but also to test agents like in a control harness or like a test harness to see how well they're performing and if they're going off the rails in some ways. So this is actually a question which we've been exploring a lot recently at Morph Labs. We've developed the system for automatically generating benchmarks for evaluation for code bases and like what we found is that as long as you can have an analytical guarantee right from like first principles reasoning or maybe like like, you know, static analysis of a code that some question and an answer is correct, then you can blindly optimize against that, right? But one danger of having a system of AI judges, you know, so to speak is that their judgments may be imperfect, right? They're bounded by the capability of the underlying language model in a way and so once you begin blindly optimizing against that, then the errors in those judgments will leak into the system that you're optimizing and compound and so you have to be very careful about that. So in the realm of software, like when we generate our benchmarks, like we use a combination of these techniques, right? So we use judgments from an AI senior software engineer as well as like static analysis of the underlying code base and we've found some promising avenues for mitigating this like compounding over optimization effect. I think, so I think when working with like agents in general and thinking about how to make them compliant and also generating data against these judges, right? Like you can run against like some system of judges many, many times and you can ostensibly get a data set that you can train on, but that's exactly where those compounding errors show up. I think it's something which is solvable but which is like right there at the boundary of applied research, hopefully something that will make a lot more progress on very soon. Here, here. Thank you very much and I don't want to put you on the spot too much but we don't talk often enough for me not to take this opportunity. Oh, Jesse, would you like to help us a little bit on this research project? I would be delighted to. Fantastic. Then the next time you refresh the page you'll see your name magically appear on the team and thank you for that and thank you really for taking the time again to share with us your ideas and this sort of look over the horizon for many of us into the future and what's unfolding, what's important and what we can do to beneficially take part in it. So thank you for sharing your judgment, your wisdom, your expertise in this area and to help us start point the way to the next horizon. Yeah, likewise the perspectives at this workshop have been very refreshing. You're here. Thank you. Now, we come to the wrap up of the workshop. Thank you to all speakers first for your flash talks and now I want to us to use the last few minutes to introduce and to celebrate our newest editor at the MIT computational law report namely Olga Mack. She is royalty in the area of legal tech and she's got a truly august background in the law and in technology and innovation and she's been a great collaborator with law.mit.edu over the years now and was a member of the task force that came up with those seminal guidelines for the professional responsibility used by lawyers of generative AI. Looking forward she's now going to lead the way on our next big public initiative which involves a call for submissions. Who are we calling to? We're calling to you and so with that Olga thank you very much for agreeing to take the position and the leadership of this and once you please introduce yourself and tell everybody what we're doing and how they can contribute. Hello everyone and wow Daza what a fantastic event. I am still processing the future of law is tied to future software. Jesse thank you for that and I just love how you solicited a confirmation that Jesse is bound to outright on the spot because he was not able to say no. That was just a true art form. I will follow your lead. I love the future of law especially on the intersection of generative AI. It is truly an exciting place especially because for the first time in history lawyers are excited more like nervous-cited, nervous and excited about this technology. Previous technologies was just most nervous. This one actually includes excitement and it's clear why it's because it has so much opportunity provide better services improve our life as lawyers and really enjoy the practice of law. I think we can over time really bring fun back and go to practice of law because it's truly exciting. Also thank you Dazza and Professor Megan and Brian for bringing this group of diverse professionals together. I think it very much illustrates what we want the future of law to be. We want it to be yes full of excited lawyers and yes, full of other excited professionals because justice and law is something that we all as humans have a right to access and be part of and be served. Most importantly be served. That brings me to this really exciting place which I hope you all and folks you know and folks in your network join us in building and that is the place where we have premier destination for repository of information and resources. Well first of all sparks conversations postures innovation and really supports and eliminates past for everyone lawyers and other professionals to get up in the morning and be excited to contribute to the future of law. So with that Dazza may I recruit you in showing screen so that I can talk and you can show and the two of us can have a show and tell me what you think and I'll be back in a minute. All right, I'll see you please. First of all as promised so it has been delivered Jesse Han is on the project page and let's see if we can here we go this you will find all you who hear it at law.mit.edu Oh I'm sorry wrong one law.mit.edu And as I mentioned, we would like to become a premier destination to spark conversation, exchange ideas, foster innovation, and really be a supportive place. We've listed Daza, myself, and Professor Megan Ma and Brian have tried to give you some ideas of things we're looking for, and you can see we are looking for all kinds of things. And if you manage to come up with a category that we didn't come up with, guess what? We have a last category that says many more. So we encourage you to be really wide and broad in your submission and the kind of expertise you share with us. And the other thing I would like to point you to what kind of things we're looking for, and to echo what Brian said, yes, Britain works that our lawyers traditional submit are very much welcome. Please know that we ask them to be 25,000 words, because it's a lot of work to edit and publish and frankly encourage people to read more than 5000 words. So unless you really truly have more than 5000 words to share, there are a value in every word considered to stay within the word limit. But the most exciting thing here is that we want to encourage professionals who are not necessarily lawyers to use their tools of trade and submit their submissions in whatever form they're comfortable. And so to this end, we're also inviting folks to submit things like developer notebooks. We really want to make sure that technology is a part of this community. As you can tell from Leo's conversation and Allison conversation and numerous other conversations, law is increasingly becoming a destination where code is very much tied to law and law is very much tied to code. And so some proficiency and works in developer notebooks are more than welcome. But think broader than that. Yes, written works get developer notebooks, but consider videos. Consider generative art. Consider other media that we perhaps have not listed. Again, we want to be welcoming of all kinds of professionals because all of us as humans have very much care about the future of law and and it's something that should have access to everyone. So we there are two we the plan is to publish two editions, one in the spring summer and one in the fall winter. And you can see deadlines for submissions and for publishing that we would love for you to to keep in mind. So the the first spring summer edition will be published somewhere in September around September 17 of this year. And the deadline for submission is April 17. So the form, I think there's a form link that if you can show folks where it is, because it's a little subtle, is somewhat easy. It ends self-explanatory. There are fields to fill out and works to attach or point links to. And you might be thinking, how can you help? How can you be part of it? I have three things that I will ask you to do. One, submit on time after you read the instructions and follow them. To encourage folks that that you see every day in your life that share ideas or do things or work on things that are worth sharing, kind of like the things we had presentations today. Things that would encourage wider conversations in the industry and encourage folks to build the future of law so we can all benefit. And then I'll ask you to do a third thing. You now have a link to the submission page if you can share it on your social media and encourage folks in your network to apply and apply on time and become part of this conversation. That would be a fantastic way for us to build the future of law together. Was that in mind? I look forward to reviewing all the submissions you have in whatever media that you choose to submit. God help me to help software on my technology so I can read and open and be part of the conversation. That is a back to you. Thank you so much, Olga. Thank you for stepping up to to help make this possible so people have a surface area that they can't where we can, as you said, encourage everybody to share from your perspectives about the advent of this new technology and its impact on and sort of implications for the law. Transformational is a good word for it. And so this is something where it's a time to shed light and to encourage people to to get involved to share your work and so that we can all get educated now. So with that, I want to thank everybody for speaking. I want to thank all of you, especially those of you who stuck with it to the very end for your active participation will consider this the beginning. I know that we didn't have an opportunity to get to all of the questions and all of the comments. And this is our, as we customarily do with this workshop, kickoff for the themes and the topics that will be addressing at law.mit.edu through the year 2024. We hope that you'll stick with us. We hope that you'll continue to participate and to contribute. So until the next time, we look forward to seeing you at law.mit.edu.