OK, welcome, welcome everybody. Hi, I'm Dazza Greenwood from MIT Media Lab and also executive director of law.mit.edu, which is the convener of today's workshop, the eighth annual MIT Computational Law Workshop. I just want to start by saying, having done these things since actually the late 90s at MIT on this topic of law and technology, I honestly believe this is the best program yet. And that's owed largely to what it is we're talking about and the speakers that we have to help elucidate what has been a breakthrough with widely accessible generative artificial intelligence and its applications for law and its impact on law and legal processes. So put your seatbelts on. This one is going to be a doozy. So with that, let's get right into it, shall we? Hi, everyone. We are so thrilled to see the incredible turnout and excited that so many of you are deeply engaged in this topic. We'll hear thought-provoking discussions from Professors Dan Katz and Michael Bommarito, as well as the exciting work that Dr. Jesse Han and his team at Multitech are doing. It seems we are at the cusp of serious advancements in human-machine collaboration. If we are to consider, then, the roles and subsequent possible use cases for generative AI, how could we see machines as partners in our legal processes and practice? As we hear about the experiments and the ways in which generative AI is finding integration and accelerating our everyday workflows, how do we appropriately account for and mitigate the risks and harms of use? On the other hand, how could we be evolving our skill sets to not only enable more efficient practice, but also unlock more creative and critical capabilities? If we could move ahead a couple of slides. Yes, sorry, stop at the human. Next slide. I frequently think about the Human Diagnosis Project.
This is a worldwide effort created and led by the global medical community to build an open intelligence system that maps the steps to help any patient around the world, in effect, a crowdsourced consult. For those of you unfamiliar with this reference, a consult is the term typically used to describe conferring with multiple doctors at once for their opinion on whether or not this may be, indeed, the right diagnosis or treatment plan. The Human Diagnosis Project mirrors this process. But as opposed to a single consult, their tool enables multiple simultaneous consults in a matter of minutes, verified by knowledge sourced from medical experts at the world's leading institutions. Interestingly, the key driver behind the technology is fundamentally the deep collaboration between human and machine. That is, the success of the project is owed to contributions of human expertise that continuously refine the tool's competencies. Widespread positive testimonials from users have shared how the system improves their diagnostic reasoning, not only allowing them to produce differential diagnoses more rapidly, but improving their ability to think more critically and across highly disparate cases. I share this narrative by way of illustrating that we may find inspiration from the Human Diagnosis Project at the advent of generative AI. Perhaps many of you in the audience can agree that medicine and law do share a few similarities, in particular, that knowledge management plays a monumental role in the success of the practice. A direct correlation in this specific sense is the idea that we form legal diagnoses, whether it be the act of redlining in contractual review, argument development in case determination, or discovery. The notions of issue spotting, fact-finding, and risk analysis altogether contribute to a diagnosis. A key difference, of course, is the importance of language as a core element of the field. Next slide.
And so in the wake of GPT-4, I have been reflecting on what it means to have a conversation with machines. More importantly, what can we learn from human-to-human communication that can be applied to human-machine communication? Linguists have been reflecting on notions of communicated meaning through the lens of pragmatics. Pragmatics is largely regarded as the extra-linguistic considerations relevant to conversational appropriateness. What is meant may be inferred from what is said on the basis of principles such as cooperation, informativeness, and relevance. Next slide. And so the introduction of cognitive pragmatics, or a cognitive systems view, disrupted the broader field of pragmatics by considering the mental inputs and outputs of communication. Cognitive pragmatics is interested in the structure of dialogue derived from a shared knowledge of an action plan. Next slide. Bruno Bara, a renowned scholar in the field, describes how cognitive pragmatics manifests through conversation games. He defines a conversation game as a set of tasks that each participant must fulfill. In short, this translates to: Party A produces an utterance; Party B builds a representation of its meaning. The hope is that this representation is a reconstruction of Party A's communicative intent. As discussed, conversation games are intended to be communal, a simultaneous effort to build together. It is predicated on some form of a mutually shared premise. In an ideal game, the speaker can predict how the receiver will reconstruct the meaning of the utterance, and the receiver comprehends the speaker and is in fact capable of reconstructing its meaning. However, a key element of conversation games is that the receiver will always react and respond to the speaker, even if the receiver does not necessarily understand them. Accordingly, a conversation game will continuously reset until a congruent representation of meaning is achieved.
A conversation game can then be highly ineffective if no shared understanding ever exists or can be reached. In order to mitigate issues of interpretation, the idea is to create a collective belief and to use utterances that are illocutionary acts. Illocutionary act is a term put forth by J.L. Austin, a philosopher of language, to describe utterances that express what is done and to be done, so actions. Some examples include assertive, interrogative, and directive statements. In legal contexts, illocutionary acts are no stranger, as the notion of lawmaking frequently relies on the use of directives. Next slide. So why does this matter? There is a powerful analogy to be made between conversation games and how to speak with machines, that is, engaging with large language models for dialogue. Folks in the audience are likely already familiar, but one of the significant steps that led to the release of ChatGPT is owed to its predecessor, InstructGPT. InstructGPT applied reinforcement learning to fine-tune GPT-3 to better understand written instruction. Its ability to respond to user instruction and learn from human feedback enabled progress in the contextual richness of its outputs, though still far from perfect, a much closer alignment to human intention. Similar to the conversation game, the fine-tuning of GPT-3 to human instruction can be regarded as a parallel to the active use of illocutionary acts to mitigate misinterpretation in human conversation. Therefore, it is no coincidence or surprise that when speaking with machines, we have been perfecting the art of illocutionary acts, namely directives or instructions. Moreover, with the onset of increasingly powerful generative AI models came a rising interest in prompt engineering. This is seen in the development of publicly available prompts to test and experiment with various competencies of ChatGPT, such as browser plugins that help discover, share, and import prompts while using the tool.
One of the clear patterns that has emerged in prompt engineering is its representational nature and the use of embodiment. Numerous prompts begin with "act as" or "pretend to be." Other prompts come in the form of specific requests. In either scenario, we see behaviors that are highly performative and illocutionary. So returning to our initial ask of the workshop, what use cases do we see for generative AI? The patterns with which prompt engineering has emerged suggest that the legal tasks most amenable involve those related to execution: a first cut, a first draft, generating existing boilerplate. Next slide. Yet more complex reasoning tasks, such as issue spotting, enabling creative, multi-perspective construction of arguments, or collating and inferring meaning at scale, fall short if we are limited to the use of illocutionary acts. We will require additional fine-tuning beyond user instruction, extending to user negotiation, user critique, user perception. Complementarily, prompts must sufficiently account for why we communicate in addition to how we communicate. And so, how then could we fine-tune our models such that they reflect the forms of written legal communication embedded in the interactions of the field? In particular, those that can reveal the strategic uses of language. And with that, I'll turn the floor over to Dazza to discuss practical questions about legal prompt engineering. Next slide, please. Great, so just to maybe go backwards a little bit and ask the higher-level question: how could generative AI tools like ChatGPT be used in a legal context? That's really the underlying assumption that gives rise to all of the observations and questions that Megan was just raising. So just to lay it out, you could use this type of tool for a contract. Well, wait a second, what kind of contract? A first draft of a contract. And actually, is it the previous slide, Ryan? Could you find the standard warnings? I thought I'd put it up front.
It might be a few slides ahead. Let's just start with a warning. Yeah, that "standard" should probably be in parentheses, because it does not go without saying, and I think it does not go without emphasizing, that this class of technology is not perfect. In fact, it's deeply flawed in some ways. It provides inaccurate and false information, and there's a risk of relying on it too much and just using the first draft as the last draft, for example. It also could raise other higher-level legal and policy issues with misinformation. It also has prejudices and biases that were brought in through the training set. So beware of propagating those, which can be deeply embedded within the results. And bias is particularly interesting in the legal context, which I'll come back to in a moment with fiduciary duties: attorneys are one of those roles that owe fiduciary duties of loyalty to our clients. And that means putting the client's interests first. To the extent that the training data includes prioritization of corporate interests, or a consumer interest, or some particular government or cultural kind of interest, which can seep in as part of the bias, that may or may not be the same as the client's interests that we need to put first. So becoming aware of, and a savvy consumer of, these outputs as an input to us doing our job is critical. Okay, standard. Oh, and the last thing I would say on this is something that I'll put right in the chat here, because these are really words to live by. This is a quote from Sam Altman, who is the head of OpenAI, which provides ChatGPT: "ChatGPT is incredibly limited, but good enough at some things to create a misleading impression of greatness. It's a mistake to be relying on it for anything important right now. It's a preview of progress." So now, having said that, let's go back to the first slide of my segment, please. The reason these warnings are important is because this stuff is amazing.
Oh, like I was saying at the start of the workshop, we've just experienced a sea change, a major threshold moment, in terms of the capabilities that are now widely available and that have particularly good application for legal use cases. Can we get the next slide, please? What kind of applications? I mentioned contracts, only the first draft. Statutes: if you've done the pre-reading, you will have seen my back-and-forth with ChatGPT on fiduciary duties. I've written a few federal statutes in the US in my time, and it came up with a very good, I would say, first draft of a statute for the particular context that I provided it. A complaint in a judicial context, deposition questions, a brief, basically anything for a first draft. But it's not just drafts of documents, and lawyers are very frequently document-paradigm oriented; there are also processes, and I think the biggest wins might be with legal processes. So for example, legal triage is something that Suffolk University Law School has been doing, where individuals can speak in plain language and the AI can figure out what the relevant context is, can surface the legal issues, and then get people to the right person to help them. Consumer rights: we're going to hear from Joshua Browder at the end of the session, who is doing remarkable things with interactive, live, real-time usage of this technology, integrated into things like chatbots by companies, but with his tool representing the consumer's interests, getting into a bot-versus-bot context there. And so much more. Next slide, please.
One of the really interesting things here, that Megan and I have been working on a lot and you'll hear more about in 2023, is what you might call the latent knowledge or the capability overhang that happens when you take all of this text, all these corpora (more than one corpus) from all across humanity, and you vectorize basically the words and the phrases and the concepts, as in linear algebra. Interesting patterns emerge that were heretofore unknown, and that is a new source of knowledge, and it can be used very productively in lots of commercial and academic and governmental and other use cases. There are so many possibilities. We've got some great speakers, so I'm just going to skip across this for now; let's go to the next slide. There's a lot on that last slide, though, that we'll come back to later this year. So, legal engineering, meet prompt engineering. You know we love legal engineering at law.mit.edu, and you can look at our media page to find some deep dives into what we think legal engineering is and why we think it's so important. Prompt engineering is a phrase you may have heard; Megan just went into some of the details of it. We think that there's a subset of prompt engineering that is particularly useful in a legal context, and when I talk about a legal context, that really gets us back to a concept which also resonates in law from evidence, which is relevance. And so one of the critical things to get great results from a prompt in a legal context is to design the prompt so that it provides the relevant context. So by way of an example, and actually, do you mind if I screen share for a quick second, Brian? Yeah, just a moment. Let me get out of that. Yeah, there you go. You got it. So here's just an example.
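The vectorization idea described above can be sketched in a toy way: represent words as vectors and measure how close they are with cosine similarity. Everything here is illustrative, the tiny hand-set vectors stand in for real model embeddings, which are learned from text and have hundreds or thousands of dimensions.

```python
import math

# Toy 3-dimensional "embeddings" (hand-set for illustration only;
# real models learn these values from large corpora).
vectors = {
    "contract":  [0.9, 0.1, 0.2],
    "agreement": [0.85, 0.15, 0.25],
    "banana":    [0.1, 0.9, 0.3],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Related legal concepts end up close together; unrelated words do not.
sim_related = cosine_similarity(vectors["contract"], vectors["agreement"])
sim_unrelated = cosine_similarity(vectors["contract"], vectors["banana"])
print(sim_related > sim_unrelated)  # True
```

That geometric closeness is the mechanism behind the "interesting patterns" Dazza mentions: concepts that co-occur across the corpora end up near each other in the vector space, which is what lets downstream tools retrieve and relate them.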
For deposition questions, you could ask it, "give me deposition questions" (and I put a link to this in the chat), but if you tell it things like the purpose of the deposition, the specific case and the parties involved, and some of this other relevant context in the context of a deposition, it will give you much better ideas for questions that you could ask. Similarly, when I said a draft of a contract: you could say "give me a draft of a contract to buy a used car" and you'll get back something that's pretty good. But if you were to ask it, "I want a draft of a contract for a used car between individuals in the state of California," and include the make and model, the purchase price, whether there were any warranties, et cetera, et cetera, include this simple plain language in the prompt, then you will have composed the prompt, in a certain sense legally engineered it, to make sure that the relevant context is supported and reflected in the draft that you get, and that will make that draft all the more valuable. The reason I posted this on our workshop GitHub repo is because when I was writing this, and I was thinking about what to say to the workshop participants about what's relevant in these different contexts, one of the things I did, which is a new go-to for me since a month and a half ago, is I went to ChatGPT to ask it what context it would need in a prompt in order to get the best contract. These were actually answers that I got from ChatGPT describing the context for these different things, and I'll tell you what, it was better than my draft; these are all twice as long as the examples that I had provided, and they're all quite good. Anyway, let's go back to the slides here. Actually, do you mind taking back the helm, Brian? Yeah, I'm taking over. Great. Okay. Do you have that? Yep, looks good. Okay, go to the next slide. Oh, we already did the warnings. Okay, and so that, at a very high level, is a few things about prompt engineering.
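The used-car contract example above can be sketched as a small prompt-building helper. The function name and fields here are hypothetical, just one way to front-load the relevant context rather than sending a bare request.

```python
def build_contract_prompt(state, item, price, warranties):
    """Assemble a prompt that front-loads the relevant legal context.

    A bare request ("draft a contract to buy a used car") yields a
    generic draft; naming the jurisdiction, subject matter, terms,
    and style constraints steers the model toward a usable first cut.
    """
    return (
        f"Draft a contract for the sale of a {item} between two "
        f"individuals in the state of {state}. "
        f"The purchase price is {price}. "
        f"Warranties: {warranties}. "
        "Use simple, plain language."
    )

prompt = build_contract_prompt(
    state="California",
    item="used 2016 Honda Civic",  # make and model, per the example
    price="$9,500",
    warranties="sold as-is, no express or implied warranties",
)
print(prompt)
```

The point is not the template itself but the habit: every field the drafter would need (jurisdiction, parties, price, warranties) becomes an explicit slot, so the relevant context is guaranteed to reach the model.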
The last thing I'll say, actually, go to the previous slide, this one's kind of chilling. Yeah. The last thing I'll say is about prompt engineering. Mostly what I was just talking about was prompt construction, or just prompt grammar and syntax and semantics in a way, which is important. You could say that's kind of legal engineering; we craft words. The deeper engineering here, and we'll see this with Jesse Han and the other people who have been doing it, is to be able to integrate the prompts as part of a workflow that can be automated. So there are inputs at certain points; we get an output that becomes an input for another part of a process. You can actually engineer generative AI at certain points in a sequence of a workflow. That's an even deeper concept of prompt engineering. And then the deepest is something I've been calling prompt plumbing, which is at a much lower layer of the infrastructure. You can use approaches like LangChain, which does some really interesting things: it kind of takes summaries of big blocks of text, vectorizes them, and carries the context forward. You can do things with a much greater amount of information than with just an interface like ChatGPT, where you run out of tokens. We'll get much more into all of that later in the year. Okay, so I want to hand the baton to the next segment. Oh, okay. And now, as promised, Dan and Michael, can you come off mute, please? Hello everyone. Greetings from Chicagoland. Welcome. Mike, are you on your tractor right now in Michigan, or what's happening here? I think I've got to unmute him. Hold on. There we go. Can you hear me now? Welcome. And by way of introduction, we really look to Michael and Dan as pillars in this emerging space of computational law. Dan arguably kind of coined the phrase before we started focusing on it at MIT.
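The "prompt plumbing" pattern Dazza describes, summarizing chunks and carrying the context forward so a long document fits a limited token window, can be sketched as a small loop. The `summarize` stub below just truncates; in a real LangChain-style pipeline that step would be a model call. All names here are illustrative.

```python
def summarize(text, max_words=12):
    """Stand-in for a model call that would summarize a chunk.
    Here we simply truncate; a real pipeline would call an LLM."""
    words = text.split()
    return " ".join(words[:max_words])

def chain_over_chunks(chunks):
    """Carry a running summary forward so each step stays within a
    limited context window: one output feeds the next input."""
    running = ""
    for chunk in chunks:
        combined = (running + " " + chunk).strip()
        running = summarize(combined)
    return running

document_chunks = [
    "Section 1 defines the parties and the effective date of the lease.",
    "Section 2 sets rent at $2,000 per month, due on the first.",
    "Section 3 covers termination, requiring 60 days written notice.",
]
print(chain_over_chunks(document_chunks))
```

The design choice is the one Dazza names: instead of stuffing the whole document into one prompt and running out of tokens, each step sees only a bounded combination of the running summary and the next chunk.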
And Dan, I just want to recognize you again and thank you for being a member of the board of advisors of the MIT Computational Law Report. So with that, can you please show us how on earth you got GPT to pass, in part, the bar exam? Also, Des, can you add Michael as co-host so he can do video if he wants? If you really want, if you want to see me, you don't have to, obviously. Will do. I'm not on the tractor, if that's what you're hoping for, but. Folks are seeing this here. Yes, yeah. Okay. Well, again, greetings from Chicago, and Mike is joining us from Michigan, near the campus of Michigan State University. I guess maybe I'll just keep it here for a second and take you back a little bit first. We've been working in this area of large language models on the academic side for a while now, and more recently on the commercial side. Years ago, we had a company called LexPredict, and we did a bunch of things in that company, including things like litigation prediction and contract analytics. And we had several libraries; one was called LexNLP. And it was focused on what I think will now be called classic NLP in this area: the historic workflow in which people undertook NLP tasks, which is now increasingly being displaced by deep learning as the base method. And so, unfortunately, this is just the nature of things: the libraries that were built back in 2016 to 2018 have been eclipsed by other methods. And so I'd say you could still use some of what was done before, but I think it's kind of, unfortunately, fallen by the wayside. So last year, on the academic side, we worked with this pan-European group on something called LexGLUE. And this was an opportunity for us to really work heavily.
This was a benchmark analysis of several leading large language models on a wide range of tasks, including BERT, Longformer, BigBird, and so forth. And we got the paper into the ACL conference, which is probably the best conference, or one of the best conferences, on this topic of natural language processing. So in November, when we got out of our non-competes, once more unto the breach, we started another company. And it was in the context of doing that work, I mean, we're doing a bunch of stuff to build out this company, and we won't talk too much about that today, but that kind of set the conditions for us to be thinking about, okay, we're building a bunch of these core tools, and we've been telling folks, hey, there's been a material increase in the quality of these large language models, but we could not come up with a great way to show that to people. And then of course, November 30th, about seven weeks ago, ChatGPT enters the fold, and here we are. So I run this MOOC at Bucerius Law School in Germany, along with some other schools, including SMU in Singapore. And in the very last session, we did an introduction of Richard Susskind using ChatGPT. And we sort of said, okay, giving these intros is always difficult; you want to show the proper amount of fealty and what have you. And so we said, we're going to outsource this to ChatGPT. And I'll say it gave a pretty high-fidelity introduction and even thanked Richard for his presentation. So right before Christmas, I called Mike and I said, I think this is it. I think what we should do is try to do the bar exam.
There had been a few efforts, a couple of people had shown a few things online, but I said, you know, we need a rigorous, systematic treatment of this, not just plug stuff in and see what comes back out; can we go through this in a more systematic manner? And so we got done with Christmas, we put our heads down, and a few days later we had version one, and now we're on to the second version of the paper. I guess I'd just say this: language is the coin of the realm in law. Most roads in law lead to a document, and that document is expressed in natural language, from a historical perspective and for anytime soon going forward. We have had successive waves of legal technology, but most of the tools that have been built to date, including anything we've built and any other tool, and I'll stand by this, really have not had a very good account of legal language. There have been clever hacks to work on these problems in an indirect way, but never a frontal assault on the problem. And the problem is that there's a lot of semantic nuance in legal language, in general language too, by the way, but certainly in legal language. And now we've seen this material increase in the quality of the tools. And so this brought us around to asking, can we work on a problem that would help demonstrate to people the nature of the capabilities and the increase in those capabilities? And so we started on the bar exam. I'm going to pass it over to Mike, and I'll be minding the slides to talk us through. I just wanted to set us up with that. Over to you. So, yep. So we did what you would kind of hope we did, or at least I think it's what you'd hope we did. We went to the source of the exam, in any sense that there is an exam in a singular, correct sense, right? It's the NCBE's model exam. And there are different components to the exam.
Some of those are obviously better suited to something like GPT; for example, the MEE or the MPT are probably things that GPT could do. They might be things that GPT could even do at an adequate or passable level, but we chose the MBE portion in particular because there's not really any degree of subjectivity. It features complex syntax in the questions, questions that are, if we're being honest, purposely written to trick people, both with the length of the sentences, the complexity of the sentence structure, the nature of the presentation of the facts, extraneous adjectives, all this kind of stuff. And there's no question as to whether Dan and Mike graded it correctly, right? We don't have access to all these NCBE or state bar graders, and so were we to do the MEE or MPT, there would be questions about whether we had faithfully reproduced an assessment as the actual students sitting for the exam would experience it. None of those questions apply to the MBE. So is it only the MBE? Yes. But does that allow us to speak more objectively? Yes. So here's an example of what we got. This is from the NCBE's public documentation. We can't reproduce all of our questions because they assert copyright, but you can buy them for 200 bucks. And you can see, let's see, one, two, three: this one isn't actually so bad. These are, what, four different sentences here. Sometimes these questions are one to two sentences with that many words. And the question is a four-part multiple-choice question. I'll point out, just to be very pedantic here, the question is asking for a binary answer, but of course there are not two choices; there are actually four. So the prompt, if read literally, which is what GPT will do sometimes and some people do, is not really aligned with the question. And this is obviously just a part of dealing with natural language.
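The four-option MBE format described above can be sketched as a small formatting helper, one plausible way to present such a question to a completion model so the answer is machine-gradable. The function name and the question text are invented for illustration, not the NCBE's copyrighted material or the authors' actual prompts.

```python
def format_mbe_prompt(stem, choices):
    """Lay out a four-option multiple-choice question the way one
    might present it to a completion model, asking for a single
    letter so the answer is easy to grade programmatically."""
    letters = ["A", "B", "C", "D"]
    lines = [stem, ""]
    for letter, choice in zip(letters, choices):
        lines.append(f"({letter}) {choice}")
    lines.append("")
    lines.append("Answer with the single letter of the best choice.")
    return "\n".join(lines)

prompt = format_mbe_prompt(
    stem="A seller agreed in writing to sell a rare antique car. "
         "Before delivery, the seller refused to perform. "
         "Is the buyer entitled to specific performance?",
    choices=[
        "Yes, because the contract was in writing.",
        "Yes, because the car is unique.",
        "No, because damages are an adequate remedy.",
        "No, because the price term was indefinite.",
    ],
)
print(prompt)
```

Note the mismatch Mike flags: the stem reads as a yes/no question, but the model must still pick among four lettered options, so the closing instruction pins down the expected answer format.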
So while, if you want to be really pedantic, you'd say the questions are poorly written by the NCBE and trick even GPT, it's also just the way your client is going to speak to you. They're not going to be that precise. So deal with it. Next slide, Dan. So again, for the baseline, to talk about the students sitting for this: the rates at which students correctly answer questions are presented in the rightmost column in this table. And if you've ever procured legal services and you're not an attorney who sat for the bar, those numbers might not instill a lot of confidence, right? You don't want to know that your counsel forgets Rule 34(a) and gets you into a spoliation situation because they only got 59% on the bar, but that's the way it works. So these are the numbers, quote, "to beat," if you will, or at least they represent the efforts and abilities of people who spend a lot of time on this. Yeah, another key point here: ChatGPT is kind of the name du jour for what OpenAI offers. They offer, and have offered, a number of models. Some of the models are multi-modality models that do different things; some of them just do one thing. text-davinci-003 is the best model that we could get to answer the questions. There's also a Codex model that has larger token windows and is supposedly better on some tasks, but text-davinci-003 was the best and largest model that actually responded, which is technically different from ChatGPT as you experience it, but supposedly its foundation. So with that detail aside, we get to the meat of this. And I think it was great, Megan and Dazza, that you guys talked a little bit about very related concepts, right? The degree to which the prompt can impact the model's response is, in some sense, Megan, like you said, not much different than with humans.
In many circumstances, the way we frame problems, the way that we pose the outcomes, the way that we contextualize which shared body of knowledge, or whether there even is a shared body of knowledge, all those things have a huge impact on how we as humans carry on conversations. And we see that with these models. Now, we have, I don't know, let's say 70 years of somewhat rigorous psychology that can at least inform human-human interaction. We do not have anywhere near that much longitudinal research on how human-computer interaction with these LLMs works. So what we did is try seven things that you might ask a normal student to do from a heuristic perspective, helping them take a test, or that you might just write questions this way if you've ever written questions as a professor or whatever. So: what's the answer? What's the answer with a justification or explanation? Then some variations on that with rank-ordering the top two or three choices. In our follow-up work on the CPA exam, we did a little bit more with source elucidation and source constraints, which I think you touched on, Dazza. But for this paper, we just did these seven prompts. And when we did that, as Dan said, we wanted to do this in a very rigorous, scientific way, not just a copy-paste-a-couple-of-bar-questions kind of thing. So we tried just about every switch, flip, and dial that's exposed on the API to ensure that the results were robust, that this wasn't just some kind of locally optimal API parameter value where it magically worked. It basically performed within six or seven percent across every setting that we tried. And the only thing of note here, qualitatively, is the temperature: some of these parameters have to do with how random, or how reliable and deterministic, the answers from GPT are.
If you're doing anything where you really need to explain what you're doing, or cite that you did something at a certain time in a certain way, you should be careful about your temperature values, because the only way to deterministically record something, to the best of our abilities with GPT, is to set the temperature to zero. So we tried all these different things, and like I said, the short answer was it didn't really matter. And everybody asks, did you fine-tune it? The answer was yes, to the extent that we had a couple of hundred test questions, and no, it didn't help, and no, we don't know exactly why, although we have a lot of theories, and there's some other research about how fragile some of these models are. The question is best answered by just not using GPT, which is something we're working on. So as far as the results, I imagine a lot of you have probably seen them, because it's been kind of hard to avoid in the press lately, but I didn't believe it at first, the short answer, because of how hard the problem was and how prior research, even from the likes of Thomson Reuters with a lot of effort, had been, let's just say, not anywhere near close to these results. So the model does worse than the students, but not much worse in a handful of categories. And the model's top two responses are very often correct, relative to what it would have gotten if it had been randomly guessing, which suggests it's very close to doing even better than what it's doing right now. And, I mean, I don't know which section you hated most, for those of you who've taken the bar exam, or those of you who have practical experience with the law, which of these you think you actually still live today, but a lot of the questions in the exam are difficult.
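The parameter sweep Mike describes, trying every switch, flip, and dial on the API, can be sketched as a grid search. The parameter values below are illustrative, not the authors' actual grid, and `score_model` is a stub standing in for a real run of the exam against the completions endpoint.

```python
import itertools

# Illustrative parameter grid: temperature controls randomness
# (0 = deterministic), top_p is nucleus sampling, best_of resamples.
grid = {
    "temperature": [0.0, 0.5, 1.0],
    "top_p": [0.75, 1.0],
    "best_of": [1, 2],
}

def score_model(temperature, top_p, best_of):
    """Stand-in for running the exam with these settings and returning
    an accuracy; a real run would call the API and grade the answers."""
    return 0.50  # placeholder accuracy

results = []
for temperature, top_p, best_of in itertools.product(*grid.values()):
    acc = score_model(temperature, top_p, best_of)
    results.append(((temperature, top_p, best_of), acc))

# 3 * 2 * 2 = 12 configurations in this illustrative sweep
print(len(results))  # 12
```

The robustness claim in the talk amounts to the accuracies across this grid landing within a narrow band (six or seven percent), rather than one magic configuration carrying the result.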
Some are more fact-specific; some involve information that might be deemed to be outside the scope of the contextualization. In con law, for example, a bunch of the con law questions have to do with, let's say, foreign relations, or stuff that may have actually been harmed by the contextualization prompts. But it did better than anyone expected, ourselves included, I think it's safe to say. And Dan, if you want to go to the next slide: I think it's clear that something, sometime, I think we said in the paper zero to 18 months from when we published, will likely meet the threshold for the NCBE's estimated passage rate. When that will be, I don't know. I think I'm leaning towards the under now on that range and not the over, based on the acceleration that we're all seeing in the market. And I don't know whether you want to talk more about what that means for the bar exam, or what that means for attorneys who practice, or what that means for public policy, or what that means for clients, but any and all questions are obviously relevant and salient right now, and real questions to ask. Maybe I'll say one thing about this. This was not in the first version we put out, and we thought it would be very helpful, again, because we want to show people the progress, to go back and run the historical gamut of GPT models, to give people a view that even 2019's GPT-2, which people have used in papers to do things like draft patent applications, is not even able to process the question. So it's a 0%. And then, eight of one, go ahead, sorry, Mike. And we've been using some of the commercial stuff, the BLOOM models, all of these kinds of models out there; we've been testing them on a variety of tasks.
And the prior generation of models, models that could run on 48 gigs of VRAM before some of the latest 8-bit or compression techniques, these things were struggling even to respond to the prompt, right? You give it four multiple-choice options with a 500-token intro question, and it just wouldn't even work. So something has materially changed, even in the last six to 12 months, in terms of the state of the art. I feel it too. That's partly why we've dedicated so much of this workshop to this, and why we're going to be focusing on it through the year. Something big is happening right now. Something has changed. There's been a major breakthrough. So glad you're both on it. I'm sorry to interrupt, but I just want to emphasize that point: hey, everybody, listen up, this is different than it was even just 12 months ago, nine months ago even. Yeah, and I think that this chart is pretty much the proof. And I'll just show you one other example that tells the same story. You see it in the bottom corner; we should have made a larger version of the graphic. That's the CPA exam. Now, it gets clobbered on the math part of the CPA. You can read the paper, but it's the same basic story: you see this material jump between GPT-3.0 and GPT-3.5. Bottom left corner, as you see it. So, okay, back over to you, Mike, for anything else? Yeah, and I think the biggest point, if you think about what the bar exam really tests, Dan, as you said earlier, is that it's mostly a test of syntax. There's some test of legal theory, and some practical material in the MBE at least, the kind of thing that you see in law school and that the state bars care about. But honestly, I think many practicing attorneys, especially as they lean corporate, care more about the things that are tested in the CPA exam from a concept perspective than, let's say, whichever question California decides to throw onto the exam this year.
So the CPA exam is an interesting semantic or conceptual counterpoint to the syntactic performance on the bar exam. And to me, viewed in complement to each other, they show this isn't just a quantum leap in syntax capability. This is also a semantic, conceptual awareness that was previously either not present or not able to be exposed. So there we are. I think I saw a couple of questions come in. Yeah, we've got a few. I can help surface them for your convenience; do you want to pick and choose? By all means. So one question that's kind of seminal, it's high-level, and then we'll get into the nitty-gritty: what does this mean for the future of the bar exam? If generative AI, let's say in the next revolution or evolution, overall passes the bar, does that mean that our bar exam or CPA test should evolve, and how? And can I just offer one provocation to that, which I've been thinking about a lot lately as I've been trying to grapple with what this means, how we adapt to it, make the most of it, and not get thrown under the bus as well? When motor vehicles came along, that was a big change, right? People could go a lot faster and a lot further, but that didn't mean we changed the rules of the Olympics for running or for the marathon. So we have things that humans do, and we have capabilities that machines provide that allow us to extend our reach and our power and our vision in certain ways. But it strikes me that the most important thing here is to look at the technology and not necessarily judge it solely against human intelligence; let's take a look at it for what it is. Now, having said that, let me ask you guys: what is it? What does it mean for us, and for the bar exam? Dan, do you want me to answer, because I have less to lose with my faculty? Well, yeah, go ahead. I mean, I'm not any big defender of the bar exam, so go ahead.
I think the question is why it exists, right? And there's a degree to which it exists in the absence of a regimented system with transparency, and you could talk about things like the economics of lemon laws and information asymmetry, or you could just acknowledge that there might be long-standing gatekeeping dynamics that are a part of this, and that the NCBE itself has adjusted the difficulty of the exam solely to reduce passage rates, which doesn't strike me as necessarily relevant to the qualification of practitioners if they're just changing it to make it harder so that fewer people pass every year. I don't carry a bar card, so I can kind of say what I want on this front. But I think the question is, again, and we said this in the intro because it's where I truly am on this: there is legal demand, a kind of uncontroversial quantity of legal demand in the market. Lots of people have tried to measure it. The access-to-justice gap exists because we don't have either supply or access, or whatever kind of lens you want to take; people aren't getting the legal services that they need, for one or more reasons. To me, as long as we have this unmet volume, which is not an insubstantial volume of demand for legal services, especially among people who, if we're being very blunt, are probably not getting access to the best attorneys anyway, then we truly do have an ethical responsibility, regardless of what the hell the state bars tell you, a true ethical responsibility in an absolute sense, to try to figure out how to use these tools to help people. Does that mean give them ChatGPT and say, do exactly what it says and I'll bill you for it? It absolutely does not mean that, right? But so long as we have so many people who can't afford or access services, the ethical obligation is to figure out how to solve that, and I don't see anything else that can scale and get anywhere near as close as what we just presented. Is it ready?
No. But is there any other system capable of scaling to the total volume of questions that people ask of our legal systems? If there is, I'm happy to see it. And speaking of the total number of questions, we have another question here about what that corpus should be, and the question is: how much of GPT davinci's performance relates to the lack of specific texts in the training set in the legal context, judicial opinions and so forth? And what do you think about the next version of GPT, in terms of maybe honing it, or doing post-training fine-tuning just on this; or if we just make it big enough, will it be able to surpass these barriers? Yeah, one of the things that nobody knows, because it's kind of a closed model despite the name of the company, is the provenance of the model. There have been a number of publications that are peer-reviewed, although peer review is limited in situations like this, and they say that they're training on a set of data. The best open analog we have is called the Pile, used in a number of models like the ones from the EleutherAI community and BLOOM. And the Pile includes a large volume of material, including material from the Free Law Project and Nolo. In our CPA paper, for example, we explicitly ask the model to include a source, or an authority, or a reference to the authority for its answer. And frequently it will show you a URL for something like Nolo or LII or a similar source. So I'm kind of going back and forth on this as we've collected a little bit more information. I do believe that GPT and many of these models have in them most of the public law, if you will. Do they have every complaint in PACER? I don't think so. But do they have most of the public law that you would think would be required to answer these questions? I think the answer is yes.
So then it comes down to the architecture of the model: what data was used in reinforcement, and what pre-processing or post-processing or other models are in the pipeline that we experience as this singular model. And for GPT, I don't know how to tell you the answer. I do know the other models out there seem to know about source material. Source hallucination is an issue at large, and there are techniques to handle it. And I know you referenced LangChain too, which is a great way to control some of this stuff. But it's an open question. I guess "it depends" is the most appropriate answer for this community. Well, one thing that should be said, though, is that we picked this test because it is not really available on the internet. That's important, because otherwise you're sort of feeding it things it's already seen. Obviously there's a concern that if we do it again, maybe it will have been gobbled up by then; this is always an issue. The answers for this exam were never sent to GPT; we only took the answers back. We're trying to keep it clean. But you do worry with some of the others: there are bar materials out on the web, and they've probably been gobbled up in this kind of vacuum cleaner that they used to put together the Pile, or Common Crawl, or whatever. So that was what we were trying to do. We can't be absolutely certain, because nobody knows for certain, but this is not generally available on the web. That's what we can say. We're going to need to start to segue. I know you're incredibly busy, but I encourage you to stick around for a few more minutes if you can, Michael and Dan, because I want you to see what Jesse's come up with in his startup, by way of a new modality that lawyers and others can use for prompting.
And the last little bit of color: as Megan and I are in the midst of a research project trying to probe what these models can do vis-à-vis fiduciary duties, one of the things I'm starting to work on with Gabe Tenenbaum and with Jonathan Askin and others is to get faculty and experts in fiduciary duties to help us come up with completely new fact patterns and cases that have not only not been published, but have never been thought of before, so we can really, finally, have confidence that there hasn't been leakage into the training data. It's an extraordinary measure, but at some point we have to put our foot down and be absolutely sure that we're at least getting performance on things that are novel. We did that with the CPA exam, so maybe we can share that more; we created de novo questions from the curriculum. Okay, so now, with that: Jesse, are you with us, and can you come off video mute? I am. Thank you for handing over the stage and for setting this up. I'm very excited to share with you all what I've been building; I think you'll find it very helpful for some of the legal use cases that you've been considering. If you'll allow me to share my screen, I'll jump right into a demo. Absolutely, go ahead and hit screen share and let us know if you have a problem. Great. And Jesse, by way of introduction, am I correct in saying that you previously worked at OpenAI and were involved with ChatGPT? Yes. So I'm the co-founder and CEO of Multi, which is the startup responsible for the tool I'm about to demo for you. It was started back in May. Prior to that, I was working at OpenAI, where I worked on large language model infrastructure and did some early work on ChatGPT, dialogue systems, and grounded question answering.
I also achieved a new state of the art in machine translation using large language models, and was a major contributor to OpenAI's theorem-proving release, where they applied large language models to mathematical theorem proving inside of proof assistants. So I wanted to echo a sentiment that I heard in the last presentation, which is that we need systems that are going to be able to deal with the unprecedented amount of scale that these large language models are going to enable. And that's not scale in terms of parameters, but scale in terms of the volume of text processing that has to be done. I think that same consideration applies equally well to all knowledge work in general. The kind of intelligence that we see in large language models right now is an alien sort of intelligence, which is a pretty good approximation of a somewhat unreliable teenager. And as these large language models become more and more commonplace, we will begin to see large amounts of knowledge work, not just inside the legal field but elsewhere, work that involves processing large amounts of text and operating software tools, begin to be more and more automated. And so the problem becomes: how do you orchestrate that kind of knowledge work automation at that kind of scale? That is the problem Multi is trying to solve. We are building a software platform which anticipates this future, where there's a massive abundance of near-human-level intelligence and automation. And our first product, MultiFlow, which we released back in November, provides a visual, intuitive, and low-code interface for people to assemble AI-first workflows. Let me give you an idea of what I mean. I'll give you a bit more background on myself, and this will showcase some of the features we brought online in MultiFlow recently that might be especially relevant for people working in the legal field.
So we recently added a way for our users to upload PDFs. This contains an uploaded version of an out-of-date resume of mine. It's a pretty long document, and with resumes it's kind of hit or miss whether or not they'll actually fit inside the context window of a large language model. But we've gone to the trouble of doing the kind of plumbing that Dazza mentioned during his talk, and we can actually handle question answering over documents of arbitrary length. So we recently added a document Q&A block, which takes in a document and a query and provides a bullet-pointed list of answers extracted from that document. So we can ask: what schools did I go to? It kind of says here, but let's pretend it's further down in the document. We can also ask something like: what was Jesse's SAT score? And we can also ask: what are Jesse's notable publications? So what's happening here is that I am building a program in a visual programming language, for which MultiFlow provides both a user-friendly front end and a runtime that keeps track of all the state. We can step through the execution of this program by clicking Run. This is all powered by large language models underneath, and the amount of abstraction inserted between the user and the actual large language model API calls is entirely up to the user. I'm operating at a fairly high level here, but later on we'll see an example where we get a bit closer to the metal in terms of how much we're micromanaging those large language model calls. So let's take a look here. It correctly extracted the fact that I went to UCLA: I got a bachelor's in mathematics in 2015, and a master's in math in 2018. This resume is not up to date; I actually got my PhD a month ago.
So it also correctly sees that my SAT score is 2280, and it correctly extracts the fact that some of my major work has been in formal theorem proving and the applications of machine learning to automated theorem proving in mathematics. I'll pause here for any questions. Thank you. I actually used MultiFlow to write the PhD. Ah, that's awesome. So, if we wanted to, say, generate some copy: here's another example where a visual front end for prompt chaining, for being able to recursively assemble prompts, feed them into language models, and use them to produce a complex result, might be useful. Suppose that you need to write a short blog post about some arbitrary topic. In this case, I've asked it to create a blog post about the fact that clowns across the world have decided to go on strike. What we've done here is combine few-shot prompting, where we show the language model a bunch of examples, with a kind of structure that says: now we're going to few-shot prompt an introduction, we're going to few-shot prompt the body paragraphs, we're going to few-shot prompt a conclusion. And we use the topics extracted from the language model call as prompts for producing the actual body paragraphs that will be inserted into the blog post. And you can see that once this entire thing is run, in fact, we can just re-execute the entire thing. We see that it regenerates this blog post, and it also creates a bunch of images. These images can be prompted so that they come out in different styles, and this is something that's possible through the string formatting we have in our text boxes. As you can see here, the way we manipulate text inside of this interface is with template variables, which are denoted by these double brackets. When you surround a piece of text with these double brackets, it becomes a variable, which is then exposed as a port, and you can pipe input into it.
And so all we're doing here is concatenating the copy about clowns with some styling prompts, and then this becomes the input to a call to a Stable Diffusion model. So now let's jump over to a use case that might be particularly interesting for people working in the legal field. Here's an example of a flow which performs dialectic reasoning on some topic. Suppose that you wanted to analyze the pros and cons of some controversial topic and then synthesize them, and you wanted a highly interpretable trace, an audit trail, of what the language model was thinking throughout that entire time. What you can do in this case is just ask it to create bullet-pointed lists of pros and cons, and then ask it to argue against itself. I'm going to try something different here: building more housing in San Francisco. My favorite topic. So you can see what's happening here: we've created the instruction to create a bullet-pointed list of pros, or potential benefits, of the following topic, and this is input to a language model API call; this is our text generation block. There are some settings here where you can control various parameters of the OpenAI API call. And you can see that it generates a list of pros. We can then ask it to expand each of these short bullet points into a full-fledged paragraph, right? You can see that this is piped into this text generation block, and it provides a paragraph which expands on each of these points. Finally, you can ask it to produce a point-by-point rebuttal for each of the claims made inside of this essay. You feed that into another language model API call, and that creates a rebuttal.
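The double-bracket template variables described above can be approximated in a few lines of plain Python. This is an illustrative sketch, not Multi's actual code; `render` is a name chosen here for illustration.

```python
import re

def render(template, **variables):
    """Fill {{name}} placeholders in a template string, mimicking the
    double-bracket template variables that MultiFlow exposes as ports."""
    def substitute(match):
        name = match.group(1).strip()
        if name not in variables:
            raise KeyError(f"unbound template variable: {name}")
        return str(variables[name])
    # Non-greedy match so adjacent placeholders don't merge.
    return re.sub(r"\{\{(.*?)\}\}", substitute, template)

prompt = render(
    "Create a bullet-pointed list of pros of the following topic: {{topic}}",
    topic="building more housing in San Francisco",
)
print(prompt)
```

Each placeholder becomes, in effect, an input port: whatever upstream block produced the value gets piped in before the prompt is sent to the model.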
So this kind of structured chain reasoning is especially important for high-touch or particularly high-stakes applications that involve lots of language, like legal, because it provides a clear audit trail of the model's reasoning: what the ingredients were that went into each of its choices. And the interface we have here also gives you the ability to quickly rewire the way the model is prompted in order to achieve the outcome that you want. What's happening at the bottom of this flow is that we've just gone and reversed the chirality of it: we've asked it to come up with a list of cons instead, to expand that into an essay, and then to produce a rebuttal. And then from these, one thing we could do is ask the model to take these two things and synthesize them. This is actually something we can do right now. Okay, so let's pipe in the text above, and let's format it so that the model knows that these are two different pieces of text. So this is rebuttal one, and this is rebuttal two, and let's separate them with some kind of demarcation. This kind of prompting works with the instruct models on the OpenAI API: "Above you have been given a rebuttal of the pros and benefits of a topic, and a rebuttal of the cons and drawbacks of that topic. Synthesize these into an essay with multiple paragraphs." And then let's do a little bit of prompt engineering and tell it to write an eloquent, well-articulated essay. Okay, so with this in place, we see that, due to the formatting I mentioned earlier, there are three inputs: we need to pipe in the first rebuttal, then the second rebuttal, and then the topic, which is all the way back here.
So, finally, we can run this through the text generator to get a final output, and now let's rerun the whole thing again. This gives you an example of the kinds of rapidly and increasingly sophisticated use cases that you can achieve with complex prompting and a tool like this. What our front end gives you is the ability to inspect the intermediate outputs, to rapidly change the prompting style on the fly, and also to deploy these to an API. So if you're a developer who wants to integrate this kind of technology into your own application: what we are actually constructing here is a function in a visual DSL for programming large language models. That's why you see these input and output text boxes. What we have here is a function that takes in a single input, a topic, and produces an output, this well-argued, well-articulated essay that considers very carefully all the pros and cons of that topic. And this can be deployed to an API that you can call from inside your own application, or to a web app that you can just create and share, which hides all of the intermediate outputs. Outstanding. And so I hope you can all see why I wanted Jesse to share this. A lot of us have, at best, a concept of a very flat interface. The great thing about ChatGPT is that it provided very wide, almost population-scale, immediate access to the technology, but it's hardly the ceiling of how this technology can be configured and composed and integrated into other systems. So thank you so much for showing that, Jesse. I forgot to mention that I'm bending our rule a little bit here on product demos. Obviously, MIT doesn't endorse this product; this is for educational purposes, to see what's possible. And let me just ask, we're going to have to move to the next session pretty soon, but Dan or Michael, if you have any reactions, questions, or comments, I invite you to jump right in. Go ahead.
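For readers who want the shape of the dialectic flow Jesse demoed without the visual editor, here is a minimal sketch of the same chain as plain code. `call_llm` is a hypothetical stand-in for a real text-generation API call, so the outputs are placeholders; the point is the structure (pros and cons, expansion, rebuttal, synthesis) and the fact that every intermediate step is inspectable.

```python
def call_llm(prompt):
    """Hypothetical stand-in for a text-generation API call
    (e.g. a completion endpoint); replace with a real client."""
    return f"<model output for: {prompt[:40]}...>"

def dialectic(topic):
    """Pros/cons -> expansion -> rebuttal -> synthesis, mirroring the
    chain of text-generation blocks in the MultiFlow demo. Each
    intermediate string is the audit trail of the model's reasoning."""
    pros = call_llm(f"Create a bullet-pointed list of pros of: {topic}")
    cons = call_llm(f"Create a bullet-pointed list of cons of: {topic}")
    pro_essay = call_llm(f"Expand each point into a full paragraph:\n{pros}")
    con_essay = call_llm(f"Expand each point into a full paragraph:\n{cons}")
    rebut_pros = call_llm(f"Produce a point-by-point rebuttal of:\n{pro_essay}")
    rebut_cons = call_llm(f"Produce a point-by-point rebuttal of:\n{con_essay}")
    return call_llm(
        "Rebuttal one:\n" + rebut_pros
        + "\n---\nRebuttal two:\n" + rebut_cons
        + f"\n---\nSynthesize these into a well-articulated essay on: {topic}"
    )

print(dialectic("building more housing in San Francisco"))
```

A visual tool like MultiFlow adds the runtime, the ports, and the ability to rewire the chain on the fly, but the underlying program is this kind of linear pipeline of prompts.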
I would just say: when we talk about why you want to use an LLM and what the results we showed mean, this is what we're envisioning as the actual use case, right? Not a human directly asking GPT for answers, but something like this. Let's say Illinois Legal Aid Online, which Dan works with in his capacity at Chicago-Kent, instead of building a rules-based triage system, which is what ILAO does today to answer questions about, let's say, landlord-tenant disputes for tenants in the state of Illinois, would replace that rules-based system, or parts of that rules-based triage system, with something like this, for people who are trying to deal with their legal problems. It would live inside of a larger ecosystem or, if we're being honest, a real product; this is a component of a product, not a product itself. Yeah, absolutely. I think that's the future we're heading towards; eventually systems like this will certainly become more and more widespread, and we're going to be living in a world that is orchestrated by language model programs like this. No, I just want to say this is great. Congratulations on this. This, I think, is a great follow-up to what we had shown, because it shows you where you can take all of this. I particularly like this idea from an education perspective: when I teach my students about legal composition, we think about the relative weights or merits of arguments in law, and this is a way to enable them to see that kind of, I don't know, battle royale between the arguments, or what have you. I can imagine using it just in teaching legal writing. You asked this question earlier about the future of the bar. How about this: we keep a measure of people's performance, but you can use the best tools available to solve some theoretical client problem, and that's your demonstration.
And so if this is the tool you have, then you get to use it, and you get to use any tool. That's the world I'd like to head to: you use the best-in-class tool to solve people's problems. Anyway, I don't want to turn this into a revival, but it'll happen very quickly otherwise. Thank you. Yeah, and I didn't get a chance to show you, but the document Q&A feature which I started the presentation with is actually very useful for analyzing contracts. It nails most questions, like, for a purchase agreement, what are the obligations of the buyer and the seller, and so on. So there are lots of use cases. I'm going to drop a link to the website I was using in the chat, and I encourage you to sign up for the waitlist and come check it out. Thank you so much, all of you, for giving everybody an introduction to what this technology is and what's possible. And not only are tools evolving, but so is the technology itself: just put your seatbelt on for Claude from Anthropic and for GPT-4. So this truly is the beginning. Okay, now, Joshua: welcome, everybody, to this special fireside chat session with Joshua Browder of DoNotPay. We're so glad that Joshua is able to join us today, and this is actually his second appearance at law.mit.edu's Computational Law Report. If you look on our media page, you can see a very interesting, kind of stage-setting podcast that Brian and I did with him, oh gosh, about a year ago now, or maybe a little more. And so much has changed since then, thanks to the ready availability of generative AI. I can't think of anybody who has done more creative and provocative work with this technology in the legal context than Joshua Browder. I want to thank you again for joining us, and ask if you'd be willing to give a brief introduction of yourself and DoNotPay. Well, I'm sorry, one other standard disclaimer: of course, MIT does not endorse DoNotPay as a company, or any of their products or services.
This is educational, and I do think it is very informative to see what is possible, especially in the consumer context. So with that, Joshua, maybe we could unshare the screen so we can see you, and I'd like to invite you to introduce yourself and your company, and maybe let us know: what have you been doing with generative AI and GPT, through your company, for consumers? Well, thank you so much for having me. It's a shame to hear that MIT doesn't endorse DoNotPay, but I understand. So at a high level, DoNotPay is automated consumer rights. We like to call ourselves the world's first robot lawyer. We've been operating since 2015, and we've had a huge amount of success with templates: rules-based systems where, if this happens, we send an angry letter to the government or a corporation to get a refund, or to get someone out of their parking ticket. And that has taken us very far; we've won over 2 million cases just with letters. But what's really exciting is that in the past year, the AI models available from companies like OpenAI, and GPT-J, the open-source analog of GPT-3, have really, in my opinion, improved by 10x. And because of that, it's allowed us to actually go back and forth with these companies and governments over disputes. So we've done things like automatically negotiate live with Comcast live chat, where our bot talks to Comcast, perhaps they're using an AI that talks back, and our AI legal assistant negotiates the bill down. We've had a bot phone up a bank and use a synthetic voice to negotiate a wire fee refund. And next month we're taking it to the next level: we're actually introducing the robot lawyer into a physical courtroom, where the bot will be whispering in someone's ear what to say in a speeding ticket case.
So we're really trying to push the boundaries of bringing this technology to ordinary people, because typically, when there's a powerful technology like AI, it gets into the hands of the big corporations and the government first. We want to give power to the people and actually give consumers access to this so that they can fight for their rights. That's the value of the work you are doing: too frequently, when it comes to powerful new technologies, the little guy, individuals and consumers, are the ones who are almost subject to it, and sometimes we don't come out as well in the overall deal. So it's great to see you applying this creatively and effectively on behalf of consumers. I want to ask you to go back one half step. You made a quick reference to wire fees that I believe you had refunded from a bank, and you said it very quickly, but I was hoping you could go into a little more depth about what I thought was a very intriguing integration of voice-to-text, generative AI, text-to-voice, and then how you went from a phone tree to talking to a live person. Just tell the story of what you did there and what's possible. Yeah, so all of these AI language tools are useless if you can't actually communicate their outputs to where they need to go. That's what we specialize in at DoNotPay: we have bots that go onto these websites, do all the clicking, and submit this text to get a response. And so we decided to take it a step further. There's an amazing API, from a company called Resemble AI, that allows you to clone your voice: you can record five minutes of yourself talking to the AI, and then the AI will replicate you and your voice. So then we used a Twilio bot to phone up Wells Fargo, and it was my voice, in robotic form, talking to them, and the conversation was powered by GPT-3.
The voice was powered by a different AI, from Resemble, and then we actually had other AIs as guardrails, because there are huge limitations with this technology, which I can go into. The biggest limitation is that the AI talks too much. So if a representative is saying, like, hang on, let me look at it, the AI is inclined to give a three-sentence response. So we actually have another AI which decides whether to say something at all, because it was talking too much, and this is going to be a problem for our upcoming courtroom case, so this is a good test for that. And then the final thing I would say is that the AI exaggerates and lies a lot. With our Comcast dispute, when we sent the AI to get a discount, it said I'd had five outages in the past 24 hours, or something like that. And that might be a good strategy, but from a liability perspective it's not very good for DoNotPay. So we've had to prompt, it's all about the prompt, what you're prompting these models with. We've prompted it to stick to the facts and not exaggerate, and we've managed to clean it up using that as well. I watched the video and was amazed at your demo: the result of this rather experimental, I would almost call it, chain of technologies was that you got your wire fees refunded from the bank. It was a triumph. One little question: I noticed, when I tried to find the video, that it looks like Twitter has taken it down as a violation. What's that all about? Twitter has a policy saying you can't have deepfake voices or videos, so they flagged it and took it down. But is a voice a fake when you are the person choosing to use it as a proxy for yourself? Is that fake, or is that an extension of your real identity? So, we have a compliance team at DoNotPay of real lawyers, and they are always very upset with me, because I'm always pushing the boundaries. And so that was a good point.
We decided that, to reduce our exposure in this experiment, it would be better if I was the one who did it. So that's why we did it that way: at least there would be a core argument that I was just calling them myself and using an assistive technology.

Yeah. At the Media Lab, when we think about artificial intelligence, we focus on what we call cognitive extension, looking at ways that it can not merely replace people but actually expand and extend our capabilities. I would say that what you did there is right in the center of at least one of the things I have in mind about how this type of technology can help people manage this myriad of relationships we have as consumers with all of these companies and government agencies and other organizations, all of whom are using AI. What about us? Can we use it too?

Yeah, so the courtroom stuff is an experiment. It's borderline illegal, so we're not making a product out of that, but we have several really exciting AI products coming out. One, released yesterday, summarizes terms and conditions. We have more advanced ones coming out where you can upload a medical bill and the AI will invoke the No Surprises Act and dispute the bill. And I think it's really good for people for two reasons. The first is that a lot of people can't afford to get access to their rights. These big companies have a business model of concentrated benefit but spread-out harm. I think we discussed this in the last call: Comcast can charge a million people $10, and they make $10 million. It's great for Comcast, but the people being charged $10 or $12, like in my wire fee case, don't have time to call up Comcast and waste their time over $12. That's a great job for software. And finally, I think it can really make access to justice affordable, especially with more expensive cases like medical bills.
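To make the guardrail and prompting mitigations described earlier more concrete, here is a minimal Python sketch: a simple check stands in for the second AI that decides whether the bot should speak at all, and the prompt builder embeds the stick-to-the-facts constraints. Every name here is an illustrative stand-in (a keyword heuristic in place of a real classifier model), not the actual Do Not Pay, Resemble, or Twilio code.

```python
def guardrail_should_speak(rep_utterance: str) -> bool:
    """Stay silent when the representative signals they are still working,
    so the bot does not talk over them with a three-sentence reply."""
    hold_phrases = ("hang on", "one moment", "let me look", "please hold")
    return not any(p in rep_utterance.lower() for p in hold_phrases)

def constrained_prompt(goal: str, transcript: list[str]) -> str:
    """Build a prompt embedding the mitigations mentioned above:
    stick to the facts, do not exaggerate, keep replies short."""
    history = "\n".join(transcript)
    return (
        f"You are negotiating on a phone call. Goal: {goal}\n"
        "Rules: stick to the facts, do not exaggerate, "
        "and reply in one short sentence.\n"
        f"Conversation so far:\n{history}\nYour reply:"
    )
```

For example, `guardrail_should_speak("Hang on, let me look at it")` returns `False`, so the bot would wait rather than respond; in a real system the heuristic would itself be a second language model, as described above.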
There's a distinction between the AI as its own entity, let's say an OpenAI or an Anthropic entity, versus the AI being used as a proxy or an agent of a person, and therefore having the affordances of the rights and obligations and roles of that person or consumer. But instead of a consumer, let's talk about lawyers. It's law.mit.edu; we love technology and law, and part of that is the practice of law. How do you think this is going to play out in a litigation context when lawyers are using the technology the way you have in mind for this next activity, basically having it provide information for them? Would this be deemed an assistant of the attorney operating under their license, or would it potentially, as I'm seeing in the chat, fall under that chilling phrase, unauthorized practice of law?

Just to give some more context: in December, over Christmas, I tweeted out an offer on Twitter, and it said, does anyone want to be in the first-ever AI court case? We'll pay even if you lose, and we're actually going to throw in some additional compensation for the risk of contempt of court and other things. The tweet was seen by millions of people, and I had 300 different offers from people to participate. So our team looked through all of these cases, and we were really looking at three things. The first is wiretapping laws. For the AI to even process what the judge or someone else is saying, you're going to have to record it and broadcast it. Some states are one-party consent states, where just one person can consent to the recording, but other states require everyone who is being recorded to give their permission. So that was the first thing that ruled out a lot of cases. The second thing was, as you said, unauthorized practice of law. Some state statutes, like California's, are very broad: entities, corporations, anyone can commit unauthorized practice of law. So it's a gray area in places like those.
But the way these statutes are written, no one could ever have imagined that there would be AI robot lawyers. In some states, the statutes are very specific to a human being pretending to be a lawyer. They were written in the days when mechanics were pretending to be lawyers, back in the olden days. So they don't really have the concept of Do Not Pay in mind, and there are some places where it's completely legal and we're not too worried. And then finally, there are local courtroom rules. Some courtrooms, like the Supreme Court, ban electronics; in other courtrooms you're allowed to have electronics, and so on.

Indeed. It's funny you mention mechanics, of all things, because we're really fascinated by the potential of what we call legal engineering, basically the mechanics of law. We think that in the information age, mechanics are going to be a really good skill to have in the digital economy. So it's a particularly poignant example. So how do you imagine this playing out? We're obviously in a time of early experimentation, and I think it's safe to say you're a leader when it comes to creative new use cases. Can you help me look over the horizon a little bit? After the first wave or two of evolution and adaptation of this technology for, let's say, legal practice in the courtroom, that's such an interesting, dramatic scenario, how do you think this technology would be integrated as a matter of course? Taught in law schools, with rules of procedure in courts that recognize it as having a place? Will it be sort of like a laptop, along with everything else on people's desks? Will it be speaking on our behalf in real time? How might it play out in practice?

I think people should have a right to have AI advise them in courtroom hearings if they're a pro se litigant.
As of right now, no state allows that, but our goal with this case, especially if we win, is to make the point that this is an access-to-justice issue, and that can open it up. I think it's also an accessibility issue. A lot of people struggle to read all of the laws and understand all of the text, and AI can help them overcome that on an accessibility front. So maybe there could be some ADA litigation around allowing AI in courtrooms, which I would be excited about. I think the problem is that the people creating the rules, the bar associations, unfortunately have an incentive to keep prices high. That's the pessimistic argument. The optimistic argument is that there's not a single lawyer who's going to get out of bed over a $500 small claims court case. This is really an underserved need. So perhaps it's not even about replacing lawyers; it's all about expanding access. I think there will be some lawyers who should be very worried, like the ones you see on billboards. Saul from the show Better Call Saul should be worried, but others don't really have to be. The people who create the regulations should be forward-thinking.

Hear, hear. In particular, it raises the question: when other sides are using the power of these tools, and you are artificially restricted from using them, are you in effect being handicapped? Maybe there are some new interpretations of the ADA, and new expectations reflected and supported in rules of procedure, that we're going to have to look at adopting. Speaking of that, I want to come back now to another big-picture, over-the-horizon concept that I think your early work has raised, and that is what you did with the wire fee refund and the Comcast bill. At least initially, on the wire refund, you had to go through a kind of phone tree, and with Comcast, I think it was entirely the chatbot on the Comcast side.
Do you imagine an ecology where consumers have AI-based technology that is sort of the inverse, the reciprocal service, to companies and government agencies at large scale, so that we basically have more standard types of APIs and interactions, and maybe guardrails or boundaries for the context of certain interactions? How do you see it playing out when we have bot versus bot between consumers and organizations?

So the AI arms race has just begun. We've seen this for the past few years at Do Not Pay, where every action we take has an equal and opposite reaction from the companies. We're going to see things like voice verification: they're going to use AI, and a lot of banks already do this on the back end. They don't tell you that they're doing it, but if it's not your voice, they flag the call as suspicious, so you're already on a losing front. The good news is that Do Not Pay is much more motivated than the average Comcast engineer, and in the past we've succeeded at these arms races. Another example: when we started sending in parking ticket letters, the government started ignoring letters that came from Do Not Pay. So we randomized the letters, and then they stopped ignoring them, because they couldn't be sure the letters were coming from us. So there are all these steps that are going to be taken on both sides. Regarding the Comcast chat specifically, you can't even tell whether it's a bot or not. I think it was a bot for part of the conversation, but then a human being for the rest. And even though it might not have been a bot towards the end, the customer service agents are unfortunately just acting within a script; they have a very narrow set of parameters within which they can authorize a refund or not. So I think one of the biggest insults in life going forward will be, you sound just like ChatGPT.
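As an aside, the letter-randomization trick mentioned above can be sketched in a few lines: vary the wording across paraphrase templates so that repeated letters share no fixed fingerprint a filter could match. The templates and wording here are invented purely for illustration.

```python
import random

# Paraphrase templates with the same meaning but different boilerplate.
TEMPLATES = [
    "I am writing to contest parking citation {ticket}. {reason}",
    "Please dismiss ticket {ticket}: {reason}",
    "This letter disputes citation {ticket} on these grounds: {reason}",
]

def randomized_letter(ticket: str, reason: str, rng: random.Random) -> str:
    """Pick a template at random so the letters cannot all be filtered
    by matching a single boilerplate opening."""
    return rng.choice(TEMPLATES).format(ticket=ticket, reason=reason)
```

A production system would presumably vary far more than the opening sentence (ordering, phrasing, formatting), but the principle is the same: no stable signature, nothing to blocklist.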
And so unfortunately, the customer service agents already sound like ChatGPT, whether they are or not. So this will free up the work for them and also free up the work for consumers, and the bots will just do the hard part to get the $12.

Outstanding. So in effect, maybe you could imagine a kind of funnel where the consumer can just look at the ultimate result, the question to be asked or the selection to be made, and not have to go through all of the rigmarole to get there. Maybe in a dashboard or something like that. Is that what you're getting at?

Yeah. And the good news is that there are a lot of rights that people have that are enshrined in federal law. For example, if you have an agent dispute something on your credit report, they still can't ignore it under the law just because it comes from an AI, because the laws are written in terms of whether it comes in a letter or a particular format. So they can't gatekeep a lot of these use cases, and that's also helpful for us.

That would be an interesting application of this technology for consumers. We've talked about the help desk context and about litigation, especially pro se. I was asking about lawyers, but you very appropriately went to people who aren't represented by lawyers, where the access-to-justice case is very compelling. What other contexts do you think this technology could be useful for?

I think it's all about going from reactive to proactive. What I mean by that is that right now, at Do Not Pay, people come to us with a problem. They're like, I want to get a refund for the in-flight Wi-Fi. But in the future, the AI will be so good it will save you money in the background, like a true general counsel. Walmart has a general counsel just working for its best interests, and I think AI lawyers will do that for consumers. They'll be looking at your bills automatically and figuring out ways to fight back, and you can just relax.
So you don't even have to think about it. In terms of specific things we're working on: for example, on the medical bill side, there's this amazing law, the No Surprises Act, and it means that hospitals have to publish all of their prices. But the problem is that, in typical compliance fashion, they just publish obscure PDFs, complying with the letter of the law and not the spirit of the law. So what we're doing right now is having AI crawl all of these hospital websites, take all of their information, and put it into a standardized format. We're actually building an AI hospital price comparison website. So I'm excited by those sorts of use cases: understanding information, presenting arguments, and also just handling things that you don't have time to look at yourself.

Understood. So we've got a question now from one of our longtime collaborators, and also an advisor of MIT's Computational Law Report, Brian Ulyssany, whom we actually know as Cool Brian. He asks: as he recalls, in the amazing Wells Fargo demo (his words, and I agree), you simply asked for a refund. You didn't provide any arguments on your behalf. Would that have been possible, or might it be possible in the future?

Yeah, it definitely is. In the Comcast example, it provided arguments about FTC statutes around quality of service. There's a negotiation angle and there's also a legal angle, and if you combine them, then you can have success. We should have provided some arguments for the wire fees, but we were worried that they could tell it wasn't a real voice, so we limited what the AI could say as well. There were lots of constraints. But in the Comcast example, we were certainly citing FTC statutes and such.
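The standardization step described above, taking each hospital's idiosyncratic price disclosure and mapping it onto one comparable schema, can be sketched as follows. The field names, alias lists, and sample records are invented for illustration; real machine-readable disclosure files vary far more than this.

```python
def normalize_price_record(hospital: str, raw: dict) -> dict:
    """Map one hospital's idiosyncratic field names onto a shared schema."""
    aliases = {
        "code": ("cpt", "billing_code", "code"),
        "description": ("desc", "service", "description"),
        "gross_charge": ("price", "gross", "gross_charge"),
    }
    out = {"hospital": hospital}
    for field, names in aliases.items():
        # Take the first alias present in this hospital's raw record.
        out[field] = next((raw[n] for n in names if n in raw), None)
    return out

def cheapest(records: list[dict], code: str) -> dict:
    """Compare the same billing code across hospitals by gross charge."""
    matches = [r for r in records if r["code"] == code
               and r["gross_charge"] is not None]
    return min(matches, key=lambda r: r["gross_charge"])
```

Once every record is in the shared schema, the price-comparison site reduces to a query like `cheapest(records, "70450")`; the hard part, which is where the AI comes in, is extracting those raw records from obscure PDFs in the first place.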
I've been thinking, as you've been speaking, about this idea of real versus not real, which I think is incredibly superficial and backward-looking in some ways, and needs a full, fresh rethink going forward. On that, I just want to pose a question and invite you to go anywhere. I could see you were about to say something, so I don't want to lose that thought. The question I have is: as chat centers and courts and other official processes start to adapt to this technology, would it be useful to have a kind of recognition that sometimes people are going to be using this technology to exercise their rights and to engage with these systems, and to have a kind of disclosure, a field that we could set saying, this part is coming from my authorized electronic agent, which is this bot technology? That way we could just dispense with this whole question. It is real, and it also happens to be the bot that I've authorized.

I think there will be rules around that. OpenAI, for their GPT-3 Davinci model and others, have guidelines that all businesses using their technology have to follow, and one of them is just what you said: if you have a bot, you have to disclose that it's a bot. That's not very helpful for us, because if we say to Comcast, this is a bot, they'll just end the conversation. The way we get around that is, as I mentioned earlier in the call, we use GPT-J, which is an open-source model. We use OpenAI for the heavy lifting on the back end, but we use open-source models to actually communicate, to get around those kinds of gatekeeping regulations. I think there's an argument to be made that if someone says it's a bot, maybe it loses 90% of its effectiveness. Like, I used ChatGPT to write me a thank-you note for a Christmas present; if it says this was generated by AI, then the meaning of the thank-you note kind of goes away. And the same could be true for these legal cases.
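The authorized-agent disclosure floated above could take a very simple machine-readable form: a message envelope that declares up front which content comes from an electronic agent acting for a named person. This schema is invented purely for illustration; no court or API standard like it exists today.

```python
import json

def agent_message(principal: str, agent_name: str, body: str) -> str:
    """Wrap content with a machine-readable authorized-agent disclosure."""
    return json.dumps({
        "principal": principal,          # the human exercising the right
        "authorized_agent": agent_name,  # the bot acting as their proxy
        "agent_generated": True,         # the explicit disclosure field
        "body": body,
    })
```

A receiving system that honored such a field could route agent-generated filings normally instead of discarding them, which is exactly the question of whether an authorized proxy counts as "real."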
Indeed, yeah. There's a lot more to do in the future as we learn how to adopt, and appropriately adapt to, the infusion of this technology, for consumers and for governments alike. So can I just give you this opportunity, as a free swim, to close with any thoughts or challenges or ideas that you'd like to leave with people, including questions you may have for us?

I think this technology is overhyped and underhyped at the same time. It's overhyped because ChatGPT is really good at holding a conversation; it's really good at writing thank-you cards and that generic stuff. But what we found at Do Not Pay is that it actually hallucinates regarding the law. It makes up laws and things like that. The reason we've been able to use the technology successfully is that we have all this training data from the past seven years. Instead of saying, write a dispute to Comcast, we say, based on these 1,000 documents, write a dispute. The quality is much better if you almost retrain it. I like to say ChatGPT is a good high school student, but you have to send it to law school. So in the context of this discussion and the law, I think it all depends on the training data and making sure you have really good data. And wish us luck for our court case next month.

Well, at this MIT workshop, on behalf of everybody at law.mit.edu and all of our participants, we truly do wish you luck, and we hope that you'll come back and join us, as you go through some of these early experiments, to let us know how it went and what's next after that.

Sounds good. Even if I end up in county jail, I'll come back to present.

Right. And, well, you should have a bot to defend your rights to stay out of jail. Thanks again, Joshua.