Good morning folks, I think we're going to get started. We may see a gentleman from New Mexico join us in the front row shortly, but hi everyone. I'm Erika Rickard, I'm here from the Pew Charitable Trusts, and I want to give a quick plug right now. Pew recently launched a new initiative to modernize the civil legal system, and we've just launched our website, PewTrusts.org slash Modern Legal. I hope everybody takes a look after the session, but not during, because we're going to be very scintillating throughout this conversation. And I'm really, really excited to be here with my colleagues. We have Sart Rowe from LSNTAP, David Colarusso from Suffolk Law School's LIT Lab, and Margaret Hagan from Stanford's Legal Design Lab. And what we're going to be talking about today is building the legal help AI infrastructure. Our friend Kate from Australia said recently, AI is no magic bullet without magic data. And that's one of the things that we'll be talking about today: it is possible to have magic data. I'm briefly going to set some of the foundation for what we'll be talking about, about access to justice and how AI, specifically machine learning, can begin to chip away at some of the access to justice crisis in this country. We'll be talking in some detail about taxonomies and creating a common way for us to speak the same language with each other about legal services and legal resources. We'll really dig in deep with David about what machine learning actually is, in a very broad overview that is also very deep. And then we'll turn to some specific hands-on examples. I'm really excited about this panel. We're talking a lot about AI throughout this conference; in this panel, we're really going to be talking about some concrete, real applications that exist today and that we can begin to use together. And with that in mind, after Margaret and Sart talk about these two specific examples, we're going to work together on those examples themselves. So I think here in this room, we don't need to belabor the point that there is an access to justice crisis in this country. One thing I would add is that the word crisis implies something new and unpredictable that we're urgently solving, and unfortunately, that's not the case here, right? What could have been a moment in time is really becoming the norm. It's really come over time to be something that we accept, looking, for example, at the LSC Justice Gap Survey, at the millions of Americans who have legal problems that they're navigating by themselves, over 80% of whom receive little or no legal help to address those problems. What we're here to talk about is not going to solve the entire access to justice crisis, but it is one beginning, one starting point for looking at what we can do, using new tools, to start to chip away at some of the problems that we see. So we're going to specifically be talking about the way that people find and connect with legal information and legal resources, including primarily attorneys, and how we can use machine learning to start to do that a little bit better and help people connect more accurately and more quickly to legal information and services. So I'm going to show just a couple of quick examples.
Technology is being used in the service of connecting people to resources in a variety of ways: asking people to type in information and be connected, sending legal questions to folks who can respond to them, and creating ways for people to navigate through guided paths. These are created by legal services providers, by state and local courts, by federal courts, and by pro bono services as well. And one of the things that we struggle with is being able to ensure that people are quickly and accurately connected to a service or information that is actually relevant to them. So that's the question that we have at Pew. We're working on a legal information and assistance portal project. Many of us around the country are working on different portal projects that are attempting to build new technology components that can better enable people to go from understanding that they have a legal problem to connecting to an accurate resource. It's a big challenge. And one of the things that we're learning is there might be ways that we can use different technology to do that better. So think about the intersection between the resources that we have and the needs that we have. We're looking at this access to justice crisis and one small, narrow piece of it: the very first step of connecting to information and resources. We're looking at machine learning as something that we know exists and could maybe better use somewhere in the access to justice space, thinking about classifying and issue-spotting legal issues. And then this last category is thinking about the attorney power that is not being fully utilized. When we've talked in the past about an access to justice gap, often it's been described as an increasing number of people who need legal help and don't connect to lawyers, while at the same time there are all these lawyers who aren't finding ways to provide affordable legal services or to use their expertise in a way that is useful to the people who need help. So we're trying to find that intersection between attorneys who want to help and people who need help, and asking how we can use machine learning as one tool to try to bridge that gap. That's where our project comes in. We're building some real fundamentals here, both building a taxonomy, a way to label information, and then actually encouraging all of you folks to do some labeling of that information, so that we can have that rich, useful, usable data set that we can start to connect people with. So I'm going to turn things over to Sart and Margaret to explain in a little more detail what that looks like. So who here knows what NSMI stands for? Acronym, acronym. So that's maybe a third? Who here definitely does not know what NSMI stands for? And who here may have heard it or used it, but doesn't know exactly what NSMI means? Okay. So we're hoping to transition from this big vision of machine learning and new technology helping to solve this giant crisis. Our vision is that at this nascent moment of all these new chatbots, new tools, new AI, including all of these wonderful projects that have just gotten funded by LSC TIG, we figure out from the beginning how to make sure that we're being interoperable, standardized, and consistent across these many projects to build this ecosystem.
And here's where we think something that seems a little bit geeky, a big list of all the legal issues that exist in the US in the civil sphere, could actually be a really good investment up front. It's not nearly as exciting as beautiful predictive models or interactive chatbots, but it's one of those infrastructural pieces that, if we invest in it now, will hopefully lead to this wonderful, intelligent, interoperable system of the future. So what is NSMI? Sart, what is NSMI?

The National Subject Matter Index, because we love acronyms here in the law, started as a TIG project and was created as a list of all of the different civil legal aid issues. It was written by lawyers, for lawyers, and used inside of websites. It was originally integrated both into the Law Help Interactive platform and into the DLAW platform, and it is a very long list of terms. Now, one of the interesting things is that as it was created, it didn't really have a governance structure or a simple way to update it over time. So as things moved forward, we ended up with a static list. We moved that to a simple Drupal install and gave it an interface where it could be easily updated. But this is really the first project that has taken that list, looked at it from more of a plain language and user-focused perspective, and started the process of really updating and modernizing it. So we had a foundation to work off of and an opportunity to improve it here.

Did anyone here actually work on NSMI version one, devoting their hard labor to it? Wonderful, thank you so much, because it is amazing. It is 2,000 terms, and it's amazing to have that giant list that we could then use as the base infrastructure for how we start labeling and having consistent tags, markup, and terms across the internet, across the bots, across all of these tools. That said, it's not the best fit for AI purposes, including the applications that we're going to talk about right now. And I think mainly the point is that because it is so comprehensive, it's not necessarily easy for outsiders to take the terms out of the context of the legal aid mindset. It's not always legible or intelligible. I'll show you what I mean. When we're thinking about labeling for the purposes of AI, and what the big list of legal issues could be, we want something where most people, not just legal aid experts, would be able to understand what the heck a term means. So I'm going to give you a little preview of the project that we're going to talk about later, Learned Hands. This is a post from Reddit, from the Reddit legal advice board. And in our AI project, we want to be able to know what legal issue we should be surfacing out of this person's post on Reddit. So take a minute, read it. I've been living in my apartment for nearly two years now. Shortly after I moved in, there was a leak in my ceiling that damaged my mattress beyond repair. My landlord grudgingly paid me. I'm not going to keep reading. But what we want to do from the AI perspective is train a machine learning model to take in this text and then pull out what issues we want to surface, for whatever purpose we're using the AI for. And to do that, we need a relatively consistent, legible set of terms in place. NSMI version one has a long list of terms, but they don't always work in this context. So let me show you an example of a legal aid term.
Do you see a legal issue around eligibility in this post? Eligibility is used in a lot of different places, in a lot of different codes, in this first version of the legal aid taxonomy, the NSMI taxonomy. If you look at it within the nest, within the branches, it's clear that this eligibility word refers to a specific thing within housing or benefits. But if you take it out and try to use it in this context, who knows what eligibility means? So obviously we need a new version, a new update, that's legible outside the context of the long list. The second problem is a lot of expert language. Do you see a legal issue around IV-D, another example from the taxonomy? Does anyone know what IV-D means? If you work in child support, you do. But no one outside of a handful does. We can't show that to law students and expect them to consistently label that that's the correct thing. So that's the other problem. It's around child support and jurisdiction. I don't know, I'm not a child support expert. All right, thank you, John Greacen. So we need to be able to rewrite these things so that law students, or other lawyers who aren't child support experts, can label them, so that we can have a larger number of people labeling. Our motive in doing a review of this taxonomy is not just that we're word geeks; we really want to turn this long list of terms into something practical that can be a consistent infrastructure across many different AI projects. And we want to make sure that these terms reflect how people talk about their issues as well as how lawyers categorize them, to serve this kind of Rosetta Stone function. We also want to take those 58 categories and 2,000 terms and streamline them, so that we don't have to label every single Reddit post against 2,000 terms or 58 categories; it's just a lot more work, the more duplicates and categories there are. We really want to create this as a shared resource. And we're hoping that, well, who here is working on something with a taxonomy, where they're using tags or markup? We know the portal projects are. All right, there's a fair number of hands, around 15. So hopefully we can talk to each other and make sure that our projects are interoperable from the start. So I'll just tell you a little bit of the nuts and bolts of the past year of wrestling with this long list of 2,000 terms. We got all of the NSMI taxonomy, as well as wonderful other taxonomies from Illinois Legal Aid Online and Pine Tree Legal Assistance, which had more user-centered categories they'd used for their websites. We printed it all out and tried our best to streamline. It was literally covering an entire conference room with all of these legal terms, then grouping them, finding where we could streamline and strike a balance. We removed from NSMI a lot of the management and technology terms about how you run a legal aid group. Those will still exist in NSMI version one, but our goal in NSMI version two is to focus just on legal issues and not on the operations side. And then we streamlined those 58 top-level categories into 20 main ones. We took and combined a lot of things so that there aren't as many duplicates or as many branches. Now we're going from that basic cleaning up to going term by term through all of these 20 different families of issues, and we're cleaning up duplicates.
We're trying to make all of these terms, like IV-D, into something that's more legible to outsiders, and we're trying to streamline a lot more. So we're just going line by line through a beautiful, huge spreadsheet of about 2,500 terms and trying to combine and make things more consistent. And as we're doing that, we're also looking at models. David can interrupt me here, but he's built this wonderful tool that takes all of these posts that have been made on Reddit, which we're using for our other project that we'll talk about in a minute. And he's done topic modeling of how people talk about their problems on a legal advice board on Reddit, to see what the issue clusters are. Each of these, correct me if I'm wrong, David, is a cluster of topics that has emerged from people's descriptions of their problems on a legal advice board. So these are categories that emerge more from the crowd, at least the Reddit crowd, than from how lawyers categorize things. So we can check: are there issues that aren't present in the NSMI legal aid taxonomy? And short answer, spoiler: yes, there are definitely lots of issues, mainly around things that civil legal aid attorneys don't cover. So tons around roommate problems, tons around torts, IP, online abuse and harassment, lots of these new issues that are cropping up. Do you want to say anything about this slide, David, before I go on? No, no. Okay. The other way that we're checking for missing issues is that I have literally been reading thousands of Reddit posts on the legal advice board. And I encourage you to do the same; it's actually a lot of fun. Again, we'll give you the link later, but in our game that we're gonna talk about later, called Learned Hands, you can see that I have applied 22,956 labels and won over 1 million points. Come play and try to beat me. But I have been reading. The one thing you should know is she is not always on top. I am number two. Yeah, oh, there we go. So it's possible, you could be crowned. So I have been reading all of these posts on the Reddit legal advice board where people are coming to ask for help with their problems, and a lot of them don't have a category currently in NSMI version one. So we're gonna be adding new categories that are more like 2019, as well as non-legal-aid legal issues that are popping up. All that said, as we're creating these labels, we're putting them into our labeling game. This is our first and most important use internally, but we're hoping that these labels, these 20 main categories and all their many subcategories, can be useful to you if you are marking up a website, tagging resources, creating a bot, or doing referrals. So you can come and see our current version of NSMI version two, which Sart has generously helped us host right alongside NSMI version one. We're not meaning to replace version one; version two is meant to be more applicable for AI-oriented projects. So we're not trying to obliterate version one; it's more like giving it a new sibling. Come and see: you can see all of our high-level categories, those 20 main categories, as well as the main subcategories. And as we flesh out all the sub-subcategories and sub-sub-subcategories, we'll be putting them up. In fact, if you are a family law expert, can I see a raise of hands? Are you willing to put your hand up? I'm probably gonna bother you. I actually have the full version of our family law taxonomy here, and I want to get your feedback on it.
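Before moving on, to make the topic modeling David showed a bit more concrete: here is a minimal sketch of the kind of clustering he describes, assuming a hypothetical list of post strings called posts and using scikit-learn's LDA. His actual tool may use a different algorithm and data.

```python
# A minimal topic-modeling sketch: cluster legal-advice posts into topics.
# `posts` is a hypothetical stand-in for the real Reddit data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "my landlord won't return my security deposit after the lease ended",
    "my roommate moved out and left me with the whole rent",
    "someone is posting harassing messages about me online",
]

# Turn each post into word counts, dropping very common English words.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(posts)

# Fit a small LDA model; each topic is a distribution over words.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top words per topic so a human can eyeball the clusters.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top)}")
```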
So I'm gonna bother you later. And in fact, we have gift certificate compensation. If you have any interest in geeking out or helping us test NSMI version two, we can pay you with Amazon gift certificates; just send me an email. And the other request I would have, going back to Sart's comment about governance of version one, and how the first version was an untended garden that didn't necessarily stay up to date: if you have an interest in governance and power, do you want to be on a steering committee to help make sure that version two stays lively, relevant, and hopefully useful to our community? I don't know, Sart, do you have anything else to say?

No, I mean, long term, that is really part of our focus here: as new issues like cyberbullying have appeared in the last 15 years, how do we integrate those things into the new version that comes out? We really want this to be a living document, and LSNTAP is willing to do the coordination, the behind-the-scenes work, with a group of individuals to really govern it and keep it up to date a few times a year.

Yeah, we might even be able to get you Amazon gift certificates too, but you also have power. Yes, the total? I don't know exactly. It will definitely be fewer than 2,000 terms, because we're removing all the duplicates and having kind of co-parents, so anything that would have been cross-listed under benefits and under family now has a single code and just lives under both. My guess is it'll be somewhere closer to 1,200, but I can't say for sure. I can give you the family one, for example, so you can see; we're not removing a lot, we're just streamlining. Okay, I'm going to then go from taxonomy over to machine learning models.

Oh, Mark? Different purposes, I think. Repeat the question, please, for the recording. Thank you, Mark. I was asking about the notion of having the two versions, NSMI version one and version two, coexisting rather than somehow merging. I understand that there may need to be different terminology for AI projects, but as I understand it, most of the statewide websites and LHI and systems like that are using version one. So if there's going to be some connection between the things that you have problems on and then, hey, how do you go find the resources that relate to them, wouldn't they need to intersect?

So one thing I didn't mention is mapping each NSMI version one code, that long digit string, to an NSMI version two code. That's part of our project too. We don't want to force people using version one to go back and recode.

Additionally, though, as version two is really developed, the community will need to look at it and see the ways that it could be useful outside of just AI. The plain language focus of version two could be extremely helpful for search engine optimization and for metadata purposes, for sharing our resources more. So there may be a need for a TIG project that looks at how that could be integrated better into existing websites, and really what the future of version one and version two is. It's difficult to predict that with version two still under development, but six months from now, that would be something that could be very, very useful to look at.

Sart, I would think you could write an easy conversion program so that you could map the version ones into the version twos. Yes, there definitely can be some mapping, but if we decide to continue version one, there would need to be some additions, because some of them just don't map at all. Definitely.
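A conversion program like the one being described could start as a simple lookup table. A minimal sketch, where the codes are made up for illustration and a real table would be built from the two published lists:

```python
from typing import Optional

# Sketch of an NSMI v1 -> v2 crosswalk. These codes are invented for
# illustration; real entries would come from the two published lists.
V1_TO_V2 = {
    "01-00-00-00": "HO-00",  # hypothetical: housing (v1) -> housing (v2)
    "01-02-00-00": "HO-02",  # hypothetical: eviction
    "30-04-00-00": None,     # hypothetical: operations term with no v2 home
}

def convert(v1_code: str) -> Optional[str]:
    """Return the v2 code for a v1 code, or None where nothing maps."""
    if v1_code not in V1_TO_V2:
        raise KeyError(f"unknown NSMI v1 code: {v1_code}")
    return V1_TO_V2[v1_code]

print(convert("01-02-00-00"))  # -> HO-02
```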
So this is super exciting, although the taxonomy problem in any field is known to be a wicked problem: any given solution comes with other problems. And so I think we can reasonably expect that through this process you are going to make maybe even better decisions than the previous process, and also there are still going to be challenges. So I guess one of the questions is, in what ways can this community cope with these challenges over time, rather than assuming that this is just a project that will solve this problem? Both from the changing terminology, and also the fact that you may have a group of legal aid students from one part of the country who have vocabulary that would be different from another part of the country; and vocabulary in one particular context is not gonna be appropriate in a different particular context. In the field of library science, there's a whole civil war between people who are still loyal to the notion of a taxonomy and others who say we should be structuring our systems to simply match people based on the language they tend to use, without any rigid controlled vocabulary. So I guess, do you see opportunities not just to do a project, to do a new version, but also to develop the kind of feedback loops that can cope with these challenges that are really not solvable over time?

I loved the use of the term of art there, wicked problem. So yeah, one of the things we'll see later as we get into this, and one of the things that was just raised, is the idea of what happens if people have different ideas of what a label means while they're labeling. We do have some machinery built into the labeling that we're using for our training data that takes that into account. It's not really a consensus algorithm, but it's a method we have for figuring out how we agree that this text shows a given issue, and we put that on a statistical basis. I'll get a little geeky: we assume there's a binomial process underneath, the issue is or isn't there; we treat labelers as randomly chosen from a population; and we compute a Wilson confidence interval, right? So you're getting a statement, with error bars on it, about what percentage of people in the community would call this thing that. And then we look for those error bars to get us to the 95% confidence level, and that's when we call something labeled. That's in the game that we'll talk about in a minute. That being said, and you'll see more about the game shortly, now we have labels, and we're asking people to help label text, as you saw Margaret preview. We do not, at this stage, have to have those labels right by five PM at the close of business today, right? So we're keeping all of that data as people are labeling, and we're gonna go back and evaluate it too. So we've got our 95% confidence that this is right; we're gonna test that against some experts, and we're gonna see how well it performs. And maybe the Wilson confidence interval is not the right way to go. There are all sorts of other methods that people use for labeling.
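For the curious, here is a minimal sketch of that Wilson score interval, assuming k "yes" votes out of n labelers; the exact decision rule used in the game may differ.

```python
import math

def wilson_interval(k: int, n: int, z: float = 1.96):
    """95% Wilson score interval for k 'yes' votes out of n labelers."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (max(0.0, center - half), min(1.0, center + half))

# Three of three labelers say "yes": the interval is still wide, keep asking.
print(wilson_interval(3, 3))   # roughly (0.44, 1.0)
# Nine of ten say "yes": the lower bound is much tighter.
print(wilson_interval(9, 10))  # roughly (0.60, 0.98)
```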
A lot of really interesting methods have been developed around people doing Mechanical Turk tasks, to try to figure out how you can avoid the problem of someone who just always hits the same button, right? So the point is we have some of those feedback loops, and we welcome any ideas you have. And this is the most exciting thing about this project right now: it's happening, right? It's not just, oh, something's going to be happening; we're doing it, and we want feedback. Those are conversations I could go on about forever, but I think we have other things to cover. Was this the handoff? Yep, okay. I need the clicker. Oh, there we go.

All right, so moving into wicked problems, right? And unintended consequences. This is the part where I try to put you at ease about the fact that we're talking about machine learning, which for most of these purposes often just gets conflated with AI. And there's the wonderful Dr. Ian Malcolm telling us, you know, we were so preoccupied with whether or not we could that we didn't stop to think if we should. And dinosaurs, they're so cute. Probably with the feathers, can you imagine? Okay, so I love this quote from George Box, who was a statistician. In certain circles, I'd put this up and everyone would groan, because it's like, oh, that quote again. But I think I'm good here; this is probably new for some of you. The idea: all models are wrong, but some models are useful. So when we go forward with this task of trying to imbue machine learning tools with some, super air quotes, understanding that we're gonna put out in the wild and have them do something, we have to ask ourselves a set of questions to make sure that we're doing that responsibly. I take this George Box quote and I like to decompose it into two parts. First, if we focus on the fact that all models are wrong, well, that means the output of the model should start the discussion, not end the discussion. And this, of course, is a sliding scale based upon the potential costs of getting something wrong. So that's the one thing I always wanna say: it's gonna start the discussion, not end the discussion. And then, since we're trying to figure out whether models are useful, you have to ask: compared to what? The reality is that a lot of times the "compared to what" is nothing. And this goes back to Jim Sandman's comment: something is better than nothing, right? So you have to ask compared to what, and then make sure that you're starting the conversation, not ending the conversation. Those are the things I wanna use to frame how we're approaching this. So, a quick bit on this. If we go back to some of the use cases we were talking about, you can see how this would start to work in a situation where, say, you were doing something similar to a free legal answers site, a pro bono volunteer project where someone comes, writes their question, and it goes off to a volunteer attorney, who writes back with an answer. So they're doing limited representation work. A lot of those programs are set up by state bar organizations and other entities. And a lot of the people who come to use the tool don't get an answer, because they never get matched with an attorney. If I'm a volunteer attorney and I've been doing housing law for 30 years, I'm not gonna do someone's family law issues, and they're not gonna want me to do it.
So there's that sort of last-mile problem of connecting people. If we had a tool that could just do a little bit better job of getting a question in front of the right person, knowing that it's gonna end up in front of someone who can go, this isn't the right type of question, and then correct it, then that's a reasonable use case, right? You're gonna have a check on it. So that's a situation where it's starting the conversation, not ending it, and it's better than the alternative, because a person didn't have to do that classification. If you talk about these systems at scale, sure, a person could just say, oh, that's housing, send it to George. But if you're talking about thousands or tens of thousands of emails, that person's time could probably be better spent doing something else, whether that's the user doing something on their own time or a staff member in the organization.

All right, so what is machine learning? I love this joke tweet from Amy Hoy. It says, by today's definition, y = mx + b is an artificial intelligence bot that can tell you where a line is going. And actually, she's not too far off. A lot of times when we talk about these things, we're really talking about things that are just linear regressions or logistic regressions, stuff you might have learned about in your statistics class. In fact, the COMPAS algorithm, the risk assessment tool that gets talked about a lot, is probably just a logistic regression. So this brings us to the elephant in the room. This is also the slide that usually gets the best response: big data and the law. Some mad Photoshop skills, I'm very proud of that. So what's different? If I just said that machine learning is really just statistics, what's different? Well, the thing that's different now is that we have data, and we can use it to our benefit. Really, a couple of things have changed. One, we have the availability of data. We also have machines that can crunch the numbers a lot better. We have some developments in algorithms, some fun new things on the bleeding edge with neural networks and such, although neural networks themselves have been around for 50 years. Mostly it's that we have an abundance of data and we have powerful machines. So how can we use that to help us out? What is actually going on here? Here's the thing we've sort of danced around but haven't made explicit: we have this game we're gonna show you at the end, where we take these questions from r/legaladvice, which I know there's a lot of debate about, oh, r/legaladvice. Well, we're not looking at the answers people get. We're looking at the questions they ask, because those are real people asking real questions. We actually talked with the moderators over there, and they provided us with the information. They have forum rules that prohibit people from posting personally identifying information, and the moderators work to make sure that's the case. And also Reddit, of course, bills itself as the front page of the internet, so the expectation of the people asking these questions is that they are posting to the front page of the internet. And we talked with the community there, and they were very happy to share that as a source of text for us to label. So basically what we're doing is taking text, turning it into numbers somehow, and then doing something with it.
And in this case, what we're gonna do is take these texts of people's questions, somehow turn them into numbers, and see if we can spot issues. How in the world does that happen? Okay, I'm gonna give you a really, really quick, not as deep as Erika was saying, introduction. So here is an example from a Google blog on natural language processing using a method called term frequency-inverse document frequency, and that's just the way it turns text into numbers. The point about natural language processing, the way it works, all you have to understand is: there's text, you somehow turn it into numbers, and then you draw some lines. Okay, that's really all you're doing. The question is how you turn it into numbers. So just to give you an idea, this method of turning things into numbers does what it says in the name: term frequency. It looks for the frequency of terms, so it'll look for certain words, and the more often they show up, it takes that as a signal. What it basically does is, think of making a giant spreadsheet where you make a column for every word. Actually, no, scratch that. Let's go back and think about your email filters. I have an email filter, an email box, that says free food, okay? So I had to come up with a set of email filters that would tell me when emails came in that were about free food. I work at a university, so this happens with some frequency. I look for, I think right now my search terms are pizza, cheese, wine, or hors d'oeuvres. I work at a law school, which is why I can put wine in. I think that's what's in there now. But the point is there are certain terms I'm looking for, and if I get hits on those, I say that's probably a free food email; I should read that, as opposed to the other list emails. This is mostly applied to list emails I get. And that's useful; I get some free lunches, which is nice. Don't let them tell you there's no such thing. Although, well, someone's paying for them; I'm not, sorry. Okay, so you could imagine taking that and basically converting it into numbers. You have a column in a spreadsheet for pizza, for cheese, et cetera, and the number of times that word shows up in an email, you put that in the column. And then each new email is gonna be a row. So pizza shows up, cheese shows up. Right now, under my very simple rule, anytime those show up, I say it's free food. Well, I could do that for every word I knew from a set of texts. I could take all the emails I'd ever gotten, make a set of columns for all the words in there, count the number of times each shows up, and then say, okay, anytime pizza shows up a lot, that's maybe talking about food. So the higher the pizza number, the more likely it's food. But that's not quite right, right? And what about all those words like "and" and "the" and whatnot? So then you do this inverse document frequency, where basically you demote the values of words that show up everywhere. You can say, okay, this word "and" is showing up like crazy, so it must be super important in this document; but "and" shows up in every darn document, so it's not important at all. So you look at that inverse document frequency and you take out that signal.
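A minimal sketch of that counting and down-weighting, using scikit-learn's TfidfVectorizer on made-up list emails; the Google example works the same way.

```python
# Turn a few made-up emails into TF-IDF numbers. Words like "the" that
# appear in every email get down-weighted; distinctive words stand out.
from sklearn.feature_extraction.text import TfidfVectorizer

emails = [
    "free pizza and cheese in the faculty lounge",
    "committee meeting moved to the afternoon",
    "wine and cheese reception after the talk",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(emails)  # one row per email, one column per word

# The highest-weighted word in each email is a distinctive one,
# not "the" or "and", which show up everywhere.
words = vectorizer.get_feature_names_out()
for i, row in enumerate(matrix.toarray()):
    print(i, words[row.argmax()])
```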
And so what you end up with is just a set of numbers that count the words, where certain words are demoted in importance and drop off as not telling you much about the document, and certain words are telling you something about it. And then what you can do is plot that. You didn't think this was gonna be a math lesson, but we haven't even really done hard math yet; we counted pizza twice, the number was two. So here is a projection. These were news articles that were gone through, and basically there was a column made for every word in there. Sometimes you can also do phrases, so you could do, like, stare decisis, right? Different groupings of words. Put them together, and this is a projection of those numbers into two dimensions. What the heck is a projection? It's a shadow, okay? If I have three dimensions and I make a two-dimensional projection, that's like taking a bunch of dots in a three-dimensional thing, shining a light through it, and holding a paper here: that's my 2D projection. So just imagine you have a space with one dimension for every word you have. Oh my God, that's like 10,000 dimensions or something. Everyone's like, what? The point is you've taken words, you've somehow turned them into numbers, and then you plot them. Okay, we're gonna use a lot of air quotes here, because any statistics profs out there are gonna wince. Okay, so you plot them. And then this is what's interesting. These colors come from the labels that Google put on, the subject areas of the news articles. These came from news sources, and so they show up in green, which is business; the gold is entertainment; gray is sports; purple is tech. And the thing you notice is that, lo and behold, the similar topics are grouped together. That means, and does this pointer have a laser on it too? That would be really exciting. It has a little, well, something; I don't wanna hit too many buttons. If I could carve up the space there, and a new document came in, and I took its numbers and did its projection and it fell right here, I'd feel pretty confident saying that was an entertainment article. Right? So all I'm doing is taking text, turning it into numbers, drawing shapes around the numbers, finding borders, boundaries. If you're doing predictions, you're usually making equations, like a linear regression, and trying to predict where the dots are gonna fall; for classification problems, figuring out issue spotting, you're trying to figure out what side of a boundary something's gonna fall on. So the important thing is: we just take words, we figure out some clever way to turn them into numbers, and we hope there's structure there. And if there's structure there, we hope we can figure out some boundaries. There isn't always structure there; that's an empirical question. And there are more clever ways to turn words into numbers than just counting how many are in a document. Some folks at Google discovered that if you try to predict a word from the words on either side of it, and you do some feedback, you can create these numbers, these word vectors, which have really interesting mathematical properties.
So we talk about making these numbers for words, figuring out these embeddings, how the words fit into these dimensions, as training a model. And when you train these models, what happens is you get really interesting mathematical properties out of these words. Like, if you take the word king, dimension one is gonna be 253, dimension three is gonna be 675, and so on, and you have like 300 dimensions. And it turns out that if you take the numbers associated with king, subtract the numbers associated with man, and add the numbers associated with woman, you end up with a number that's really close to where queen is. And you can do that with all sorts of interesting things. So there are directions in these spaces that actually have some semantic meaning, and that's kind of cool. But how do we know it's actually working? You know, we can fool ourselves, right? I usually wait for people to read this one. Right, so we can fool ourselves. And I also gotta watch my time here. Oh, I think I'm doing okay. So how do we not fool ourselves? Well, what we do is we randomize things. I thought I saw some folks from the A2J Lab here; Erika used to be with the A2J Lab. The idea is, ideally you want randomized studies. That's the gold standard for figuring out whether or not you're tricking yourself with numbers. Well, what we do in data science is a holdout set. We might take in a bunch of documents, in this case people's legal questions, and we apply labels to them. Those labels come from NSMI version two. And then we train our model on them, which turns them into numbers that we embed in some space. So now they're in this weird multi-dimensional, about 300-dimensional space, and we draw boundaries. And we wanna figure out: if something comes in and it's inside this boundary, is it actually the thing we think it is? So what we actually do is take the data that's already labeled, take 20% of it, and set it off to the side. And what's left, that 80%, is the thing we use to train the model. We'll use that to make the embeddings. And then we'll come back with our model, take that 20% we set aside, and feed it in. We know the answers, right? Because we already had it labeled. We feed it in and see how well the model did. And if we get results that we like, then we can be happy about that and start using it. And the great thing is these systems can learn from repeated use. One of the things we built into our approach is that once we start to train these classifiers, we use them to go back and look at all of the texts that we haven't labeled and say: find us the things you think are housing issues, but aren't too sure about. Because then we'll take those, put them back in front of people, have people label them, and feed that back to the machine. And now, that issue that was borderline for the machine, it knows the right answer for. And you do that enough times, and enough times ends up being hundreds or thousands of examples, and you can actually start to get pretty good models. And what do I mean by pretty good? I usually don't like to call a model good. I like to call a model not bad.
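Before the "not bad" part, here is a minimal sketch of that 80/20 holdout, with toy texts and labels standing in for the labeled question data; the real pipeline embeds the text rather than using this TF-IDF shortcut.

```python
# Sketch of the 80/20 holdout: train on 80% of labeled texts, then see how
# the model does on the 20% it never saw. Toy data stands in for the real set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    "my landlord is evicting me", "my security deposit was never returned",
    "there is mold all over my apartment", "my lease was terminated early",
    "my boss will not pay me overtime", "i was fired without any notice",
    "my employer garnished my wages", "my manager denied my rest breaks",
]
labels = ["housing"] * 4 + ["work"] * 4

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=0
)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)         # train only on the 80%
print(model.score(X_test, y_test))  # evaluate on the unseen 20%
```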
And the reason I like to call a model not bad is because a lot of times people fixate on a certain number. Usually people fixate on something like accuracy, and they're like, hey, did you hear? I just saw on a legal blog that someone made a model that can beat attorneys 98% of the time at doing this or that, and they're gonna take our jobs, and whatever. Yeah, accuracy is not the thing you wanna be looking at. I'll use the example I use with my students. We build a model to try to predict whether or not school's gonna be closed due to snow. I work in Boston, so this happens sometimes. And the point is, you can make a model that says whether or not school's gonna be closed that is 90% accurate by always guessing no, it's not going to be closed. So accuracy is not the thing you wanna measure on. You wanna come up with other metrics, and those people from library science are gonna recognize things like recall and precision. The idea is: when I say something is a something, how often is it that thing? And if that thing is in there, how often do I find it, right? You can take those numbers, and those are metrics that are gonna be more interesting. So let me show you this; this is hot off the presses. This is, I think, the housing model, the top-level is-it-housing-or-not model that I have right now, based upon the Learned Hands data. And we have the ROC curve and such, but the thing people really wanna look at is the accuracy; right now it's 94% accurate. Okay, well, I told you: don't use accuracy. So in order to figure out whether or not it's not bad, we gotta ask: could we have done better by just always guessing yes or always guessing no? Well, it turns out if I had always guessed no, I'd be right about 79% of the time, and if I'd always guessed yes, I'd be right 21% of the time. So I'm doing better than always guessing yes and always guessing no; I call that not horrible, right? The other thing I'd wanna be able to do is beat a coin flip in a lot of different scenarios. I mentioned recall and precision; you can actually see the true positives, false positives, et cetera, here. I got 61 true positives and 239 true negatives; those were the times when the model was dead-on right. When the model was wrong, there were 13 false positives and five false negatives. And so you have to start asking yourself: what is the thing you're interested in your model doing? What's the cost of a false positive versus a false negative? Do I want a model that's gonna hoover everything up and send people over someplace, or do I want a model where, when it tells me something, I'm really sure there's something there? This is where we can borrow ideas from medicine, with the difference between screening and diagnostic tests. And this comes back to: always have it start the discussion, not end the discussion. So you gotta know what your model's doing and then use it in an appropriate way. A lot of what we're doing is an empirical question: how well can we make these models? And you can see how many labeled texts we have here, because you can add up the 61 plus 239 plus 13 plus 5; that's the number of labeled texts we're dealing with to get this performance. And so the precision, how often it's right when it tells us that something is housing, is 82%.
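The arithmetic behind those numbers, straight from the confusion matrix counts just quoted:

```python
# Metrics from the confusion matrix: 61 TP, 239 TN, 13 FP, 5 FN.
tp, tn, fp, fn = 61, 239, 13, 5
total = tp + tn + fp + fn              # 318 labeled texts

accuracy   = (tp + tn) / total         # 0.943 -> the "94% accurate"
always_no  = (tn + fp) / total         # 0.792 -> always guessing "no"
always_yes = (tp + fn) / total         # 0.208 -> always guessing "yes"
precision  = tp / (tp + fp)            # 0.824 -> says housing, is housing
recall     = tp / (tp + fn)            # 0.924 -> is housing, gets caught

print(accuracy, always_no, always_yes, precision, recall)
```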
And the recall, how often it actually catches a housing issue that's there, is 92%. So if those numbers are bigger than 50%, I say it's not horrible, and I would say this model is not horrible. Nine out of ten times, it's gonna be right. Nine out of ten times, it's gonna catch something that's there. Eight out of ten times, if it tells you something's there, it's actually gonna be there. And this is just one of our starting high-level categories. So it'll be very interesting, and there's an empirical question here: different areas will be different, because different people talk about things differently. If people are in family law, they might always be mentioning brother, sister. So you've got that email filter where it's just, oh, I see brother, sister, blah, blah, blah: it's family. The more instances you have in your training data, the more you can catch some of those edge cases. But if you're in a different realm, say consumer protection, people might not always mention the same words, so you're gonna need more data to be able to catch all those examples. So it's a really interesting empirical question what we'll get. I just like this graph. I thought, not-bad model; this is actually happening. So I think now what everyone really wants to do is start getting their hands dirty. So I will pass back over, and then we're gonna do this thing which you never do, which is: pull out your phones, pull out your laptops. Let's go for it.

So I already gave you this link. Has anyone visited so far? Oh, we have visitors. All right, any feedback already on the categories? In case you can't tell, we're in testing mode. We want feedback. We've done several in-depth expert reviews, where we've drafted versions of this taxonomy and then gotten on the phone for 20 minutes or an hour with a family law expert or a housing law expert to have them tell us why we're doing something wrong or how they would recategorize. So we invite you to come and email me, specifically, with any criticisms, concerns, or other things that should be on there. If you think the category levels are wrong, please come; we want the critiques now, as we're starting to formalize and get down to more subcategories, where we think there will be more controversies or state-to-state differences. So I'll just reiterate: come visit, come send me criticisms, and I might also be able to pay you for your thoughts. But beyond that, unless anyone has any taxonomy-related geek-out questions they wanna share publicly? Oh, please, yes, Claudia.

Thank you. I'm Claudia Johnson, and I'm gonna use an analogy. The National Institutes of Health sponsors a lot of genetics research, and you know, cancer medicines are now using that genetic database to develop particular treatments based on genetics. They realized that the genetics data they had collected wasn't diverse enough, and now they have an All of Us initiative. It came to my area because, well, I sit on the board of a Federally Qualified Health Center where we serve primarily migrant workers, which includes a lot of people coming from Oaxaca, Mixteco speakers, et cetera. And they're going to other parts of the country; I think they have five or six catchment areas where they're trying to get different racial and ethnic groups to donate their genetic material so that it can be part of that database.
What is the equivalent of that here, in the context, for example, of Indian law? I live in Indian country, right, in the Pacific Northwest; I'm really close to the Yakama Reservation. There are different laws there. And also, your data sets are coming from Illinois, Maine, and Reddit, which is a completely different population than the 70% women and people of color served by the LSC grantees. So what are the plans to include legal problems that reflect the geographic and historical diversity of poverty in the United States? And who is going to fund the "all of us" initiative here, so that you can bring those outside data sets in and feed them? Because right now, where you are, you're creating the math. And if the math is based on a data set that is not inclusive, it's akin to potentially creating something that will end up harming the communities that are supposed to benefit. And so, in terms of inclusion: not just inclusion in the data set, but inclusion in the analysis of the harm, right? Because that's a judgment call. How are you going to bring in the people who have the experience to say, this is real, even if there are only five instances, this is what reinforces the systems of power?

Yeah. So, obviously, we started this project with Reddit because it was available data, but we have been very conscious from the start that this is not a representative group, and we do not want it to be the be-all end-all of how the classifiers are trained or how the models are structured. That's why we are calling on many of you who sit on data sets, whether it's live help chat requests or email requests that come into your court sites, your law help sites, anywhere where people are talking about their problems in their own language and it's written down and captured in a CSV file, or you could export it that way, and you feel comfortable sharing it with us. We welcome that. We've used other data sets that we're not allowed to talk about publicly to train our classifiers internally; because of data sharing concerns, we aren't allowed to use those to train publicly available classifiers. And this is something that we have to wrangle with on the privacy side: how do we get data sets that can be shown to law students and other lawyers, to be labeled by people outside an organization, where the organization is also comfortable with them being used to train publicly available classifiers? Do any of you have data sets that you think would be more diverse than Reddit's largely white, male, 20s-and-30s audience? What are some examples, please?

Potentially, yes, provided that there's a way of parsing voice. Yeah, we have a call center and we support 40 languages, so there's a vast amount of data, if you want to parse it, and can do so legally. Yeah. Correct.

So, a couple of things. One, this just has to end up as text; it doesn't have to be text originally. I know that in some situations, so, when I was in law school, I spent one of my summers at the Navajo Nation, and there, although there's a transliteration, the language is primarily spoken. So one of the things we're looking at, another possibility, is to have people go in with workers and, with the consent of the individuals who are expressing a problem, transcribe what those problems are and then share them with us. Now, the question was also, how do we evaluate the harms?
The point is, that's a conversation we want to have with the people who have that data. We want to come to an understanding with them, with the people in the communities who are working with folks, about the data. And this is really an ask for everyone here: we want you to help provide us with data, so that we don't have data that only reflects the questions being asked by young white men on the internet, right? Because that is a very different set of questions. In fact, in the private data sets we do have, you see a lot of the same problems presenting, but in different proportions. With the Reddit data set, we see a lot of housing issues, and a lot of those are around renting; in other data sets you see a lot of family law issues, and that tells you something different about those populations. So we want to engage those of you who have access to this data and who are in different communities, to be the voice of that community and help us figure out what people are talking about, and that's what we really hope will happen. One of the reasons we started with the Reddit data is that it's the front page of the internet: we can set up a game like this that lets people label it, and we're okay with the fact that anyone can just read it. If you give us a private data set, what we'll do is come up with an agreement whereby you or your community can use the same interface that we have to label things. So you might say, okay, privately, we only want our attorneys to see this information. The functionality is there, so we can come up with an agreement between our agencies as to how you want to get that over to us. And we don't want any data with any personally identifying information; when we've been talking with people, what they've been doing is redacting things and giving us stuff without PII. Then either, if they're fine with us labeling it, we can do that, or we can hand the tools over to them and they can label it internally. The benefit is that we can then put all that text together, train a model on it, and share that model in a way that is responsible, because it doesn't share the underlying questions people were asking, and it now reflects a more diverse community. So that's really the big thing we'd like: to act as a storehouse, a trusted partner, to help do that training. When we have public data sets, like the Reddit data set, we will make them available with the labels people have put on them, so people will be able to use that to train things and to do benchmarking and other things that are important. But we never want to hold or share data that's gonna compromise clients; that's where we're coming from first. And so another thing: you might be thinking, okay, well, what about when you build a classifier? We're hopefully gonna release that under an API that any of you can use for free, so you could ping us with some text and we'll shoot back the NSMI version two number.
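As a sketch of what calling such an API might look like from your side, with the caveat that the endpoint, field names, and response shape below are entirely invented, since nothing has been released yet:

```python
# Entirely hypothetical sketch of pinging a future classifier API with some
# text and getting back NSMI version two issue codes. The URL, request
# fields, and response shape are invented for illustration.
import requests

API_URL = "https://classifier.example.org/v1/classify"  # placeholder URL

post = "My landlord never returned my security deposit after I moved out."
resp = requests.post(API_URL, json={"text": post}, timeout=10)
resp.raise_for_status()

for issue in resp.json().get("issues", []):  # hypothetical response shape
    print(issue["nsmi_v2_code"], issue["confidence"])
```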
Okay, so what's gonna happen there? We're gonna have a set of responsible use constraints on the API, so that the people who are using it have to abide by certain really simple things, like: if you're using this in something that's a chatbot, tell people it's a chatbot. If it's something that's going to be a lawyer communication, we might come to the conclusion that we actually don't wanna store any of that data whatsoever, so we'll store it just long enough to do the classification, pass it off to you, and then forget about it. But if we can find a situation where all parties are happy with us storing that data a little longer, then we can actually use it to improve the system, and that's the other way we can get at more diverse populations: as this actually gets deployed in real-world situations, dealing with real clients, we can use that influx of data to better train the system. And again, we could set that up privately, so that only people within an agency are working on the data from their agency. So we're gonna be really thoughtful, hopefully, about how we store that data and what we do with it. As much as possible, we don't wanna have any PII, so that if there were a compromise, that's not an issue; obviously there's a lot that's personally identifying in the substance of the issues people might be asking about, so we wanna do all the responsible things. A lot of times, ideally, I would love this to be something like how Signal does it: if they get a subpoena, the data is just not there, you know? And I think this is something that's come up in a lot of the conversations here: what about the data, what's happening to it? That was a bit of a long answer; I've talked too long. I'm gonna hand the microphone to Margaret. Well, I think we had two questions, Danina and Gray.

This is a spectacular project; it's been exciting to hear how it's evolved so quickly. But I do have two things that I'd be interested in hearing how you're thinking about. One is data reuse, which is to say, it's not obvious to me that organizations like those represented in this room can, should, or are comfortable with simply handing over data; obviously clients are a very vulnerable population, so that's one issue. Secondly, and I guess I have three issues, de-identification doesn't get us that far in terms of dealing with sensitive information; that's a big issue. So: data reuse, the limits of de-identification, and then, Margaret, you said you were wrangling with this question, and it is a really difficult question: how do you deal with the fact that you're getting data from private sources that you're not identifying? And, maybe I misheard, but if that's the right description of it, then it makes it harder for the community to give you feedback, the kind of feedback you're already hearing about Reddit: white men, this age, whatever. That feedback depends on knowing about the data sources and how good they are in terms of training the system. So those are some questions I had, and I'm sure you've thought about all of them, so I'd be interested in hearing more. Margaret, do you want to take that last question?

I think this goes to the title of the session, which is infrastructure. This has been a test run for us over the past, what, six months that we've been working on this? Eight months. We're interested in having more protocols, so that we're not doing one-offs, like Stanford with a single organization, that we then can't talk about and that stay private,
We need more consistent ways to set up data sharing agreements that are as ethical, public, and interoperable as possible. Having gone through some of those relationships over the past year, I think we've learned enough to start standardizing. Hopefully the people in this room who are working on data projects, who have data to share or who could benefit from data, can think with us about a shared repository, shared MOUs, or other agreements that bake in our ethical concerns and protect clients, but also allow for scrutiny; that balance between privacy and open scrutiny is possible. All of which is to say: the wrangling is happening, but we'd like it to be not just the two of us, not the four of us, but a wider community with shared resources and protocols.

And of course the Belmont Report is probably the thing hovering in the room, right? This sounds a lot like human subjects research, so all of the concerns around individual autonomy apply. The point is, this is not something social scientists haven't been struggling with for a long time. The scale and some of the tools are different, but the core principles of personal autonomy and informed consent are things we want to take very seriously. That's why, when we talk about data, your use might determine whether or not we store it at all: if there's no opportunity for someone to give informed consent to their data being stored or used in a future way, then we just can't have it. That's why I say we're going to look for the cases where we can find that consent, because there's a benefit to the community. But this is the tension with, I forget the exact terminology the Belmont Report uses, the second principle after personal autonomy, beneficence: there has to be some realizable benefit to the individual, and you can't be coercive about it. So it can't be a "you can only use this if you opt in" sort of thing; we don't want to set up that scenario. We want to make it so that it really is people, or organizations acting on behalf of those people, with their best interests in mind. And it depends on the stage of the work.
For this bit of work we were dealing with existing data, so these were not identifiable people with whom we were having interactions. That's a different scenario from what we're doing going forward, so obviously there will have to be new protections in place. You asked about data storage: we might not store at all if we can't do it responsibly. I don't know that we've answered your whole question, but to go on:

Additionally, programs are going to need to consider what their engagement letters look like, what the terms of service are for the software they provide to people, and what their privacy policies say. A lot of organizations are really looking at those privacy policies given what's going on in California and with the GDPR. This is a time to look at the data being brought in, how we can respect clients' rights, and how we can turn it into useful information to inform these projects.

I should be clear that one of the next steps is the explicit creation of responsible use constraints, so if you have ideas on what those look like, or should look like, please get in touch. This conversation is going on all over the place around data for good and data reuse, so it's worth not reinventing the wheel: there are resources out there, and people who have thought long and hard about this, and bringing them in for advice may shortcut the amount of wrangling and stressing and hair-pulling.

All right, I think we're going to go to the next slide, which maybe people have been waiting for: the game. It looks nice on your laptop or your phone, and there's the URL, learnedhands.law.stanford.edu. If you go there, there are a couple of ways you can get in: you can sign in with social media, or you can create an account right there. The idea is that we need a unique identifier for each labeler, so that we can look at the history of your votes and disentangle things if it turns out you're a really bad issue spotter.

I've sort of explained how we determine whether or not something is labeled. The cool thing here is the confidence interval we're calculating, which lets us present people with the questions that will get the most bang for the buck. Here's how it operates. Say no one has ever used the game: something is pulled out at random, and you're asked, is there a family law issue here, is there this, is there that. Then someone else comes by, and they're going to see the same one you just saw, because now there's potential bang for the buck in that one. If the first three or four people all say the same thing, then okay, that's clearly settled; if there's disagreement, we have to hold it open longer. The exact mechanics come from the Wilson confidence interval, but the point is that we're not going to spend more time labeling things than we need to, and we're not going to trust one person to do the labeling. And then we will go back and evaluate how this method performs against sitting down with some experts and having them do the labeling.
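A minimal sketch of that stopping rule, assuming a 95 percent interval and a simple majority threshold (the game's actual thresholds and tie-breaking rules aren't spelled out here):

```python
import math

def wilson_interval(yes_votes, total_votes, z=1.96):
    """Wilson score interval for the true 'yes' rate on a label,
    given yes_votes out of total_votes. z=1.96 gives ~95%."""
    if total_votes == 0:
        return (0.0, 1.0)  # no votes yet: maximally uncertain
    p = yes_votes / total_votes
    n = total_votes
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return ((centre - margin) / denom, (centre + margin) / denom)

def is_settled(yes_votes, total_votes, threshold=0.5):
    """A label is settled once the whole interval clears the
    threshold; until then the question stays open for more votes."""
    low, high = wilson_interval(yes_votes, total_votes)
    return low > threshold or high < threshold

# Three unanimous 'yes' votes aren't enough at 95% confidence, but a
# fourth settles it: roughly the "three or four people" behavior
# described above.
print(is_settled(3, 3))  # False
print(is_settled(4, 4))  # True
```

Holding a question open while its interval still straddles the threshold is also what makes showing that question to the next player the highest-value use of their vote.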
And it's a lot of fun. You can try to be Margaret's top scorer, you can earn badges, you can read interesting legal advice posts. There's a whole community of folks who read the legal advice questions on Reddit just for fun, so here's an opportunity to do that and do a little good at the same time. People are getting interesting questions already; it's great for your commute, if you take the train.

I just want to reinforce that we're really excited about this crowd approach. It's been used in all kinds of other crowd science projects that you might have read about in the popular press, where lots of people, especially people within a community like the lawyer community, come together to apply their knowledge to train these models, hopefully for the public good. So be part of our community; we'd really appreciate it. You can tell from our discussion that we're in the middle of a lot of big questions, planning, and thinking about how to build more of this infrastructure and more of these protocols. The floor is open for more questions or thoughts before we break for lunch.

How much capacity is there to make adjustments to the current NSMI version 2? And at this point, is it just you and David and Sart deciding whether or not you take our suggestions?

Well, if you want to get at the top-level categories, that's a little bit harder, but the subcategories are still malleable; that's why we're presenting it now and asking for your feedback. Here's what we've done so far. At Stanford I have a team of about ten folks, and three of them have been working with me on the deduplication and the basic cleanup. Then we create a draft, and we've been talking with experts in Hawaii and Alaska, as well as folks in Maine and Michigan, where we show them the draft and spend thirty minutes or an hour on the phone asking: does this fit the issues you see, and is this type of issue at the right level of grouping and the right level of hierarchy? So we have these qualitative discussions. Yes, we are making the decisions, but we are opening it up and actively looking for people to give us feedback, and we're happy to talk more.

This is more of a comment, but from an Australian perspective I just wanted to congratulate you. This is so exciting to me; it's part of the reason I came to this conference. There's a lot of complexity in this, and the questions being asked here are so important, but I also wanted to give you the optimistic view: I think this is so exciting, and you should all be really proud that this community has come up with this initiative. From an outsider's view, I'm really jealous.

A lot of it has to do with support from Erica and her team at Pew; they really made it possible and moved really quickly, so it's wonderful to see that kind of agility in the funding sphere as well.

I just want to underline one thing that maybe we haven't underlined enough, which is that, in sum, there are a lot of benefits to building these tools, but these are infrastructure projects. The classifier is one of them: it's a cool thing to have an issue spotter, but it's also a Trojan horse to drive adoption of NSMI version 2. Imagine a situation where everyone is actually labeling things with the same names. That means you can start to discover resources and refer people to places in a meaningful way.
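A minimal sketch of that payoff, with made-up codes, organizations, and URLs: once everyone tags resources with the same NSMI-style codes, the output of any issue-spotting classifier becomes a lookup key across the whole ecosystem.

```python
# Hypothetical resource directory keyed by shared NSMI-style codes.
# Every code, organization, and URL here is illustrative only.
RESOURCE_DIRECTORY = {
    "HO-01": [
        {"org": "Example Tenant Rights Project",
         "url": "https://example.org/eviction-help"},
    ],
    "FA-02": [
        {"org": "Example Family Law Clinic",
         "url": "https://example.org/custody"},
    ],
}

def refer(nsmi_codes):
    """Map classifier output (a list of codes) to matching resources."""
    return [resource
            for code in nsmi_codes
            for resource in RESOURCE_DIRECTORY.get(code, [])]

# A question classified as a housing issue routes to every
# organization, anywhere, that tagged its services with that code.
print(refer(["HO-01"]))
```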
So there are a lot of network effects that build on the adoption of these tools. Even if you're not going to use the classifier, it's a tool that will help spur that adoption. Erica and I wrote a paper a while back on data standards and how you get them. Everyone wants to use the data standard everyone else is using, but what if no one is using a standard yet? Usually what happens is that a big, market-dominant player comes along, and the way they classify data becomes the de facto standard because they're the eight-hundred-pound gorilla, and that player usually has profit incentives. I think the really exciting thing here is that this community is the one setting the standard: we're setting the priorities in conversation with our clients and the people who are affected. We're going to make a hundred different mistakes, but hopefully we'll make mistakes that err on the side of our clients, and I just think that's a beautiful thing. So adopt the standards, even if you don't want to use the fancy, shiny machine learning stuff.

Last chance: going once, going twice. All right, thank you so much. Oh, John, sorry, you got in under the buzzer.

This may be silly, but have you got commentators from both community property and non-community property states?

If anyone wants to volunteer, I have lots of copies of the family law taxonomy in full. All right, thank you, John. I think that's a great point as we get more into the subcategories, because at the highest levels it's easy to pretend that we have consistency across the country, but as we get into the details, clearly we need that input. This is one of the challenge areas of having a taxonomy that is streamlined but still covers fifty states, community property included; just because we're at Stanford doesn't mean we're focused only on California experts.

Also, I like to think of the taxonomy sometimes not as legal issues, because that always implies legal remedies, but as factual patterns. You might have something that everyone would agree is a particular fact pattern, but in one locality there's no remedy for it. Hopefully this would still identify the fact pattern. That's also why this is not the unauthorized practice of law: it's not applying the fact pattern to the law, it's just trying to figure out what's there. Can we sort of squint and get it to the right place? And hopefully that right place also involves an attorney.

You'll see, once we get down to the specifics, if you want to look at the detailed taxonomy and give me feedback, that most of our new taxonomy terms are not about legal remedies; they're not about a D-4 or a certain statute. They're written from the user's perspective: "how do I get an uncontested guardianship of a child in my state," or "how do I contest a proposed guardianship." They're written more from what people want than from what a lawyer would suggest, because again, we're trying to match how people frame their problems with how we categorize them.

I think we're done. Thank you so much.