The broadcast is now starting. All attendees are in listen-only mode. Hello, and welcome, everybody. This is Sartreau from LSNTAP. We have a training here today on the Legal Help Issue Taxonomy and the National Subject Matter Index. The National Subject Matter Index has been around for over 15 years, and we're really excited to have two wonderful people here who are looking at how we can improve and update it. We're going to get started in about a minute. Two quick announcements: in the chat area I'm dropping a link to our next webinar, which is on case management systems and is coming up soon. We're also recording this, so if you want to go back and watch it later, it will be posted to our YouTube channel afterwards. We've got about 200 webinars on legal services technology up there, and they're all free and open to the public. One last thing: we're going through a redesign of our website, so if you check it out a week or two from now, you'll see that we've moved over to the DLAW template. If you have any trouble finding anything, please let me know. Hi, everyone. We're really happy to be talking to you today. Let me go back to my screen.
So I'm hoping everyone can see my presentation. David and I will be talking about a project we're working on jointly, starting this summer and hopefully continuing through the next school year, all about how to develop machine learning models to classify legal help posts and legal help resources online. We'll give a little background about our project and our motivations. Most of today's talk, and most of what we'll ask your feedback on, is how to adapt the National Subject Matter Index (NSMI) as a taxonomy of legal help issues so that it's more flexible and applicable for machine learning purposes. We'll talk about how we're approaching taking NSMI version one, cleaning it up, merging it with some other relevant taxonomies, and reviewing it to make a version two. We'll be asking for your involvement in reviewing the draft we're creating, as well as your feedback today about other concerns, principles, or strategies for making a really useful version two of a legal help taxonomy. That's the basic structure of today's talk. I wanted to give a little background before I hand it over to David to talk more about our machine learning project and our motivations. First, we really appreciate that the National Subject Matter Index exists. It's a wonderfully comprehensive list of terms and issues that are common, particularly for civil legal aid purposes, and I know that many groups have used it to structure their website navigation and to encode what they're working on for billing or record-keeping purposes. But we've approached it for a machine learning purpose, where our goal is to label posts from our data sets: from Reddit's legal advice board, from ABA Free Legal Answers conversations in their virtual clinic between volunteer lawyers and litigants, and from many court self-help resources and other public legal help resources that we've scraped from online.
We haven't found the terms in NSMI very easy to use when doing this labeling. That's where the motivation for this project comes from: to think about what a better taxonomy would be, one that would help us label for machine learning purposes. So I'm going to hand it over to David to explain a little about what we mean when we say machine learning for legal help, or machine learning for access to justice. I'll click through the slides, David, if you can give a little more description. Sure, let me just make sure I can see everything here on my machine. Great. So the thing we're working on here has three pillars that come together. We've got access to justice issues, which we're all very familiar with, and the question is whether there's anything machine learning can do to help with them. Machine learning is great, except that it needs well-labeled data. In order to "learn" something, it needs a lot of examples of that something. So the question is: can we find that labeled data anywhere? Unfortunately, for the use cases we're looking at, the answer is no. There isn't a body of data out there containing people's lay legal questions tagged with issues, saying this person has this issue or that issue. That type of training data doesn't exist. So what we're trying to do is make that data, so that we can then train machine learning models to do this issue spotting. So what are we talking about? If someone has a question, can a machine do some issue spotting on it?
So, moving on to the next slide. The way we're looking at doing that is to gather data sets of these lay legal questions, and we're also interested in professional statements of issues. The idea is that eventually we'll create a version two taxonomy that incorporates pretty much every issue you can imagine. Some buckets might hold a lot of issues, but the idea is that we group issues into buckets that cover everything comprehensively, even if some buckets are large or overlap. What that then allows us to do is this: if someone says, "I'm having this problem; my landlord is doing X, Y, and Z to me," our machine learning model could say, "That looks like a housing issue." It could also look at a court's website and say, "That page looks like a housing resource," and then say that this person should go to this resource. Or we might work at the population level: look at all the questions coming in to an agency and make some statement about what services are needed, for example, "We see a lot of housing." At the end of the day, though, this model is only going to be a best guess. It's not a robot lawyer. It's not some Terminator coming to take people's jobs. It's trying to make it a little easier for people to get the information they need by doing that first bit of issue spotting when they don't know what to call their problem. They don't know that this is trusts and estates. They don't know some bit of nomenclature.
Because they don't know it, they can't search for it, but if they just describe it naturally, we'll be able to connect them. So what we're doing is taking examples from the wild of people asking questions, primarily ABA Free Legal Answers questions and Reddit legal advice questions, and then using a tool to crowdsource the labeling of those questions. Later on we'll give you a link to that tool so you can actually help us do some of that labeling. There's a nice little picture of an old version of the tool; it actually looks a lot prettier now. Once we've labeled this data, the idea is to release part of it to the public so that other people can train machine learning models on it. What that's going to look like is a question from John Doe with labels: this has a housing issue, this has a family law issue, this doesn't have a torts issue, or whatever the case happens to be. Then other people can train models on that as well. Of course, we're limiting what we'll be able to share, because we only intend to share labeled data for the Reddit questions. That data has two really important properties. First, those questions are already available on the web, visible to everyone in the world. Second, the moderators at Reddit do their best to strip identifying information from the questions that come in. So there's the expectation... Can you explain what the Reddit data you have is? I know a lot of people aren't familiar with the Reddit forum and its millions of users. Yeah, we have a slide here that has some examples. One back. Yeah, my computer is lagging.
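The labeled output described above can be pictured as one record per post with a yes/no answer per issue. The sketch below is only an illustration: the field names (`post_id`, `source`, `labels`) and the label strings are assumptions, not the project's published format.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical record layout for one labeled post; field names and label
# strings are illustrative assumptions, not the project's actual schema.
@dataclass
class LabeledPost:
    post_id: str
    source: str  # e.g. "reddit"; only public-forum posts would be released
    text: str
    labels: dict[str, bool] = field(default_factory=dict)  # issue -> present?

    def positive_labels(self) -> list[str]:
        """Return the issues marked as present, sorted for stable output."""
        return sorted(issue for issue, present in self.labels.items() if present)

post = LabeledPost(
    post_id="example-1",
    source="reddit",
    text="My landlord kept my deposit and my ex won't pay child support.",
    labels={"housing": True, "family": True, "torts": False},
)

# One JSON object per line (JSON Lines) is a common way to share such data.
line = json.dumps(asdict(post))
print(post.positive_labels())
```

Keeping an explicit `False` for issues that labelers rejected (here, torts) preserves the "this doesn't have a torts issue" information rather than treating it as unknown.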
There it is. So if you don't know Reddit, it's a website with discussion forums. In a subreddit, which is a portion of the website devoted to one topic, in this case legal advice, people will ask quasi-legal or legal questions. Now, this gets touchy, and a lot of people have strong feelings about it. We are not interested in the answers they get. For our purposes, what we're interested in is what questions people ask: when someone has a problem, how do they phrase that problem? So they might come in, and you see here: "I did not get the job I wanted because the owners thought I was gay, but they did offer me a different position. Is this legal?" Then they might add information in a longer part of the post where they talk about the issues. So this is an employment issue, and the question is what we can glean from that based on the post. The expectation of all the users making these posts is that anyone in the world can read them, and the moderators additionally go in to make sure there's no identifying information. That's the only reason we're comfortable sharing those back out with labels attached. We have another source, which is ABA Free Legal Answers. These basically operate like a lawyer-for-the-day situation: someone writes a question and posts it to the website, and then the ABA tries to match that question with an attorney. Unlike Reddit's legal advice forum, where it's primarily not lawyers involved, these are confidential client communications. So we had to work out a deal with the ABA under which we protect those confidences.
So we're never going to expose those questions to the public, but we can use them as part of our labeled data set to help train our classifier, which will make it better than it would be if it were trained on the Reddit data alone. So we'll share as much data as we can, and then we'll share the results of everything we're able to train. The main point is that the population asking ABA Free Legal Answers questions is hopefully going to be more representative than the population asking Reddit questions. Reddit questions skew toward the demographics of the website's users, who tend to be young, male internet users, and that's very different from the general population. So anything we can do to diversify our sources of questions gives us a more representative sample. It's also important to realize where this might be useful beyond taking someone on a website and directing them to resources. In the ABA context, you can imagine using it to help connect people directly to the attorneys they need. One of the problems with systems like these free legal question-and-answer services is that people ask a question and then that question never gets claimed by an attorney.
So it might never get answered, and part of that is just a matching problem. If we can make a guess at what practice area a question requires, we can do a better job of matching people to the attorneys who might be able to answer them. So those are the sources, and that's the motivation behind what we're doing. The other thing to mention is the labeling. The labeling is going to be done by a crowd that's mostly law students and lawyers, basically anyone who can access the website we've put together. So the taxonomy also needs to make sense to them: when they read a question and see a label like "family law," they need to be able to ask themselves, "Is this a family law question?" and answer it. That's one of the places where we want your feedback on the labels we've already identified: do they make sense as something you would feel confident labeling? And to be clear about how we take this information in: the reason we can get away with not vetting every individual who comes to the website and labels things is that we've developed a method for reaching consensus on whether or not a text has a certain issue. It's similar to people voting on it, but it's not a straight majority-rules vote. Basically, after enough interactions have occurred, such that some number of people have said "yes, this is something," we're able to say, "OK, now we're going to count that issue as authoritatively spotted." So we're taking advantage of the wisdom of the crowd to label this data. And then the idea is to share it with the greater community. So yeah, the ultimate goal would be to take the labeled data, release it to the community, and also try to
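The consensus rule is only described loosely in the talk ("not a straight majority-rules vote"), so what follows is a swapped-in stand-in, not the project's actual method: a minimal sketch that keeps showing a post to labelers until a Wilson score interval on the share of "yes" answers lands clearly above or below one half. The function names, the 95% z-value of 1.96, and the 0.5 threshold are all assumptions.

```python
import math

def wilson_bounds(yes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the share of labelers who answered 'yes'."""
    if total == 0:
        return 0.0, 1.0  # no votes yet: anything is possible
    p = yes / total
    denom = 1 + z * z / total
    center = p + z * z / (2 * total)
    spread = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (center - spread) / denom, (center + spread) / denom

def consensus(yes: int, no: int, threshold: float = 0.5) -> str:
    """Return 'yes', 'no', or 'undecided' (keep showing the post to labelers)."""
    low, high = wilson_bounds(yes, yes + no)
    if low > threshold:
        return "yes"
    if high < threshold:
        return "no"
    return "undecided"

print(consensus(10, 0), consensus(1, 0), consensus(5, 5), consensus(0, 10))
```

Unlike a fixed majority vote, a single "yes" is not enough here: the interval is wide for small vote counts, so the post stays "undecided" until more labelers weigh in.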
build some machine learning models and classifiers that do that automatic classification, both of posts made on Reddit, ABA Free Legal Answers, or other lay forums, and perhaps of the resources available on a state court's self-help center website or a statewide legal help website. The goal is that we can better identify those resources for search engines, and perhaps serve a Rosetta Stone-like function that pairs people asking questions with free, jurisdiction-correct public resources, even before a lawyer or other expert can review their questions, connecting people better to the resources that are already out there. David, do you want to talk a little about model building? Just as a final point, yeah. What we're looking for here, for the most part, is just volume of labeled data. The more data we get, the better the models we'll be able to create. The initial models we create are going to be proofs of principle to show that this can do something. Our hope is that by putting some labeled data out there, other people will be able to use it to train models and maybe figure out things we weren't able to. We also hope to serve as a trusted curator of data from more places than just public sources.
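The talk doesn't name a model, so as a hedged illustration of what a first proof-of-principle classifier trained on labeled posts could look like, here is a tiny multinomial naive Bayes written with the Python standard library only. The training sentences are invented toy data, not project data. It also picks a single label per post, whereas the real task is multi-label (a post can raise both a housing and a family issue), which would need, for example, one yes/no classifier per issue.

```python
import math
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-letters; a deliberately crude tokenizer."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing, one label per post."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayes":
        self.label_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy examples for illustration only.
texts = [
    "my landlord is trying to evict me",
    "landlord will not return my security deposit",
    "my ex stopped paying child support",
    "how do I file for divorce and custody",
]
labels = ["housing", "housing", "family", "family"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("the landlord kept my deposit"))
```

A baseline this simple mainly shows why volume of labeled data matters: it can only use words it has already seen attached to a label.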
So hopefully we can get other partners to share data with us. We can then use that to train models on a larger collection of data and share those models out without violating the confidences of the individuals involved, the same way we're handling the ABA data. In those cases the labeling obviously wouldn't be done by the public; it would be done by people in those organizations or people covered by their confidentiality obligations. But the idea is simply: the more the merrier. And of course, one of the things we're committed to on this is open-sourcing all of the results to the extent we can, taking client confidentiality into account. Yeah, so this is part of a larger project. The goal really is to figure out how to make this labeled data, or this natural language processing classifier, as relevant and problem-solving as possible. One of our big goals is to get at least to the point where we can classify the existing court self-help and statewide legal help resources with these taxonomy codes and refer back to them as a standard, especially when it comes to Google, Yahoo, Yandex, or other search engines. That way we can tell search engines exactly what legal issue, or what type of resource for a legal issue, is present on a given court or legal aid website, using schema.org metadata, which is a whole other topic. The point is that we'd have a standard way of identifying what legal issue is being discussed, whether it's for referring people directly or for presenting that knowledge to search engines so they can connect people directly to the right type of resource. Our big goal is to make the internet a smarter place, with better referrals and better connections for people to the right jurisdiction and the right legal issue. That's one of our end goals. We had a wonderful group here at Stanford last July that helped us start thinking about that kind of schema.org markup, and how
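To make the schema.org idea concrete, a help page could embed a JSON-LD description of the legal issue it covers. This sketch uses real schema.org vocabulary (`WebPage`, `about`, `DefinedTerm`, `termCode`, `inDefinedTermSet`, `spatialCoverage`), but the choice of those properties, the term code, and the example.org URLs are assumptions for illustration; the talk does not specify a markup scheme, and the helper function name is hypothetical.

```python
import json

def legal_help_jsonld(url: str, title: str, term_name: str, term_code: str,
                      taxonomy_url: str, jurisdiction: str) -> str:
    """Build a JSON-LD block describing which legal issue a help page covers."""
    doc = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "url": url,
        "name": title,
        # The taxonomy term the page is about.
        "about": {
            "@type": "DefinedTerm",
            "name": term_name,
            "termCode": term_code,
            "inDefinedTermSet": taxonomy_url,
        },
        # The jurisdiction the resource applies to.
        "spatialCoverage": {"@type": "Place", "name": jurisdiction},
    }
    return json.dumps(doc, indent=2)

snippet = legal_help_jsonld(
    url="https://example.org/eviction-help",
    title="Responding to an eviction notice",
    term_name="Eviction",
    term_code="HOUSING-EVICTION",  # hypothetical code, not an NSMI code
    taxonomy_url="https://example.org/taxonomy",
    jurisdiction="Hawaii",
)
print(snippet)
```

The resulting string would typically be placed inside a `<script type="application/ld+json">` element in the page's HTML, where search engines that read structured data can pick it up.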
to start tagging up issues. That was one of the origins of this project. One of the other things we're thinking about for later is what other interventions we could deploy using the power of these machine learning models. We're not at that point yet. We know there are lots of ethical and legal professional issues with putting, let's say, a bot on Reddit or Twitter or other places where people are expressing their legal help issues, but we're very interested in what an ethical and supportive way to use this ability to classify issues for people online could be. With that, I want to go back to the taxonomy and how we could think about revising the National Subject Matter Index to be better infrastructure for this entire project and for future machine learning classification and semantic web projects. Again, I'll be asking for your feedback in a little bit. I'd love to hear if any of you have tried to use NSMI for any purpose and where you felt any friction or frustration.
I'll say a bit about where I've experienced some frustrations in trying to make it work for our project, and then about some of the revisions and next steps we've started on and would love your help reviewing. There are three key things we've been looking for from a taxonomy, particularly from NSMI, to serve this machine learning purpose. The first is that any term within the taxonomy should be clear: when we present a term to a law student or a lawyer and ask, "Does this issue appear in this Reddit post or this ABA post?", pretty much anyone would consistently understand the same concept from that term. The second, as David mentioned earlier, is that any legal issue we see should be included in some family within the taxonomy, so that there are no outliers or orphans that don't belong somewhere. The final one is balance: there should be enough families to cover all of the many, many legal issues out there, but the families shouldn't be so fractured, or the parent (head category) labels so overbundled, that they lose their clarity. Let me give some examples of what I mean. For that first standard, clarity, we want relative consistency among people who are semi-expert: not practicing benefits law or family law every day, but with at least a 1L or bar exam level education in most of these topics, so that when they see a term, they know how to read it and how to apply it. Right now, many NSMI terms are clear only when you see their full categorization. For example, if you take a given term out of NSMI, often it's just a single phrase or a single noun, like the term "marital status."
If we showed a taxonomy term from NSMI to most people in a labeling context, like "Read this Reddit post. Do you think there is a legal help issue regarding marital status?", it's a little confusing, because the term is so bare. Only if you see the full lineage of the term, that it belongs in the category Work > Discrimination at work > Marital status, can you understand that it refers to being discriminated against at work because of marital status. Because of the way NSMI was originally set up, you can't really read a given term unless you know its whole lineage. So we're thinking about how to make these terms clearer and closer to how both laypeople and professionals actually describe their issues and resources, so that it's easier for labelers to apply them consistently. The second factor we were looking for was inclusion: that nearly all of the issues people present, which we as lawyers recognize as possible legal issues, have a place in the taxonomy and belong to some family. But what we see right now as we try to apply NSMI, especially to the Reddit data set, is a lot of very recent, 2010s-era issues around online activity, such as sexting and other activities that might not have been common when NSMI was created 10 or 15 years ago. There are also a lot of issues that are not necessarily the bread and butter of civil legal aid groups, like torts.
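Rewording terms is the main fix discussed here, but a cheap complementary trick while that work is under way is to show labelers each term's whole lineage rather than the bare leaf. A minimal sketch, assuming a simple child-to-parent mapping; the `PARENT` table is a hypothetical fragment, not actual NSMI data, and the code assumes the hierarchy has no cycles.

```python
# Hypothetical fragment of an NSMI-style hierarchy: child term -> parent term.
PARENT = {
    "Discrimination at work": "Work",
    "Marital status": "Discrimination at work",
}

def lineage(term: str, parent: dict[str, str]) -> list[str]:
    """Walk up from a term to its top-level family, returning root-first.

    Assumes the mapping is acyclic; a cycle would loop forever.
    """
    path = [term]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return list(reversed(path))

def display_label(term: str, parent: dict[str, str]) -> str:
    """Label shown to a crowd labeler: the whole path, not the bare leaf."""
    return " > ".join(lineage(term, parent))

print(display_label("Marital status", PARENT))
```

With co-parents, a term would have several such paths; showing all of them is one option, though that is beyond this sketch.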
There are a lot of torts on Reddit, and right now the torts categories within NSMI are pretty bare-bones. There aren't many subcategories or much fleshing out of issues around problems with neighbors, problems with strangers in your life, accidents, or other kinds of disputes. So we need to do a lot more bulking up to make better homes for issues that aren't currently prominent within NSMI. The third factor is keeping balance: having a good, comprehensive set of parent families (top-level categories) without mashing them together too much, so that it's clear how to categorize things but there aren't millions and millions of parents. Right now NSMI has a pretty sprawling set of parents. At the top level, categories like bankruptcy, taxation, and consumer are all separate, and benefits are split into different parent categories, so Social Security and Social Security disability benefits are totally separate categories in NSMI. If you go to NSMI's website, you'll see a really long list at the parent level, which makes it a little unapproachable. So we've been thinking about how to make some marriages without over-mashing those categories, trying to do that as sensibly as possible while still keeping really distinct sub-level categories. Let me walk you through how we've been approaching cleaning up NSMI and preparing an NSMI version two. We've gone through two phases of refinement.
The first was a bit of brute force: cleaning up some of the initial parent categorizations and separating out non-legal-help issues from NSMI. The second has been more detailed, term-by-term refinement. In our first pass of revisions, we brought in some other wonderful taxonomies. Illinois Legal Aid Online had made a very comprehensive taxonomy of the issues presenting through their website and traffic in Illinois, I think mainly for web navigation, though please correct me if I'm wrong, and they've done a lot more full phrasing of the issues: not single terms, but phrases expressed the way a person would express the issue. That was very helpful. Pine Tree Legal Assistance also shared their web navigation taxonomy. We brought all those taxonomies together into one giant spreadsheet of over 3,000 terms. Then we went through, again with brute force, removing non-help issues from NSMI, streamlining topics into similar parents, and prioritizing which parents to focus on. That meant combining the taxonomies and laying out all the top-level categories that currently existed. There were a lot of things inside NSMI that were more about the administration of a legal aid group or nonprofit than about the legal help issues of laypeople.
So we took out all the categories about fundraising, technology for legal aid, management and operations, dealing with LSC, and communications and marketing. Of course, they're still preserved, but NSMI version two is going to be more focused on the help issues people express. Sorry, computer lag. Then we started streamlining into common parents. We looked at all the top-level categories, especially after merging multiple taxonomies, and asked how to put them together so that we don't have 50 or 60 top-level categories but a more streamlined list, which I'll show you in a little bit. We looked at how we could put, say, Social Security into a benefits category, or public utilities into a housing or consumer category. We also moved to a system of co-parents, so that a subcategory doesn't have to have a single parent, because we realized there were so many cross-cutting issues that could be categorized as benefits, as work, or as money. So a subcategory like workers' compensation can have multiple parents: it could be classified as a work issue, a money issue, or a benefits issue. It can live with multiple co-parents, and that's okay. I think that helps reduce a lot of the term proliferation in NSMI. We also tiered the parents, so that our second round of refinement, cleaning up term by term, goes by priority. We're starting with the highest-traffic families.
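The first-pass cleanup and the co-parent idea can be sketched together: drop rows whose parent is an administrative category, then merge a term that appears under several parents into a single term with a list of co-parents. This is a minimal sketch under stated assumptions: the `ROWS` and `ADMIN_PARENTS` data are hypothetical, the function name is illustrative, and the cap of three parents reflects the project's stated current practice rather than a hard rule.

```python
from collections import defaultdict

# Hypothetical rows from the merged spreadsheet: (parent, term).
ROWS = [
    ("Work", "Workers' compensation"),
    ("Benefits", "Workers' compensation"),
    ("Money", "Workers' compensation"),
    ("Benefits", "Social Security disability"),
    ("Fundraising", "Grant reporting"),  # administrative, not a help issue
]

# Parent categories treated as legal aid administration (illustrative list).
ADMIN_PARENTS = {"Fundraising", "Technology for legal aid", "Management and operations"}

def build_coparents(rows, admin_parents, max_parents=3):
    """Drop admin rows and merge duplicate terms into one term with co-parents."""
    parents_of = defaultdict(list)
    for parent, term in rows:
        if parent in admin_parents:
            continue  # set aside, not part of the help-issue taxonomy
        if parent not in parents_of[term]:
            parents_of[term].append(parent)
    too_many = [t for t, ps in parents_of.items() if len(ps) > max_parents]
    if too_many:
        raise ValueError(f"terms with more than {max_parents} parents: {too_many}")
    return dict(parents_of)

taxonomy = build_coparents(ROWS, ADMIN_PARENTS)
print(taxonomy["Workers' compensation"])
```

Three duplicate rows for workers' compensation collapse into one entry with three co-parents, which is how co-parenting removes duplicate terms without losing any of the original placements.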
This is not a final list; it's just a draft list of the different families. We also have a second-tier list of issues that were not as common, at least in what we were seeing on Reddit and ABA Free Legal Answers, and we'll clean those up after we get through the first tier of families. Sorry if my computer is lagging; just a second. So now we're entering our second pass of more careful refinement of NSMI, and this is where I'm going to make asks of many of you legal experts on the call. At Stanford we're drafting what these particular terms and categorizations could be, and we're hoping to have experts who are great at benefits, work, and consumer issues come through and help us make sure we're categorizing things correctly and not missing things you've seen. In this more detailed second pass, we're cleaning out duplicates, which of course exist because of the merger of different taxonomies. There were also a lot of terms that came up multiple times because they had been put into different parents, but with our co-parenting scheme we can remove many of those duplicates. We're also expanding the phrasing of terms: as I said earlier about the clarity of NSMI terms, we're building out from one- or two-word nouns to gerunds and semi-sentences that are more expressive and, hopefully, clearer about what a term refers to. And we've been doing a lot of streamlining of categories and subcategories so that the structure isn't so massive. We put all of the terms, still over two thousand of them, on one massive spreadsheet, and we're going through them basically line by line, asking: Is this a duplicate? Is it phrased clearly? And how do we clean up the categorization so that we have relatively consistent
categorizations at child level one and child level two? Just a little terminology: parents are the top-level families. We allow up to three parents for an issue; we could have more, but we haven't needed any four-parent issues. Then we have child level one, which is the first level of subcategory, child level two for the second, and usually the concrete issue presents around child level three or child level four. So we've been going through line by line and trying to streamline those categories while still keeping a home for everything. And then, sorry, my slide keeps slipping, we've been making those high-level parent categories. I'm happy to walk through them more deliberately, but I just have a GIF that shows you basically what the high-level categories are. We definitely have fewer than NSMI originally had. We have a few straggling issues that are hard to place, like FOIA, gun rights, or voting rights, which we put under government services. Most others we've been able to fit, in what we think is a really responsible way, into one of these high-level categories. Then we've been refining the child level one categories, the first level of subcategories, to streamline them so that we don't have 50 or 60 child level ones within a category like benefits or family, which is what we started out with, because there are a lot of small straggling issues. That's always the problem with legal material: there are always lots of edge cases and small issues.
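Streamlining child level ones is ultimately an editorial judgment, but a quick check over the spreadsheet can flag parents whose child level one lists have sprawled. A minimal sketch, assuming hypothetical rows laid out as (parent, child level 1, child level 2, ...); the sample rows and the default limit of 10 are illustrative assumptions, not the project's data or rule.

```python
from collections import defaultdict

# Hypothetical spreadsheet rows: (parent, child level 1, child level 2, ...).
ROWS = [
    ("Benefits", "Social Security", "Disability (SSDI)"),
    ("Benefits", "Social Security", "Retirement"),
    ("Benefits", "Food assistance", "SNAP"),
    ("Housing", "Renting", "Eviction"),
    ("Housing", "Utilities", "Shutoff"),
]

def child_level_one_counts(rows):
    """Map each parent to the set of distinct child level one categories under it."""
    children = defaultdict(set)
    for row in rows:
        parent, level_one = row[0], row[1]
        children[parent].add(level_one)
    return children

def oversized_parents(rows, limit=10):
    """Parents whose child level one list has grown past `limit`, sorted by name."""
    return sorted(p for p, kids in child_level_one_counts(rows).items() if len(kids) > limit)

print(oversized_parents(ROWS))           # nothing flagged at the default limit
print(oversized_parents(ROWS, limit=1))  # a tiny limit, just to show the check firing
```

A flagged parent is a prompt for review, not an automatic fix: small orphan issues can still live under the parent while the child level ones are regrouped.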
So that's fine; all the small orphan issues can still live within the parents, but generally we try to have fewer than 10 child level ones within a given parent. Where we're at right now is going through line by line. We're making steady progress, and what I'm going to ask your help with, hopefully, is review from the expert community. We're doing our best at Stanford to make this draft, but we really want to make sure we're doing a responsible and legally accurate job, and that we're not violating any of those principles: clarity, giving every term a place, and neither over-mashing parents nor over-categorizing. So here's where I'll stop, with our asks. If you're interested in this and have some expertise in civil legal aid issues (we're keeping criminal to the side), and you sign up to help review our draft, we'd really appreciate it, and we can talk with you about that. I'd also love to hear whether you've tried to use the National Subject Matter Index, whether you've had frustrations, or whether there are other principles you think we should take into account, along with any other recommendations or pushback based on our presentation today. Our final ask is to come try our labeling tool. It's actually really helpful for reflecting on what makes a good taxonomy when you try to apply the terms directly to people's search queries and posts. So I'll stop there. David, I don't know if you have anything to add, but I'd really love to hear any questions or feedback from the participants. Yeah, I just want to second that. If you try out the tool at the URL below, learnhands.law.stanford.edu, it can make this a lot more concrete. I know I was going through and I got to something that was an IP issue, and I thought, "Where does IP fit?" and I couldn't see how it fit.
So that real interaction can help a lot. I guess this is where we open it up to questions. Is that how this works now? Definitely, and please, if you have any questions, feel free to type them into the question box or use the raise-hand function, because we can unmute people. So what type of timeframe are you looking at for this project? When do you want this first-level feedback by? Well, we're going through the main families. We have benefits, work, and money, debt, and consumer almost ready for review, and then we'll be doing housing, torts, and family, I think. Of the high-level categories, we have about 10 that we're really looking for people to review. Even if it's just a review of the top subcategories, it doesn't have to be a long, exhaustive spreadsheet experience. We're really just interested to know whether we're over-categorizing or missing some big issue areas, so even a 10-minute review would be really valuable expert input. And to do that review, do they just go to learnhands.law.stanford.edu, or how do they do it? By emailing me. We're doing the taxonomy refinement privately; it's not on any public website, so it would just be emailing me. My email address is mdhagan@stanford.edu, and I can put it in the chat. I'll drop it into the chat right now. Perfect. They can reach you directly. Yeah. Oh, and timeframe-wise, how long are you taking feedback on this particular round?
So we'll see how much feedback there is, but we're hoping to have a pretty solid draft of child level 1 and child level 2 for the top issues within the next month. So we're hoping that people who have any time over the next few weeks can write us back. We can always add more issue areas in; we can be flexible. But we're at least trying to get that basic infrastructure in place so that as we start labeling, we're doing it in a legally correct way.
Yes. Also, we don't want to start labeling and then realize, oh my goodness, we've been labeling things that we need to split into two, because then we'd have to go back and redo all of those labels.
So Bill Jones from ABA has a question here: "How does it work? Does this tie into the Microsoft portal project that Hawaii and Alaska are doing at all? Is this parallel with that, or interacting with it?"
So the goal is that the taxonomy can be a shared resource across many of these projects. Especially with the portal project, which is going to be a very intelligent technology, it needs to label consistently, in the same way, all of the resources it's directing people to, as well as the search queries and other needs that people present on the platform. So the taxonomy we're building for our project, we're sharing with Microsoft, ABA, and Pew, with the goal of building an ecosystem that refers to issues consistently, in the same way. So this will all feed into the portal project as well.
Yep, and just to make sure people know: Microsoft donated a significant amount of time to help develop an online triage and issue-spotting tool that is currently being developed for those two areas, Hawaii and Alaska, and it is all on open-source technology. We had a talk for the tech fellows about it, but early next year we will do a webinar highlighting where that project is.
We should say that Pew is supporting David's and my project on machine learning, so we're all connected.
Yes.
So, is Mark Lords in here? "Might the legal publishers like West be helpful, with their own taxonomy experience and assets, key numbers, etc.? What is being done to look at those pre-existing things?" Just like to say hi, Mark.
So, I don't know, Mark. Margaret's more in the driver's seat on the taxonomy. I think the main thing here is we're looking at... yeah.
Oh, I thought the question was for Mark, but it's from Mark. That's great.
Yes, I would say the more taxonomies the merrier. It'll add to our work, but it'll be healthier in the end. So if you have any taxonomy, whether it's from a legal publisher, a legal aid group, or a legal help website built for navigation, please send it to us. At the very least we can match it against ours to see whether we're covering the same issues you've been covering and whether we're categorizing them consistently. Anything that helps us confirm our taxonomy is accurate, or raises suspicions that it isn't, is useful. Send us your taxonomies. We want them, even if it's just your website navigation order.
It will really help us.
So that's to say: if you happen to be on the call, you're from one of those companies, and you're willing to share...
So where does end-user, or client, testing of this work fit into the project?
So there's the taxonomy part and then the classifier part, and they're a little bit separate. For the taxonomy part, we're hoping that after we get the expert review, we can do some card-sorting exercises. But this will be more for the labelers: if we give a law student or a lawyer a given child term and ask them to categorize it, we want to see that they consistently put it in the same category we expect. That way we can see whether our target labelers are labeling in the clear way we're hoping for. So we're going to be test-running the taxonomy from a law student and young lawyer perspective. When it comes to the fruits of all of this labor, that's when we think about lay people. Then we'll be testing more of the actual interventions: how we could use classifications to connect people to resources. And then it will be about: Is this trustworthy? Is this accurate? Does this help you? More of the usability and engagement questions.
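The card-sorting check described above comes down to a simple agreement score: for each child term, how often did participants place it under the parent the taxonomy expects? A minimal sketch, assuming placements are recorded as (participant, term, chosen parent) tuples and that 80% agreement is the cut-off for a "clear" term; the threshold and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One card-sort record: (participant id, child term, parent category chosen).
Placement = Tuple[str, str, str]


def placement_agreement(sorts: List[Placement], expected: Dict[str, str]) -> Dict[str, float]:
    """Fraction of placements that put each child term under its expected parent."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for _participant, term, chosen in sorts:
        totals[term] += 1
        if chosen == expected[term]:
            hits[term] += 1
    return {term: hits[term] / totals[term] for term in totals}


def unclear_terms(sorts: List[Placement], expected: Dict[str, str], threshold: float = 0.8) -> List[str]:
    """Child terms that participants placed as expected less often than `threshold`."""
    scores = placement_agreement(sorts, expected)
    return sorted(term for term, score in scores.items() if score < threshold)


# Hypothetical results from three law-student participants.
expected = {"Unpaid wages": "Work", "Security deposit": "Housing"}
sorts = [
    ("p1", "Unpaid wages", "Work"),
    ("p2", "Unpaid wages", "Work"),
    ("p3", "Unpaid wages", "Money, Debt and Consumer"),
    ("p1", "Security deposit", "Housing"),
    ("p2", "Security deposit", "Housing"),
    ("p3", "Security deposit", "Housing"),
]

print(unclear_terms(sorts, expected))  # ['Unpaid wages']
```

A term that lands below the threshold points either to a confusing term name or to a parent that overlaps with another one, which is the kind of problem the expert review is meant to catch before labeling starts.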
Yeah, and we're doing a robust cross-validation of the data as we build the model. We do have to assume that our labeling, once we have it, is the ground truth, which is why we've established a way for multiple people to come to agreement on whether a label is what we say it is. But once we have that ground truth established, there's a good amount we can bring to bear to measure how well we're doing. In fact, that's another one of the deliverables: the labeled dataset itself will be provided as a benchmarking tool. So if we accept it as ground truth and say we get X percent accuracy labeling these texts, that becomes a benchmark against which other people can come in with their own models, using that same dataset, to establish whether they've actually made something that improves on it.
So, from Jack Haycock: "This is really exciting. I'm curious how you approach categorization for state-specific law issues or terms, things where two states may be using similar or different terms, such as forcible entry or warranty of habitability, those types of things."
So right now we haven't gotten to that level of children. We've been cleaning up child level 1 and child level 2, which means the higher-level categories, like problems with a current job, problems with a former job, and problems with job applications. So we've been focused more on the context where problems appear, to make sure we're consistent there. Or, under Benefits, it's more about the different benefit programs you can apply for and what phase you're at in applying for them. So we haven't run into the state-specific things yet.
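The ground-truth-then-benchmark process described above can be sketched as follows: several labelers vote on each post, a post enters the benchmark only when enough of them agree, and any model is then scored against that fixed set. A minimal sketch, assuming simple majority voting with a two-vote minimum and plain accuracy as the metric; the project's actual agreement rules and evaluation metrics are not specified in this talk, and the post IDs and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def consensus_label(votes: List[str], min_agreement: int = 2) -> Optional[str]:
    """Most common label, if at least `min_agreement` labelers chose it and it is not tied."""
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    label, count = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == count
    return None if count < min_agreement or tied else label


def build_ground_truth(votes_by_post: Dict[str, List[str]], min_agreement: int = 2) -> Dict[str, str]:
    """Keep only the posts whose labelers reached consensus."""
    truth: Dict[str, str] = {}
    for post_id, votes in votes_by_post.items():
        label = consensus_label(votes, min_agreement)
        if label is not None:
            truth[post_id] = label
    return truth


def accuracy(predictions: Dict[str, str], truth: Dict[str, str]) -> float:
    """Share of ground-truth posts a model labels correctly; missing predictions count as wrong."""
    if not truth:
        return 0.0
    correct = sum(predictions.get(post_id) == label for post_id, label in truth.items())
    return correct / len(truth)


# Hypothetical labeler votes and model predictions.
votes = {
    "post-1": ["Housing", "Housing", "Work"],
    "post-2": ["Work", "Work"],
    "post-3": ["Family", "Housing"],  # no consensus: left out of the benchmark
    "post-4": ["Benefits", "Benefits", "Benefits"],
}
truth = build_ground_truth(votes)
model_predictions = {"post-1": "Housing", "post-2": "Family", "post-3": "Family", "post-4": "Benefits"}

print(sorted(truth))                                 # ['post-1', 'post-2', 'post-4']
print(f"{accuracy(model_predictions, truth):.2f}")   # 0.67
```

Because the benchmark set is fixed, a second model scored with the same `accuracy` call on the same `truth` gives a directly comparable number, which is the point of releasing the labeled dataset.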
We haven't made those difficult trade-offs or gotten into the nuances of state-to-state issues. But as we go into our next month of work, we're getting into the actual specific laws, programs, and benefit offerings state by state, and we can see that in the other taxonomies: the Illinois taxonomy, for example, clearly has a lot of Illinois-specific benefits programs and housing laws. So we're going to be reckoning with that soon.
I think what we're trying to do in our first pass is really to keep it national. So we're removing references to specific state programs or agencies, things like that. We're trying to still cover the actual experience of the person, or the need being expressed: needing this type of help, or looking for this type of relief. And then we have to think about whether we allow state-specific entries. If a state has a certain type of relief, benefits program, or offering that other states don't have, that's fine.
I think we need to include that rather than cut it out, so that labelers will be able to label consistently. But we're trying to take the state-specific terms out as much as possible unless they are truly unique. That said, that's why we need more expert review: as we make assumptions here at Stanford, we need them checked, so that when we take a very state-specific term from, say, the Illinois taxonomy or Maine's taxonomy, we know we're not overgeneralizing or misreading it. We try to research every single term that seems to be state-unique or program-specific, and we're trying to do the best job we can, but that's why we need you, Jack.
Just keep in mind the goal: we're trying to have a general taxonomy that individual situations can be pointed at, or mapped from. So the idea is to generalize across states, so it doesn't matter what you call something. The main thing is that you can see there's a mapping from one to the other.
Excellent. So Jack's very happy about that and excited to help out with it. There's another question here: "If this is used for search, or for a chatbot, or something else, how will it deal with the arcane legal terms? Is it going to return answers in plain language?" How does that work?
There are sort of two answers to that. We're using the taxonomy to label different datasets, and there are actually two types of datasets we're labeling. We're labeling laypeople's datasets, which just use how people actually speak, and we're labeling court resource datasets, which might include some of that arcane language. Basically, by training the model on the court dataset, we'll figure out the mapping from arcane language to the taxonomy, and by training it on the layperson dataset, we'll figure out the mapping from lay language to the taxonomy, which then gives us a kind of translation. Now, that's not to say we'll have a tool where you type in, you know, "This violates the rule against perpetuities," and it spits back out to the layperson, "Well, you see, the thing is, if you have a will and you just want to..." It's not going to do that. What it will do is say: within the taxonomy, this is where they both map. At the highest level, it can say, okay, this is Trusts and Estates, so they both come here. And again, we will probably never get to the level of specificity where we can say, "You have this question; here is the answer to your question." What we're trying to do is say, "You have this question; here's where you can find the answer to that question." So I hope that answers your question. I don't know if it did; I forget what the question was.
I think so. That is all the questions we have at this point. I just want to remind people that this has been recorded. We will put up a video on YouTube and put together a blog post about this, and we'll make sure the ways for people to contact you are available. Thank you so much for coming out. Any closing remarks or final things you want to say?
Well, just email if you have any questions, or really, do some labeling.
Yeah, do some labeling.
It's fun. We tried to make it as game-like as possible.
On that exact thought, give us feedback about the tool as well. Tell us what you like about it and what's fun. Eventually we might get to the stage where you can run competitions to see who can do the best labeling, so tell us everything that'll make it fun for you.
Yep. Excellent. Thank you, and we look forward to seeing more progress on this. We will stay in touch. Greatly appreciate it. Thanks.
Thanks, David. Thanks, everybody.
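Returning to the earlier answer about arcane legal terms: the "translation" works because a lay post and a court resource that map to the same taxonomy label can be matched even when they use different vocabulary. A minimal sketch of that matching step, in which a hypothetical keyword lookup stands in for the trained classifiers (the real models would be learned from the labeled lay and court datasets); the labels, cue words, and resources are illustrative only.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical cue words per taxonomy label. Lay words ("will", "landlord")
# and legal terms ("probate", "detainer") point at the same labels.
CUES: Dict[str, Set[str]] = {
    "Trusts and Estates": {"will", "wills", "inheritance", "probate", "trust", "trusts", "perpetuities"},
    "Housing": {"landlord", "lease", "rent", "evicted", "eviction", "detainer", "habitability"},
}


def classify(text: str) -> Optional[str]:
    """Stand-in classifier: label with the most cue-word hits, or None if none match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {label: sum(tok in cues for tok in tokens) for label, cues in CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def find_resources(post: str, resources: List[Dict[str, str]]) -> List[str]:
    """Titles of resources that share the post's taxonomy label ("here's where to look")."""
    label = classify(post)
    if label is None:
        return []
    return [r["title"] for r in resources if classify(r["text"]) == label]


resources = [
    {"title": "Probate and estate planning guide",
     "text": "How probate works, wills, trusts, and the rule against perpetuities."},
    {"title": "Responding to an unlawful detainer",
     "text": "Answering an unlawful detainer complaint filed by your landlord."},
]
post = "My grandma left me her house in her will, but my uncle says I get nothing."

print(classify(post))                   # Trusts and Estates
print(find_resources(post, resources))  # ['Probate and estate planning guide']
```

The post never says "probate" and the guide never says "grandma"; they meet only at the shared label, which matches the "here's where you can find the answer" framing rather than generating an answer.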