The email went out. Okay, Robert is joining. There's Robert. Hello. We can't see you, or maybe that's intended. Okay, pity.

From Europe, there's David. Hey, I'm here. Hey there. Hello.

Did you get the Google invite or only the email link? I just got the email. I thought my address was registered with Google, but maybe not. Yeah, it seemed to know you when I typed in your name, and it kind of looks like your picture, but hard to tell.

Well, we'll give it a couple minutes here; we've got some more folks coming. Boris unfortunately won't be able to make it. Amazon EC2 went down yesterday and Flipboard is still all hands on deck trying to get everything online. The cloud sounds like fun. Yeah.

All right, now we're showing two of you, Rob. Yeah, I've got my laptop connected without mic or video, and my phone connected, but of course the phone doesn't do video as well. Got it.

So I think Bail joined us. Howdy. And also Morten. Nice to meet you. We'll give it a minute; I know Randall's coming online here in a second, and also Adam and maybe a few others.

Thanks, Rob. I think you should actually go for a hundred-eighty-degree version.

So I think we're missing Adam. Randall, I think we've got another line coming in at the bottom here, it looks like. Yeah, it's trying.

So Google limits these hangouts to ten. It might have something to do with technical limitations, but probably it also has to do with the fact that having more than ten people in the discussion would be completely unwieldy.

Yeah, I'm just testing my microphone. Can you hear me? Yeah, absolutely. Loud and clear.

Morning, Randall. Hey. We're just getting started here; a few more people are lagging in, but we're close to a quorum.

Is it clear when I talk, or should I go get a microphone of some kind? Sounds pretty good to me. All right, cool.

Why don't we just go ahead, because he's pretty up to speed with the whole project and he can jump in if he shows up. So hey, thanks, everybody.
We really appreciate your time in joining us here at various times around the world. We'll just kind of do introductions; I think most of you have actually met each other, except for David and Morten Warncke-Wang, and I don't know if I'm pronouncing that right, so you're going to have to correct me there.

So hi, everyone. I'm Morten Warncke-Wang. I'm a PhD student at the University of Minnesota in the GroupLens Research group; John Riedl is my advisor, and I run SuggestBot, the task/article recommender for Wikipedia.

Fantastic. So the goal of the call here today, really, is... we've been talking about this for a long time amongst ourselves, and it's really a chance for us to listen to you guys. What I want to do is create as much time as possible for the folks that are here to listen and then weigh in with their different perspectives on this project.

To just kind of characterize the Hypothesis project: the goal of it is to create a global conversation layer on content around the web, to let people interact, reply, and moderate annotations in threaded conversation, and then, as a quality-control loop on top of that, to bring in folks occasionally on that threaded conversation and adjust the reputation of people as a result. Previous systems, like Slashdot for instance, have used randomized algorithms, non-algorithms really, to match the metamoderators with the opportunities. We, and other folks like the SuggestBot team, have thought that a more efficient way of matching tasks makes a lot of sense: one, in terms of making efficient use of knowledge, and two, in terms of making people happy, because people are more satisfied when they've got something to do that's meaningful and relevant to them.

So this call is really just about the matching part: how do we match people to opportunities? Michele Catasta, who's in the middle of my screen, has volunteered to help us take up the task of forming an approach to this, helping us maybe design some initial experiments to structure that approach, and then perhaps to get some data and drive it into something we could prototype. So the goal here is to help him, and to help us, think about what kind of design might be optimal and what kind of experiments we could craft to get valuable information about how to approach this problem. I'll stop there and let Michele weigh in, or Randall.

Okay, maybe I can start. I don't know if you had time to go through the slides or the document we created, but more or less what we tried to do was draw a parallel between what has already been done in Slashdot, which, as Dan was saying, is absolutely randomized, and what has been suggested by the SuggestBot approach, which is something we already had more or less in mind. Our main concern at the moment is how to devise an experiment that might resemble what metamoderator matching in Hypothesis should look like. Maybe if you follow along on the slides I can show you; I can do a screen share.
Maybe that's the easiest way; let me try. Okay, so there are a bunch of fundamental requirements that we want. The first one is we want it to be one hundred percent emergent, and we basically decided not to import any external data about users. So we will be facing the cold-start problem anyway, because we don't have any information about users, but we want to gradually be able to build up profiles of interests, of topics the user is expert on, and then, given these profiles, we want to be able to match them to good opportunities. We also want to support multiple languages by design. And then, part of the reason why we are here is that we want to be able to measure the quality of metamoderation; otherwise we will not be able to iteratively improve our model, and this is something we are still trying to figure out.

What we were able to come up with is just a set of features; we don't have to go through them now, I think, and this list will grow, or some of the features will be discarded, throughout the journey. But the idea is both to use the content of the annotation and the source of what we are annotating, and then either to work on topic extraction on the text, or to work at a different, higher level, where you start to extract entities and concepts from the text, and then we want to build a semantic graph out of those concepts. I think this is one of the main topics we want to discuss later. Given these features, we then want to devise a bunch of test-bed algorithms where we can measure how well we're doing in terms of metamoderator matching, and in the document we sent around yesterday we also have an example of a test we can run at the beginning. We'll go through it later. That's pretty much it.

I might just ask, Bail, I think there might be some background noise there that's switching the view to your screen; you might mute the microphone. Okay. It sounds like a background microphone, except I'm actually speaking. There you go. Wonderful, perfect. Just to throw it open to folks who have a perspective here.

Well, so why don't we start by going over how the evaluation in the SuggestBot paper was done. Great, yep. Morten, weigh in on that.

What do you mean exactly by evaluation in this context?

Well, how did you guys say whether you were doing a good job or not, basically?

So I wasn't around for the original experiment when SuggestBot was originally designed, but basically the way we evaluate success in SuggestBot is whether or not a user made an edit to the given article that we recommended to them. We have not yet published, but we're working on getting more data on what exactly they've been doing when they've edited a recommended article. So, just to quickly describe how things work in Wikipedia: the Wikipedia community tags articles with issues, saying this article needs more sources, this article needs cleanup, it's a stub, it needs more of everything, and so on.

And is that process automatic, or is it done manually by people?

That is a manual process; at least as far as I know, it's a purely manual process where community members look at the articles and say, well, here's something that's missing. To some extent we're working on automating parts of that process. You can easily imagine, and there's also been research on, finding flaws in Wikipedia.
There was a workshop this summer, if I remember correctly, specifically on flaw detection in Wikipedia, so there are some papers published around that. But we have the fortunate opportunity that we can reuse the already existing, humanly tagged data and say, well, here's something that needs to be done to an article. So basically SuggestBot grabs articles out of that pool and tries, as far as it goes, to find articles that appear relevant to a given user. When we evaluate success, we ask: did the user actually edit the article? A different way of looking at success is, of course, did they make the edit that we wanted them to. For instance, if an article needs more sources and they're not adding sources to the article, technically the task hasn't been completed. But we're doing work on studying that as well in some contexts.

Okay, and what sorts of numbers do you get? I mean, if you recommend someone ten articles, do they tend to make ten edits, or five edits, or...?

Generally, a set of recommendations in SuggestBot contains 31 articles if I remember correctly, no, 33, somewhere around there. Typically the hit rate is around two percent, or around four percent, something like that, so a user will generally edit one article in every three sets of recommendations that they get. One of the things I wrote down on my notes sheet as I was walking through the documents for this hangout is that one of the issues surrounding editing on Wikipedia and task recommendations is: what are the forces that decide whether a user will actually successfully make an edit or not? So far we do not know enough about that, and we're working on learning more. We did a bunch of user interviews over the summer that I'm still working on analyzing. User interest is key; we found that, and we also see it from the experiment in 2006, that if you are not interested in the article, you're likely not going to edit it.

And Morten, if I remember from the paper, there was a control of randomized suggestions. If the hit rate was about one in three for your co-edit or your text-based recommendation engine, the hit rate on the randomized control was about one in twelve.

That's what I remember as well. Yeah, you got around a 4x improvement over the baseline, which is not bad at all. But on the other hand, I think our situation is a bit different, because the metamoderator opportunities in our case are really cheap actions compared to editing a Wikipedia article. You might be facing an article and not really want to edit it because it takes too much time, or for many reasons. In our case metamoderation is quite quick. But the approach that we might use for matching, using the context of the opportunity versus the accumulated, aggregated context that a person has operated in before, might be almost identical, potentially.

Okay. Yeah, I think Randall does not agree with my point. No, I don't know if there was an agreement or what; it wasn't a disagreement, it was just a question. Yeah, it was a question, I don't know.
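A minimal sketch of the hit-rate evaluation described above: count a recommended article as a hit if the user edited it within some window after the recommendation was sent. The data layout, field names, and the 30-day window are assumptions for illustration, not SuggestBot's actual code.

```python
from datetime import timedelta

def hit_rate(recommendation_sets, edits, window_days=30):
    """Fraction of recommended articles the user actually edited.

    recommendation_sets: list of (user, sent_at, [article titles]) tuples
    edits: list of (user, article title, edited_at) tuples
    Both layouts are assumptions made for this sketch.
    """
    window = timedelta(days=window_days)
    # Index edits by (user, article) -> list of edit timestamps
    edits_by_key = {}
    for user, article, when in edits:
        edits_by_key.setdefault((user, article), []).append(when)

    recommended, hits = 0, 0
    for user, sent_at, articles in recommendation_sets:
        for article in articles:
            recommended += 1
            times = edits_by_key.get((user, article), [])
            # A hit: the user edited the article within the window after the set was sent
            if any(sent_at <= t <= sent_at + window for t in times):
                hits += 1
    return hits / recommended if recommended else 0.0
```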
I mean, at least in the preliminary design it looked very similar to what Slashdot does, so it's just a fair-or-unfair action, and I think it's at least fair to say that it's not necessarily very involved. Yeah. Okay.

If I could just jump in: one of the things, when I was looking at this and comparing it to Slashdot, is that one thing I might have considered is trying to figure out something about the cost of understanding the context of the metamoderation in Hypothesis. I think in Slashdot the context is more or less given, and, as you've mentioned, the metamoderation task is probably very cheap. Not knowing enough about Hypothesis, I'm not sure if the metamoderation task is cheap in that context, if there's more information that needs to be looked up to be able to understand whether it was right or wrong.

Probably not quite as cheap as up-or-down, but certainly not as expensive as editing a Wikipedia article. I think there's some reading and some interpretation, and we probably would like to have a rationale, even a short one, provided along with the recommended moderation.

We could start simple, as usual, and then, unless we find any strong reason why metamoderation should be more powerful, it's clearly easier to stick with fair and unfair, like Slashdot. It will make things easier.

So, getting back to the SuggestBot paper: I just glanced over the results, and is it fair to say there wasn't much of a sharp difference between the different recommendation procedures? It seemed like there was some difference in performance, but they all would get you a three-and-a-half or four times improvement, or something like that.

That's basically what I remember from reading the paper as well. So the way SuggestBot works is it has three different recommendation algorithms built in. One uses similarity between users based on their edits, so it's a user-user recommender. One uses the link structure of articles on Wikipedia and walks the paths to find other similar articles that way. And the last one uses text similarity measures. When it's choosing articles to recommend, it picks one of them, at least it currently does; we might change that at some point. But basically they all perform similarly well, in the sense that they each have their strengths and weaknesses. When you do user-user comparisons you tend to get more serendipity, and you tend to get more results that are popular. So what we've seen in some cases is that users wonder, well, why did I get recommended this particular article? Well, because everybody else edited it. Some of the other strategies might find less popular articles. And of course the text recommender might find some keywords that match something and end up picking articles that appear to be relevant but really aren't, and so on and so forth. But overall they perform similarly well.

If you had to guess one to start off with as a baseline, which one would you pick?

Ah, that's a very good question.
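A minimal sketch of the first of those three strategies, a user-user recommender over co-edit data: compute cosine similarity between users' sets of edited articles and score candidate articles by the similarity of the neighbours who edited them. The data shapes and parameters here are assumptions, not SuggestBot's implementation.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two sets, treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend_user_user(target_user, edits_by_user, k=10, n=33):
    """edits_by_user: dict mapping user -> set of edited article titles (assumed layout).

    Find the k most similar users by co-edit cosine similarity, score the
    articles those neighbours edited (weighted by similarity), and return the
    top n that the target user has not already edited.
    """
    target_articles = edits_by_user[target_user]
    neighbours = sorted(
        ((cosine(target_articles, arts), user)
         for user, arts in edits_by_user.items() if user != target_user),
        reverse=True)[:k]

    scores = Counter()
    for sim, user in neighbours:
        for article in edits_by_user[user] - target_articles:
            scores[article] += sim
    return [article for article, _ in scores.most_common(n)]
```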
Well, let me see. I think that would depend on what features you have available in the system. Let me see if I can put it in the context of Hypothesis, in such a way that the answer perhaps comes out of it. The way I understand how Hypothesis works, users will act upon web pages that they see, and basically you have the opportunity to build a profile of the user based on what pages they've shown interest in. Is that the correct understanding?

Yeah, not only the pages. Say I annotate a sentence on a page: there are two things of interest. One is the selection that I annotate, so the sentence or even the word, and probably some of the surrounding context; you might have a window of 200 characters before to 200 characters after that sentence, to get more words to fill the graph with. But then there's also what I contribute: the words that I annotate with, the comment that I make. The two of those form the complete context that one might be able to feed into the hopper.

Okay. A lot of the research on recommender systems... so I see SuggestBot as slightly to the side of what is commonly done in recommender systems, because it doesn't really contain the typical... the typical two strategies you have for recommending things are either comparing items to items or comparing users to users. You're either looking for users who look similar to a given user, or looking for items that are similar to the items the user already has.

Yep, go ahead. You're already doing user-user in the co-editing patterns; one of your three approaches is user-user.

Okay, so yeah, SuggestBot basically does both: it does user-user in one of the recommenders, and it does, technically, item-item recommendations in the two others. But typically in recommender systems, when you're doing item-item comparisons, or at least as far as I know (I'm not a total expert on all this), you'll extract a set of features for a given item and then compare similarity between them. So, let me see, to actually try to answer the original question of where you should start: to me it sounds like it's easiest to start with some way of comparing users, trying to find users who look like other users, to do a user-user comparison. Partly because there's existing research and technology on that, and partly because it sounds to me, in the context of Hypothesis and metamoderation, that you're looking for people who have an interest or a skill set that allows them to properly evaluate what has been happening, and finding users who look like somebody who already did this action would help that process.

Yeah. I think the question is... so the only way that we're going to know about users, because we're not importing external criteria or certifications or degrees or citations or anything like that, at least at the moment; that's our thinking.
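A minimal sketch of assembling that "complete context" for a single annotation, the selected text plus a 200-character window on either side plus the comment body; the offset-based representation and argument names are assumptions, not the actual Hypothesis data model.

```python
def annotation_context(page_text, start, end, comment, window=200):
    """Build the text context document for one annotation.

    page_text:  full text of the annotated page
    start, end: character offsets of the selected span (assumed representation)
    comment:    the body the user wrote
    Returns the selection, its surrounding window, and the comment as one string.
    """
    selection = page_text[start:end]
    before = page_text[max(0, start - window):start]
    after = page_text[end:end + window]
    # The surrounding window supplies extra terms; the comment carries the
    # user's own words. Both can feed the user or item profile.
    return " ".join([before, selection, after, comment])
```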
The only way that we know about individuals is from the actions they take within the system, and the thinking is that those actions are probably their previous annotations. We could start looking at other things, like pages that they look at, or other annotations that they read, or things that they favorite, or people they follow; there's potentially a wide variety of actions you can incorporate. But the simplest thing would be annotations that they've previously made, and ones that they have been favorably metamoderated on, because we really want to match people not just to what their interest is, but to where their expertise, weighed dispassionately over time, has been accumulated. So that's the set of people that we have to leverage, and then the question is: on a particular annotation, who do we match, and how do we route that match?

So it seems like you're setting this up in a very difficult way to get some initial evaluations, because without any data on users it's sort of hard to guess how a system will do; there's no user data to do anything with. So it would seem like, based on that, you'd probably have to start with some sort of item-item recommendation method, even though user-user might be sort of a standard that you'd like to move to in the future. But I think one of the other key questions in this call was: how can you guys get some type of experimental test bed to start playing around with these ideas?

Yeah, yeah.

So if you're averse to user data, I guess that makes it a more interesting question. It sounds like we probably can't use a user-user baseline here.

Yeah, for sure not at the beginning. I know Bail has his hand up. Bail, any thoughts?

Hi. In one of the things that was sent out, the possibility was raised of having a user profile, users saying what topics they would like to metamoderate. Also, I want to clarify that the goal here is to make the users happy, right, to suggest things that the users want to metamoderate. So with that goal there wouldn't be anything wrong with having a user-written profile where they say what they want to metamoderate, in addition to their actions on the system.

Right, okay. Sure. Generally, regardless of what kind of recommender system you're looking at, you have the cold-start problem: unless you know something about the user, you can't recommend them anything, and then you basically have to resort to just randomly picking things. The way SuggestBot currently solves the problem is that if we don't know anything about you, if you haven't edited any articles on Wikipedia, we kindly ask you (we don't do it automatically yet, but we would kindly ask you) to tell us what interests you. Basically, the way you could look at it is to say that if you tell us what articles interest you, we can use that to match you against other users and then use that to find stuff that interests you, which is a user-user recommendation; or you can take those articles and find other articles similar to them and recommend those, which would be an item-item way of doing it.
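A minimal sketch of that kind of seeding step: expand user-supplied categories into their member articles through the public MediaWiki categorymembers API. The overall flow is an assumption about how such seeding might be reproduced, not SuggestBot's code.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def seed_interests(titles=(), categories=()):
    """Turn user-supplied article titles and category names into a seed set.

    Categories are expanded into their member pages with the MediaWiki
    categorymembers API. The function shape is illustrative only.
    """
    articles = set(titles)
    for cat in categories:
        params = {
            "action": "query",
            "list": "categorymembers",
            "cmtitle": f"Category:{cat}",
            "cmnamespace": 0,   # main/article namespace only
            "cmlimit": 50,
            "format": "json",
        }
        resp = requests.get(API, params=params, timeout=10).json()
        for member in resp.get("query", {}).get("categorymembers", []):
            articles.add(member["title"])
    return articles

# Example use (hypothetical inputs):
# seed_interests(titles=["Recommender system"], categories=["Information retrieval"])
```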
Without that profile, you still have the problem of what the user is interested in, and trying to figure that out.

So they're providing you a sample set of articles; is that what a new user will do in your system?

Correct. What we actually do is we have a page on Wikipedia that says, well, if you want to test SuggestBot out one time and you haven't made any edits, we ask you to tell us either the titles of articles that interest you, or categories of articles that interest you, because then we'll fetch all the articles. We could have seeded it with a random set or something like that and gotten articles that way, but currently we don't do that.

But I think in our case there will be some sort of staging area between your registration and the time you start to metamoderate, which means you might be annotating some content and interacting with the system, and only after a while do you become a metamoderator. Maybe it's true that the system itself has a cold start, but this part, metamoderation, might not have that big a problem compared to the whole system. So we might work around it in this way.

And that's what I thought when I was reading up on the papers as well: it seems to me that metamoderation is something that would come some way down the road from the users first registering and starting to use the system, and then you don't have a problem at all, because you know about the users.

Yeah, exactly. So in this case, I think it reopens the question, because we might still ponder whether it's better to do item-item or user-user or a combination of both.

Well, but I guess it sounds like this would only be a problem once you have the rest of the system up and going, in which case you can start doing your experiments then.

Yeah, that's absolutely true. I think our point now was: can we find any data set that resembles, more or less, the data flow inside Hypothesis? Clearly we cannot scrape Slashdot, because even if we get the comments, we cannot get what's happening under the hood. But we can get the whole edit history of Wikipedia and try to map it, more or less, to what would happen in Hypothesis, which is something that we described in the document. Quite simply, it would just be training a model on part of the data set, and then you try to predict whether a certain user is going to edit a certain article. This was the first thing that came to our mind, to be honest; we don't know how close this is to our Hypothesis work.

So is that suggesting that whether somebody edits something is a good indicator of how strongly they feel, or a good indicator that they feel somewhat knowledgeable or in a place to do that, or that at least they have an interest in that domain?

Yeah, that's a great remark. And the other idea is we could track whether the edit has been reverted or not. Basically, every time it has been reverted, you know that most likely that edit was bad, so maybe you are not really knowledgeable about that; if it stays, with a certain probability it means that you wrote something good. It's a gray area, I agree.
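A minimal sketch of the offline experiment being described: build per-user profiles from a first slice of the edit history, down-weighting edits that were reverted, then measure how often a pluggable recommender predicts the articles each user edits in a held-out slice. The data layout, the revert weighting, and the scoring are all assumptions for illustration.

```python
from collections import defaultdict

def build_profiles(train_edits):
    """train_edits: iterable of (user, article, was_reverted) taken from the
    earlier time slice of the dump (assumed pre-extracted).
    Reverted edits contribute less weight to the profile."""
    profiles = defaultdict(lambda: defaultdict(float))
    for user, article, was_reverted in train_edits:
        profiles[user][article] += 0.2 if was_reverted else 1.0
    return profiles

def evaluate(profiles, test_edits, recommend, n=33):
    """For each user, ask `recommend(profile, n)` for n candidate articles and
    check how many of their held-out edits are among them (a rough hit rate).
    `recommend` is any test-bed algorithm under evaluation."""
    test_by_user = defaultdict(set)
    for user, article, _ in test_edits:
        test_by_user[user].add(article)

    hits = total = 0
    for user, edited in test_by_user.items():
        recs = set(recommend(profiles.get(user, {}), n))
        hits += len(recs & edited)
        total += len(recs)
    return hits / total if total else 0.0
```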
I mean, you cannot say it always works, but this could be another feature to use.

Rob Sanderson is asking in the sidebar about YouTube comment annotations, or Flickr, just kind of throwing out other domains to look for data sets. Right, which would mean basically scraping all those comments plus the up and down votes. Are there such votes on YouTube at this point? Yeah. YouTube... the quality level there is really crusty. I'm not sure we can pull useful, meaningful conclusions out of it. If we get useful conclusions from that data, then we can do anything, right? Yeah.

Another suggestion I got regarding the data set, when I was presenting that slide deck here, was about newspapers. So maybe not the New York Times, which is huge and the community is too large, and so on, but trying to use a small newspaper website where the comments are threaded and there are up and down votes. Being able to extract as much data as possible from that kind of system, it would look similar to Hypothesis. That could be another idea. Still not sure about either of them. Yeah, go ahead.

Hey, Bail, go ahead. Sorry. My anecdotal experience has been that in comments on newspapers there's a lot of, you know, just kind of trolling and stuff like that, so I don't know if that makes it a much better or much worse data set for this.

Yeah, interesting. No, you're right; I had the same reaction, because I can only speak to the Italian newspapers, which are quite famous for being crappy, and that was the only example I had in mind. But, you know, maybe in some other countries they're better.

So can we clarify the goals again for a second? We're talking about mapping the data in some other system to what we're thinking metamoderation might look like, and then making judgments about which metamoderation-type actions went well, which ones didn't, and why. Is that what the experiment is designed to judge? What are we trying to extract? Because what we're not doing is trying to match users in someone else's system to one another; we're not trying to compute user similarity for some existing commenting system out there. So what are we doing with this data set? Can we describe the experiment a little bit more?

So we want to come up with a metamoderator matching algorithm that has a decent success rate. The concern is, if we start on day zero with an algorithm that we think makes sense but in reality does not, this could be quite detrimental to the beginning of Hypothesis. So we're wondering if it's possible to run some tests beforehand on a different data set where we can say, okay, most likely this algorithm will work. Then, as David was pointing out, as soon as we have our own data, that's a totally different story, and I think things will work better by design, because at that point we can use our own data and we know what to do. On the other hand, maybe somebody here could say, okay,
it doesn't make any sense to try to run experiments beforehand; let's start with something naive, and then, as soon as we have the data, we start to tune our algorithms. Which could be another acceptable opinion, clearly.

I guess my take would be to try one of the simple item-item or user-user techniques, just because they tend to work; not perhaps the best, but they work okay in a wide variety of scenarios, so they ought to do much better than random, or even popularity. So the alternative really shouldn't be random; it should be that you just score things by some popularity measure: the more popular something is, the more likely someone is going to want to metamoderate it, I guess. But my take would be to just use one of these simple ones and acknowledge that whatever you do, it's going to have problems once you run it on real data. There's no way around that. If you did want another test, you could try to predict replies on Twitter; that would perhaps be modestly useful as far as predicting whether someone's interested in a topic or not.

Okay. Do you know anybody, David, who's tried anything like that?

The research literature on Twitter seems to basically follow the volume of the tweet stream, so it's growing exponentially. I haven't seen that particular task done, but I would be shocked if it hasn't been.

I think I know one. Not exactly the same, but I have one in mind which is quite good, so I will look for it and try to send it out soon.

What tends to be a little more common on Twitter is retweet prediction; people want to know if you're going to retweet something. But that's a little different from replying to it. Sure. Yeah, right.

So these studies: is that more of a user-user or more of an item-item strategy?

Let me see... predicting retweets on Twitter: I would generally see that as disregarding the reputation of the user who tweets something in the first place, as if we just pushed that out of the system, and perhaps also the number of followers, because popular people are likely to get retweeted a lot just because they have many followers. You could look at it as: does this tweet contain information that is likely to get retweeted? We had a PhD student called Dylan Chen here; he graduated a couple of years ago, and he did a study on recommendations in Twitter, trying to find useful information that you hadn't already seen. I don't remember exactly what the name of the paper is, but I can look that up and email it out to people if that's interesting. But generally I would regard this more as a...

Michele, I think it says you're muted. Oh, I was unmuting myself; sorry, really sorry. Yeah, Morten, did you finish that last sentence?
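A minimal sketch of the popularity baseline mentioned above, scoring metamoderation opportunities by raw activity, which is the bar any content- or user-based matcher should beat; the field names are assumptions, not the Hypothesis schema.

```python
def popularity_baseline(opportunities, n=10):
    """opportunities: list of dicts with assumed fields 'id', 'upvotes',
    'downvotes', 'replies'. Rank by raw activity and return the top n ids."""
    def activity(opp):
        return opp.get("upvotes", 0) + opp.get("downvotes", 0) + opp.get("replies", 0)
    ranked = sorted(opportunities, key=activity, reverse=True)
    return [opp["id"] for opp in ranked[:n]]
```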
So I think I was just saying that, generally, I would look at this more as a way of analyzing the tweet as a content item and trying to figure out whether it contains stuff that is likely to get retweeted.

We're getting close to ten o'clock, and I'd love to talk to you guys all day, but I'm sure we're going to start losing people. One question I wanted to end with is the question of the two basic design approaches in a system like this, just to characterize them both: one being a completely emergent design, where the graph kind of comes out of the content that's within the system and we iterate over time to build that graph; and the other, which David touched on in a document I sent around this morning, being where we use an external corpus, maybe like Wikipedia, or a take on Wikipedia, for instance the recent Google data set of 175 million stem words or statistically significant phrases, to help seed and facilitate that process. I'd definitely love to get your thoughts on that, David, and from anybody else who'd like to weigh in.

From my perspective it comes down to a question of how much data you think you're going to have originally. The more data you think you can get before doing this metamoderator business, just in terms of annotations amongst users, the less you need to rely on external data sources. That said, if you want to launch a complete system where you don't have any annotations originally and want to do all this metamoderator matching and all these other things, you'll probably need some way of guiding it initially. So in that case it wouldn't be fully emergent, I guess, but you could transition there as you get more and more data. One thing I wondered is whether, even if you did something that was kind of emergent, you might still use the recent Google data set: one of the most interesting things about it is the way it matches words and concepts between languages, and maybe it's an interesting cross-lingual dictionary that you could use even if you were still building something that was fundamentally emergent in nature. But exactly how that might work, I have no idea.

A quick comment there: there's a link network between articles in different languages on Wikipedia, which is to a large extent a one-to-one mapping between concepts. So if you have some topic or concept that matches a Wikipedia article, you could technically use that to match topics and concepts in other languages just by following those links.

Is that ConceptNet? I'm not sure how ConceptNet fits into things, but there might be a connection; people have built databases and such on it. There's also Brent Hecht out of Northwestern, who has built an API they're working on that does something similar, but I haven't looked at different ways of mining that network, though.

So anyway, it looks like one of the main outcomes is that we should be able to iterate quickly as soon as we get data, and change completely what we're doing; this will most likely happen, and I think all of us expected it. And the second question would be: at the beginning, what do you think are the features with the most potential?
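A minimal sketch of that interlanguage-link idea: given a concept that matches an English Wikipedia article, the MediaWiki langlinks API returns the corresponding titles in other languages, which can serve as a rough cross-lingual concept mapping. Parameter choices are illustrative.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def cross_lingual_titles(title):
    """Return {language code: title} for the Wikipedia article `title`,
    using the interlanguage links as an approximate concept mapping."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
        "format": "json",
    }
    data = requests.get(API, params=params, timeout=10).json()
    pages = data.get("query", {}).get("pages", {})
    links = {}
    for page in pages.values():
        for ll in page.get("langlinks", []):
            links[ll["lang"]] = ll["*"]
    return links

# Example: cross_lingual_titles("Recommender system") -> {'de': ..., 'fr': ..., ...}
```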
Basically, this is maybe the last open question that we have, and it's sort of related to what Dan was saying before: should we just do topic extraction, should we use concepts, and so on?

Can you restate the last question? I think I missed it.

So the last question is related to that slide we had with all the possible features, or inputs, to use. The idea is that, although eventually they will change, or we will be able to add new ones, it would be nice to get an idea of which initial ones are good. Let's say we have to turn on the system at day zero; you pull the lever, and we want some sort of user-user or item-item matching in place, but we still need some features to do that matching with. So what do you think are the ones that have the most potential?

Looking back at the slide, one thought I had is that it sounds to me like a lot of the actions in the system are related to text being written, so comments on either specific sentences of a page or on the page itself, and so on. And to me it sounds like the text that a user enters is likely to contain some useful information about what they think and mean, and what kind of association they have with what they annotated. So it basically sounds to me like figuring out whether it's possible to use that text, to extract features from it, to understand things like whether the user is positive or negative towards something; and, as you mentioned, you have a context based on the text around it, and then you use that to find other items that are similar. That sounds like probably the first thing I'd try.

One quick clarification here: when people do text analysis on Twitter, they have a heck of a time with it, because it turns out people write differently when you restrict them to 160 or 140 characters or whatever it is. Are these anticipated to be short or more lengthy remarks on Hypothesis?

We're not going to bound them with a character restriction. When people are in commentary mode, some will write long replies if you give them an infinite notepad, but I think the average length of a response will probably be relatively short, sometimes very short. It also depends on how we construct the interface to incentivize, or reduce the friction of, making several annotations in reply to something rather than one giant one. If you have several points to make, can we make it easy enough, or desirable enough, for a person to make them as separate annotations? That would obviously have a huge impact on the tendency for annotations to run longer or shorter. So it remains to be seen, I guess.

Yeah. Although at least we should not expect those absolutely impossible-to-understand words, or shortenings, or jargon; something like Twitter is full of these words that do not exist in reality.
On the other hand, it has occurred to me that we might see some jargon of our own coming up, particularly if we go forward with something that has been floating around: the idea of voting with comments. Then I can imagine all kinds of justifications for a vote that might be in shortened, acronym form or something. You know, I can see "OT", off topic, or something like that being a common thing someone might write as the only body of their annotation when saying that they think something else is off topic. So we might see a bunch of that, but aside from what arises as our own sort of jargon, yeah, I kind of agree that annotations will tend to be of at least some length, and without excessive shortening of everything.

Okay. So in some way, I think the design that David was proposing with this slide deck, at least this initial design, is really as close as it gets to what we might be doing. I think you were suggesting just to go for cosine similarity and user-user, pretty much.

That's certainly a very simple place to start. It's not going to be perfect; it's going to have issues. But regardless of what you do, you're going to run into some type of issue that will surprise you.

What it feels like is that if we can implement even something crude, just the crudest form of item-item or basic text matching, there'd be a massive improvement over, obviously, a random baseline, and it should be able to get us some good data. And I think the key is maybe just which architectural road to start down in terms of putting that together and getting some initial data. So maybe we have enough, from this call and previous thinking, to go off and propose something, and maybe circulate it back to the group in paper form, a short one-pager or something like that, and get any further thoughts.

Awesome. Well, it's straight-up ten o'clock, and I wanted to be respectful of people's time. We really appreciate you guys taking time out of your day and helping us with this problem, and we hope that we can keep reaching out to you as we move forward.

All right, thanks. Yeah, thanks, everyone. Okay, thanks. Bye. Bye. Take care.

Just save the chat content, just in case, you know. Yeah, I feel as though that should probably be saved along with the hangout, but I've never done one of these on-air things; maybe it's automatic, but you never know. I just didn't want to lose what you guys said.