Hello everybody. We're here for the Research Showcase of September 2016, and today we have two guests. I'm Abby, the lead design researcher here at the Foundation on the Research team. Our speakers today are Besnik Fetahu, who is going to speak about finding news citations on Wikipedia, and then Amy Zhang, who will be speaking about designing and building online discussion systems. At the Research Showcase each presenter will have 25 minutes, and then we'll have about five minutes for conversation. There's an IRC channel, #wikimedia-research, that you can ask questions through, and Aaron Halfaker will be there making sure we ask your questions to the presenters. So with that, it's all yours.

Thank you. Sorry, does anybody have any other questions before we begin? I know that Amy had some questions that she relayed to me over email.

I'm good, I got the answers earlier. Thanks. Sorry to interrupt; go ahead.

Okay, so I should be sharing the screen, I suppose, now. Okay, so thank you, Abby, for the introduction. As you said, this work is about finding news citations for Wikipedia. It has been done in collaboration with Katja, Wolfgang, and Avishek here at the L3S Research Center in Hannover. Without further ado, I'm just going to jump into the motivation behind this work.

One of the main motivation points is one of the core policies in Wikipedia, which is verifiability. What that means is that statements which are added to entity pages, or Wikipedia articles, should point to an external reference in order to serve as evidence. There are guidelines on how you should cite in Wikipedia itself, and one of them is, for instance, that you should cite third-party sources which come from reliable and authoritative outlets, for example news, in order to have an objective point of view with respect to added statements. The other motivation is that, if you look at Wikipedia, many citations are outdated or point to dead URLs. Apart from that, there are statements with "citation needed" markers; this is usually the case for long-tail entities, but it can also happen for newly added entities. And since Wikipedia is constantly evolving, a preferable system would be one that manages this in an automated manner, so that you can enrich articles by finding citations, or simply provide suggestions to Wikipedia editors so that they can attach citations to statements.

For this kind of problem we propose the following approach. As input we have Wikipedia entities, and we decompose each entity into more atomic parts: the sections, the anchors within the section text, the categories, and so on. For example, here we have Barack Obama. We know that Obama is of type politician, which we can get through any knowledge base or through the categories in Wikipedia, and then we decompose further into statements. The first step we need to do, in order to find actual news, is to make sure that a statement actually requires a news citation.
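To make the decomposition concrete, here is a minimal sketch of how an entity might be represented for this pipeline. All class and field names are my own illustration, not the authors' actual code:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Statement:
    text: str                           # the statement as it appears in the article
    anchors: List[str]                  # entities linked from within the statement text
    citation_url: Optional[str] = None  # existing citation, if any

@dataclass
class Section:
    title: str                          # e.g. "2008 presidential campaign"
    statements: List[Statement] = field(default_factory=list)

@dataclass
class Entity:
    name: str                           # e.g. "Barack Obama"
    types: List[str]                    # e.g. ["person", "politician"], from a knowledge base
    categories: List[str]               # Wikipedia categories
    sections: List[Section] = field(default_factory=list)
```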
What we do here is statement categorization. There are several categories that a statement can have, based on the citation type: for example, it might be web, book, journal, or other reports. Let's take as an example that Obama was elected to the Illinois Senate in 2007, or something like that, and that this requires a news citation. In the second step, what we then need to do is find the news article which serves as evidence for this statement. You can imagine you might have a real-world news index, and you can use the statement as a query against it to find the citation. However, as we will see later on, this is not a standard retrieval task; it's much more complicated, and you need to follow quite a few guidelines, which are stated clearly in the Wikipedia policies.

So let's begin with the first task, statement categorization. Take Barack Obama again. Here we have three sections: one is the 2008 presidential campaign, then we have the legislative career, and a section about early life and career. In the first section, the presidential campaign, the statements are very news-like: these are events with very broad and high coverage in the news media, and you would expect a news citation to be found there. However, in the legislative career section, you have laws that are being proposed, laws that are passed, and so on; for these types of information you might find something in news, but a more suitable source would be a report coming from some governmental body. So in this case you want to find the most suitable citation type, or citation source. As I said, we focus on news in our case, because news is the second most cited source in Wikipedia, and apart from that it has some nice properties: ideally it goes through an editorial board, so information reported in news is curated and, in many cases, also true.

We pose this problem as a classification problem: we have a statement and we want to classify it into one of the citation categories (news, book, and so on). For that we analyze two types of features. The first group works at the language level: we analyze the statements and their language, asking what attributes we might find in news. The second group analyzes the structure of the Wikipedia entities.

In more detail, the language features include things like quotation marks. These are a strong indicator of paraphrasing, and you see this a lot in news articles, where somebody quotes a third person; this is also seen in Wikipedia. Another one is temporal proximity. For example, here you have two cases: the first statement is about Barack Obama, and the second one is about Leibniz. In the first statement you have a temporal expression, a time point referring to 2003, which, compared to the current time, is not very far off. However, the second example dates back to the 17th century, so it's very unlikely that such a statement will point to a news article.
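A rough sketch of these two language-level features, quotation marks and temporal proximity, might look like the following. The regex, field names, and reference year are my assumptions for illustration:

```python
import re

def language_features(statement: str, now: int = 2016) -> dict:
    """Two language-level features from the talk, sketched: quoted spans
    (a paraphrasing signal common in news) and the distance from the
    current year to the most recent year mentioned in the statement."""
    quotes = statement.count('"') // 2  # number of quoted spans
    years = [int(y) for y in re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", statement)]
    # A large distance (e.g. a statement about the 17th century) makes a
    # news citation unlikely; None means no year was found at all.
    temporal_distance = (now - max(years)) if years else None
    return {"num_quotes": quotes, "temporal_distance": temporal_distance}

print(language_features('Obama said "change has come" at his 2008 victory speech.'))
# {'num_quotes': 1, 'temporal_distance': 8}
```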
Another feature in this group of language features is discourse. We perform a discourse analysis on the statements, and what we expect, and what we also see in news, is that news has a temporal discourse: you report about events, sequences of events, in a temporal manner. We expect this temporal discourse to be found in news statements and not in the others, for example in books. So that's the first group.

Now, say you have several statements and we focus on one of them. The other type of features that we consider for statement categorization goes into the structure of the entity. We know that certain sections and certain entity types are more likely to be cited from news. For example, as I said, the presidential campaign section is very likely to have news citations. So we measure the likelihood that, given that this entity is of type person or politician, and I'm analyzing a section about a presidential campaign, I can have a news statement here. Similarly, we can further analyze the statement itself. In the statement we have anchor texts which point to other entities, and since we know we're analyzing Barack Obama, a person or politician, we measure the likelihood that a politician co-occurs with, say, an organization in a news context. We simply measure these likelihoods, combine all the features in a supervised model, and try to predict the citation category accurately. As I said, the classifier is a multiclass classifier, but the features are optimized for news citations.
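Putting the two feature groups together, the categorization step might be sketched as below. The choice of model is my assumption (the talk only says "supervised multiclass model"), and the feature values are toy numbers:

```python
# Combine language-level and entity-structure features into one vector and
# train a multiclass classifier over citation categories. A sketch only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: [num_quotes, temporal_distance, temporal_discourse_score,
#            P(news | entity type, section), P(news | anchor co-occurrence)]
X_train = np.array([
    [2, 8,   0.9, 0.70, 0.6],   # news-like: quotes, recent, temporal discourse
    [0, 330, 0.1, 0.20, 0.1],   # e.g. a statement about Leibniz -> book/other
])
y_train = ["news", "book"]

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(clf.predict([[1, 13, 0.8, 0.75, 0.5]]))  # likely "news"
```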
So once we determine that a statement belongs to the news category, we proceed to the more challenging task of finding the citation: the citation discovery step. We now have a statement which is categorized as a news statement, and we need to find, in our big news index, the appropriate news articles. Say we run a query and manually find, as many Wikipedia editors probably do, that two particular news articles are the most appropriate ones for this statement; in fact, these two are already cited in Wikipedia itself.

What we see here, and what is also represented in the guidelines of the citation policies, is a definition of what makes a good citation. First, the statement from the Wikipedia entity should be entailed in the news article, so we need some form of entailment or similarity between the statement and the article. The second point is that the statement should be central in the news article. For example, you might have an article talking about the launch of Barack Obama's presidential campaign, and a similar article which reports about all the campaigners, a summary of all the persons participating in the election; you would prefer the one which is more focused on the specific entity or person. So the statement should be central in the article itself. The third attribute which defines a good citation, in our case and in Wikipedia itself, is that the cited news article should come from an authoritative source: if you have, say, BBC or CNN reporting a story, you would prefer it over a local newspaper. These are the properties which make a good citation, and we build on them to extract the actual features for our problem.

The discovery process goes through these main steps. You have a statement, and from this statement you need to construct a query, since not all terms are equally important: you want to retrieve from the news index the articles that match the most important terms. We did quite an extensive analysis here, and found that if you go beyond the top 100 news articles retrieved by this constructed query (for which we use related work, as shown on the slide), you do not gain much in coverage; if you retrieve the top 1,000, you won't find additional relevant news articles at all. So we found the best threshold to be the top 100. After that, you compute the features which make a good citation between the statement and these top-100 articles, and in the end you classify each statement and news-article pair as either a good citation or not.

In more detail, the first attribute we look at is entailment, and we do it in this manner: we take a candidate news article retrieved from our index, chunk it into sentences, and compute the similarity or entailment between the statement and the individual sentences. For that we rely on Jaccard similarity, phrase overlap, and proper-noun phrase overlap, since these are indicators of named entities. And since Jaccard is essentially a bag-of-words similarity, we also rely on more sophisticated measures like tree kernels: here you consider both the semantics and the syntax of the sentence, so the structure and the meaning of the statement and of the sentence in the news article should be similar. Furthermore, we compute language models over the news article candidates and see how likely it is that we could generate the statement from that model. As you can imagine, news articles have different numbers of sentences, so we normalize per article by taking the average, minimum, weighted average, and so on, across these features.
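As an illustration of the simplest of these entailment signals, here is a sketch of Jaccard similarity between the statement and each article sentence, aggregated per article. Tree kernels and language models are omitted, and the tokenization is deliberately naive:

```python
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def entailment_features(statement: str, article_sentences: list) -> dict:
    """Jaccard-style entailment between a statement and each sentence of a
    candidate article, normalized per article via max and average.
    Assumes article_sentences is non-empty."""
    s_tokens = set(statement.lower().split())
    sims = [jaccard(s_tokens, set(sent.lower().split()))
            for sent in article_sentences]
    return {"max_sim": max(sims), "avg_sim": sum(sims) / len(sims)}
```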
The second attribute we focus on is that, if we have high entailment, the sentence we have entailment with should also be central to the news article. To do that, we decompose the news article into its set of sentences, compute the pairwise similarity between the sentences, and if a similarity is above some threshold we construct an edge. This gives us a graph over the sentences, and then you can simply run a PageRank-like algorithm and find the most important sentence. This builds on work proposed in 2004; you can see the reference on the slide. Then we compute the entailment features between our statement and the most central sentence in the news article. The second kind of centrality feature we consider comes from work we proposed last year: since we are dealing with a specific entity, the entity should necessarily be a salient, central concept in the news article. The article should really be about this entity; we cannot propose a news article in which the entity is not salient. This further emphasizes the centrality attribute of the citation.

The last feature we consider is the authority of the news articles, basically the news domains they come from. Here you have two entity types: one is an athlete, and the other a politician. If you have a news story about, say, Stephen Curry, you might actually prefer an article coming from a more specialized sports news domain like ESPN or Sky Sports, while for Obama you would prefer BBC, CNN, and so on, because these are more specialized in politics; they have higher authority for certain entity types. So we decompose the news domains by entity type and simply compute, say, the authority of ESPN for the different entity types; in this way we establish the news domain authority. Once we have computed all three attributes as features, we feed them into a classifier and train a binary model to say what is a good citation and what is not.
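A sketch of the sentence-centrality idea, reusing the jaccard helper from the earlier sketch: build a graph over the article's sentences, connect pairs above a similarity threshold, and run PageRank (in the spirit of the 2004 work cited on the slide). The threshold value and use of networkx are my assumptions:

```python
import networkx as nx

def most_central_sentence(sentences, threshold=0.2):
    """Return the most central sentence of an article under a
    PageRank-like centrality over a sentence-similarity graph."""
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            sim = jaccard(set(sentences[i].lower().split()),
                          set(sentences[j].lower().split()))
            if sim > threshold:
                g.add_edge(i, j, weight=sim)
    ranks = nx.pagerank(g)  # centrality score per sentence index
    return sentences[max(ranks, key=ranks.get)]
```

The statement's entailment features would then be computed against this most central sentence, as described above.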
So let's see our setup for the experiments. For the first task, statement categorization, we extracted roughly 6.9 million Wikipedia statements, which point to 8.8 million citations of all types, coming from roughly 1.6 million entities and 670 thousand sections. As you can see in the plot, web and news are the two most popular citation types in Wikipedia.

For the second task, citation discovery, we extracted roughly 1.88 million news articles that were cited from Wikipedia news statements. In order to have a realistic experimental setup, we had to limit this to the range of 2013 to 2015: the news articles should have been published between these two dates, because our real-world news index, GDELT, an index of English news media from across the world, was within this range. From it we had roughly 20 million news articles, and we merged the two news collections together. This presents a really realistic setup, with 20 million news articles which do not come from Wikipedia itself, so finding the already-cited news articles is quite a challenging task.

What we realized, before we actually started training our models, is that for statement categorization the ground truth is quite mixed up. At the bottom of the slide you can see one random example which I took: it's about a person in the UK, and it has a statement which points to a citation in the BBC, obviously a news article; however, the actual citation type says "web". This is clearly a mismatch in the citation type. What we propose here are two simple heuristics. One is majority voting: we count how many times a news domain appears in each of the citation types, pick the type with the majority, and move the rest to that majority type. The other heuristic which we use to clean up the ground truth is that, in case we find "news" as a subdirectory in the URL, we mark the citation type as news. In the table on the left-hand side of the slide you can see the changes from one citation type to another. Most of the changes are between web and news: from web to news we move around one million citations, and from news to web roughly 400,000. By simply applying these rules we clean up the ground truth a bit; however, there is still noisy data in the ground truth, which needs more sophisticated approaches to clean up.
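The two cleanup heuristics might be sketched like this. The input format and the order in which the two rules are applied are my assumptions; the talk describes the rules but not their exact composition:

```python
from collections import Counter
from urllib.parse import urlparse

def clean_citation_types(citations):
    """citations: list of (url, label) pairs. Rule 1: a /news/ path
    segment relabels the citation as news. Rule 2: otherwise assign the
    majority label observed for the citation's domain."""
    by_domain = {}
    for url, label in citations:
        by_domain.setdefault(urlparse(url).netloc, Counter())[label] += 1
    cleaned = []
    for url, label in citations:
        if "/news/" in urlparse(url).path.lower():
            cleaned.append((url, "news"))
        else:
            majority = by_domain[urlparse(url).netloc].most_common(1)[0][0]
            cleaned.append((url, majority))
    return cleaned

print(clean_citation_types([
    ("http://www.bbc.co.uk/news/uk-12345", "web"),   # relabeled news by rule 1
    ("http://example.com/story", "web"),
]))
```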
For citation discovery, from these 1.88 million news articles we realized that 19% actually point to dead links or to moved content. After crawling these web pages, we saw that 7% are shorter than 200 characters, and on manual inspection these turn out to point to the main page of the respective news domain, in some cases containing almost nothing. So we also discarded the news articles which are shorter than 200 characters.

So now, the first task is to categorize the statements, and in the second task we have to find the actual news citations. For the first step, the evaluation is the following. One obvious fact is that you cannot really mix different entity types, because they have completely different structure: you cannot mix a location with a person, since the language features and all the entity-structure features are completely different. So we train classifiers which follow a type hierarchy, which we extract from YAGO: we start with very generic types and then move to finer-grained ones. As you can see in the table, for locations we perform really poorly; one reason is that locations do not have that many news citations at all. For persons we perform reasonably well, with almost 70% accuracy, and as you go to finer-grained types the performance starts improving. One conclusion we draw here is that for some types you should not attempt automated approaches, because they do not perform well, while for others you can categorize with reasonably good accuracy.

The second evaluation is a bit more complicated. We use some baselines just to show the difficulty of this task. The first baseline takes the top-one news article that we retrieve from our news collection; the second is a simple model trained on the rank of the document and the retrieval score. In the first evaluation of our approach, we try to find the actual news citation that is already in Wikipedia; if you remember, that means finding roughly 27,000 news articles inside a news collection of 20 million, so it's quite a challenging task. Another thing we realized is that many news articles can be duplicates. So for the false positives produced by our models, we measure their similarity to the ground-truth news articles, and if it is above a threshold of 0.8, we consider them relevant as well.
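A rough sketch of that near-duplicate check, assuming a Jaccard measure over word shingles with the 0.8 threshold from the talk (the exact similarity measure is not specified in the presentation):

```python
def shingles(text: str, n: int = 3) -> set:
    """Overlapping n-word shingles of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_near_duplicate(candidate: str, ground_truth: str,
                      threshold: float = 0.8) -> bool:
    a, b = shingles(candidate), shingles(ground_truth)
    return (len(a & b) / len(a | b) >= threshold) if a | b else False
```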
For the remaining false positives, we simply ask the crowd: we show them pairs of ground-truth news articles together with our suggested articles, which count as false positives because they do not exist in the ground truth, and ask them to evaluate, based on the statement, which one they prefer: either one of them, both, or none.

The results of this evaluation are the following. As I said, the baseline is not really suited for this task, because the task is much more complex than simple retrieval, and it performs, as expected, quite badly. For the first evaluation, where we try to find the actual citation in Wikipedia, we perform reasonably well, with almost 70% accuracy based on the micro-average across all the types for which we run this process. Once we also count the false positives that are very similar to the ground truth, this further improves by 5%, and when we ask the crowd to assess the false positives, we get an improvement of over 22%. So what we can see here is that citation discovery can be done with high accuracy, for some types even with 90% accuracy. And as you can imagine, this task is very difficult, because we have to find the correct citation within a list of 100 candidate news articles. What we can further do through automated approaches is actually suggest better citations: in this small table you can see that in almost 20% of the cases we were able to suggest a better news citation than the existing one in the ground truth, while in 38% of the cases both the ground truth and our suggestion were equally good.

To make it shorter, since I don't have time: the conclusion from these experiments and the proposed approaches is that you can actually help maintain one of the core principles of Wikipedia, namely verifiability, with an automated approach which first categorizes statements into news statements and then tries to discover news articles which serve as evidence for those statements. Important aspects to consider are how you construct queries from statements, and that further work is needed on cleaning up the noise in how citations are categorized by Wikipedia editors, in order to have more accurate and reliable models. With that I would like to thank you, and I'm open to any questions.

Thank you. Aaron has some questions on IRC.

Fun story: these are all from me, because I didn't get any posted through IRC. By the way, if you are on IRC and you're hearing this, ping me; I'm halfak. I'm basically the only person talking in the chat.

Okay, I had both screens blocked, so I didn't see anything.

Oh no, that's just fine. So the first question I had was: why detect news statements first? Why not just look for relevant news citations for all assertions in Wikipedia?

Yeah, initially we were thinking that way. However, you might have, say, a collection of books, news, or reports, which is your target collection to find citations from for the statements.
So in this way, you can suggest a news citation only when it is necessary for a statement. As I mentioned on one of the slides, you might be able to find news citations for statements which actually require a report from a governmental body, but that would not be the most appropriate citation for the statement. So it is better to first make sure that the statement actually requires a news citation, and only then find these news articles. You can do it both ways, but if you don't do the categorization first, you might introduce noise, or citations that are not really appropriate. That was one of the main reasons.

Okay, I have more questions, but we should probably let people in the room ask some too. Thank you.

Okay, so this one's a little bit more technical. When you were going over your entailment measures, one of them was a tree kernel, and I was wondering if that's like a probabilistic context-free grammar.

So what we do there is compute the dependency parse trees of the sentences and the statements, and then the maximal subtree match: you try to find, from the statement, the maximal subtrees that match. Since it's a dependency parse tree, the part-of-speech tags should match in both cases, and furthermore the nodes should have the same content, the same words. In that way you capture both the semantics and the syntax of the sentences. There's a paper which proposed this; I didn't put it on the slide, but the reference which points to this tree kernel is in the paper.

Gotcha, so that sounds exactly like a probabilistic context-free-grammar scoring system. Are you familiar with those? Do you know what the difference between the two is?

Hmm, I might, but not really; I wouldn't be able to explain it exactly.

Okay, well, maybe we can chat later. Thank you so much.
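To illustrate the maximal-subtree matching described in that answer, here is a toy sketch over hand-built dependency-tree nodes. It is an illustration of the idea only, not the exact kernel from the paper:

```python
# Two nodes "match" when their word, part-of-speech tag, and dependency
# label all agree, and the match extends recursively through matching
# children, yielding the size of the largest matched subtree.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    word: str
    pos: str            # part-of-speech tag
    dep: str            # dependency label
    children: List["Node"] = field(default_factory=list)

def matched_subtree_size(a: Node, b: Node) -> int:
    if (a.word, a.pos, a.dep) != (b.word, b.pos, b.dep):
        return 0
    size = 1
    for ca in a.children:
        size += max((matched_subtree_size(ca, cb) for cb in b.children),
                    default=0)
    return size
```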
Yeah. So I just need to share my screen, and then I'm just going to set the timer for 25 minutes.

And maybe if you could start the presentation... there you go, it looks great. Yeah. Okay. I'll just break in quickly when you have five minutes left.

Okay, thanks so much. Okay, so I'm going to be talking about designing and building online discussion systems. This is a project that I'm working on with my advisor David Karger; we are both in the Haystack group at MIT CSAIL. Basically, what I'll talk about is, first, some of the problems with online discussion systems today, and then I'll delve into a particular project we are doing on collaboratively summarizing discussion. I'll talk a bit about the technique that we use to summarize discussion, the tool that we built, and then our evaluation of the tool.

Okay, so there are a lot of places for online discussion, but a lot of them also have problems. Right now, basically any discussion out there can only ever grow in size, and as discussions get longer, it becomes impossible for people participating in the discussion to read everything before they add their piece. So this leads to issues like too much content for any one person to sift through. Here's an example of a CNN article; I just pulled a random one, and there are a thousand responses. Nobody can read a thousand responses, so why do we show users all of it? Here's another example of a forum, where for each of these threads there are over a hundred pages and several thousand replies.

An additional issue, which is exacerbated by volume but is an issue of its own, is that a lot of these conversations can be really deeply threaded, with a lot of replies replying to each other. Oftentimes you can't read these out of context; you need to read them in full to understand what was discussed. But this leads to problems when you have deep threading, where it's difficult to follow the thread of conversation. Here's an example from Reddit; I also pulled an example from the Wikipedia talk pages. And this happens in a lot of places where there's a lot of discussion, so here's Twitter. An issue that both exacerbates and is exacerbated by these problems is that there's a lot of redundant conversation and redundant comments. Since there's already too much to read, users don't bother checking whether someone else has already said the thing they were going to say, or asked the question they were going to ask. So they say it anyway, and you get this redundancy.

So now I'm going to talk about this tool we built called Wikum: bridging discussion forums and wikis using recursive summarization. This is going to be presented at CSCW next year. What we try to do with Wikum is address these kinds of discussions: ones that are too large to read, too deep, full of redundant conversation, or the kind where you have a back-and-forth leading to some consensus or conclusion, a summary of the discussion, such that anybody who reads it later really only needs to read that, as opposed to the whole back-and-forth.

One way you can do this is with textual summaries, but these can be really hard for a single person to create: if you have a really long discussion, you can't keep it all in your head while writing a textual summary. So one thing you could do is have a collaborative summary, and there are some applications out there that have tried to do something like this, mainly in a wiki-like format. One example is Quora, which has added answer wikis to the top of some of its pages: it's just a text box where anybody can participate, attempting to summarize the answers below. But there are a lot of problems with this type of wiki. The wiki itself is completely disconnected from the discussion happening below, so if you read a portion of the wiki that talks about a particular answer, you won't know which one until you go through and read the answers and find out yourself. Another issue is that you don't know how much has been summarized versus not: maybe somebody started writing an answer summary, only got to the first answer, and didn't bother continuing, so when you read that summary you're not really sure whether you're getting a summary of the whole thing or just a small portion. And finally, there's only one level of detail: maybe somebody wrote a summary, but it's one sentence, too short for what you want, so you can only read that, or go through and read all the answers. So instead we had a different idea with Wikum.
So I'll explain how Wikum works first. Say you start off with a long discussion. You have a reader who's going through reading the discussion, and while they're reading a short thread or a particular comment, they can at that point summarize it: not the entire discussion, but a small segment of it. The portion they summarized then gets replaced with their summary in the discussion thread, so the next person who comes through gets the summary; they read it and can decide whether or not they want to open the summary to see the comments within it. On top of that, anybody can edit anyone's summary, and we keep the edit history; the summaries themselves are basically wikis. People can also write summaries that contain other summaries: if somebody has written a short summary deep in a discussion thread, and it has replaced those comments, then someone else who is reading at a slightly higher level can write another summary that encompasses it. We call this recursive summarization.

To get a little deeper into the different things you can do in recursive summarization, I have a diagram here. On the left you have a discussion tree: each node is a post, and you have a link from one post to another if it is replying to that post. In the first panel, somebody chooses to summarize a small subtree, a parent and two replies, with a summary. In the next panel, the dark orange is a summary node, a summary post, that has basically injected itself between the comments it summarized and the parent of those comments; the light orange indicates comments that have already been summarized. So you can group subtrees like this. You can also group different comments at the same level: in the second panel you can see three replies to a comment above, so you can select comments at the same level and summarize those together as well. Moving to the third panel, you can also move summaries around. Say you want to summarize at a higher level, covering a big chunk of discussion, and you see that somebody has already written a summary of a smaller portion of that discussion; you want to take the entire text of their summary, move it up to where you are, and edit it to incorporate the new comments that will be covered by your summary. Well, you can just move it up; we call this promoting a summary.
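A minimal sketch of this recursive-summarization structure, where a summary node replaces the posts it covers while keeping them reachable for drill-down. Names and structure are my illustration, not the Wikum codebase:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(eq=False)  # identity-based equality, so distinct posts never collide
class Post:
    text: str
    replies: List["Post"] = field(default_factory=list)
    is_summary: bool = False
    covered: List["Post"] = field(default_factory=list)  # what a summary replaces

def summarize(parent: Post, targets: List[Post], summary_text: str) -> Post:
    """Replace `targets` (direct children of `parent`) with one summary
    node, keeping the originals reachable through the summary node."""
    node = Post(text=summary_text, is_summary=True, covered=targets)
    parent.replies = [r for r in parent.replies if r not in targets] + [node]
    return node

# Recursion: a later summary can cover an earlier summary node, which is
# exactly the "summaries that contain other summaries" case from the talk.
root = Post("Should we adopt the proposal?")
a, b = Post("Yes, because X."), Post("No, because Y.")
root.replies = [a, b]
s1 = summarize(root, [a, b], "Two opposing views: X vs. Y.")
s2 = summarize(root, [s1], "Thread weighs X against Y; no consensus yet.")
```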
So then you continue working, and at the end you have an entire discussion that's been summarized, with different summaries at different levels of the discussion. We call this a summary tree: it's a summary document that can grow to reveal summaries at deeper levels covering smaller portions of the discussion, and if you're interested, you can click in, all the way in, to get to the original discussion.

Compared to a wiki summary, our design for Wikum means people can immediately see what has been summarized versus not, thanks to the delineation between what's a summary and what's an original comment. We also link the summaries to the particular parts of the discussion they summarize, so if you click on a summary, you'll see exactly what discussion it's summarizing. Also, because of the way we modularize things, this breaks up the summarization task into smaller, more manageable chunks for people to work on. And finally, the summary tree we have in the end is a sort of hypertext document with summaries at different levels.

Here's a screenshot of the tool we built. This is one instantiation of our idea; there are different ways you could create a tool like this, and you can play with it at wikum.csail.mit.edu. To describe some of the things in the tool: on the left there's a directly manipulable tree visualization where you can select particular comments, and it shows them on the right. You can also toggle the different levels, and you can drag to select a certain portion. We also have editing modals for summarizing individual comments (also known as TL;DRs), or for summarizing groups or subtrees of comments. So as you're reading the discussion, you might select some nodes on the left, or just click on the comments on the right and say "I want to summarize this"; this modal pops up and you can write your summary in the text box on the right.

Within the editing modal, we have a couple of things to help make the task easier. For instance, we really encourage citation of paragraphs or entire comments, by adding buttons on the left-hand side that let you cite particular things in your summary. We also allow quoting: if you highlight a particular sentence, you'll see a quote button, and the quote automatically gets added to the summary on the right. We were also interested in automatic approaches, using automatic summarization techniques to help out the users. Unfortunately, we couldn't find any abstractive techniques, ones that are not just taking sentences from the original document, that were good enough to be used without user input; most of them require a lot of training data, which doesn't really exist for this task. So we were stuck with extractive techniques, and instead of letting users click a button and have an extractive summary appear on the right, we have a highlight feature, where we use an algorithm to highlight the top sentences for the users.
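One simple way to pick "top sentences" to highlight, in the spirit of that extractive highlighting feature. The actual algorithm Wikum uses isn't specified in the talk, so this frequency-based scoring is an illustrative stand-in:

```python
from collections import Counter

def top_sentences(sentences, k=3):
    """Score each sentence by the average corpus frequency of its words
    and return the k highest-scoring sentences to highlight."""
    freq = Counter(w for s in sentences for w in s.lower().split())

    def score(s):
        toks = s.lower().split()
        return sum(freq[w] for w in toks) / max(len(toks), 1)

    return sorted(sentences, key=score, reverse=True)[:k]
```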
Okay, so now I'm going to talk a bit about the evaluation of this tool. We first wanted to evaluate the summarization process: how easy was it for people to summarize using the Wikum tool? As our comparison we used Google Docs with track changes on; this is meant to mimic what people would use today if they were going to summarize a discussion, probably something like a wiki editing box or a doc where people can see what others have edited. For the study we got 20 participants and divided them into three groups; each person in a group summarized part of a discussion, and then the next person in the group would build on the previous person's work. We counterbalanced the order of Wikum and Google Docs. We also had three different types of discussion: one that was more social, the comment section of the Atlantic's "Why Women Can't Have It All"; a deliberative discussion on a mailing list regarding a political event; and a discussion from Reddit, from the Explain Like I'm Five subreddit.

This is what a starting point in Wikum looks like: just the entire discussion, no summaries. You can see that the discussions we chose are quite long, somewhere between 7,000 and 8,000 words, which is around 30 to 40 minutes of reading time for one person. And this is how it looks for someone just starting out in Google Docs: we pasted the discussion in, added IDs in case people wanted to do any citing on their own, included the like count and the author, and added indentation to indicate the reply structure, though only up to four levels, flattening after that because of space.

This is what a partially completed summarization looks like in Wikum: some of the discussion has been replaced by orange nodes, which are the summaries, while other portions are still left unsummarized. I didn't mention this, but Wikum also has capabilities for tagging, so you can filter by tag, and you can filter by all sorts of other things. You can also see the citations in a summary: if you click on any of them, it jumps to the portion of the discussion that citation points to. And here's an example from our evaluation of a partially completed Google Docs summary: you can see the different people who participated in our study and the edits they made. They had started a summary, different people added to it, and the portions of the original discussion are still there, crossed out.

So now to our results. Overall, Wikum was faster for summarizing than Google Docs, even with the same groups of people summarizing. In this graph we have group one, which were the same people, and they worked on summarizing the Atlantic article in Wikum and the Reddit science thread in Google Docs. You can see that for all the groups using Wikum, the summary actually finished; "finished" in this sense means the entire discussion has been collapsed into 250 words, about half a page, which we set as the final summary length. All three groups were able to do that in Wikum, but by the end of our study none of the Google Docs had finished, and you can kind of see in this graph that some of them actually leveled off.
Some other findings, and maybe some ideas for why this happened in Google Docs versus Wikum. In both conditions, people were really reluctant to edit each other's work. In the Wikum condition, this meant that when somebody wrote a summary, not that many people went in and edited it afterwards. In the Google Docs condition, it basically meant that most people were writing summary text and appending it to an ever-growing summary at the top of the page, which just got longer and longer; many of the Google Docs were more than two pages long by the time we were done, but nobody was really condensing any of that down. So as time went on, we saw stalling, because at some point the summary was basically longer than the amount of original discussion that was still there. We also saw that users reported spending a lot more time reading in the Google Docs condition than in Wikum, because they needed to see what had been done already and what still needed to be done, to get a sense of what was going on in the document. And something else we found interesting was that users reported that summarizing content was harder for things they disagreed with; we specifically chose some controversial or subjective discussion topics to see how people would react to summarizing that content. So this suggests that having training, or some indication to use a neutral point of view, would be useful.

Amy, sorry to interrupt, but people on the YouTube stream are saying they can't see your presentation. Could you unshare and re-share your screen?

Yeah.

I can see your share right now. Let's just give it a whirl.

I'll unshare it, and now I'm going to press play. We'll see if that works; let me know if it doesn't. I can also post these slides later on the mailing list. Okay.

So here's a quote from one of the users. This person said: "A lot of times I'd look at a comment and all its sub-comments and be like, well, I can't summarize all that, it's really overwhelming. But then I was able to drill down into the sub-sub-comments, get the whole comment subtree and sub-comments into my head at the same time, write a summary, and then go a level up." So this is the sort of thing we were anticipating taking care of with Wikum.

Okay. We also did a second study, where we tried to evaluate the created summary artifacts from both Wikum and Google Docs. We recruited 13 more participants, and each person was given 10 minutes to skim the completed summary in Wikum, with the ability to see all the original comments and lower-level summaries. There was also a summary in a Google Doc: this is the completed artifact, where there's a summary written and the original comments were deleted as they were summarized by the participants in the previous study. We also added an outline in the Google Doc, so people could click to particular parts of the summary to make it easier for them. And as a control we had a completely unsummarized discussion in a Google Doc, the original discussion. Ten minutes is not enough time to read that entire discussion, since, as I said earlier, it would take around 30 to 40 minutes, so they really needed to skim in that time. Afterwards we gave them a survey with a list of 12 points: six of them were in the discussion, and six of them were not mentioned in the discussion.
I was able to get reasonable points by looking at portions of the discussion that were not in our study but were in the original comment thread we grabbed it from, since we only took a portion of the thread; and I came up with these points before doing either of the studies. And so then we asked people to...

You have about three minutes.

Okay. Yeah, great. We asked people to select which points they remembered. So, the results from that: overall, users remembered the points more accurately in Wikum than in Google Docs with a summary and Google Docs with no summary. There was no statistically significant difference between Wikum and the Google Docs summary; there was a significant difference between Wikum and the Google Doc with no summary. With 13 users that's not too surprising, although we would like to do this with a lot more users. And we found, while observing people reading through the different summaries, that most people explored the Google Doc linearly, while in Wikum they could explore linearly but some of them chose not to: they took a breadth-first approach, or depth-first, or some combination of those two.

In terms of some other findings, some users stated that they preferred reading linearly down the page, while other people liked being able to drill in. So we're thinking about people's preferences for skimming a discussion, being able to see how long a discussion is before they click on it, or the entire size of the discussion: the presentation of the summary tree. We also found that people would open the summaries in Wikum to read the original content for different reasons: some of them were intrigued by a summary and wanted to read underneath it, so it was a good summary; other people thought a summary was not good and opened it for that reason. One quote from a participant: "On a few comments it felt really good; it was very noticeable that there was a large amount of text just swirling around a few simple ideas, and the summary got it simple, like into a tweet. That was really, really nice. I wish everything could be summarized like that."

So looking forward, what are we planning to do with Wikum? One thing is more integration of automatic summarization techniques, and more machine learning to help out users. We had a clustering component in Wikum, where people could cluster comments at the same level into different clusters that were supposed to act sort of like topic models, but people didn't really use it much, perhaps because it wasn't good enough. So improving that, and adding more machine learning, such as suggesting good places for summaries, could be really interesting. We're also thinking about other types of discussions, such as synchronous discussion, and discussions that are ongoing: right now Wikum is really good for a discussion that's finished, but say a discussion is ongoing; you write a summary, someone else replies to something, and now that summary is perhaps stale. How do we take that into account? We want to do some crowdsourcing experiments, to think about ways to take care of quality control and also to figure out how much this would cost if someone wanted to pay for the service. And also, of course, field experiments: who would do summarization, what's a good incentive, how would people actually use this in the wild? And some other things.
Like I mentioned earlier, the presentation of the summary tree: are there better ways to present it? How can we preserve context? How do we add more cues to guide people as they're navigating? And what are the actual pros and cons of threaded versus non-threaded discussion? The last thing is thinking beyond discussion. Could summary trees be useful for the classic "lost in hyperspace" problem, where you have a graph of linked documents? Summary trees are instead tree-shaped, so there's less room for you to get lost: there's always a parent, always context there. So could summary trees be useful for somebody trying to get an overview of a topic, for instance in Wikipedia? And yeah, that's it. I have a preprint of the Wikum paper, but the final copy is not available yet, so email me if you're interested. I also do other work related to online discussion systems that I didn't have time to get into: moderation tools, tools for combating harassment, and also automatic summarization techniques. Yep, thank you.

Thank you very much. I think Jonathan has a question.

Actually, I'll let Giovanni go first, because I can talk to you later. Giovanni on IRC asks: I have a question for Amy. Do you think this system could work for summarizing things other than discussions, for example reviews?

Yeah, absolutely. So I got into that a little bit at the end with Wikipedia. I definitely think reviews could be a really good fit for this approach. For instance, the problems I talked about with scale and redundancy definitely also happen with reviews, because there's no threading of reviews at all. And I could see automatic approaches being even more helpful there, because people have worked on automatic summarization of reviews, but not so much on automatic summarization of discussion threads. So yes, absolutely.

Hey, one more question from IRC. Volker asks: on the topic of trust, has there been thought on how to address negative summarization?

Yeah. So definitely, going forward, we want to do more studies where there is the possibility of malicious acts or people spamming the system. So far we've only done lab studies, so that hasn't been a problem, but it's obviously going to be a problem in the wild. Some of the things we've thought about include, for instance, community moderation, or moderation by the people who had the discussion; maybe they have to approve a summary before it's posted, or people could write multiple different summaries and a particular summary could be upvoted. Other options include letting people rate summaries, and letting admins or community moderators know when a summary is not rated highly or is potentially spam, that sort of thing.

We are at time, but it looks like we have a little more time if anyone else has questions.

I have a couple for Besnik that came in late.

I have one more for Amy, then, how about that? I'm curious, from your perspective, what are some specific real-world use cases for this kind of tool? Obviously I'm thinking of wiki use cases, but I'm curious to know what use cases you were thinking of as you started to develop it.

Yeah, I think there are a lot of different use cases out there.
If you think of discussions on the web where there are a lot of readers, every new reader has to go through and read the entire discussion in order to get a sense of it; so basically any popular discussion could benefit from this.

I guess I was thinking more specifically of when it is important that a discussion be summarized. What's at stake in a discussion being summarized?

Yeah, absolutely. I think, for instance, deliberative discussions are incredibly important. One example would be the Polymath community, which does research collaborations within discussion forums, coming up with new math theorems. They actually do summarization, but basically it's just the owner of Polymath writing all the summaries of all the discussions himself. So that's one obvious use case. And there are a lot of other cases with important or high-stakes discussions that people are going to refer to later, because a decision was made, or points were brought up that people then need to go and wade through.

Awesome. And I think Wikipedia talk pages are a perfectly good example as well. Thank you.

Okay, are there any other questions? I think Aaron had a couple.

Yeah, so this is for Besnik. WhatamIdoing asks about the system: how fast does it run? Can I run it in real time while I'm editing an article, or is this more of a "process in the background and get an email in the morning" kind of computation?

I would say the latter. There's quite some pre-processing involved until you make the data available in a format you can actually query. If you imagine that news comes in on a daily basis, this would probably take a bit of time. It would work fast in the case where you have pre-computed all these features for a snapshot of news, say; then you could use it in real time. But if it's running on a daily basis over incoming news, it would take time.

Aaron, is there another question?

Sorry, I dropped out for the answer, but I assume it happened. The other question was just: is this software released open source?

Right, so November is my target to release all the code, with all the explanations that are in the paper, but also with proper code documentation. So November should be a good target; exactly when in November I do not know, but I think that's when I will be publishing the code.

Great, thank you.

Thank you very much, Amy and Besnik, and thanks everyone for asking great questions. I think with that, we'll wrap it up for this research showcase.

Thanks for having me. Thank you. Bye-bye.