So thanks, Irene. It's very fun to be here. I've really enjoyed this conference; everybody's really friendly, and it's a totally different community than I'm used to, so it's been really enlightening. I love how everybody's paying attention. At academic conferences people get a little distracted and come in and out, so this has been really refreshing. As Irene said, I'm a professor of computer science. It is a little bit true that when you get your PhD you get a little more out of touch with the hands-on stuff, but I'm going to tell you today about some of the work I've been doing in my lab with my students, one of whom is here. Our research area is really around what I call linguistic information visualization: looking at language data and how we can interpret documents and understand document collections. We also have some work in visualization technique and interaction design (thanks, Arvin, for the shout-out to our DimpVis project; that's one example of that kind of work we do), and we do some work with natural user interfaces, tables, walls, that kind of stuff, but I'll talk to you today about the visualization research.

First I want to try and skewer a little some of the concepts that we see in the media about information overload. In my grant applications I'm a bit guilty of this too, right? We say, oh, there's so much linguistic information, what are we going to do, we need to solve it. So, things like this, from 2010: "With the amount of information online it is very easy for people to drown in useless information that they do not need in their business or in their lives. If you try to absorb all the information you can find online, then you will experience social media overload." But of course this isn't the first comment like this, right?
You can go back a little bit further, to 1967: "One of the effects of living with electric information is that we live habitually in a state of information overload. There's always more than you can cope with." This, of course, is a famous Marshall McLuhan quote. Let's go back even further, to 1755: "As long as the centuries continue to unfold, the number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the study of the whole universe. It will be almost as convenient to search for some bit of truth concealed in nature as it will be to find it hidden away in an immense multitude of bound volumes." It gets even worse: "We have reason to fear that the number of books which grows every day in a prodigious fashion will make the following centuries fall into a state as barbarous as the centuries that followed the fall of the Roman Empire." How's that for drama? I should put some of this in my grant applications.

So I would say: don't panic. Solutions have also been around for a long time. "The utility of the lexicon comes not from reading it from beginning to end, which would be more tedious than useful, but from consulting it from time to time." This, of course, is one of the early examples of an encyclopedia, which you would consult instead of read. There were also techniques such as cutting and pasting, creating an index for yourself, and understanding the document's structure through a personalized system. Charles Sorel, 1673: "The greatest secret is to make different marks for different kinds of passages: crosses, circles, half circles, numbers, letters, and other characters which had the various meanings one had assigned to them." So this is a personal process of annotation that allows somebody to understand a document. The real problem is not, and never has been, the wealth of information. The problem is the lack of appropriate tools for filtering, exploration, annotation, and collaboration. The
pace of information dissemination exceeds the pace of tool and technique development. We're always catching up, and that's just me today.

Visualization has also been used for text for a long period of time. This is a more literal sort of drawing, but these beautiful visualizations of Bible stories from Clarence Larkin really depict the kind of possibilities that are out there if we delve into a document deeply. And of course a lot of researchers in this field outside of my lab have been looking at this over the past few years, including before I started working in this field in about 2007.

So what kind of language data are we looking at? We're looking at possibilities for things like understanding culture and society, understanding the history of English literature, looking at the history of the court system, business analytics, and of course sentiment analysis. These all fall under the field of language data analysis that I'm working in. However (and thanks to Marti Hearst for this example), visualization is not really a replacement for reading. What do you think this book is, first impressions? Goldilocks, right? Goldilocks and the Three Bears. In fact, it's "To be or not to be", from the nunnery scene in Hamlet. So we do have this problem of a multitude of books, but I don't think visualization is going to solve the problem of having to read them, I'm sorry to tell you. What I'm here to tell you about today is how visualization can help us find what to read. And I think this can operate on a bunch of different levels. We start off at the level of text: we may have multiple corpora, multiple different websites, or multiple different collections of documents. We could go all the way down into a single document, or a section of a single document, down to the form of the letters and the typography, and anywhere along this hierarchy.
We also might have metadata at each level, so it's a really challenging problem. What I want to propose today is that if we can make a visualization at one of these levels, we're actually helping people dive down into a lower level to find what they want to investigate more deeply. For example, we might visualize a document collection to help somebody find a document of interest, and I'll talk to you about a couple of projects we have in that area.

The first one is called FluxFlow, and this was a project led by a student, Jian Zhao, who was at the University of Toronto, in collaboration with my lab and some other collaborators. What we were interested in was whether or not we could detect rumors on Twitter, rumors being things that were unusual or had unusual propagation patterns. This was inspired by events we were seeing where rumors would actually go viral on Twitter. We saw, for example, the shark swimming down the street during Hurricane Katrina, and here are some others from the London riots. This project had both a big back-end piece, the linguistic analysis (and we had some help with that), as well as the front-end visualization. On the back end, for detecting anomalies, we had a variety of different features that we used to process all of these tweets. We collected millions of tweets from a particular event and processed them using a bunch of different features. At a high level, some features were about the user: for example, is it unusual for this person to be tweeting at this time, or to be retweeting this particular person, or to be using this kind of language that they don't normally use? Tweet trajectory features mean things like: is the retweeting volume increasing in a way that's unusual? And then there are also the lexical and semantic features of the tweet itself. So we were looking at retweeting patterns, and we created
these retweet thread glyphs and graphs. The glyph at the beginning here is a single tweet, and it represents a few different things. I'm not going to go into too much detail, because I have a bunch of projects to tell you about, but here you're looking at the time since the tweet was posted. The overall hue of the circle is the anomaly score, from orange meaning low anomaly to purple meaning high anomaly, and its size is the volume of the retweeting that's happening. The pattern across from left to right, then, is all the retweets of this tweet. Each one of these circles is one person retweeting the original tweet, and you can see here that it changes from orange to purple as our classifier says that, over time, this became more unusual: it has a higher anomaly score later in the time period, which is why it's becoming purple. The stuff in the background there is a bit of an artificial intelligence visualization shout-out to Martin and Fernanda; I don't know if they're still here, and I don't have time to get into it, but the background coloring is the hidden states of the algorithm doing the classification. You can see here, for example, on the top there's a lower anomaly score and a lower volume, and on the bottom a higher anomaly score and a higher volume. You can play around with this at this bit.ly FluxFlow address if you want. The data set that you'll find there is,
I think, yeah, the Hurricane Katrina data set. And I just have a version of it here; let me see if I can bring that over. In this version you can see I've loaded in a few tweets, and this one was given a higher anomaly score. It was actually about people still standing at the Tomb of the Unknown Soldier in New Orleans during the hurricane, which of course was a rumor.

The way we evaluated this was to look at whether humans could classify tweets better using this tool than if they were just reading them, and we tested it against a couple of different classifiers. What we did was pull out the top 500 anomaly-ranked tweets, put them into the interface, and allow people to manually triage them and classify them as rumors or not rumors. And we found that, although the success rates were sort of low (you can see here, ours are the blue bars), for the Hurricane Sandy example our top-ranked anomalous tweet was also ranked by our annotators as being an anomaly, or rumor. But when we got down into the top 20, we only had about a 40 percent accuracy rate: about 40 percent of the top 20 anomaly-ranked tweets from our system were rumors. If you take this in the context of random chance, you'd be choosing one out of 500 to try and decide whether or not it was anomalous. So we were a lot better than random, but still a bit of a problem. The message to take away here is that we were able to create a system that allowed people to find tweets that they then had to read to decide if they were rumors or not, right?
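The triage evaluation just described boils down to precision at k: what fraction of the top-k anomaly-ranked tweets did the annotators actually label as rumors? A minimal sketch in Python; the labels and counts below are invented for illustration, not the study's data.

```python
# A minimal sketch of the evaluation described above: given tweets ranked
# by anomaly score and human rumor labels, compute precision at k.
# The labels below are invented for illustration, not the study's data.

def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked items that annotators marked as rumors."""
    top_k = ranked_labels[:k]
    return sum(top_k) / len(top_k)

# 1 = annotators judged the tweet a rumor, 0 = not a rumor, ordered from
# highest to lowest anomaly score (hypothetical labels).
ranked_labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
                 0, 0, 1, 0, 0, 1, 0, 0, 0, 1]

print(precision_at_k(ranked_labels, 1))   # the single top-ranked tweet: 1.0
print(precision_at_k(ranked_labels, 20))  # 0.45 for these made-up labels

# Baseline: picking one tweet at random from the 500 shown in the
# interface gives a 1/500 = 0.2% chance of hitting any particular one.
print(1 / 500)
```

The point of the comparison is just that a ranked list lets a human start reading where the classifier is most suspicious, rather than sampling blindly from all 500.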
The accuracy rate was hovering between 100 and 40 percent, depending on which list you were looking at, which wasn't enough to have full confidence. But that said, nobody wanted to go out and read all 3,000 tweets, or 500 tweets, or five million tweets, or however many happens to be too much information. So what we're looking at here is: can we make a system where somebody can speed up access to the information of interest?

Another project along the same lines, published in 2013, was looking at car accident reports. In this project we looked at 600,000 car accident reports: car accidents that were reported to the National Highway Traffic Safety Administration here in the United States, each of which contained a report of an injury or a death related to the accident. What we were interested in was whether we could take those car accident reports and do a different kind of twist on text visualization, by turning the text back into the original object that was being discussed. We made our own ontology of keywords relating to car parts; to do that, we used Wikipedia and a few other resources to gather car-part words, with some manual curation. We also processed the text to find synonyms and things like that. And then we had to do a bit of manual tweaking to pick out problems. For example, "first", "second", and "third" are gears, but they're also used a lot in descriptions of an accident, where people will say "first this happened, second this happened", so we had to take those things out.
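The keyword step described above might look something like this toy sketch: a small hand-made car-part ontology, a crude stemmer standing in for a real one, and a stoplist for the ambiguous ordinals. The ontology and report text are invented for illustration.

```python
# A toy sketch of the keyword step described above: match car-part terms
# in a report using a small hand-made ontology, a crude stemmer standing
# in for a real one, and a stoplist for the ambiguous ordinals. The
# ontology and report text are invented for illustration.

# Ordinals appear here as gear names, as in the real ontology...
CAR_PARTS = {"brake", "accelerator", "pedal", "airbag", "wheel",
             "gear", "first", "second", "third"}
# ...but they are dropped because reports use them as narrative ordinals.
AMBIGUOUS = {"first", "second", "third"}

def stem(word):
    """Very crude suffix stripping (a stand-in for, e.g., Porter stemming)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def extract_parts(report):
    tokens = [t.strip(".,;!?").lower() for t in report.split()]
    return [t for t in tokens
            if stem(t) in CAR_PARTS and stem(t) not in AMBIGUOUS]

report = "First the brakes failed, then the accelerator pedal stuck."
print(extract_parts(report))  # ['brakes', 'accelerator', 'pedal']
```

Note how "First" is skipped even though it is in the ontology as a gear term; that is exactly the kind of manual tweak the cleaning step required.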
So, as usual, some cleaning of the data (you're used to this), some stemming (you'll see that coming up now and then), and then we pre-processed all this to put it into a database. The visualization works so that if something doesn't occur at all, we show it as a sort of ghosted image, to provide some context about the shape of the vehicle, and if it does occur, then we play around with some rendering, varying the hue. We really struggled here about whether or not to use lighting effects, because of course lighting effects modify the way we interpret the color, but they were also necessary in order to distinguish the parts from one another, so there's definitely a design trade-off there. This is what it looks like. We can drill down into different years and different makes and manufacturers of the cars. It's all based on a rendering of a vehicle, and we also have a lens that allows us to look at more details about a particular part of the vehicle. For anything within that lens, we get a time-trajectory matrix plot, or heat map, of the occurrence of that car part in the accident reports over time. Finally, going along with the theme of the talk, you can drill down and look at the actual documents relating to the part of the car you've selected, as well as the time period and the make, model, and model year of the car you've chosen. This is all intended to be used on a touch-screen display, but it can also be used in an environment where you're using a mouse, and here it is.
It's just a video of how it works. Just looking around this car, you can see the various different details. There are too many car parts to show them all at once, and you can also use the little handle that you'll see there on the lens to spin it around and drill down into the car itself. I think that's going to happen now. Yeah, so you can sort of cut away the car to see the internal parts and isolate those parts. So what were we able to find? Well, as a proof of concept, we were able to see, for example, the Toyota accelerator pedal issue, with Toyota's uncontrolled acceleration coming up very highly. And just to go back to some of the stuff we heard yesterday around humanizing and the power of visualization: the power of reading this example, the actual text of it, is so much more viscerally impactful than looking at the visualization, so I really advocate for providing access to the underlying text data. It sounds really scary, right, the way the car just took off and they had no control over it. And that's coming out from isolating the brake pedal on the Toyota; you can see here the huge jump between January and February. It's interesting, actually: the biggest increase in complaints about Toyota cars in this database happened after Toyota announced that there was a problem with the cars. Then people started jumping on board and complaining about this problem. But before that, we can still see that there are some issues being reported. So what could we use this for? Things like maybe an ambient visualization at a car manufacturer, to see what's happening with the current incoming accident reports, or maybe something that looks at hotel reviews or product reviews, where you might want to highlight the parts of the product or the hotel room that are being discussed on social media. Okay, whirlwind tour. Here's another project. In this case we were
looking again at finding documents to read, and this was actually a project from quite a while ago that I did with Martin and Fernanda at IBM when I was there as an intern. In this project we were interested in the US court system, and our collaborator was particularly interested in the idea of forum shopping. This means: do people bring a particular type of case to a specific court district because they think they're going to get a favorable decision in that district? If you don't know, the US court system is broken down into a series of circuit courts. They're roughly geographical, and they're numbered here. And then we had, of course, time, and also the parts of the court cases we were interested in. We had a bulk download of the history of all the courts. I won't bore you with the fact that the data was so messy; quite a bit of my internship was spent cleaning it up, but we turned it into a database we were able to work with. We were able to detect and separate the parts of the text and then apply some typical text analysis techniques, like stemming and stop words. We had a dynamic stop word list based on frequency, because we didn't want things like "judge" and "court" showing up in our visualization. And then we used something called expectation statistics, which I want to talk a little bit about.

If you're doing anything with text, one way to look at how a text is interesting is to look at how it differs from a reference corpus. For example, if we had the collected works of William Shakespeare, we might have these words as the most common words. If we had Macbeth, we might have these words as the most common words: the exact same set of words. However, if we use a significance measure, an expectation measure, that looks at the likelihood that the occurrence of a word would be this high given the reference corpus of William Shakespeare, then we get the more specific words that are related to this particular play. So we did the same thing for the courts. Given one court, with all the rest of the courts as a reference, what are the words that are most distinguishing for that court district? And this is what we found. Notice, first of all, that we have words like "Vermont" and names like "Kierce". This was actually a good thing, because it was a sanity check: this is what we expect, right? We expect the names of the courts, the judges, and the states to appear as the most distinguishing things for that district. That was great; we knew it was working. So then we took those things out and looked at the more contentful words we were interested in.

The visualization is basically like a word cloud; however, it's organized into columns corresponding to the different court districts. We have these edge stubs that indicate that a word is significant in multiple court districts. You can see those here in light blue, and if you hover, you get the full edge. In this video (sorry about the quality; like I said, it's an old video) you can see some of the exploration of the details. Again, getting into the idea of being able to drill down: if you hover on any of the words, you can get details about how that word occurs across all of the court districts, because maybe it only occurs in one of the columns, and you can also see those connections. Finally, you can select terms to get the list of cases that contain that term. You can see here I'm going to select a few words relating to something that was coming up a lot.
It was really interesting to me. These words, I found out later, relate to a disease that coal miners get, and they were bringing workers' compensation cases to the court. So you can see here the details of one of the cases that contains these three terms, and then you can click through and get the full details. What else did we see? Some interesting things, like the word "ostrich" appearing a lot in the Seventh Circuit. I didn't know what was going on. I was looking at this data every day, and I'm not a legal expert, and I was wondering what the heck; I thought maybe there was an ostrich farm or something. It turns out that this is a legal term, and excuse me if I get this wrong, any legal experts in the room: basically, it's an instruction that the judge can give to the jury to say, remember that it's no excuse that this person didn't know that what they were doing was wrong. You can't put your head in the sand. And this was actually used in a case where a famous Canadian was prosecuted here in the United States. We also found some interesting patterns: for example, drug- and narcotic-related terms have some odd geographic differences, so "methamphetamine" in the West versus, what do we have here, "narcotics" in the Northeast. And we don't know, of course, whether this actually means there's more prosecution happening, or a higher rate of use of these drugs.

Okay, so that's document collection to document. Let's look at another project, which is document collections to multiword collocates, and what I'm talking about here are things that are like phrases, but not actually full sentences. This project was really exciting for us: we were looking at passwords. Passwords are things that we write every day.
They mean a lot to us. We know that we're going to be writing them over and over, so we choose them carefully. They're often very personal, and we found that they can be very evocative: they have a lot of potential emotional content. So we had millions of leaked passwords from various different websites. I was working with my colleague Julie Thorpe and PhD student Rafael Veras, and we were first thinking about the security implications of being able to detect linguistic patterns in these passwords. Of course, as soon as we started looking at it, I got really excited about the cultural analytics and sociolinguistic implications of this data as well. What kind of words do people use in their passwords? Do these patterns represent security vulnerabilities? Can we train a model to learn the types of patterns that people use? The answer is yes, unfortunately; it was a very good model. And what did the passwords tell us about society and culture?

The process we undertook was to extract 32 million passwords and parse them to extract the most likely word sequences. One of the things you'll hear me say a few times is that we use reference corpora; in this case we were using the Corpus of Contemporary American English (COCA) as our reference corpus, which is available online (unfortunately not for free). We then categorized the words based on their semantic meaning, using a tool called WordNet, which is available for free online. And then we parsed those results to create, basically, a grammar of passwords, and we were able to see that the grammar of passwords is very different from the grammar of regular English. And this is the visualization; you can find it online.
I should have put the URL, but let's see here. Anyway, it's my lab website, slash words and passwords. In this visualization, the column you'll probably be looking at the most is the G-squared measure, which is that expectation measure. Every line here is a word, and we're looking at the ranking of the words based on their difference in expectation, given English as a reference. The word that's most over-represented in passwords compared with English is the letter "i", so people talk about themselves a lot in their passwords. The second most common word is "love", which was huge; we saw it throughout, it was very unexpected, and it was seriously, significantly high. (This is hard to do on the screen while turning around.) And then down here you'll see a lot of affectionate terms, like "baby" and "sexy" and "love". We found that people were really interested in playing with this, and the implication was that the language of passwords is very different from the language of English, so I invite you to play around with it a little.

We were able to use this to create a password cracker, which was the best on a number of different measures, and we were also able to find some really interesting phenomena. For example, cute animals were way more common than ugly animals, which we sort of would have expected, I guess, but for some reason "monkey" is the number one animal, then dolphins, cats, and dogs. Emotional verbs like "love" are extremely common, and in fact we also found, and this was verified across a couple of data sets, that people "loved" male names four times more than female names.
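The G-squared measure on that axis is, I believe, Dunning's log-likelihood ratio. A minimal sketch of how such a score can be computed for one word, with made-up counts, might look like this.

```python
# A minimal sketch of a G-squared (log-likelihood ratio) score for one
# word, comparing its frequency in a target corpus (e.g. passwords)
# against a reference corpus (e.g. COCA). All counts here are invented.
import math

def g_squared(k_target, n_target, k_ref, n_ref):
    """Dunning's log-likelihood ratio for a single word.

    k_target, n_target: word count and total tokens in the target corpus.
    k_ref, n_ref: word count and total tokens in the reference corpus.
    Larger values mean the word's frequency deviates more from what the
    reference corpus would lead us to expect.
    """
    def ll(k, n, p):
        # Binomial log likelihood; the guard avoids log(0).
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return k * math.log(p) + (n - k) * math.log(1.0 - p)

    pooled = (k_target + k_ref) / (n_target + n_ref)
    return 2.0 * (ll(k_target, n_target, k_target / n_target)
                  + ll(k_ref, n_ref, k_ref / n_ref)
                  - ll(k_target, n_target, pooled)
                  - ll(k_ref, n_ref, pooled))

# "love" hypothetically over-represented in passwords versus the reference:
print(g_squared(5000, 1_000_000, 2000, 10_000_000))
```

When the word occurs at the same rate in both corpora the score is zero, which is why common-but-unsurprising words like "the" fall away and corpus-specific words like "love" rise to the top.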
We're not really sure what this means: whether it's perhaps heterosexual women writing "i love" plus a male name in their password, or people writing "i love" plus their own name. We tried to normalize for the actual population of users, but of course we don't always have that information in these leaked data sets, so we weren't able to do that. Also, profanity is extremely common, much more than predicted by the canon of security research, where people come into the lab and are told, please create a password in this experimental setting, and don't worry, it won't be associated with you. They still don't use profanity nearly as much as they do in their real passwords.

Here's a preview of another visualization I'm going to show you in a second, looking at animal words: these are the animals in the data set. This got some media coverage; it was fun to collaborate with the New York Times on this article. And then we also looked at some number patterns, so again, this one is looking at things like dates. On the top here we have a calendar, 365 days, and the dark blue means a date number pattern matching that date is more common in the passwords. So, "14344", anybody know what that is? It's... what is it? "I love you very much." Yes: "I love you very much", as the number of letters in each word. So it's not actually a date; it's a mistake. But it's what people would write if they were trying to say "I love you very much" in their password. Again, the most common number pattern is "I love you", which is sort of nice, I guess. We also found things like "90210", which is again a mistake, reading as a date when it isn't one, but it's a really common pattern. And we found some really interesting date patterns as well: for example, dates from September 11th, 2001, things like "fdny91101" and "ny91101", and "twin towers" is here as well, as "tt". Okay, I packed too much into this. Sorry, I'm getting the okay from over here.
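The calendar heatmap rests on deciding whether a digit string from a password reads as a date. A sketch of such a check, using Python's strptime with a few assumed formats (the real matching rules may have differed):

```python
# A sketch of the date check behind the calendar heatmap: does a digit
# string from a password parse as a calendar date? The candidate formats
# here (MMDDYY, MMDDYYYY, MMDD) are assumptions, not the study's rules.
from datetime import datetime

def as_date(digits):
    """Return a (month, day) pair if the digits parse as a date, else None."""
    for fmt in ("%m%d%y", "%m%d%Y", "%m%d"):
        try:
            parsed = datetime.strptime(digits, fmt)
            return (parsed.month, parsed.day)
        except ValueError:
            continue
    return None

print(as_date("91101"))  # (9, 11): September 11th, 2001
print(as_date("14344"))  # None: "I love you very much", not a date
print(as_date("90210"))  # (9, 2): reads as 9/02/10, though it's really a zip code
```

The "90210" case illustrates the mistake mentioned above: a famous zip code that nonetheless parses as a date and lights up the calendar.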
So I'm going to keep going. Let's see here. This is a project where we're looking at the document level: can we go from the document down into particular segments of the document? In this project we used the WordNet database and a book, and from there we go in two directions. From WordNet we're extracting an ontology of words, basically "is-a" relationships: water is a liquid, a game is an activity, a chair is furniture. And we're also extracting words from the text and stemming them, and bringing those two things together to create a visualization, a sunburst diagram where the darker the green, the more common the word is in the text. So it's sort of like a word cloud, but organized semantically.

The latest research, and this is a preview, it's not yet published, but I've got the student's permission to tell you a little bit about it, is a general method for what we're calling uneven tree cuts. This technique, which we're going to publish open source online, is a method for automatically determining the initial view you should show of a large hierarchy, and it's being done with D3. Here we have a tree, and you can see that the yellow and orange lines are different depths of a tree cut. If you look at a traditional hierarchy view, here on the bottom, you normally would get, say, three levels of depth. What our algorithm does is use information theory, things like entropy, to measure the characteristics of the data underneath each level and decide whether we should open up some of those lower levels in the initial view.
There might be something really interesting going on down there, and maybe we should collapse other levels. Here on the top is our version of the initial view, and as we expand out, you see that the traditional view expands everything, while our view expands more deeply in some regions and more shallowly in others, to show the data that actually has higher values. We're really excited about this work; we've submitted it for publication, so hopefully it'll be okay. The demo I'm going to show you includes some of it.

Anybody remember this? This is Feminist Hacker Barbie, which was a response to this awful book where Barbie becomes a computer engineer. And we've actually got some data from Hello Barbie, which is a new Barbie doll that you can talk to, or your child can talk to, I guess. We have the script of all the things Hello Barbie can say, so we're interested in this data. It would be great to be able to give it to Kyle, to put into his linguistic generator to see if he could create Barbie language, but we've put it into DocuBurst to try and look at some patterns in this data. So here is the word "entity": this is the top level of things that Barbie might say, and again, this is the general version; then I'm going to switch over to the uneven tree cut version. This is just a video, because for time purposes I didn't want to mess up the demo. So here are all the words that Barbie can say. From the darker things, you can see she's interested in talking about her friends and her parents, mother and father. She's asking questions a lot, too: the Barbie doll is saying things like, "how about your parents?" What else is here? Food. Fun, of course; it is a toy, though.
So we can maybe give it a pass for talking about fun and games a lot. But let's drill down a little bit. If we look in the area of "cognition" in WordNet, these are the cognition-related words that Hello Barbie will talk about. "Kind" and "right" here are a little bit of an anomaly, because of the way those words are used in regular sentences, so that's a bit of a mischaracterization. But she talks a lot about fashion, and she tries to change the conversation around to say: let's talk about fashion now. How about, okay, we've talked enough about that, let's talk about fashion, I love fashion. There are so many comments about fashion in this data set. Let's go a little deeper: we have to really drill in here to find where she talks about things like science and math. There are some; there are three things about physics. So I think, you know, there's some response here to try to bring out some diversity in the comments, but fashion is still a really big thing. And, I have nothing against fashion; it's just, you know, I thought you might find it interesting. Definitely, Barbie's got a different view on math now than she used to; she's really into math, which is great and fun to see. Let's see where else this goes; there's one more thing. No, those are gray because they're not used. Her favorite colors are pink and red; green is here because of Christmas. She doesn't talk about any other colors, though, just pink and red. The absence here means something: there are no other colors. Let's see, okay, second-to-last example: person words. In person words, she's really interested in medicine: doctor, dentist, veterinarian, teacher, as well as other related person words.
So things like "princess" are huge, and in here it's both your Halloween-costume princess and also your princess bedroom, your princess this and that. So there are lots of princess-themed things. And then, I had to look around a little bit to find this, but these are all people-related words, and if we go way down we can see that she actually does talk about engineering a little bit, deep into the data set. Okay, so that's Hello Barbie. It's been fun to work with this data set, and thanks to my student Rafael Veras; he's done this analysis over the past day and a half, and I've been Slack-messaging back and forth with him to get this ready for the talk. We have an online version; unfortunately, it's broken. This is one of the things I'm starting to get used to: when students graduate, I have to maintain things, and it's hard to do. This was actually using a service called OpenCalais, which we were using for entity recognition, and it recently changed everything and is no longer working, so we're going to get it fixed. Okay, I'm way over-ish, three minutes. I'm going to read you a quote. "Once upon a midnight dreary, while I pondered, weak and weary, over many a quaint and curious volume of forgotten lore, while I nodded, nearly napping, suddenly there came a tapping, as of some one gently rapping, rapping at my chamber door." Which one is this? Which bard? Any intuition? The third one, yes. So this is a new project called Lexichrome. That was "The Raven" by Edgar Allan Poe, by the way; very dark and dreary language. My student Chris Kim, who's here today at the conference, has been working on this project, and I'd invite you to check it out. We've been looking at crowdsourced data of how people respond to terms with a color reaction: what kinds of colors do words evoke in people's minds?
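One simple way a color evocation like that can be quantified from crowdsourced ratings is the share of annotators who picked a word's most common color. This is a hedged sketch of the general idea, not necessarily how Lexichrome computes its agreement scores:

```python
# Toy agreement score for word-color association data: take the modal
# color a word received and the fraction of annotators who chose it.
from collections import Counter

def agreement(ratings):
    """ratings: list of color labels one word received from annotators.
    Returns (modal color, fraction of annotators who agree on it)."""
    counts = Counter(ratings)
    color, n = counts.most_common(1)[0]
    return color, n / len(ratings)

# Hypothetical annotator data, not the real lexicon entries:
print(agreement(["yellow"] * 8 + ["orange"] * 2))       # → ('yellow', 0.8)
print(agreement(["red", "orange", "yellow", "yellow"]))  # → ('yellow', 0.5)
```

A high score means near-unanimous association; a low score means the word splits people across several colors.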
So we have 12,000 color ratings of words, done with our colleague Saif Mohammad, and we've been turning this into a visualization that allows you to explore the language based on its color, the colors it inspires in our minds. We can look at the agreement levels: people really strongly agree that "cowardly" is a really yellow word. "Sunshine" has even less agreement; some people say it's red or orange. "Cowardly" has the strongest agreement. We can also look at the language structure. This is available, and you can play with it yourself at lexichrome.com. For example, the upper right here is not words relating to grass and plants; it's actually greed and money. And the bottom left is words relating to love and affection. So you can explore this and drill down. We also have an editor that allows you to type in your own document and actually replace words with other synonyms that might evoke a color you want. So imagine you're an ad executive: this is the Coca-Cola manifesto, and they're talking about "refreshing", and we think maybe they might want to change it to "invigorating", in this case to bring out some red feeling in the reading of this document. Other things we're playing around with, just as I wrap up: we're looking at mapping literature. This is Twelve Years a Slave; we're looking at the patterns of movement of characters in a book, using open tools to parse the text, things like GeoNames and Google Maps. Disambiguating locations is a really big issue there, so we've been working on that: what do you do when multiple places share the same name? We've got some algorithms for that which I don't have time for. And some tools, some things that we use generally in the lab: of course, we're doing things like tagging text.
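That synonym-replacement editor hides a subtlety: the replacement word has to inherit the original word's inflection. Here's a toy sketch that handles only the regular English "-s" plural; a real system would use a proper morphological analyzer:

```python
# Toy inflection-aware synonym replacement: if the original token was
# plural, the substituted synonym must be pluralized too ("cats" -> "dogs",
# not "cats" -> "dog"). Only the naive English "-s" plural is handled.

def match_number(original: str, synonym: str) -> str:
    """Give the synonym the same (naively detected) number as the original."""
    orig_plural = original.endswith("s") and not original.endswith("ss")
    syn_plural = synonym.endswith("s") and not synonym.endswith("ss")
    if orig_plural and not syn_plural:
        return synonym + "s"
    if not orig_plural and syn_plural:
        return synonym[:-1]
    return synonym

def replace_word(sentence: str, target: str, synonym: str) -> str:
    """Swap every occurrence of target for the inflection-matched synonym."""
    return " ".join(
        match_number(tok, synonym) if tok.lower() == target.lower() else tok
        for tok in sentence.split()
    )

print(replace_word("I love cats", "cats", "dog"))  # → I love dogs
```

The same call leaves singular replacements alone, so swapping "refresh" for "invigorate" goes through unchanged.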
Our process includes labeling, cleaning the data, and stemming. Inflection is one that we do: for example, in the synonym replacement example, "refresh" to "invigorate", if it said "cats" we would have to make it "dogs" and not "dog". So we have to do inflection of the synonyms, not just the replacement. We're often doing things like partitioning text, topic modeling, and calculating scores, those kinds of things. We're using some of the same technologies you've been hearing about over the last couple of days, and we're also using a bunch of NLP resources that you can find out more about if you watch this video, pause it, and read this slide. And I want to give you a bit of a caution that there are also some challenges here. For example, we saw the word "team" in the passwords investigation, and we were really curious why people were talking about teams so much more in passwords than in regular language. When we dove into the data a little bit more and drilled down, getting to reading the raw data, we saw this: "te amo", "te amo Luis", "Jesus te amo". So this was actually just "I love you" again, but in Spanish. We were assuming English passwords, but of course not all the users were writing their passwords in English. This one's from Barbie; it's what she actually says: "Chelsea would love to see your shell collection sometime." I'm assuming Chelsea's shell collection, the one the child is telling Barbie about, is really seashells and not weaponry, but the systems we're using don't have the ability, with the small amount of data we have, to disambiguate the meaning of this term. We have things like part-of-speech tagging, where things change a lot. We also have open research challenges relating to the ambiguity of text, the volume of text, and things like legibility. We're doing a lot of manipulations of text: we're putting backgrounds on it, we're skewing it,
we're rotating it, we're putting links between things, and we don't really know how we're affecting the legibility. So some of the research we've been working on lately has been partnering with people who do perception research to try and understand that a little bit better. Okay, I'll end there. Thanks to all my students; the ones in bold have work that was presented here today. Thanks a lot.