All right, I'll go ahead and get started since I have quite a few slides to get through. So thanks everyone for coming. I'm Wes Chow. I'm going to be talking about the Digital Hearth, which is a device that the research group and the nonprofit that I'm in have been designing to facilitate group conversations. This is probably going to be one of the more unusual talks here at the conference. You can turn off your technical brain for about 10 minutes, and then we'll actually start to dive into some details on how this thing works. I'm honestly not sure why the conference accepted my proposal, but here we are. So I work for a nonprofit called Cortico, which is closely attached to one of the research groups inside the MIT Media Lab, the Lab for Social Machines. It's kind of a mouthful, sorry. We are a kind of deployment arm for the research inside Social Machines. First I'll give an overview of the goals of this project, and then I'll go into the details of how the thing works. Okay, sorry, I'm not able to advance slides, give me a sec. There we go. Okay, so my boss is Deb Roy, who runs the research group as well as the nonprofit. He is the former chief media scientist of Twitter, and a lot of his old research was around the characteristics of online discourse, particularly on Twitter. I have this secret theory that he got tired of people asking him to think about online toxicity, and so he pivoted to a thing that is the complete opposite of what you can do online. So the question was: is there some kind of social medium that might be more representative of ground truth than what you see on Twitter and Facebook? This book was going around the lab at the time and starting to catch on in people's minds: The Politics of Resentment by Kathy Cramer. Kathy is a researcher at the University of Wisconsin, and what she did was, for something like 10 years, drive around Wisconsin inserting herself into conversations in these things she called coffee klatches, which are basically groups of people who naturally congregate, and she would go in and ask very passively directed questions. One example from the book is a group of men who had been getting together for something like 10 years to grab coffee at a gas station before going into work. So she went into these communities and found those people. The tail end of her field work overlapped with the Scott Walker recall vote. If you remember, Scott Walker was a conservative, union-busting governor of Wisconsin, and the teachers' union in Madison in particular was not happy with him, and they garnered enough public support to start a recall vote to get him kicked out of office. It was widely reported in the media at the time that he would probably be kicked out. Surprise, surprise, he was not. And Kathy actually didn't believe that he would be kicked out, based on the conversations she was having with people in rural Wisconsin. So there are shades here of what you hear people like Peter Thiel talking about right now, this notion of preference falsification: what people reveal publicly in surveys is not what they actually reveal if you talk to them in person. So we started thinking about Kathy's work and we wanted to figure out a way to scale it up. So we started on the Local Voices Network, or LVN.
So LVN is centered around facilitated conversations with the intent to surface diverse and underheard voices. To do that at scale, we need more than one Kathy. Ideally we'd be able to very cheaply scale out Kathy. We can't have her spawn copies of herself, so we have to turn to technology. So we use hardware to help us with the collection of data, but since these are conversations with people, we also need a good way of pulling in participants. So we have a network that we've layered on top of this. It's a large volunteer network, and the idea is that it sustains the growth of the data collection. Our future interest is going to lie in the strength of this network. We think it will be the basis for connecting communities and for crossing social boundaries, and I'll give you an idea of how that happens later in the talk. Finally, we'd like for LVN to have a real outcome in the world. This is the reason why Cortico exists: we can set up Cortico so that, as an institution, its metrics of success are not things like papers published, which is what the lab's metrics are. Cortico, through LVN, then sets up this channel for policy makers and journalists to get a better idea of what people are talking about in their communities. So policy makers and journalists are our target users right now. Our very first experiment was in Mott Haven in the Bronx in New York. This is like our alpha. Mott Haven is in the poorest congressional district in the U.S. We put out flyers like this to get people to come in. We also, I believe, published this in the Mott Haven Herald, the local newspaper. And over the course of a couple of days we had a steady stream of people coming in and talking to us. This is the very first version of our thing, the alpha of it. We called it the conversation box at the time, and it was designed for conversations with just one person, more of an interview-style thing. What this thing actually is, is just a wood box around a tablet. It was the fastest way for us to get something going. Being a bunch of engineers at MIT, we still of course had some issues, and things actually caught fire, because we deployed this thing in the summer in New York, there were, I think, 100-plus degree days, and there were no ventilation holes in the box. We learned our lesson. In future versions, any time we smelled smoke or thought something was on fire, our first thought was that we needed more holes in the enclosure. This is Max Resnick. At the time he was a student and he ran a lot of the conversations for us. Now he's a full-time employee of Cortico. So what kind of questions do we ask? What do you like most about your community? What do you like least about your community? In Mott Haven, the responses to this question were actually surprisingly unified. People talked a lot about the diversity of the population there, and that's what they loved about it. What they were all concerned about was gang violence. As we were getting all of this audio, we were showing it around to various people, including some potential donors. One donor said that he actually knew the NYPD was planning on defunding the gang violence unit. So we latched onto this as a piece of evidence that it's possible for these conversations to surface views that aren't obvious. But still, these conversations were isolated, and they were one-on-one.
So now we turned our attention to figuring out how we would bridge people and get people talking to each other. This is version one of our group conversation system. These conversations are still facilitator-directed: there's a person who's trained in how to use the equipment, and that person asks the kind of passive-style questions I just showed on the last slide. The conversations typically include between four and six participants, and they're quite long, about an hour and a half. The hearth has to work in an offline way for a variety of reasons, so that informed the design, and I'll talk about that later. There's also a highlights system that we built into the interface, and I'll talk about that later as well. So this is version one of the hearth. One of the principles in the design was that it had to be a very human object. We have found that the quality of the conversation is different when you have people sitting around something like this versus something that looks more like an Echo or a Google Home. It's completely solid wood, so it's nice and hefty. There's a soft speaker grille on the top of it, and then there's an LED ring that serves as a state indicator for the hearth. When it's recording it's orange, and people actually stare at it the same way that people sitting around a fireplace stare at the fire, and they talk to the whole group in that way. So Kathy Cramer came on as an advisor to Cortico and is doing her sabbatical in the lab group. We deployed our first set of hearths, about a dozen of them, into Madison, Wisconsin in January. Why Madison? Well, because of Kathy: she has very strong connections to the community and the people there. But also, at the time they were in the process of electing a new mayor. We went in in January, I believe the primaries were in February, and the actual election was in April. I think there were something like eight candidates when we first went in, and half of them had access to the LVN data. Of the two final candidates, one had access to the LVN data, and that was the person who won. We also gave access to journalists in the area. The Cap Times is a local newspaper there, and they wrote some pieces using the LVN data. We partnered closely with the Madison Public Library for distribution and for keeping track of the hearths. As a long-term strategy, this is a thing we've talked about a lot: trying to put these into libraries. There's approximately one library for every 10,000 people in the U.S., so it's a good vector for distribution. These hearths are still running now. At this point we've accumulated several hundred hours of speech. Okay, so enough of the mushy human stuff; I'll talk about the cold, hard tech now. This is V1 of the hearth. When it's in this state we call this open hearth surgery. At the core of it is a small Raspberry Pi, which serves as the place where we centralize all the complexity in the system. There's a speaker in the middle, which is used for highlight playback, and I'll talk a bit more about that later. And there's the LED ring that I mentioned before.
The hearth itself doesn't have a whole lot of controls on it, so we use an iPhone that's paired with the hearth to actually start and stop playback and so on. And again, the entire feel of it is much more substantial and human than the consumer devices you're probably used to seeing. So version one was solid wood. We encountered some issues with solid wood, namely that it swells, and we also didn't properly plan the amount of tolerance we might need in the screw holes. So within a day or two of assembling these hearths, it became impossible to actually pull the Pi out. Our yield here wasn't great, but we did get a good number of hearths out. This is version 1.1. There's an internal plywood structure, which is much more tolerant of things like humidity, and the outside of the hearth is a wood veneer. You actually can't tell if you just look at it from a couple of feet away; you'd have to pick up the 1.1 hearth and examine the wood seam lines to tell it's not the same as the first set. These are the custom PCBs that we had printed. They deal with the power circuitry and the microphone array that's in the hearth, and I'll go into that in a bit. This is our assembly room in the Cambridge Innovation Center, which is a co-working space in Boston. This office is at the intersection of two hallways, so people are always peering in and wondering what's going on, because most of the companies in this space are just people staring at monitors. This is where we test all of our components and trial different hardware configurations. This is our Canadian team member curling the hearth. If you can't tell, this is actually fake news. We were actually concerned about air travel and security, so we had the Canadians on the team go first. The hearths themselves have a switch that completely cuts the power off, just in case there is some kind of battery issue. So we have that in place as a safety mechanism, and if people at the TSA at the airport ever ask what the thing is, we just say that it's a speaker, which seems plausible. Okay, so how do we control the thing? Each hearth is paired with an iPhone, and there's an app on it that controls all the activity on the hearth. The app updates are tied to deploys on the Pi; I'll explain how that's done in a bit. And the connectivity: we tried a couple of things. The first thing we tried was for the iPhone app to talk to the hearth over Bluetooth. We had some issues with this. Some of the request payloads are quite big, and the app itself is actually quite big and loads off the Pi when it first starts up, so we were seeing latencies that were unacceptable for interactive use. Also, when you issue play and pause commands, you expect the hearth to respond pretty quickly, and we weren't able to get good latency out of it. So our next try was to plug a secondary Wi-Fi dongle into the Pi's USB ports, inside the hearth. So the Pi has two Wi-Fi interfaces, and we have a Medium post that goes into all the details of how we configured the operating system, Raspbian, to pull this off. One interface goes to the public Wi-Fi, and the second one is private and broadcasts a unique SSID. So in this case, hearthnet5 corresponds to one hearth. When you start up the phone, you just pick the SSID of the hearth that you'd like that phone to control, and that's how you pair it. The phone then makes API calls to a web server on the Pi that's bound to the pi.local address. The Pi runs a zeroconf daemon, so pi.local resolves correctly when the phone connects, and the Pi also serves as a DNS server for the phone. The phone talks over the private Wi-Fi to the web server, and so it can issue API calls to the Pi to control the playback and the recording hardware.
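To give a flavor of that control plane, here's a minimal sketch of what such a playback and recording API could look like in Python with Flask. The endpoints and the `recorder` and `led_ring` modules are hypothetical, invented for illustration; this is the shape of the idea, not our actual server.

```python
from flask import Flask, jsonify

from hearth import recorder, led_ring  # hypothetical hardware modules

app = Flask(__name__)

@app.route("/record/start", methods=["POST"])
def start_recording():
    recorder.start()               # begin capturing the 8-channel mic array
    led_ring.set_color("orange")   # orange ring = recording, like a fire
    return jsonify(status="recording")

@app.route("/record/stop", methods=["POST"])
def stop_recording():
    path = recorder.stop()         # flush audio to local storage on the Pi
    led_ring.off()
    return jsonify(status="idle", file=path)

if __name__ == "__main__":
    # The phone resolves pi.local via the Pi's zeroconf daemon and talks
    # to this server over the private Wi-Fi. The port is arbitrary here.
    app.run(host="0.0.0.0", port=8080)
```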
The phone app can also configure the Pi's Wi-Fi. This is how we get the entire thing onto the internet; it can pass through a password and everything. And once the Pi gets on the internet, it sets up IP forwarding, so the Pi itself acts as a gateway for the phone, and that's how the phone can do iOS updates. As a side effect, this allows us to get through login pages. If we're on a network that requires you to accept the terms of service, since the IP traffic is just passing through to the phone, we can actually see the terms of service on the phone and accept them there; and because the packets pass through the Pi, the MAC address that the public router keys on is the Pi's and not the phone's. The hearths are stored inside the Madison Public Library system, which serves as a home base. There are barcodes on the hearths, so they're actually part of the circulation system, where they're checked in and checked out. When they're checked in, on the library's Wi-Fi, the Pi will sync stuff to and from our servers. So, offline operation. Oftentimes the hearths are used in environments where there isn't easy access to power or Wi-Fi, so we had to build this into the design of the thing. For instance, this is Short Stack Eatery, which is supposedly the best pancake house in Madison. You can notice the drink on top of the hearth. We didn't plan for people to actually put liquids on the hearth; we're lucky we haven't had any accidents. Can't plan for everything. Since the Pi has to run offline, it needs a nice big battery. The Pi typically draws between 500 and 1,000 milliamps, and this is a stupidly big USB battery pack: 26,000 milliamp-hours. So we get about one to two days of continuous operation. We don't ever run the thing for that long.
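The arithmetic works out if you ignore conversion losses: 26,000 mAh divided by a 1,000 mA draw is about 26 hours at the high end of the Pi's consumption, and 26,000 mAh divided by 500 mA is about 52 hours at the low end, hence roughly one to two days.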
There's a microcontroller on the PCB that provides power to the Pi and also listens for power-button on/off events, and the micro needs a tiny amount of power to run. One of the issues we encountered is that if the USB battery doesn't detect a power draw above some threshold, I guess as a safety mechanism or something, it just completely cuts power off, and the micro was underneath this threshold. So what we did was set the micro up with an LED that's internal to the hearth, which the micro cycles on and off, and that consumes just enough power to keep the USB battery going. When everything is operating correctly, this extends our runway to about a month if the Pi is powered down but the micro is on. Our next version of the hearth will have much more well-thought-out power circuitry. Okay, so how do we do software updates? The iPhone app itself is updated through the App Store, but we actually don't push updates to the app very often. We use a framework called Apache Cordova, which was formerly called PhoneGap. It's basically a way for you to write a JavaScript application that has access to the native controls of the phone through a web browser. So essentially the app itself is just a web browser without any navigation controls, and it runs your JavaScript application as if it were a normal website. We host that JavaScript on the Pi. When the app starts up, it's configured to grab the current version of the control app from the Pi, and in this way we can sync the updates for the iPhone control device and the software on the Pi all in one deploy. We have a version database, so we keep track of which versions of what software are running on each hearth, and we push the updates out through Ansible. Now, a typical Ansible run assumes that your hosts are actually online, but in our case our hosts are usually offline. So we came up with a kind of async update system: when the hearths come online, they check that they have the correct version of their source code. If there's a version difference, they download the new version of the source, which contains a bunch of Ansible files, and then they run Ansible locally. That's how we keep these things up to date and in sync. We have a monitoring system, so we know which version every hearth is at, when it was last online, and so on and so forth. All right, so the mic array. This thing has eight microphones in it; the diameter is probably not quite a foot and a half or so. We use MEMS mics, which are pretty good for this purpose, and the Pi gets the audio data from the mics over the GPIO pins. Since there are eight mics, we get eight channels of audio, and eight channels is actually quite a lot of audio to be sending over GPIO, so we had to invent a kind of interleaving scheme. We take these eight channels, bring them down from 32-bit samples to 18-bit samples, and sample at a 16 kilohertz rate. After all that reduction, we can stuff the result into two channels of 48 kilohertz, 32-bit samples. So what the Pi sees is two-channel audio that is completely nonsensical: there are actually eight channels of data stuffed into those two channels. And the way we stuff those eight channels in is with this scheme. So here are eight words. I'm not sure you can tell, but the first row of colors is dark red: that's the first sample from the first channel, so it's 18 bits. Then there are six bits of channel markers. The channel markers all start with zero-one-zero, and the last three bits give the channel number of the previous sample, so zero-zero-zero is the first channel. The next color is pink: 18 bits for a sample, channel markers starting zero-one-zero, and then zero-zero-one, which says, hey, this is channel two. The Pi grabs all of this, and in various places downstream we de-interleave it and convert these files, which we call .ca1 files, into eight-channel WAV files for processing. We were initially doing this in pure Python, but that proved to be too slow, so we rewrote some of the core code to use vectorized math with numpy, at which point it was fast enough. But we had some issues with the Pi losing a few bits here and there, or partial words, which would shift the entire bit stream over, and then the vectorized math wouldn't work very well. So in the end we just bit the bullet and wrote the entire thing in C. Now it's much faster and much more robust to failures.
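To make the unpacking concrete, here's a simplified Python sketch of the de-interleaving. It assumes the stream stays perfectly aligned; the real implementation is in C and also has to resync when the Pi drops bits.

```python
SLOT_BITS = 24     # 18-bit sample + 3-bit marker prefix + 3-bit channel id
SAMPLE_BITS = 18
MARKER = 0b010     # every channel marker starts with these three bits

def deinterleave(raw: bytes):
    """Unpack eight channels of 18-bit samples from the nonsensical
    two-channel stream the Pi records. Raises if the marker pattern
    doesn't line up (the C version resynchronizes instead)."""
    total_bits = len(raw) * 8
    stream = int.from_bytes(raw, "big")
    channels = [[] for _ in range(8)]
    for i in range(total_bits // SLOT_BITS):
        shift = total_bits - (i + 1) * SLOT_BITS
        slot = (stream >> shift) & ((1 << SLOT_BITS) - 1)
        sample = slot >> 6               # top 18 bits: the audio sample
        marker = (slot >> 3) & 0b111     # next 3 bits: must be 0b010
        channel = slot & 0b111           # last 3 bits: channel number
        if marker != MARKER:
            raise ValueError(f"lost bit alignment at slot {i}")
        if sample & (1 << (SAMPLE_BITS - 1)):
            sample -= 1 << SAMPLE_BITS   # sign-extend the 18-bit value
        channels[channel].append(sample)
    return channels
```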
This audio is then posted into our pipeline. Okay, so here's our processing pipeline. I'll run through the steps, and then I have slides on each one where I'll go into details. We take these .ca1 files and asynchronously upload them while the hearths are docked at the library. By async here, I just mean that the audio gets sent not when the recording finishes, but when the hearth goes back online. So we push this audio from the library to our servers. The first thing we do is normalize the audio levels, and then we send it to two different transcription services: one is human-based, and one is the Google API. We also have a third one based off of our own stuff; I'll talk a little about that, but it's currently not really in production for the hearths. Then when the transcripts come back, we do some metrics calculations. We calculate top terms, which are kind of like a topic-modeling thing we show in the interface, and then we do a thing called topic indexing, which I'll show some examples of. Each of these things is a distinct step in the pipeline, and we're not very cohesive about how we string them together. In some cases a phase is triggered through a job queue; we use RabbitMQ internally. We have a slightly older queuing system that's based off of watermarks from Google's MillWheel, I think. And in some cases we use S3 triggers. We'd like to make this all one cohesive thing, but maybe sometime in the future. Okay, so this is the raw audio on the left and the range-compressed audio on the right. You can see the raw audio is actually quite low. It's not that we're not picking up the audio well; it's that these mics are quite sensitive, so they actually do a pretty good job. So we normalize it. It's range compressed mostly for human consumption: when we send the audio to the transcription service, the humans have to be able to hear it. Speech-to-text systems like Google's API will typically do their own kind of range compression. Right, so we use two transcription services. One is Rev, which is humans, and the other is the Google API. Rev is fairly accurate, but it's very expensive, about 30 times as expensive as the Google API. Google is, of course, less accurate, but it's dirt cheap. So we run the transcriptions through Google first and show those transcripts, with a little note that says, hey, come back for higher quality transcripts. Rev usually gets their version of the transcripts to us within a couple of hours to 24 hours. One kind of interesting technical problem here is that Rev gives us the transcripts with time alignments on speaker turns. So when a person starts speaking, Rev says, I think that starts at one minute and 14 seconds in, or something. But in our interface, we allow the user to click on a word and have the audio jump to where that word is, regardless of whether that word is the start of a speaker turn. So we have a system that runs some sort of light speech-to-text on the audio, and then we take the Rev-supplied alignments and align those word boundaries to the transcripts.
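A rough sketch of how that word-level alignment could work: take your own ASR output, which has per-word timestamps but imperfect words, align it token by token against the accurate Rev transcript, and carry the timestamps across. The function and data shapes here are invented for illustration.

```python
import difflib

def align_word_times(rev_words, asr_words, asr_times):
    """Assign a start time to every word in the accurate Rev transcript
    by aligning it against our rough ASR output, which carries per-word
    timestamps. Assumes both transcripts are lowercased token lists."""
    matcher = difflib.SequenceMatcher(a=rev_words, b=asr_words, autojunk=False)
    times = [None] * len(rev_words)
    for ai, bi, size in matcher.get_matching_blocks():
        for k in range(size):
            times[ai + k] = asr_times[bi + k]
    # Words the ASR missed inherit the previous matched word's timestamp.
    last = 0.0
    for i, t in enumerate(times):
        if t is None:
            times[i] = last
        else:
            last = t
    return times

# e.g. align_word_times(["we", "love", "madison"],
#                       ["we", "loved", "madison"], [0.0, 0.4, 0.9])
```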
In addition to these two transcription services, we're also working on improving our own internal system, which is based off of an open source framework called Kaldi. That's a setup we built for a different project, where we're transcribing about 3,000 hours of talk radio per day, and we've been doing that for about a year and a half. At that rate of transcription, it would be prohibitively costly to even use Google. So after these conversations are transcribed, we run some analysis to characterize the speech. We look at things like the speed of the dialogue in words per hour and how long each of the speaker turns is. Mean inter-speaker silence, I guess, is how awkward the conversation is, how awkwardly silent it is. Turn-taking balance is neat: it's an information-theoretic measure of how spread out the speech in a conversation is, essentially a normalized entropy over each participant's share of the speaking time. If one person speaks for 99% of the time and everyone else takes turns for the last 1%, that value is very close to zero, whereas if every single participant speaks for the same amount of time, the value is very close to one. There's also an interruption rate. The measure of lexical diversity is how many unique words there are in, I think, a sample of a thousand words. And then mean word length: 4.3 letters. Conversations love four-letter words. Oh, and the very last metric is actually an important one: the speech-to-text transcriber's word error rate. We do some research on speech errors, and on bias in particular. We actually run the audio through gender classification, so we know, for instance, what the ASR error rate is on a gender breakdown, as well as by geography in the case of the talk radio data, and we believe geography is a good proxy for accents. We have some papers coming out that show that a system like the Google API does indeed have a bias against certain ethnic groups. So, we calculate top terms, which roughly correspond to the topics of speech during the conversation, and then we link them to the parts of the conversation that are about those terms. These are TF-IDF computed terms, basically unusually frequent terms, but they're filtered down by the topic indexing phase, which I'll talk about in the next slide. This interface is actually a primary method of discovery for users of the site. Topic indexing. This is a curated set of things that we care about and would like to track in these conversations. The way this works is that for any category here, let's say childcare, we curate a set of very high precision terms for that topic. We say school has a lot to do with childcare, education has a lot to do with childcare, for instance. Then we build a word2vec word-embedding model on the talk radio corpus, and we look for words in the embedding that are mutually close to all of those high precision terms. So this thing, for instance, naturally discovered that the phrase creative writing has a lot to do with the high precision terms we picked for childcare. We set up these topics, and then we can link from them into conversations, and into the portions of those conversations where people are talking about these topics. This is a work in progress.
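Here's a sketch of what that mutual-closeness test might look like with gensim. The model path, the 0.4 threshold, and the seed terms are all invented for illustration; the real filtering is presumably more involved.

```python
from gensim.models import Word2Vec

# Hypothetical: a word2vec model trained on the talk radio corpus.
model = Word2Vec.load("talk_radio.model")

def expand_topic(seed_terms, topn=20, min_sim=0.4):
    """Return words that are mutually close to *all* of a topic's
    hand-curated high-precision seeds, e.g. ["school", "education"]
    for childcare. The 0.4 threshold is made up for this sketch."""
    scores = {}
    for word in model.wv.index_to_key:
        if word in seed_terms:
            continue
        sims = [model.wv.similarity(word, seed) for seed in seed_terms]
        if min(sims) >= min_sim:  # must be close to every seed, not just one
            scores[word] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)[:topn]

print(expand_topic(["school", "education", "daycare"]))
```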
I talked about the microphone array but didn't actually say what we use it for. The goal with the microphone array is to be able to diarize the speech, that is, to separate out the voices in the audio and transcribe them separately. The Google API is really bad at doing this. Rev is pretty good, and we count Rev as ground truth, but even still, humans get things wrong. It particularly happens in very long conversations: sometimes you can tell that the transcriber is getting fatigued or something and just becomes lazier near the end. Misattributed speech is a big deal for us. We allow participants to retract audio, and most of the retractions come in two flavors. One is take out my name, and we take names out anyway; the second kind of retraction is hey, that wasn't me. These conversations oftentimes have highly personal stories in them, and if someone is talking about childhood trauma and you say this was John when it was in fact Jack, that's a problem. So even if the accuracy is 99%, getting it wrong 1% of the time in those cases is not cool. Any kind of advantage we can get in separating out the speakers, we'll take. On the left is a clustering of speakers using speaker embeddings. It's a method published by Google called d-vectors, and I believe we use an open source package for it. It's based off the frequency spectrum, and you can see the labelings in the visualization: the projection is kind of muddled. The second image is also stacking in information about the time delays between the microphones. The amount of time separation between two microphones at opposite ends of the hearth is about one one-thousandth of a second, I believe. And you can see these clusters are much more distinct. This is the thing we were working on this past summer, and we'll be integrating it into the pipeline soon. There's a paper coming out, probably this academic year, about these methods, and we're hoping to also publish and maintain the source code. Okay, so the last thing of note is our highlights. The highlights in this system are curated by a special set of users, and what happens is that these conversations are paused about halfway through, and the facilitator will play a highlight and then ask the group for a response. We get a lot of interesting responses this way. It's a way to introduce different viewpoints into the audio, and the phone interface has a way for you to pick which highlights get synced to which hearths. Since the hearths have to work offline, what happens is that when they do go online in the libraries, they download all of the highlights they're supposed to have, and then the phone app can play them. We initially built this using a kind of duct-tape rsync-based system, but after a couple of weeks of operation we switched to a custom API, which gave us a lot more control over how the sync is done. And then what happens with the highlights: if you think about the conversations as nodes in a graph, a highlight is an edge between two nodes, and so this forms a kind of network structure. So we're starting to look into how these highlights are cross-pollinated. The decision to share a highlight into some conversation is made by the facilitator; it's not a thing that we curate. With enough of a network and enough people out there doing this kind of work, we should be able to discover structures on the network that we didn't think were there. Our future work will be focused on this cross-pollination, and it's the manifestation of the worldly outcome we're trying to get, which is to actually bridge communities.
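That structure is easy to picture as a directed graph: conversations are nodes, and sharing a highlight from one conversation into another adds an edge. A toy sketch with networkx, with made-up conversation IDs:

```python
import networkx as nx

# Conversations are nodes; playing a highlight from conversation A
# inside conversation B adds a directed edge A -> B.
G = nx.DiGraph()
G.add_edge("madison_042", "madison_117", highlight="transit_story")
G.add_edge("madison_042", "bronx_003", highlight="transit_story")
G.add_edge("bronx_003", "madison_201", highlight="housing_story")

# Which conversations seeded the most discussion elsewhere?
print(sorted(G.out_degree(), key=lambda pair: pair[1], reverse=True))

# Weakly connected components hint at clusters of communities that
# are (or are not) hearing from each other.
print(list(nx.weakly_connected_components(G)))
```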
So, what's next? Can a squat, wooden disc lead to better civic discourse and smarter journalism? It's kind of a snarky framing, and it doesn't quite capture exactly what we're trying to do, because our work isn't centered specifically around the hearth; the harder part of our work is actually the human network we're trying to build. All of the lab's research is interested in the way that humans and technology, going back and forth, can augment each other. That's the origin of the name Social Machines, and LVN is just one instance of that. We're continuing to scale this out; hopefully by 2020 we'll be nationwide, just in time for the election. Beyond 2020, the fantastical vision is that it actually creates a new kind of civic institution. Maybe every year you'll decide to go and give one and a half hours of your life to the project and talk with people in your community. To do this, we need a lot of people involved. There are 60 active volunteers in Madison; I think we had a wait list of something like 50 or 60 more people who wanted to be trained on the device, but we didn't think we could handle that kind of capacity. And that's just one city. If we want to be in hundreds or possibly thousands of cities, what we're looking at is the coordination of a very large human network. It's more of a human problem than a technical one. We are currently in the Bronx, so that's in the works. Our Wisconsin deployment is going out into the rural areas. We'll be in Alabama, I believe, within a couple of months, and probably in Arizona in a few months as well. So yeah, that's the talk. If you have any questions, I'll take both technical and non-technical questions. Thanks. So I'm curious, who thinks that this is a good idea? All right. All right. It's been an interesting human-scale problem. We get a lot of excitement over it, but it's also hard to get people to come in to actually have conversations. Yeah, so the question is whether the motivation came from the previous election. It's definitely a thing that's in the back of our minds, but it's not a thing we talk about constantly. We'd like for it to be a more general-purpose civic tool, not something specifically for dealing with politics, but also something that gets different communities talking to each other, and we'd like for them to be able to empathize. Our hypothesis, our theory of social change, is that it's easier for that to happen in person than it is at scale through Facebook or Twitter. Got some questions here? Yep. Okay, so that's a good question: what's the relationship between the study of the talk radio data and LVN? There isn't really a direct relationship, aside from the fact that many of the same engineers and researchers are involved. The talk radio thing started off much more as a purely academic pursuit: as far as we know, nobody else is storing or analyzing that amount of talk radio data. We think it's the largest corpus of its kind that has ever existed, probably by several orders of magnitude. So for computational social scientists, it's a great set of data to have. And part of the reason Cortico was first built was to really productionize and run that talk radio system; productionizing is the kind of thing grad students shouldn't really be spending their time on.
And then just naturally from there, the focus on transcription and speech work made its way into our designs for LVN. How do we recognize conversation bias? Well, we don't really calculate a measure of bias. We have that measure of the complexity of the language, but that's more a measure of how many different topics people are talking about; we currently don't really do anything about bias per se. In some instances with cross-pollination, we've found that different conversations, different groups of people, have very different perspectives on the same thing. So we have the beginnings of it: we know, oh hey, this highlight from this conversation was pollinated into that conversation, so we have that link. I wouldn't say we can pick out bias, but we can accentuate different perspectives on the same thing. We're still working on that. Yeah, so the question is, do we have any kind of noise reduction mechanism? The range compression does reasonably well at de-accentuating white noise. But if there are a lot of people talking in the background, that is an issue, and we have had problems where people bring the hearths into a restaurant or a bar. We actually had conversations occur in a bar, and there even the human transcribers have problems. I don't know if this is a thing we'll ever be able to properly solve. But we do put guidelines in place. We ask that people don't stand up and walk around, for instance, because using the microphone array to localize a voice is much easier if we can assume the voice isn't moving around. There's one conversation I frequently work with where there's a dog in the room, and it starts barking, and it does crazy things to the audio. It barks a couple of times while people are talking, and then someone stands up, walks the dog out of the room, and slams the door shut. All sorts of stuff that's probably not a typical input to speech systems. Okay, so that's a great question: as this expands nationally, can we use it to map the progression of possibly fake news? Well, the fake part of it, I don't know; I'm honestly not sure fakeness is a thing that we, as computational social scientists, can really effectively tackle. Mapping the movement of ideas is definitely a thing we've talked about in the lab. There's another form of this, actually: one of the outputs of the talk radio research is mapping how ideas transmit from Twitter to talk radio and back. That's not very precise, because only a small percentage of tweets are geo-located, so we actually don't know where a lot of tweets come from. We can infer someone's location from the social graph, but that's not always correct. The second thing is that we have the broadcast ranges of the radio stations, but we don't really know whether someone actually had the radio turned on and heard something or not. So it's a very imprecise thing, but the idea with the research was to be able to track the progression of some particular idea through the network.
And that came out of previous research in the lab that actually looked at fake news, news that was debunked and how it spread purely on Twitter. So I think I'm out of time. All right, thanks.