Good morning. I probably should have started this process about five minutes ago. Apologies. I just want to confirm with the timekeeper: are we good? Okay, great. So I'm going to spend about 30 minutes with you, and I apologize for that as well.

Dave asked me to provide a kind of overview of the citizen science landscape. I'm not sure that I have any sort of unique view into ELSI, but I'm hoping that along the way I can raise a few issues that might be related to ELSI, just to get us primed for further presentations and discussions. After that presentation we'll do a brief activity, and then if there are any questions we can talk about them.

Just by way of introduction: my name is Pietro Michelucci. I'm a cognitive scientist and, frankly, my bio's probably in some of the materials here, so I won't belabor that. But I'm quite pleased to be here, and I'm grateful to the organizers for putting this together, and also to the NIH leadership for supporting these activities, which I think are very forward-looking.

So basically, there's a lot of material. I'm going to go through it; it's going to be like hyperdrive, so I hope you've had some coffee. We're going to talk about what citizen science is and how it relates to some of these other constructs, and I thought Jennifer did an outstanding job of initiating that discussion. Why do we actually need humans in the loop? I think it's important to think about why we need humans, because that in some ways helps inform the way in which humans participate in citizen science, which helps inform discussion about our expectations and issues related to ELSI. Why is now a good time to be having this conversation? And then I'm going to survey the landscape and then have a little activity.
Hopefully we have time. I begin most of my presentations with this. It is sort of a cornerstone of my personal philosophy and why I care about the space of human computation and citizen science. The Haudenosaunee have the Great Law that in every deliberation we must consider the impact of our decisions on the next seven generations. I think that this workshop is very much in that spirit: we're looking closely at a technology and considering all the implications and possible impact of that technology. The reason I care about this is that I believe that, because of technology and because of the large population of humans on the planet, collectively we've created a lot of problems, and so I'm interested in how we might be able to solve them together.

So I'll start by talking about human computation as a context for citizen science, and it helps to ground some of these other concepts. Of course, as with most things, it depends on who you ask. If you ask an HCI person, then human computation is an HCI problem. If you ask a psychologist, it's a behavioral problem. And if you ask a computer scientist, it's an AI problem, of course. It's probably all of these things, and other things as well. At a recent workshop in June, where we assembled to talk about human computation, we did an activity to brainstorm what sorts of concepts were related to human computation, and this is a tag cloud that arose as a summary of that exercise.

To think about what human computation is, we have to think about what we mean by computation and how this has evolved over time. We used to think about computation in terms of doing a pencil-and-paper computation or using an abacus: what's this number divided by this number? Then, with the advent of computing machines, algorithms (processes of calculation) became a way of computing. And then, as we became more sophisticated, we started to think of
symbolic reasoning and pattern recognition by computing machines as representing computation. And now, more recently, as we bring humans into the loop and think about humans doing some of the computing, very abstract kinds of reasoning such as creativity, intuition, and synthesis are thought of as computation. So one way to think about all of these things is more generally as information processing. But when we think about human computation, we tend to mean lots of people working together in some kind of distributed network. This can be technology agnostic or technology mediated, and for the most part nowadays we think about this in the context of the internet.

In terms of trying to figure out how the various concepts interrelate, the community came together to create a handbook a little over a year ago. In the course of producing that handbook we had lots of discussions about what we mean by all these things, and it was like the blind men and the elephant, where we had 20 different disciplines represented in the handbook. But eventually we coalesced on a few key concepts and had some agreement on those, so I'll share that agreement with you.

By human computation we mean the design and analysis of multi-agent information processing systems in which humans participate as computational elements. That's just to say that you have a network of machines and humans that are both engaged in computation in complementary fashion. By crowdsourcing we mean the distribution of tasks to a large group of individuals via a flexible open call, in which individuals work at their own pace until the task is completed. Collective intelligence refers to a group's ability to solve problems and the process by which this occurs. And social computing refers to information processing that occurs as a consequence of human social interaction, usually assumed to occur in an online medium. So this is a collection of definitions.
It's not the last word on these, I'm sure, but it's a place to begin the discussion.

So what about citizen science more specifically? The Citizen Science Association says that citizen science is public participation in scientific research. Seems reasonable, but it doesn't say anything about that having to happen in the context of the internet or in some sort of techno-social medium. Cyberscience has been used to refer to the use of the internet to conduct some aspect of science, but it doesn't say anything about the public's participation in the scientific research. And then finally, citizen cyberscience would be citizen science through the internet. Often when we talk about citizen science, I think we tend to mean that. As Jennifer pointed out, these are evolving concepts and definitions, but this is at least a way to nail down some terms and have a conversation about it.

There are more detailed taxonomies emerging, in fact. In the latest issue of the journal Human Computation, Greg Newman was a special editor for a special issue on citizen science, and in his own contribution to that issue he put together a very nice synthesis of these concepts and a conceptual taxonomy. This is available open access online, and I would encourage you, if you're interested, to look at it. And then, also hot off the press, Andrea Wiggins and Kevin Crowston (Andrea, who's sitting next to me at the table here, was an author on this) produced this beautiful survey.
It's the kind of survey that everybody wants but nobody's willing to do, but they actually did it for us. It examines 77 citizen science projects and then tracks two new projects closely in the context of a number of different survey dimensions. So I encourage you to look at that too, and I think that's also open access online.

So how do we get humans and machines to work together effectively? Machines do certain things very well, and humans do certain things very well. There's certainly some overlap, but they tend to be quite complementary, which is why this is a recipe for success. Machines do things like counting and calculation; they remember things, certainly much better than I do; and they execute processes reliably. Humans tend to do things like inference (some better than others), visual perception, linguistic ability, and abstraction of concepts. They embody world knowledge, have social and cultural awareness, and are creative. You could imagine a continuum of these abilities, and over time, as machines become more sophisticated, machines begin to compete with humans on these dimensions, as they already have. We would hope that humans would in some ways retain an advantage in some of these things even as we move far into the future. But one implication of this is that human labor categories will increasingly narrow. Just as the Industrial Revolution automated the assembly lines, we're going to see other kinds of automation, so there will be fewer things that humans bring to the table.

This potentially raises some labor issues. Alek Felstiner, who I believe is a lawyer at the Department of Labor, wrote a nice chapter in the Handbook of Human Computation about labor standards and human computation, and these questions are completely based on his ideas as applied to citizen science. So we can ask questions like: Where does citizen science intersect with traditional notions of employment? Which labor standards ought to apply?
Are there work versus non-work forms of citizen science participation? How does one determine jurisdiction, coverage, and compensation in online contexts? (I think to a lawyer those words have very specific meanings, which I don't completely understand.) How do we determine thresholds of fairness, transparency, and dignity? And who is the regulatory authority in all of this?

Okay, so why is the... oh, and can I just do a quick time check? Twenty minutes left? Okay, thanks. It's like time compression when you're doing a presentation.

So why is the timing good right now? I'll go through these quite quickly. It's just to say there's a lot of activity in this space and it's heating up. We had the NSF SoCS program, which was intended to understand the properties of systems of people and computers working together. This was followed by the Cyber-Human Systems program, also at NSF, in IIS, where they're fleshing out the space of configurations of humans and machines, with a third dimension of the environment. We had the HCOMP conference in November. Collective Intelligence has become an annual event. There are relevant SocialCom events. We have a special technical community in human computation through IEEE.
We had the recent handbook. We have a new journal. There's the new Citizen Science Association, with quite a large membership. Someone in this room probably knows the number; I think it's at least a thousand, if not more, global members who signed on almost instantly when it opened up. And the CSA will have a forthcoming journal called Citizen Science: Theory and Practice, which will be a great way for both citizen science participants and scientists to communicate with each other. There's also the PCAST recommendation that requested a subcommittee of NITRD to do cross-agency development in social computing.

And then in June we convened for a human computation roadmap summit. One of the co-organizers here, Lea Shanley, was instrumental in developing that. The point of this summit was to put together a research roadmap, that is to say: what's the fundamental research that gives rise to which capabilities, which give rise to which kinds of opportunities to address societal issues? So we're putting together a workshop report about that, and we hope to take that to policymakers and use it in support of a new national initiative in human computation, analogous to what exists today for robotics. I think there are a lot of hidden click-throughs here.

Okay, so now, the lay of the land. Forgive me, a lot of this will be quite familiar to many of you. I tried to think of a way to break this down. When you're thinking about citizen science and how people access it online, there are portals that allow you to search for new citizen science projects, manage your own, and track your own activity in those. There are citizen science platforms that host many different projects and allow people to create their own projects. And then there are the specific projects themselves and the various engagement modalities for public participation in scientific pursuits. I'll also briefly mention the notion of building on past success
and how that could be one way to have some assurances about future successes.

A canonical example of a portal is SciStarter.com. Again, I don't know the exact number, but I think there's probably something on the order of a thousand citizen science projects out there. Maybe it's 800; I don't remember exactly. But what's nice about SciStarter is that you can narrow it down. If you're a prospective public participant, you can go on there and say, "I'm interested in participating in health and medicine," and then 77 health and medicine projects come up. It's just a way to get to those projects.

Then we have platforms, and this will be familiar to many of you. Zooniverse is associated with lots of space-based projects, but they've expanded the purview, as you can see, to include humanities, nature, biology, and physics. And they're building in lots of tools to allow folks to create more sophisticated citizen science projects. People who don't have a computer science or programming background can get in, configure a new research project on their flexible platform, host it there, and also benefit from the huge pool of participants.

CitSci.org is another one. This is tailored more to environmental projects, but it gives users a set of tools to create citizen science projects. In this case, I think folks who do not necessarily have a scientific background are encouraged to create projects of their own, and the tools help them to build projects that tend to conform more to the scientific process. So I think one question that arises is: when you start to allow the public to engage in citizen science in terms of creating new projects, how do we ensure that
the process they follow is a scientific process and that the results can be trusted?

So we'll look at a few categories of projects: research acceleration, scientific discovery, and what I'm calling virtuous ecosystems.

Research acceleration. In 2002 we sent a probe into space to collect some dust from Comet Wild 2. There is this aerogel collector grid (aerogel is the least dense substance known to humankind), and these particles, traveling at 14 miles per second, would slow down in the aerogel enough to become trapped in it without being destroyed. Then the spacecraft sent the aerogel sample back down to Earth. It landed in Utah, in the desert, and then they looked at this aerogel and said, "How are we going to find these micron-sized particles in the aerogel?"

So, some of the scientists who were working on this... well, briefly: does anyone recognize this? Anyone ever put this as a screensaver on their computer? A few people in the room. You're not dating yourself, don't worry. So SETI@home (whoops) was the idea of distributed computing. You put a screensaver on your computer, and when your computer is idle it processes data from the Search for Extraterrestrial Intelligence data set. And if you're the lucky person, maybe someday your computer will be the one to discover signs of extraterrestrial intelligence. But the main idea is that you get a lot of computing power from the idle time of many different computers.

So a couple of guys at Berkeley, Andrew Westphal and David Anderson, got together. David Anderson was the inventor of the SETI@home project. And they thought, "Maybe we could apply a similar technique to human computation. That is, instead of distributed computing, we could give distributed thinking a try to solve this aerogel problem." And that was the birth of Stardust@home. That was in 2006, and this was sort of a pioneering project in citizen science.
They came up with an interface that allowed 30,000 internet users to learn how to find these dust tracks in the aerogel using a virtual microscope. By dividing the labor in this way (sorry, my animation isn't working so well) they were ultimately able to find seven dust particles, and they were, I think, the first such dust particles believed to be of interstellar origin. This was published in Science just this last August, and one thing that's worth noting here is among the authors, I don't know if you can read that: 30,714 Stardust@home dusters. And there's a link, and you can go online and see the usernames of every single one of those contributors. So this raises another ELSI question, I think: under what circumstances should public participants be credited, and what are the mechanisms for doing that?

This might look familiar to some. This is the EyeWire project. The idea here is to map the connectome: how do we create a map of the neurons in the brain so we can better understand how they work? They're initially using retinal neurons. So far 120,000 participants have developed the skills to use this tool. These are members of the public, without necessarily a scientific background, and 120,000 people contributed to a new discovery about how motion detection works in mammalian retinas.

Then there's the Cell Slider project, which someone mentioned earlier. The idea is to accelerate research: if you have a huge data set of imagery, how do you identify and count cells in the imagery? So it's divide and conquer again. But then we have another question that arises. What sort of quality assurance do we have?
When we use a layperson instead of a trained pathologist, there are nuances that only trained pathologists can detect. And so they use methods like consensus: if nine out of ten citizen scientists agree that this kind of cell is a red blood cell, then we're going to believe that. Or, we've done studies to show that nine out of ten citizen scientists are just as reliable as one pathologist. So these are the sorts of questions we want to ask.

MalariaSpot: I think it's a wonderful idea. This is an app. If you download it, you can help diagnose malaria. The problem is that people walk into these malaria clinics and get their blood drawn, and it takes 30 minutes for a specialist to figure out whether or not they have malaria. They leave the waiting room, they don't get their diagnosis, and they never come back. They might not get treated if they have malaria. MalariaSpot allows the imagery to be taken and crowdsourced. Ten people (I'm making up the number ten; some number of people) look at that one slide. They've gone through some kind of online training, and the patient can get a diagnosis on the spot and then the treatment they need. But again, the question is how good is good enough when it comes to these diagnoses, and true positives versus false positives or false negatives.

So can I get a time check?
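The consensus rule mentioned above (accept a label when nine out of ten citizen scientists agree) could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not how Cell Slider, MalariaSpot, or any other project actually aggregates classifications. The function name consensus_label, the label strings, and the default 9/10 threshold are hypothetical choices made for the example.

```python
from collections import Counter
from fractions import Fraction


def consensus_label(votes, required=Fraction(9, 10)):
    """Return the most common label if its share of the votes reaches
    `required`; otherwise return None (no consensus, e.g. send the
    image to a trained pathologist instead).

    `votes` is a list of labels, one per citizen scientist.
    The 9/10 default mirrors the "nine out of ten" figure in the talk;
    it is an illustrative assumption, not a validated threshold.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Fraction keeps the comparison exact (no floating-point rounding).
    if Fraction(count, len(votes)) >= required:
        return label
    return None


if __name__ == "__main__":
    strong = ["red blood cell"] * 9 + ["other"]
    weak = ["red blood cell"] * 6 + ["other"] * 4
    print(consensus_label(strong))  # red blood cell
    print(consensus_label(weak))    # None
```

Returning None when agreement is too low leaves room for a fallback such as expert review, which is one way to approach the "how good is good enough" question; where to set the threshold would have to come from validation studies against trained pathologists.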
Seven minutes? Okay.

Scientific discovery. We're going to skip through some of this. This is another class of citizen science project, one that has to do with discovery rather than research acceleration. The previous examples were all about accelerating research; this is about actually using humans to engage in scientific discovery. There are three parts. I won't belabor the human advantage; I'll say just a few words about it, talk about a few examples of winner-takes-all models of scientific discovery, and then give an example of cooperative discovery.

One advantage that humans have is what I'm calling selective consideration. This is the idea that we've evolved to ignore the things that don't matter, and this has been necessary for our survival as a species, as it is for most species. That is, all I need to pay attention to is whether or not that thing is going to kill me or whether it's something I can eat. We've gotten so good at this that when it comes to very abstract sorts of tasks, where the potential search space is enormous, we can quickly eliminate most of the paths that would not be fruitful and focus instead on the ones most likely to be fruitful. This allows us to be better than computers at certain kinds of problems most of the time. But there's always the risk, when you don't go down some of these different paths, that you might miss something. So the cost is that it's not an exhaustive search, but the benefit is that it's a much faster search than computers can do.

The other aspect of human advantage that I wanted to talk about, and I won't as much, is serendipitous discovery. Really what I wanted to say is how important serendipitous discovery has been historically.

So, you recognize this person? Yeah. Donald Rumsfeld is famous for his known knowns. Remember this in the media? The known knowns:
We're looking for snakes and expect to sometimes find them under rocks. The known unknowns: we know we may find other animals under rocks, but we aren't sure what they might be. And then the unknown unknowns. He's kind of infamous for this, but if you think about it, this is actually a pretty important and worthwhile idea. The unknown unknowns are the things we don't know that we don't know, and that's really important. You should never say that if you're a Secretary of Defense, but I think we should talk about it. So there are unexpected things we might discover in the course of lifting up rocks.

To put these in more scientific terms: the known knowns are the notion that we're able to support or refute a specific hypothesis. The known unknowns are that we can generate new related hypotheses based on our findings. And the unknown unknowns are the notion that we can make unrelated but useful discoveries, and there's a rich history of doing that. For example, the discovery of penicillin is an example of that. So this suggests a continuum between descriptive analysis and exploratory research.

Unfortunately, we don't have time to go through all these examples of serendipitous discovery. But the main idea here is that this is why we not only need humans in the loop, but need to create citizen science systems that enable humans to engage in the kind of analysis and reasoning that leads to serendipitous discovery. Here are some ideas, and again, I apologize, I don't have time to go through all of them now, but I can certainly make the slides available if you're interested. The general idea is that we can use citizen science to tackle big data problems.

So, winner-takes-all examples. Phylo: here the idea is genetic sequence alignment. I'm not going to say too much about that, because I'm in a room full of people who know a lot more about genetics and genomics than I do.
I'm not even sure I know the difference between genetics and genomics. But there's value, I guess, in discovering alignments, in the sense that when you have alignment across species, it might signify a functional benefit and a place where a mutation could cause more harm than in other places. So Jérôme Waldispühl at McGill created a gamified task that allows people to try to align these sequences according to rules, and there's a score associated with a given alignment. You want the colors to line up, is basically the idea, but there are rules governing points, where certain alignments are better than others, and sometimes they're not perfect. And it turns out that humans can do better than the best machines at finding these alignments. The reason this is called winner-takes-all is that we're not interested in combining different results from different people playing Phylo. We want the single best result that a person comes up with. This is sort of a crossover project, because they developed Open-Phylo, so it's not just a project but a platform. Others can submit their own sequence data for alignment.

I've got one minute left? Okay. So Foldit is another example of this. A question that arises: under what circumstances can scientific results produced by non-scientists be trusted? We kind of covered that. Nanocrafter.

Okay, cooperative discovery. Galaxy Zoo is an interesting case study here, because they gave the citizen participants the opportunity to engage in discourse, through the online medium, about the work they were doing. Because of that, they were able to share discoveries and ultimately converge on a new kind of galaxy called Green Pea galaxies.

Virtuous ecosystems. I'll just quickly mention this one: PatientsLikeMe. Sally Okun (did I say your last name right?) is here from PatientsLikeMe.
It's a wonderful idea. The virtuous ecosystem allows people to contribute their personal health data, to talk about their diseases in a community of people with similar diseases, and to look at what treatments have been more or less effective for other people. Then they take that information from the aggregate and apply it to their own situation as a sort of small experiment, and report back their own findings: "Oh, ibuprofen is working great for chronic fatigue. I'm going to try that, and then I'll report back my own results." In this kind of virtuous cycle, there's a benefit to the participants for their contributions, and at the same time it's feeding into research data. So I won't cover these; I think I'm just about out of time. Can I borrow three minutes for this quick participation task? Yeah? Okay.

So now that we know each other so well, I would like to ask you a question. How many of you had a bowel movement today? Okay, a few. I appreciate your candor. How many of you don't want to participate in this? A few others. Okay. So this is what I want you to do: get out your smartphones, get out your devices right now. We've got like one minute, okay? You're going to do this survey online. Those of you who participated already, feel free to participate again, and those of you who didn't want to... so here's a show of hands: how many of you are willing to participate in the online anonymous survey? A lot better. Okay, here's the URL. It'll take you 30 seconds to get there and 30 seconds to do it. I should have known that a group of medical people would raise their hands when I asked the question. It's only fair. Okay, raise your hand when you're done. Still working on the survey? Ten more seconds.

So I appreciate your participation in this study. Let's look at the results. Wow, that's test data back here. That's just response volume. Sorry, this is my testing. Okay, so 34 people responded; 44% said yes and 55% said no. So this is what I would call cyberscience.
We collected data. Did you feel like you were doing science when you answered those questions? Not so much, okay. So then here's another question that you answered: what did you think about everyone else? You could only answer whether or not you thought that half of the respondents or more had had a BM today. It looks like 67% thought yes and 32% thought no, so that's kind of interesting, right? This is what we would call wisdom of the crowds, or in this case failed wisdom of the crowds. So this is a form of collective ignorance.

Now we can ask one more question, and then I promise I'll leave. What we're going to do is ask a conditional question. Now this becomes sort of a behavioral study. Oh, come on, you didn't make me sign up last time. So the question we want to ask is: what is your answer to the second question, conditional on your answer to the first question? Then it becomes a behavioral study, right? To say, depending on what you did this morning, how likely is that to influence what you think about what everyone else did? So then my question to you is: before I ask that question, should I have gotten IRB approval, because it's a behavioral study? Or at what point in this process should I have gotten IRB approval? And am I in trouble now with Sarah, who I met earlier, who's chair of the IRB? So I'm sorry I couldn't provide the final results. Maybe it's better that I didn't, because then I won't get in trouble with the IRB. So, thanks very much.

Doesn't seem to work. You talked a lot about... I think a lot of your examples are sort of related to the data space. But I think it's very obvious to everyone in here that our ability, and our citizens' ability, to discern between paths that are fruitful and paths that are not fruitful extends well beyond the data space, to the space of hypotheses we might ask, the study designs
we might adopt, what to do with data, what steps to take once a study is done. So I think we can think about the broader range of citizen science: using people (not using, but, you know) to look at different paths that might be fruitful and less fruitful.

Was that a question or a comment? Thank you.