I'm gonna go ahead. Hey guys! Nice turnout today on a rainy Friday. That's great. So my name is Jodi Harrell. I'm a marketing manager here at Research Square and I'm just gonna introduce today's speaker. Just a quick note that Brittany is coming back from her maternity leave next month, so you'll see her in the next meeting, hopefully. Today's presentation is on text recycling or, as it's I guess probably more commonly known, self-plagiarism. That occurs when an author knowingly or unknowingly uses parts of a previously published manuscript in a new paper. Our speaker is Cary Moskovitz. He's director of Writing in the Disciplines in the Thompson Writing Program at Duke University, and he'll talk about an NSF-sponsored study that he worked on and the implications for journal offices. So with that, help me introduce Cary. Thank you. Oh thanks. Thank you. If I hold this here, does that seem to be loud enough? I'm sure you all can hear me. I can speak pretty loudly, and I guess we can't tell if the people at home are hearing me or not. I guess they'll let you know. Okay. So Jodi said that I've been involved with this NSF grant, past tense. It's actually current tense. It's a five-year grant and we're now finishing up the second year of that grant. And so I'm going to be sharing with you some things that we've learned so far and talk to you about some things that we're going to be doing, and a couple of things you'll be seeing haven't yet been published. So you'll be some of the early consumers of that. And with that in mind, let me make sure I make this announcement: thanks to the NSF. They like when we do this, and we're glad that they're giving us money to make that happen. All right. So we're going to start out today with a question for you. This is one of a number that I've posed to different people at different times.
And so what I would like you to do is read this little hypothetical scenario and then write down, so you will have to commit, A through E in response to it. I'll give you 15 more seconds. Anybody need 10 more seconds? Okay. So I'm not going to take the time to do some kind of electronic survey thing. I'm just going to trust your integrity. And I will tell you that there is no correct answer to this, but I find it useful for people to have to think through this. So what I would like you to do is think about whatever letter A through E you chose, and then if you chose A, raise your hand up high. If you chose B, C, D, E. Okay. Interesting. I spoke at the national meeting of the Council of Science Editors a couple years ago when I was first starting to develop some of this line of research, and I asked them some of these questions. This was one of them. These were their responses. So as you can see, there's a couple things of interest here. Most importantly, almost nobody felt certain about the answer to this. And the other is the striking evenness of the responses between those editors who thought it was probably okay and those who thought it was probably not. And this particular scenario, while hypothetical, is not far-fetched. This is a kind of interesting thing that people face often who are working particularly in scientific fields, but not exclusively so. You also notice that hardly anybody said they didn't really know, which is kind of interesting given the opposite views that people held about the question itself. The next few slides, I'm just going to kind of set up the backdrop to what I'm going to be talking about here. So first of all, what is text recycling? Here's what I would consider to be a pretty typical example. So the top passage, which was from an article published in Science, and the bottom passage, which was published in the Proceedings of the National Academy of Sciences a couple years later, are almost identical.
In fact, the only difference is a citation, reference eight, and the sample size, right? And by looking at the little colored blocks, you can see the other parts are exactly the same. So that's the kind of stuff we're talking about. Also worth noting here is that the authors of the first piece were almost the same, but not identical, to the authors of the second piece. And that's not an uncommon situation in many, especially scientific, fields where you have chains of research that are being done and people join and leave the team, usually with a stable cohort, PI or lead investigators, and so on. But if you think about it, you can understand why the basic idea of text recycling, which is not reusing somebody else's stuff but reusing your own stuff, has this other problematic dimension, and that is: what do we mean by your own stuff if there are different people who are authors on different pieces? In terms of the backdrop for the problematic nature of the topic itself, here is a quotation from probably the most widely cited source on text recycling: authors are urged to adhere to the spirit of ethical writing and avoid reusing their own previously published text unless it is done in a manner consistent with standard scholarly conventions, that is, using quotations or paraphrasing. So basically what this says is, if you want to be ethical, you should really avoid just reusing stuff verbatim unless you're putting it in quotes. And it also refers to standard scholarly conventions. In some of my previous writing, I've tried to explain that in fact that doesn't exist. There are different standards or norms across different fields, but there are no such things as standard conventions that go across all disciplines in all genres. And here is a passage from the other most highly cited set of guidelines, these from COPE. I imagine most of you are... are you all familiar with COPE? Okay, most of you.
Use of similar or identical phrases in methods sections where there are limited ways to describe a method is not unusual. In fact, text recycling may be unavoidable when using a technique that the author has described before and may actually be of value when a technique that is common to a number of other papers is described. This is pretty typical of the discourse around text recycling. Some people say you should just avoid it, and you can; it's unethical or it's lazy. And others would say there are times in which it can be appropriate and even desirable for readers to maintain consistency of language from one publication to the next. The other issue has to do with, basically, vocabulary. Self-plagiarism is the term that was most widely used since basically the beginning of discourse on the topic. And over the last 10 years or so, text recycling has kind of supplanted that, not exclusively, but I think most people who are interested in the topic, like me, see the term self-plagiarism as inherently problematic because it labels as bad practice, as unethical, something which many people believe to be acceptable practice. So I will be speaking about it in terms of text recycling, but what I've learned is if I don't include self-plagiarism in, like, the titles of things, a large segment of the population has no idea what this is about. But if you say it's self-plagiarism, they'll say, oh, I know that thing, I understand, I've been there. So finally, the last bit of introductory work here is defining it. So text recycling is actually a really tricky thing to define. In a paper that I published last year, I spent many, many paragraphs dealing with issues of definition. And this is basically, after months of work, the best that I could do. The version in the paper is a little bit longer than this, but basically text recycling is the reuse of textual materials (and that might be prose or visuals or equations)
from one document in a new document where, one, the material in the new document is identical to that of the source or substantially equivalent in form and content, meaning if you take a bunch of sentences and you change a couple of words, we would still consider that to be text recycling. Where changes make it no longer recycled but paraphrased or something else is ambiguous. Two, the material serves the same rhetorical function in both documents. That means this is different from, say, changing something artistically and repurposing it. So if we think about the example that I showed from the Science paper and the PNAS paper, that passage was doing the same work in both places. It was trying to explain this thing about the methodology. And then finally, there needs to be some overlap in authorship. While there are certainly exceptions and complications that I would acknowledge, generally people would say that if you take other people's stuff and don't put it in quotations, or you use their exact figures, we would consider that to be plagiarism rather than recycling. How many authors from one document need to be an author of another document? Well, basically what we decided was, definitionally, at least one; otherwise you kind of enter a different world and a different set of complications. Does that make sense? Do notice that the first part there says from one document in a new document, not from one publication into a new publication. And one of the many complications of text recycling involves what's the nature of the source and what's the nature of the destination, and the ethics and practices and beliefs seem to vary significantly depending on those two and even the relationship between those two. But we would consider it to be recycling, for instance, if somebody took material from, let's say, a dissertation and put it in an article, or from a poster at a conference and put it in an article.
The ethics and practices are different depending on whether it's in something that's published or not, or something that's publicly available even if it's not published. Briefly, here are the folks who are working on this project. There are five of us from different institutions who are kind of driving it, and a number of other people who do some kind of specialized work on it. One person whose work I'll talk a little bit about today is David Hansen, who's at Duke, who's our kind of local copyright and contracts guru. So this NSF grant has two phases. The first phase is basically we're trying to figure out what's going on, because this is a subject that really hasn't been empirically studied almost at all. There are a couple exceptions, but they're pretty minor. And then the second phase is to use what we've learned from the first phase to develop model guidelines, contracts, policy statements, and educational materials. As you'll see as we go through this, this is a pretty messy, complicated world, and to expect every individual professional society or journal or publisher to go about figuring out how to articulate a policy is not only a lot of wasted, difficult labor, but there's a lot to be gained in having as much of that be in common and standardized as possible. So in the first phase there are three branches of research that we're doing. One of them is we're looking into the beliefs and attitudes of the different players. This involves authors, whether faculty members, graduate students, etc., and some of course outside of academia, and editors; those are the two main, or three main, groups we're looking at. So students, what we're thinking about as being novices; faculty members, or those experts who have published and are experienced doing it; and then people who are involved in the publication process itself, the gatekeepers.
And we want to know things like: what do they think is appropriate practice, why do they think that, where did they get their beliefs from, how stable are those beliefs, etc. So I'm going to be showing you just little bite-sized chunks of different studies we're doing. At the end is a list of citations, so if you want to see the stuff we've already published you can get the details and more of the data there. So here are the responses of experts, so these were faculty members, to the scenario that I asked you to respond to at the beginning today. That's scenario A. So you can see that this group of faculty members were almost equally divided both between appropriate and not appropriate and also in their level of certainty about that. So that itself should give us pause. So these are the expert authors, and this is within STEM fields here, who are remarkably divided about the acceptability of this particular situation. The other scenarios here, I'll share with you. I'm used to having my computer with its notes in front of me here. Here we go. So A, you know what that is. B: the following year a different group of researchers at a different university uses the same equipment setup in their research. They recycle the description of the apparatus from paper A, the one word, into their paper. Is this appropriate? And it's not surprising to see almost everybody felt it was inappropriate, because most of us would consider that just to be plagiarism. Then C had two variants: sometime later Sarah, this graduate student, has completed her PhD and taken a faculty position at a different university. She's now collaborating with a different group of colleagues doing new studies on coal emissions. She recycles the apparatus description from paper A, which is C1, or paper B, C2; that is, either the place that she originally got it from, right, that the faculty member gave to her, or the paper that she was an author of.
You can see there's not much difference there. And so the reason that we framed this scenario this way is because in scenario A, Sarah is writing as a member of this research team, right? So even though she wasn't an author of that prior paper, she's operating as part of that group. And our research is showing, as a pretty significant finding, that at least within the world of STEM research, people do think that authorship is not an individual thing when it comes to, at least, rights to recycle, but generally something that has to do with the membership of that collective: not an ad hoc group of people, but a stable research team. So as people come in and people leave, those people then become part of that authorship group. In scenarios C1 and C2, Sarah has left that group and now is working with a new group, so even though she was an author of one of those papers, people have some difficulty with the question of whether or not she, as one of those authors, should be allowed to take that stuff to a different situation. And that makes for a really complicated bit of both ethical and pragmatic decision making: when one leaves and is no longer part of that group, which material would someone be allowed to recycle, if any? Does it matter who actually wrote or first drafted a part? So, like, if Karen had written a paragraph or made a diagram and she was the one who did that, does she then have particular rights that other authors wouldn't to reuse those when she leaves that lab and is working with another group? And then finally D. This was a scenario that was crafted because one of the things that seems to be the case within the STEM world is that people think differently about materials that we might think about as just informational or descriptive, like here's what an apparatus was, versus kind of intellectual material, like these are the claims we're making, these are what the results are.
And so this scenario had to do particularly with something in which the specific words were critical, rather than it just being a description of something. Later in her career, Sarah and a colleague named Karen have been collaborating on a research project. The two are asked by a major newspaper to co-author a story explaining the research for an audience of non-scientists. While drafting the piece, Sarah comes up with a really clever and insightful joke related to their research. It's one of her favorite parts of the story. A year later Karen writes a commentary which is published in a high-profile scientific journal, and she recycles Sarah's joke almost verbatim from the newspaper story they wrote together. Right, so they're collaborating on the story. Sarah comes up with this joke, right? With jokes, wording matters terribly, right? You can't just, like, do what my wife sometimes does, which is "it goes something like this," right? You have to say exactly how it goes if it's going to work. And so in fact most people thought it was inappropriate for Karen to use that, but there was a not negligible portion who thought that it was probably appropriate. But this highlights at least some of the different factors that would go into considering whether someone would consider something to be appropriate or inappropriate practice, and also how, when you change those details, it changes, even for experts, their beliefs about whether something's appropriate or not.
One of the other pieces of research from the beliefs and attitudes component here was a survey we did, basically, of gatekeepers, and this was not just in STEM; this was across academia. So we basically picked the five top-ranked journals from all of these fields, plus, because many of those of us who are doing this were from the writing studies world, we wanted to see how our colleagues felt about it, so we picked 10 from writing studies as well. And we sent the survey request to editors and editorial board members. So I'll share with you a few of those findings, and these are already published. So first, one of the questions we asked was: is text recycling always unacceptable? We didn't ask whether text recycling is always acceptable, because we know the answer to that, right? It's basically no. There's nobody who thinks it's okay to just take anything you want from your previous stuff and just republish it again, for instance. Okay. But we wanted to know who was just fundamentally against the practice no matter what the details were, and you can see almost nobody was, and that was true regardless of discipline. So we have humanities, qualitative social sciences, quantitative social sciences, and sciences and engineering. This was pretty surprising to us, because we were expecting the scientific folks to be more okay with it and the humanities folks to be not. So this has also been a really important part of our research to hang on to, because if in fact this came out the opposite way, a lot of the questions and problems would go away. That is, if there was a pretty strong consensus that no, people should just not be doing this, okay, then people probably should not be doing that. But in fact, if people across the board, and this was something like 350, the sample size for this, feel that there are occasions when it's appropriate, well, then we've got to do the hard work of figuring out what those are. And that came out differently than it did before; I think it's readable. So one of the many factors that
we looked at was the nature of the source. So in order to make this a survey that people could do in, like, less than a day, we had to limit the number of questions we asked in scenarios. And so for this survey we said: we want you to think about recycling text when the destination is a journal article in your field, whatever that might be, right? So it's not going to a dissertation and it's not going to a grant proposal. Okay, it's going to a journal article. The question is: if somebody's submitting a journal article in your field, from which of these kinds of materials would it be okay to recycle material? And you can see, starting at the top, conference posters, grant proposals, conference papers: most people were fine with unlimited recycling, right? Use as much stuff as you want. When you get to grant reports, it gets more half and half between it's okay to use whatever you want versus, well, within some limits. What those limits are, we tried to get at some of that, but that's complicated. When we get to conference proceedings, hardly anybody thought it was okay to do that without limits. We can talk about that if you all want to; that's a complicated mess in itself. And then journal articles: almost no one thought that you could just recycle stuff wholesale without limits, but almost 50 percent of respondents said some amount of recycling from a published article to a published article would be okay. Now, one thing that seems clear in our research is that things like a conference poster or a conference paper, people oftentimes think about as work in progress. So I go to a conference and I give this paper, and that's not done; that's just a step on the way to publication. So in a way it's not so much recycling stuff as further development of a draft, right, into a final product. Conference proceedings are their own complicated thing, for a couple reasons. One is that in some fields, like particularly computer science and some kinds of engineering, conference proceedings are the ultimate final thing; it's the
highest level of publication, like a journal article in other fields. But in many fields a conference proceedings is not quite a journal article but kind of a formal submission, and in quite a few professional societies they explicitly ask authors who have had conference proceedings published, some literally published, some shared, to actually submit it with some version of extension or variation, etc., as a journal article, right? So in that case those societies are saying not only will we accept large amounts of material from a conference proceedings in a journal article, we are explicitly inviting you to do that. So another variable is location, that is, from what part of these articles could you take this material? Not surprisingly, the one that had the least amount of just "no, don't do it" is the methods section, right, where almost three quarters or more said either unlimited or some limited amount is okay. And not surprisingly, discussion and results were the ones that people felt most likely should just be off limits. But even there you can see it's not as if there were nobody who said it was okay. The green parts are small, but there's still a fair number of people who said some limited amount. Now, whether limited means a sentence or a paragraph, they didn't have to say; we're just trying to get the kind of big outlook. Then we also asked this messy question about authorship. So here's, like, little mini scenarios. So these questions were asked of the people who were not in the red bar group here. So basically, I'm sorry, not there, in here. So if they said it was always unacceptable, then we didn't ask them these detailed questions about it, but if they said, well, sometimes it might be, then we wanted to know these things. So if the source and new texts have identical authors, almost everybody said yes, that's fine. So basically, three authors write a paper, and three authors reusing some of that stuff is ethically and practically the same as one author doing that. If we said, well, they share at
least one author and any others gave permission, most people said, well, that would be okay. Interestingly, when we went back and had some conversations with editors, some editors said, well, there's no record of that; that's not a formal part of our process, so how would people downstream know that that happened? Then if we said, well, same thing, but the others have not given permission, right, most people said no, you can't do that. Now, the reality is this kind of thing happens and almost nobody asks permission, because permissions are just not part of the operating schema, right? If we said, well, all authors of the new work were working within that lab, a scenario like Sarah's when she was a PhD student, they were roughly equally divided between people who said that would be fine, or no, don't do it, or I have no idea. And finally, as above, but none were authors of the earlier text. So this would be a possible but weird scenario in which there was complete turnover of the people in that lab, and so even though it's still the same project group, none of the authors on the new paper were authors of the other paper, and most people felt that was not appropriate. So are you starting to feel yourself slipping into the mire in which my scholarly life now exists, right? Layers upon layers of variables. Okay, so in that first part we want to know what do people believe, why do they believe it, what are their attitudes toward it. In this next part we're trying to figure out not what they say is right and not what they say they do, but what they actually do. And so we're trying to study recycling within the particularly difficult and, I think, most important world, where the debate is, which is from one published article to the next published article, right? That seems to be the place where most people have difficulty and aren't sure what to do, right, where the ethical dimension is the most challenging. So what we wanted to do was actually look and do some kind of big data analysis to see
what we could find. And we want to know how common it is for people, and again, this is a STEM-focused study, to recycle material between published papers; what kind of patterns there are, like are they tending to recycle just a block of text, or stuff distributed throughout, or stuff just in methods, or whatever that is; and then whether there are any fairly clear differences across disciplines. So to do this, I decided that one way we could get at what we want, which is, for given research projects, for teams, what are they recycling, is to go look at NSF grants and say, basically, well, if there are papers that are published listing the same grant as the source of funding, it's pretty much the same big research project. So within the papers that came out of that grant, how much stuff is recycled? So, using the NSF database, we looked at four different basic groups to cross the range of STEM disciplines: biological sciences; engineering; math and physical sciences; and social and behavioral sciences. And we picked two different divisions within each one of those, giving us eight different kind of meta fields, and then picked 10 grants that came out of each one in a given year and found five papers for each grant, and that gave us basically this set. And we are currently just about to finish analysis of the first batch of this, which was from 2015. Once we get all that done, we're going to go back and look at 2000, 2005, 2010, because we want to see whether or not these practices are changing over time, particularly because we live in the time of the rise of iThenticate, and there seems to be a reason to think that people may be changing their behaviors in order to not get flagged by the software. So I'm going to show you just a couple of slides from where we are with this, and these are basically our testing. So this code is in the kind of final detail tweaking now, but one of my colleagues, who's at the University of Maryland, Baltimore County, Ian Anson, is our coder, so he's been developing the code and
we've been figuring out what we want it to do. And so we took six of these grants and took the five papers from those grants and ran them through the code. I'm not going to go into the details, and I'm guessing you're probably not interested in those details, at least most of you. (Cary, I'm gonna switch... one, two. Yeah.) And so what we found was that with this algorithm, anything below a seven was just noise, right? Because there are so many words that sentences are likely to have in common, many sentences will have other words in common with other sentences. So we basically just said, well, let's get rid of everything with a score below seven. And then all the dots you can see, each one of those is a sentence. From left to right is document position, so these were sentences near the beginning of their respective papers, these were in the middle of the paper, and these were at the end of the papers. And for each of those sentences that scored above seven, we went through and looked at them by hand, for all those matches, and said: when we look at it with human eyeballs, does this seem like there's stuff that's probably been recycled, because it would be very unlikely to get that exact string of words, or not? And we found a cutoff of about a score of 10 seemed to be about the sweet spot. You can see the red dots are false positives, so basically the code put it above 10 but we looked at it and said maybe not, or probably not, especially if it's a very long sentence, which is more likely to have an "of," an "of the," an "and," or whatever in common. Below the line, the blue ones, those are false negatives: those are ones that we looked at and said that seems to be recycled text, but the code gave it a score below 10. We were pretty happy with what we were getting here, right? So we want to minimize both false positives and false negatives, but if we have to err, we'd rather have more false negatives. We'd rather say there were some cases of text recycling that we missed rather than saying we called a lot of stuff text recycling that probably
wasn't. And then this graph shows all of these same dots here, basically arranged from highest score to lowest score from left to right, and you can see that over here there are no false positive or false negative hits, nor over here. They're all pretty tightly clustered towards the center, which means we're probably getting about as good as we're going to. And then the last one I'm going to show you from this. I'd love to be able to show you what we found. I don't know yet, but this is going to be the only time in my research career where, I don't know, a week or two from now Ian's going to say, I've run the code for these 80 grants, here's what we know about what text recycling is done, and poof, there it's going to be. Okay, here are six different grants. So does everybody understand what each of these represents? Right, so this is one NSF grant, five papers. These are all the sentences from those five papers that had enough in common to be above seven, our score of seven. And you can see that, for instance, this one: this is all stuff that probably is not text recycling. Here there's a lot of sentences; every one of these dots (they're supposed to be dots, not little pixelated blobs, right?) is probably significantly recycled material from one paper to another. This particular grant seemed to have none, this one very little. And this is just a small sample, right, six grants, five papers each, but you can see that so far it looks like most of these folks are recycling at least 10, 12, 15 sentences or something. Okay, so you can probably imagine now what we're going to be looking at: when we get this full data set, we'll be able to say, oh, 90% of all the people that got these grants, based on a decent-size sample, recycled at least 10 sentences, or five sentences, or half of them didn't recycle anything, or whatever it is. And for those that did recycle, we can see where within the paper it was. And the last part of this phase of the research is to figure out, well, people are doing it; how much they're doing it, we'll find out; what they think
is right, appropriate ethically, we'll find out; what's legal? And it turns out almost nothing has been written about this from a legal point of view. The only really substantive article was written by Pamela Samuelson, who is at Berkeley, in like the early 90s, and a lot has changed since then, and that was a fairly informal piece. So what we're doing here is going through and looking at both the law, copyright law and contract law, and we built a corpus of contracts from the major publishers in the world of STEM, and we're looking to see what those contracts actually say and how explicit they are. And what I'll tell you is, not very, right? Most of them have very little in particular to say, and for those that do say something, most of it's pretty ambiguous in what it would mean, and authors would have a hard time interpreting it, and most editors would as well. So from that part of the research, what we've learned is the copyright and contracts world is pretty messy. I'm going to jump and find my notes here. So one of those complexities has to do with copyright law versus contracts and what the relationship between them is, right? And I've learned an amazing amount, and still not nearly enough, about contract and copyright law in the last year or so. But for instance, there's this stuff about fair use; I imagine most of you have some familiarity with it. That itself is fairly murky. Then if you lay on top of that the fact that, if a specific author-publisher agreement says you can use this amount of stuff in future publications, or you can use whatever you want in future publications, that basically seems like it will trump whatever the specifics of copyright law are, because you've now been given those permissions, right? The same thing seems to be true with restrictions. So even though fair use law might apply to you using somebody else's stuff from some other situation, if you sign a contract that says you're not going to use any of this stuff in a future publication, well, there you go, that's
what you got, right? So another bit of complexity is that it's not even just one contract. If you're an author publishing in a particular journal, let's say an American Chemical Society journal, when you're recycling stuff you're taking it from a work with some publisher and you're putting it into a work with a different publisher, so you've got to worry about two contracts: the from part and the to part. So we did a survey of authors and editors asking them, what do you think your rights are? And we realized, oh, we have to ask them about both of these things. Here's just one result from this work, which has not yet been published. You can see here that we asked: can you recycle (green) without limits, (yellow) with certain limitations, whatever those might be, or (red) cannot recycle at all, for prose, equations, and visual materials, to the article that you wrote that we contacted them about, or from it. And there are a couple of things I want you to see here. One is that their responses were dramatically different depending on the type of material. So for equations, the two bars in the middle, the majority of people thought it was okay to recycle at least some of that stuff. For visuals, hardly anybody thought you could do that without limits, and they were fairly split about whether you could recycle with limitations, and getting close to half of the people thought you can't recycle that at all. And prose was somewhere in the middle. The other thing to note is that each pair of bars is almost identical. Now, it's pretty unlikely that all those authors knew the details of those contracts, and that in those different cases the from and to contracts were identical. Much more plausible, we suggest, is that they really don't know those details, or, which is probably true, they're assuming that whatever's true for one contract is probably true for another contract. This and the other questions that we asked in the survey suggest that authors believe that publishing
contracts are all pretty much the same, and in fact, as we'll see in a minute, they're not. So these contracts, and we've looked at quite a few of them, seem to vary widely: in the kind of information that's included that might be or is explicitly about text recycling; in where that information is located, that is, is it located in the contract itself, or does the contract say you need to follow our ethical guidelines, and you have to go look at the ethics guidelines to see where that stuff is included; and then in what they do or don't explicitly or implicitly allow. We haven't looked at every contract, obviously, but one that's worth looking at here is the American Chemical Society's agreement. It seems to be unusually explicit about a number of these dimensions. So here, you can see at the top left, in orange, is recycling from. In a minute I'm going to show you recycling to. So why do I think this is recycling from? Because it says reuse or republication of the entire work in theses or collections. And so basically what it says, you can see in the green stuff, is that authors can reuse stuff from these articles in their dissertations or theses. That's become a pretty normative practice in STEM fields. And then the second paragraph down here: reuse of figures, tables, artwork, and text extracts in future works. This is recycling from this article to something in the future. It says text extracts of 400 words. So as somebody who's studying this, I looked very carefully to try to figure out how I would interpret this as an author. So: authors may reuse figures, tables, artwork, illustrations, comma, text extracts of up to 400 words, comma, and data from submitted, accepted, or published work in which ACS holds copyright, et cetera, including in subsequent scholarly publications of which they're an author. So this says you are allowed to recycle material of these kinds and in these amounts. Now you tell me: we know that this says text extracts of up to 400 words. Is that one text extract of 400
words or a total of 400 words? I don't know. How many figures or tables can you reuse? It doesn't specify a limit there, and it does here. As many as you want? Was that the intent? My guess is that that level of decision has probably not been made by ACS. So they included a lot of stuff here that hardly anybody's including, and it's still, for a practitioner, pretty ambiguous. As far as recycling to: if the submitted work includes material that was published previously in a non-ACS journal, whether or not the authors participated in the earlier publication, basically the copyright holder's permission must be obtained to reuse the material. So they're saying, when you put stuff into our journal, you've got to get permission for anything that didn't come from us, and I guess if it came from them then you don't have to worry about permissions, obviously. And this means that what might have been possible to reuse under fair use is no longer reusable, because you signed a contract that said you'd get permission for any amount of stuff. Finally, many editors have inaccurate understandings of copyright law. We've learned that from questions and surveys, and we've heard many editors, for instance in interviews, say, well, we want to make sure people don't violate copyright, so we insist that they include a citation. In U.S. copyright law, attribution has nothing to do with infringement. You can't just say, oh, I'm going to take like 20 pages of your work, but I'm going to cite it, so I get through easy. Can't do that. Okay, so just literally 30 seconds: this is where we're headed. Once we can sort all this stuff out and learn as much as we can learn, what we're going to try to do is work with some professional societies. We're already in contact with groups like COPE, this one, and the major physics society, the American Physical Society, and others. Basically, we're going to make some attempts at formulating some language or policies or options and then work with
them to get feedback, with the goal of at least standardizing as much as possible: to standardize broadly across fields, within subfields, and, where it seems like there's not agreement, to be able to say, well, you might say X, you might say Y, you might say Z, but you need to say something about some of this. Okay, and given the time I'm going to stop there. Oh no, I shouldn't, I should include this. Okay, sorry about the time here. So, COPE guidelines. The folks at COPE and BioMed Central worked hard to address text recycling. It was like the first really intensive effort to try to make sense of this. The reality is, if you go and try to apply them, they're pretty vague. They don't discuss any of these various parameters we discussed here. They basically say sometimes it's okay, sometimes it's desirable, journal editors should use discretion. So we're hoping that we'll be able to push that to the next level. Standardization is really desirable. As you can imagine, it's just not reasonable or fair to expect authors, much less editors, to be able to know all these details if they're not the same from one publication to the next. And whatever guidelines are standardized, there are these really common issues that need to be addressed. We need to know: does it have to be all the exact same authors, or, like ACS says, is it enough if you're one of the authors on the paper? Where can the material come from? Can it come from a dissertation? Can it come from a previous journal article? How many words, how many figures, whatever. And then this, which I mentioned: publishers wanting to protect themselves by saying, well, you just have to get permission for everything and then we don't have to worry about copyright violations. I think we need to realize that that will basically degrade fair use over time, to the point where authors won't get to use fair use anymore because it will no longer be an established norm, and that's how that basically operates. And there are major differences in
copyright law between the U.S. and other countries, substantial ones, not insubstantial trivialities. And given the fact that a lot of publication, especially in the STEM world, involves authors from different countries or big publishers that have offices in many countries, how are the editors and authors supposed to know what applies in terms of copyright law? Somebody needs to actually make that clear in writing for them. And finally, iThenticate and its kin are becoming increasingly widely used. iThenticate was not developed as a tool to help authors and editors make decisions about text recycling. That's a convenient and money-making byproduct. I think we should be careful about letting that code make our decisions about what is ethical or not. Because a lot of people now, as editors, and I imagine some of you, are facing the situation of, oh, we run this code, we've got this stuff, it shows seven percent, and now this is in my face, and I've got to make some call to ignore it or not, or have them do something with it. You want to actually have a thoughtful policy, not just say, oh, now we're getting this information and so we have to do something, but we don't know what that is. Okay, there are the references. I'm happy to stay around for questions as long as you all want to. Some of you probably have to bolt and go; don't feel embarrassed to leave. Thank you. Oh yeah, did you have a question over here? Do you think that there's any amount of authors being afraid of citing themselves because it seems like an ego trip to cite yourself, and so they just silently cite themselves? I don't think ego is likely the problem. I think perceptions are likely the problem. It can seem a little slimy, like, oh look, I've got 50 citations in the last couple of years, of which 30 are mine. And actually, increasingly when you go look at metrics, many of them will flag self-citations. It is interesting that the APA guidelines explicitly address this, so
they're one of the societies that has explicit statements about text recycling, and they say sometimes it makes sense to recycle some stuff, and in particular situations authors may be reluctant to cite themselves because they don't want to do that, and it's okay not to cite yourself for small amounts of stuff in those situations. The APA's actually said, yeah, that can be a problem, that's okay. Most others are silent about it. Thank you, that was a really interesting presentation. My question goes to your phase one point two, text analysis of published articles, and in your discussion there you referenced iThenticate immediately, and I wonder how you control for the fact that some of these published articles may already have been rewritten to compensate for text recycling in the first or second draft. It's a great question. In the bigger picture, we can't. We would have to get the original manuscripts, and we won't. But we are interested in that kind of rewriting of stuff, and interestingly, especially for people in the world of teaching writing and writing studies, what's happened, and we know this from conversations with editors, is that they'll say, well, when we get stuff that's recycled, we don't allow that, or we don't allow it for more than like a couple of sentences, so we send it back to the authors and tell them they need to rewrite it. And what does that mean? We ask them, well, what would a successful rewriting mean? And it seems that really what it is is you disguise it enough so iThenticate won't catch it. That's a bizarre ethic, and also it's the opposite of what we're teaching our students to do. We wouldn't allow them, in rewriting somebody else's work, to say, oh well, you want to paraphrase them, change some synonyms and move some clauses around. But in fact that's what editors are feeling they're supposed to be getting their authors to do. I think ethically it's problematic because it's in a way disguising recycled material so you can't tell, so it
doesn't get caught. It may be making it harder for readers who are following a line of research to know what's the same and what's different. So if this particular methodology or this piece of it is exactly the same, but now you're writing it in different words, that makes it really hard for me to figure out what's stable. Coming back to your particular question, though: while we can't get at the drafts, what we're doing with our coding is looking for verbatim strings of words, but we're also looking for things that have a combination of strings of words and word stems, words that have been substituted grammatically. We can't look for all synonyms, but we're trying to catch things that would probably be reworked prose and seeing how much of that shows up in there as well. So that would be one of the things that we're hoping to see. But you're right in that what we're ultimately seeing, at least now, is the result of, in many cases, having gone through the editorial iThenticate process, which is one of the reasons we want to look and see what was happening in 2000-2005 compared to now. Does anybody else have any questions? I was wondering if you've looked at preprints at all. That's something that's caused our editors a lot of trouble, and they basically went and said, well, Science is allowing it, so I guess we might as well get on that bus. Yes, the second part seems to be true, and no, we're not looking at it, and bless all your hearts that have to deal with that. You can blame physics. But the reason that we're not looking at it is that we're explicitly not studying duplicate publication, of which there are wide varieties, and preprints are one. It's not really publication, but it's not exactly not publication. Translations, too. What we're really interested in are the decisions that authors and editors have to make about this basic situation: you have something that you're submitting, and you're basically saying this is a new piece of work, it
has enough novel intellectual substance to be worth publishing. If the editors and the reviewers feel that's the case, then how much stuff can you reuse, given that you've said the content is novel enough? But interestingly, when we interviewed, I think, 24 editors across fields, many of them had difficulty holding on to that fundamental idea of the situation we put in front of them. So we said, okay, for the next set of questions, imagine that you get a manuscript submitted to your journal. That manuscript, you and your reviewers have said, is intellectually worthy of being published. Given that, how much material, and under what conditions, could they recycle? And many of the conversations slipped back to, well, it has to be novel, it has to be new, and we can't accept it unless it's original, we don't want people submitting stuff that's duplicate, even though at the beginning of that part of the conversation we said we're not talking about that stuff. It seems that a lot of people, even editors of major journals, have a hard time wrapping their heads around just saying we're only talking about the reuse of textual material, not the intellectual content. I think we're going to go ahead and end the meeting because we're a little bit over time. I think Kerry's able to stay around, so if anybody has any questions, feel free to come up and talk to him. We don't currently have a meeting set for April. We do have one for May, I believe it's May 17th, but be on the lookout for that. And I just wanted to thank everybody for coming. Thanks, Kerry.