OK, so this is not a scientific talk. This is not a talk about rules on writing papers or theorems on writing papers. It's some thoughts I've had about writing papers, which would be different from the thoughts that, say, Professor Roy or Professor Showalter or others have had. If you look online on various websites, you'll see different professors, Professor Whitesides at Harvard, Professor Uri Alon in Israel, just many websites. There'll be little notes like this. How to write a paper, tips on writing a scientific paper. So different people have different ideas. So don't take this as a theorem, but as some possible guidelines that you might want to consider on writing papers. So you write a paper, you submit it, and it's accepted by a journal, and it's published. Then people discover your paper, typically through search engines: you look for keywords, keywords of subjects that interest you. And the numbers that are given on this page are just very approximate. It's hard to get data on this. But, say, 10,000 researchers searching through Google or searching on the ISI Web of Science find your paper through a keyword search. They read the title of your paper. They make a decision. Am I going to read the abstract? Is this relevant enough to my interest to read the abstract? Well, for some, it is. Maybe 10%. You've lost 90% of your potential readers already. So you need to think about the words that go in the abstract and how they're interpreted by others. Would the key point of your paper be indicated by the words that are in the abstract? Now, they start to read the abstract. They read the first sentence, maybe the second sentence. They're getting impatient. And about 90% move on, right? Say, by the time they get to the end of the abstract. So you started with maybe 10,000 potential readers who've caught your paper through a keyword search. And now you're down to 100 or so who may read some of the paper.
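The arithmetic of that funnel can be written out as a minimal sketch. The function name and stage labels below are hypothetical, and the percentages are the very approximate figures quoted in the talk, not measured data.

```python
def reader_funnel(initial_readers, stages):
    """Return (stage name, readers remaining) after each stage.

    Each stage is (name, percent_kept). Integer arithmetic keeps the
    illustrative round numbers exact.
    """
    remaining = initial_readers
    counts = []
    for name, percent_kept in stages:
        remaining = remaining * percent_kept // 100
        counts.append((name, remaining))
    return counts


# Approximate figures from the talk: about 10% go on from the title to the
# abstract, and about 10% of those go on to read some of the paper.
STAGES = [
    ("read the abstract", 10),
    ("read some of the paper", 10),
]

funnel = reader_funnel(10_000, STAGES)
for name, count in funnel:
    print(f"{name}: {count}")
```

Each stage multiplies the previous count, so two 10% stages leave 1% of the people who found the paper.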
That means if they read the paper, it means either they read it online or they download a PDF and start to read it. Now, there are statistics on the number of people downloading the actual paper or opening the paper to read it online. There are no statistics, no way of knowing how many really do read the paper or how far they read in the paper. But one indication of the impact of the paper is that after, say, five years, you look in ISI or some place to see if people have cited your paper, and maybe 10 people have. So you've lost a lot of potential readers. It may be that you're writing on some arcane subject, or maybe your paper could have been written more clearly or more interestingly for the audience that would potentially be interested in it. So the title is important because you're losing 90% of your potential readers from the title to the abstract. So you want it to be descriptive. You make a list of keywords that represent the essential points you want to convey in your paper. And you want to write for as broad an audience as possible. Broad means something different for Physical Review or Biophysical Journal than for a journal that is really designed for a very broad audience, like Science or PNAS or Nature. But whatever the target audience, you want to keep in mind to use a minimum amount of jargon and avoid acronyms. Acronyms that you and I use every day, and that we think of as part of the lingua franca we all speak, are unknown to many people. So avoid jargon and acronyms and try to be concise. Aim for a title of 10 words or so; there are even some very nice short titles with three or four words which, even though they're short, are able to convey the essential idea. Now you've downloaded the paper and you're going to read it. And the question here is, in what order do you read the paper? And I want you, everyone, to individually answer that question here. Could you pass these out? Everybody should have a sheet here and answer.
So we have nine parts of a paper as listed up here and rank them one through nine in the order in which you read the paper. And this matrix up here is going to have results from different people who will rank the right, just in abbreviation, the order in which they read different sections of the paper. Would you do that? Would you fill in the order, please? OK? No? Or Professor Weeks? OK. Would you be willing to just take a column here? You don't need to put your name at the top. It's anonymous. All right? In right here, just write the abbreviated words. And I need 9, 10, 11, 12 more people. Thank you for volunteering. Let's say more volunteers. Someone wants to volunteer right the way, the order in which you typically, there's no universal way you read a paper. Some papers you might read in a different way than others. But would you do it? Would you write up here? The order in which, OK, thank you for volunteering. Thank you for volunteering. Right up here, take the sheet of paper and write in the order in which you read the different sections of a paper. You don't need to write on this. Just write on the board, OK? I need your eager to volunteer. I appreciate that. Thank you. OK. Someone else? Someone in the back here? Yes, thank you. And we need Professor Story. OK. OK, volunteer. Yes, please, how many people would you? Excuse me, these are extra. OK, extra. Let's see. So you write in the order that you read a paper, OK? Do you read the conclusions first? If you put that first, many people jump to the conclusions. What order do you read a paper, OK? Let's see how we're doing. We need, we have space. Go ahead, there's space here. And we have space for two more volunteers. Here we go, thank you, sir. And one more volunteer. All right, good. And let's just see if we, so you need this sheet of paper. So just write the abbreviation. And each person can write the numbers one through nine for the nine different topics here, OK? 
And you know, yes, sir, did you not get a sheet of paper? Sure. And everyone else get a sheet of paper. Does anyone need one? OK. Oh, I've done it before. OK. All right. OK, everybody has a sheet of paper. We're almost done. All right, OK. Thank you. Yeah, OK. Right, just turn me on. You're saying you don't like the references to people on the side of you? Oh, yeah, it's another one. Yeah, yes, as Michael White, All's House has just said, are you one of the authors, as many are, who, when they get to a paper, look at the references first? Why would you look at the references first? Do they cite the top of the list? To see if they cited your important work, right? If they didn't cite your important work, obviously it's not a very good paper. Right, and if they did, did you search student fair? What did they say about it? Yes, yes, what did they say about it? It wasn't very important, obviously they cited it. OK, so we see many people, in fact, most people have written abstracts. Here's someone who reads the methods first. Here, the figures first. And abstract, but look, let's look at the order in which people look at the figures. Second here, figures first here. Where are the figures here? Yeah, well, let me put a circle around. Figures, OK. So figures, where are figs down here? Figs, fig, figs, figs. Second, where are the figures? Oh, quite late there. OK, later. OK, figures. How about the conclusions? Let's just look at that. When do conclusions, the very last thing in the paper? So if you just read it serially from beginning to the end, you'd read the conclusions. Conclusions, conclusions. Introduction to the conclusions, OK. Conclusions, conclusions, conclusions, conclusions. Where are the conclusions? Conclusions again. Second, OK, conclusions, and conclusions. Almost no one waits till the end to read the conclusions. You want to read how the novel comes out, right? You jump to the conclusions. 
Well, the point is that we think about writing a paper as if it were a novel. You start at the beginning, and you write the whole thing. You write your abstract. You write your background. You describe the methods. And now you're ready to present your results, and then maybe have a discussion, and then finally the conclusions. And you make figures somewhere along the way. Well, these people who jump from figures to conclusions will not have read the definitions of the different quantities. They really don't have much idea of what you're talking about. If you jump from the figures directly to the conclusions, the conclusions don't make any sense because you don't have the definitions and the basic ideas, the background. So the point is you really need to think about this result that's on the board: most readers skip around. And so you want to make each part of the paper as readable as possible without assuming that the reader has gone through serially, section by section. If you spread out definitions of quantities throughout the paper, it's very hard to read, because you, like other readers, skip around, and you're reading the results, and there are acronyms there, there are various quantities. And then, if you want to understand the results, you have to find where they define what this quantity is. So you start searching back in the paper, and it could be all over the place. So it's nice to have the definitions and basic ideas grouped together so that they are clear, maybe in a table. There are different ways of doing it. There are many different ways of writing a paper that are perfectly appropriate.
But whatever way you choose, you should keep in mind that people skip around when they read. If the paper is difficult to read when you skip around, say from the introduction to the conclusions and then to the methods, it's hard to understand what's going on. And when it becomes hard to understand, whatever the topic, whatever the authors, you're likely to skip to the next paper. You're not going to finish reading that paper. Now, my personal view is that figures are extremely important. Figures are often the second thing that readers look at after reading the abstract. Maybe they'll read the introduction and maybe not. I don't always. When I get papers to review, I skip to the figures, to see if the figures tell the story more or less. Each figure should be a vignette, a short story. It should tell a story together with its caption or, even better, without its caption. You look, the labels are clear, you have a graph that you can understand, and you have some idea about the subject matter of the paper. OK, so readers skip around. When you write a paper, keep that in mind. Now, the abstract. We said you lose 90% of the potential readers in going from the title to the abstract. The abstract is short. How short depends on the journal. But you've got a lot to tell in a short abstract, which is only a few sentences. If it's Physical Review Letters, it's fewer than 600 characters, including spaces. So that's not much. If you have a paper in most other journals, your abstract can be a bit longer, but it's still short. Now, what do you want to include in the abstract? Well, first, what is the problem? What is the subject you're writing about? Pose the problem. And why is it interesting? Why should anyone care? You're writing a paper about it, so the reader would want to know: how did you address this problem? What have you done? Have you done a calculation?
Have you done an experiment? Do you have a new theory? Have you done some numerical simulation? What did you do? And, of course, most importantly, what have you found? What is new? And try to be quantitative. Saying we found new results that are interesting has zero content. It wouldn't be a paper if it didn't have new results that are interesting. You want to say something specific. Why are these results interesting? Why should I care? You can obtain results in any topic you take up. You study it, do experiments and calculations, and you can obtain some results. You can make some graphs and submit it to a publisher. And it'll probably be published somewhere. But why is it interesting? And what are the implications of your work? How does your result differ from previous work on this topic? And what are the implications of your work for future work? How should your discovery impact future studies? People reading your paper maybe have planned to do research or are conducting research on the same topic. How should they reconsider what they're doing and change the emphasis or change the approach? What are the implications of the results that you have obtained? OK. So as we've seen here, after reading the abstract, many people look at the figures. Many don't. Many do look at other parts of the paper. But it's not unusual to read the abstract and maybe the introduction and then turn to the figures. So as I've already said once, and maybe will repeat too many times, every figure together with its caption should be, to the extent possible, self-contained. It should be a little short story in itself which makes up part of the whole novel, the whole story. Now, you see figures with many different parts in them. Eight different parts. Each little graph has maybe several curves and a legend identifying the different curves. I recently reviewed a paper with four pictures across and four high, 16 pictures in a panel.
Each one had a different axis and several curves. That is not a paper that people would likely read. If you have multiple parts of a figure, they should fit together in a natural way. You want the figure to tell a story. Now, we go to the figures rather than the text for many reasons. Figures tell a story quickly. Our brains have evolved for 65 million years since the beginning of primates and other animals. The eye-brain system developed many times in evolution. But let's say 3 million generations, anyway, many generations, to evolve a very sophisticated system for interpreting images. That's why we like figures, because we can interpret quickly and understand the meaning of a figure if it's well conceived and executed. Well, you would like the figure to be something you can understand even without a caption. Sometimes a caption is not necessary to understand a story. Now, if you look carefully, you might be able to understand what the situation is here. Do you need a caption? He's standing on the pinnacle of this rock, and there's an obstacle preventing his return off the rock. OK. So there are guidelines that have been written by many people on how to make figures. A set that I particularly like is this book by Edward Tufte, The Visual Display of Quantitative Information, a book that's now 30 years old, in which he has advice on how to make a good figure. And I'm just going to repeat some of his advice here. And the first thing he says is that when you draw a figure, you want to maximize the amount of ink that shows the data relative to all other ink. The data should be what draws you to the figure, not extraneous marks, not a lot of labels, but the data. You want to make the graph as simple as possible. If you have 10 curves, some are red, some are green, some are dashed, some are dotted, then it's hard to make sense of the graph, especially if different information is conveyed by different curves.
There are times when you want to have multiple curves, but always try to make a graph as simple as possible so that the reader can readily interpret its meaning. Now, as far as having graphs with legends: it's very common to have a legend with maybe seven different curves, each in different colors, and dashed, and dotted, and so forth. It's much easier to interpret the figure quickly if you have the labels on the curves themselves and not have a legend. I don't think one should ever have a legend. That's the extreme position. People like to have legends. Now, a curve doesn't have to follow all of the suggestions of Tufte to be an important curve or a curve obtained by an important person. And here is an example that Tufte gives. Nobel Laureate Linus Pauling has this figure where the atomic volume is plotted versus atomic number. Now if you think about Tufte's advice described in the previous slide, you see that this figure doesn't follow his advice very well. So I want you to divide up now in groups of three and find six or seven ways in which this graph could be improved. You want to join forces. Talk to your neighbor, and then we'll list on the board ways this figure can be improved. Look at it. This is Nobel Laureate Linus Pauling. Can you improve on Linus Pauling's work? Would you take a picture of this for your camera? Yeah, I want it. I want the statistics. Maybe two. Got it? OK, let's see what you came up with. What would you do to improve this figure? Anyone raise their hand. Yes. What? Yes. OK, what is the meaning of the crosses? Zero means, well, it lines up with the axis, but they're totally unnecessary. So that is the extra ink that Tufte talks about. Eliminate all the extra ink. Eliminate all the plus signs. They carry no information about the data, which is the quantity of interest. Good. We eliminate those. Yes, sir. Yeah, why don't they have units? I don't even know what the units are. Maybe it's a volume. Cubic angstroms? I need Raj Roy.
Raj Roy, is it cubic angstroms? Maybe cubic Bohr radii? It's a good question. What are the units? Have units on the graph. Thank you. Good. And we have to find out. Oh, error bars. Yes, error bars. Yes, some uncertainty would be good. Anything else? Yes. Yeah, are they some kind of fit? Or are they just drawn to guide the eye? I don't know. What does the dashed line, the curve, mean? And why does it go up on this side and not on that side? And that one keeps going. Other suggestions for improvement? Eric: write the numbers horizontally rather than vertically. Yeah, and you could say that about the atomic volume label also. It's easier to read if everything is oriented in the same way. Anything else? This is atomic number zero to, it must be, 92. This must be uranium, I guess. So it's the naturally occurring elements on Earth. OK. All right. Well, let's, yes. OK. Eric would put tick marks along the right-hand and top axes. Well, here is Tufte's version of this curve. And I think we can do better than this. But this has many of the changes. Well, notice the number of labels on the horizontal axis has been reduced. You typically need only two or three numbers. Otherwise you have extra ink, and you sometimes see numbers that are very crowded together along the vertical or horizontal axis. Now, something Tufte did, which I don't agree with, is he dropped the zero. I like to see where the zero falls on both the vertical and horizontal axes. Because sometimes there's an offset, perhaps to try to hide something. And also, if you see a zero, you know it's not a logarithmic axis. It helps you to interpret immediately whether you have a linear or log axis. So having a zero, I think, is helpful. Now the atomic volume label is horizontal instead of vertical, as in Pauling's graph. And we have only two numbers here. We still have no units, which, as was pointed out, we should have. But look at these labels here. This is the alkali series now.
Lithium, sodium, potassium all the way up to francium. And you immediately see it. And you have not only the symbol for the element, you also have the atomic number. And you can immediately see from this, well, there's a gap here between this peak and this peak of eight atomic numbers. And from 11 to 19, again, eight. And from 19 to 37, 18. And from 37 to 55, again, 18. You see regularities. You can see so much more and understand. And something funny here in this region, this is the rare earths. So he added the label rare earths. OK? So we have a much improved graph. And we've heard suggestions for making it better, particularly adding the units, a number of things. We still have these dashed lines. We don't know exactly what they are. I don't think Tufte knew what the dashed lines were. They were in the graph that Pauling had. OK. So now we have a pretty good graph. If you saw that in a paper, you could immediately interpret a lot, learn a lot from that graph. Now, Tufte has a number of illustrations at the other extreme, that is, the worst graphs. And he said, maybe this is the worst ever. All right? Now, again, talk with your neighbor and come up with seven things wrong with this graph or seven ways in which it might be improved. This is an awful graph. Talk with your neighbor and come up. I've only been. OK. In the interest of time, let's move ahead but get some ways in which this graph is misleading or wrong or can be improved. What comments do you have about this graph? Anyone? Yes. There's no color bar. Worse yet, why is there color? You don't have much information. The color conveys nothing. It's what we would call gratuitous color, unnecessary. It doesn't add anything to the understanding. All right? Thank you. Yes? Why is it three-dimensional? You know what? There are five pieces of data here. Here they are. One, two, three, four, five. That's all. Five pieces of data. We have three dimensions. We have how many colors? OK. Any other comments? Yes?
Right, this is the year; it's unlabeled. And it says age structure. There is a label at the top, percent of the total enrollment of the students, 28%, 30%, and so forth. OK. It's not well-labeled. Other comments? Yes, there's still something that hasn't been mentioned, which is that what you have on the top is the very same information that's on the bottom. Why is there a top graph? Oh, these curves. Where do these curves come from? This is totally made up. And yes, the curves are worse than bad. OK. And you have this break in the curve to separate the top, which duplicates the information which is in the bottom. OK. So I've written down some of the things wrong. And I think we've covered them. There's this break in the ordinate from 34 to 66. We have this curve that then becomes a straight line. We have fonts, as was said, that are too small. We have duplication at the top and bottom, gratuitous color, and three-dimensionality. OK, let's take these same data that you obtained and want to present and see if we can present them better. Here are two ways that could be done. You only have five data points. You can make a table. If you prefer a graph, here's a graph, which I labeled this morning. I put three numbers here, 72, 74, 76, and three numbers on the ordinate. And there are your five data points. All right? If you look at that graph, it takes you how many seconds to understand it? Very few. So compare. Now, we work in our group very hard to make good graphs. Here is a graph that was made years ago, but we tried consciously to follow Tufte. You see just a few numbers on the axes and simple labels on the axes. And look at the data: big black dots and curves, curves to guide the eye. These aren't theoretical curves. OK. Now, a recent graph, which I thought was very nice, was made by my colleague Michael Marder, his student Frank Male, and another professor. And this one has been reprinted many places. It appeared two years ago.
One issue is this method of fracturing the shale to withdraw gas or oil, particularly natural gas, that was developed in the last two decades by George Mitchell. Fracking has enormously increased the yield of natural gas wells. And the biggest shale reserve that has been drilled so far is called the Barnett Shale in Texas. So this is the Barnett Shale. And they've drilled 6,237 wells. And they've studied them. And this is a physics department study. Professor Marder and his physics graduate student found a scaling relationship. You don't see the individual dots, but there are 6,237 dots here, one representing each of the wells. And they all, with the proper scaling relation, fall on the same line to some accuracy. And this is very important if you want to predict the yield of these wells 20, 25 years from now. And the uncertainty that has existed in the rules of thumb that the petroleum industry has had is enormous, an order of magnitude. This really nails it in one scaling graph. And for this, they received a prize, basically for the paper, but this is the key graph, which conveys so much information in a single graph. So if figures are well prepared and executed, we interpret them quickly and understand relationships. So that's why figures are so important in a paper. Now, to really understand a paper, we also have to read the text, but we understand text with much more difficulty. So for three million generations, we have developed the ability to interpret images. How many generations have passed since we've been able to interpret the writing on a piece of paper? To take these symbols and convert them into ideas and relate the ideas? That's a much harder task. Well, the first writing goes back five millennia, and you have symbols which convey ideas that are related to one another here in writing. Now, five millennia is a lot of generations, but nothing like 65 million years.
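To put rough numbers on that comparison, here is a minimal sketch. The 25-year generation length is an illustrative assumption (it is not a figure from the slides), and the function name is hypothetical.

```python
# Assumption for illustration only: one human generation is about 25 years.
YEARS_PER_GENERATION = 25


def generations(years, years_per_generation=YEARS_PER_GENERATION):
    """Approximate number of generations spanned by a period of years."""
    return years // years_per_generation


writing_generations = generations(5_000)        # about five millennia of writing
vision_generations = generations(65_000_000)    # about 65 million years of evolution
ratio = vision_generations // writing_generations

print(f"writing: about {writing_generations} generations")
print(f"vision:  about {vision_generations:,} generations")
print(f"ratio:   about {ratio:,} to 1")
```

Under that assumption, writing has been around for a few hundred generations while the visual system has had millions, which is the same order of magnitude as the "3 million generations" mentioned earlier.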
But just 150 years ago, only 10% of the world's population was literate. So that's only a few generations, six generations ago. Now, the world literacy rate is about 90%. But we've been training the brain-eye system to interpret the symbols to have meaning, with these meanings related to one another through our thinking process, for only, say, a few generations for most of us. Maybe some of us had Egyptian ancestors 5,000 years ago who could read, but many of us, myself included, have ancestors who a few generations ago couldn't interpret these symbols and give them meaning. So our brain system is not as sophisticated in interpreting writing as it is in interpreting images. OK. Now, changing the subject. I didn't start it too, OK? A paper is not scientific unless it contains all the information that is needed so that the results presented, whether it's theory or calculation or experiment, can be replicated. Now, when you read papers, a surprising number, I would say even a shocking number, do not have all of the information you need to replicate what's reported in the paper. Now, for different kinds of papers, experiments, simulations, theoretical analysis, there are different things that need to be included. Now, the tips I'm presenting here are in the Google Docs, in a short, three-page description of the tips, so you don't need to write all of that down. Think about whatever you're doing: what is needed to duplicate your work? When you read a paper and you want to duplicate the work, when you get very serious about duplicating it, you find some information is missing. It's often true. And when you set up an experiment or start to do a calculation, you don't know the boundary conditions or you don't know the initial conditions. If it's a nonlinear system, then the initial conditions are important because different initial conditions can lead to different long-time behavior. You can have multiple attractors in a nonlinear system, so all of that information must be there if it's science.
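The remark about initial conditions can be illustrated with a minimal sketch. The equation dx/dt = x - x^3 is a standard textbook bistable system chosen here as an assumption for illustration; it is not taken from the talk, and the function name and step sizes are hypothetical.

```python
def integrate(x0, dt=0.01, steps=5000):
    """Forward-Euler integration of dx/dt = x - x**3 starting from x(0) = x0.

    This system has two stable fixed points (attractors) at x = +1 and
    x = -1, separated by an unstable fixed point at x = 0.
    """
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3)
    return x


# Two runs that differ only in the sign of a small initial value.
final_from_positive = integrate(0.1)
final_from_negative = integrate(-0.1)

print(f"x0 = +0.1 -> x = {final_from_positive:.6f}")
print(f"x0 = -0.1 -> x = {final_from_negative:.6f}")
```

The two runs share the same equation, time step, and duration, yet settle on different attractors. A reader who is not told the initial condition cannot know which long-time state a paper's result refers to.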
Otherwise, it is not science. OK. Now, referencing. It's obvious that you reference the work of those who've gone before and studied a related topic or the same topic. There's no excuse for not having a complete set of references to the relevant prior work. You can go to Google Scholar or you can go to the ISI Web of Science and do a search on keywords. You can do a search on authors and find all of the work that is relevant to your paper. There have been known to be scientists who deliberately did not include references to the work of a person they happened not to agree with. That's completely unscientific. It is unethical to do so. Any relevant prior work should be referenced. And in preparing this two days ago, I made a discovery. I showed this figure, and then I was looking for articles on granular media and I found this journal, Granular Matter, published by Springer. And I looked at that picture there and it looked somewhat familiar. And then I got this picture, and they published on the cover of the journal a picture that Paul Umbanhowar had taken in our laboratory. And you can see. Well, they made it monochrome instead of colored and they flipped it. See, the two balls are on the right side here and the left side there. So then I looked in the journal to see if there was any acknowledgement of the source of the figure. I didn't find any. OK. All right. Now, almost at the end. I'm going to skip this. I talk about it in the tips. I think the impact factor should be abolished. It misrepresents the importance of the science and it's totally misused, mostly by administrators who were once scientists and should know better, but they look at the impact factor of the journals. Well, the impact factor is for the journal, not for the individual article. And it's based on the citations in the first two years of a paper. And if a paper is really novel, if it's something really new, there's not a community of scientists working on the problem. It might not be cited very much.
Look at Einstein's 1905 papers. How long did it take before the citations started to build? If something is really important and novel, new, then citations build in time, but not immediately. And even for the average paper, only 6% of the total number of citations over the life of the paper are in the first two years. And in some fields, if you're studying cancer and you have just found some gene related to cancer, when you report it in the journal Nature or Cell, you immediately have a huge number of citations, whereas if you're proving a theorem that few mathematicians understand, but that could turn out to be very important for science later on, the community is small and there would be very few citations. So for many reasons, I object to the whole idea of the impact factor. All right, now let me conclude. So the most important thing in writing is you write a draft and you rewrite it and you rewrite it, making it more precise, shorter, and more readable by a broad audience with each rewriting. Get others, even people that are not in your field, to read your paper and to criticize it, and listen with an open ear to their comments. Don't be defensive about criticism. You can even read it aloud. You'll find that useful. You've written a manuscript. Something that you've read many times in your writing and just glossed over, now you read it aloud and you realize it doesn't read smoothly. The flow is poor. It's hard to understand the antecedents or whatever. And the last point, which is very important, is that in the course of doing research, you do a number of little side projects. You solve a problem that you once thought was really key to your subject, but it turns out not to be so important. But you spent three months working on it, and you want to include it. You worked so hard, and you solved that little problem. It just happens not to be relevant to your core result. Cut it out.
The longer a paper is, the fewer readers you will have. Now, last point, and I'm done. Often the writing is left to the end of the research. And someone will say, I've worked three years on this project, and I'm finally finished. Now next week I'm going to write the paper and then go on vacation. That's not the way it should work. Writing is an integral part of research. In the course of writing, you realize you don't fully understand some aspects of the work, and some more experiments or calculations are necessary. And you start trying to explain it to someone, and they don't understand it, and you realize there are holes in your logic. So writing is a continuous process. Start it early and continue it until it's really tight. The writing itself will lead to a much better understanding of the science. And I guess that's it. Yeah, you can just Google "Swinney tips on writing," and you get what I've said. Or you can go to the Google Docs. But this is the main point, too. All right, thank you very much.