I'm back again. This is Dr. Harriet Hall with lecture 9 in a series of 10 lectures on Science-Based Medicine. In the last few lectures, I've covered a lot of things in alternative medicine and shown that they were not based on science. Now back to the main topic, science-based medicine itself. We depend on science for our knowledge, but science isn't perfect. We can't automatically accept that something is true just because there's a scientific study that says so. Studies are carried out by imperfect humans, and humans make mistakes. In this lecture, I'm going to talk about how research can go wrong and some of the pitfalls that can lead to false conclusions.

All scientific studies are not equal. That's something that's hard to explain to CAM advocates. They say, you want science? We'll give you science. And they do a study that proves to their satisfaction that their treatment works. And when we reject it, they think we're being unfair, since they think we accept the same kind of study for pharmaceutical drugs. Some randomized controlled trials are good science. Some are not-so-good science. And some of them are junk science. True believers are unable or unwilling to recognize the fatal flaws in their own studies. They don't understand why we won't accept their poor-quality evidence. And it's not just CAM practitioners. There are lots of doctors who don't understand what constitutes good science. They may just read the conclusion or the abstract of a study and assume the findings are true. They're too busy to read the whole study, and they probably wouldn't know what to look for if they did. Most doctors weren't taught how to critique a study in medical school. I didn't learn how until long after I graduated.

Junk science includes tooth fairy studies that researchers shouldn't have bothered to do, and studies that were so poorly done that the results are useless and we can't draw any conclusions from them. Joe Schwarcz's definition of junk science is: any argument that claims to have greater support than the evidence actually justifies, usually to advance a political or commercial agenda or to buttress a personal conviction.

Chemistry and physics experiments are relatively straightforward. Chemical compounds and physical phenomena don't lie. Chemicals always behave the same. When you mix chemical A with chemical B, you always get compound C, and you can even calculate exactly how much C you're going to get. Human experiments are more complicated. Molecules of a chemical are all identical, but humans are a mixed bunch of individuals who can be very different from each other. And humans don't cooperate like chemicals do. They may not follow instructions. They may try to figure out if they're in the placebo group by opening the capsules to taste for sugar. They may report what they think the researcher wants to hear, and they may even misrepresent their symptoms in order to qualify for a study. Dr. House was right. Patients lie, either deliberately or inadvertently. People don't remember accurately, they forget significant information, and they try to hide information that might make them look bad.

Advocates of any CAM treatment can cite studies showing that it works. But consider this: there are 6,500 peer-reviewed scientific journals that publish 2 million papers every year. That works out to 4 new scientific papers every minute of every day. So it's not surprising that a study can be found to back up virtually any point of view. This is Dr. John Ioannidis.
He wrote a landmark paper showing that most published research findings are false. I'll say that again: most published research findings are false. He showed that they're particularly likely to be false if they're small studies, if the effect size is small, if there are multiple endpoints, if financial interests and bias are involved, and if more teams are in competition in the same area of research. And they're a lot more likely to be false if they're studies of improbable treatments like homeopathy. Now, that doesn't mean we can't trust science. After all, Ioannidis had to use science to know which research findings turned out to be wrong. The scientific process is self-correcting in the long run. Early studies with false findings are followed by better studies that are more reliable, and a scientific consensus gradually builds based on the entire body of evidence.

So, if most published research findings are wrong, how can we decide whether to trust the results of a given study? What can we do to minimize the chances of being fooled? Well, in the first place, don't trust the media. Media reports are notoriously unreliable. You can't trust them to give you accurate scientific information. I'll come back to that in the next lecture. Learn what sorts of things can go wrong in research. Never believe one study in isolation. Have the findings been replicated? Do other studies support it or contradict it? Has there been a meta-analysis? Consider prior probability. Remember what Carl Sagan said about extraordinary claims. When you hear about a study, don't just accept it at face value. Ask questions. I'm going to give you a whole toolkit of questions you can ask.

The first question to ask, perhaps the most important question, is: did they study people or fruit flies? Was the study done in vitro, in animals, or in humans? Now, you can destroy cancer cells in a test tube or a petri dish in the lab with a blowtorch. But you can't very well use a blowtorch to treat cancer in people. Well, I suppose you could. It would get rid of the cancer, but it might get rid of the whole patient in the process. Animals have a lot in common with humans, but they're not exactly the same. If a treatment works in animals, it still has to be tested in humans. So we can't treat patients on the basis of lab studies and animal studies. Those are only a starting point. All you can do is file them away, reserve judgment, and wait for human studies.

The next thing you can do is to ask the four questions on this quick checklist proposed by R. Barker Bausell. I mentioned his book before, Snake Oil Science: The Truth About Complementary and Alternative Medicine. He's a research methodologist who designs research studies for a living, so he knows what to look for. He thinks these are the four most important questions to ask. I'll go over each one of them. One: is it randomized, with a credible control group? We know that doing anything is usually better than doing nothing, so we need to compare the test treatment to an appropriate control group. For an acupuncture study, a good placebo control might be sham acupuncture. For a drug study, a good placebo control might be a sugar pill that looks exactly like the drug pill. Some examples of poor controls: comparing the treatment group to a waiting-list group that gets no treatment, comparing the results of a new study to historical data or to a group that gets usual treatment, and comparing different modalities to each other, like acupuncture and drugs.
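Question one is about letting chance, rather than the researcher, decide who gets the treatment. For readers who like to see things concretely, here is a minimal sketch of what simple random allocation might look like; the subject names and the fixed seed are illustrative choices of mine, not anything from a real trial protocol.

```python
import random

def randomize(subjects, seed=42):
    """Shuffle subjects and split them into treatment and placebo arms.
    A recorded seed makes the allocation reproducible and auditable."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (treatment arm, placebo arm)

treatment, placebo = randomize([f"subject_{i}" for i in range(100)])
print(len(treatment), len(placebo))  # 50 and 50, Bausell's minimum per group
```

Real trials use more elaborate schemes, such as blocking and stratification, but the principle is the same: the assignment is made by chance, not by anyone with a stake in the outcome.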
Lots of studies get published that don't even have a control group. Look for the words "randomized controlled trial" and "placebo controlled." A control group should eliminate as many confounders as possible, so that you can be reasonably sure that the results were due to the test treatment rather than to some other unrecognized variable.

Number two on Bausell's checklist: were there at least 50 subjects in each group? That would be 100 total for a study with a treatment group and a placebo group, or 150 for a study that also included a no-treatment group. In general, the larger the sample size, the more likely you can trust the results. There's an old story about a chicken study where one-third of the chickens got better and one-third stayed the same. If you're paying attention to the math, you should ask: what about the other third? The answer is, that chicken ran away. Obviously, you can't draw any conclusions from a study that only had three chickens.

Item three: is the dropout rate 25% or less? Here's why that's important. Say there are 10 subjects, and six of them quit because the treatment wasn't working. Of the four who were left, three improved and one didn't. So it looks like the success rate was 75%. But if you count as failures all the ones who dropped out, the true success rate was only 30%. (The arithmetic is sketched in code just after this checklist.)

Item number four: was it published in a high-quality, prestigious peer-reviewed journal? If it got published in the New England Journal of Medicine or the Lancet, you know it had to meet pretty high standards. If it only managed to get published in the Lower Slobbovian Acupuncture Weekly, you should wonder why. If you've never heard of the journal or don't know if it's a high-quality journal, you can look it up online on impact factor lists to get a numerical estimate of its relative importance in its field. Here are some examples from a list of the 40 top-ranked medical journals. The top four are the New England Journal of Medicine, the Lancet, the Journal of the American Medical Association, and the British Medical Journal. The Mount Sinai Journal of Medicine is near the bottom of the list, and some journals are not on the list at all. This list is for general medical journals, and there are individual lists for specific areas like cardiology. There are also page-rank-style listings and other ways of scoring.

If a study meets Bausell's four criteria, that still isn't a guarantee that you can trust the results; but if it doesn't meet these four criteria, it's an indication that it might not be trustworthy.
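To make item three concrete, here is the dropout example from the checklist as a toy calculation; a minimal sketch, using only the numbers given above.

```python
# Item three's example: 10 subjects enrolled, 6 quit because the
# treatment wasn't working, and 3 of the remaining 4 improved.
enrolled = 10
dropouts = 6    # counted here as treatment failures
improved = 3    # out of the 4 who finished

per_protocol = improved / (enrolled - dropouts)  # ignores the dropouts
intention_to_treat = improved / enrolled         # counts everyone enrolled

print(f"counting only finishers: {per_protocol:.0%}")           # 75%
print(f"counting everyone enrolled: {intention_to_treat:.0%}")  # 30%
```

Counting everyone who was enrolled is the intention-to-treat principle; the 75% figure is what you get when the failures quietly disappear from the denominator.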
If a study passes Bausell's checklist, there are still a lot of other questions you should ask. First, who are the subjects? Are they a representative sample? Do they include both sexes, all ages, people like you? The results of studies done on men might not apply to women and children. Results in Africans might not apply to Americans. Results in young adults might not apply to the elderly. People with other diseases or on other medications may not get the same results as the carefully selected study participants. Was it a study of blood pressure medications in diabetics? The results might not apply to people who don't have diabetes. Could the subjects be biased? If it was a study of acupuncture, who do you think is likely to volunteer? People who believe in acupuncture are more likely to sign up. If they don't believe it works, they're not as likely to volunteer to let people stick needles in them. I read one study where the researcher used his own students as subjects. They might have wanted to please the teacher and get a good grade, and that might have skewed their reporting.

Next question: who's paying? Studies have shown that when a drug company is paying for the research, the results are more likely to be positive for the company's drug. That doesn't mean they're deliberately being dishonest. The researchers might subconsciously want to please the company so that they'll get funding for more research, and the company might exert subtle influence in various ways. But that doesn't mean you should dismiss a study just because it was funded by the manufacturer. Most researchers are far more concerned with maintaining the integrity of their scientific reputation than with pleasing their funding sources. And in most cases, the funds are simply handed over with no attempt to influence how the study is designed or carried out. But it's something to keep in mind, and you might want to give more weight to a study that was funded by an unbiased source.

Next question: who are the authors? Could the researchers be biased? Is there a conflict of interest? Financial conflicts of interest are usually divulged in a statement at the end of the article. Do the authors have stock in the company or get lecture fees from them? Other conflicts of interest may be subtle and undeclared. If the researchers have strong beliefs about what they're testing, or if they have a vested interest in proving the truth of a claim, it's more likely that they'll get positive results. Do they make a living from the treatment they're studying? Are they acupuncturists studying acupuncture? Homeopaths studying a homeopathic remedy? Chiropractors studying chiropractic treatment? Is the subject a pet project of the researcher, one that he has staked his reputation on? Would his career be ruined if the claim were proven false?

Was randomization adequate? Did they use proper procedures? If randomization isn't strictly controlled, researchers might be tempted to assign sicker people to the control group to make the treatment look better. Or they might feel sorry for a sicker patient and want to get him into the treatment group so that he'll have a chance. Even if proper randomization procedures are followed, the groups might end up with significant differences just by chance. For instance, did a lot more men than women end up in one group? Did the group getting the treatment have a lower blood pressure than the control group? Good studies usually publish a table of comparisons showing the age, sex, and other characteristics of subjects in both groups, so you can see at a glance whether too many people with a given characteristic ended up in one group.

Was blinding effective? Was there any way the researchers might have guessed which group a subject was in? Could the subject have guessed? The best studies do an exit poll: after the study, they ask subjects whether they thought they got the active treatment or the placebo. Maybe they could tell because they got side effects from the active treatment. Maybe the researchers inadvertently gave the subjects some clues to which group they were in. If they can guess better than chance, you didn't have an adequate placebo control, and you should go back to the drawing board. (A quick way to check exit-poll guesses against chance is sketched after this paragraph.)
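How do you tell whether subjects guessed "better than chance"? One simple approach, sketched below with made-up exit-poll numbers, is an exact binomial test: if blinding held, correct guesses should look like coin flips.

```python
from math import comb

def p_at_least(n, k, p=0.5):
    """Probability of k or more correct guesses out of n if each guess
    is a 50/50 coin flip (i.e., if the blinding actually worked)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical exit poll: 80 subjects were asked which group they were in,
# and 58 guessed correctly. Could that happen by chance?
print(f"p = {p_at_least(80, 58):.6f}")  # far below 0.05: blinding likely failed
```

Formal blinding indices exist, but even this crude check will flag a placebo that fooled nobody.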
Were there multiple endpoints? For instance, in a study of heart disease, you might have a single endpoint, like death from heart attacks. Or you might also look at a lot of other outcomes, like non-fatal heart attacks, chest pain, the need for interventions like bypass surgery, time in the coronary care unit, exercise tolerance, length of hospitalization, quality of life, blood pressure, the need for pain medication, etc. If you measure enough things, one is almost guaranteed to show a positive correlation just by chance, even if the treatment doesn't work. They might have gotten 14 negative results and chosen to declare the treatment a success by reporting the 15th one that was positive. It's like the Texas sharpshooter fallacy, where a Texan fired at the side of a barn, then drew a target where the bullets hit and claimed to be an expert marksman. There are statistical methods to correct for the problem of multiple endpoints. Were they used? (A small simulation a little further on shows how easily a chance "positive" turns up.)

Was there inappropriate data mining? Sometimes when researchers don't get the results they want, they mine the data. They go back and look at different segments of the data, twist the numbers every which way, and torture the data until they get something that agrees with their expectations, and then they report that. It's a way of deceptively turning negative results into positive results.

Where was the study done? The percentage of published trials with positive results varies widely from country to country. In Canada, 30% of acupuncture trials have positive results. In the U.S. it's 53%. In Asia, 98% of acupuncture studies are positive. And that's not just for acupuncture: the results of all clinical studies vary according to where the study was done. That should make you suspicious, because science should be the same everywhere. In China, negative results are almost never published. The culture sees negative results as failures, and the researcher loses face and may lose his job. Russia is another big offender. I'm reluctant to believe any research out of China or Russia until the findings are replicated in a country with a better track record of publishing negative results.

Failure to publish negative results is called the file drawer effect. You might have ten negative studies and only one positive study. If only the positive study gets published, people will believe that the evidence shows the treatment works, even though the majority of the evidence was actually negative. In addition to the file drawer effect, where the researchers don't submit negative studies for publication, there's also publication bias, where journal editors are less likely to publish studies with negative results.

Were the results clinically meaningful? What was the effect size? If a drug lowers blood pressure by 40 millimeters, it might be able to prevent a lot of strokes. If it only lowers it by one to two millimeters, that's probably not enough to make much difference. Was the endpoint a lab value or a clinically meaningful event? We don't just want to know if cholesterol levels went down on blood tests; we want to know if patients had fewer heart attacks. We're not treating lab tests, we're treating people. We don't want to have to say the surgery was a success, but the patient died. We're looking for what we call POEMs: Patient-Oriented Evidence that Matters.
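Here is the promised simulation: a minimal sketch of a useless treatment measured on 15 independent endpoints. Under the null hypothesis a p-value is uniformly distributed between 0 and 1, so "at least one significant endpoint" happens more often than not.

```python
import random

def one_null_study(n_endpoints=15, alpha=0.05):
    # With no true effect, each endpoint's p-value is uniform on [0, 1].
    return any(random.random() < alpha for _ in range(n_endpoints))

trials = 10_000
hits = sum(one_null_study() for _ in range(trials))
print(f"studies with a chance 'positive': {hits / trials:.0%}")  # ~54%
print(f"theory: 1 - 0.95**15 = {1 - 0.95**15:.0%}")
# A Bonferroni correction would require p < 0.05/15 for each endpoint,
# which brings the overall false-positive rate back down to about 5%.
```

So a study with 15 endpoints and no correction is more likely than not to have something to brag about, even when the treatment does nothing at all.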
What does statistical significance really mean? Statistical significance is very commonly misunderstood. P-values lower than 0.05 are said to be statistically significant, but that cut-off is just an arbitrary convention. Some people think it means there's a 95% probability that the results of the study are true, but that's not what it means at all. A value of p = 0.001 does not mean that the findings are more true than if p = 0.05. The p-value only measures the probability that you would get the results you did if there were really no difference between the two groups, in other words, if the null hypothesis were true. A low p-value only allows you to reject the null hypothesis that A equals B. It doesn't necessarily prove the experimental hypothesis that A is better than B. Now, this is important. Remember this: statistical significance is not the same as truth, and truth is not the same as clinical significance.

Another question: what are the confidence intervals? If you repeat a study, you won't get exactly the same numbers every time. Look at the set of bars on the left. On this particular trial, they got a value of 18 for the brown bar. The red lines show the confidence interval. For the brown bar on the left, they calculated that if they did repeated trials, they could be 95% confident that the results would fall somewhere between 16 and 20. In other words, there's a 95% chance that the true value falls somewhere on that red line. In the set of bars on the left, the red lines don't overlap: the lowest red value for the brown bar is still higher than the highest point for the white bar. Brown clearly wins. On the right, some of the higher red values for the white bar are higher than some of the lower red values for the brown bar. So even though it looks like brown is the winner, it's possible that white might actually be the winner, and it just didn't happen to show up that way in this particular trial. Here's an example of how it might be reported in a scientific article: "The reduction in mortality was 35%, with a 95% confidence interval of 21% to 58%."

Did they report relative or absolute risk? They might report that people who eat kumquats are twice as likely to get the disease varicyclitis. The relative risk is twice as much, or 200%. But if only one person in a million has varicyclitis, twice as much would only mean that two out of a million people who eat kumquats will get the disease. The absolute risk increase is just one more person out of every million. 200%, or one in a million: these sound very different, but they're the exact same statistic expressed in different ways. So when you read a study, look for actual numbers rather than percentages or proportions.

Did they report NNT and NNH? NNT is the number needed to treat, and NNH is the number needed to harm. In the early days of Lipitor, one study found a 19% reduction in the risk of heart attacks. Now, you might think that if you took the drug, it would reduce your personal risk of a heart attack by 19%, but that's not how it works. The numbers are for populations, not individuals. The study found that in order to prevent one heart attack, you would have to treat 250 patients. If you're not that one patient out of 250, the drug won't prevent you from having a heart attack. It won't do you any good, and it might cause side effects. An NNH of 200 means that out of every 200 patients who took the drug, one had a significant adverse effect. Those things are nice to know.
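The kumquat and Lipitor numbers above reduce to a couple of lines of arithmetic. A sketch, using only the figures quoted in this lecture:

```python
# Relative vs. absolute risk: one case per million becomes two per million.
baseline_risk = 1 / 1_000_000
exposed_risk = 2 / 1_000_000
print(f"relative risk: {exposed_risk / baseline_risk:.0%}")      # 200%
print(f"absolute increase: {exposed_risk - baseline_risk:.7f}")  # 1 in a million

# NNT is the reciprocal of the absolute risk reduction. Treating 250
# patients to prevent one heart attack means the absolute risk reduction
# was only 1/250, i.e. 0.4 percentage points.
nnt = 250
print(f"absolute risk reduction: {1 / nnt:.1%}")  # 0.4%
```

Same data, three very different-sounding numbers; which one a report leads with tells you something about what it wants you to feel.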
Did they mistake correlation for causation? This is one of the biggest mistakes people make. Correlation does not equal causation. Remember this graph from lecture one? The rise in the number of diagnoses of autism correlates almost perfectly with the sales of organic food, but that doesn't mean organic food causes autism. If they tell you that playing video games is correlated with violent behavior in kids, and then recommend that you prevent your kids from playing video games, they've made that error. They've shown a correlation, but they haven't proven causation. It could be that kids who are likely to behave violently are also more likely to want to play video games. Or it could be that some other factor predisposes kids to both violent behavior and playing video games. And they have no way of knowing whether preventing your kids from playing video games would actually do anything to reduce violence. It might even make them angry and more likely to act out.

Did they tell you the base rate of the disease? A news report says: breaking news, new blood test 87% accurate in detecting Alzheimer's, with an 85% specificity rate. That sounds really good, but it may not be. You can't tell what it means unless you know the base rate of Alzheimer's in the population they studied. 87% accurate means that the test is positive in 87% of those who have Alzheimer's. We call that the sensitivity of the test. 85% specificity means the test is negative in 85% of those who don't have Alzheimer's. But sensitivity and specificity don't give you any useful information until you know the prevalence of Alzheimer's in the population.

Let's say a population consists of 10,000 people and 5% of them have Alzheimer's. That means 500 people in this population have Alzheimer's. A sensitivity of 87% means that 87% of them will test positive. That's 435 people. And 65 of them will test negative and will be led to believe they don't have Alzheimer's when they really do. There are 9,500 people in this population who don't have Alzheimer's. A specificity of 85% means that 85% of 9,500, or 8,075 people, will test negative, and the other 1,425 will test positive and will be led to believe they have Alzheimer's when they really don't.

What you really want to know is: if you test positive, what's the likelihood that you actually have the disease? To get that, you divide the true positives, 435, by the total positives, 1,860, and you get a positive predictive value of about 23%. And if you test negative, what's the likelihood that you really don't have the disease? Out of 8,140 people who test negative, all but 65 of them really don't have the disease, so the negative predictive value is 99%. This test would be very good at ruling out the disease, but it would be useless for ruling it in, since only about a quarter of those who test positive actually have the disease. Remember: the rarer the disease, the less likely it is that a positive test result is true.
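That worked example generalizes to any screening test. Here is the same arithmetic as a small function you can rerun with different prevalences; a sketch for checking the numbers, not a clinical calculator.

```python
def predictive_values(population, prevalence, sensitivity, specificity):
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity       # sick people correctly flagged
    false_neg = sick - true_pos         # sick people the test misses
    true_neg = healthy * specificity    # healthy people correctly cleared
    false_pos = healthy - true_neg      # healthy people falsely flagged
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

ppv, npv = predictive_values(10_000, 0.05, 0.87, 0.85)
print(f"PPV: {ppv:.0%}, NPV: {npv:.0%}")  # ~23% and ~99%, as above

# The rarer the disease, the worse the positive predictive value:
print(f"at 1% prevalence, PPV: {predictive_values(10_000, 0.01, 0.87, 0.85)[0]:.0%}")  # ~6%
```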
Lots of other things can go wrong in research. Here are just a few of them: errors of math, using the wrong statistical test, contaminants in the lab, poor compliance (patients who didn't take all the pills), technicians who manipulated the data, either consciously or unconsciously, to please the boss, and procedures that weren't carried out properly. The data may be good, but the conclusion may be wrong. And there could have been fraud. Sometimes researchers deliberately fudge the data or even fake a study completely.

In 2001, the Journal of Reproductive Medicine published a study by three researchers from Columbia University. They found that praying for women undergoing infertility treatments doubled the rate of pregnancy. Being prayed for can make you pregnant. That sounds like what Carl Sagan would call an extraordinary claim requiring extraordinary evidence, and this was one study not corroborated by anything else. Bruce Flamm, a professor of obstetrics and gynecology, was skeptical. He evaluated the study, and he found that the experimental methods were seriously flawed. Then he investigated the authors, and he became really alarmed. The lead author, Rogerio Lobo, didn't even know the study had been done until six to twelve months after it was completed, when he was asked to provide editorial assistance with the write-up. The second author, Kwang Cha, refused to respond to any inquiries. The third author, Daniel Wirth, turned out to be a parapsychologist with no medical training who was also a con man and a convicted felon, and who was strongly suspected of having faked data in other studies. So at this point we aren't sure any study was actually done. It may have been entirely fabricated, or at least the data may have been altered or misreported. The journal never retracted it, and it's still cited in the literature as proof of the efficacy of prayer. No, it isn't proof of anything. You may not have any way of knowing about fraud until it's revealed several years later, or it may never be revealed. But at least you can keep the possibility of fraud in the back of your mind, especially if the reported findings are unexpected or if the authors have been involved in questionable behavior in the past.

Two kinds of study are particularly likely to be wrong: tooth fairy science and pragmatic studies of implausible treatments. Tooth fairy science is where you try to study things that don't exist, like the tooth fairy or the human energy field. I covered that in lecture 7. And pragmatic studies of implausible treatments are also likely to get false positive results. The setting of clinical trials is artificial, with a select group of subjects and careful monitoring in a research environment. When the treatment is used in real-world settings, like a doctor's office with a broader mix of patients, it may turn out to be less effective. That's where pragmatic trials come in. For example, clot-buster drugs worked very well for strokes in clinical trials. But when doctors started using them in emergency rooms, some of the patients weren't ideal candidates. Some of them had other diseases or were on other drugs. Sometimes doctors didn't stick to the treatment guidelines. There were delays in treatment, and other factors interfered. In actual practice, the drugs were less effective than in the clinical trials, and they were more likely to cause complications like bleeding. The pragmatic studies showed that in real-world settings the benefit-to-harm ratio was much lower than in the clinical trials. So pragmatic studies give us valuable information, and real-world comparative effectiveness studies can help determine treatment policies. But pragmatic studies are not designed to prove that a treatment works. They're meant to be done on treatments that have already been proven to work.

Things start to go wrong when you do a pragmatic study on an improbable treatment. For instance, you might compare acupuncture to conventional treatment for back pain. Acupuncture may appear to outperform conventional treatment in pragmatic studies because acupuncture is a theatrical placebo and patients are suggestible. But that doesn't mean acupuncture works. If the same theatrics could be added to the conventional treatment, it would appear to work even better.
CAM advocates love doing pragmatic studies, because pragmatic studies make their ineffective treatments look better than they are, and because they allow them to skip the essential step of proving that their treatment works better than placebo.

I have a rule, the SkepDoc's rule: before you accept any claim, make sure that you understand who disagrees with it and why. When you understand the arguments on each side, it's usually obvious which side makes more sense. If Jenny McCarthy had followed that rule when she first heard the myth that vaccines cause autism, she might have found solid refutations of all the anti-vaccine arguments, and she might not have been so easily misled. You can ask whether other studies have been done on the same subject and whether they confirmed or contradicted the results of this study. You can check PubMed to find out. And you can check the science blogs to see if anyone has written about the study and found serious flaws in it.

Steven Novella has pointed out that there's a double standard: science-based medicine and CAM have different thresholds for establishing proof. Science-based medicine requires scientifically plausible interventions and rigorous trials showing replicable, statistically significant, and clinically significant benefits. CAM puts the threshold of proof much lower. It softens standards and makes excuses. It often suggests that randomized controlled trials are inappropriate for CAM. It uses pragmatic studies as if they were efficacy trials. It interprets placebo effects as if they were proof of efficacy. It has a much more flexible concept of evidence.

The last question, and an important one, is whether you might be tempted to accept or reject the study because of your own prior bias. Did you have an opinion about the subject or about the researcher before you read the study? You should examine your own thinking as carefully as you examine the study.

You can learn a lot from bad examples, so it might be useful to go through an example of junk science. This was a study about homeopathy. It looked at children who had a cough from a viral URI, an upper respiratory infection, in other words, a common cold. It asked whether adding antibiotics to a homeopathic cough remedy would improve the resolution of the cough in children with URIs. Every published study starts with an introduction covering the background of the subject and describing previous research in the area, to demonstrate the reason for the new study. In this study, the background research for the homeopathic remedy consisted of only one study, in adults, showing that the homeopathic remedy worked, a study that had never been replicated. For antibiotics, they cited numerous studies showing that antibiotics don't work for cough in URIs. There had never been a single study showing that antibiotics did work to treat a cough. They were assuming, on the basis of a single unreplicated study, that since the homeopathic remedy worked for adults, it would work the same way for children. That's an unjustified assumption. And they knew the evidence showed there was no rationale for using antibiotics, so why on earth would they use them in this study? All the children in the study got the homeopathic cough remedy, and half of them also got an antibiotic.

Here is what was in the homeopathic cough syrup. There were 10 ingredients in all. According to homeopathic philosophy, you ought to treat a cough with a dilute solution of something that causes healthy people to cough. None of these ingredients would qualify, and none of them has been shown to be effective for cough. The Natural Medicines Comprehensive Database says there is insufficient information to assess their effectiveness, and it rates some of them as likely unsafe. Bryonia is rated as likely unsafe; as few as 15 berries can be fatal in children. Ipecac is another of the ingredients; it's used to make people vomit after they've swallowed a poison. Some of the ingredients are not even single remedies; they're proprietary mixtures sold by the Boiron company, and they were already in dilute form. A solution of 3C equals one part per million; a solution of 6C means one part per trillion.
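The C-scale arithmetic is easy to check: each C step is a 1:100 dilution, so a remedy at potency nC contains one part of the original substance in 100 to the nth power. A quick sketch:

```python
def c_dilution(n):
    """Fraction of the original substance left after n serial 1:100 dilutions."""
    return 100.0 ** -n

print(c_dilution(3))  # 1e-06: one part per million
print(c_dilution(6))  # 1e-12: one part per trillion
```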
Drosera is provided as a mother tincture rather than as a homeopathic dilution, so it's strong enough that it might possibly have a direct effect. It has been claimed to work for coughs, but the Natural Medicines Comprehensive Database says there's insufficient reliable information to assess its effectiveness, and it rates it as only possibly safe. How do you suppose they came up with this particular combination of 10 remedies? I can't see any rationale for mixing these particular ingredients. Even if they had proved the mixture worked, they wouldn't know whether one of the ingredients did the job or whether all 10 were really required to get the effect.

So what did the study find? After a week, half of the subjects in each group were symptom-free. That's no surprise: most cold symptoms resolve in a week or so without any treatment. They didn't compare children treated with homeopathic syrup to children who got no treatment, but we can assume that untreated children would have improved just as much. There was no difference between those who got the cough syrup alone and those who also got antibiotics. And that's also no surprise, since we already knew that antibiotics don't make a difference in viral infections. Their conclusion? The homeopathic treatment worked, and adding antibiotics didn't help. Their data don't support their conclusion. We can't tell from this study whether the homeopathic remedy worked. We have no way of knowing whether the subjects who got the homeopathic remedy did any better than they would have if they'd been given a placebo or if they'd been left untreated. Without treatment, they might have improved just as much from the natural course of the disease. And we already know that antibiotics don't work. They call the homeopathic treatment symptomatic treatment, which is pretty ironic, because CAM claims to treat underlying causes rather than symptoms.

Where was the study published? In an obscure journal. Who paid for it? It was financed by Boiron, the company that manufactured the homeopathic remedies. No meaningful information could be learned from this study, and it was unethical. It violated the principles of ethical research on human subjects. Ethics require that a study have scientific validity: it should be designed to answer a meaningful question, and it should be done by good scientific methods. Ethics require protection of subjects, to prevent harm from treatments already known to be ineffective, like antibiotics for URIs. The potential benefits to subjects should be greater than the risks; in this case, there was no reason to expect benefits from antibiotics, and we know that antibiotics can cause side effects. And there should have been proper informed consent. What do you suppose those subjects were told about antibiotics? This study was a travesty. Nothing meaningful could be learned from it, and the subjects were exposed to risk.
It should never have been done. Peer reviewers should have spotted the flaws, but in this case the peer reviewers were probably other homeopaths who were happy to believe anything. No reputable journal editor should have accepted it for publication. According to an article in the Journal of the American Medical Association, there seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, or too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print. A lot of good studies get published, particularly in the most prestigious medical journals like the New England Journal of Medicine. But a lot of really awful studies get published too. In fact, researchers can even pay to get a study published in an online journal when they couldn't get it published anywhere else.

Here's a list of some of the problems with research. Poor-quality studies get published. The publish-or-perish climate in academia encourages scientists to do ill-advised research. Publication bias: negative studies are less likely to be published. Lack of replication: studies are seldom repeated, and the replications are not likely to be accepted for publication. Mistakes are missed by peer reviewers who are sloppy or biased. Terrible studies get published in pay-to-publish journals. And there have been examples of big pharma malfeasance, with suppression or distortion of data.

The scientific community recognizes those problems and is seeking solutions. Here are some of the solutions that have been proposed: education of researchers, better quality control at journals, publishing replications and negative studies, registering all studies so the negative ones can't be forgotten or swept under the rug, full disclosure, and improved media reporting. Someone proposed that every research study should carry this label: Warning, taking any action on the basis of this research could result in injury or death. The results described in this study have not been replicated, and the long-term effects of this treatment are unknown. Past performance is no guarantee of future results. When subjected to further investigation, most published research findings turn out to be false. I hope that when you read in the newspaper that eating kumquats prevents heart disease, you will remember this warning and won't rush out to stock up on kumquats.

All of this is very discouraging, but it's not hopeless. So many things can go wrong with research that you can never trust one study in isolation. But scientists eventually evaluate the whole body of evidence, sort the wheat from the chaff, and reach a consensus that we can rely on. Science isn't perfect, but no other way of knowing even comes close. Someone once described it as the slow, lumbering beast we call science: the behemoth is clumsy, it stumbles, and it meanders in its quest, but its course is ultimately self-correcting, and it inexorably trudges toward its final goal, the truth. Richard Dawkins said of science: it works, bitches. He didn't invent that saying, but it's particularly delightful to hear him say it in his cultured British accent. In the next lecture, I'll talk about the mayhem that often ensues when science meets the media and politics.