Science for Science Reformers

In November 2019, Tal Yarkoni set psychology Twitter ablaze with a fiery preprint, “The Generalizability Crisis” (Yarkoni, 2019). Written in direct, pungent language, the paper takes aim at the inappropriate breadth of claims in scientific psychology, arguing that the inferential statistics presented in papers are essentially meaningless as tests of those claims because of their excessive breadth and the endless combinations of unmeasured confounders that plague psychology studies.

The paper is clear, carefully argued, and persuasive. You should read it. You probably have.

Yet there is something about the paper that bugs me. That feeling wormed its way into the back of my mind until it became a full-fledged concern. I agree that the verbal claims in scientific articles are often, or even usually, so hopelessly misaligned with their instantiations in experiments that the statistics in papers are practically useless as tests of the broader claims. In a world where claims are not refuted by future researchers, this represents a huge problem. And that world characterizes much of psychology.

But the thing that bugs me is not so much the paper’s logic as (what I perceive to be) its theory of how to change scientist behavior. Whether or not Tal explicitly believes this theory, it is one that I think is fairly common in efforts to reform science, and one that I believe has been shared by many failed reform efforts. I will devote the remainder of this post to addressing this theory and laying out the theory of change that I think is preferable.

A flawed theory of change: The scientist as a logician

The theory of change that I believe underlies Tal’s paper is something I will call the “scientist as logician” theory. Here is a somewhat simplified version of this theory:

  • Scientists are truth-seekers
  • Scientists use logic to develop the most efficient way of seeking truth
  • If a reformer uses logic to identify flaws in a scientist’s current truth-seeking process, then, as long as the logic is sound, that scientist will change their practices

Under the “scientist as logician” theory of change, the task of a putative reformer is to develop the most rigorously sound logic possible about why a new set of practices is better than an old set. The more unassailable this logic, the more likely scientists are to adopt the new practices.

This theory of change is the one implicitly adopted by most academic papers on research methods. The “scientist as logician” theory is why, I think, most methods research focuses on accumulating unassailable evidence about the optimal methods for a given set of problems: if scientists operate as logicians, then stronger evidence will lead to stronger adoption of those optimal practices.

This theory of change is also the one that arguably motivated many early reform efforts in psychology. Jacob Cohen wrote extensively and persuasively on why, based on considerations of statistical power, psychologists ought to use larger sample sizes (Cohen, 1962; Cohen, 1992). David Sears wrote extensively on the dangers of relying on samples of college sophomores for making inferences about humanity (Sears, 1986). But none of their arguments seems to have mattered much.

In all these cases, the logic that undergirds the arguments for better practice is nigh unassailable. Yet the lack of adoption of these suggestions reveals stark limitations in the “scientist as logician” theory. The limited influence of methods papers is infamous (Borsboom, 2006), especially when a paper points out critical flaws in a widely used and popular method (Bullock, Green, & Ha, 2010). Meanwhile, despite the highly persuasive arguments of Jacob Cohen, David Sears, and many other luminaries, statistical power has barely changed (Sedlmeier & Gigerenzer, 1989), nor has the composition of psychology samples (Rad, Martingano, & Ginges, 2018). It seems unlikely that scientists change their behavior on purely logical grounds.

A better theory of change: The scientist as human

I’ll call my alternative to the “scientist as logician” model the “scientist as human” model. A thumbnail sketch of this model is as follows:

  • Scientists are humans
  • Humans have goals (including truth and accuracy)
  • Humans are also embedded in social and political systems
  • Humans are sensitive to social and political imperatives
  • Reformers must attend to both human goals and the social and political imperatives to create lasting changes in human behavior

Under the “scientist as human” model, the goal of the putative reformer is to identify the social and political imperatives that might prevent scientists from engaging in a certain behavior. The reformer then works to align those imperatives with the desired behaviors.

Of course, for a desired behavior to occur, that behavior should be aligned with a person’s goals (though that is not always necessary). Here, however, reformers who want science to be more truthful are in luck: scientists overwhelmingly endorse normative systems that suggest they care about the accuracy of their science (Anderson et al., 2010). This also means, however, that if scientists are behaving in ways that appear irrational or destructive to science, that’s probably not because the scientists just haven’t been exposed to a strong enough logical argument. Rather, the behavior probably has more to do with the constellation of social and political imperatives in which the scientists are embedded.

This view, of the scientist as a member of human systems, is why, I think, the current open science movement has been effective where other efforts have failed. Due to the efforts of institutions like the Center for Open Science, many current reformers have a laser focus on changing the social and political conditions. The goal behind these changes is not to change people’s behavior directly, but to shift institutions to support people who already wish to use better research practices. This goal is a radical departure from the goals of people operating under the “scientist as logician” model.

Taking seriously the human-ness of the scientist

The argument I have made is not new. In fact, it is implicit in many of my favorite papers on science reform (e.g., Smaldino & McElreath, 2016). Yet I think many prospective reformers of science would be well served by thinking through the implications of the “scientist as human” view.

While logic may help in identifying idealized models of the scientific process, reformers seeking to implement and sustain change must attend to social and political processes. This especially includes the processes that affect career advancement, such as promotion criteria and granting schemes. It also includes thinking through how a potential reform will be taken up in the social and political environment, especially whether scientists will have the political ability to take collective action to adopt a particular reform. In other words, taking seriously scientists as humans means taking seriously the systems in which scientists participate.


  • Anderson, M. S., Ronning, E. A., De Vries, R., & Martinson, B. C. (2010). Extending the Mertonian Norms: Scientists’ Subscription to Norms of Research. The Journal of Higher Education, 81(3), 366–393.
  • Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71(3), 425–440.
  • Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, but what’s the mechanism? (Don’t expect an easy answer). Journal of Personality and Social Psychology, 98(4), 550–558.
  • Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. The Journal of Abnormal and Social Psychology, 65(3), 145–153.
  • Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
  • Rad, M. S., Martingano, A. J., & Ginges, J. (2018). Toward a psychology of Homo sapiens: Making psychological science more representative of the human population. Proceedings of the National Academy of Sciences, 115(45), 11401–11405.
  • Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data base on social psychology’s view of human nature. Journal of Personality and Social Psychology, 51(3), 515–530.
  • Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105(2), 309–316.
  • Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 160384.
  • Yarkoni, T. (2019). The Generalizability Crisis [Preprint]. PsyArXiv.

Examining whether science self-corrects using citations of replication studies

As scientists, we often hope that science self-corrects. But several researchers have suggested that the self-corrective nature of science is a myth (see, e.g., Estes, 2012; Stroebe et al., 2012). If science is self-correcting, we should expect that, when a large replication study finds a result different from that of a smaller original study, the number of citations to the replication study ought to exceed, or at least be similar to, the number of citations to the original study. In this blog post, I examine this question for seven “correction” studies in which I’ve been involved. (I did not include any of the ManyLabs replication studies because they were so qualitatively different from the rest.) This exercise is intended to provide yet another anecdote to generate a discussion about how we, as a discipline, approach self-correction; it is by no means intended as a general conclusion about the field.

Sex differences in distress from infidelity in early adulthood and later life.
Shackelford and colleagues (2004) reported that men, compared to women, are more distressed by sexual than by emotional infidelity (total N = 446). The idea was that this effect generalizes from young adulthood to later adulthood, which was taken as evidence for an evolutionary perspective. In our pre-registered replication studies (total N = 1,952), we found the effect for people in early adulthood but not in later adulthood. We also found that the disappearance of the effect was likely due to the sociosexual orientation of the older adults we sampled (in the Netherlands, as opposed to the United States). In other words, the basic original effect seemed present, but the original conclusion was not supported.
How did the studies fare in terms of citations?
Original study (since 2014): 56 citations
Replication study (since 2014): 23 citations
Conclusion: little to no correction done (although perhaps it was not a conclusive non-replication given the ambiguity of the theoretical interpretation)

Does recalling moral behavior change the perception of brightness?
Banerjee, Chatterjee, and Sinha (2012; total N = 114) reported that recalling unethical behavior led participants to see the room as darker and to desire more light-emitting products (e.g., a flashlight) compared to recalling ethical behavior. In our pre-registered replication study (N = 1,178) we did not find the same effects.
How did the studies fare in terms of citations?

Original study (since 2014): 142 citations
Replication study (since 2014): 24 citations
Conclusion: correction clearly failed.

Physical warmth and perceptual focus: A replication of IJzerman and Semin (2009)
This replication is clearly suboptimal, as it was a self-replication. The study was conducted at the beginning of the replication crisis, when we wanted to self-replicate some of our own work. In the original study (N = 39), we found that people in a warm condition focus more on perceptual relationships than on individual properties. In a higher-powered replication study (N = 128), we found the same effect (with a slightly modified method to better avoid experimenter effects).
How did the studies fare in terms of citations?
Original study (since 2014): 323
Replication study (since 2014): 26
Conclusion: no correction needed (yet; but please someone other than us replicate this and the other studies, as these 2009 studies were all underpowered).

Perceptual effects of linguistic category priming
This was a particularly interesting case, as the paper was published after the first author, Diederik Stapel, was caught fabricating data. All but one of the 12 studies were conducted before he was caught (but we could never publish them due to the nature of the field at the time). In the original (now retracted) article, Stapel and Semin reported that priming abstract linguistic categories (adjectives) led to more global perceptual processing, whereas priming concrete linguistic categories (verbs) led to more local perceptual processing. In our replication, we could not find the same effect. (Note: technically, these studies were not replications, as the original studies were never conducted. After Stapel was caught, the manuscript was first submitted to the journal that had originally published the effects; their response was that they would not accept replications. When we pointed out that these were not replications, the manuscript was rejected because we had found null effects. Thankfully, the times are clearly changing now.)
How did the studies fare in terms of citations?
Original study (since 2015): 12
Replication study (since 2015): 3
Conclusion: correction failed (although citations slowed down significantly and some of the newer citations were about Stapel’s fraud).

Does distance from the equator predict self-control?
This one is somewhat of an outlier in this list, as it is an empirical test of a hypothesis from a theoretical article. The article proposed that people who live farther from the equator have poorer self-control, and the authors suggested that this hypothesis should be tested via data-driven methods. We were lucky enough to have a dataset (N = 1,537) with which to test it, and we took up the authors’ suggestion by using machine learning. In our commentary article, we were unable to find the effect (equator distance was a slightly less important predictor of self-control than whether people spoke Serbian).
How did the studies fare in terms of citations?
Original article (since 2015): 57
Empirical test of the hypothesis (since 2015): 3
Conclusion: correction clearly failed (in fact, the original first author published a very similar article in 2018 and cited the article 6 times).

A demonstration of the Collaborative Replications and Education Project: replication attempts of the red-romance effect
Elliot et al. (2010; N = 33) reported that women were more attracted to men whose photograph was presented with a red (vs. grey) border. Via one of my favorite initiatives that I have been involved in, the Collaborative Replications and Education Project, 9 student teams tried to replicate this finding in pre-registered replications and were not able to find the same effect (despite very thorough quality control and a larger sample, total N = 640).
How did the studies fare in terms of citations?

Original study (since 2019): 17
Replication study (since 2019): 8
Conclusion: correction failed.

Social Value Orientation and attachment
Van Lange et al. (1996; Study 1 N = 573; Study 2 N = 136) reported two findings suggesting that people who are more secure in their attachment give more to other (fictitious) people in a coin distribution game. These original studies suffered from some problems. First, the reliabilities of the measurement instruments ranged between alpha = 0.46 and 0.68. Second, the somewhat more reliable scales (at alpha = 0.66 and 0.68) produced only marginal differences in a sample of 573 participants, and only when controlling for gender and after dropping items from the attachment scale (in addition, there were problems with one measure’s translation to Dutch). In our replication study (N = 768), conducted with better measurement instruments and in the same country, we did not find the same effects.
How did the studies fare in terms of citations?

Original study (since 2019): 110
Replication study (since 2019): 8
Conclusion: correction clearly failed (this one is perhaps a bit more troubling, as the replication covered 2 of the 4 original studies and the ManyLabs2 researchers were also unable to replicate Study 3; again, the original first author was responsible for some (4) of the citations).

[EDIT March 8 2020]: Eiko Fried suggested I should plot the citations by year. If you want to download the data and R code, you can download them here.

A couple of observations:

  • 2020, of course, is not yet complete. I therefore left it out of the graph, as including it could be misleading.
  • When plotting per year, it became apparent that the 2016 citations for Banerjee et al. show the “BBS effect”: the article was cited in a target article and received many ghost citations in Google Scholar from the published commentaries, which did not actually cite the article, so the 2016 citation counts are inaccurate. This does not take away from the overall conclusion.
  • Overall, there seemed to be no decline in citations.
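For readers who want to try a similar per-year breakdown with their own citation data, here is a minimal sketch. It uses Python rather than the R code linked above, and the citation years are hypothetical placeholders, not the real data from this post:

```python
from collections import Counter

# Hypothetical citation years collected from Google Scholar;
# placeholder values only, not the data analyzed in this post.
original_years = [2014, 2014, 2015, 2015, 2015, 2016]
replication_years = [2015, 2016, 2016]

def per_year(years, drop_incomplete=2020):
    """Tally citations per year, dropping the still-incomplete year."""
    return Counter(y for y in years if y != drop_incomplete)

orig = per_year(original_years)
rep = per_year(replication_years)

# Side-by-side counts per year (original vs. replication)
for year in sorted(set(orig) | set(rep)):
    print(year, orig.get(year, 0), rep.get(year, 0))
```

The resulting per-year pairs can then be handed to any plotting library to reproduce a citations-by-year figure like the one Eiko Fried suggested.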

Overall conclusion
Total for original studies (excluding 1 and 3): 338
Total for replication studies (excluding 1 and 3): 46
For the current set of studies, we clearly fail to correct our beloved science. I suspect the same is true for other replication studies. I would love to hear about the experiences of other replication authors, and I think it is time to generate a discussion about how we can change these Questionable Citation Practices.
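The totals above can be double-checked with a quick tally of the per-study counts reported earlier (studies listed in post order; the two excluded cases are study 1, with its ambiguous interpretation, and study 3, the successful self-replication):

```python
# Citation counts from the seven case studies above, in post order:
# infidelity, brightness, warmth, linguistic priming, equator,
# red-romance, SVO/attachment.
originals    = [56, 142, 323, 12, 57, 17, 110]
replications = [23,  24,  26,  3,  3,  8,   8]

# Exclude studies 1 and 3 (zero-based indices 0 and 2).
kept = [i for i in range(len(originals)) if i not in (0, 2)]

total_orig = sum(originals[i] for i in kept)
total_rep = sum(replications[i] for i in kept)
print(total_orig, total_rep)  # matches the 338 vs. 46 reported above
```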

La société devrait exiger davantage des scientifiques : lettre ouverte à la population française

This blog post was originally written to appear in “Le Monde” and so was initially aimed at the French public. However, people from all countries can sign to show their support for the integration of open science into grants and hiring practices. The French version comes first, followed by the English version. If you want to sign the petition, please sign it with the name of the country where you live. If you want to petition your own scientific organizations/governments, we will share the data of our signers per country upon request.

L’étude scientifique des comportements humains fournit des connaissances pertinentes pour  chaque instant de notre vie. Ces connaissances peuvent être utilisées pour résoudre des problèmes sociétaux urgents et complexes, tels que la dépression, les discriminations et le changement climatique. Avant 2011, beaucoup de scientifiques pensaient que le processus de création des connaissances scientifiques était efficace. Nous étions complètement dans l’erreur. Plus important encore, notre domaine a découvert que même des chercheurs honnêtes pouvaient produire des connaissances non fiables. Il est donc temps d’appliquer ces réflexions à nos pratiques afin de changer radicalement la façon dont la science fonctionne.

En 2011, Daryl Bem, un psychologue reconnu, mit en évidence la capacité de l’être humain à voir dans le futur. La plupart des scientifiques s’accorderaient sur le caractère invraisemblable de ce résultat. En utilisant les critères de preuves partagés par de nombreuses disciplines, Bem trouva des preuves très solides en apparence et répliquées sur 9 expériences avec plus de 1000 participants. Des études ultérieures ont démontré de façon convaincante que l’affirmation de Bem était fausse. Les psychologues, en réalisant des réplications d’études originales dans des dizaines de laboratoires internationaux, ont découvert que cela ne se limite pas à ces résultats invraisemblables. Un membre de notre équipe a mené deux de ces projets, dans lesquels des affirmations sont testées sur plus de 15 000 participants. En rassemblant les résultats de trois de ces projets internationaux, seuls 27 des 51 effets préalablement rapportés dans la littérature scientifique ont pu être confirmés (et des problèmes similaires sont maintenant détectés par des projets de réplication en biologie du cancer).

Le point de vue des scientifiques (et pas seulement des psychologues) sur la robustesse des preuves scientifiques a drastiquement changé suite à la publication de Joe Simmons et de ses collègues démontrant comment il est possible d’utiliser les statistiques pour prouver n’importe quelle idée scientifique, aussi absurde soit-elle. Sans vérification de leur travail et avec des méthodes routinières, les chercheurs peuvent trouver des preuves dans des données qui en réalité n’en contiennent pas. Or, ceci devrait être une préoccupation pour tous, puisque les connaissances des sciences comportementales sont importantes à l’échelle sociétale.

Mais quels sont les problèmes ? Premièrement, il est difficile de vérifier l’intégrité des données et du matériel utilisé, car ils ne sont pas partagés librement et ouvertement. Lorsque des chercheurs ont demandé les données de 141 articles publiés dans de grandes revues de psychologie, ils ne les ont reçues que dans 27 % des cas. De plus, les erreurs étaient plus fréquentes dans les articles dont les données n’étaient pas accessibles. Ensuite, la majorité du temps, nous n’avons pas connaissance des échecs scientifiques ni même des hypothèses a priori des chercheurs. Dans la plupart des domaines scientifiques, seuls les succès des chercheurs sont publiés et leurs échecs partent à la poubelle. Imaginez que cela se passe de la même façon avec le sport : si l’Olympique de Marseille ne communiquait que ses victoires et cachait ses défaites, on pourrait penser (à tort) que c’est une excellente équipe. Nous ne tolérons pas cette approche dans le domaine sportif. Pourquoi devrions-nous la tolérer dans le domaine scientifique ?

Depuis la découverte de la fragilité de certains de leurs résultats, les psychologues ont pris les devants pour améliorer les pratiques scientifiques. À titre d’exemple, nous, membres du « Co-Re lab », au LIP/PC2S de l’Université Grenoble Alpes, avons fait de la transparence scientifique un standard. Nous partageons nos données dans les limites fixées par la loi. Afin de minimiser les erreurs statistiques, nous réalisons une révision de nos codes. Enfin, nous faisons des pré-enregistrements ou des Registered Reports, qui permettent de déposer une idée ou d’obtenir une acceptation de publication par les revues avant la collecte des données. Cela assure la publication d’un résultat, même s’il n’est pas considéré comme un « succès ». Ces interventions permettent de réduire drastiquement la probabilité qu’un résultat insensé soit intégré dans la littérature.

Tous les chercheurs ne suivent pas cet exemple. Cela signifie qu’une partie de l’argent des impôts français finance une science dont l’intégrité des preuves qui soutiennent les affirmations ne peut être vérifiée, faute d’être ouvertement partagées. Plus spécifiquement, nous appelons à ce qui suit :

  •  Pour toute proposition de subvention (qu’elle repose sur une recherche exploratoire ou confirmatoire) adressée à tout organisme de financement, exiger un plan de gestion des données.
  • Pour toute proposition de subvention adressée à tout organisme de financement, rendre par défaut accessible ouvertement codes/matériel/données (à moins qu’il n’y ait une raison convaincante pour laquelle cela soit impossible, comme dans le cas de la protection de l’identité des participants)
  • Le gouvernement français devrait réserver des fonds dédiés à des chercheurs pour vérifier l’exactitude et l’intégrité des résultats scientifiques majeurs.
  • Les universités devraient accorder la priorité d’embauche et de promotion aux chercheurs qui rendent leur matériel, données, et codes accessibles ouvertement.

C’est à l’heure actuelle où la France investit dans la science et la recherche qu’il faut agir. Le premier ministre Édouard Philippe a annoncé en 2018 que 57 milliards d’euros seront dédiés à la recherche. Nous sommes certains qu’avec les changements que nous proposons, l’investissement français nous conduira à devenir des leaders mondiaux en sciences sociales. Plus important encore, cela conduira la science française à devenir crédible et surtout, utile socialement. Nous vous appelons à soutenir cette initiative et à devenir signataire pour une science ouverte française. Vous pouvez signer notre pétition ci-dessous. Veuillez signer avec votre nom, votre adresse e-mail et le pays dans lequel vous vivez.


Society should demand more from scientists: Open letter to the (French) public

The science of human behavior can generate knowledge that is relevant to every single moment of our lives. This knowledge can be deployed to address society’s most urgent and difficult problems — up to and including depression, discrimination, and climate change. Before 2011, many of us thought the process we used to create this scientific knowledge was working well. We were dead wrong. Most importantly, our field has discovered that even honest researchers can generate findings that are not reliable. It is therefore time to apply our insights to ourselves to drastically change the way science works. 

In 2011, a famous psychologist, Daryl Bem, used then-standard practices to publish evidence that people can literally see the future. Most scientists would agree that this is an implausible result. Using standards of evidence shared by many sciences at the time, Bem found seemingly solid evidence across 9 experiments and over 1,000 participants. Later studies convincingly demonstrated that Bem’s claim was not true. By replicating original studies across dozens of international labs, psychologists have since discovered that the problem is not restricted to such implausible results. One of us led two of these projects, in which claims were examined in over 15,000 participants. Taking the evidence of three such international projects together, we could confirm only 27 of the 51 effects previously reported in the scientific literature (and similar problems have now been detected through replication projects in cancer biology).

Scientists’ — and not only psychologists’ — view of the solidity of their evidence changed quite dramatically when Joe Simmons and his colleagues demonstrated how, as a researcher, you could use statistics to basically prove any nonsensical idea with scientific data. Unchecked, researchers are able to use fairly routine methods to find evidence in datasets where there is none. This should be a concern to anyone, as insights from behavioral science are important society wide. 

So what are some of the problems? One is the difficulty of even checking a study’s data and materials for integrity, because these data and materials are not openly and freely shared. Many labs regard data as proprietary. When researchers requested the data from 141 papers published in leading psychology journals, they received the data only 27% of the time. What is more, errors were more common in the papers whose data were not shared. We also often don’t know about researchers’ failures, nor about their a priori plans. Within most of the sciences, we learn only about successes: researchers publicize their successes and leave their failures to rot on their hard drives. Imagine if we were to do the same for sports: if Olympique Marseille only told us about the games that they won, hiding away the games that they lost, we would think, erroneously, that OM has a great team. We do not tolerate this approach in sports. Why should we tolerate it in science?

Since discovering that their findings are not always robust, psychologists have led the charge in improving scientific practices. For example, we members of the “Co-Re” lab at LIP/PC2S at Université Grenoble Alpes have made transparency in our science a default. We share data to the degree that it is legally permitted. To limit the occurrence of statistical errors we conduct code review prior to submitting to scientific journals. Finally, we do pre-registrations or registered reports, which is a way to deposit an idea or to obtain a publication acceptance by journals before data collection. This ensures the publication of a result, even when this is not considered a “success”. Because of all these interventions the chance of a nonsensical idea entering the literature becomes decidedly smaller. 

Not all researchers follow this example. This means that a lot of tax money (including French tax money) goes to science where the evidence that supports its claims cannot be checked for integrity because it is not openly shared. We strongly believe in the potential of psychological science to improve society. As such, we believe French tax money should go toward science (including psychological science) that has the highest chance of producing useful knowledge — in other words, science that is open.

Specifically, we call for the following:

  • For all grant proposals (whether they rely on exploratory or confirmatory research) to any funding agency, require a data management plan.
  • For all grant proposals to any funding agency, make open code/materials/data the default (unless there is a compelling reason why this is impossible, such as protecting participants’ identity).
  • The French government should set aside dedicated funding for researchers to check the accuracy and integrity of major scientific findings.
  • Universities should prioritize hiring and promoting researchers who make their materials, data, and code openly available.

The time for change is now, because France is investing in science and research. The French prime minister, Édouard Philippe, announced in 2018 that 57 billion euros would be invested in research and innovation. Importantly, Minister of Higher Education Frédérique Vidal has committed to making science open, so that the knowledge we generate is available to the taxpayer. We believe we can maximize this money’s return on investment for society by ensuring that these open principles also apply to the materials, data, and code that this money generates. With our proposed changes, we are confident that the French investment will lead us to become world leaders in social science. More importantly, it will lead (French) science to become credible and, above all, socially useful. We call on you to support this initiative and to become a signatory for (French) open science. You can do so below.

Written by Patrick Forscher, Alessandro Sparacio, Rick Klein, Nick Brown, Mae Braud, Adrien Wittman, Olivier Dujols, Shanèle Payart, and Hans IJzerman.

Our department/labo will add a standard open science statement to all its job ads!

The Co-Re Lab is part of the Laboratoire Inter-universitaire de Psychologie Personnalité, Cognition, Changement Social (LIP/PC2S) at Université Grenoble Alpes. In France, “laboratoire” or “labo” (laboratory) is used for what researchers in the Anglo-Saxon world would call “department”. During our labo meeting yesterday one of the agenda points was to vote on the following statement:

« Une bonne connaissance et une volonté de mettre en œuvre des pratiques de science ouverte (au sens par exemple de pre- enregistrement, mise à disposition des données…) sont attendues, une adoption de ces pratiques déjà effective (lorsque le type de recherche le permet) sera en outre très appréciée »

This can be roughly translated as: “A good knowledge and the willingness to put in place open science practices (for example, pre-registration or sharing of data) are expected. It will be highly valued if one has already adopted these practices (when the research permits it).” The statement was adopted by an overwhelming majority. We at the Co-Re lab are thrilled that this statement will be communicated to future job candidates.

Many Labs 4: Failure to Replicate Mortality Salience Effect With and Without Original Author Involvement

December 10th, 2019. Richard Klein, Tilburg University; Christine Vitiello, University of Florida; Kate A. Ratliff, University of Florida. This is a repost from the Center for Open Science’s blog.

We present results from Many Labs 4, which was designed to investigate whether contact with original authors and other experts improved replication rates for a complex psychological paradigm. However, the project is largely uninformative on that point as, instead, we were unable to replicate the effect of mortality salience on worldview defense under any conditions.

Recent efforts to replicate findings in psychology have been disappointing. There is a general concern among many in the field that a large number of these null replications occur because the original findings were false positives: the result of misinterpreting random noise in data as a true pattern or effect.

Plot summarizing results from each data collection site. The length of each bar indicates the 95% confidence interval, and thickness is scaled by the number of participants at that lab. Blue shading indicates an In House site; red shading indicates an Author Advised site. Diamonds indicate the aggregated result across that subset of labs. Plots for other exclusion sets are available on the OSF page.
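The aggregated diamonds in plots like this are conventionally computed with inverse-variance weighting, as in a fixed-effect meta-analysis. As a minimal sketch of that computation (the per-lab effect sizes and standard errors below are hypothetical, not the actual Many Labs 4 data):

```python
import math

def aggregate(effects, ses):
    """Fixed-effect meta-analytic pooling via inverse-variance weighting.

    Each lab's estimate is weighted by 1/SE^2, so larger (more precise)
    labs pull the pooled estimate toward their result.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical per-lab effect sizes and standard errors, for illustration only.
effects = [0.10, -0.05, 0.02]
ses = [0.08, 0.10, 0.09]

est, se = aggregate(effects, ses)
ci = (est - 1.96 * se, est + 1.96 * se)  # 95% confidence interval for the diamond
```

If the resulting confidence interval spans zero, the aggregated estimate is consistent with a null effect, which is the pattern the plot shows for both the In House and Author Advised subsets.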

But, failures to replicate are inherently ambiguous and can result from any number of contextual or procedural factors. Aside from the possibility that the original is a false positive, it may instead be the case that some aspect of the original procedure does not generalize to other contexts or populations, or the procedure may have produced an effect at one point in time but those conditions no longer exist. Or, the phenomenon may not be sufficiently understood to predict when it will and will not occur (the so-called “hidden moderators” explanation).

Another explanation — often made informally — is that replicators simply lack the necessary expertise to conduct the replication properly. Maybe they botch the implementation of the study or miss critical theoretical considerations that, if corrected, would have led to successful replication. The current study was designed to test this question of researcher expertise by comparing results generated from a research protocol developed in consultation with the original authors to results generated from research protocols designed by replicators with little or no particular expertise in the specific research area. This study is the fourth in our line of “Many Labs” projects, in which we replicate the same findings across many labs around the world to investigate some aspect of replicability.

To look at the effects of original author involvement on replication, we first had to identify a target finding to replicate. Our goal was a finding that was likely to be generally replicable, but that might have substantial variation in replicability due to procedural details (e.g. a finding with strong support but that is thought to require “tricks of the trade” that non-experts might not know about). Most importantly, we had to find key authors or known experts who were willing to help us develop the materials. These goals often conflicted with one another.

We ultimately settled on Terror Management Theory (TMT) as a focus for our efforts. TMT broadly states that a major problem for humans is that we are aware of the inevitability of our own death; thus, we have built-in psychological mechanisms to shield us from being preoccupied with this thought. In consultation with those experts most associated with TMT, we chose Study 1 of Greenberg et al. (1994) for replication. The key finding was that, compared to a control group, U.S. participants who reflected on their own death were higher in worldview defense; that is, they reported a greater preference for an essay writer adopting a pro-U.S. argument than an essay writer adopting an anti-U.S. argument.

We recruited 21 labs across the U.S. to participate in the project. A randomly assigned half of these labs were told which study to replicate, but were prohibited from seeking expert advice (“In House” labs). The remaining half of the labs all followed a set procedure based on the original article, and incorporating modifications, advice, and informal tips gleaned from extensive back-and-forth with multiple original authors (“Author Advised” labs).* In all, the labs collected data from 2,200+ participants.

The goal was to compare the results from labs designing their own replication, essentially from scratch using the published method section, with the labs benefitting from expert guidance. One might expect that the latter labs would have a greater likelihood of replicating the mortality salience effect, or would yield larger effect sizes. However, contrary to our expectation, we found no differences between the In House and Author Advised labs because neither group successfully replicated the mortality salience effect. Across confirmatory and exploratory analyses we found little to no support for the effect of mortality salience on worldview defense at all.

In many respects, this was the worst possible outcome — if there is no effect then we can’t really test the metascientific questions about researcher expertise that inspired the project in the first place. Instead, this project ends up being a meaningful datapoint for TMT itself. Despite our best efforts, and a high-powered, multi-lab investigation, we were unable to demonstrate an effect of mortality salience on worldview defense in a highly prototypical TMT design. This does not mean that the effect is not real, but it certainly raises doubts about the robustness of the effect. An ironic possibility is that our methods did not successfully capture the exact fine-grained expertise that we were trying to investigate. However, that itself would be an important finding — ideally, a researcher should be able to replicate a paradigm solely based on information provided in the article or other readily available sources. So, the fact that we were unable to do so despite consulting with original authors and enlisting 21 labs, all of which were highly trained in psychological methods, is problematic.

From our perspective, a convincing demonstration of basic mortality salience effects is now necessary to have confidence in this area moving forward. It is indeed possible that mortality salience only influences worldview defense during certain political climates or amid catastrophic events (e.g. national terrorist attacks), or that other factors explain this failed replication. A robust Registered Report-style study, where outcomes are predicted and analyses are specified in advance, would serve as a critical orienting datapoint to allow these questions to be explored.

Ultimately, because we failed to replicate the mortality salience effect, we cannot speak to whether (or the degree to which) original author involvement improves replication attempts.** Replication is a necessary but messy part of the scientific process, and as psychologists continue replication efforts it remains critical to understand the factors that influence replication success. And, it remains critical to question, and empirically test, our intuitions and assumptions about what might matter.

*At various points we refer to “original authors”. We had extensive communication with several authors of the Greenberg et al., 1994 piece, and others who have published TMT studies. However, that does not mean that all original authors endorsed each of these choices, or still agree with them today. We don’t want to put words in anyone’s mouth, and, indeed, at least one original author expressly indicated that they would not run the study given the timing of the data collection — September 2016 to May 2017, the period leading up to and following the election of Donald Trump as President of the United States. We took steps to address that concern, but none of this means the original authors “endorse” the work.

**Interested readers should also keep an eye out for Many Labs 5 which looks at similar issues. The Co-Re lab was involved in Many Labs 5 as well.


Greenberg, J., Pyszczynski, T., Solomon, S., Simon, L., & Breus, M. (1994). Role of consciousness and accessibility of death-related thoughts in mortality salience effects. Journal of Personality and Social Psychology, 67(4), 627-637.

Older Announcements

Here are some of our older announcements.

03/2021 – Hans IJzerman talked about his recent book ‘Heartwarming’ in the online show Pillowtalk.

03/2021 – Hans published a popular science article on social thermoregulation, in ‘Psychology Today’.

02/2021 – Hans & Patrick published an article on the difficulty science has in providing practical advice, in ‘The Inquisitive Mind’.

02/2021 – Hans was interviewed by embrlabs to talk about himself, his work, and embrlabs.

02/2021 – Hans & Olivier published an article on social thermoregulation, in ‘The Inquisitive Mind’.

02/2021 – Patrick Forscher is hosting an SPSP free-form Friday titled ‘Encouraging team-based science in psychology‘.

02/2021 – Hans served as a fact-checker for science news in the Dutch newspaper ‘De Volkskrant’.

02/2021 – IJzerman et al.’s Nature Human Behaviour paper was covered in the Dutch newspaper ‘De Volkskrant’.

02/2021 – Hans IJzerman talked to the ‘Next Big Idea Club‘ about his recent book ‘Heartwarming’.

02/2021 – We find the ‘hostile priming effect’ did not replicate across close (n = 2123) and conceptual (n = 2579) replications.

02/2021 – Hans IJzerman has been interviewed on kpcw radio to talk about his recent book ‘Heartwarming’.

12/2020 – Patrick Forscher talked to the bbc on the efficacy of training in reducing unconscious bias.

11/2020 – Patrick & Alessandro taught a workshop on Meta-analysis for the Kurt Lewin Institute.

11/2020 – Our Lab Philosophy and our Research Templates have been awarded with a SIPS Commendation.

11/2020 – Our recent blogpost on African psychology has been awarded a SIPS commendation.

02/2020 – Society should demand more from scientists: We propose ways to improve the methodological quality of scientific work.

01/2020 – After investigators couldn’t repeat key findings, researchers are trying to establish what’s worth saving.

01/2020 – Patrick Forscher considers that this area of research is not yet ready for application in everyday life.

10/2019 – Patrick Forscher and Hans IJzerman were involved in a major grant proposal (Synergy) to promote the PSA and team science.

9/2019 – We’ve released the 2019 update to our lab philosophy/workflow. Key changes: Sprints, CRediT contributorship, code hygiene.

5/2019 – We’re organizing an EASP conference and Facebook+AdopteUnMec Hackathon in Annecy, France.

4/2019 – We’re giving a workshop on the Open Science Framework, and a deep dive on power analysis.

12/2018 – 135 labs around the world replicate 28 studies and find minimal variation in results depending on the sample.

10/2018 – Developing a scale to measure individual differences in social thermoregulation.

9/2018 – We organized a workshop training scientists on doing reproducible, open science. Videos and materials available on the OSF.

9/2018 – International collaboration investigating the link between social environment and core body temperature.

8/2018 – Past experiences, and a current manipulation of physical temperature, affect thinking about our loved ones.