Olivier Dujols and Hans (Rocha) IJzerman gave a joint talk for the virtual International Conference on Environmental Ergonomics on its first birthday. Olivier talked about his STRAEQ-2 project and Hans talked about his book Heartwarming. A handout for the talk can be found here and the video of the talk can be found below. There are a ton of fascinating talks on thermoregulation on vICEE’s website. Go check them out when you have a chance!
I find the Psychological Science Accelerator one of the more exciting initiatives in psychological science. The PSA can potentially solve many known complicated challenges within our discipline. From complex problems pertaining to replicability, generalizability, strategy selection, inferential reproducibility, and computational reproducibility, PSA’s Big Team Science approach has the potential to tackle them all. And yet, despite the many obvious strengths and its promise, I fear for the PSA’s future if the next few years will not be dedicated to the development of an organization that has long-term relevance and sustainability. Below I outline what I think should be the PSA’s main priorities, but please read my final paragraph.
The PSA should become more globally diverse
It is my firm belief that without better representation from across the globe, the PSA has little of value to contribute. We know that a skewed representation in our field is a problem; the Society for Personality and Social Psychology hands ~80% of its awards to US-based researchers. In the top 5 developmental psychology journals published between 2006 and 2010, less than 3% of the authors came from countries outside North America and Europe. In the PSA’s first published report, most contributing labs were from North America and Europe (e.g., only three African countries participated). Our recent study capacity report (Paris, Forscher, & IJzerman, 2020) showed that North Americans were overrepresented in leadership (12/19), while Western Europeans were overrepresented in terms of membership (41.44%). By building better PSA infrastructure across the globe, the PSA will be better positioned to generate theories that more accurately reflect the human psyche.
The PSA should focus on financial sustainability
Organizations with lofty goals like the PSA’s can run on volunteerism only for so long. In Chris Chartier’s 2020-2022 strategic plan, he estimated that the PSA currently runs on 26,930 hours of volunteer time. Relying on so many volunteer hours, on top of our normal jobs, is unsustainable. What’s more, needing to rely on volunteer hours means that only researchers in more luxurious positions (i.e., those in richer countries, with a lower teaching load, and in situations where they don’t have to worry about their immediate physical safety) can be candidates for leadership roles.
Without financial compensation of key positions, we run the risk of imploding the PSA. We run the risk of burning out our contributors, we run the risk of mismanaging the research process, we run the risk of running into major interpersonal conflicts, and we run the risk of making major mistakes in the research we conduct. Of particular import is that we also cannot attain the diversity goals from the previous section.
In order to meaningfully contribute to societal questions, financial sustainability should be a second key priority. As Patrick Forscher and I have outlined in a recent blog, various strategies to obtain funding exist, each with different balances of risk versus reward (e.g., grant writing, membership fees, support from industry). The upcoming year should be dedicated to finding an optimal balance of the winning strategies. Which strategies we favor should be discussed through surveys and panel discussions with our membership, so as not to estrange those we serve.
The PSA should reflect on its research priorities
If I honestly reflect on the PSA’s abilities to solve problems, I am pessimistic. For a crisis like the one we are in at the moment, do we have the ability to develop an equivalent to a vaccine? My answer is probably no. When I reviewed the past research projects of the PSA, I found us conservative in terms of content (albeit not in methodology), only modestly invested in theoretical development, and quite US-focused. I personally have come to favor quantitative exploratory research methods to help develop formal theories. Other PSA members favor problem-focused research. Yet other PSA members focus on qualitative research. Still other PSA members have suggested that the PSA should have regional priorities, rather than research priorities determined by a single proposer from a single country. How can we ensure that the PSA delivers on its promise to help psychology develop formal theories, as should be the norm within a mature science? I believe we should engage in a period of reflection to determine these priorities, driven again by a discussion between members through surveys and panel discussions.
Such a period of reflection can let us ask important questions about how the PSA should function, such as:
- Should we have various quotas for different types of research (e.g., exploratory, confirmatory, replication) and/or for research that is currently underserved (e.g., qualitative or domains other than social psychology)?
- Should we focus on regional initiatives (as we could have) or have a centralized system (like we have now)?
- Should our research primarily be problem- as opposed to theory-focused?
These are but a few questions we can ask ourselves. I believe asking these questions is necessary to fulfill the PSA’s promise to solve major societal problems and/or theoretical questions, and the answers will ultimately also determine the PSA’s chances of receiving funding.
Why should you vote for me?
I believe you shouldn’t and I endorse Sandy Onie in my stead. When I nominated myself for associate director, I did so with the goal to lift up researchers outside of North America and Europe to become part of PSA leadership. In the days following my nomination, I reflected on what I would do if a capable researcher from outside NA/EU had been nominated for the same position. When I learnt that Sandy decided to run for the position, the decision wasn’t difficult; as I am a European researcher with strong North American ties, the relevance behind my self-nomination disappeared. While his research expertise and his experience outside EU/NA are extremely important for the PSA, I also believe his general skill and his expertise in clinical psychology make him an excellent candidate to further develop the PSA.
I hope that if you had considered voting for me, you will give Sandy your vote.
This blog post was written by Ivan Ropovik and Hans IJzerman. It is cross-posted at PsyArxiv.
In Spike W. S. Lee and Norbert Schwarz’ recently published BBS target article “Grounded procedures: A proximate mechanism for the psychology of cleansing and other physical actions” (2020), the authors outline proximal mechanisms underlying so-called cleansing effects. In this blog post, we present a rejoinder to their rejoinder. We first briefly discuss the portion of the target article we commented on, then the disagreement we voiced in our commentary, then Lee and Schwarz’ rejoinder, and we finish with our rejoinder to their rejoinder. Before anything else, we want to voice our appreciation for Lee and Schwarz’ meta-analytic assessment of the literature and their rejoinder to our voiced criticism. These disagreements are vital to identify stronger versus weaker theories in our science. In this rejoinder to their rejoinder, we make explicit the differences between our appraisal of evidence and theirs and briefly discuss why authors cannot make their set of inferences a “moving target”. All in all, it is clear to us that the empirical foundations for cleansing effects, as Lee and Schwarz present them in their BBS target article, are extremely shaky.
Context of the target article we commented on
In their target article and elsewhere, Lee and Schwarz acknowledge that there are more than 200 experiments on cleansing effects yielding more than 500 effects (Lee et al., 2020). Although Lee and Schwarz acknowledge numerous replications of cleansing effects that failed to find an effect, they argued that several successful replications make it difficult to dismiss cleansing effects out of hand. In rebuffing our critique, Lee and Schwarz made it appear that our selection criteria were unclear or that we cherry-picked evidence (see their RA.3); because of this, we repeat here the effects we included, which were entirely based on their own presentation.
Because they did not give us access to the data underlying their in-progress meta-analysis (see also Explanation 1 below), we identified all publicly available empirical evidence that Lee and Schwarz verbally presented to rebut replicability concerns. As they were unwilling to share the data that formed the basis of the claims in their target article, we were left in the dark as to whether their claims were backed up by robust evidence. As a result, we simply took all the studies that they identified as successful replications. [Footnote 1: Unfortunately, their conceptual definition of “replication” was vague. For example, when they try to address replicability concerns, they write: “For example, regarding Schnall et al. (2008), one paper (Johnson et al., 2014b) reported direct replications using American samples (as opposed to the original British samples) and found effect sizes (Cohen’s ds) of .009 and -.016 (as opposed to the original .606 and .852). Another paper (J. L. Huang, 2014) reported extended replications of Schnall et al.’s Experiment 1 by moving the setting from lab to online and by adding a measure (Experiment 1) or a manipulation (Experiments 2 & 2a) of participants’ response effort” and “Yet another paper reported a conceptual replication of the original Experiment 3 by having participants first complete 183 ratings of their own conscientiousness and nominating others to rate their personality (Fayard et al., 2009). This paper also reported a conceptual replication of the original Experiment 4 by changing the design from one-factor (wipe vs. no wipe) to 2 (wipe vs. no wipe) x 2 (scent vs. no scent) x 2 (rubbing vs. no rubbing). Relevant effect sizes were .112 and .230, as opposed to the original .887 and .777.” Thus, to rebut replicability concerns they included both conceptual and extended replications. We thus followed their lead by including these in the p-curve.]
In one specific instance, this vague conceptual definition meant that our selection led to the inclusion of a different effect than theirs. Specifically, for the following sentences, where they claim a replication: “This finding was replicated with a German sample (Marotta & Bohner, 2013). A conceptual replication with an American sample showed the same pattern and also found that it was moderated by individual differences (De Los Reyes et al., 2012)”, we picked the effect that reported a replication with a US sample, but they included the p-value for a different effect, the moderation. We are unsure why they would prefer the moderation by individual differences over the closer replication.
This body of evidence on the replicability of publicly available cleansing effects, selected by Lee and Schwarz, was therefore the focus of our inference as we also stated in our commentary.
Disagreement voiced in our commentary
In our commentary, we thus examined the empirical evidence behind the replication studies that Lee and Schwarz cite as evidence for their claims. Based on the assessment of the evidential value using the p-curve technique (Simonsohn et al., 2014), as well as a data simulation, we concluded the following: based on the evidence Lee and Schwarz lay out in the target article there is a lack of robust evidence for the replicability of cleansing effects and the pattern of data underlying the successful replications of cleansing effects is improbable and most consistent with selective reporting.
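For readers unfamiliar with the logic of a p-curve, here is a minimal sketch of its simplest variant, the binomial test described by Simonsohn et al. (2014). If there is no true effect, significant p-values are uniformly distributed, so each one is equally likely to fall below or above .025. The function name and the p-values below are purely illustrative assumptions; they are not the data from our commentary, and the full p-curve analysis relies on a continuous test rather than this simplified count.

```python
from math import comb


def pcurve_binomial(p_values, alpha=0.05):
    """Simplified p-curve check: count significant p-values below alpha / 2.

    Returns (n_significant, n_low, p_right_skew, p_left_skew), where the last
    two are one-sided binomial probabilities under a flat p-curve.
    """
    # The p-curve only uses statistically significant results.
    significant = [p for p in p_values if p < alpha]
    n = len(significant)
    n_low = sum(1 for p in significant if p < alpha / 2)
    # Right skew (many small p-values) suggests evidential value.
    p_right = sum(comb(n, k) for k in range(n_low, n + 1)) / 2 ** n
    # Left skew (p-values piling up just under alpha) suggests selective reporting.
    p_left = sum(comb(n, k) for k in range(0, n_low + 1)) / 2 ** n
    return n, n_low, p_right, p_left


# Hypothetical p-values, made up for illustration only.
illustrative = [0.012, 0.031, 0.038, 0.044, 0.047, 0.049, 0.210]
n, n_low, p_right, p_left = pcurve_binomial(illustrative)
print(f"{n_low} of {n} significant p-values fall below .025")
print(f"right-skew p = {p_right:.3f}, left-skew p = {p_left:.3f}")
```

With so few effects, a test like this has very little power in either direction, which is one reason the composition of the p-value set matters so much.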
The p-curve that we generated based on their own focus on rebutting replicability concerns looks like this:
Lee and Schwarz’ Rejoinder
Lee and Schwarz wrote a rejoinder to our commentary as well as to the other commentaries. We invite you to read their well-crafted response in full (rejoinder, supplementary material). They raised several criticisms of our approach, quoted in the list below. [Footnote 2: Lee and Schwarz identified a mistake on our part concerning Camerer et al.’s (2018) failed replication and mentioned it in Footnote 2 in their Response Supplement. We included them as independent, which they should not have been. However, as the results were non-significant, they did not end up in any of our analyses anyway. Our dataset also incorrectly contained a note saying that Arbesfeld et al. (2014) and Besman et al. (2013) did not disclose the use of a one-tailed test. We are sorry about both of those slips and thank Lee and Schwarz for pointing them out. Both of these mistakes leave the p-curve identical.]
- “[Ropovik et al.] draw [their] conclusion [that there is lack of robust evidence for the replicability of cleansing effects] on the basis of a p-curve analysis of a small subset of the entire body of experimental research on the psychological consequences and antecedents of physical cleansing (namely, seven out of several hundred effects)…”
- “…, which included only some of the replication studies and excluded all of the original studies.”
- “The procedures they applied to the selected studies did not follow core steps of best practice recommendations (Simonsohn, Nelson, & Simmons, 2014b, 2015).”
- “[They] included p-values that should be excluded”
- “[They] excluded p-values that should be included.”
As they suggest we made mistakes that ostensibly completely invalidate our conclusion, they conducted a new p-curve analysis, which looks like this:
These two p-curves show completely opposite patterns. While our p-curve shows evidence of selective reporting, theirs shows evidence of evidential value. How can this be?
Rejoinder to Lee and Schwarz’ Rejoinder
Their critique 3 is easily rebuffed, as we simply took what they saw as replications of cleansing effects (and we did provide a disclosure table, see also Explanation 2). Beyond that, we thought it might be helpful to make the assumptions and interpretational consequences of both approaches explicit and to list the changes to the p-curve data needed to get from our p-curve to theirs, which addresses their concerns 1, 2, 4, and 5. We will also articulate why we see our approach as a more adequate way to appraise the merits of a set of published claims – one that is also much more in accord with the substantive inferences drawn in the original studies and the target article by Lee and Schwarz. Here is a summary of the changes that Lee and Schwarz made to our data between their target article and their rejoinder, which led to the more favorable p-curve:
- In their original article, they describe three effects as successful replications: “[this effect was] successfully replicated in two other direct replications (Arbesfeld et al., 2014; Besman et al., 2013)” and “This finding was replicated with a German sample (Marotta & Bohner, 2013)”. Marotta and Bohner (2013) reported a significant effect (at p = .05), which was treated as a successful replication by both the original authors and by Lee and Schwarz in their target article. Yet, for the rejoinder Lee and Schwarz recomputed the p-value and treated it as no longer significant. This is problematic for two reasons. First, they changed the interpretation from target article to rejoinder. Second, based on the available information, it is unclear whether the p-value was truly above or below .05 (as these studies are not published as full papers, there is very little information about the design and analysis). Similarly, Arbesfeld et al. (2014) and Besman et al. (2013) each formulated one-tailed hypotheses themselves, which Lee and Schwarz inappropriately transformed into two-tailed hypotheses.
- Another mistake they made is that they selected a different effect than what was appropriate for the target of inference – evidence on the replicability of cleansing effects. In their p-curve, they selected a three-way interaction instead of what was the replication effect from De Los Reyes et al. (2012). For the three-way interaction (which included an addition of individual differences, which was not part of the original concept of the cleansing effect) a p-value of .021 was reported; for what they cite in their target article as a replication, the replicated two-way interaction had a p-value of .048.
- They included three p-values from a single, publicly unavailable conference poster (it was linked in the reference list, but the link produced an error when visited). As it was not publicly available and Lee and Schwarz were unwilling to share their data, it did not form part of our inference, as we clearly stated in our commentary. Nevertheless, our examination of their dataset shows N = 10 per cell, with all effects yielding significant and small p-values and extraordinarily – and unbelievably – large effect sizes, equivalent to d = 1.55, 1.49, and 1.84. Furthermore, for their rejoinder, they decided to add Experiment 1 to their p-curve, while in the target article they only considered Experiment 2 as a conceptual replication.
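To give a feel for how imprecise effect size estimates are at N = 10 per cell, here is a rough sketch that puts approximate 95% confidence intervals around the three effect sizes mentioned in the last bullet. It assumes a simple two-group comparison with equal cell sizes and uses the common large-sample approximation to the standard error of Cohen’s d; we do not know the exact designs of the poster studies, so the helper name and these assumptions are ours, and the intervals are only indicative.

```python
import math


def approx_ci_for_d(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d from a two-group design.

    Uses the large-sample variance approximation
    var(d) = (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)),
    which is crude for cells this small.
    """
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return se, d - z * se, d + z * se


# Effect sizes quoted above; N = 10 per cell is taken from the text.
for d in (1.55, 1.49, 1.84):
    se, lower, upper = approx_ci_for_d(d, 10, 10)
    print(f"d = {d:.2f}: SE ~ {se:.2f}, 95% CI ~ [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions each interval spans roughly two full standard deviation units, so the point estimates alone say little about the size of any true effect.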
For the full list of L&S’ changes to our p-curve set, see Table 1.
Table 1. L&S’ changes to our p-curve data
Note. Grey color = effects presented by Lee and Schwarz’ target article as successfully replicated and used in our p-curve analysis. Orange color = changes to our p-curve set by L&S. Green color = effects added by Lee and Schwarz. P-values in bold represent the effect set used by L&S. Italicized p-values were common to both analyses.
But let’s try to accept their transformations from target article to rejoinder. If, per Lee and Schwarz, there are two published articles and one conference poster, yielding a mere 7 effects evidencing a successful replication with a median N = 12 per cell [Footnote 3: The median N for the non-significant replication effects happens to be 12 times higher, N = 144.], we honestly don’t see why L&S denote our argument (“lack of robust evidence for the replicability of cleansing effects”) as being “strong” or even controversial. In fact, even if their p-curve demonstrates an effect, such extremely modest sample sizes with incredibly large effect sizes in only a few studies that they describe in a target article should prompt anyone to investigate more carefully and question the efficacy of the p-curve under such conditions. The meta-analytic techniques we suggested in our commentary allow them to do just that. [Footnote 4: In our commentary, we critiqued the analytical methods they describe in their target article, as “both their bias tackling workhorses, fail-safe N and trim-and-fill, are known to rest on untenable assumptions and are long considered outdated”. We reasoned that the authors should instead apply state-of-the-art correction methods like regression-based models (Stanley & Doucouliagos, 2014) and especially multiple-parameter selection models (e.g., McShane et al., 2016) by default to examine their claims.] Such methods can help detect extremely shaky evidence, such as the case of a mere 7 effects with a median N = 12 per cell.
All in all, we have clearly shown here again why there is a lack of evidence for the replicability of cleansing effects based on the evidence Lee and Schwarz present. We again show why the successful replications are in fact consistent with the non-successful replications. The answer to our challenge of the data is not to apply an analysis approach that changes the inference criteria they themselves set forth. Instead, close, pre-registered replications will provide a better answer.
Finally, we wanted to comment on these constantly changing inference criteria. Lee and Schwarz likely did not consider the discrepancies between their article and their re-computation, as well as the addition of a study, important enough to warrant a mention in their response. So, the interpretation from the target article that these effects are evidence in favor of the replicability of cleansing effects remains unchallenged and may continue to misguide readers. What this process illustrates is one of the symptoms of an all-too-common problem – a loose derivation chain from theoretical premises, to statistical instantiation of these premises, to substantive inferences. In such instances, the same evidence can be used as a rhetorical device to support exactly the opposite stances. Such moving targets create weak theories, and rebuff solid critiques of one’s work. In the end, it all comes down to what one considers adequate empirical evidence for a scientific claim.
As an additional response to their first critique (“[Ropovik et al.] draw [their] conclusion [that there is lack of robust evidence for the replicability of cleansing effects] on the basis of a p-curve analysis of a small subset of the entire body of experimental research on the psychological consequences and antecedents of physical cleansing (namely, seven out of several hundred effects)”), we wanted to complement our critique by discussing the history of our conversation with Lee and Schwarz.
After reading their target article and before writing our commentary, we asked Lee and Schwarz to share the data underlying their recent meta-analysis, whose conclusions they incorporated in the target article. We strongly believed that their evidence, based on what they described, was not as strong as they claimed it to be. As the meta-analysis was one of the core components of their target article, we deemed independent verification to be of crucial importance. They declined our invitation for independent verification as “the meta-analytic review was still being written up and any quantitative presentation of its results would prevent them from submitting the manuscript to Psychological Bulletin”.
We accepted their refusal to share the data. As we believed their bias-tackling workhorses to rest on untenable assumptions and to be outdated [Footnote 5: In their rejoinder to our commentary, they indicated that our observation that trim-and-fill and fail-safe N are long considered outdated reflects our sentiments more than the standards of the field, because recent meta-analyses published in Psychological Bulletin still employ these methods. We thought this was a pretty funny way of arguing a point. Perhaps the authors missed this part of our commentary, but Becker (2005), Ferguson and Heene (2012), and Stanley and Doucouliagos (2014) have clearly shown these methods to be outdated and we prefer to rely on science over engaging in argumentum ad populum. It is however true, as they state, that psychology sometimes uses outdated methods. For example, while McDonald’s Omega should be used instead of Cronbach’s Alpha in most instances (Dunn et al., 2014; Revelle & Zinbarg, 2009; Sijtsma, 2009), some researchers stubbornly resist updating their methodology (e.g., Hauser & Schwarz, 2020). Or consider the fact that it has been known for years that sufficiently powering one’s studies is necessary to reduce the chance of obtaining a false positive. Researchers still stubbornly persist in underpowering their research, even years after the Bem (2011) and Simmons et al. (2011) articles (e.g., Lee & Schwarz, 2014).], we chose instead to assess the evidential value of the replication evidence as Lee and Schwarz present it. There are very few replications of cleansing effects (with only a minority showing success). Being the leading experts in their field, Lee and Schwarz either reported all the successful ones or chose to present a subset that we reasonably thought would be the best ones.
Maybe there are other replication studies with feeble evidence or problematic designs. Maybe there are far more failed replications. We don’t know, as we did not receive the data from the authors and we simply analyzed their “qualitative insights” (Communication with original authors, 2020). On the one hand, that makes it a non-standard way of synthesizing the evidence. But on the other hand, we regard it as a steel-man way to appraise only the merits of the evidence behind the studies that Lee and Schwarz themselves hand-picked as prominent examples of the literature to support the vital auxiliary assumption of their theory – replicability. [Footnote 6: That said, we fully agree that we drew the conclusion about the evidence for the replicability of cleansing effects based on a small (rather tiny) subset of the relevant literature. We regard it as self-evident that if the target of inference is evidence of replicability, original studies are to be excluded. Why didn’t we search for all conducted replications? Because the target of our inference was replication evidence that Lee and Schwarz presented as such and because a sizeable proportion of studies were not part of the public record. Apparently there are only a handful of studies that set out to replicate an experiment on cleansing effects, and the only ones that seemed successful were severely underpowered.]
Lee and Schwarz claimed that we didn’t follow best practice because (1) we did not put together a p-curve disclosure table and (2) we did not re-compute the p-values that represent the input for the p-curve. The first claim is simply false. Still, this point distracts from the main point of disagreement. The point of a disclosure table is to identify the target effect in a study to ensure that the synthesized effect was the focal effect of the study. In this case, Lee and Schwarz, not us, were the ones who identified the focal effects in their target article. We just followed their lead. For every individual effect, our table clearly identifies the paper and study it comes from, quotes the text string where the effect is reported, and lists the effect size, the reported p-value, the N for the given test, and the authors’ inference whether the effect was found or not. We also coded numerous other data about the measurement properties of the dependent measure.
The second objection is a more interesting disagreement. Of course, we understand the importance of re-computing effect sizes or test statistics for any other ordinary evidence synthesis, as we have done elsewhere. Refraining from re-computing the focal results of the studies listed by Lee and Schwarz, and taking the reported evidence at face value was, however, a conscious choice. There were two reasons for that. As we explicitly stated, our goal in this very specific analysis was to appraise the merits of a finite set of empirical evidence, as used by Lee and Schwarz to support their proposed theory. There was no goal to infer beyond that finite set or to estimate some true underlying effect size. In such a case, it makes most sense to take the relevant evidence as it stands.
First and foremost, the biasing selection process is not guided by re-computed p-values. Second, few practitioners or members of the public re-compute p-values when they read the conclusions of a study and adjust their reading accordingly. Nor do many colleagues when deciding which hypotheses to pursue next or when creating theories (just like the one concerning “Grounded Procedures”). In their target article, Lee and Schwarz seemed to be no exception (but now reading their rejoinder, we sometimes wonder whether the target article and the rejoinder were written by a different set of persons).
What’s more, a sizable proportion of the significant effects that both Lee and Schwarz (in the target article) and the replication authors presented as successful replications turn non-significant after their re-computation. Lee and Schwarz likely did not consider the discrepancy between their article and their re-computation important enough to warrant a mention in their response. So, the interpretation from the target article that these effects are evidence in favor of the replicability of cleansing effects remains unchallenged and may continue to misguide readers. What this process illustrates is one of the symptoms of an all-too-common problem – a loose derivation chain from theoretical premises, to statistical instantiation of these premises, to substantive inferences. In such instances, the same evidence can be used as a rhetorical device to support exactly the opposite stances.
Further, any evidence synthesis requires that there is at least basic information regarding the study design and analytical approach. In this case, half of the successfully replicated effects came from unpublished studies with no full-fledged empirical paper available. Re-computation of p-values would require taking a leap of faith because critical pieces of information were frequently missing. For instance, the replication authors may not have assumed equal group variances in a t-test (as the re-computation assumes) and, instead of reporting the df for Welch’s test (not a whole number), they may have just reported N – 2 as df. The analytic sample size may not have been equal to, e.g., df + 2 in a two-sample t-test. The replication authors might have excluded some participants for legitimate reasons.
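To make the point about degrees of freedom concrete, here is a small sketch of the Welch-Satterthwaite approximation next to the N – 2 that a Student’s t-test would report. The group sizes and standard deviations are invented for illustration (12 per cell, one group twice as variable as the other) and do not come from any of the replication studies; the function name is ours.

```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite degrees of freedom for a two-sample t-test."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Hypothetical groups: equal cell sizes, unequal variances.
n1 = n2 = 12
df_student = n1 + n2 - 2                # what "N - 2" would report
df_welch = welch_df(1.0, n1, 2.0, n2)   # not a whole number
print(f"Student df = {df_student}, Welch df = {df_welch:.2f}")
```

If a paper ran Welch’s test but printed N – 2, a re-computation from the printed df would use the wrong reference distribution, and the resulting p-value could land on the other side of .05.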
In their re-computation, Lee and Schwarz also assumed that rounding of the test statistics had no effect on the p-value. They further presume that a two-tailed value was always the proper statistical translation of the substantive hypothesis. In some instances, it was not clear what exact statistical model the replication authors used and whether it was parametric at all. Lastly, as is obvious from the p-curve, our conclusion was not contingent upon the decision not to re-compute the p-values. Namely, we would have arrived at the exact same conclusion even if we had re-computed the p-values – lack of robust evidence for the replicability of cleansing effects.
We think that it is only fair that we gave the authors of the replication studies (and Lee and Schwarz) the benefit of the doubt and took the reported results of inferential tests at face value. So, if an exact p-value was available, in agreement with the authors’ inferences at the given alpha level, and in agreement with the substantive inference made by Lee and Schwarz (as leading experts in the field) in their target article, we took it at face value. Just like the integrity of that replication evidence itself.
All of the above culminates in our final explanation – the critique that we included p-values that should be excluded, and excluded p-values that should be included. Despite the apparent eloquence of Lee and Schwarz’ critique, we thought it might be helpful for a reader to see a transparent and more detailed presentation of changes to the set of p-values by L&S.
- Arbesfeld et al. (2014) and Besman et al. (2013) both tested a directional hypothesis for which they found support (p = .030 and .039). They claim that the effect replicated. So do Lee and Schwarz. However, by re-computing the p-value, Lee and Schwarz effectively ignored the fact that the replication authors regarded a one-tailed test as a proper instantiation of the substantive hypothesis. As the biasing selection process functions at a different alpha level for directional hypotheses, the application of the selection model should not force an irrelevant publication threshold. In this case, by forcing a two-tailed test, these effects dropped out of the p-curve set, as this method only includes significant effects. Of course, there are sometimes issues with the use of one-tailed tests in general and in p-curve in particular [Footnote 7: These include the general bias towards evidential value and different density in the upper part of the p-value distribution under the alternative hypothesis which is, however, irrelevant in this context.], and one can discuss how to deal with them. But more importantly, Lee and Schwarz did not find it sufficiently noteworthy to notify the reader about the disconnect between what is claimed in the target article (“successfully replicated”) and the implication of their reanalysis (“these two effects ceased to be successful replications”).
- Marotta and Bohner (2013) is not a part of the public record. The result is publicly reported only in several of the lead author’s (Spike Lee) papers. In Lee and Schwarz’s (2018) NHB paper, they report this effect as being associated with p = .054. In the present table, the re-computed p-value equaled .0575. However, in their 2018 mini meta-analysis as well as in some other papers (Dong & Lee, 2017; Schwarz & Lee, 2018), it was explicitly stated that the result replicated the original finding. Because the status of the result was unclear, and because .054 and, e.g., .04999999 are statistically the same effect (Gelman & Stern, 2006), we once again consistently applied the benefit-of-the-doubt principle and regarded it as a significant effect. Namely, it is the substantive inference that matters far more in practice than minuscule differences at the 3rd decimal place. Regardless of whether the reader sees this decision as substantiated or not, it is unfortunate that Lee and Schwarz claim successful replication when it suits them.
- For the De los Reyes study (2012), Lee and Schwarz synthesized the wrong effect (F(1, 44) = 5.77, p = .021; page 5). The results of the replication study are in fact reported on p. 4, in the section “Replicating Lee and Schwarz’s (2010) Clean Slate Effects”, and the (attenuation) interaction effect reported there (F(1,46) = 4.14) should have been selected. The former was the moderation by an individual difference variable, the latter the ostensible replication. This focal replication effect is, however, associated with a much higher p-value of .048.
- The ultimate game-changer was, however, the inclusion of publicly unavailable data from another conference poster (Moscatiello & Nagel, 2014). Again, the target of our inference was publicly available information and we thus did not include this. Nevertheless, let’s look at these experiments in more detail. First, in their target article, they only considered Experiment 2 as a conceptual replication. Thus, Experiment 1 should not have been included. Nevertheless, they included both Experiments 1 and 2 (where it is not even clear whether the samples were independent), which yielded 4 p-values (.0061, .3613, .0036, and .0006).
Given that all of these p-values were based on an N = 10 per cell design, the effect sizes had to be very large for the three significant effects, with an equivalent of d equal to 1.55, 1.49, and 1.84 (we assume a between-subjects design). As an additional note, the latter two are the main effects for a focal reversal 2×2 interaction with an effect size that is so large as to be incredible, d = 1.66 (ηp² = .434). We leave it up to the reader to judge the merits of such a study and the probability of observing 3 such uncommonly large effect sizes using N = 10 per cell in this research domain.
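As a rough check on the interaction effect size quoted above, partial eta squared can be converted to an F statistic and then to Cohen’s d. This is a minimal sketch, not necessarily the computation used for the numbers above: it assumes a 2×2 between-subjects design with 10 participants per cell (so N = 40 and 36 error degrees of freedom) and the common approximation d = 2t/√N. The function names are illustrative.

```python
import math

def f_from_partial_eta_squared(eta_p2, df_error):
    """F for a 1-df effect implied by partial eta squared."""
    return eta_p2 / (1.0 - eta_p2) * df_error

def d_from_f(f_value, n_total):
    """Cohen's d from a 1-df F, using the approximation d = 2t / sqrt(N)."""
    return 2.0 * math.sqrt(f_value) / math.sqrt(n_total)

# Assumed design (not stated in full in the text): 2x2 between-subjects,
# 10 participants per cell.
n_total = 4 * 10
df_error = n_total - 4

f_value = f_from_partial_eta_squared(0.434, df_error)
d = d_from_f(f_value, n_total)
print(f"F(1, {df_error}) = {f_value:.2f}, d = {d:.2f}")
```

Under these assumptions the sketch gives d = 1.66, the value quoted above for the interaction.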
Before publishing our blog post, we gave Lee and Schwarz 1.5 weeks to address our concerns. After posting, they published a response on PsyArXiv (available here and in the comments below). We think that at this point, the reader has sufficient information to judge the replicability of cleansing effects and we will write no further reply. We only note that Lee and Schwarz have now concluded twice that they don’t consider Arbesfeld et al. (2014), Besman et al. (2013), and Marotta and Bohner (2013) as significant, while they considered them successful replications in their BBS target article. We think that at the very least this warrants a correction of their BBS article, as Lee and Schwarz no longer consider them successful replications.
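The p-value disagreements in the list above can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes that the p-values reported by Arbesfeld et al. and Besman et al. are one-tailed and come from a symmetric test statistic, so that the two-tailed value is simply double. It also recomputes the two De los Reyes p-values from their F statistics by numerically integrating the t density, since an F with one numerator degree of freedom equals a squared t. The function names are hypothetical, and only the Python standard library is used.

```python
import math

ALPHA = 0.05

def two_tailed_from_one_tailed(p_one_tailed):
    """Two-tailed p for a symmetric statistic with the effect in the predicted direction."""
    return min(1.0, 2.0 * p_one_tailed)

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def p_from_f(f_value, df_error, steps=2000):
    """p-value for F(1, df_error), via Simpson's rule on the t density over [0, sqrt(F)]."""
    t = math.sqrt(f_value)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df_error) + t_density(t, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df_error)
    return 1.0 - 2.0 * (total * h / 3.0)

for label, p_one in [("Arbesfeld et al. (2014)", 0.030), ("Besman et al. (2013)", 0.039)]:
    p_two = two_tailed_from_one_tailed(p_one)
    print(f"{label}: one-tailed {p_one:.3f} -> two-tailed {p_two:.3f}, "
          f"significant at {ALPHA}: {p_two < ALPHA}")

for label, f_value, df_error in [("moderation effect", 5.77, 44),
                                 ("focal interaction", 4.14, 46)]:
    print(f"De los Reyes (2012), {label}: F(1, {df_error}) = {f_value}, "
          f"p = {p_from_f(f_value, df_error):.3f}")
```

Doubling pushes both one-tailed values above .05, which is why the two studies fall out of a p-curve set that only includes significant effects.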
This is a cross post from the Inquisitive Mind blog
One of the things we will possibly miss the most this pandemic winter in the Northern Hemisphere is gezelligheid [ɣəˈzɛləxɛit]. No real English equivalent of gezelligheid exists; the closest word in the English vernacular – coziness – still doesn’t capture the same feeling of intimacy and belonging. Concepts that communicate a similar sentiment and are more familiar to US ears are the Danish hygge [hʊɡə] and the Swedish lagom. Wikipedia describes gezelligheid as “‘conviviality’, ‘coziness’, ‘fun’” or “just the general togetherness that gives people a warm feeling”. Perhaps the best description of gezelligheid is the sense of belonging when sitting around a warm fire during a cold Christmas. But gezelligheid undeniably extends beyond the nuclear family; when you are outside in the shops where strangers gather and you experience that conviviality so typical of the current time of the year, it is also “gezellig”. None of this will be something we will encounter much this winter.
Gezelligheid, as it may be, is one way to socially cope with a cold and dark winter. Regulation of temperature is a driving force behind feeling close, being intimate, and feeling loved. Across the animal kingdom, the regulation of temperature is crucial to survival. Not being able to regulate one’s temperature leads to certain death. When temperatures drop, animals (including humans) use more energy to warm themselves up. The cost of temperature regulation is often countered by distributing it across kin, a phenomenon called social thermoregulation. Take penguins, for instance. When dealing with the harsh winters of Antarctica, they get together and “huddle” in a circle. Even if the ambient temperature is -40 °F, the temperature at the center of the huddle can rise to a whopping 99.5 °F.
But penguins are not humans and humans are not penguins. In modern times, humans deal with fluctuating temperatures in many more sophisticated ways than penguins (e.g., by relying on central heaters or warm clothes). But this has not been true for most of human history, so what we call social thermoregulation has definitely left its mark on human culture. Different languages, for instance, have a different number of temperature terms. Some languages distinguish between cold, lukewarm, and warm; others only have two temperature terms, like cold and warm. And languages differ in whether they use metaphors that combine warmth and affection. In Dutch, for example, you can say of someone that you are not particularly fond of “Zij laat me koud” (literally “she leaves me cold”), while in English you can say that you have a fond and warm memory of someone. Out of 84 languages the linguist Masha Koptjevskaja-Tamm sampled in her research, 52 use metaphors combining warmth and affection – the others do not.
The imprint of social thermoregulation is not limited to language. A team of researchers, for example, found that people who live in more “clement” climates (climates with an average temperature close to 71.6 °F) score higher on personality factors related to socialization and stability as well as on personality traits related to personal growth and plasticity. In one set of our own studies, we observed to what extent people would like to call or email a loved one and related this to outside temperatures. We repeatedly find that if temperatures are lower, people have a higher desire to call or email other people. In other studies, when we experimentally manipulated temperature to be lower or higher, people in the lower temperature conditions tended to think more about people they feel close to. Perhaps the effect that best represents the Dutch concept of gezelligheid is one we detected in a sample of over 1500 people in 12 countries: one of the best predictors of people’s core body temperature was the diversity of their social network, meaning that the more types of social groups they participated in (e.g., a volunteer group, a group at work, a sports team, a family group), the better protected their core body temperatures were if they lived in a colder climate. Gezelligheid, after all, is not just sitting around the warm fire during Christmas, it is also being in a shopping mall and sharing stories with your sports team or your local faith group.
It won’t come as a surprise that as psychologists, we worry. We worry about the lack of gezelligheid this pandemic winter. And we worry that people miss out on gezelligheid because we suspect that social thermoregulation is mostly achieved successfully when being physically proximate, which is why Zoom calls likely won’t cut it. And yet, humans have been flexible in adapting to nearly every physical environment in history, which is why there must be ways we can cope, even if only temporarily. A startup out of MIT has developed a bracelet that allows users to send each other warmth; years back, Sony patented a controller for its PlayStation that provides temperature feedback; and Japanese engineers are developing a robot that transmits warmth while hand-holding. We had imagined developing a “relationship simulator” to reduce the lack of intimacy intimate partners may have had while apart during the pandemic.
Yet the research we are engaged in cannot (nor should) rival the research currently being conducted on vaccines, where BioNTech received a EUR 375 million grant from the German government to develop a vaccine against Covid-19 and so help solve one of the worst crises in recent human history. Compared to that, our research is on a shoestring budget. That means that the work we have done does not achieve what researchers in other disciplines would call a high Technology Readiness Level, or, better, we don’t have the kind of confidence in our work that one can have in a Phase III trial for the Moderna or Pfizer/BioNTech vaccines. Providing concrete advice on how to resolve the lack of gezelligheid is not psychology’s forte, but insight into the mechanisms behind social thermoregulation will leave you better placed to make educated guesses about how to deal with the lack of gezelligheid.
Hans Rocha IJzerman is the author of Heartwarming: How Our Inner Thermostat Made Us Human.
This is a cross post from the Inquisitive Mind blog
During this pandemic winter, many of us will be away from the people we love most. The absence of the physical presence of loved ones deprives us of hugs, physical touch, and feelings of physical and psychological warmth that no amount of Skype or Zoom ever seems to fully replace.
In his forthcoming book, Heartwarming, one of the authors of this editorial (Rocha IJzerman) explores the science of why this is, a field called social thermoregulation. This science points to promising technologies like the Embr Wave, a smartphone-controlled warmth-producing bracelet, that seem to provide ways to compensate for the physical and psychological warmth that Skype and Zoom lack. Yet the book stops short of recommending these technologies, or indeed offering any concrete advice at all. Why is this?
One reason stems from the sordid tale of Brian Wansink. A former marketing researcher at Cornell, Wansink excelled at conducting clever-sounding studies on the psychology of eating and packaging them into bite-size, actionable pieces of advice tailor-made for public consumption. He would then pitch this advice in outlets ranging from O Magazine to the Today Show. His work even served as the basis for federal policy in the form of the Smarter Lunchrooms Movement. Yet, proper scrutiny revealed a shocking level of sloppiness in Wansink’s research. Although Wansink had spun his research into advice for millions of people and a federal program funded to the tune of $22 million, all these products were built on foundations of sand.
Nor is Wansink’s work the only example. Take the example of research on implicit bias. Defined as a set of fast, relatively automatic mental pairings between social groups and other concepts, the concept of implicit bias was popularized largely due to the efforts of Anthony Greenwald and Mahzarin Banaji. These two researchers developed a test, the Implicit Association Test, and demonstrated it in a press conference in 1998 in which they claimed to have data demonstrating that 90 to 95 percent of people harbor “the unconscious roots of prejudice.” The test became an immediate sensation, receiving favorable coverage by the wildly popular science journalist Malcolm Gladwell, NPR correspondent Shankar Vedantam, and, more recently, even then-presidential candidate Hillary Clinton in the rarified halls of a US presidential debate.
Yet a closer look at the evidence on implicit bias reveals a field that is woefully unready for application. The Implicit Association Test is plagued with measurement problems (as are, for that matter, other alternative measures of implicit bias). For example, too often the same person who takes the test multiple times gets a different result, probably because the test is contaminated by large amounts of measurement error. The relationship between people’s score on the IAT and their actual behavior is also quite weak. Finally, a closer look at studies that ought to be most relevant to policy — those that try to change implicit bias — reveals that this field relies very heavily on student samples, rarely assesses whether the changes persist, and contains little to no evidence that changes in implicit bias lead to changes in behavior.
In Heartwarming, Rocha IJzerman wanted to avoid the mistakes of Brian Wansink, Tony Greenwald, and Mahzarin Banaji. We believe psychology can learn from other research fields, like rocket science and drug development, that excel at translating research findings into safe and effective applications. Rocket science uses a framework called the Technology Readiness Levels to evaluate the state of the evidence behind an application and to guide the research program. Drug development uses a series of phased trials to evaluate both the effectiveness and the safety of a novel drug. As we have seen from the accomplishments of these fields — from putting people on the moon to creating a safe and effective Covid-19 vaccine in record-breaking time — this systematic process works.
Heartwarming judges the science of social thermoregulation using standards closer to those of rocket science and drug development than to those of Brian Wansink. According to those standards, social thermoregulation, though promising, is not yet ready to serve as the basis of advice and other applications. Our unpublished quantitative review suggests the available evidence on social thermoregulation can support the broad conclusion of Heartwarming: there are intriguing links between the regulation of temperature and social relationships that promise to provide unique insights into the roots of human sociality. Yet social thermoregulation research still largely relies on samples of participants that are very peculiar: college students from Europe and the United States. We also know little about the effectiveness of different dosages of warmth for changing psychology. Nor, as would be routine in drug development, have the technologies based on social thermoregulation research been evaluated in household settings for things like unintended side effects.
Thus, while Heartwarming does present science that is promising enough that it could in the future serve as the basis for advice and application, it does so with caveats. The book therefore notes where social thermoregulation research relies too heavily on unusual and non-representative samples, and it also notes where its findings have not been subjected to the type of big-team, large-scale testing that you might get in drug development.
Psychology may yet initiate the reforms necessary to make it robust enough for application. Psychology researchers (ourselves included) have developed our own frameworks to help evaluate just when psychology evidence is ready for application. Moreover, psychology researchers have created organizations and proposed broad-based reforms to make big-team-style research more commonplace.
In the meantime, beware psychologists bearing overconfident advice. You are bound to be disappointed.
Hans Rocha IJzerman is the author of Heartwarming: How Our Inner Thermostat Made Us Human.
On March 11, 2021, Patrick Forscher gave an online talk at the Applied Face Cognition lab, led by Meike Ramon, and for the Swiss Reproducibility Network. Below are the video of his talk and then the abstract.
Progress in psychology has been frustrated by challenges concerning replicability, generalizability, strategy selection, inferential reproducibility, and computational reproducibility. Although often discussed separately, I argue that these five challenges share a common cause: insufficient investment of resources into the typical psychology study. I further suggest that big team science can help address these challenges by allowing researchers to pool their resources to efficiently and drastically increase the amount of resources available for a single study. However, the current incentives, infrastructure, and institutions in academic science have all developed under the assumption that science is conducted by solo Principal Investigators and their dependent trainees. These barriers must be overcome if big team science is to be sustainable. Big team science likely also carries unique risks, such as the potential for big team science institutions to monopolize power, become overly conservative, make mistakes at a grand scale, or fail entirely due to mismanagement and a lack of financial sustainability. I illustrate the promise, barriers, and risks of big team science with the experiences of the Psychological Science Accelerator, a global research network of over 1400 members from 70+ countries.
Patrick S. Forscher
“There’s nothing as practical as a good theory.”
This quote is often attributed to the person hailed as the “father of social psychology”, Kurt Lewin. In social psychology textbooks, the quote is used to justify a mode of social psychology whereby theories that are developed and tested in the lab “naturally” and “inherently” lead to useful social applications (Billig, 2015).
Here I will argue that theory is not always helpful for solving practical social problems. This is because fixating overmuch on theory can lead to what I call theory blindness, a singular focus on the aspects of the problem that are relevant to the theory at the expense of other aspects that are just as important when the problem is considered in its entirety. I review a case study that illustrates the dangers of theory blindness and close with an argument for a focus on the practical problem as it exists in its original context rather than a focus on any particular theory.
Theory application as a model for practical impact
The dominant model of how social psychology comes to have a useful impact on society, at least as portrayed in social psychology textbooks (Billig, 2015), is what I will call the theory application model. In this model, theory development may be somewhat informed by a particular social problem, but theory testing happens largely in tightly controlled environments, such as the laboratory or an online survey platform. The process proceeds something like this:
- The social psychologist develops a theory. The theory can come from many sources: introspection, personal experience, the literature, or perhaps as a response to current events. Although this is rare in social psychology, the theory may even follow a process of formal theory development. These theories may be somewhat informed by a social problem, but not necessarily the problem in its full, original context.
- The social psychologist tests hypotheses derived from the theory in tightly controlled environments. The theory is then used to derive hypotheses that, in social psychology at least, are usually tested using experimental methods, most often in the laboratory or a tightly controlled online platform like Qualtrics. The results of these experiments are used to support the theory, refine it, and add boundary conditions. The usual justification for the high degrees of control in this step is that this control is necessary to isolate and manipulate the psychological processes that are relevant to the theory (Billig, 2015).
- The social psychologist applies the theory to a practical problem as a way to test its generalizability. Once theory-derived hypotheses have been tested in tightly controlled environments, the social psychologist may choose to conduct a study (for example, a field experiment) specifically designed to address some practical social problem. Usually, this exercise is framed as an application of theoretical principles already established in the previous two steps; the implication is that the study cannot directly refute the theory but merely demonstrate how its principles generalize to new, somewhat uncontrolled and messy settings.
Theory highlights some things and de-emphasizes others
The theory application model of practical impact illustrates how theory might, at least in principle, help solve social problems. Psychological theories provide working models of how psychological processes produce behavior. Insofar as individual behaviors “add up” to produce a social problem, psychological theories can highlight the psychological processes that produce those individual behaviors. Theories therefore also provide ready explanations for social problems and, due to their usefulness in identifying causal mechanisms behind individual behaviors, a guidebook for intervention via disrupting those mechanisms.
For example, the appraisal theory of emotion argues that emotions are produced by a person’s interpretations of events – in other words, their appraisals. This theory highlights these appraisals as a cause of emotion. The implication is that, to solve problems related to widespread negative emotions, such as the emotional fallout of a natural disaster, one should look to identify and change people’s interpretations of the natural disaster. This can lead scientists to develop and deploy specialized measures of people’s appraisals of the natural disaster and apply interventions that target those appraisals. Under the spotlight of appraisal theory, aspects of a social problem related to appraisals become valid observations while other aspects of the problem become de-emphasized.
Herein lies the danger of the theory application model of practical impact: if the theory does not fully capture the problem’s original context, it can leave important aspects of the problem outside its theoretical spotlight. For example, the lens of appraisal theory might cause well-meaning scientists who wish to provide mental health assistance in the wake of a natural disaster to neglect the real mental burdens caused by the economic and social fallout of the disaster. Focusing too much on people’s appraisals of a natural disaster as a cause of poor mental health is a demeaning way to help people who have just lost their homes and loved ones. In addition to leading to misguided forms of help, theory blindness can have real material costs if lives depend on the help’s effectiveness. In the most extreme situations, these costs can include lost lives and livelihoods.
The curious blindness caused by theory is part of a phenomenon that philosophers of science call the theory-dependence of observation. Theories define what counts as a valid observation. Theories can lead people to develop entire instruments dedicated to the measurement of the processes posited by theory (see Greenwald, 2012, for examples). In the context of a social problem, everything outside the measurement paradigm becomes “noise” that is outside the theory’s scope. When the theory-derived measures are deployed to solve a social problem, these “extraneous” factors can therefore be interpreted as irrelevant to the target problem.
I am not the first to note the peculiar dangers of theory for application. Daniel Kahneman called this phenomenon “theory-induced blindness” and linked it to the mental heuristics and biases that dominated his decision-making research. Curiously, Kurt Lewin himself may not have endorsed theory as a route to application (see Lewin, 1931) – at least in the way social psychologists currently go about it. (Even curiouser, Lewin was also not the originator of the quote that leads this blog.)
Despite this history of similar critique, I believe the dangers of theory for application are not widely recognized. I therefore illustrate these dangers with a concrete case study.
Case study on implicit bias
The concept of implicit bias was developed to explain a particular contradiction in the United States: in large, national surveys racial attitudes appeared to consistently improve from the 1960s through the early 1990s (Schuman, Steeh, Bobo, & Krysan, 1997). Yet, despite these improvements, racial disparities remained more stable than many scholars would like (Lee, 2002). These stable patterns in disparities were reflected in the laboratory, where even participants who claimed to value equality seemed to act unfairly when assessed on “unobtrusive” measures of bias (i.e., measures where, from the participant’s perspective, no particular response could be clearly labeled as “prejudice” or “discrimination”; Crosby, Bromley, & Saxe, 1980). Solving the puzzle of why people’s self-reports contradict their laboratory behavior provided an enticing means of making a practical impact.
Dual process models, such as the “prejudice habit model” (Devine, 1989), arose as an explanation for this contradiction. In these models, people’s beliefs had changed throughout the 1960s up until the 1990s, but a fast, relatively automatic mental process had not. When national surveys asked people how they felt about Black people, those surveys measured beliefs. However, the studies of “unobtrusive” bias measured behaviors that were more influenced by the automatic process.
At first, research in the dual process tradition was stymied by the fact that there was no independent measure of the automatic process. This changed with the creation of “implicit measures”, especially the Implicit Association Test. These measures were important for two reasons. First, they gave researchers a tool that purported to quickly, easily, and directly assess the automatic process (which came to be known as “implicit bias”). Second, they created a “palpable experience” (Monteith, Voils, & Ashburn-Nardo, 2005) that, when made widely available through websites like Project Implicit, greatly enhanced the standing of implicit bias in the public imagination.
The result was an explosion of research on implicit bias and a steady increase in the reach of the concepts into the public sphere. This research included observational studies designed to validate the new class of implicit measures and demonstrate the potential consequences of implicit bias (Cunningham, Preacher, & Banaji, 2001). The research also included manipulations designed to investigate the procedures that could change implicit bias (Dasgupta & Greenwald, 2001).
A funny thing happened over the course of this research on implicit bias: the implicit measures, but especially the IAT, became a target of change in their own right. In effect, the IAT became a stand-in for the problems related to social disparities. Thus, the existence of bias on the IAT became a convenient way of representing those disparities, and demonstrating the presence or absence of change on the IAT became a way of demonstrating which procedures might have promise for changing social disparities.
The result was a cottage industry of interventions and trainings all developed around the concept of implicit bias, especially as represented by the IAT. This has come to a head in the public sphere, where large companies like Starbucks and Delta Airlines have rolled out company-wide training programs focused on implicit or unconscious bias. Governmental agencies have also taken an interest; the New York Police Department has instituted its own training, as has the UK civil service (this program was recently scrapped).
Meanwhile, the evidence is muddled at best that implicit bias does indeed cause real-world disparities, and in some settings there is strong evidence of the importance of non-psychological factors. Take racial disparities in who is subject to police misconduct. In many US jurisdictions, firing problematic officers is extremely difficult because weaker disciplinary oversight is one of the main concessions that police unions have extracted through collective bargaining. This concession makes it difficult to remove officers who are acting out of line, with a disproportionate impact on communities of color. The implication is that reforming systems of collective bargaining might be an effective means of reducing police misconduct, especially in communities of color.
This insight comes not from a particular theory, but from a careful examination of the historical and policy environments of US policing. However, an observer who approaches problems in US policing wearing the blinders of dual process theories might miss this important insight, pursuing instead reforms such as mandatory implicit bias training that do not effectively alleviate the suffering caused by police misconduct.
The problem-solving model as an alternative to the theory application model
The police union example illustrates an alternative means of achieving practical impact that is different from the theory application model: a focus on a particular, substantive problem as it exists in its original context. The process that I suggest bears some similarities to action research – a research method initially developed by, ironically, none other than Kurt Lewin (Lewin, 1946). My suggested process goes like so:
- Analyze the target setting to develop a definition of the target problem. The first step is developing a careful analysis of the problem setting. This analysis should, at a minimum, draw on a consultation with stakeholders, but it can also draw on historical and policy analysis and pre-existing administrative data (if available). Any and every tool that provides insight into the problem is on the table; this stage may therefore require tools from many disciplines and levels of analysis. At this stage, the problem-to-be-solved may not yet be specifically defined. A major goal of this stage is to integrate the various sources of information to come to a more firm definition.
- Select measures of the problem and define success and failure using those measures. A definition of the problem is little help if there are no systems for assessing progress in solving the problem. This second stage involves defining the problem in terms of a measure that can be readily deployed in the problem setting. In some cases, no pre-existing monitoring system exists; in those cases, this second stage may involve devising and implementing such a monitoring system.
- Define the universe of interventions you can implement. Not all interventions are within the realm of possibility in all target settings; what is feasible will be constrained by pre-existing policy, law, and available resources. This third stage involves surveying the range of possible interventions you can deploy given these practical constraints. The interventions themselves can come from anywhere, whether from psychological theory or from the careful analysis of the target setting developed in the first step.
- Implement one or more interventions with a plan for progress monitoring. The final step is to implement one or more of the interventions identified in the third step with a plan for progress monitoring using the measures identified in the second step.
As in action research, these four steps can form a feedback loop: if the intervention tested in the fourth step is unsuccessful, that can form the basis for a new analysis of the target problem in the first step.
Theories may indeed have a role to play at different stages of the problem-solving model. However, and unlike the case with the theory application model, the orientation of the problem-solving model is on a particular problem in its original setting rather than on a theory and its degree of empirical support. The problem setting therefore serves not just as a venue for demonstrating a particular theory’s generality but as the entire focus of the research process. Moreover, the model embraces the “messiness” of the problem setting’s history, politics, and resource constraints rather than attempting to control the messiness away, as understanding this messiness is critical to developing interventions that work.
Conclusion: Theories are not always helpful for generating practical impact
Theory has been enjoying something of a renaissance in psychology research, and a large list of scholars have put forward persuasive arguments that theory development is critical for advancing knowledge about psychology (Muthukrishna & Henrich, 2019; Fried, 2020; Robinaugh et al., 2020; Navarro, 2021; van Rooij & Baggio, 2021 1; 2). As a response to this resurgence, a small but growing list of scholars have started to identify limits to the usefulness of theory in knowledge advancement (Barrett, 2020; Eronen & Bringmann, 2021).
Personally, I do not dispute the usefulness of theory – if the primary goal is indeed knowledge advancement. If the primary goal is pragmatic, I worry that processes like theory blindness may interfere with a clear-eyed view of the target social problem. For this reason, I believe that a problem-focused approach is a more effective means of achieving pragmatic goals.
These ideas were conceived at the Relationship Preconference at the 2021 meeting of the Society for Personality and Social Psychology. Thanks to Farid Anvari, Hans IJzerman, Daniël Lakens, Duygu Uygun-Tunç, Peder Isager and Leonid Tiokhin for their helpful comments on previous drafts of this blog.
During our journal club, we discuss a variety of articles. Some of them are focused on specific topics, but many of them are focused on broader methods (for a full list, see the featured image).
During our last meeting (January 22, 2021), we discussed the value of computational modeling. We used Wander Jager’s article “Enhancing the Realism of Simulation (EROS): On Implementing and Developing Psychological Theory in Social Simulation” as the basis for our discussion.
Through an example of social simulations, the author asked how we can use modeling in psychological research. The idea behind artificial societies, social simulations and multi-agent systems is simple – building a virtual system where many agents with certain characteristics (such as preferences, behaviors, and psychological traits) interact with each other. The difficulty of implementing the idea increases with the complexity of individual agents.
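To make the idea concrete, here is a minimal sketch of an artificial society in base R. It is not taken from Jager's article: the single agent characteristic (`opinion`), the interaction rule (two randomly chosen agents move toward each other when their opinions are close enough), and all function and parameter names and values are assumptions chosen purely for illustration.

```r
# Minimal agent-based sketch (illustrative assumptions throughout; not from Jager's article).
# Each agent has one characteristic: an opinion between 0 and 1.
# At every step two distinct agents meet; if their opinions differ by less than
# `tolerance`, each moves a fraction `convergence` of the gap toward the other.
simulate_society <- function(n_agents = 50, n_steps = 2000,
                             tolerance = 0.3, convergence = 0.25, seed = 1) {
  set.seed(seed)
  initial <- runif(n_agents)            # starting opinions of all agents
  opinion <- initial
  for (step in seq_len(n_steps)) {
    pair <- sample.int(n_agents, 2)     # two distinct agents interact
    gap <- opinion[pair[2]] - opinion[pair[1]]
    if (abs(gap) < tolerance) {
      opinion[pair[1]] <- opinion[pair[1]] + convergence * gap
      opinion[pair[2]] <- opinion[pair[2]] - convergence * gap
    }
  }
  list(initial = initial, final = opinion)
}

res <- simulate_society()
# Compare the spread of opinions before and after the interactions
c(sd_before = sd(res$initial), sd_after = sd(res$final))
```

Even this toy version shows where the difficulty lies: every additional psychological characteristic we give the agents requires an explicit, formal rule for how it changes during an interaction.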
We reflected on these issues and, while we agreed that the tool has great potential, there are still many obstacles to overcome before we can use simulated agents in psychological research. For example, because psychological theories are currently often too vague, we are still far from achieving “psychological realism” in agents. This is because ideas are often not sufficiently well operationalized and/or measured, may apply only to single situations (and not generalize to others), and are often tested in only a subset of the population. There are also problems related to questionable research and/or measurement practices.
Another obstacle we discussed is the fact that psychologists lack sufficient mathematical and computational training to be able to translate theories into a language a computer can understand.
Finally, we agreed that simulation and modeling are necessary steps to take in the future, but adding another tool to the toolkit won’t solve the other problems we are already facing in psychology. There’s still plenty of work to do before we can reliably use artificial societies in psychological research.
We also identified a few additional resources for those who want to expand their knowledge on the subject:
- A short video explaining social simulations: https://www.youtube.com/watch?v=2XvbuEugkIA
- NetLogo, software for computational modelling: https://ccl.northwestern.edu/netlogo/
- An online course: https://www.coursera.org/learn/model-thinking
- A book on models: https://sites.lsa.umich.edu/scottepage/home/the-model-thinker/
- An article on an agent-based model exploring the scientific process: https://royalsocietypublishing.org/doi/10.1098/rsos.160384
- An article on an agent-based model exploring competition in science: https://osf.io/preprints/metaarxiv/x4t7q/
- An article on an agent-based model used in personality research: https://www.nature.com/articles/s41562-019-0730-3
This is a cross post from the PSA blog
Patrick S. Forscher and Hans IJzerman
Note: This post accompanies three surveys, which are embedded as links in the post’s body. For convenience, you can find direct links to them here.
In our previous post, we argued that the PSA has a grand vision and a budget that cannot easily support it. If the PSA is to fulfill its aspirations, the PSA must increase its funding so that it can support an administrative staff.
In this post, we will assume that the PSA wants to fulfill its grand vision. We will therefore explore ways that the PSA could create the funding streams that are necessary to achieve that vision. We will review four options:
- Grant funding
- Charitable giving
- Fees-for-service
- Membership dues
Most of our energy to date has focused on grant funding and charitable giving. A sustainable funding model will require a mix of all the options so that we can balance their strengths and weaknesses. We can think of the PSA’s funding as a sort of investment portfolio where we devote a percentage of effort to each potential income stream based on the PSA’s financial strategy and risk tolerance.
In the remainder of this post, we will evaluate the strengths and weaknesses of all four funding streams. For each funding stream, we will also review the activities that we have already attempted and what else we can do in that category.
The post is thorough and therefore long. Here are some big-picture takeaways:
- We have already tried, quite vigorously, to secure grant funding and charitable giving, with modest success
- Large-dollar grant funding is high-risk, high-reward and likely cannot be counted on to secure the PSA’s financial future; small-dollar grant funding cannot easily support staff
- Charitable giving mostly yields small amounts of money, though donated research assistants could help provide labor for the PSA
- Fees-for-service offer some potentially interesting funding streams, including conference organization (continuing the success of PSACON2020) and translation
- Membership dues arguably fit the structure of the PSA best out of all funding streams, though we should think carefully about how to implement these without compromising our value of diversity and inclusion
We will argue that the PSA should pursue fees-for-service and membership dues, small-dollar grants, and charitable giving. Large-dollar grants should only be pursued in periods when the PSA can tolerate risky funding strategies. Finally, the PSA will need a bookkeeper and Chief Financial Officer to formalize its finances.
Grant-writing involves writing a specific proposal to a formal agency. The agency judges the proposal on its merits and its fit with the agency’s mission and, if the proposal is sufficiently meritorious, issues an award. This is the conventional way that scientists try to fund their work.
We argue that grant writing is high effort, high risk, potentially high yield, but with high volatility. Grant-writing thus makes for bad planning long-term.
- Grants can be very big. The big granting institutions have a lot of money. For example, one of the grants we wrote in the past three years was for €10 million. This dictum is not always true — many grants are for amounts closer to €1,000 — but when grants pay off big they pay off big.
- Many scientists have grant-writing expertise. Because grants are the traditional means through which science is funded, grant-writing expertise is easy to find among the scientific community. This makes grant proposals easier to get off the ground than many other funding activities.
- Grants usually fund projects, not infrastructures. Funding an infrastructure like the PSA requires an ongoing investment, whereas projects complete within a few years. For this reason, granting agencies view projects as lower-risk funding priorities. Most grant mechanisms are therefore project-focused rather than infrastructure-focused.
- Grant-writing is high risk. At the US National Institutes of Health, the US’s largest funder, the overall success rate for its most common grant is 23%. This means that, if the PSA relies exclusively on grant funding, we ought to expect roughly 3 out of every 4 grant proposals to yield no money. It is hard to build a staffing plan for an organization under such risky conditions.
- Grant-writing is inefficient. Grant-writing requires so much time and energy with such a low probability of paying off that the expected return may not justify the expected time investment. One computational model suggests that these inefficiencies are inherent to any funding scheme structured as a competition between proposals because scientists waste as much or more effort on the proposals as they would gain from the science that the proposals fund.
- The money is not flexible. The PSA wants to use its money for many things: grants to under-resourced labs, paying staff, funding competitions to make scientific discoveries, and many other things. Grants have strict accounting rules that rule out many or most of these activities.
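The risk described in the list above can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every proposal is funded independently with the 23% success rate quoted above. Real proposals differ in quality and fit, so the numbers are only indicative. The function name `p_at_least_one` and the proposal counts are hypothetical.

```r
# Illustrative arithmetic only: assumes independent proposals, each with a 23% chance of funding.
success_rate <- 0.23

# Expected number of unfunded proposals out of every 4 submitted
expected_failures_per_4 <- 4 * (1 - success_rate)

# Probability that at least one of n proposals is funded
p_at_least_one <- function(n_proposals, p = success_rate) {
  1 - (1 - p)^n_proposals
}

n <- c(1, 3, 5, 10)
data.frame(proposals = n,
           p_at_least_one_funded = round(p_at_least_one(n), 2))
```

Under these assumptions, even ten proposals leave about a 7% chance that none is funded, which is why a staffing plan built on grants alone is fragile.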
What have we done already?
- One proposal to the European Research Council’s Synergy mechanism. This proposal was project-based and very large (€10 million). Some of the money would have gone directly to the PSA, but making the PSA funding work within the constraints of ERC accounting rules was challenging. Writing this grant took years of preparation, including one month of very intense effort by five PSA members. The proposal was rejected.
- Two proposals to the US National Science Foundation. Both were somewhat large (~$200K) and may have been able to support PSA staff. Both were rejected.
- Two proposals to the John Templeton Foundation. Both proposals were project-based and somewhat large (~$200K). One proposal was rejected and one is in limbo due to Covid-19.
- Two proposals to the European Commission’s Marie Skłodowska-Curie mechanism. Both proposals were project-based and would have supported postdocs (~€120K). Although the postdocs would not have worked directly for the PSA, some of the proposed work would have indirectly benefited the PSA. Both proposals were rejected.
- One proposal to Université Grenoble Alpes. This proposal was designed to fund grant-writing for both the PSA and UGA. This proposal was funded and has supported a postdoc at a cost of ~€120K. Although the postdoc officially works for UGA, around half their time is donated to the PSA.
- One proposal to the Association for Psychological Science. This proposal was for $10,000 USD. All the money had to go toward participant recruitment for the PSACR project. This proposal was funded.
- One proposal to Amazon. This proposal was small ($1,352 USD) and strictly dedicated to paying for server hosting for PSACR. This proposal was funded.
- One proposal to the Society for Personality and Social Psychology. This proposal was small ($1,000 USD) and could only be used to support participant recruitment for PSA005. This proposal was funded.
- One proposal to the Society for Improvement of Psychological Science. This proposal was small ($900 USD) and devoted to offsetting the costs of applying for nonprofit status. This proposal was funded.
The above grants translate to around $12,862,252 in funds sought and $163,252 received. This demonstrates the risk and volatility in grant funding.
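A quick calculation shows how stark the contrast is. The sketch below uses only the totals stated above and the outcomes of the twelve proposals listed (counting the Templeton proposal that is in limbo as pending). The variable names are ours.

```r
# Totals as stated in the text (USD)
sought   <- 12862252
received <- 163252

# Outcomes of the 12 proposals listed above
outcomes <- c(funded = 5, rejected = 6, pending = 1)

share_by_count <- outcomes[["funded"]] / sum(outcomes)  # share of proposals funded
share_by_money <- received / sought                     # share of money sought that was received

round(100 * c(by_count = share_by_count, by_money = share_by_money), 2)
```

Five of the twelve proposals were funded, yet they account for only about 1.3% of the money sought, because the funded proposals were almost all small ones.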
In addition to the above activities, PSA leadership has submitted a large number of letters of inquiry and less formal communications seeking funding opportunities. These did not reach the stage of formal proposals.
What else can we do?
The PSA has explored most of the available options related to grant funding. Thus far, it has had the most success writing small, focused grants. These can support specific activities, such as participant recruitment, but cannot easily support staff.
Grant-writing for large grants is a high-risk, high-reward activity. Grant-writing for small grants usually requires investment of resources, but also cannot easily fund PSA staff. Grant-writing is also the activity on which the PSA has spent most of its funding energy. The PSA may want to focus its grant-writing activities on smaller, project-focused grants.
Charitable giving involves contacting individuals to solicit money and other resources. Charitable giving is similar to grant-writing in that some entity gives money to the PSA of their own free will. The method differs from grant-writing in that the giver lacks a structured organization that announces funding guidelines and priorities.
We argue that charitable giving is moderate effort, low risk, low to medium yield, and medium volatility. Charitable giving allows for somewhat better long-term planning than does grant-writing.
- Allows direct outreach to supportive members of the community. The PSA has generated a lot of support and goodwill. Direct outreach allows members of the community to express this goodwill in the form of direct financial support.
- Unconstrained by deadlines. Many other financial streams, such as grants, come on tight deadlines. Solicitations of charitable giving can happen at any time.
- The money is (usually) flexible. Grant funding comes with strict accounting rules. Charitable giving sometimes comes with strings attached, but less often.
- Donation amounts are often small. The cost for staff salary runs in the tens of thousands of dollars. Most gifts run in the tens of dollars.
- The overhead for these gifts is often large. This limitation is linked to the first. In contrast to grants, where a payment from a granting agency only needs to be processed once, in a donation drive, payments need to be processed as many times as you have donors. Sometimes these payment processes need to go through a third party, such as PayPal. This can create substantial overhead in both labor and finances.
What have we done already?
- Set up a Patreon. At the time of writing, this Patreon collects €200 per month. We have used this money to help fund special projects, such as the PSA001 Secondary Analysis Challenge, data collection for PSA001, PSA006, and PSACR, and other miscellaneous expenses.
- Funding drives. We have conducted at least four funding drives in the past three years. These are usually structured around funding a specific project or activity, such as PSACR, and are often announced via the PSA blog. However, we also conducted a funding drive during the first PSA conference, PSACON2020. It is difficult to estimate precisely how much money these drives have generated, but the sum is likely less than $1000.
- Direct outreach. We have also done direct outreach to specific people who we have thought might be willing and able to donate to the PSA. Usually we have relied on personal networks for these direct solicitations. This process is smoother now that the PSA is an official nonprofit and can therefore accept direct tax-deductible donations.
- Speaking fee donations. A few members of PSA leadership have donated fees they’ve received to speak about multi-site research. A typical speaking fee is around $200. This source of income generates money, but is hard to scale to the point where it would support full-time staff.
- Received donations of research assistant time. One special form of charitable giving is staffing time that is donated from an organization to the PSA. This system allowed the CO-RE Lab, for example, to donate the time of one of its interns to the PSA. This donation enabled, among other things, the creation of the first Study Capacity Report. This system works best when the research assistant or intern can be directly embedded within a PSA supervisory structure.
What else can we do?
The PSA’s new status as a pending nonprofit gives it more options that fit under charitable giving. The option to accept donated time from research assistants is particularly attractive, especially if the donation can be formalized in an official agreement between the PSA and the donating organization.
The PSA can also do more direct outreach to members, both through the PSA mailing list and at the time of registration with the member website (an option that bears some similarities to “membership fees”, discussed below).
Charitable giving is highly flexible and has generated some money for the PSA, though often the specific monetary amounts are small. One attractive option for the PSA now that it is a nonprofit is to accept more donations of time from member lab research assistants. This option works best when the donation is formalized through an official agreement and when the research assistant can be embedded within a PSA supervisory structure.
A fee-for-service venture involves directly selling something, rather as a for-profit company would. Some of the proceeds from sales go toward maintaining the fee-for-service venture; the rest go toward supporting other activities within the organization.
One particularly interesting service the PSA can provide is translation. The PSA has extensive experience translating psychological tasks and measures into multiple languages. Moreover, the PSA has experience implementing the translations in software. Finally, if the PSA can help researchers recruit participants from more countries and cultures, that furthers its mission of ensuring that psychological science is more representative of humanity writ large. We will have more to say on this subject below.
Fees-for-service ventures are difficult to classify in terms of their effort, risk, yield, and volatility; much depends on the specific venture under consideration. Nevertheless, we believe they are underexplored and could usefully complement the PSA’s other efforts.
- Can leverage already-existing strengths of the organization. If an organization must already be skilled at a specific task, developing a business around that skill allows the organization to leverage its existing strengths to generate a sustainable income. For example, the PSA must accomplish many things to conduct a multi-site study, including developing a large, global community of scientists and researchers around a common mission, coordinating agreements across many institutions, developing materials that are appropriate across many cultures, translating those materials into many languages, creating and maintaining large databases about its members, and disseminating knowledge about multi-site studies. Many of these skills can be leveraged into an income-generating enterprise.
- Builds a sustainable income stream. Once established, a fee-for-service income stream can sustain itself over long periods of time, buffering against the risks associated with other income streams (for example, a grant running out after three years).
- The money is flexible. Because the money that comes from a fee-for-service venture is earned rather than donated by a third party, there is no donor who can attach strings to how the money is used. Thus, this money is very flexible and can be used to support staff salaries and any other activities the PSA desires.
- Startup risks. Startups require up-front investments and there is no guarantee that a market will materialize. Thus, any investment made into a fee-for-service venture may, in the worst case, be lost.
- Mission capture. On the other hand, if a fee-for-service venture is very successful, that venture may capture the mission of the non-profit as a whole. Thus, fee-for-service ventures work best when they are well in line with the primary aims of the non-profit.
- Conflicts of interest. Some fees-for-service ventures may create conflicts of interest for PSA members who work on them, as these ventures may create incentives that conflict with the PSA’s core mission.
What have we done already?
- A PSA conference. This year, the PSA hosted its first conference, PSACON2020. The PSA charged a $60 registration fee with a generous waiver policy — the conference was planned on the assumption that ⅔ of the participants would receive waivers. The conference had 62 paying attendees (plus many more who received waivers) and brought in $3,720. This money went towards supporting the salary of the PSA’s junior administrator (and the main organizer of the conference), Savannah Lewis.
What else can we do?
Fees-for-service are a largely untapped opportunity for the PSA. Conducting multi-site studies is difficult and the PSA has developed a broad array of institutional knowledge to complete these studies. Much of this institutional knowledge can be leveraged into marketable services.
The success of PSACON2020 suggests that we could continue this conference in the future. If attendance remains stable or grows, this could become a small but consistent funding stream for the PSA.
A second interesting option is starting a dedicated PSA journal. This option entails some risk, as projects published there cannot be published in journals that are currently very prestigious (such as Nature Human Behaviour); authors on projects therefore take on some personal risk in the decision to instead publish at the PSA journal. It is also unclear whether university libraries would be willing to pay subscription fees for a PSA journal. On the other hand, the success of society journals suggests that once this revenue source is established, it could be a large and reliable one.
One last interesting option is translation. Translation of psychological measures and tasks is difficult, yet the PSA has lots of experience pulling off this task. PSA member Adeyemi Adetula has proposed a PSA-affiliated translation service with a first focus on Africa. This blog contains two surveys, one for people interested in being paid translators and one for people who want measures and tasks translated. If you are interested in this idea as a means of funding the PSA, head over to the blog, read the proposal, and complete one or both of the surveys.
Fees-for-service involves leveraging the PSA’s strengths to sell a service, generating a new income stream. The PSA has started some limited versions of fee-for-service funding, but this is still a largely untapped opportunity.
In addition to continuing the success of PSACON, the PSA could explore starting a service to provide translations of psychological tasks and measures. Both these activities are in line with the PSA’s core mission — the conference because it builds community and supports members, translation because it provides the translated measures and tasks necessary to recruit participants whose native language is not English.
Membership dues are a special case of fee-for-service funding, but the model fits the structure of the PSA so well that it is worth discussing in its own section. The membership dues model involves charging some amount of money to gain access to some or all of the benefits of PSA membership. This is the model followed by scientific societies, which often gate grant funding and conference participation behind annual dues of, say, $247 (APA) or $237 (APS).
Membership dues also have some overlap with some forms of charitable giving. This is because one way to ask for a donation is at the time of registration through the PSA member website. The major difference between this sort of solicitation and a membership fee is whether the default membership is free or paid. Membership dues do have one major, somewhat hidden, advantage over a solicitation at the time of member registration: due to grant accounting rules, many PSA members who have their own grants cannot spend this money on donations. On the other hand, grant money can typically be spent on a membership fee. Instituting membership dues may therefore unlock grant money that is currently inaccessible to the PSA.
Some implementations of membership dues may conflict with a core value of the PSA, namely diversity and inclusion. If dues are required of all members, they risk pushing out prospective PSA members with fewer resources. There may be implementations of membership dues (such as a dues structure that allows for fee waivers) that can help mitigate this risk.
We believe that membership dues are low effort, moderate risk, medium yield, and low volatility. They therefore allow for better long-term planning than many other funding streams. However, we also believe that the specific kinds of risks that membership dues incur require feedback from PSA members before we rush into any decisions. We have created a survey to help seek this feedback; we lay out more context for why we see such promise in dues below.
- Many people have pots of money that cannot be given as gifts. Under the Grant-writing section, we noted that grant funds are inflexible. One consequence of this lack of flexibility is that PSA boosters who have grant funds are often not allowed to use them in charitable gifts because the expense cannot be justified to the granting agency. Membership dues provide a way for these people to justify giving money to the PSA to granting agencies, thus unlocking a resource for the PSA that would otherwise go untapped.
- Leverages the PSA’s best resource, its community. The great strength of the PSA is that it has created a large international community (1400 members and counting) that believes in its mission. Membership dues allow the PSA to leverage this strength to better secure its long-term sustainability.
- Dues are linked to PSA member benefits. The PSA provides many benefits to members, including a connection with a large international community of colleagues and collaborators, exposure to cutting-edge methods, and access to high-impact projects that can be added to one’s CV. Membership dues would emphasize that the PSA is only able to provide these things by maintaining a large infrastructure.
- Precedent in scientific societies. The dues funding model is the same one used by scientific societies, so dues-granted membership should be a concept that is familiar to PSA members.
- Wide variety of potential implementations. Dues can be charged on a yearly basis or could be tied to joining a study (the cost of joining would be an administrative fee). The PSA could also consider a variety of waiver policies.
- The money is flexible. The money is earned, so it could go toward supporting everything from staff salaries to ad hoc projects.
- Builds a stable income stream. Once a membership dues program is in place, the income stream from the program should be relatively stable year-to-year.
- Risk of lowering membership. Gating membership by adding a joining fee could lower PSA membership, draining the PSA of its most valuable resource. Some policies could help manage this risk — for example, the membership dues could be suggested rather than required.
- Risk of driving away members with fewer resources. The PSA has a particular interest in increasing its membership in countries that provide less support for the sciences, such as those in the African continent. Imposing membership dues could put joining the PSA out of reach for these prospective members. Once again, policy could help manage this risk — for example, the PSA could provide generous waivers, just as it did for PSACON.
What have we done already?
What else can we do?
Membership dues come with some real risks, so we want to carefully investigate this option before we take any hasty action. We also want to ensure that whatever action we take has the broad support of the PSA membership.
We have prepared a survey to investigate what PSA members think about different models of membership dues. By taking this survey you can help us evaluate the feasibility of using this as an income stream to promote the sustainability of the PSA.
Membership dues are arguably the funding model that best fits the structure of the PSA. However, some implementations of dues may conflict with the PSA value of diversity and inclusion. Membership dues have great potential as a funding stream, but we should think carefully about whether and how to implement them.
We strongly believe that the PSA has the potential to do enormous good for psychological science, and perhaps the social sciences at large. However, fulfilling this potential will require dedicated funding, and to obtain this funding the PSA may have to look beyond its current focus on grant-writing and charitable giving. Of course, the PSA need not abandon these income streams; it can simply balance these activities with efforts in developing funding streams through, say, fees-for-service ventures and membership dues.
If the PSA does develop a more balanced financial strategy, it will need staff to manage this strategy. This probably means appointing a Chief Financial Officer and a bookkeeper. The Center for Open Science, for example, has staff to manage its financial strategy and its books.
The PSA is at a crossroads. We think it has a real opportunity to grow into a more formal and mature organization that carries the big team science torch for psychology at large.
In one of my PhD projects, I (Olivier) am working on developing and validating a scale (the Social Thermoregulation, Risk Avoidance, and Eating Questionnaire – 2, or the STRAEQ-2). In the first phase of this project, we involved people from different countries and asked them to generate items. We did so to better represent behaviors of people across the globe, rather than only those of people in the EU/US. This item-generation stage involved 152 authors from 115 universities. In this blog post we present this first phase and share the code we used for it.
Is attachment for coping with environmental threats?
Based on the premise that environmental threats shape personality (Buss, 2010; Wei et al., 2017), the goal of this project is to measure individual differences in the way people cope with the environment and to discover whether these individual differences are linked to attachment. Attachment theory postulates that people seek proximity with reliable others to meet their needs (Bowlby, 1969), and the psychological literature suggests that distributing threats across others is metabolically more efficient (Beckes & Coan, 2011). In a previous project (STRAQ-1), Vergara et al. (2019) reported findings consistent with this: individual differences in the way people cope with environmental threats (temperature regulation and risk avoidance) were linked to individual differences in attachment. But the reliability of the scale created and validated in that project was somewhat inconsistent across the countries in which they collected data. We therefore decided to extend the STRAQ-1 to make the scale more reliable across countries, now focusing on:
- Fluctuation in temperatures: generating a thermoregulation need (one’s need to maintain internal temperature within a comfortable range in order to survive),
- Physical threats: inducing risk avoidance (one’s need to avoid predators or people who want to do harm in order to survive),
- Lack of food: requiring food intake (one’s need to prevent starvation).
We are also dividing each dimension into 4 subdimensions to make the scale more consistent with the attachment literature: sensitivity to the need, solitary regulation of the need, social regulation of the need, and confidence that others will help to cope with the need.
What are the behaviors that account for coping with environmental threats?
We started by generating the items, and we wanted to avoid the mistakes of the past. Do people in Peru, China, Nigeria, or Sweden deal with temperature the same way? Probably not. So, to obtain scale items that reflect a diverse range of coping behaviors, we asked our collaborators to generate them: we designed a Qualtrics survey in which they read a description of each construct, saw example items, and then wrote items of their own.
In total, 737 items were generated by 53 laboratories from 32 countries. To automate the procedure and to avoid copy-and-paste errors, we created an R script to export all the generated items from Qualtrics into a text document that could then be imported into a Google Doc. The text document lists, for each subdimension, the subscale name followed by every text entry (item), alongside the name of the country that generated the item. Here is the piece of code that we used:
# This code selects the temperature sensitivity subscale (plus the respective countries) from the data frame 'items', which is the Qualtrics file containing all the generated items, and writes the items to a .txt document called 'Items.txt':
library(tidyverse)

items %>%
  select(thermo_sens, country) %>%
  write_csv("Items.txt", col_names = TRUE, append = TRUE)

# We repeat this for all the subscales, adding them to the same file (Items.txt). For example, here we add the solitary thermoregulation subscale:
items %>%
  select(thermo_soli, country) %>%
  write_csv("Items.txt", col_names = TRUE, append = TRUE)
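The repetition across subscales could also be handled with a loop. The sketch below is one possible way to do that, not the script we actually ran: it uses base R's write.table() on an open file connection instead of write_csv(), and the toy data frame and its item texts are invented for the example.

```r
# Toy stand-in for the Qualtrics export (invented items, for illustration only).
items <- data.frame(
  thermo_sens = c("I notice quickly when it gets cold", "I am sensitive to heat"),
  thermo_soli = c("I put on a sweater when I am cold", "I take a warm shower"),
  country     = c("France", "Peru"),
  stringsAsFactors = FALSE
)

# Every column except 'country' is treated as a subscale.
subscale_cols <- setdiff(names(items), "country")

# Write each subscale (with its countries) one after the other into the same file.
out_file <- tempfile(fileext = ".txt")
con <- file(out_file, open = "w")
for (col in subscale_cols) {
  write.table(items[, c(col, "country")], file = con, sep = ",",
              row.names = FALSE, col.names = TRUE)
}
close(con)

# Read the file back to check what was written.
written <- readLines(out_file)
print(written)
```

Each subscale block starts with its own header line, so the subscale name appears directly above its items in the text file.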
The lead team imported the list of items (.txt) into an easily shareable document that tracks changes (a Google Doc) to correct misspellings, rephrase items, and remove duplicates from the list (all modifications are available here, and a clean version is here).
How to select relevant and diverse behaviors (items) from a large list?
Because it would be too fatiguing for our participants to answer all of the 737 previously generated items in one sitting (on top of the other questionnaires we offer them), we reduced the number of items to be included in the main survey. We created a diverse advisory committee (including 9 researchers from Chile, Brazil, Morocco, Nigeria, China, the Netherlands, and France); via an online survey, this advisory committee rated to what degree they thought the items were representative of their respective constructs.
Then, we computed the mean and standard deviation of the ratings for each item. For each subscale, we selected the 10 items with the highest means and lowest standard deviations, and we replaced closely related items (~5 per subscale) to get a wider range of behaviors in the scale. To facilitate the procedure, we created dynamic tables per subscale in an R Markdown document. These tables allow you to sort by score (mean or sd), to filter by country, or to search for specific words in the items. Here is the code that we used to generate the tables:
# From a data frame called 'df' that contains all the items and all the subscales, we compute the mean and standard deviation of the expert ratings (list every expert rating column inside cbind(); only expert_1 and expert_9 are shown here):
library(tidyverse)
library(matrixStats) # provides rowSds()

df <- df %>%
  mutate(
    mean = rowMeans(cbind(expert_1, expert_9), na.rm = TRUE),
    sd = rowSds(cbind(expert_1, expert_9), na.rm = TRUE),
    mean = round(mean, digits = 2),
    sd = round(sd, digits = 2)
  )

# Then we tidy the data (that is, we switch from a wide format to a long format) in order to have one rating per row:
df_tidy <- df %>%
  gather("expert", "rating", -item, -subscale, -mean, -sd, -country_list, -country)

# We create a new data frame including only one subscale (here the sensitivity to temperature) and sort it so that the items with the highest means, and among ties the lowest standard deviations, come first:
temp_sens <- df %>%
  filter(grepl("temp_sens", subscale)) %>% # select rows "temp_sens" in the subscale column
  arrange(desc(mean), sd) %>%
  select(item, mean, sd, country)

# Finally, we print an interactive table that automatically displays the first 10 rows (this can be changed):
library(DT)
datatable(temp_sens)
We also plotted the distribution of the item ratings (overall, per expert, and per subscale) to detect whether there were issues with specific subscales. Here is the ggplot code that we used to do that:
# First we create a color palette that will be used for the plot:
palette <- c("#85D4E3", "#F4B5BD", "#9C964A", "#C94A7D", "#CDC08C", "#FAD77B", "#7294D4", "#DC863B", "#972D15")

# And here we plot the ratings of the experts per subscale ('subscale.labs' must be a named vector of facet labels defined beforehand):
expert_plot <- df_tidy %>%
  ggplot(aes(rating, fill = expert)) +
  geom_bar(show.legend = FALSE) +
  facet_grid(expert ~ subscale, labeller = labeller(subscale = subscale.labs)) +
  scale_fill_manual(values = palette)

expert_plot
We then created a world map of the countries that generated the final list of the 120 STRAEQ-2 items, to observe how diverse our list of items was (we ended up doing some replacements for the final project, to increase the diversity of the scale).
# First import the world.cities database from the 'maps' package in order to have the latitude and longitude of capitals around the globe:
library(tidyverse)
library(maps)
library(leaflet)
data(world.cities)

# Then you may want to rename some countries in your data frame (here my data frame contains the 120 selected items and is called 'items_120') to match the country names used in the world.cities data frame. In our case we needed to rename three countries:
items_120$country[items_120$country == "United Kingdom"] <- "UK"
items_120$country[items_120$country == "United States"] <- "USA"
items_120$country[items_120$country == "Serbia"] <- "Serbia and Montenegro"

# The next step is to merge the desired columns from the world.cities data frame with your data frame by country, in order to get the latitude and longitude of each country's capital:
items_120 <- world.cities %>%
  filter(capital == 1) %>%
  select(country = country.etc, lat, lng = long) %>%
  left_join(items_120, ., by = "country")

# We compute the number of items that we have per country; this is needed to vary the size of the dots in the final map:
df_country_count <- items_120 %>%
  group_by(country) %>%
  summarise(n = n())

# We create the data frame used to generate the map:
items_120 <- left_join(items_120, df_country_count, by = "country")

# We create the map (depending on your data, you may want to change the function in the radius argument to adjust the differences in dot size, and also the value of the fillOpacity argument):
m <- leaflet(items_120) %>%
  addTiles() %>%
  addCircles(
    lng = ~lng, lat = ~lat, weight = 1,
    radius = ~ log(n + 5) * 100000,
    popup = ~ paste(country, ":", item, "(", subscale, ")"),
    stroke = TRUE, opacity = 1,
    fill = TRUE, color = "#a500a5", fillOpacity = 0.09
  )

# Finally we can print the map:
m
This is where the project stands so far. We are now translating the scales into various languages so that we can collect data for the final project.
Interested in participating in the project?
It is still possible for you to join the project. We will ask you to:
- Submit an IRB application at your site (if necessary). In order to make this step easier for you, we have written an IRB submission pack, which is available on the OSF page of the project.
- Translate the scales that are included in the main project via a forward-translation and back-translation method.
- Administer an online questionnaire to at least 100 participants at your site (more is always possible).
Please note that to account for authorship we are using a tiered author list combined with the CRediT taxonomy. We also expect to publish a data paper from the project that will facilitate future reuse of the data. We expect to collect data from approximately 11,000 participants across the globe (we currently have 118 sites in 48 countries), to validate the STRAEQ-2 scale across countries, to measure individual differences in the way people cope with environmental threats, and to explore the links between these differences and individual differences in attachment.
This blog post was written by Olivier Dujols and Hans IJzerman.
This is a cross post from the PSA blog
Patrick S. Forscher and Hans IJzerman
In 2017, Chris Chartier shared a blog post that revealed a grand vision for psychology research: psychologists could build a “CERN for psychology” that does for psychology what particle accelerators have done for physics. This “CERN for psychology” would be an organization that harnessed, organized, and coordinated the joint efforts of psychology labs throughout the world to take on large, nationally diverse big team science projects.
The months after Chris’s blog went live revealed that enough people believed in this vision to start building this “CERN for psychology”. These early efforts would evolve into the Psychological Science Accelerator, a network that, according to our recent study capacity report, now spans 1400 members from 71 countries. In these early months, the PSA also collaboratively developed a set of five guiding principles, namely diversity and inclusion, decentralized authority, transparency, rigor, and openness to criticism, that form a coherent vision for the type of science we want psychology to be. We want to help transform psychology to become more rigorous, transparent, collaborative, and more representative of humanity writ large.
Now, three years after its founding, the PSA stands at a crossroads. This crossroads relates to our broad vision of what the PSA is and should be and the means through which we achieve that broad vision. This post will cover the first issue. As we will describe, we believe our early documents point to a vision of the PSA as active, rather than passive, but that a lack of funding streams constrains our ability to achieve that mission.
Minimal and maximal visions of the PSA
Although the PSA was established to coordinate the activities of research labs, there are a wide range of options as to how this coordination is implemented. The specific implementations anchor two radically different visions of the PSA: a minimal vision and a maximal vision.
Imagine a PSA that is radically different from the one we have now: the PSA as a mailing list.
This mailing list contains the contact information of people who are willing, in principle, to participate in team science projects. To use the mailing list, people design a team science study and email a description of the study to the list. People who receive a study invitation through the mailing list can freely reply to the study proposer. The mailing list itself is unregulated, so there is no vetting process for any of the emails people send over it, nor is there any support for the people who send invitations through the mailing list. This vision of the PSA is highly minimal in the sense that the PSA plays very little role in coordinating or implementing team science projects. However, this vision is also very low cost, as mailing lists are cheap to set up and almost free to maintain. In a sense, this minimalist version of the PSA already exists in the form of StudySwap: a useful tool, but not a transformative one.
Now imagine a PSA that is a bit more similar to the one we have now: the PSA as an active implementer of big team science projects.
In this vision, the PSA completes all stages of the team science research process. This means that the PSA takes partial ownership over its projects and participates in project decisions. This includes selecting projects to undertake through a rigorous review process, assisting with the theoretical development of studies, and improving on the design of studies by soliciting multiple rounds of feedback from relevant experts. In this vision, the PSA also actively coordinates and manages study teams to ensure that relevant administrative procedures (such as ethics review) are followed and to ensure that the various stages of the project occur on a reasonable schedule. The PSA also takes an active role in communicating completed projects to the world, perhaps by managing its own journal (the Proceedings of the Psychological Science Accelerator) and through its own dedicated press team. Finally, in this vision, the PSA has a variety of procedures to proactively improve its processes, including novel methods research, team science best practice research, project retrospectives, and exit interviews of PSA staffers who decide to leave the organization. This is the deluxe vision of the PSA, a PSA that is active but that requires lots of money and staff to maintain.
These two visions of the PSA, the minimal and the maximal, also anchor an entire universe of in-between visions that are not so extreme. However, the vision of the PSA laid out in Chris’s blog post, as well as the one implied by the PSA’s five guiding principles, are arguably both much closer to the maximal vision of the PSA than to the minimal one.
Money is necessary to implement a maximal vision of the PSA
If we accept that fulfilling the PSA’s mission requires something closer to a maximal vision of the PSA, we need to find ways to build the PSA into this more maximal vision. At a minimum, building and maintaining a maximal vision of the PSA requires people who can do the activities involved in this maximal vision. These people need to be recruited, managed, and retained, otherwise they will work at cross purposes, get into interpersonal conflicts, and burn out. In short, in addition to the people who carry out the PSA’s vision, the PSA needs an administrative structure to help these people carry out their work.
The PSA does, in fact, have a defined administrative structure. We have, for example, defined a set of committees to govern our activities and a set of roles that should be filled for each project. These roles outline an aspiration for the PSA to proactively conduct team science research, which further suggests that the PSA has a maximalist vision for itself.
These roles are many. Our recent evaluation of the PSA’s administrative capacity identifies fully 115 of them. If we assume that each position requires 5 hours per week of work to complete the associated responsibilities, the 115 roles require 29,900 hours per year to staff. Unfortunately, maintaining this level of labor has, at times, been challenging because of our reliance on volunteers who have other daily commitments (such as jobs that pay them).
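The yearly total follows from multiplying the number of roles by the assumed weekly hours and the number of weeks in a year. A minimal sketch of that arithmetic (the variable names are ours, and the 5 hours per week is the assumption stated above, not a measured value):

```r
n_roles        <- 115  # roles identified in the administrative capacity evaluation
hours_per_week <- 5    # assumed workload per role
weeks_per_year <- 52

annual_hours <- n_roles * hours_per_week * weeks_per_year
print(annual_hours)
```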
We can rely on volunteers to carry this load for a time, but doing so carries some real risks. For example, the heavy load can risk burning out the most active volunteers who take on the most labor and lead to large, costly mistakes if the labor requirements for large projects are not met. This load can also provoke interpersonal conflict if active volunteers feel that their labor is not properly recognized or credited. Finally, there are risks associated with who is able to be a volunteer: in a volunteer model, only those who can afford to will donate their time and efforts to the PSA. This squeezes out the voices and talents of some of the very people the PSA wants to elevate, such as those from Africa, Middle America, and South America. Our 2020-2021 study capacity report estimates that 90% of all people involved in administrative roles are from North America and Western Europe. The majority of these, 63% of all administrative roles, are located in North America.
Other growing organizations have managed the transition from an exclusively volunteer organization to one that is funded by some sort of income stream. We can learn from their history, which shows that the organizations that became sustainable did so by leveraging their already-existing strengths.
The first step down the path of creating a sustainably funded organization involves acknowledging that many of the positions outlined in PSA policies are best served, not as volunteer positions, but as paid positions. We must also acknowledge that the costs of paying for this labor may be high. If we assume that all 29,900 hours are paid, and paid at even a very low wage ($7.25/hour, the US federal minimum wage), we still get a labor cost of at least $216,775 per year. This does not consider taxes, vacation, or other overhead.
Of course, it may not be necessary to pay for all 29,900 hours, either because our staffing estimates are inaccurate, or because our labor becomes more efficient when we switch to a paid model. Yet the mere act of thinking through these considerations requires recognizing that the PSA’s maximalist vision has a financial cost.
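To make this sensitivity concrete, here is a small sketch of how the yearly labor cost changes with the share of hours that is actually paid. The function name and the `paid_share` parameter are hypothetical, and the 50% scenario is purely illustrative, not an estimate from the report.

```r
# Yearly labor cost, given total hours, an hourly wage, and the share of hours that is paid.
# 'labor_cost' and 'paid_share' are hypothetical names used only for this illustration.
labor_cost <- function(hours, wage, paid_share = 1) {
  hours * paid_share * wage
}

# All 29,900 hours paid at $7.25/hour (the scenario described above).
full_cost <- labor_cost(hours = 29900, wage = 7.25)

# Illustrative scenario: only half of the hours need to be paid.
half_cost <- labor_cost(hours = 29900, wage = 7.25, paid_share = 0.5)

print(c(full = full_cost, half = half_cost))
```

Neither figure includes taxes, vacation, or other overhead.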
What we can be is constrained by our ability to obtain resources
The vision of the PSA outlined in its founding documents is grand. If implemented successfully, this vision could have an impact on psychology that is transformative, creating a science that is more inclusive, more collaborative, more transparent, and more robust.
Yet the PSA cannot realize this vision of itself for free. Currently, the PSA attempts to be a maximalist institution on a minimalist budget. That has worked during the PSA’s early years, but such a model may not be sustainable long-term. If we wish to implement a maximal vision for the PSA, we will need to focus dedicated energy into obtaining the funding needed for this implementation. As we will describe in a follow-up post, this will likely require developing funding streams outside of the traditional grant mechanisms to which scientists are accustomed.
Funding Note: Patrick Forscher is paid via a French National Research Agency “Investissements d’avenir” program grant (ANR-15-IDEX-02) at Université Grenoble Alpes awarded to Hans IJzerman.
This is a cross post from the PSA blog.
Patrick S. Forscher, Bastien Paris, and Hans IJzerman
How many resources does the PSA possess? This is a question that affects many activities within the PSA — prime among them the annual decision of how many studies the PSA is able to accept. Here are a few other examples:
- People involved in the selection of studies must decide whether the PSA can feasibly support studies with special requirements, such as a proposal to conduct a multi-site EEG project or a project involving a low-prevalence (and therefore difficult to recruit) population.
- People involved in writing PSA-focused grants must be able to accurately describe the size and scale of the network to make their grant arguments and planning concrete.
- People involved in managing the PSA’s finances need to know the people and projects that have the highest financial need.
- People involved in regional recruitment need to know how many members are currently located in a specific world region and the number of participants those members can muster for a typical PSA study.
In the PSA’s first three years, we have had to rely on ad hoc sources to answer questions about PSA resources. Today, with the release of the PSA’s first study capacity report, we now have a source that is more systematic. This blog post describes the logic that underlies the report, gives some of its top-level findings, and outlines what we plan to do with the report now that we have it.
How to think about and report on PSA resources
The PSA’s most basic activity is running multi-site studies, and one of the most fundamental resource-dependent decisions PSA leadership must make is how many proposals for these multi-site studies the PSA will accept. Thus, a single multi-site study provides a useful yardstick for measuring and thinking about PSA resources.
The PSA’s newly-ratified resource capacity policy takes just such an approach. It considers PSA resources from the perspective of helping PSA leadership decide how many studies they should accept in a given year. From this perspective, the most basic unit of analysis is the study submission slot, a promise by the PSA to take on a new multi-site study. Study submission slots are limited by at least two types of resources:
- Data collection capacity. This is the PSA’s ability to recruit participants for multi-site studies. Data collection capacity is mainly governed by the number of PSA members located in psychology labs throughout the world. However, money can also expand the PSA’s data collection capacity; the PSA has occasionally contracted with panel-provider firms to recruit participants on its behalf.
- Administrative capacity. This is the PSA’s ability to perform the administrative tasks required to support multi-site studies. Administrative capacity is mainly governed by the availability of labor, whether that labor be paid or volunteer.
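One way to picture how these two resources jointly limit study submission slots is to treat the scarcer resource as the binding constraint. The sketch below is only an illustration of that idea, not the rule defined in the resource capacity policy; the function name and all per-study figures are hypothetical.

```r
# Number of study slots supported by both resources: the scarcer one is the binding constraint.
# 'study_slots' and every number passed to it below are hypothetical.
study_slots <- function(participant_capacity, participants_per_study,
                        admin_hours, admin_hours_per_study) {
  by_data  <- participant_capacity %/% participants_per_study  # slots the data collection capacity allows
  by_admin <- admin_hours %/% admin_hours_per_study            # slots the administrative capacity allows
  min(by_data, by_admin)
}

slots <- study_slots(
  participant_capacity   = 20000,  # illustrative yearly participant capacity
  participants_per_study = 5000,   # hypothetical sample size per study
  admin_hours            = 6000,   # hypothetical administrative hours available
  admin_hours_per_study  = 2500    # hypothetical administrative hours needed per study
)
print(slots)
```

In this made-up scenario, administrative capacity rather than data collection capacity would limit the number of slots.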
The resource capacity policy also allows for the possibility of study slots that add on special requirements or evaluation criteria. These special submission slots might require, for example, that any studies submitted for consideration to that slot involve EEG equipment. Alternatively, the slots might require that the submitted studies involve investigating the psychological aspects of the Covid-19 pandemic. We will go into more detail about how we think about these special submission slots in a later post. For the time being, we simply note that assessing our resource capacity will allow us to understand the sorts of special submission slots we can accommodate.
The PSA’s data collection and administrative capacities are both in flux. The PSA’s ability to accommodate more specialized types of studies also fluctuates on a yearly basis. Moreover, the PSA is committed to the cultural and national diversity of psychology research, a commitment that depends on its reach in under-resourced countries. Accurate assessment of all these capacities therefore requires ongoing documentation of the PSA’s members, member characteristics (including country of origin), and its yearly activities. Currently, our documentation happens in a shared Google Drive, Slack, the PSA’s OSF project and the subprojects for each of its studies, and the recently created PSA member website.
According to policy, these various sources of documentation are consulted in a comprehensive way to form a complete picture of the PSA’s resources. This consultation results in an annual study capacity report, which can inform decisions and activities involving the PSA’s resources.
Findings from the first study capacity report
The first PSA study capacity report is large and comprehensive. Here are some big-picture findings:
- The PSA currently has 1,400+ members from 71 countries.
- Six of the PSA’s seven studies are still collecting data.
- Based on our past data collection capacity, we have the ability to recruit a minimum of 20,000 participants over the upcoming scholarly year for new PSA projects.
- About two out of three PSA members come from North America (24%) or Western Europe (41%).
- We do not have sufficient information to accurately estimate the number of administrative hours available for each PSA role.
However, these big-picture findings hide a lot of detail that may be important for PSA decision-making. For example, here are a few additional tidbits that come out of the report:
- The number of PSA member registrations almost tripled as a result of the COVID-Rapid project.
- As of the report’s writing, and excluding PSA007, the PSA needs to recruit 30,000 participants to complete its active roster of projects.
- About 20% of the PSA’s membership have a social psychology focus area.
- About 90% of people in active PSA administrative roles are located in North America (63%) and Western Europe (21%).
If you’re interested in digging into more of these details, you can find the full report here.
As outlined by policy, the main purpose of the report is to inform decisions about how many studies the PSA can accept in the next wave of study submissions. Thus, an important next step for this report is for the upper-level leadership to use the report to come to a decision about study submission slots.
However, the study capacity report has already catalyzed a number of ongoing conversations about what the PSA is, what it should be in the future, and how the PSA should go about meeting its aspirations for itself. Some of these conversations have resulted in their own dedicated blog posts, which will be posted to the PSA blog in the next few days.
In the meantime, we welcome your thoughts about the PSA’s study capacity and issues related to it. We believe that compiling this report has been a useful exercise precisely because the process of compiling the report has inspired so many useful conversations about the PSA’s direction and goals. This reinforces our commitment to maintaining this useful reporting structure in future years.
Funding Note: The study capacity report was made possible via the work of Bastien Paris; his internship at Université Grenoble Alpes is funded by a grant provided by the Psychological Science Accelerator. Patrick S. Forscher is paid via a French National Research Agency “Investissements d’avenir” program grant (ANR-15-IDEX-02) awarded to Hans IJzerman.
Note: We need feedback from potential translators and researchers who need translation services. If this describes you, fill out one of the two surveys below!
Psychological science is dominated by researchers from North America and Europe. The situation in Africa exemplifies this problem. In 2014, just 6 of 450 samples (1.3% of the total) in the journal Psychological Science were African. In Africa, language issues exacerbate the more general problem of underrepresentation; only 130 million out of 1.3 billion Africans are proficient in English, despite 24 out of the 54 countries having English as their official language.
We propose a paid translation service that can help overcome this problem. Our service will translate across many languages, but we will specialize in translations between English and African languages. Such a service can both help local African researchers access English-speaking people as research participants and allow English-speaking researchers to access over one billion Africans (~12% of the world population) as participants.
Furthermore, a paid translation service can help provide resources to translators who may lack resources, such as those in Africa. We expect to draw many of our translators from among the ranks of African researchers whose universities cannot provide them with sufficient salary or research money. Because the Psychological Science Accelerator has both a need for translation and experience with the full translation process, we propose that this paid translation service becomes a formal part of the Psychological Science Accelerator. In the blogpost, we link to surveys for users (typically researchers) and service providers (typically translators) to investigate whether sufficient demand and supply exist for such a service. Because we plan to focus on African languages, the remainder of this post focuses on Africa. However, we invite researchers and translators who focus on other languages to complete the two surveys at the top of the post, as we will also offer general translation services.
Why is there a need for a paid translation service?
A paid translation service can help build a network of translators proficient in African languages and coordinate the translation process. Particularly for the underrepresented and under-resourced African continent, a paid translation service can help to improve the efficiency of translation, quickly identify and resolve roadblocks in the translation process, and ensure timely delivery. A paid translation service resembles a business venture, like existing translation companies, and can bring financial, social, and educational benefits.
The Envisioned Partnership with the Psychological Science Accelerator
To ensure the highest-quality content, we envision this paid translation service being integrated into the Psychological Science Accelerator (PSA) as a service provider. The PSA is the largest network of researchers in the field of psychological science. At present, the PSA consists of 1021 member researchers in 73 countries. With this huge network of researchers, we envisage a steady demand for the paid translation service. Of course, researchers who are not members of the PSA should be able to use the service as well. It is unlikely that this initiative can be started without a financial investment. Any business venture requires some risk-taking, and thus investment, particularly if we want to ensure the involvement of African researchers.
From a business point of view, a paid translation service can be a source of revenue for the PSA. Additional benefits may include expanding the network, especially to underrepresented populations in Africa (thereby promoting diversity and inclusion), creating a pool of experienced translators with whom the PSA can work, and paying fair wages to support African researchers. It can also help improve generalizability in psychological science. Indeed, a paid translation service can help connect researchers from richer countries (such as those in Europe and North America) with African researchers to conduct research in currently understudied areas.
The process: users and service providers.
We can think of translation service operations in terms of the supply of and demand for translation services. On the demand side are the users. These are the buyers of the service provided by translators. They could include researchers, research institutes, companies, and any other entity that is interested in accessing translation services. They are the initiators and managers of the research projects and provide the funds for the translation service. On the supply side are the service providers. These are the organizations that run the operations of the translation service. This is the main supply unit: it manages the networks of translators and provides expertise, logistics, and financing.
What’s in it for users?
Users interested in solving generalizability problems in psychology can rely on the PSA’s expertise to conduct crowdsourced studies. For instance, the PSA has already developed a translation procedure, which was employed for the PSACR studies. Conducting research of this magnitude requires translating research materials, especially when trying to reach a multilingual population like Africa’s. Africa is really, really large and has many local languages. Relying only on English will probably reach only the more elite and more highly educated populations. However, unless all translations are provided for free, translation requires money to ensure fairness between users and service providers. With a paid translation service, researchers will benefit from access to a high-quality service from a network of translators at an affordable cost and with timely delivery. As a side benefit, such a translation service can help connect users with local researchers to help with data collection.
To understand the demand for this service, we have created a brief (< 5 minute) survey which can be found here.
What’s in it for service providers?
Despite the translation infrastructure at the disposal of the PSA, efforts to translate the PSACR research measures resulted in translations into only 2 (Yoruba and Arabic) of the thousands of indigenous languages in Africa. With a paid translation service, there are benefits for service providers. First of all, fair wages can be provided to African service providers. Second, those who are interested in participating beyond translation can become co-authors of high-quality research projects and receive training in the newest open-science practices and psychology concepts.
How will the translation service work?
The translation service is an integral part of the supply system. It consists of groups of experts who work directly with the service provider and produce the translated copy of the original materials. This process will require translators, translation coordinators, and implementers. The primary tasks are to translate research materials, recruit and train contacts, and implement translated notes. We also envision that translators can become involved in data collection if desired by both parties.
Translation will occur in the following steps:
- The user provides an English version (or, if translators are available, a version in another language, such as French, German, or Dutch) of a study or measures in .qsf, .pdf, .doc, or other file format to the PSA.
- The PSA, through its internal working mechanism, identifies the area of need for translation and transmits the original document to the translation service.
- The translation service translates this file into the target languages, coordinates the process, and implements the translated survey in the target software.
The translation would be financed by the user (the customer). Payment would be made through the PSA to the translation service. Translation services are usually charged per word, and rates can vary between $0.08 and $0.28 USD per word. Beyond the time spent, the number of words to translate, and the number of target languages, translating research materials can be costlier than general translation. To price these services, we therefore have to consider a number of factors, such as validation (i.e., back-translation), editing and proofreading, specialty, urgency of the job, scarcity of translators for a language, source language (e.g., translators from English into African languages are less scarce than translators from French into any African language), coordination and implementation, and payment to the PSA as the service provider.
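As a rough back-of-the-envelope sketch of what per-word pricing implies, the example below multiplies a word count by the number of target languages and the per-word rate. The function name, the 2,000-word survey, and the three target languages are hypothetical; only the rate range comes from the figures above, and the other cost factors listed (validation, coordination, urgency, and so on) are deliberately left out.

```r
# Simple per-word estimate: words x target languages x rate per word.
# 'translation_cost' and the example inputs are hypothetical.
translation_cost <- function(n_words, n_languages, rate_per_word) {
  n_words * n_languages * rate_per_word
}

# Rate range in USD per word, as quoted above.
rates <- c(low = 0.08, high = 0.28)

# Hypothetical survey of 2,000 words translated into 3 languages.
estimate <- translation_cost(n_words = 2000, n_languages = 3, rate_per_word = rates)
print(estimate)
```

Because the rate is passed as a vector, the result is a low and a high estimate for the same job.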
Call for translators
We are trying to get information about the translation service from translators who are proficient in English and at least one non-English language. We are especially interested in recruiting translators from the African continent, though we welcome the input of translators from other countries as well. This search for information includes costing and availability of translators. We are looking for the opinion of academics, researchers, students, and other interested persons in the fields of the social sciences or linguistics with some experience in translation, as well as professional translators.
We invite people who are interested in translating for payment to fill in our survey here.
This blog post was written by Adeyemi Adetula.
On October 2, 2020, Eiko Fried gave a talk via Zoom for our department (laboratoire in French) LIP/PC2s at Université Grenoble Alpes, discussing how a lack of theory building and testing impedes progress in the factor and network literature. The abstract of his talk is available here and you can see his talk below!
Today, September 16, 2020, we organized the yearly lab philosophy/workflow hackathon of the CORE Lab. To get all the lab members on the same page and to reduce error as much as possible, we have a lab philosophy that is accompanied by various documents to facilitate our workflow. But research standards evolve and we also often notice that the way we conduct our research is not as optimal as we would like it to be. As new tools seem to be created daily, we also cannot keep up with new developments without hurting our research. And it is important that those who do research daily provide input in the procedures they work with. That’s why we decided on a meeting once a year, where all lab members can have their say in how we are going to work that academic year. The new product is available here.
This year, the procedure was the following:
- Hans wrote a blog post about the procedure before the hackathon.
- All lab members (Patrick Forscher, Bastien Paris, Adeyemi Adetula, Alessandro Sparacio, and Olivier Dujols) submitted to Hans in a direct message via our Slack what they like, what they don’t like, what they want to add, and what we should do better (in relation to what is already in the lab philosophy).
- Hans gathered all the information, organized it by category (and commented on some of it), and then posted it in a public channel in Slack again.
- All lab members got a chance to discuss each other’s suggestions via Slack.
- Hans created a Google Doc and organized the information by category.
- This morning and after lunch, we discussed points that we were still not entirely in agreement on or that we thought were important enough to discuss in more detail (e.g., should we write a section on COVID-19? Should we include standard R Snippets on our OSF page? Do we think our Research Milestones Sheet is important, and, if it is, how do we ensure frequent updates? Do we want to keep our Box.com account? Do we want to start integrating GitHub for version control of our data and analysis scripts? Do we want to participate in the paper-a-day challenge or do we want to encourage regular reading in a different way? For all of these: how do we ensure adoption as well as possible?). We also went through a list of new tools that we had found via various sources (usually via Twitter, thanks Dan Quintana), and decided which ones we wanted to adopt.
- Then lab members claimed tasks they wanted to complete: writing new sections, getting rid of obsolete ones, or adding information to our workspace. All lab members also reviewed our lab canon.
- To get a sense of community that we usually miss, we ordered food and ate “together” via Skype.
- Once the document was finished, Patrick and Hans reviewed the changes and made final decisions.
Here are some things we really liked (this is a non-exhaustive list):
- Our code review section.
- Our section on work-life balance.
- Our clear communication of lab values.
- Our test week data.
- Our daily stand up sessions.
Here are some of our major changes (again, a non-exhaustive list):
- We expanded our onboarding section.
- We changed our Research Milestones Sheets (RMS) and added some tools. For example, by default, we will now use Statcheck to check p-values in our articles and use Connected Papers to help with our literature reviews. We also created a new way to ensure regular updating by integrating the RMS into the weekly meeting agenda of the PhD students (and emphasized that people can earn rewards four times a year if other lab members do not earn theirs).
- We added a section on intellectual humility, encouraging lab members to write Constraints on Generality and to rely on evidence frameworks to not go beyond their data.
- We added a section on COVID-19.
- We already had a journal club and a weekly writing block but did not describe it in the document. This has now been added.
- We lagged in regularly posting releases on our website. We will do monthly releases of our products (e.g., articles, interviews, blog posts, and presentations) and have appointed lab members who are responsible for checking for updates. Bastien Paris will ask lab members for their updates and Bastien and Olivier Dujols will post the update. Hans will write blog posts to summarize what we release.
- We removed information on running studies in Grenoble and moved that to a private component on the OSF.
- For the coming year, we will try to integrate Zotero in our workflow. Patrick Forscher will give a brief workshop on how to create and share libraries. Each member of the team will then generate their own libraries, after which we will use that to create various joint libraries.
- In our new projects and wherever possible, we will start with collaboration agreements that we borrow from the Psychological Science Accelerator.
- We decided to add a section on the TOP Factor. In the past, we had a section on favoring open access, but the new TOP Factor allows us to articulate our values more clearly (thanks, COS). Flexibility for non-PIs still exists to submit to journals with high JIF, but we felt that articulating our preference for journals with high TOP Factor communicates our values and hopefully changes matters in the long run.
- We got rid of our “sprint planning”. While it sounded like a great idea to specify medium-term goals, in practice it did not work as well and did not add much to the daily standups and weekly one-on-one meetings.
- We decided against the paper-a-day challenge, but instead we created a Google Sheet where we all report on one article we read per month (starting with those from the lab canon).
- We re-emphasized the importance of using our presentation template and appointed Alessandro Sparacio to ask lab members to send him their presentation to post on our lab GitHub.
In the coming two weeks, we will also update our Research Milestones Sheet for our existing projects and use Tenzing to clearly specify contributor roles in our existing projects. We still struggle with quite a lot. How, for example, can we better discuss our longer-term goals outside of journal club and our daily standups? (We feel we miss face-to-face contact where we can discuss outside of work.)
We are ready for the new year!
But now…time for some relaxation.
Once a month, the CO-RE lab organizes a journal club. Before each journal club, all journal club members have the option to propose one or two articles for the group to read in advance. The articles may be about any topic related to the CO-RE lab’s shared interests of interpersonal relationships, meta-science, and research methods (see our lab philosophy for more details), but any science-related article that will generate a good discussion is permitted.
During the journal club itself, the journal club member who proposed the article gives a short summary and asks some discussion questions. The journal club’s discussions are conducted in English and moderated by CO-RE lab member Patrick S. Forscher.
Because we have moved our journal club online, we have the possibility to invite two underresourced researchers to be part of our journal club for the 2020/2021 academic year. To see the articles we have discussed in the past, you can see our Journal Club Overview Sheet. As we understand that Internet access may be a challenge for underresourced researchers, we will support the two selected researchers with 100 Euros to pay for Internet access (half paid up front, half paid at the end).
There are some conditions to being part of our journal club and receiving the 100 Euros:
- You have no grants of your own.
- You need the support to be part of international networks.
When selecting the two researchers, we will give priority to those working in countries with a lower GDP per capita (according to the World Bank’s data).
To apply to be part of the CO-RE Lab journal club and receive the funds for Internet access, we ask you to fill in this form. In the form, we will ask the following information:
- One sentence on why you want to be part of our journal club.
- A commitment to being part of the journal club until the end of the next academic year (2020/2021 by the European Academic calendar).
- An article that you would like to discuss during journal club (we want to know your interests!). We are very open to discussing articles outside the North American/European mainstream (see for example this article by Ojiji we discussed).
- A CV that you can upload, so that we know your background.
If you participate in our journal club, we will grant you affiliate membership of the CO-RE lab, we will make ourselves available for advice (if you’d need it), and you get access to our Slack. If you have any questions regarding this opportunity, please contact Patrick Forscher (firstname.lastname@example.org) or Hans Rocha IJzerman (email@example.com). The offer is open to researchers at all levels (from undergraduate to professor).
To try to get everyone on the same page in our lab, upon my arrival in Grenoble I wrote a “lab philosophy” (maybe “written” is too big a word: the first draft was heavily inspired by Davide Crepaldi’s lab guide, which was in turn inspired by Jonathan Peele’s lab guide). This lab philosophy is complemented by an OSF workspace that includes some useful R code, shared data (hidden from public view), the CRediT taxonomy to identify contributorship within our own lab, and a study protocol for social thermoregulation. When you want to do (open) science as well as possible, it is important to have some kind of shared understanding. That means, for example, creating templates so that master’s students, PhD students, and postdocs know what I have in mind when I want to do research. We have created, for example, a template for exploratory and one for confirmatory research (we know that this dichotomy is overly simplistic, but it at least helps students structure their research). We have also started using these templates in our teaching (for example to help master’s students understand how to structure their project on the Open Science Framework). Creating the lab philosophy also allowed me to outline my approach to research and what students can expect from me.
However, it does not only allow me to communicate what I expect. It also allows the people that I work with to correct the process or get rid of tasks that may seem overly burdensome, not very useful, and too bureaucratic (after all, I heard that the French don’t like bureaucracies…). Every year in September we get together to revise our lab philosophy, so the current draft is more a collaborative document than my lab philosophy. The goal of this post is therefore mostly to document that process. Here’s what we do:
- Each member identifies around three things in the lab philosophy that they do not find useful/outdated.
- Each member identifies around three things in the lab philosophy that they find very useful and absolutely want to keep.
- Each member identifies around three things that they would like to add to the lab philosophy.
Following this process, the PI (me) collates all the information and puts it up to a vote amongst all the lab members. We then get together (usually with good coffee, although “together” this year will probably be a Skype call) to update our lab philosophy and to commit to our new way of working for the coming year. This also means reminding ourselves of things that we didn’t do sufficiently yet (I was rereading last year’s update and see that we did not adopt collaboration agreements, for example, and we need to do a better job updating our Research Milestones Sheet) and things that we think are going really well. This year, we will also write a post to document what we have done during our update.
Perhaps you have a lab philosophy yourself. Or, you have comments/critiques/compliments on our lab philosophy. Post your comments and the links to your lab philosophies here, so that we can collect them, so that we can read them, so that maybe we can steal some of your practices, and so that other people who read this post can find lab philosophies other than ours.
This blog post was written by Hans IJzerman.
Image credit Martin Sanchez
Psychological science should be a truly global discipline and psychologists should be poised to understand human behavior in any kind of context, whether it is urban or rural, developed or underdeveloped, WEIRD (Western, Educated, Industrialized, Rich, and Democratic) or non-WEIRD. To arrive there, we need to ensure that 1) researchers from those different contexts are included, but we also need to ensure that 2) researchers from those contexts adhere to the highest standards in scientific research. Do we, as African researchers, do enough for the credibility and acceptability of African psychology?
To answer this question, we will first analyze an article by the late Ochinya Ojiji – a well-known Nigerian scholar. We then argue how the adoption of open science initiatives can start to answer Ojiji’s call for greater rigor amongst Nigerian and, more broadly, African researchers. Such greater rigor, in the end, can help ensure we will have an equal and quite relevant voice in psychological science. This voice by Nigerian and other African researchers is essential for the development of more mature psychological theories that are more generalizable across various contexts.
The state of Nigerian psychology according to Ochinya Ojiji
In 2015, Ochinya Ojiji, a Nigerian social psychologist who worked for 28 years in Nigeria at the Universities of Jos and Uyo and at the Institute of Peace and Conflict Resolution in Abuja, reflected on how psychology as a science had fared in Nigeria in the past five decades. In the very beginning, Nigerian researchers were taught by Western psychologists. Though only very few psychology departments had been established during this era, Nigerian psychology was taught as a proper scientific field (and recognised as such) with the necessary facilities to conduct rigorous scientific research (like labs to conduct experimental research). The considerable Western influence meant that early Nigerian psychologists maintained relatively high standards in conducting their research. In the two decades that followed, Western influence decreased. Ojiji remarked that the decline of Western influence meant a rise in various unwholesome practices and activities led by homegrown psychologists. These practices include the proliferation of substandard and profit-oriented local journals, little quality control of curricula at universities, and the under par teaching of senior academics via “shuttling” between neighboring universities, which ultimately affects the quality of training of these students.
Ojiji mostly presents these as observations and no data are reported on the frequency of the various occurrences. It is thus unknown to what extent these practices influenced Nigerian psychology. Despite these shortcomings, we feel comfortable in relying on Ojiji’s experience as a one-time Editor-in-Chief for one of the most important journals in Nigeria, as an external examiner to a number of universities, and as someone who has taught in most parts of Nigeria to observe and characterize Nigerian psychology as a “folktale psychology”. To counter the issues he observed, Ojiji called for the NPA, the Nigerian regulating body for psychology (that is, unfortunately, not backed by formal legislation), to improve quality-control standards. Yet at present, from Ojiji’s writings it is unclear how we can implement his suggestions to improve the quality of Nigerian (and perhaps African) psychology.
It should be clear that the unguided practice of psychology as a science in Nigeria has put us off track and is thus hurting the quality of our science. To ensure we again improve the quality of our science and to develop a higher-standard indigenous psychology, we can again look to our colleagues from North America and Europe and the recently emerging “open science” movement to help inspire some much-needed reform in African psychological science.
Open science: an opportunity for African psychological science
Although African countries and African research are quite heterogeneous in respect to their educational structure and local realities, institutions in individual African countries share some common challenges in African psychology such as lack of international recognition, lack of funding, limited resources and facilities, limited legislative backing and so forth. However, it is important to know that some of the problems Ojiji pointed to have one thing in common: there seems to be a lack of verifiability and responsibility within Nigerian psychological science. This was a problem in North American and European psychology as well and they are starting to fix that problem.
One of the central tenets in the open-science movement is to increase verifiability and responsibility. The UK Royal Society’s motto illustrates this well: nullius in verba (take no one’s word for it). The open-science movement is quickly growing in Europe and North America and this movement presents unprecedented opportunities for Nigerian and African researchers.
Specifically, researchers in the open-science movement make available research articles for free on preprint servers, they share their data and research scripts, while they also create helpful resources to learn how to improve one’s research. What is also interesting to help improve the quality of science is the emergence of Registered Reports, where researchers can submit the method of an article before data collection (and where the report is published without paying attention to significance levels).
We believe that adopting open-science practices can answer some of Ojiji’s concerns and can vastly improve Nigerian, as well as African, psychological science. Participating in the open-science movement arguably presents the biggest potential to level the playing field between North American/European and African psychology. It also offers African researchers a global platform to practice credible science and to help shift the perception in at least some African countries that psychology is not a science. But what are some ways to start practicing open science?
How can African researchers engage in open science?
There are various initiatives that African researchers can turn to that we have outlined before and we will provide a (non-exhaustive) list here:
- Preprint servers. There are now various preprint servers available (like AfricArXiv, PsyArXiv), which allow researchers across the world to freely access scientific research. In our lab, the CORE Lab, we submit a preprint to such a server upon article submission (see our lab philosophy here). Preprint servers allow researchers to share their newest work, without a long turnaround from a journal.
- Sci-Hub. This website collects the login information from various universities to allow access to many articles that would otherwise be behind a paywall. Sci-Hub is not legal in most countries, so we would never dare to recommend its use, especially in African institutions that cannot always muster the high fees to pay for journal subscriptions.
- The Open Science Framework (OSF). The OSF is free and allows researchers to share their data, materials, and analysis scripts. It also allows researchers to “pre-register” their hypotheses. It further allows for easy collaboration between collaborators across the world (researchers also make available templates for others to build on, like our lab does here).
- R and RStudio. R is a programming language used for statistical computation and analysis. It is useful for writing your analysis scripts. It has many advantages over software like SPSS, as it is free and the way it works allows for much better verification. RStudio is a supporting application (an integrated development environment) that greatly facilitates the usage of R. Note that R is not easy, but there are now some excellent resources online to learn it (like this course by Lisa DeBruine, which was translated into French and is available here).
- GitHub is a free service that supports transparent and verifiable research practices. It allows you to publicly archive research materials, allows for much easier collaboration between researchers, and, importantly, permits good version control.
- Code Ocean is a research collaboration platform that supports research from the beginning to when these studies are published. Code Ocean provides you with the necessary technology and tools for cloud computing and built-in best practices for reproducible studies. For example, if you run analyses on a different platform or with a newer R package, it is possible that the results vary. Code Ocean allows you to directly reproduce the results as planned.
- The Collaborative Replication and Education Project (CREP) is a crowdsourced initiative that is focused on conducting replications by students. The CREP is a pretty unique learning opportunity for people interested in open science, as it has established templates and extensive quality control from researchers around the world. They would be very happy to support African researchers.
- Psychological Science Accelerator (PSA) is a network of over 500 labs from over 70 countries across 6 continents conducting research studies across the globe. With an emphasis on open science practices and different research roles such as test instruments translation, data collection et cetera, the PSA currently presents arguably the most accessible opportunity for African researchers to participate in international studies such as the ongoing COVID-19 rapid studies with some African collaborators (we wrote about the PSA before). Note that current participation from Africa is modest, so this is where there is a real opportunity for African researchers.
- ReproducibiliTea. With over 90 institutions spread across 25 countries, this grassroots journal club initiative provides a unique and supportive community of members to help young researchers improve their skills and knowledge in reproducibility, open science, and research quality and practices. You can organise your own version of it.
There are also tons of other initiatives that we have not mentioned yet, like the Two Psychologists, Four Beers podcasts, or the Everything Hertz podcast. There are thus tons of free opportunities to learn.
How North American and European researchers can support African scientists
But without structural changes to the way science works, open science will not yet level the playing field as access for African scholars is still difficult. Some structural changes in the way that psychological science currently operates will also be necessary to support African researchers to become part of the process. We again provide a (non-exhaustive) list here:
- Representation in formulating and implementing science policy. At present, there is only one person in an initiative like the PSA on the African continent. For global network initiatives such as the PSA, the representation of African researchers must be engraved in their policies. African researchers should also become intimately involved in implementing these policies (and thus not only be involved in data collection for the PSA). This can start with African representation in the board and in various committees.
- The waiving of fees and communicating that in associations’ policy. Some organizations, such as the Psychological Science Accelerator, allow reduction or complete waivers of obligatory fees. Other organizations (like SIPS, SPSP, APS) should strongly consider offering free memberships to African researchers and make this explicit, as these fees may discourage African researchers who cannot afford them.
- Recognition of the realities of third-world countries. When doing crowdsourced research or writing research collaboration documents, one must also factor in the realities of third world countries. African researchers are systematically disadvantaged in these endeavors due to their lack of access to the same level of infrastructure. For example, many African researchers do not have sufficient internet to co-write a manuscript on a fast schedule. Collaborating on constructing materials can also be a challenge. Providing African researchers with a few hundred Euros per year to pay for things like reliable internet can considerably reduce this systematic disadvantage.
- The training of research collaborators. Due to lower levels of science infrastructure, African researchers do not have the same training opportunities as researchers in other regions. This manifests itself in their lower levels of experience with initiatives such as open science. Providing access to training materials, preferably in indigenous African languages, can go a long way toward reducing or eliminating this training gap.
- Dissemination of research. AfricArXiv already exists. This is an excellent initiative and continued support for this preprint server would mean a lot for African scientists. In the same vein, paid journals in psychology should allocate a number of free open access articles for African researchers per year.
- Funding. IRBs in Africa are often not cheap: they require a fee to go through, which is often prohibitive for conducting research. Providing research grants for Institutional Review Board (IRB) fees, for data collection expenses, et cetera will go a long way in facilitating the research process and overall success.
- Facilitation of research visits to universities. By inviting African researchers to your institute, they can benefit from the facilities at your university and they can become an equal partner in your research process.
- Journal audits. Journals should examine how many submissions they receive from Africa and how many articles are accepted, and, if the numbers are low, implement policy to counter that.
African and Nigerian psychology should become a normal part of the research process, if we are to understand humans the world over. Researchers in psychological science have pointed to the need for generalizability and for that to happen, we need to be there. However, in order to get there, Nigerian and African psychologists need to raise their standards and North American and European researchers can support us in achieving our goals. Now is the time to become a vital part of psychological science, as open science presents us with unprecedented opportunities.
There are additional open science resources available that we have missed. Please add them in the comments or shoot me, Ade, an email (firstname.lastname@example.org). We will update this blog and credit you for your contributions. Additions will be especially helpful as we will translate this blog to various languages.
This blog post was written by Adeyemi Adetula, Soufian Azouaghe, Dana Basnight-Brown, Patrick Forscher, and Hans IJzerman
The featured image is licensed under a CC BY-SA by Alessandro Sparacio
The coronavirus disease 2019 (COVID-19) outbreak had a massive impact on our lives. The lockdown forced an abrupt change of habits by severely limiting personal freedoms. The measures taken against COVID-19, such as the lockdown, may well affect people’s mental health. A general population survey in the United Kingdom (with over a thousand people) revealed widespread concerns about the effects of the current situation on their levels of anxiety, depression, and stress (Holmes et al., 2020; Ipsos MORI, 2020).
If the lockdown affects you in a way that puts you at risk for developing mental health issues such as anxiety, depression, or excessive stress, you would probably want to find a strategy to regulate those states. One way to do so that seems especially suited for the current situation, as it does not require large spaces and can be practiced comfortably at home, is self-administered mindfulness. Self-administered mindfulness is a type of meditation that consists of increasing the attention to and the awareness of the present moment, with a non-judgmental attitude (Brown & Ryan, 2003). Most awareness exercises like self-administered mindfulness are based on the same idea: each time the mind wanders, the attention is gently brought back to one’s breath or bodily sensations.
Usually, mindfulness interventions are parts of large programs (which can last 8 weeks) requiring the presence of a qualified instructor. However, self-administered protocols can be engaged in via self-help books, smartphone apps, or computer programmes; you don’t always need to be with others to learn and practice mindfulness. These interventions share features such as a non-judgmental attitude and an acceptance of inner experience with other mindfulness protocols. In contrast with other protocols, however, self-administered mindfulness does not require the presence of an instructor, is available 24/7, and is less costly (Spijkerman, Pots, & Bohlmeijer, 2016). Some studies suggest that self-administered mindfulness improved symptoms of perseverative thinking, stress, and depression for a group of students (compared to a passive control group; Cavanagh et al., 2018).
Despite the study by Cavanagh et al. (2018) yielding a positive result, it is still uncertain whether there is actually evidence supporting self-administered mindfulness. It is no secret that the world of many sciences, including psychology, is affected by some “viruses” that infect the quality of our science in many ways: publication bias (the likelihood that positive results have a higher probability of getting published) and questionable research practices (which is generally used as a term to describe various techniques to obtain significant results that may not actually represent valid evidence). And this can be consequential. Fanelli (2010), for example, estimated that psychology’s and psychiatry’s published findings contain over 90% positive results, a statistical impossibility as the literature is not sufficiently powered to detect findings at that rate. This means that the psychological literature is very likely to contain unreliable findings and that findings that are stable are likely overestimated in their “effect sizes” (a statistical concept that reflects the magnitude of the phenomenon of interest).
The relative penetration of these viruses into the self-administered mindfulness domain is uncertain. We are far from saying that self-administered awareness interventions are useless, but some caution is needed before we can say that self-administered awareness has demonstrated efficacy in reducing people’s stress levels. A systematic review of the literature or a statistical summary of the findings (i.e., a meta-analysis) is necessary to assess the extent to which this strategy is affected by publication bias and to provide an estimate of the true effect behind this type of meditation. We have reasons to believe that existing meta-analyses (e.g., Khoury et al., 2013) did not do this correctly (see, e.g., this post by Uri Simonsohn if you are interested in why). Alternatively, a Registered Report (a type of research protocol in which the manuscript is pre-registered and receives peer review before data collection) could help identify the efficacy of self-administered mindfulness. A Registered Report is the best “vaccine” against the research problems we outlined before, for two reasons: 1) the editor commits to publishing even non-significant findings (discouraging the use of questionable research practices), and 2) if negative results are also published, it is possible to compute a reliable estimate of the effect size of interest. Here too, however, a Registered Report that investigates the efficacy of mindfulness in regulating stress levels is missing.
That means that, at the moment, the answer to the title is: we don’t know. As an intervention, self-administered mindfulness currently has a low “Evidence Readiness Level”: we have no way of knowing whether self-administered mindfulness is a reliable intervention against any stress, depression, or anxiety that people may be experiencing as a result of the current situation. And even if we detect that self-administered mindfulness has worked in the populations that were tested, it is also pretty likely that we don’t know how this works across the world: the intervention has not yet been tested across many different populations (and I know of no research that tests it during a global pandemic).
To investigate this conundrum, part of my PhD project aims to shed light on the potential use of self-administered mindfulness for stress regulation and the affective consequences of stress. We will employ a stringent analysis workflow (including multilevel regression-based models and permutation-based selection models) to test for publication bias in various ways. In other words, I will be able to let you know shortly whether self-administered mindfulness has the benefits for emotion regulation that are currently claimed for it. Although I would have loved to have given you an answer with greater certainty, it is simply too early to tell. As a scientist, I would be remiss to say otherwise.
This post was written by Alessandro Sparacio & Hans IJzerman
Starting in October 2019, I – Olivier – have gone to the Netherlands twice to record the peripheral temperature of partners in couple therapy. In a previous blog post, I explained the basic dynamics of romantic relationships and how couples can enhance their feelings of connection and safety through Emotionally Focused Therapy (EFT). In this blog post, I will discuss how and why we investigate temperature responsiveness during so-called Hold Me Tight weekends to further enhance connection and safety in relationships.
What is Hold Me Tight?
Let’s first explain the concept of Hold Me Tight (HMT). HMT is a short version, offered either online or in person, of the EFT protocol (which I described in the previous post). After HMT, couples typically proceed into a longer and somewhat more formal protocol of EFT. In-person HMT is a program that lasts 3 days (over a weekend), and usually involves about 10 couples who are supported by therapists trained in EFT. The HMT weekend is standardized so that the program is similar across couples and across time. The HMT program starts with an introduction to love and attachment as they are understood in research. Then couples go through the following seven chapters:
- Recognizing Demon Dialogues: During this conversation, partners identify the negative cycle they enter when arguing (e.g., one is blaming and the other is emotionally closed). Identifying the root of the problem is the first step that will help them figure out what each other is really trying to say.
- Finding the Raw Spots: Next, partners learn to look beyond their immediate impulsive reactions. They discuss and share the negative thoughts (e.g., “When you say that, I think you are going to leave me”) that emerge during their negative cycle.
- Revisiting a Rocky Moment: This conversation is about defusing conflict and building emotional security. Partners analyze a specific conflict situation using what they learned during the previous two conversations.
- The Hold Me Tight Conversation: During this conversation partners practice how to be more responsive to each other. They learn how to be more emotionally accessible, more emotionally responsive, and more deeply engaged with each other. They talk about their attachment fears and try to name their needs, in a simple, concrete and brief manner. This is usually considered the main conversation of the weekend.
- Forgiving Injury and Trusting Again: In order to offer forgiveness to each other, partners are then guided to integrate their injuries into the couple’s conversations. To do so, partners reminisce and discuss moments when they felt hurt.
- Bonding Through Sex and Touch: Partners discover that emotional connection creates great sex, and that, in turn, a more fulfilled sexual life increases their emotional connection to each other. They discuss what makes them want to have sex, or not, and if they feel secure having sex, or not.
- Keeping Your Love Alive: In this last discussion partners are looking into the future. After understanding that love is an ongoing process of losing and finding emotional connection, couples are asked to plan rituals in everyday life to deal with their negative cycle. They summarize what they did during the weekend, talk about their feelings, and discuss how they will implement in their daily lives what they have learned.
These chapters focus on how couples can consciously recognize their attachment dynamics. But what if there is more to attachment?
Recognizing our inner penguin dialogues: why temperature may matter for partner responsiveness
In the previous post, we talked a bit about the research John Gottman and his colleagues did on “coregulation”. We have taken a keen interest in this concept, but we start from a radically different assumption: we try to understand if and how partners’ temperature regulation influences their feeling of safety. To investigate this, I regularly travel to the Netherlands to visit Berry Aarnoudse and Jos van der Loo, who organize HMT weekends. During those weekends, I record both partners’ peripheral temperatures.
The immediate objective of this ongoing research is to link peripheral temperature recordings to the participants’ answers to psychological questions asking about their emotions, their feelings of safety, and their perception of the dynamics in their relationships. We suspect the partners’ individual responses to the questionnaire to be related to “signature” variations in peripheral temperature at the couple level.
The procedure for this study is as follows. Before I go to the Netherlands, I typically ask couples registered for the HMT if they are willing to participate in a study investigating how attachment dynamics in couples relate to temperature fluctuations. I kindly ask interested couples to fill in (independently of each other) an online questionnaire that assesses relevant psychological and emotional variables, such as responsiveness to the partner, feeling of security in relationships (in relation to attachment theory), and willingness to be close to their partner. The answers are of course anonymous; part of this is emphasizing that neither partner will ever have access to the other’s responses (which can be a challenge in the time of open science!).
To record peripheral temperature, we rely on a sensor (ISP131001) that former members of the lab validated in earlier work (Sarda et al., 2020). This small wireless sensor is placed on the tip of people’s fingers and linked via Bluetooth to a smartphone application developed by our lab that records and stores the data on our server. (This mobile application is being developed by CO-RE Lab and the code is open source. You can find it on our GitHub repository here. Feel free to use it for your own project; please keep us up to date if you develop a new version.) The device is very light, which allows wearers to carry out their everyday activities almost as normal (see above). (Since people wear the device all day long, we provide disposable gloves so that people can go to the bathroom while wearing the device. Also, while our device seems to work pretty well for temperature measurement, the design is not really user friendly. So, feedback from users, and from partners during the HMT weekends, is something we incorporate. For example, during previous HMT measurements, people’s feedback allowed us to improve comfort while avoiding as much as possible the breakage of our very fragile material.) At the start of the weekend, I give each partner a smartphone (to be kept in their pocket) and I put one temperature sensor on each person’s finger. At the end of the day, I remove the sensor from people’s fingers. So, throughout the entire program we record people’s temperatures. The data of one couple looks like this:
The aim of this project is to understand thermoregulation mechanisms in order to help therapists and couples improve receptivity between partners. But we cannot do that with “only” data from couples in therapy. This is why, simultaneously, we are recruiting couples from the general population of Grenoble in France. We know from previous studies conducted by our lab that these couples tend to report being very satisfied with their relationship. We don’t really know why, but having data on couples from HMT and from the general population will help us identify cues for developing interventions for couples that report lower than average relationship quality. Having this variety will allow us to understand – via deep learning – how peripheral temperature variations between partners are related to partners’ scores on psychological variables (their answers to the anonymous questionnaire). Because we are focused on helping therapists and couples, we intend to develop an algorithm in the future that will manipulate peripheral temperature, which we hope will improve partners’ responsiveness. In the end, can we add another chapter to the Hold Me Tight weekend? Our data will tell.
Just before the beginning of the lockdown in France, we decided to stop collecting data in order to protect our participants’ health. But social isolation or being confined together makes the research even more relevant. The lockdown was recently lifted in France, but working from home remains the norm. Couples thus spend more time together at home than ever. Because we believe that the results of this study could help us to understand and improve intimate partner relationships, we are planning (adhering to the COVID-19 prevention measures in place) to resume the study data collection. For every 60 couples that participate, we raffle off some awards (e.g., an iPad). As we now send the sensors via postal service, people all across the European Union can participate. If you are interested in participating in the study, please contact us at email@example.com. If you are a therapist and want to help measure temperature during HMT weekends, please shoot us an email as well.
This blog post was written by Olivier Dujols and Hans IJzerman.
I – Olivier – am a PhD student. My research is in social psychology. However, the end goal of my thesis is to improve how responsive couples are towards each other after they go through relationship therapy. Diving into relationship therapy is a big step for a research-focused social psychologist. To try to improve partner responsiveness, I try to identify the psychological and physiological mechanisms that constitute partner responsiveness. This is part of a series of two blog posts in which I explain my research. In this first blog post, I explain the basic dynamics of romantic relationships from the perspective of EFT and how therapists currently help improve them. In the next blog post, I will discuss how and why we investigate temperature responsiveness to reach our goal.
Attachment dynamics in romantic relationships
People’s attachment orientations are important for how people engage in and maintain their relationships. In early life, people “regulate” their relationships by screaming, crying, and hugging (Bowlby, 1969/82). Such attachment behaviors are ways to increase closeness to the caregiver and can help the infant signal threats (such as cold, starvation, or other risks) from which it seeks protection. When the caregiver provides that protection, it serves as a secure base from which the infant can explore its environment. Mary Ainsworth built on Bowlby’s (1969) work by identifying that infants’ attachment styles may differ from each other, developing a method (the Strange Situation) that helps psychologists identify how the infant is attached. When they were first discovered, “attachment styles” were divided into three categories: A (avoidant), B (secure), and C (insecure/ambivalent). A fourth category, D (disorganized), was later added to these three (Main & Solomon, 1986).
These attachment styles transfer, at least to some extent, from relationships with parents to relationships with romantic partners (Fraley, 2019). While for children, caregivers are the main source of security, this is often a romantic partner for adults. In the social psychological literature, we usually measure people’s attachments by asking them about their romantic relationships. The Experiences in Close Relationships (ECR) scale is currently the best validated measure of attachment in adulthood (Fraley, Heffernan, Vicary, & Brumbaugh, 2011) and relies on statements like “I prefer not to show my partner how I feel deep down”, and “I often worry that my partner doesn’t really care for me”. People indicate how well each statement applies to them on a scale ranging from 1 – strongly disagree to 7 – strongly agree.
People’s attachment in their romantic relationships is scored on two continuums: from anxious to secure and from avoidant to secure. If you score low on both, you are pretty secure in your romantic relationships (if you want to test how secure you are, you can do so by going here). A person with a high score on anxiety will more frequently try to seek closeness with their partner, but will also often feel like they can lose them at any time. In contrast, a person with a high score on avoidance will less frequently try to seek closeness with their partner, and will prefer not to rely on their partner in stressful or threatening situations. People who are more avoidant are more likely to distance themselves from potential threats and disengage from their emotional reactions. In contrast, people who are more anxious tend to focus on stressful situations, which exacerbates their stress and increases their negative moods and anxious thoughts.
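As a rough illustration of how two such scores can be derived from questionnaire answers, the sketch below averages 1-to-7 responses per dimension. It is a minimal sketch under assumptions: which items belong to which dimension, and which (if any) are reverse-keyed, is defined by the specific ECR version used, so the item groupings, the reverse-keyed index, and the function names here are hypothetical.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7  # 1 = strongly disagree, 7 = strongly agree


def score_subscale(responses, reverse_keyed=()):
    """Average a list of 1-7 responses. Items whose index appears in
    reverse_keyed are flipped (1 <-> 7, 2 <-> 6, ...) before averaging."""
    scored = []
    for index, response in enumerate(responses):
        if not SCALE_MIN <= response <= SCALE_MAX:
            raise ValueError(f"response {response} is outside the 1-7 scale")
        if index in reverse_keyed:
            response = SCALE_MIN + SCALE_MAX - response
        scored.append(response)
    return mean(scored)


def attachment_scores(anxiety_items, avoidance_items, avoidance_reversed=()):
    """Return one score per dimension; lower means more secure."""
    return {
        "anxiety": score_subscale(anxiety_items),
        "avoidance": score_subscale(avoidance_items, avoidance_reversed),
    }


if __name__ == "__main__":
    # Hypothetical answers from one person: three items per dimension,
    # with the third avoidance item assumed to be reverse-keyed.
    print(attachment_scores([2, 1, 3], [1, 2, 7], avoidance_reversed=(2,)))
```

A person with low averages on both dimensions would fall toward the secure end of both continuums described above.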
How attachment theory is connected to therapy: Emotionally Focused Therapy.
Such attachment dynamics can certainly play a role in adult romantic relationships. Humans are, after all, social animals; we need connection and safety throughout our entire life. But despite this necessity, we don’t always know how to connect in our relationship. This is why some couples seek therapy. Emotionally Focused Therapy (EFT) for couples is a brief, protocol-based therapy developed by Sue Johnson (2004) that is based on principles from attachment theory combined with a humanistic and systemic approach. EFT focuses on how people experience their love relationships, and on repairing adult attachment bonds (Johnson, 2004, 2013). Specifically, the goal of the therapy is to create positive cycles of interaction between partners, so that individuals are able to safely ask for and offer support to their partner. Knowing how to be responsive to one another in turn also facilitates the regulation of interpersonal emotions.
EFT is an empirically supported, 8- to 20-session therapy (Wiebe & Johnson, 2016). A meta-analysis shows some empirical support for EFT couples therapy (Johnson et al., 2006). Studies have shown, for example, that the EFT protocol can be effective for stress management in couples (Lebow et al., 2012) and for increasing couple satisfaction (Denton et al., 2000).
A note on this evidence: EFT is evidence-based and we like the idea of EFT. We believe in the principles of the theory and the therapy. But that the therapy is evidence-based does not mean we are not critical. The entire field of psychology has been faced with a replication crisis, which means that many of our most precious findings are uncertain, as has been evidenced by many studies (see e.g., Klein et al., 2018). The crisis has taught us that the results of empirical studies are not always replicable (that is, found again when the conditions under which the first evidence was found are reproduced identically). This is partly due to small sample sizes in our studies, publication bias (which leads to an overrepresentation of positive, significant results), and a lack of pre-registration. It would be surprising to us if EFT has escaped the crisis, because doing high-quality research is incredibly hard! In our lab, we only consider research that has been done via “Registered Reports” to provide stronger evidence, and we are not yet aware of any EFT research that has been conducted via the Registered Report route. Therefore, we think that studies that show positive effects of EFT should be replicated before they can be convincing, at least to us (and ideally, they should follow these Evidence Readiness Levels). We are not sure yet whether the EFT work is at ERL3, as we have not done a systematic review on EFT.
EFT follows three basic stages, in which the couple engages in conversations. During Stage 1, “De-escalation”, both partners mindfully observe their pattern of interaction during conversations. Sue Johnson calls the negative emotional cycle partners are caught in “the dance”. From an attachment point of view, they may discover that the negative cycle creates feelings of abandonment and rejection. During that phase, the purpose is to discover that these feelings are a common enemy and that they can help each other step out of it. During Stage 2, “Restructuring the bond”, arguably the most powerful conversations take place. These conversations are also called the “Hold Me Tight conversations”. Partners discover and share their attachment fears in ways that allow the other to offer reassurance and safety. Then partners express their need to create deeper emotional responsiveness. When they express that need, they are ready to move to Stage 3, which is the “Consolidation of treatment gains”. There, the couple examines the changes they have made and how they have fixed the negative cycle. The therapist supports them in looking to the future, and helps them to reflect on how they achieved greater responsiveness.
During the therapy and during each session, the first step for an EFT therapist is to help the couple focus on the present process by asking “what is happening right now?”. By letting the partners focus on the present, the therapist puts the emotions of both partners together so that they focus on their interaction. The second step for the therapist is to help deepen the emotions, for example by asking what happens when they see tears on their partner’s face. The focus on connection and on the partner’s need can help create a new interaction that is really based on the attachment dynamic. The next step is to have one partner express what happens (e.g., “I don’t trust him/her”) when faced with the other. This helps the couple process the new step in the interaction. The therapist can then help identify what the expression of this sentiment does to the other (e.g., by asking, “What did it feel like to tell him/her that?”). The inability to express one’s primary emotions is often rooted in demand-withdraw dynamics in relationships. Acknowledging the fear of abandonment helps to break the negative cycle. At the end of each session the therapist points out how well the couple did in this process. The therapist tries to repeat and recreate these dynamics during the session at various levels of intensity.
If you are interested in EFT, you can find training tapes, research on EFT, and where to do the basic training on their website.
Conclusion: patterns of responsiveness in mind, but also in body.
The EFT protocol helps us understand how important discussions are in relationships to create deeper emotional interactions. But saying to your partner that you fear abandonment is not the only part of partner responsiveness. John Gottman and his colleagues have spent several years on understanding how people are also physiologically tied to each other. They found that couples that regulate each other physiologically are more likely to stay together. (Note, again, that this work dates from the “Era Pre-Replication Crisis” (19 years EPRC), so we are unsure about the strength of the evidence.)
Where Gottman often focused on heartbeat, we are focusing on temperature. This may not be so intuitive for humans. A pretty easy way to understand it is to think of penguins. Penguins, when they get cold, huddle with each other (see a timelapse video here) to drastically increase the temperature inside the circle. Even if it is -20°C or below in the environment, inside the huddling circle the temperature can rise to 37.5°C (Ancel et al., 2015).
We (the Co-Re Lab) think humans do the same things, although it is true that in modern times we often regulate temperature without each other. And yet, questions that concern people’s desire to warm up with one another correlate reliably with whether people want to share their emotions with their partner (or not; Vergara et al., 2019). And we suspect such regulation of basic needs is at the core of our partner responsiveness. In a next blog post, we will tell you how we investigate this during relationship therapy.
This blog post was written by Olivier Dujols and Hans IJzerman
With the COVID-19 crisis, our lives and our habits have completely changed. At least for the foreseeable future, it will not be feasible to attend courses in person. As a result, many people are starting to move their courses online. If you want to publish your courses online, there are several ways to do it. In this post, we will show you a pretty simple solution via GitHub.
Step 1: Create a GitHub repository
In order to have your manual hosted by GitHub, you need to do two things: 1) create a GitHub account and 2) create a repository (a space of “memory” where your files will be stored). If you are new to GitHub, you can find more detailed instructions on how to work with it here. For an example from a course I (Alessandro) posted (a translation of an R manual), you can view this repository and this post (warning: it is in French). Some more detailed instructions on how to work with GitHub can be found in this presentation I gave in our social cognition group (“the axe”). You can download it here.
Step 2: Convert your Google Doc to a Markdown file
In order to be displayed by GitHub, your files should be in Markdown format (if you use any other format, the preview of your files will not be available). Let’s assume that you are working with a Google Doc that you want to convert to Markdown format (I can recommend this, as there are some built-in solutions).
You will first do the conversion by following these instructions:
- Open a Google Doc from your Google drive that you want to convert into a Markdown file.
- Click on Tools → Script Editor (you can only do this if you have permission to edit the document).
In the Code.gs window you will find this line of code:
- Save the new script.
- Once you have saved the script, there will be a dropdown menu with the title “MyFunction” (as you can see in the image below).
- From that dropdown, select the function “ConvertToMarkDown“.
- Click the “Run” button (the first time you do that, you will need to give authorization)
- The Google Doc has now been converted into “.MD” (Markdown) format. It will be sent automatically to the email address associated with your Google account (with all the images from your Google Doc attached).
Step 3: Fixing the converted file
Conversion works well in some cases, in others it does not. For example, some tables are converted well, others a little less. So, check through your new file before you post it. In any case, if you have images, unfortunately you will have to add them by hand. To add images and fix the code there are some valuable tools. I used two: Atom, which is a text editor and Dillinger, which allows you to see the preview of the file in .MD format. To add images use the following line of code:
This way you will add the image called image_0.png to the Markdown file. It is good practice to organize images in a specific folder, so that they are not spread out across your GitHub repository. To refer to images in a specific folder, you can use the following line of code, which must be added to the Markdown file:
To unpack this:
- image_0.png refers to the name of the picture
- “10” is the subfolder of the folder named “images”.
It’s good practice to name the subfolder with a numeric value corresponding to the numbering of the chapter (i.e., 1 for chapter one, 2 for chapter two, et cetera). Thus, with this line of code, you are displaying in the Markdown file the image image_0.png, located in the folder “10”, which is located in the folder named “images”. The example outlined above will look like this in the GitHub repository.
This is the line of code for the course I posted:
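In Markdown, an image is embedded with the syntax `![alt text](path)`. As a minimal sketch of the folder convention described above, the helper below builds such a line for a file stored under images/<chapter>/. The function name, its parameters, and the empty default alt text are assumptions made for this illustration; they are not part of the conversion script.

```python
from pathlib import PurePosixPath


def markdown_image(filename, chapter=None, alt_text="", root="images"):
    """Return a Markdown image line for a file in the repository.

    Without a chapter, the image is referenced by its bare filename;
    with a chapter, it is looked up in <root>/<chapter>/<filename>.
    """
    # Repository paths use forward slashes regardless of operating system
    if chapter is None:
        path = PurePosixPath(filename)
    else:
        path = PurePosixPath(root, str(chapter), filename)
    return f"![{alt_text}]({path})"


if __name__ == "__main__":
    print(markdown_image("image_0.png"))              # ![](image_0.png)
    print(markdown_image("image_0.png", chapter=10))  # ![](images/10/image_0.png)
```

Generating the lines this way keeps the path convention consistent when a chapter contains many images; pasting the resulting line into the Markdown file by hand works just as well.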
And this is what was displayed for that part of the course:
Sometimes during the conversion, some images are lost. That means you will not receive the images from your Google Doc via email. In that case you have to save the images, by yourself, from the original document and put them in the folder you created. I suggest this documentation to check how to better work with files that are in Markdown format.
Step 4: Push the converted file to GitHub
- At this point you have the files converted into Markdown and a folder with the images that will be displayed in each Markdown file. What you have to do now is go to the GitHub repository and push the Markdown files, together with the folder containing all your images.
- You can upload the files directly from the GitHub website or work in your local repository. When that’s done, push the changes to the online repository.
Step 5: Embed the GitHub page into your blog
- Now that you have updated your GitHub page with the files in Markdown, they will be rendered like this. What you may want to do now is push them directly to WordPress (see e.g. this post). It can work, but while the sentences and links are copied from GitHub, images are not. I have made several attempts, but I have not been able to understand why it does not copy the images.
- The best solution now is to embed the chapters into a sort of Wiki on our blog with the links of the various chapters that are on GitHub. That’s what I did here.
- There may be a better, more elegant solution, so that the course itself is pushed from GitHub to WordPress, but I have not been able to discover it yet. If you know of an elegant solution, please let us know via the comments on this blog.
If you follow these steps you will be able to convert your own course into an online one just with Google Docs and a relatively simple workflow. You can embed your GitHub page into your blog, which could make your course available on your personal website. This can help you during these times in making your courses more easily available, and, in the long run, provide greater visibility to your work.
This post was written by Alessandro Sparacio & Hans IJzerman
Here you will find the chapters of the manual for learning to use R and RStudio. The manual was written by Lisa DeBruine and translated into French by Fabrice Gabarrot, Brice Beffara-Bret, Mae Braud, Marie Delacre, Zoé Lackner, Ladislas Nalborczyk, and Cédric Batailler.
List of translated chapters:
If your ambition is to become a scientist and an expert in a specific research area, one path is more efficient than many others. The one that we think will make you an expert quickest is writing a meta-analysis. This path is very different from one involving primary research, but it will allow you to answer many more questions than you could conceivably answer with a single experiment. We will provide three reasons why you should take the meta-analysis path. And if we still do not convince you, we hope the lessons one of us (Alessandro) is learning from his explorations in meta-analysis thus far will still be of use to you.
1) Meta-analyses allow you to have a broad view of a phenomenon of interest
Have you ever tried to go to the top of a tower and look down? The view is much more complete from there; it allows you to have an overview that you wouldn’t have had from the ground. Doing an experimental study is kind of like looking from the ground: only the result of your own experiment will be visible to your eyes. Conducting a meta-analysis instead allows you to see other people’s experiments and approaches at once. Say for example that you are interested in studying how meditation can help reduce stress levels. If you conduct a randomized controlled trial you will only know about that specific treatment and only on one particular population of participants. By conducting a meta-analysis, instead, you hopefully get insight into whether 1) meditation is more effective on individuals with certain personality traits and 2) the effects of meditation can be extended to different populations, while you may also observe when 3) meditation is effective in reducing stress levels and when effects are null or small.
There is one observation from our own path that we can already share with you. As psychologists we should be interested in how different people respond to different manipulations. Does people’s anxiety in their attachments, for example, matter for whether or not they benefit from meditation? Or is biofeedback more effective for younger or older people? The fact of the matter is that psychologists often neglect to report detailed records of the populations they study. One of the recommendations that will surely make it into the meta-analysis that Alessandro is leading is that researchers need to keep detailed protocols (like we have recommended here). In that way, meta-analysts can start using this information across many studies.
2) Meta-analyses allow you to have information about the health of the literature of interest
A meta-analysis can teach important lessons even to those who have no intention of taking this path. It is not a secret that many sciences have been hit with a replication crisis, as many replication studies have failed to obtain the same results as the original studies they sought to replicate (see, for psychology, Klein et al., 2018; Maxwell, Lau, & Howard, 2015; Open Science Collaboration, 2015). One likely reason for the replication crisis is publication bias (see e.g., Sutton, Duval, Tweedie, Abrams, & Jones, 2000). A primary goal of meta-analysis is thus to find out how severe publication bias in a given literature actually is.
In some fields, such as medicine, knowing the real effectiveness of a drug directly impacts people’s lives. However, because of publication bias, assessing the risk-benefit ratio of particular types of drugs is not easy. As but one example, Turner et al. (2008) analyzed the effects of 12 antidepressant agents on over 12 thousand people, both in terms of the proportion of positive studies and the effect sizes associated with the use of these drugs. According to the published literature, 94% of the trials were positive. Yet when Turner et al. (2008) analyzed the full set of trials registered with the US Food and Drug Administration, including the unpublished ones, the percentage dropped to 51%, and the effect sizes in the published literature turned out to be inflated by 32% on average.
Overestimating the effect of a drug has direct consequences for the choice of certain therapies, which in turn affects the health of a population (and we feel those consequences even more so now, in the midst of a health crisis). A meta-analytic approach can help us signal that there is a problem in a literature due to publication bias. Some think that meta-analysis can provide a corrected effect size by adjusting for publication bias. The jury might still be out on this, as others say that even “meta-analysis is fucked”. Even if meta-analyses cannot provide accurate effect sizes, they can provide a snapshot of the health of a particular research field (e.g., by pointing to how many results are positive and what researchers record). Based on this report of health, solutions (like Registered Reports) can be recommended to researchers in that field. It may well be that if meta-analysts do not do their work and provide recommendations, meta-analyses remain fucked for a long time to come.
3) Meta-analysis allows you to acquire skills important for your future career as a scientist
This recommendation is primarily for the starting PhD student. Stephen King famously said: “If you don’t have time to read, you don’t have the time (or the tools) to write. Simple as that”, and nothing could be more true. Reading numerous articles is the key to quickly becoming a more efficient writer. When I (Alessandro) started my meta-analysis, I may have been shell-shocked by the sheer quantity of what I had to read. But not only did my vocabulary quickly improve, I also encountered many different writing styles, which allowed me to integrate expert writers’ styles into my own. What also helped me as a beginning PhD student is that conducting a meta-analysis taught me the importance of good reporting practices and the limitations of a single study. We think, for example, that the psychological literature vastly underreports important information. We will try to contribute to change by making protocols available to the researchers in our own literature, and I will try my best not to repeat the same errors.
We can certainly recommend walking the meta-analysis path. What we have learned so far is that scientists underreport and they need to create more detailed protocols to keep good records of their work. In addition – and we are stating the obvious here – meta-analyses confirm that publication bias is a considerable problem. Finally, the exercise of doing a meta-analysis is vital for any researcher: it improves one’s writing and the body of knowledge required for running solid experimental studies.
The path to becoming a better scientist is arduous. Conducting a careful meta-analysis is definitely one of the stages that could lead you to the top. We hope to have convinced you that, when you start your research, a meta-analysis is a good path to walk to ensure that you become a careful observer of the phenomena you study.
This blog post was written by Alessandro Sparacio and Hans IJzerman.
In the past few weeks, a humanitarian, social, and economic disaster has been unfolding because of COVID-19. To stop the virus from spreading, people have been asked to engage in social distancing. Based on what we know so far, this is a wise decision, and we encourage everyone to engage in social distancing, too. At the same time, we know that being socially isolated can have extremely adverse consequences that can even lead to death. How can we adhere to governmental regulations but also maintain social contact? It is time to act now and build smart solutions. We want to build a “relationship simulator” that will keep learning and will mobilise WhatsApp, Facebook, and Zoom communicators to protect our health in this time of crisis.
What is social distancing?
Social distancing means not being in the same place together or, at the very least, not being in close proximity to each other. In Italy, shop and restaurant owners were at first required to make sure their customers kept their distance; now all shops and restaurants have been closed to make sure people stay at home. France, Norway, and Ireland have closed their entire education systems to keep people from gathering in one place. The data clearly support this decision, showing that social distancing can contain the virus.
But social distancing can also be dangerous and – in the very worst case scenario – can kill us. Research has convincingly demonstrated that people who are physically and socially isolated die sooner. The link between health and social and physical isolation is even stronger than the link between health and being obese or not, drinking six glasses of alcohol per day or not, or exercising regularly or not, and is equal to that of smoking sixteen cigarettes per day or not. In the current situation, social distancing can make things even worse: loneliness can push people to ignore the recommendations to stay away from others, especially friends and family. So how can we avoid spreading the virus without creating another, unintended consequence?
The Relationship Simulator
The first solution that comes to mind is to schedule frequent calls via social media. That is a good first step. But, as everyone who has skyped with a loved one knows, it is not enough. Physical proximity is incredibly important. People live in societies and relate to each other for good reasons: doing so used to help us to more easily gather food, to cope with dangers and predators, and to keep each other warm. We can deal with all of these problems pretty well now without direct contact. But these problems have all had consequences for our psychological makeup: we still need to be with others to stay alive. Being socially connected is a basic biological need: the late John Cacioppo likened the feeling of loneliness to hunger. So what can we do to reduce our hunger for human contact when we need to socially distance?
Food gathering and dealing with predators are not our main concerns anymore. But we still co-regulate our temperature with others, even though we have modern ways to keep ourselves warm. We rarely think about body temperature regulation in a social context, but thermoregulation is inherently social. It is a major factor in determining why people enter and nurture social relationships. This knowledge is based on diverse research findings: there is considerable neural overlap between social and thermoregulatory behaviors, people’s social networks protect them from the cold, and people respond in social ways to temperature fluctuations.
What we all intuitively understand is that touch is important for our physical and mental well-being, and social thermoregulation is the reason why. That is why social distancing can have adverse consequences for our health. So how do we resolve this conundrum – how do we keep both our distance and the literal warmth of human contact? We hope to build a “relationship simulator” that can emulate the intimacy of touch (via temperature fluctuations) when people are distant from each other. In a first phase, we want to connect a device that can warm or cool a person (the EmbrWave) to programs like Facebook Messenger, WhatsApp, Skype, and Zoom, so that people can warm and cool each other with just one click of a button during a call or conversation to simulate real-life social contact. In that first phase, we will also measure people’s temperature to see how they respond. In the next phase, we will be able to adjust the temperature manipulation through an algorithm, so that the temperature manipulation (through the EmbrWave and the sensors) can simulate intimacy automatically while people are far apart.
Who/what do we need to make this happen?
This may seem like a distant dream, but we believe it is in reach and we can build it now. Our team can construct and evaluate the validity of what we measure through psychometric and quantitative modelling. To make this happen, we will need the following:
– Programmers that can connect the EmbrWave with programs like Facebook Messenger, WhatsApp, Skype, and Zoom.
– Programmers that will connect temperature sensors with the same types of programs.
– Safe data storage and a server powerful enough to run the computations.
– Experts to help test a prototype that can learn during interactions.
– Experts on data privacy laws, to ensure we do not interfere with privacy/law while we collect the data.
– Additional data scientists to help data experts from our team to most accurately interpret and model the data.
– Scientists to help organize this project and conduct the necessary research.
– Thermoregulation experts to further test our sensors and replace them if necessary (we currently use the ISP131001 sensor).
– Core body temperature sensors to model the process (current – excellent – solutions like the GreenTEG body sensor are too expensive for our team and the individual user).
– Sensors and EmbrWaves being made available for different users: this costs a considerable amount of money.
While we build the relationship simulator, these temperature sensors can also be used to quickly detect fevers and other problems with people’s health, so there will be other benefits of this system. To join our team or to contribute to our cause, please fill in this form.
This blog post was written by Anna Szabelska, EmbrLabs, and Hans IJzerman
The goals of AfricArXiv include fostering community among African researchers, facilitating collaborations between African and non-African researchers, and raising the profile of African research on the international stage. These goals align with the goals of a different organization, the Psychological Science Accelerator (PSA). This post describes how these goals align and argues that joining the Psychological Science Accelerator will benefit members of the AfricArXiv research community through increased collaboration and resource access.
What is the Psychological Science Accelerator?
The PSA is a voluntary, globally distributed, democratic network of over 500 labs from over 70 countries on all six populated continents, including Africa. Psychology studies have traditionally been dominated by Western researchers studying Western participants (Rad, Martingano, & Ginges, 2018). One of the primary goals of the PSA is to help address this problem by expanding the range of researchers and participants in psychology research, thereby making psychology more representative of humanity.
This goal is consistent with the goals of AfricArXiv: addressing the lack of non-Western psychology researchers entails raising the profile of African psychology researchers and fostering collaborations between African and non-African researchers. In addition, the PSA in particular has an interest in expanding its network in Africa: although the PSA wishes to achieve representation on all continents, at last count only 1% of its 500 labs were from Africa.
How the PSA can benefit the African research community
The shared goal of the PSA and AfricArXiv is thus to recruit a group of African researchers to join the PSA and take part in its internationally recognized research programmes in psychological science. We are committed to raising the profile of members of the African research community.
Any psychology researcher can join the PSA at no cost. Member labs will have the opportunity to contribute to PSA governance, submit studies to be run through the PSA network of labs, and collaborate and earn authorship on projects involving hundreds of researchers from all over the world. PSA projects are very large in scale; the first global study run through its network (Jones et al., 2020) involved more than 100 labs from 41 countries, who collectively recruited over 11,000 participants.
The PSA generates a large amount of research communication, which can all be shared at no cost through AfricArXiv. The PSA datasets that involve African participants are available for free for secondary analysis. These datasets may be analyzed with a specifically African focus, and the resultant research can again be freely shared via AfricArXiv.
The specific benefits of PSA membership
The first step to obtaining the benefits of the PSA is to become a member by expressing an in-principle commitment to contribute to the PSA in one way or another. Membership is free of charge.
Once you are a member, you gain access to the following five benefits:
- Free submission of proposals to run a large, multi-national project. The PSA accepts proposals for new studies to run through its network every year between June and August (you can see our 2019 call here). You too can submit a proposal. If your proposal is accepted during our peer review process, the PSA will help you recruit collaborators from its international network of 500 labs and provide support with all aspects of completing a large, multi-site study. You can then submit any research products that result from this process free of charge as a preprint on AfricArXiv.
- Join PSA projects. The PSA is currently running six multi-lab projects, one of which is actively recruiting collaborators. In the next two weeks, the PSA will accept a new wave of studies. As a collaborator on one of our studies, you can collect data or assist with statistical analysis, project management, or data management. If you join a project as a collaborator, you will earn authorship on the papers that result from the project (which can be freely shared via AfricArXiv). You can read about the studies that the PSA is currently working on here.
- Join the PSA’s editorial board. The PSA sends out calls for new study submissions on a yearly basis. Like grant agencies and journals, it needs people to serve as reviewers for these study submissions. You can indicate interest in serving as a reviewer when you become a PSA member. In return, you will be listed as a member of the PSA editorial board. You can add this editorial board membership to your website and CV.
- Join one of the PSA’s governance committees. The PSA’s policies and procedures are developed in its various committees. Opportunities regularly arise to join these committees. Serving on committees helps shape the direction of the PSA and puts researchers in touch with potential collaborators from all over the world. If you are interested in joining a committee, join the PSA newsletter and the PSA Slack workspace. We make announcements of new opportunities to join our committees on these outlets.
- Receive compensation to defray the costs of collaboration. We realize that international collaboration can be challenging and expensive, particularly for researchers at lower income institutions. The PSA is therefore providing financial resources to facilitate collaboration. At present, we have a small pool of member lab grants, small grants of $400 USD to help defray the costs of participating in a PSA research project. You can apply for a member lab grant here.
The PSA aims to foster collaboration on our large, multi-national and multi-lab projects. We believe these collaborations can yield tremendous benefits to African researchers. If you agree, you can join our network to gain access to a vibrant and international community of over 750 researchers from 548 labs in over 70 countries. We look forward to working with you.
This blog post was written by Adeyemi Adetula and Patrick Forscher and is cross-posted at AfricArXiv.
In November 2019, Tal Yarkoni set psychology Twitter ablaze with a fiery preprint, “The Generalizability Crisis” (Yarkoni, 2019). Written with direct, pungent language, the paper fired a direct salvo at the inappropriate breadth of claims in scientific psychology, arguing that the inferential statistics presented in papers are essentially meaningless due to their excessive breadth and the endless combinations of unmeasured confounders that plague psychology studies.
The paper is clear, carefully argued, and persuasive. You should read it. You probably have.
Yet there is something about the paper that bugs me. That feeling wormed its way into the back of my mind until it became a full-fledged concern. I agree that verbal claims in scientific articles are often, or even usually, hopelessly misaligned with their instantiations in experiments, such that the statistics in papers are practically useless as tests of the broader claim. In a world where claims are not refuted by future researchers, this represents a huge problem. That world characterizes much of psychology.
But the thing that bugs me is not so much the paper’s logic as (what I perceive to be) its theory of how to change scientist behavior. Whether Tal explicitly believes this theory or not, it’s one that I think is fairly common in efforts to reform science — and it’s a theory that I believe to be shared by many failed reform efforts. I will devote the remainder of this blog to addressing this theory and laying out the theory of change that I think is preferable.
A flawed theory of change: The scientist as a logician
The theory of change that I believe underlies Tal’s paper is something I will call the “scientist as logician” theory. Here is a somewhat simplified version of this theory:
- Scientists are truth-seekers
- Scientists use logic to develop the most efficient way of seeking truth
- If a reformer uses logic to identify flaws in a scientist’s current truth-seeking process, then, as long as the logic is sound, that scientist will change their practices
Under the “scientist as logician” theory of change, the task of a putative reformer is to develop the most rigorously sound logic possible about why a new set of practices is better than an old set of practices. The more unassailable this logic, the more likely scientists are to adopt the new practices.
This theory of change is the one implicitly adopted by most academic papers on research methods. The “scientist as logician” theory is why, I think, most methods research focuses on accumulating unassailable evidence about which methods are optimal for a given set of problems: if scientists operate as logicians, then stronger evidence will lead to stronger adoption of those optimal practices.
This theory of change is also the one that arguably motivated many early reform efforts in psychology. Jacob Cohen wrote extensively and persuasively on why, based on considerations of statistical power, psychologists ought to use larger sample sizes (Cohen, 1962; Cohen, 1992). David Sears wrote extensively on the dangers of relying on samples of college sophomores for making inferences about humanity (Sears, 1986). But none of their arguments seemed to really have mattered.
In all these cases, the logic that undergirds the arguments for better practice is nigh unassailable. The lack of adoption of their suggestions reveals stark limitations in the “scientist as logician” theory. The limited influence of methods papers is infamous (Borsboom, 2006), especially if the paper happens to point out critical flaws in a widely used and popular method (Bullock, Green, & Ha, 2010). Meanwhile, despite the highly persuasive arguments by Jacob Cohen, David Sears, and many other luminaries, statistical power has barely changed (Sedlmeier & Gigerenzer, 1989), nor has the composition of psychology samples (Rad, Martingano, & Ginges, 2018). It seems unlikely that scientists change their behavior purely on logical grounds.
A better theory of change: The scientist as human
I’ll call my alternative to the “scientist as logician” model the “scientist as human” model. A thumbnail sketch of this model is as follows:
- Scientists are humans
- Humans have goals (including truth and accuracy)
- Humans are also embedded in social and political systems
- Humans are sensitive to social and political imperatives
- Reformers must attend to both human goals and the social and political imperatives to create lasting changes in human behavior
Under the “scientist as human” model, the goal of the putative reformer is to identify the social and political imperatives that might prevent scientists from engaging in a certain behavior. The reformer then works to align those imperatives with the desired behaviors.
Of course, for a desired behavior to occur, that behavior should be aligned with a person’s goals (though that is not always necessary). Here, however, reformers who want science to be more truthful are in luck: scientists overwhelmingly endorse normative systems that suggest they care about the accuracy of their science (Anderson et al., 2010). This also means, however, that if scientists are behaving in ways that appear irrational or destructive to science, that’s probably not because the scientists just haven’t been exposed to a strong enough logical argument. Rather, the behavior probably has more to do with the constellation of social and political imperatives in which the scientists are embedded.
This view, of the scientist as a member of human systems, is why, I think, the current open science movement has been effective where other efforts have failed. Due to the efforts of institutions like the Center for Open Science, many current reformers have a laser focus on changing the social and political conditions. The goal behind these changes is not to change people’s behavior directly, but to shift institutions to support people who already wish to use better research practices. This goal is a radical departure from the goals of people operating under the “scientist as logician” model.
Taking seriously the human-ness of the scientist
The argument I have made is not new. In fact, the argument is implicit in many of my favorite papers on science reform (e.g., Smaldino & McElreath, 2016). Yet I think many prospective reformers of science would be well-served by thinking through the implications of the “scientist as human” view.
While logic may help in identifying idealized models of the scientific process, reformers seeking to implement and sustain change must attend to social and political processes. This includes especially those social and political processes that affect career advancement, such as promotion criteria and granting schemes. However, this also includes thinking through the processes that affect how a potential reform will be taken up in the social and political environment, especially whether scientists will have the political ability to take collective action to take up a particular reform. In other words, taking seriously scientists as humans means taking seriously the systems in which scientists participate.
- Anderson, M. S., Ronning, E. A., De Vries, R., & Martinson, B. C. (2010). Extending the Mertonian Norms: Scientists’ Subscription to Norms of Research. The Journal of Higher Education, 81(3), 366–393. https://doi.org/10.1353/jhe.0.0095
- Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71(3), 425–440. https://doi.org/10.1007/s11336-006-1447-6
- Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, but what’s the mechanism? (Don’t expect an easy answer). Journal of Personality and Social Psychology, 98(4), 550–558. https://doi.org/10.1037/a0018933
- Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. The Journal of Abnormal and Social Psychology, 65(3), 145–153. https://doi.org/10.1037/h0045186
- Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155
- Rad, M. S., Martingano, A. J., & Ginges, J. (2018). Toward a psychology of Homo sapiens: Making psychological science more representative of the human population. Proceedings of the National Academy of Sciences, 115(45), 11401–11405. https://doi.org/10.1073/pnas.1721165115
- Sears, D. O. (1986). College sophomores in the laboratory: Influences of a narrow data base on social psychology’s view of human nature. Journal of Personality and Social Psychology, 51(3), 515–530. https://doi.org/10.1037/0022-3514.51.3.515
- Sedlmeier, P., & Gigerenzer, G. (1992). Do studies of statistical power have an effect on the power of studies? In Methodological issues & strategies in clinical research (pp. 389–406). American Psychological Association. https://doi.org/10.1037/10109-032
- Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 160384. https://doi.org/10.1098/rsos.160384
- Yarkoni, T. (2019). The Generalizability Crisis [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/jqw35
As scientists, we often hope that science self-corrects. But several researchers have suggested that the self-corrective nature of science is a myth (see e.g., Estes, 2012; Stroebe et al., 2012). If science is self-correcting, we should expect that, when a large replication study finds a result that is different from a smaller original study, the number of citations to the replication study ought to exceed, or at least be similar to, the number of citations to the original study. In this blog post, I examine this question in six “correction” studies in which I’ve been involved. (Note 1: I did not include any of the ManyLabs replication studies because they were so qualitatively different from the rest.) This exercise is intended to provide yet another anecdote to generate a discussion about how we, as a discipline, approach self-correction and is by no means intended as a general conclusion about the field.
Sex differences in distress from infidelity in early adulthood and later life.
In 2004, Shackelford and colleagues reported that men, compared to women, are more distressed by sexual than emotional infidelity (total N = 446). The idea was that this effect generalizes from young adulthood to later adulthood, and this was taken as evidence for an evolutionary perspective. In our pre-registered replication studies (total N = 1,952), we did find the effect for people in early adulthood but not in later adulthood. We also found that the disappearance of the effect was likely due to sociosexual orientation in the older adults that we sampled (in the Netherlands as opposed to the United States). In other words, the basic original effect seemed present, but the original conclusion was not supported.
How did the studies fare in terms of citations?
Original study (since 2014): 56 citations
Replication study (since 2014): 23 citations
Conclusion: little to no correction done (although perhaps it was not a conclusive non-replication given the ambiguity of the theoretical interpretation)
Does recalling moral behavior change the perception of brightness?
Banerjee, Chatterjee, and Sinha (2012; total N = 114) reported that recalling unethical behavior led participants to see the room as darker and to desire more light-emitting products (e.g., a flashlight) compared to recalling ethical behavior. In our pre-registered replication study (N = 1,178) we did not find the same effects.
How did the studies fare in terms of citations?
Original study (since 2014): 142 citations
Replication study (since 2014): 24 citations
Conclusion: correction clearly failed.
Physical warmth and perceptual focus: A replication of IJzerman and Semin (2009)
This replication is clearly suboptimal, as this was a self-replication. This study was conducted in the midst of the beginning of the replication crisis so we wanted to self-replicate some of our work. In the original study (N = 39), we found that when people are in a warm condition, they focus more on perceptual relationships than individual properties. In a higher-powered replication study (N = 128), we found the same effect (with a slightly different method to better avoid experimenter effects).
How did the studies fare in terms of citations?
Original study (since 2014): 323
Replication study (since 2014): 26
Conclusion: no correction needed (yet; but please someone other than us replicate this and the other studies, as these 2009 studies were all underpowered).
Perceptual effects of linguistic category priming
This was a particularly interesting case, as our paper was published after the first author of the original article, Diederik Stapel, was caught for data fabrication. All but one of the (12) studies were conducted before he got caught (but we could never publish them due to the nature of the field at the time). In the original (now retracted) article, Stapel and Semin reported that priming abstract linguistic categories (adjectives) led to more global perceptual processing, whereas priming concrete linguistic categories (verbs) led to more local perceptual processing. In our replication, we could not find the same effect. (Note 2: Technically, this study was not a replication, as the original studies were never conducted. After Stapel was caught, the manuscript was first submitted to the journal that had published the original effects. Their message was then that they would not accept replications. When we pointed out that these were not replications, the manuscript was rejected because we had found null effects. Thankfully, the times are clearly changing now.)
How did the studies fare in terms of citations?
Original study (since 2015): 12
Replication study (since 2015): 3
Conclusion: correction failed (although citations slowed down significantly and some of the newer citations were about Stapel’s fraud).
Does distance from the equator predict self-control?
This is somewhat of an outlier in this list, as this is an empirical test of a hypothesis in a theoretical article. The hypothesis proposed in this article is that people who live further away from the equator have poorer self-control, and the authors suggested that this should be tested via data-driven methods. We were lucky enough to have a dataset (N = 1,537) to test this and took up the authors’ suggestion by using machine learning. In our commentary article, we were unable to find the effect (equator distance as a predictor of self-control was just a little bit less important than whether people spoke Serbian).
How did the studies fare in terms of citations?
Original article (since 2015): 57
Empirical test of the hypothesis (since 2015): 3
Conclusion: correction clearly failed (in fact, the original first author published a very similar article in 2018 and cited the article 6 times).
A demonstration of the Collaborative Replication and Education Project: replication attempts of the red-romance effect project
Elliot et al. (2010; N = 33) reported that women were more attracted to men when their photograph was presented with a red (vs. grey) border. Via one of my favorite initiatives that I have been involved in, the Collaborative Replications and Education Project, 9 student teams tried to replicate this finding via pre-registered replications and were not able to find the same effect (despite very high quality control and a larger sample with total N = 640).
How did the studies fare in terms of citations?
Original study (since 2019): 17
Replication study (since 2019): 8
Conclusion: correction failed.
Social Value Orientation and attachment
Van Lange et al. (1996; N Study 1 = 573; N Study 2 = 136) reported two findings suggesting that people who are more secure in their attachment are also more prone to give more to other (fictitious) people in a coin distribution game. These original studies suffered from some problems. First, the reliabilities of the measurement instruments ranged between alpha = 0.46 and 0.68. Second, the somewhat more reliable scales (at alpha = 0.66 and 0.68) only produced marginal differences in a sample of 573 participants, when controlling for gender and after dropping items from the attachment scale (in addition, there were problems with the translation of one of the measures into Dutch). In our replication study (N = 768), which we conducted with better measurement instruments and in the same country, we did not find the same effects.
How did the studies fare in terms of citations?
Original study (since 2019): 110
Replication study (since 2019): 8
Conclusion: correction clearly failed (this one is perhaps a bit more troubling, as 1) the replication covered 2 out of the 4 studies, and 2) the researchers from ManyLabs2 were also not able to replicate Study 3. Again, the first author was responsible for some (4) of the citations).
[EDIT March 8 2020]: Eiko Fried suggested I should plot the citations by year. If you want to download the data and R code, you can download them here.
A couple of observations:
- 2020, of course, is not yet complete. I thus left it out of the graph, as including it may be a bit misleading.
- When plotting per year, it became apparent that 2016 for Banerjee et al. had the “BBS effect” (the article was cited in a target article and received many ghost citations in Google Scholar for the commentaries that were published [but that did not cite the article; the citations for 2016 are thus inaccurate]. This does not take away from the overall conclusion).
- Overall, there seemed to be no decline in citations.
Total for original studies (excluding 1 and 3): 338
Total for replication studies (excluding 1 and 3): 46
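To put these two totals side by side, here is a minimal sketch of the arithmetic in Python. It is not the R code linked above; the variable names are illustrative, and the only inputs are the two totals just reported (citations since 2019, excluding studies 1 and 3).

```python
# Citation totals copied from the two lines above
# (since 2019, excluding studies 1 and 3).
original_citations = 338
replication_citations = 46

# How many times more often the originals are cited than their replications.
ratio = original_citations / replication_citations

# Share of the combined citations that goes to the replication studies.
replication_share = replication_citations / (
    original_citations + replication_citations
)

print(f"Originals are cited {ratio:.1f} times as often as their replications.")
print(f"Replications receive {replication_share:.1%} of the combined citations.")
```

In other words, the original studies were cited roughly seven times as often as the studies that failed to replicate them.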
For the current set of studies, we clearly fail in correcting our beloved science. I suspect the same is true for other replication studies. I would love to hear more about the experiences of other replication authors, and I think it is time to start a discussion about how we can change these Questionable Citation Practices.
This blog post was originally written to appear in “Le Monde” and so was initially aimed at the French public. However, people from all countries can sign to show their support for the integration of open science into grants and hiring practices. The French version is first, after which the English version follows. If you want to sign the petition, please sign it with the name of the country where you live. If you want to petition your own scientific organizations/governments, then we will share the data of our signers per country upon request (firstname.lastname@example.org).
L’étude scientifique des comportements humains fournit des connaissances pertinentes pour chaque instant de notre vie. Ces connaissances peuvent être utilisées pour résoudre des problèmes sociétaux urgents et complexes, tels que la dépression, les discriminations et le changement climatique. Avant 2011, beaucoup de scientifiques pensaient que le processus de création des connaissances scientifiques était efficace. Nous étions complètement dans l’erreur. Plus important encore, notre domaine a découvert que même des chercheurs honnêtes pouvaient produire des connaissances non fiables. Il est donc temps d’appliquer ces réflexions à nos pratiques afin de changer radicalement la façon dont la science fonctionne.
En 2011, Daryl Bem, un psychologue reconnu, mit en évidence la capacité de l’être humain à voir dans le futur. La plupart des scientifiques s’accorderaient sur le caractère invraisemblable de ce résultat. En utilisant les critères de preuves partagées par de nombreuses disciplines, Bem trouva des preuves très solides en apparence et répliquées sur 9 expériences avec plus de 1000 participants. Des études ultérieures ont démontré de façon convaincante que l’affirmation de Bem était fausse. Les psychologues, en réalisant des réplications d’études originales dans des dizaines de laboratoires internationaux, ont découvert que cela ne se limite pas à ces résultats invraisemblables. Un membre de notre équipe a mené deux de ces projets, dans lesquels des affirmations sont testées sur plus de 15 000 participants. En rassemblant les résultats de trois de ces projets internationaux, seuls 27 de ces 51 effets préalablement rapportés dans la littérature scientifique ont pu être confirmés (et des problèmes similaires sont maintenant détectés par des projets de réplication en biologie du cancer).
Le point de vue des scientifiques (et pas seulement des psychologues) sur la robustesse des preuves scientifiques a drastiquement changé suite à la publication de Joe Simmons et de ses collègues démontrant comment il est possible d’utiliser les statistiques pour prouver n’importe quelle idée scientifique, aussi absurde soit-elle. Sans vérification de leur travail et avec des méthodes routinières, les chercheurs peuvent trouver des preuves dans des données qui en réalité n’en contiennent pas. Or, ceci devrait être une préoccupation pour tous, puisque les connaissances des sciences comportementales sont importantes à l’échelle sociétale.
Mais quels sont les problèmes ? Premièrement, il est difficile de vérifier l’intégrité des données et du matériel utilisé, car ils ne sont pas partagés librement et ouvertement. Lorsque des chercheurs ont demandé les données de 141 articles publiés dans de grandes revues de psychologie, ils ne les ont reçues que dans 27% des cas. De plus, les erreurs étaient plus fréquentes dans les articles dont les données n’étaient pas accessibles. Ensuite, la majorité du temps nous n’avons pas connaissance des échecs scientifiques ni même des hypothèses a priori des chercheurs. Dans la plupart des domaines scientifiques, seuls les succès des chercheurs sont publiés et leurs échecs partent à la poubelle. Imaginez que cela se passe de la même façon avec le sport : si l’Olympique de Marseille ne communiquait que ses victoires et cachait ses défaites, on pourrait penser (à tort) que c’est une excellente équipe. Nous ne tolérons pas cette approche dans le domaine sportif. Pourquoi devrions-nous la tolérer dans le domaine scientifique ?
Depuis la découverte de la fragilité de certains de leurs résultats, les psychologues ont pris les devants pour améliorer les pratiques scientifiques. À titre d’exemple, nous, membres du « Co-Re lab », au LIP/PC2S de l’Université Grenoble Alpes, avons fait de la transparence scientifique un standard. Nous partageons nos données dans les limites fixées par la loi. Afin de minimiser les erreurs statistiques nous réalisons une révision de nos codes. Enfin, nous faisons des pré-enregistrements ou des Registered Reports qui permettent de déposer une idée ou d’obtenir une acceptation de publication par les revues avant la collecte des données. Cela assure la publication d’un résultat, même s’il n’est pas considéré comme un « succès ». Ces interventions permettent de réduire drastiquement la probabilité qu’un résultat insensé soit intégré dans la littérature.
Tous les chercheurs ne suivent pas cet exemple. Cela signifie qu’une partie de l’argent des impôts français finance une science dont l’intégrité des preuves qui soutiennent les affirmations ne peut être vérifiée, faute d’être ouvertement partagées. Plus spécifiquement, nous appelons à ce qui suit :
- Pour toute proposition de subvention (qu’elle repose sur une recherche exploratoire ou confirmatoire) adressée à tout organisme de financement, exiger un plan de gestion des données.
- Pour toute proposition de subvention adressée à tout organisme de financement, rendre par défaut accessible ouvertement codes/matériel/données (à moins qu’il n’y ait une raison convaincante pour laquelle cela soit impossible, comme dans le cas de la protection de l’identité des participants)
- Le gouvernement français devrait réserver des fonds dédiés à des chercheurs pour vérifier l’exactitude et l’intégrité des résultats scientifiques majeurs.
- Les universités devraient accorder la priorité d’embauche et de promotion aux chercheurs qui rendent leur matériel, données, et codes accessibles ouvertement.
C’est à l’heure actuelle où la France investit dans la science et la recherche qu’il faut agir. Le premier ministre Édouard Philippe a annoncé en 2018 que 57 milliards d’euros seront dédiés à la recherche. Nous sommes certains qu’avec les changements que nous proposons, l’investissement français nous conduira à devenir des leaders mondiaux en sciences sociales. Plus important encore, cela conduira la science française à devenir crédible et surtout, utile socialement. Nous vous appelons à soutenir cette initiative et à devenir signataire pour une science ouverte française. Vous pouvez signer notre pétition ci-dessous. Veuillez signer avec votre nom, votre adresse e-mail et le pays dans lequel vous vivez.
Society should demand more from scientists: Open letter to the (French) public
The science of human behavior can generate knowledge that is relevant to every single moment of our lives. This knowledge can be deployed to address society’s most urgent and difficult problems — up to and including depression, discrimination, and climate change. Before 2011, many of us thought the process we used to create this scientific knowledge was working well. We were dead wrong. Most importantly, our field has discovered that even honest researchers can generate findings that are not reliable. It is therefore time to apply our insights to ourselves to drastically change the way science works.
In 2011, a famous psychologist, Daryl Bem, used practices that were standard at the time to publish evidence that people can literally see the future. Most scientists would agree that this is an implausible result. Bem met the standards of evidence that many sciences applied at that time, and found seemingly solid evidence across 9 experiments and over 1,000 participants. Later studies have convincingly demonstrated that Bem’s claim was not true. By conducting studies that replicate original studies across dozens of international labs, psychologists have now discovered that the problem is not restricted to such implausible results. One of us led two of these projects, in which claims were examined in over 15,000 participants. When we take the evidence of three such international projects together, we could only confirm 27 out of the 51 effects that were previously reported in the scientific literature (and similar problems have now been detected through replication projects in cancer biology).
Scientists’ view of the solidity of their evidence (and not only psychologists’ view) changed quite dramatically when Joe Simmons and his colleagues demonstrated how, as a researcher, you could use statistics to basically prove any nonsensical idea with scientific data. Unchecked, researchers are able to use fairly routine methods to find evidence in datasets where there is none. This should be a concern to everyone, as insights from behavioral science are important society-wide.
So what are some of the problems? One is the difficulty of even checking a study’s data and materials for integrity, because these data and materials are not openly and freely shared. Many labs regard data as proprietary. When researchers requested the data from 141 papers published in leading psychology journals, they received the data only 27% of the time. What is more, errors were more common in papers for which the data were not shared. But we also often don’t know about researchers’ failures, nor do we know what their a priori plans were. Within most of the sciences, we only learn about successes, as researchers publicize their successes and leave their failures to rot on their hard drives. Imagine if we were to do the same for sports: if Olympique de Marseille only told us about the games that they won, hiding away the games that they lost, we would think (erroneously) that OM has a great team. We do not tolerate this approach in sports. Why should we tolerate it for science?
Since discovering that their findings are not always robust, psychologists have led the charge in improving scientific practices. For example, we members of the “Co-Re” lab at LIP/PC2S at Université Grenoble Alpes have made transparency in our science a default. We share data to the degree that it is legally permitted. To limit the occurrence of statistical errors, we conduct code review prior to submitting to scientific journals. Finally, we do pre-registrations or registered reports, which are ways to deposit an idea or to obtain a journal’s acceptance for publication before data collection. This ensures the publication of a result, even when it is not considered a “success”. Because of all these interventions, the chance of a nonsensical idea entering the literature becomes decidedly smaller.
Not all researchers follow this example. This means that a lot of tax money (including French tax money) goes to science where the evidence that supports its claims cannot be checked for integrity because it is not openly shared. We strongly believe in the potential of psychological science to improve society. As such, we believe French tax money should go toward science (including psychological science) that has the highest chance of producing useful knowledge — in other words, science that is open.
Specifically, we call for the following:
- For all grant proposals (whether they rely on exploratory or confirmatory research) to any funding agency, demand a data management plan.
- For all grant proposals to any funding agency, make open code/materials/data the default (unless there is a compelling reason that this is impossible, such as in the case of protecting participants’ identity).
- The French government should set aside dedicated funding for researchers to check the accuracy and integrity of major scientific findings.
- Universities should prioritize hiring and promoting researchers who make their materials, data, and code openly available.
The time for change is now, because France is investing in science and research. The French prime minister Édouard Philippe announced in 2018 a plan to put 57 billion euros into investment and innovation. Importantly, Minister of Higher Education Frédérique Vidal has committed to making science open, so that the knowledge we generate is available to the taxpayer. We believe we can maximize this money’s return on investment for society by ensuring that these open principles also apply to the materials, data, and code generated by this money. Only with our proposed changes are we confident that the French investment will lead us to become world leaders in social science. More importantly, it will lead (French) science to become credible and socially useful. We call for your action to support this initiative and to become a signatory for (French) open science. You can do so below.
Written by Patrick Forscher, Alessandro Sparacio, Rick Klein, Nick Brown, Mae Braud, Adrien Wittman, Olivier Dujols, Shanèle Payart, and Hans IJzerman.
The Co-Re Lab is part of the Laboratoire Inter-universitaire de Psychologie Personnalité, Cognition, Changement Social (LIP/PC2S) at Université Grenoble Alpes. In France, “laboratoire” or “labo” (laboratory) is used for what researchers in the Anglo-Saxon world would call “department”. During our labo meeting yesterday one of the agenda points was to vote on the following statement:
« Une bonne connaissance et une volonté de mettre en œuvre des pratiques de science ouverte (au sens par exemple de pré-enregistrement, mise à disposition des données…) sont attendues, une adoption de ces pratiques déjà effective (lorsque le type de recherche le permet) sera en outre très appréciée »
This can be roughly translated as: “A good knowledge and the willingness to put in place open science practices (for example, pre-registration or sharing of data) are expected. It will be highly valued if one has already adopted these practices (when the research permits it).” The statement was adopted by an overwhelming majority. We at the Co-Re lab are thrilled that this statement will be communicated to future job candidates.
December 10th, 2019. Richard Klein, Tilburg University; Christine Vitiello, University of Florida; Kate A. Ratliff, University of Florida. This is a repost from the Center for Open Science’s blog.
We present results from Many Labs 4, which was designed to investigate whether contact with original authors and other experts improved replication rates for a complex psychological paradigm. However, the project is largely uninformative on that point as, instead, we were unable to replicate the effect of mortality salience on worldview defense under any conditions.
Recent efforts to replicate findings in psychology have been disappointing. There is a general concern among many in the field that a large number of these null replications are because the original findings are false positives, the result of misinterpreting random noise in data as a true pattern or effect.
But failures to replicate are inherently ambiguous and can result from any number of contextual or procedural factors. Aside from the possibility that the original is a false positive, it may instead be the case that some aspect of the original procedure does not generalize to other contexts or populations, or the procedure may have produced an effect at one point in time but those conditions no longer exist. Or, the phenomenon may not be sufficiently understood to predict when it will and will not occur (the so-called “hidden moderators” explanation).
Another explanation — often made informally — is that replicators simply lack the necessary expertise to conduct the replication properly. Maybe they botch the implementation of the study or miss critical theoretical considerations that, if corrected, would have led to successful replication. The current study was designed to test this question of researcher expertise by comparing results generated from a research protocol developed in consultation with the original authors to results generated from research protocols designed by replicators with little or no particular expertise in the specific research area. This study is the fourth in our line of “Many Labs” projects, in which we replicate the same findings across many labs around the world to investigate some aspect of replicability.
To look at the effects of original author involvement on replication, we first had to identify a target finding to replicate. Our goal was a finding that was likely to be generally replicable, but that might have substantial variation in replicability due to procedural details (e.g. a finding with strong support but that is thought to require “tricks of the trade” that non-experts might not know about). Most importantly, we had to find key authors or known experts who were willing to help us develop the materials. These goals often conflicted with one another.
We ultimately settled on Terror Management Theory (TMT) as a focus for our efforts. TMT broadly states that a major problem for humans is that we are aware of the inevitability of our own death; thus, we have built-in psychological mechanisms to shield us from being preoccupied with this thought. In consultation with those experts most associated with TMT, we chose Study 1 of Greenberg et al. (1994) for replication. The key finding was that, compared to a control group, U.S. participants who reflected on their own death were higher in worldview defense; that is, they reported a greater preference for an essay writer adopting a pro-U.S. argument than an essay writer adopting an anti-U.S. argument.
We recruited 21 labs across the U.S. to participate in the project. A randomly assigned half of these labs were told which study to replicate, but were prohibited from seeking expert advice (“In House” labs). The remaining half of the labs all followed a set procedure based on the original article, and incorporating modifications, advice, and informal tips gleaned from extensive back-and-forth with multiple original authors (“Author Advised” labs).* In all, the labs collected data from 2,200+ participants.
The goal was to compare the results from labs designing their own replication, essentially from scratch using the published method section, with the labs benefitting from expert guidance. One might expect that the latter labs would have a greater likelihood of replicating the mortality salience effect, or would yield larger effect sizes. However, contrary to our expectation, we found no differences between the In House and Author Advised labs because neither group successfully replicated the mortality salience effect. Across confirmatory and exploratory analyses we found little to no support for the effect of mortality salience on worldview defense at all.
In many respects, this was the worst possible outcome: if there is no effect, then we can’t really test the metascientific questions about researcher expertise that inspired the project in the first place. Instead, this project ends up being a meaningful datapoint for TMT itself. Despite our best efforts, and a high-powered, multi-lab investigation, we were unable to demonstrate an effect of mortality salience on worldview defense in a highly prototypical TMT design. This does not mean that the effect is not real, but it certainly raises doubts about the robustness of the effect. An ironic possibility is that our methods did not successfully capture the exact fine-grained expertise that we were trying to investigate. However, that itself would be an important finding; ideally, a researcher should be able to replicate a paradigm solely based on information provided in the article or other readily available sources. So, the fact that we were unable to do so, despite consulting with original authors and enlisting 21 labs, all of which were highly trained in psychology methods, is problematic.
From our perspective, a convincing demonstration of basic mortality salience effects is now necessary to have confidence in this area moving forward. It is indeed possible that mortality salience only influences worldview defense during certain political climates or amid catastrophic events (e.g. national terrorist attacks), or that other factors explain this failed replication. A robust Registered Report-style study, where outcomes are predicted and analyses are specified in advance, would serve as a critical orienting datapoint to allow these questions to be explored.
Ultimately, because we failed to replicate the mortality salience effect, we cannot speak to whether (or the degree to which) original author involvement improves replication attempts.** Replication is a necessary but messy part of the scientific process, and as psychologists continue replication efforts it remains critical to understand the factors that influence replication success. And, it remains critical to question, and empirically test, our intuitions and assumptions about what might matter.
*At various points we refer to “original authors”. We had extensive communication with several authors of the Greenberg et al., 1994 piece, and others who have published TMT studies. However, that does not mean that all original authors endorsed each of these choices, or still agree with them today. We don’t want to put words in anyone’s mouth, and, indeed, at least one original author expressly indicated that they would not run the study given the timing of the data collection — September 2016 to May 2017, the period leading up to and following the election of Donald Trump as President of the United States. We took steps to address that concern, but none of this means the original authors “endorse” the work.
**Interested readers should also keep an eye out for Many Labs 5 which looks at similar issues. The Co-Re lab was involved in Many Labs 5 as well.
Greenberg, J., Pyszczynski, T., Solomon, S., Simon, L., & Breus, M. (1994). Role of consciousness and accessibility of death-related thoughts in mortality salience effects. Journal of Personality and Social Psychology, 67(4), 627-637.