P-hacking in academic research

P-hacking can be described as an exploratory approach to data analysis with a flexible, opportunistic search space, combined with selective reporting of primarily “statistically significant” findings. This leads to inflated type I error rates and to biased estimates in the published scientific literature. This thesis aimed to describe how p-hacking manifests in academic research, and to illustrate, using two specific examples from the published literature, how bias from p-hacking is expected to affect the veracity of published findings. The thesis also argues that, when evaluating published findings in the current academic environment, we should assume a priori that p-hacking and publication bias are present. This means that we cannot accept published findings at face value unless there is explicit evidence that the research was protected from these biases.
The thesis made use of Monte Carlo simulations and systematic reviews of the literature in two specific fields: the proposed association between exposure to night work and breast cancer in women, and the association between job strain and coronary heart disease. A general model and mathematical framework for predicting the expected bias from p-hacking was developed. It can be used to make a priori defined protected inferences (Ingre, 2013) about any published finding, under explicit assumptions about the level of p-hacking.
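The basic mechanism, a flexible search space combined with selective reporting, can be illustrated with a small Monte Carlo sketch. This is not the simulation model used in the thesis. It rests on the following assumptions:

- The search space is hypothetical: two outcome definitions crossed with three samples (the full sample and two subgroups), giving six analyses.
- All data are generated under a true null.
- p-values come from a large-sample normal approximation.
- `n_studies`, `n`, `alpha` and `seed` are arbitrary illustrative parameters.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def z_mean_difference(a, b):
    """Large-sample z statistic for a difference in means (unpooled SE)."""
    ma, va = mean_and_var(a)
    mb, vb = mean_and_var(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def simulate_null_study(n, rng):
    """Rows of (exposed, subgroup, outcome1, outcome2), all independent: no true effect."""
    return [(rng.random() < 0.5, rng.random() < 0.5,
             rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def p_values_for_search_space(rows):
    """Hypothetical search space: 2 outcomes x 3 samples = 6 analyses.

    The first p-value is the pre-specified analysis (outcome 1, full sample).
    """
    samples = [rows,
               [r for r in rows if r[1]],      # subgroup A
               [r for r in rows if not r[1]]]  # subgroup B
    pvals = []
    for sample in samples:
        for col in (2, 3):                      # outcome 1, outcome 2
            exposed = [r[col] for r in sample if r[0]]
            unexposed = [r[col] for r in sample if not r[0]]
            pvals.append(two_sided_p(z_mean_difference(exposed, unexposed)))
    return pvals


def false_positive_rates(n_studies=2000, n=200, alpha=0.05, seed=1):
    """Share of null studies declared 'significant' with and without p-hacking."""
    rng = random.Random(seed)
    honest = hacked = 0
    for _ in range(n_studies):
        pvals = p_values_for_search_space(simulate_null_study(n, rng))
        honest += pvals[0] < alpha      # report only the pre-specified test
        hacked += min(pvals) < alpha    # report whichever test is significant
    return honest / n_studies, hacked / n_studies


honest_rate, hacked_rate = false_positive_rates()
print(f"pre-specified: {honest_rate:.3f}, p-hacked: {hacked_rate:.3f}")
```

The pre-specified analysis should reject close to the nominal 5% of the time. Picking the smallest of six correlated p-values rejects several times more often. Widening the hypothetical search space (more outcomes, subgroups or covariate sets) pushes this rate further up.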
The model indicated a close to 100% chance of demonstrating a false-positive association in larger studies. It also showed that even minimal p-hacking results in substantial bias in reported estimates: with no true risk present in the data, the expected observed risk was in the range RR = 1.1–1.4, depending on study size. This threatens the validity of inferences from any study or meta-analysis whose observed confidence interval cannot exclude such risks, unless the absence of even minimal p-hacking can be guaranteed.
The review of the literature identified large flexibility in the analytical process, which allowed the final model to be picked from a large pool of available models, with an implied search space of thousands of estimates. In addition, we identified eight distinct anomalies in the reviewed reports on the association between night work and breast cancer that were more consistent with a p-hacking strategy than with an attempt to accurately report the observed association.
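The link between selective reporting and inflated estimates can be sketched with a second toy model. It is not the thesis's model and is not meant to reproduce the RR = 1.1–1.4 figure. The assumptions are:

- The true log risk ratio is 0.
- There are `k = 3` candidate models whose z-statistics are equicorrelated (`rho = 0.5`).
- A fixed standard error `se` stands in for study size.
- A hypothetical "minimal" reporting rule applies: report the pre-specified model unless it is not significantly positive, in which case report the first alternative that is.

```python
import math
import random


def simulate_reported_rr(se, k=3, rho=0.5, n_sims=20000, z_crit=1.96, seed=7):
    """Toy null model of selective reporting among k candidate models.

    The true log risk ratio is 0 in every model. The k z-statistics are
    equicorrelated normals (correlation rho); each estimate is z * se, so a
    smaller se stands in for a larger study. Hypothetical "minimal" rule:
    report the pre-specified model (index 0) unless it is not significantly
    positive, in which case report the first alternative that is.

    Returns (share of studies reporting a significant positive result,
    geometric-mean RR over all reports, geometric-mean RR over significant reports).
    """
    rng = random.Random(seed)
    all_logs, sig_logs = [], []
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)
        z = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(k)]
        chosen = z[0]
        if chosen <= z_crit:
            # Fall back on the first significantly positive alternative, if any.
            chosen = next((zj for zj in z[1:] if zj > z_crit), z[0])
        log_rr = chosen * se
        all_logs.append(log_rr)
        if chosen > z_crit:
            sig_logs.append(log_rr)
    gm_all = math.exp(sum(all_logs) / len(all_logs))
    gm_sig = math.exp(sum(sig_logs) / len(sig_logs))
    return len(sig_logs) / n_sims, gm_all, gm_sig


for label, se in [("smaller study", 0.30), ("larger study", 0.05)]:
    share, gm_all, gm_sig = simulate_reported_rr(se)
    print(f"{label}: significant={share:.3f}, "
          f"RR(all reports)={gm_all:.3f}, RR(significant only)={gm_sig:.3f}")
```

Both study sizes use the same seed and therefore the same simulated z-statistics, so only the standard error differs between them. Under this rule the reported RR can only move upward from the pre-specified estimate. If only "significant" results reach print, the reported RR is inflated most where the standard error is largest.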
Four distinct observations in this thesis can be used to argue that there is strong evidence for the presence of p-hacking and publication bias in the reviewed literature:
1. Not a single one of the 17 reviewed studies on job strain and coronary heart disease reported the proper estimate of the job strain interaction (chapter 6), and our analysis showed that the proper estimate would not have been found statistically significant by the authors of any of the studies (chapter 7).
2. One study described in detail, in the discussion of its findings, a p-hacking strategy with a search space of at least 502 models (chapter 5).
3. One study based its conclusions on a speculative estimate obtained after arbitrarily removing data, even though estimates for the full group were available and indicated a non-significant association (chapter 5).
4. Statistical power analyses of research into night work and breast cancer indicated that “statistically significant” findings were over-represented in the literature (p ≈ .001), indicating the presence of bias from p-hacking or from selective publication of significant findings (chapter 5).
The findings also suggest that the estimates previously reported in meta-analyses of the association between night work and breast cancer were likely to represent the prevailing bias in the field, and that the association was not supported by the data.
A bias-adjusted meta-analysis of the job strain model and coronary heart disease (CHD), with a total of 462,220 subjects and 6,836 CHD events, indicated no support for the job strain interaction (RR = 1.00; 95% CI: 0.88–1.14). It also did not show an increased risk for high job demand (RR = 1.03; 95% CI: 0.97–1.11), but it did confirm previously reported risks for low job control (RR = 1.11; 95% CI: 1.03–1.20).
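The power-based argument in the fourth observation can be sketched as a test of excess significance. It is not necessarily the procedure used in the thesis. The test works in three steps:

- Compute each study's power to detect an assumed true effect `theta`, given its standard error.
- Sum these powers to get the expected number of "significant" studies.
- Ask how surprising the observed count is, using a one-sided binomial tail with the mean power as the success probability.

The assumed effect, the binomial approximation and the example standard errors and counts are all hypothetical and are not data from the reviewed studies.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def power_two_sided(theta, se, alpha=0.05):
    """Power of a two-sided Wald z-test of log RR = 0 when the true log RR is theta."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = theta / se
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def excess_significance_test(standard_errors, n_significant, theta, alpha=0.05):
    """Observed vs expected number of 'significant' studies.

    Expected count = sum of the studies' powers at the assumed true effect
    theta. The p-value is the one-sided binomial probability of observing
    n_significant or more significant studies, using the mean power as the
    success probability.
    """
    powers = [power_two_sided(theta, se, alpha) for se in standard_errors]
    n = len(powers)
    expected = sum(powers)
    p_bar = expected / n
    tail = sum(math.comb(n, k) * p_bar ** k * (1.0 - p_bar) ** (n - k)
               for k in range(n_significant, n + 1))
    return expected, min(tail, 1.0)


# Hypothetical input: 12 studies with these standard errors of log RR,
# 6 of which reported a significant result; assumed true log RR = log(1.05).
ses = [0.10, 0.12, 0.15, 0.15, 0.18, 0.20, 0.20, 0.22, 0.25, 0.30, 0.30, 0.35]
expected, p_value = excess_significance_test(ses, n_significant=6, theta=math.log(1.05))
print(f"expected significant: {expected:.2f}, observed: 6, p = {p_value:.4f}")
```

Using the mean power in a single binomial is a simplification. An exact alternative would treat the count as a sum of Bernoulli variables with study-specific powers (a Poisson-binomial distribution).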
The results of the bias-adjusted meta-analysis contradict established knowledge in this field and call into question the validity of the individual studies, as well as of the four previously published meta-analyses on this association.
The present thesis builds on four previously published methodological comments (Ingre, 2013, 2014a, 2014b, 2015a) and one currently unpublished paper (Ingre, 2016), all of which are cited in the text.

Identity

PID: https://www.doi.org/10.6084/m9.figshare.3393664
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v42
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v43
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v44
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v45
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v46
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v47
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v48
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v49
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v50
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v51
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v52
PID: https://www.doi.org/10.6084/m9.figshare.3393664.v53
URL: https://figshare.com/articles/P-hacking_in_academic_research_and_its_implications_for_statistical_inference/3393664

Access Modality

Access Right: Open Access

Attribution

Author: Ingre, Michael

Publishing

Collected From: DataCite; figshare
Hosted By: figshare
Publication Date: 2017-04-19
Publisher: figshare

Additional Info

Language: UNKNOWN
Resource Type: Other literature type; Thesis
Keywords (FOS): Mathematics; Political science; Clinical medicine