It is known that statistically significant (positive) results are more likely to be published than non-significant (negative) results. However, it has been unclear whether any increasing prevalence of positive results is stronger in the “softer” disciplines (social sciences) than in the “harder” disciplines (physical sciences), and whether the prevalence of negative results is decreasing over time. Using Scopus, we searched the abstracts of papers published between 1990 and 2013, and measured longitudinal trends of multiple expressions of positive versus negative results, including

It is well known that the distribution of

False positives can be caused by questionable statistical practices (called ‘

As a reaction to the abundance of statistically significant results in the scientific literature, methodologists have emphasized that null results should not remain in the file drawer, and that the decision to publish should be based on methodological soundness rather than novelty or statistical significance (

It is worth noting that earlier, in the 1980s and 1990s, the social sciences were also said to be in a crisis. Funding agents threatened to cut budgets because they were frustrated with the ongoing production of small and inconsistent effect sizes (

Akin to signal detection theory, measures that decrease false positives will lead to more false negatives (

So far, research on longitudinal trends of positive versus negative results has been scarce. An exception is

The figure was created by graphically extracting the data shown in

All four aforementioned longitudinal analyses require updating, as they cover periods up to 2007 (

In summary, it is well known that positive results (i.e., results that are statistically significant) are more likely to be published than negative (i.e., null) results (e.g.,

The aim of this study was to estimate longitudinal trends of positive versus negative results in the scientific literature, and to compare these trends between disciplines and countries. We opted for an automated search, akin to

We investigated longitudinal trends between 1990 and 2013 for

Our searches were conducted with Elsevier’s Scopus. After trying out other search engines (i.e., Web of Science and Google Scholar), we concluded that Scopus offers the most accurate and powerful search and export features.

Google Scholar indexes more papers than Web of Science and Scopus but has various disadvantages: (1) Google Scholar does not allow for nested Boolean searches; (2) Google Scholar does not provide a possibility for exclusively searching in the abstracts of papers; (3) When the total number of records is greater than 1,000, the estimate of this number is inaccurate^{1}

For example, the query “

Scopus has some unique strengths. By using braces ({ }), it allows for searches in which punctuation marks and mathematical operators are taken into consideration, whereas quotation marks are used for more liberal searches. Both Web of Science and Google Scholar neglect punctuation marks and mathematical operators. That is, Scopus is the only one of the three search engines that can distinguish between the search queries {p = 0.001} and {p < 0.001}. Another difference between Scopus and Web of Science (and the associated Essential Science Indicators (

Scopus classifies papers into 27 subject areas; we grouped these into three scientific disciplines, each discipline including subject areas as close as possible to

The Mathematics and Multidisciplinary subject areas were not grouped in any of the aforementioned disciplines, to replicate

To investigate whether longitudinal trends in significance reporting differ between regions of the world, we distinguished the following world regions as in

United States (US).

Fifteen European countries (EU15): Austria, Belgium, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, the Netherlands, Portugal, Spain, Sweden, and the United Kingdom.

Seven Asian countries (AS7): China, Hong Kong, India, Japan, Singapore, South Korea, and Taiwan.

Note that a paper can belong to multiple world regions due to multiple authors with affiliations in countries from different world regions, or due to an author having multiple affiliations in countries from different world regions.

The following queries were conducted using the advanced search function of Scopus for the abstracts of all papers in the database as well as for each discipline and world region defined above:

ABS({.}), to extract the total number of papers with an abstract.

A query containing

A query containing

“Significant difference” versus “no significant difference” and variants thereof, namely:

A query containing typical expressions for reporting significant differences, that is: ABS(({significant difference} OR {significant differences}) AND NOT ({no significant difference} OR {no significant differences} OR {no statistically significant difference} OR {no statistically significant differences})), to extract the number of papers with a textual manifestation of significant results.

A query containing typical expressions for reporting no significant differences, that is: ABS({no significant difference} OR {no significant differences} OR {no statistically significant difference} OR {no statistically significant differences}), to extract the number of papers with a textual manifestation of non-significant results.

A query containing variants of the expression

A query containing variants of the expression
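The brace-delimited queries in the list above follow a regular pattern: a group of exact phrases joined by OR, optionally followed by an AND NOT group that excludes the negated variants. The sketch below assembles the two "significant difference" queries from phrase lists. The helper names `braces` and `abs_query` are hypothetical and introduced only for illustration; the sketch builds query strings and does not call Scopus.

```python
def braces(phrases):
    """Join exact phrases into a brace-delimited OR group: {a} OR {b}."""
    return " OR ".join("{" + phrase + "}" for phrase in phrases)


def abs_query(include, exclude=()):
    """Build an ABS(...) abstract query, optionally with an AND NOT group."""
    if exclude:
        return "ABS((" + braces(include) + ") AND NOT (" + braces(exclude) + "))"
    return "ABS(" + braces(include) + ")"


# Phrase lists copied from the "significant difference" queries above.
positive = ["significant difference", "significant differences"]
negative = [
    "no significant difference",
    "no significant differences",
    "no statistically significant difference",
    "no statistically significant differences",
]

positive_query = abs_query(positive, exclude=negative)
negative_query = abs_query(negative)
print(positive_query)
print(negative_query)
```

Generating the strings from lists makes it less likely that a negated variant is present in the negative query but forgotten in the exclusion group of the positive query.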

The following measures were calculated for items 2–4 of the previous list, per publication year, for all papers in the Scopus database as well as per scientific discipline and per world region:

Percentage of papers reporting significant results (100%*[

Percentage of papers reporting non-significant results (100%*[

Ratio of significant to non-significant results (

Percentage of significance-testing papers reporting significant results (100%*[

Longitudinal trends were assessed by means of the slope coefficient estimates and corresponding 95% confidence intervals of a simple linear regression analysis, with the publication year as predictor variable and the ratio of significant to non-significant results (
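The yearly measures and the trend estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the counts are made up, the denominator for "significance-testing papers" is assumed to be the sum of the significant and non-significant counts, and the critical value 2.074 is the 97.5th percentile of the t distribution with 22 degrees of freedom (24 yearly points minus 2).

```python
import math


def measures(sig, nonsig, total):
    """Yearly measures from abstract counts (assumed definitions)."""
    return {
        "pct_sig": 100.0 * sig / total,
        "pct_nonsig": 100.0 * nonsig / total,
        "ratio": sig / nonsig,
        # Assumption: significance-testing papers = sig + nonsig.
        "pct_sig_of_tested": 100.0 * sig / (sig + nonsig),
    }


def slope_with_ci(x, y, t_crit=2.074):
    """OLS slope of y on x with a confidence interval of +/- t_crit * SE."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares; the slope's standard error uses n - 2 df.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, (slope - t_crit * se, slope + t_crit * se)


years = list(range(1990, 2014))
# Hypothetical counts for illustration only; these are not Scopus data.
sig = [1000 + 150 * i for i in range(24)]
nonsig = [500 + 40 * i for i in range(24)]
total = [500_000 + 75_000 * i for i in range(24)]

ratios = [measures(s, n, t)["ratio"] for s, n, t in zip(sig, nonsig, total)]
slope, (ci_low, ci_high) = slope_with_ci(years, ratios)
print(f"slope = {slope:.4f}, 95% CI = [{ci_low:.4f}, {ci_high:.4f}]")
```

A confidence interval that excludes zero indicates a trend that is distinguishable from no change under the linear model; for a series of a different length, `t_crit` would need to be adjusted to the corresponding degrees of freedom.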

As mentioned above, Scopus classifies papers into multiple subject areas. To gain insight into the number of papers belonging ‘purely’ to one discipline and to assess how cross-classification may have affected our analyses, we repeated the search queries described in “Longitudinal trends and their comparisons between scientific disciplines and between world regions” for: (1) pure disciplines: papers classified by Scopus into subject areas that belong to the same discipline; and (2) discipline intersections: papers classified by Scopus into subject areas that belong to two or all three disciplines.

Similarly, to gain insight into the number of papers with authors affiliated with countries from the same world region versus the number of papers with author affiliations spanning world regions, we repeated the search queries described in “Longitudinal trends and their comparisons between scientific disciplines and between world regions” for: (1) pure world regions: papers whose authors were all affiliated solely with countries belonging to one and the same world region; and (2) world region intersections: papers by authors affiliated with countries belonging to two or all three world regions.
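The distinction between pure world regions and world region intersections can be illustrated with a small classifier over the set of affiliation countries of a paper. This is a sketch under assumptions: the function name, the label strings, and the country spellings are hypothetical, and the "mixed" label for a paper that combines one world region with a country outside all three regions is our own addition for completeness.

```python
REGIONS = {
    "US": {"United States"},
    "EU15": {
        "Austria", "Belgium", "Denmark", "Finland", "France", "Germany",
        "Greece", "Ireland", "Italy", "Luxembourg", "Netherlands",
        "Portugal", "Spain", "Sweden", "United Kingdom",
    },
    "AS7": {
        "China", "Hong Kong", "India", "Japan", "Singapore",
        "South Korea", "Taiwan",
    },
}


def classify(countries):
    """Label a paper by the world regions its affiliation countries touch."""
    countries = set(countries)
    hit = sorted(name for name, members in REGIONS.items() if countries & members)
    all_members = set().union(*REGIONS.values())
    has_outside = bool(countries - all_members)
    if len(hit) >= 2:
        # Affiliations span two or all three world regions.
        return "intersection " + "+".join(hit)
    if len(hit) == 1:
        # Pure only if no affiliation lies outside the single region.
        return ("mixed " if has_outside else "pure ") + hit[0]
    return "other"


print(classify({"Germany", "France"}))       # pure EU15
print(classify({"China", "United States"}))  # intersection AS7+US
print(classify({"Brazil"}))                  # other
```

The same logic applies to disciplines if the region sets are replaced by the subject areas grouped under each discipline.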

Venn diagrams were constructed for

To investigate whether the longitudinal trends of

1. | All abstracts | ABS({.}) |

2. | < 0.001 | ABS({p < 0.001} OR {p < .001} OR {p < = 0.001} OR {p < = .001} OR {p ≤ 0.001} OR {p ≤ .001}) |

3. | > 0.001 | ABS({p > 0.001} OR {p > .001}) |

4. | < 0.01 | ABS({p < 0.01} OR {p < .01} OR {p < = 0.01} OR {p < = .01} OR {p ≤ 0.01} OR {p ≤ .01}) |

5. | > 0.01 | ABS({p > 0.01} OR {p > .01}) |

6. | < 0.05 | ABS({p < 0.05} OR {p < .05} OR {p < = 0.05} OR {p < = .05} OR {p ≤ 0.05} OR {p ≤ .05}) |

7. | > 0.05 | ABS({p > 0.05} OR {p > .05}) |

8. | < 0.10 | ABS({p < 0.10} OR {p < .10} OR {p < = 0.10} OR {p < = .10} OR {p ≤ 0.10} OR {p ≤ .10}) |

9. | > 0.10 | ABS({p > 0.10} OR {p > .10}) |

10. | = 0.001 | ABS({p = 0.001} OR {p = .001}) |

11. | 0.002–0.005 | ABS({p = 0.002} OR {p = .002} OR {p = 0.003} OR {p = .003} OR {p = 0.004} OR {p = .004} OR {p = 0.005} OR {p = .005}) |

12. | 0.006–0.009 | ABS({p = 0.006} OR {p =.006} OR {p = 0.007} OR {p =.007} OR {p = 0.008} OR {p =.008} OR {p = 0.009} OR {p =.009}) |

13. | 0.011–0.019 | ABS({p = 0.011} OR {p =.011} OR {p = 0.012} OR {p =.012} OR {p = 0.013} OR {p =.013} OR {p = 0.014} OR {p =.014} OR {p = 0.015} OR {p =.015} OR {p = 0.016} OR {p =.016} OR {p = 0.017} OR {p =.017} OR {p = 0.018} OR {p =.018} OR {p = 0.019} OR {p =.019}) |

14. | 0.021–0.029 | ABS({p = 0.021} OR {p =.021} OR {p = 0.022} OR {p =.022} OR {p = 0.023} OR {p =.023} OR {p = 0.024} OR {p =.024} OR {p = 0.025} OR {p =.025} OR {p = 0.026} OR {p =.026} OR {p = 0.027} OR {p =.027} OR {p = 0.028} OR {p =.028} OR {p = 0.029} OR {p =.029}) |

15. | 0.031–0.039 | ABS({p = 0.031} OR {p =.031} OR {p = 0.032} OR {p =.032} OR {p = 0.033} OR {p =.033} OR {p = 0.034} OR {p =.034} OR {p = 0.035} OR {p =.035} OR {p = 0.036} OR {p =.036} OR {p = 0.037} OR {p =.037} OR {p = 0.038} OR {p =.038} OR {p = 0.039} OR {p =.039}) |

16. | 0.041–0.049 | ABS({p = 0.041} OR {p =.041} OR {p = 0.042} OR {p =.042} OR {p = 0.043} OR {p =.043} OR {p = 0.044} OR {p =.044} OR {p = 0.045} OR {p =.045} OR {p = 0.046} OR {p =.046} OR {p = 0.047} OR {p =.047} OR {p = 0.048} OR {p =.048} OR {p = 0.049} OR {p =.049}) |

17. | 0.051–0.059 | ABS({p = 0.051} OR {p =.051} OR {p = 0.052} OR {p =.052} OR {p = 0.053} OR {p =.053} OR {p = 0.054} OR {p =.054} OR {p = 0.055} OR {p =.055} OR {p = 0.056} OR {p =.056} OR {p = 0.057} OR {p =.057} OR {p = 0.058} OR {p =.058} OR {p = 0.059} OR {p =.059}) |

18. | 0.061–0.069 | ABS({p = 0.061} OR {p =.061} OR {p = 0.062} OR {p =.062} OR {p = 0.063} OR {p =.063} OR {p = 0.064} OR {p =.064} OR {p = 0.065} OR {p =.065} OR {p = 0.066} OR {p =.066} OR {p = 0.067} OR {p =.067} OR {p = 0.068} OR {p =.068} OR {p = 0.069} OR {p =.069}) |

19. | 0.071–0.079 | ABS({p = 0.071} OR {p =.071} OR {p = 0.072} OR {p =.072} OR {p = 0.073} OR {p =.073} OR {p = 0.074} OR {p =.074} OR {p = 0.075} OR {p =.075} OR {p = 0.076} OR {p =.076} OR {p = 0.077} OR {p =.077} OR {p = 0.078} OR {p =.078} OR {p = 0.079} OR {p =.079}) |

20. | 0.081–0.089 | ABS({p = 0.081} OR {p =.081} OR {p = 0.082} OR {p =.082} OR {p = 0.083} OR {p =.083} OR {p = 0.084} OR {p =.084} OR {p = 0.085} OR {p =.085} OR {p = 0.086} OR {p =.086} OR {p = 0.087} OR {p =.087} OR {p = 0.088} OR {p =.088} OR {p = 0.089} OR {p =.089}) |

21. | 0.091–0.099 | ABS({p = 0.091} OR {p =.091} OR {p = 0.092} OR {p =.092} OR {p = 0.093} OR {p =.093} OR {p = 0.094} OR {p =.094} OR {p = 0.095} OR {p =.095} OR {p = 0.096} OR {p =.096} OR {p = 0.097} OR {p =.097} OR {p = 0.098} OR {p =.098} OR {p = 0.099} OR {p =.099}) |

22. | 0.01 | ABS({p = 0.010} OR {p =.010} OR {p = 0.01} OR {p =.01}) |

23. | 0.02 | ABS({p = 0.020} OR {p = .020} OR {p = 0.02} OR {p = .02}) |

24. | 0.03 | ABS({p = 0.030} OR {p = .030} OR {p = 0.03} OR {p = .03}) |

25. | 0.04 | ABS({p = 0.040} OR {p = .040} OR {p = 0.04} OR {p = .04}) |

26. | 0.05 | ABS({p = 0.050} OR {p = .050} OR {p = 0.05} OR {p = .05}) |

27. | 0.06 | ABS({p = 0.060} OR {p = .060} OR {p = 0.06} OR {p = .06}) |

28. | 0.07 | ABS({p = 0.070} OR {p = .070} OR {p = 0.07} OR {p = .07}) |

29. | 0.08 | ABS({p = 0.080} OR {p = .080} OR {p = 0.08} OR {p = .08}) |

30. | 0.09 | ABS({p = 0.090} OR {p = .090} OR {p = 0.09} OR {p = .09}) |

31. | p = NS or p = N.S. | ABS({p = NS} OR {p = N.S.}) |

32. | “significant difference(s)” | ABS(({significant difference} OR {significant differences} OR {significantly different} OR {differed significantly}) AND NOT ({no significant difference} OR {no significant differences} OR {no statistically significant difference} OR {no statistically significant differences} OR {not significantly different} OR {did not differ significantly})) |

33. | “no significant difference(s)” | ABS({no significant difference} OR {no significant differences} OR {no statistically significant difference} OR {no statistically significant differences} OR {not significantly different} OR {did not differ significantly}) |

34. | “significant effect(s)” | ABS(({significant effect} OR {significant effects}) AND NOT ({no significant effect} OR {no significant effects} OR {no statistically significant effect} OR {no statistically significant effects} OR {not a significant effect} OR {not a statistically significant effect})) |

35. | “no significant effect(s)” | ABS({no significant effect} OR {no significant effects} OR {no statistically significant effect} OR {no statistically significant effects} OR {not a significant effect} OR {not a statistically significant effect}) |

36. | “supports the hypothesis” | ABS(({supports the hypothesis} OR {support the hypothesis} OR {supports our hypothesis} OR {support our hypothesis} ) AND NOT ({does not support the hypothesis} OR {do not support the hypothesis} OR {does not support our hypothesis} OR {do not support our hypothesis})) |

37. | “does not support the hypothesis” | ABS({does not support the hypothesis} OR {do not support the hypothesis} OR {does not support our hypothesis} OR {do not support our hypothesis}) |

38. | “significantly higher/more” | ABS({significantly higher} OR {significantly more}) |

39. | “significantly lower/less” | ABS({significantly lower} OR {significantly less}) |

40. | “marginally significant” | ABS(“marginally significant”) |

41. | “important finding” | ABS(“important finding” OR “important findings”) |

42. | “pH 7” | ABS(“ph 7”) |

43. | “mass of” | ABS(“mass of”) |

44. | “room temperature” | ABS(“room temperature”) |

45. | “melting point” | ABS(“melting point”) |

46. | “field of view” | ABS(“field of view”) |

47. | “the properties of” | ABS(“the properties of”) |

48. | “the aim of” | ABS(“the aim of”) |

49. | “our aim” | ABS(“our aim”) |

50. | “results showed that” | ABS(“results showed that”) |

51. | “in conclusion” | ABS(“in conclusion”) |

52. | “longitudinal study” | ABS(“longitudinal study”) |

53. | “in other words” | ABS(“in other words”) |

54. | “on the other hand” | ABS(“on the other hand”) |

55. | “a novel” | ABS(“a novel”) |

56. | “a new” | ABS(“a new”) |

57. | “was/were measured” | ABS(“was measured” OR “were measured”) |

58. | “we measured” | ABS(“we measured”) |

59. | “paradigm shift” | ABS(“paradigm shift”) |

60. | data | ABS(data) |

61. | information | ABS(information) |

62. | experiment | ABS(experiment) |

63. | important | ABS(important) |

64. | interesting | ABS(interesting) |

65. | neutral | ABS(neutral) |

66. | positive | ABS(positive) |

67. | negative | ABS(negative) |

68. | “highly significant” | ABS(“highly significant” AND NOT “not highly significant”) |

69. | “trend toward” | ABS(“trend toward”) |

70. | “an increasing trend” | ABS(“an increasing trend”) |

71. | “a decreasing trend” | ABS(“a decreasing trend”) |

72. | “potentially significant” | ABS(“potentially significant”) |

73. | “a nonsignificant trend” | ABS(“a nonsignificant trend” OR “a non significant trend”) |

74. | “a significant trend” | ABS(“a significant trend”) |

75. | “quite significant” | ABS(“quite significant”) |

76. | “a clear trend” | ABS(“a clear trend”) |

77. | “a positive trend” | ABS(“a positive trend”) |

78. | “a strong trend” | ABS(“a strong trend”) |

79. | “significant tendency” | ABS(“significant tendency”) |

80. | “a little significant” | ABS(“a little significant”) |

81. | “not insignificant” | ABS(“not insignificant”) |

82. | “possible significance” | ABS(“possible significance”) |

83. | “failed to reach statistical significance” | ABS(“failed to reach statistical significance”) |

84. | “likely to be significant” | ABS(“likely to be significant”) |

For these 77 queries as well as for the 6 queries described in “Longitudinal trends and their comparisons between scientific disciplines and between world regions” (i.e., queries 2a, b, 3a, b, and 4a, b), the ratio (N_{2013}/T_{2013})/(N_{1990}/T_{1990}) was calculated, where N_{2013} and N_{1990} are the total numbers of abstracts with terms of one of the 83 search queries in 2013 and 1990, respectively, and T_{2013} and T_{1990} are the total numbers of papers with an abstract for these two years (as derived by query 1 in “Longitudinal trends and their comparisons between scientific disciplines and between world regions”).
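The ratio described above compares the share of abstracts matching a query in 2013 with the corresponding share in 1990. A minimal sketch follows; the function name is hypothetical, the per-query counts are made up for illustration, and the two totals are the figures 561,516 and 2,311,772 quoted later in the text, taken here to be the numbers of papers with an abstract in 1990 and 2013.

```python
def relative_growth(n_start, t_start, n_end, t_end):
    """Ratio of the end-year share to the start-year share of abstracts."""
    return (n_end / t_end) / (n_start / t_start)


# Totals of papers with an abstract (assumed reading of the quoted figures).
TOTAL_1990 = 561_516
TOTAL_2013 = 2_311_772

# Hypothetical counts for one search query; these are not Scopus data.
hits_1990 = 1_000
hits_2013 = 10_000

growth = relative_growth(hits_1990, TOTAL_1990, hits_2013, TOTAL_2013)
print(f"relative growth = {growth:.3f}")
```

A value above 1 means that the term became more prevalent relative to the growth of the database; a value of 1 means that its use merely kept pace with the number of abstracts. In this made-up example the raw count grows tenfold, but the normalized ratio is much smaller because the database itself roughly quadrupled.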

The following data were also extracted:

Yearly number of papers containing

Yearly number of papers with abstracts containing

Yearly number of papers containing

Each abstract was counted only once per search query, independent of whether it included one or more manifestations of significance. All data were extracted between 15 and 30 November 2014. All searches of numerical

According to our searches, Scopus contained a total of 30,677,779 papers with an abstract published between 1990 and 2013. Of these, 3,061,170 papers belonged to the social sciences, 14,412,460 papers belonged to the biological sciences, and 15,364,142 papers belonged to the physical sciences. The sum of the number of papers in the three disciplines was greater than the total number of papers, because some papers were classified into multiple disciplines. US affiliations were found in 8,073,346 papers, EU15 affiliations in 8,790,965 papers, and AS7 affiliations in 7,413,236 papers.
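The effect of cross-classification on these counts can be checked with simple arithmetic. The sketch below uses only the numbers reported above; the variable names are hypothetical. Because the overall total also includes papers outside the three disciplines, the excess of the discipline sum over the total is a lower bound on the number of additional (second or third) discipline assignments, not an exact count of cross-classified papers.

```python
total_with_abstract = 30_677_779
by_discipline = {
    "social": 3_061_170,
    "biological": 14_412_460,
    "physical": 15_364_142,
}

# Sum of discipline counts; papers in several disciplines are counted repeatedly.
summed = sum(by_discipline.values())

# Lower bound on the number of extra discipline assignments.
excess = summed - total_with_abstract

print(f"sum over disciplines = {summed:,}")
print(f"excess over total    = {excess:,}")
```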

The number of papers has increased over the years in all three scientific disciplines (

Both the

The dashed line represents the result of a simple linear regression analysis.

The reporting of “significant difference” has increased over time (

The dashed line represents the result of a simple linear regression analysis.

Both

The dashed line represents the result of a simple linear regression analysis.

In a supplementary analysis we searched for and compared abstracts containing

A comparison between disciplines shows that the use of

The dashed lines represent the results of a simple linear regression analysis.

The dashed lines represent the results of a simple linear regression analysis.

The use of

The dashed lines represent the results of a simple linear regression analysis.

The Venn diagrams in

“Other” refers to papers purely classified into subject areas outside the three disciplines. The percentages refer to the papers that were unique to each discipline (e.g., 96.50% of biological papers with

The slope coefficients are reported for all papers, and for papers in three scientific disciplines, both for cross-classified papers (grey bars) and for pure disciplines (orange bars). The numbers at the top of the figure represent: (1) first row: number of papers between 1990 and 2013 reporting significant results (

For the three world regions, both

The dashed lines represent the results of a simple linear regression analysis.

The use of “significant difference” increased while the use of “no significant difference” decreased for all three world regions (

The dashed lines represent the results of a simple linear regression analysis.

Reporting

The dashed lines represent the results of a simple linear regression analysis.

“Other” refers to papers purely affiliated with countries outside the three world regions. The percentages refer to the papers that were unique to each world region.

The slope coefficients are reported for papers in three world regions, both for cross-classified papers (grey bars) and for pure world regions (orange bars). The numbers at the top of the figure represent: (1) first row: number of papers between 1990 and 2013 reporting significant results (

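The number at the right end of each bar is N_{2013}, the number of abstracts matching the query in 2013. The total numbers of papers with an abstract were T_{1990} = 561,516 and T_{2013} = 2,311,772.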

The numbers at the top of the graph represent the ratio of the percentage of papers in 2006–2013 to the percentage of papers in 1990–1997 averaged across 0.001–0.009, 0.011–0.019, 0.021–0.029, etc.

We investigated longitudinal trends between 1990 and 2013 for

The percentage of papers with

For three-digit

Reporting of “significant difference” versus “no significant difference” displayed a more modest increase than

The use of all

We found no support for the hierarchy of sciences as discussed by

The more salient finding of our analysis is the enormous differences in reporting practices between disciplines, with the use of

The virtual absence of

Overlap can also be seen between the social sciences and the biological sciences, with 86% of social sciences papers with

We conducted longitudinal analyses for papers classified into multiple disciplines and for pure disciplines. The analyses of cross-classified and pure papers yielded similar results. We consider the results of the analyses in which the papers were cross-classified as more trustworthy than the pure analyses. Excluding all multi-disciplinary papers limits the sample size and leaves us with a core of classical mono-disciplinary journals that are not representative of all (multi-disciplinary) research.

Previous studies have found that Asian research is more biased toward positive results than research elsewhere in the world (

Although cross-national differences must exist (in the sense that any null hypothesis has to be false because there is always some effect), the size and direction of the effects are currently elusive. Regional differences in significance reporting are probably obscured by important moderators such as (1) the emergence of China as the second publishing power after the US during the last decade (

Our automated string-search approach has some important limitations. First, our method is susceptible to faulty inclusions. To assess the occurrence of false positives that are unrelated to hypothesis testing, we performed a small follow-up analysis. Using a random number generator, we selected 24 papers per discipline (one per year from 1990 to 2013) that reported a

The papers that we assessed represent only a small fraction of all papers indexed in Scopus. For example, the papers with

We focused our main analysis on alpha = 0.05, the most commonly used threshold of significance and the default in many statistical software packages. A full text in ScienceDirect for “^{−4} and 10^{−8} (see Manhattan plots in genome-wide association studies;

Scopus is known to be incomplete for publications prior to 1996 (

A reviewer raised a concern about the completeness of the 2013 data in Scopus. According to the SCImago group that defines scientific indicators and rankings based on the Scopus database, “changes after October do not affect the database of the previous year any longer seriously” (

The three defined disciplines (social, biological, physical) are broad, and differences in statistical approaches between specialties within a discipline can be expected. As a follow-up analysis, we searched for

We opted for automated searches. Automated searches have been fiercely criticized in the past.

We argue that, provided that one uses a great number of diverse search terms and search strategies, an automated search should render a representative cross-sectional estimate of the trends in scientific publishing. Our manual inspection of the 72 abstracts mentioned in “Faulty inclusions” made it clear to us how difficult it really is to assess whether a paper supports a specific null hypothesis or not. We ran into three types of problems: (1) Issues of accessibility: Despite the fact that our university has 17,300 journal subscriptions, a large proportion of the full-text articles are hidden behind paywalls, which introduces serious availability bias (indicatively, for the 72 random abstracts, we did not have access to the full texts of 3/24 [13%] of papers in the biological sciences, 10/24 [42%] of papers in the physical sciences, and 10/24 [42%] of papers in the social sciences). (2) Issues of interpretation, especially when judging papers containing specialized terminology outside one’s field. (3) The fact that many papers do not explicitly report on a hypothesis or might test multiple hypotheses at the same time. All three issues lead to problems of reproducibility. The results of an automated search, on the other hand, are highly reproducible.

For manual searches a sample size of 3,000 to 5,000 articles seems to be the maximum achieved so far (cf.

We investigated longitudinal trends of positive versus negative results reported in the literature and compared these trends between scientific disciplines and between world regions. We found that both positive and negative results have become more prevalent and that the growth rates of both positive and negative results strongly depend on the search terms. The increase from 1990 to 2013 was evident for all

The fact that

In addition to questionable statistical practices, positive explanations for the observed trends are possible, particularly when one considers that not only significant but also non-significant results have increased. First, scientists may have become more knowledgeable, and therefore better able to formulate accurate predictions and design statistically powerful experiments that disprove a null hypothesis. Second, scientists may have become more likely to use, and report the results of, statistical significance testing. The observed longitudinal trends of the reporting of

The authors declare there are no competing interests.
