Open Access
Peer-reviewed
Meta-Research Article
- John P. A. Ioannidis,
- Angelo Maria Pezzullo,
- Antonio Cristiano,
- Stefania Boccia,
- Jeroen Baas
- Published: January 30, 2025
- https://doi.org/10.1371/journal.pbio.3002999
Abstract
Retractions are becoming increasingly common but still account for a small minority of published papers. It would be useful to generate databases where the presence of retractions can be linked to impact metrics of each scientist. We have thus incorporated retraction data in an updated Scopus-based database of highly cited scientists (top 2% in each scientific subfield according to a composite citation indicator). Using data from the Retraction Watch database (RWDB), retraction records were linked to Scopus citation data. Of 55,237 items in RWDB as of August 15, 2024, we excluded non-retractions, retractions clearly not due to any author error, retractions where the paper had been republished, and items not linkable to Scopus records. Eventually, 39,468 eligible retractions were linked to Scopus. Among 217,097 top-cited scientists in career-long impact and 223,152 in single recent year (2023) impact, 7,083 (3.3%) and 8,747 (4.0%), respectively, had at least 1 retraction. Scientists with retracted publications had younger publication age, higher self-citation rates, and larger publication volume than those without any retracted publications. Retractions were more common in the life sciences and rare or nonexistent in several other disciplines. In several developing countries, very high proportions of top-cited scientists had retractions (highest in Senegal (66.7%), Ecuador (28.6%), and Pakistan (27.8%) in career-long citation impact lists). Variability in retraction rates across fields and countries suggests differences in research practices, scrutiny, and ease of retraction. Addition of retraction data enhances the granularity of top-cited scientists’ profiles, aiding in responsible research evaluation. However, caution is needed when interpreting retractions, as they do not always signify misconduct; further analysis on a case-by-case basis is essential. The database should hopefully provide a resource for meta-research and deeper insights into scientific practices.
Citation: Ioannidis JPA, Pezzullo AM, Cristiano A, Boccia S, Baas J (2025) Linking citation and retraction data reveals the demographics of scientific retractions among highly cited authors. PLoS Biol 23(1): e3002999. https://doi.org/10.1371/journal.pbio.3002999
Academic Editor: Anita Bandrowski, University of California San Diego, UNITED STATES OF AMERICA
Received: September 16, 2024; Accepted: January 2, 2025; Published: January 30, 2025
Copyright: © 2025 Ioannidis et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The full datasets are available at https://doi.org/10.17632/btchxktzyw.7.
Funding: The work of AC has been supported by the European Network Staff Exchange for Integrating Precision Health in the Healthcare Systems project (Marie Skłodowska-Curie Research and Innovation Staff Exchange no. 823995). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: JB is an Elsevier employee. Elsevier runs Scopus, which is the source of these data, and also runs the repository where the database of highly-cited scientists is now stored.
Introduction
Retractions of publications are a central challenge for science and their features require careful study [1–3]. In empirical surveys, various types of misconduct are typically responsible for most retractions [4]. The landscape of retractions is becoming more complex with the advent of paper mills: operations that mass-produce papers that are typically fake or fabricated and that may sell authorship slots on them [5]. However, the reasons for retractions are not fully standardized, and many retraction notices are unclear about why a paper had to be withdrawn. Moreover, some retractions are clearly not due to ethical violations or author errors (e.g., they are due to publisher errors). Finally, in many cases, one may view a retraction as a sign of a responsible author who should be congratulated, rather than chastised, for taking proactive steps to correct the literature. Prompt correction of honest errors, major or minor, is a sign of responsible research practices.
The number of retracted papers per year is increasing, with more than 10,000 papers retracted in 2023 [6]. The countries with the highest retraction rates (per 10,000 papers) are Saudi Arabia (30.6), Pakistan (28.1), Russia (24.9), China (23.5), Egypt (18.8), Malaysia (17.2), Iran (16.7), and India (15.2) [6]. However, retractions abound also in highly developed countries [7]. There has also been a gradual change in the reasons for retractions over time [8]: the classic, traditional types of research misconduct (falsification, fabrication, plagiarism, and duplication) that involved usually one or a few papers at a time have been replaced in the top reasons by large-scale, orchestrated fraudulent practices (papermills, fake peer-review, artificial intelligence generated content). Clinical and life sciences account for about half of the retractions that are apparently due to misconduct [9], but electrical engineering/electronics/computer science (EEECS) have an even higher proportion of retractions per 10,000 published papers [9]. Clinical and life sciences disciplines have the highest rates of retractions due to traditional reasons of misconduct, while EEECS disciplines have a preponderance of large-scale orchestrated fraudulent practices.
Here, we aimed to analyze the presence of any retracted papers for all the top-cited scientists across all 174 subfields of science. Typical impact metrics for scientists revolve around publications and their citations. However, citation metrics need to be used with caution [10] to avoid obtaining over-simplified and even grossly misleading views of scientific excellence and impact. We therefore updated and extended databases of standardized citation metrics across all scientists and scientific disciplines [11–14] to include information on retractions for each scientist. Systematic indicators of research quality and integrity are important to examine side-by-side with traditional citation impact data [15,16]. A widely visible list of highly cited scientists issued annually by Clarivate based on Web of Science no longer includes any scientists with retracted publications [17]. In our databases, which cover a much larger number of scientists with more detailed data on each, we have added information on the number of retracted publications, if any, for all listed scientists. Given the variability of the reasons behind retraction, this information can then be interpreted by any assessors on a case-by-case basis with in-depth assessment of reasons, and circumstances of each retraction.
Using our expanded databases, we aimed to answer the following questions: How commonly have top-cited scientists retracted papers? Are there any features that differentiate top-cited scientists with versus without retracted papers? Are specific scientific fields and subfields more likely to have top-cited scientists with retracted papers? Do some countries have higher rates of retractions among their top-cited scientists? Finally, how much do citations to and from retracted papers contribute to the overall citation profile of top-cited scientists? As we present these analyses, we also hope that this new resource will be useful for further meta-research studies that may be conducted by investigators on diverse samples of scientists and scientific fields.
Methods and results
To add the new information on retractions, we depended on the most reliable database of retractions available to date, the Retraction Watch database (RWDB, RRID:SCR_000654), which is also freely and publicly available through CrossRef (RRID:SCR_003217). Among the 55,237 RWDB entries obtained from CrossRef (https://api.labs.crossref.org/data/retractionwatch) on August 15, 2024, we focused on the 50,457 entries where the nature of the notice is classified as “Retraction”, excluding other types (corrections, expressions of concern) that may also be covered in RWDB. From this set, we excluded entries where the paper had been retracted but then replaced by a new version (which can suggest that the errors were manageable to address and a new version represents the work in the published literature), and entries where the retraction was clearly not due to any error or wrongdoing by the authors (e.g., publisher error). Therefore, we excluded entries where the reason for retraction was listed as “Retract and Replace,” “Error by Journal/Publisher,” “Duplicate Publication through Error by Journal/Publisher,” or “Withdrawn (out of date)”; however, for the latter 3 categories, these exclusions were applied only if no additional reasons were listed that could be attributed, exclusively or in part, to the authors, as detailed in S1 Table. This first filtering was automated and resulted in a set of 47,964 entries.
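The filtering rules above can be sketched as follows. This is an illustrative reconstruction, not the published code: the field names (`RetractionNature`, `Reason`) and the semicolon-delimited, plus-prefixed reason format follow the public Retraction Watch schema, but the exact exclusion logic used by the authors may differ in detail (see S1 Table).

```python
# Reasons that, on their own, exclude an entry from the eligible set.
SOLE_EXCLUDED_REASONS = {
    "Retract and Replace",
    "Error by Journal/Publisher",
    "Duplicate Publication through Error by Journal/Publisher",
    "Withdrawn (out of date)",
}

def is_eligible(entry):
    """Keep only genuine retractions not clearly attributable to the
    publisher alone. `entry` is a dict mimicking one RWDB record."""
    if entry["RetractionNature"] != "Retraction":
        return False  # drop corrections, expressions of concern, etc.
    # RWDB lists reasons as ";"-separated tokens, often prefixed with "+".
    reasons = {r.strip().lstrip("+") for r in entry["Reason"].split(";") if r.strip()}
    if "Retract and Replace" in reasons:
        return False  # the paper was replaced by a new version
    # The other three categories exclude the entry only when no additional,
    # potentially author-attributable reason is listed alongside them.
    if reasons and reasons <= SOLE_EXCLUDED_REASONS:
        return False
    return True
```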
We tagged articles as retracted by linking retraction records to their corresponding entries in Scopus (RRID:SCR_022559). Initially, this linking was achieved by matching the OriginalPaperDOI with a DOI in Scopus. For retracted articles without a direct DOI match, we employed an alternative strategy using the title and publication year, allowing for a 1-year discrepancy due to variations in the recorded publication year. To enhance the accuracy of the linking process, we performed data sanitization on both databases: DOIs were standardized by removing redundant prefixes and extraneous characters, and titles were normalized by stripping all non-alphanumeric characters and converting them to lowercase. Additionally, to avoid erroneous matches, especially with shorter titles, we imposed a minimum length requirement of 32 characters for title matching. The code that demonstrates the linking strategy is published alongside the data set at https://elsevier.digitalcommonsdata.com/datasets/btchxktzyw/7.
Linking the retractions using the digital object identifier (DOI) of the original paper resulted in 38,364 matches. For entries where a DOI match was not possible, we attempted to link records using a combination of the title and the year derived from the date of the original article, allowing for a ±1-year variation; this yielded 1,104 additional matches. The linkage process thus resulted in a total of 39,468 matched records (Fig 1).
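The two-stage linkage can be sketched as below. This is a minimal illustration of the sanitization and matching rules described in the text, not the published code (which is available at the dataset URL); the record field names (`doi`, `title`, `year`) and index structures are assumptions.

```python
import re

def clean_doi(doi):
    """Standardize DOIs: lowercase and strip resolver prefixes so that
    RWDB and Scopus DOIs compare equal."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def clean_title(title):
    """Normalize titles: lowercase, alphanumeric characters only."""
    return re.sub(r"[^a-z0-9]", "", title.lower())

MIN_TITLE_LEN = 32  # guard against spurious matches on short titles

def link_record(rw, scopus_by_doi, scopus_by_title):
    """Stage 1: exact (sanitized) DOI match.
    Stage 2: normalized title match with publication year within +/-1."""
    hit = scopus_by_doi.get(clean_doi(rw["doi"]))
    if hit is not None:
        return hit
    key = clean_title(rw["title"])
    if len(key) < MIN_TITLE_LEN:
        return None
    for cand in scopus_by_title.get(key, []):
        if abs(cand["year"] - rw["year"]) <= 1:
            return cand
    return None
```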
Calculation of the composite citation indicator and ranking of the scientists accordingly within their primary subfield (using the Science-Metrix classification of 20 fields and 174 subfields) were performed in the current iteration with the exact same methods as in previous iterations (described in detail in references [11–13]). Career-long impact counts citations received cumulatively across all years to papers published at any time, while single most recent year impact counts only citations received in 2023 to papers published at any time.
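As a rough illustration of the form such a composite indicator takes in refs [11–13], it sums, over six citation metrics, each scientist's log-transformed value normalized by the log of the field-wide maximum. The metric names below and the exact formula should be treated as an assumption drawn from those references; consult them and the published code for the authoritative definition.

```python
import math

# Six citation metrics per refs [11-13] (abbreviated names are assumptions):
# total citations, h-index, co-authorship-adjusted hm-index, and citations
# to papers as single, single/first, and single/first/last author.
METRICS = ("nc", "h", "hm", "ncs", "ncsf", "ncsfl")

def composite_indicator(scientist, maxima):
    """Sum over the six metrics of ln(1 + value) / ln(1 + maximum observed
    value). Illustrative sketch only, not the published implementation."""
    return sum(
        math.log(1 + scientist[m]) / math.log(1 + maxima[m]) for m in METRICS
    )
```

A scientist at the maximum on every metric would score 6, the highest possible value; scientists are then ranked by this score within their primary subfield.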
The new updated release of the databases includes 217,097 scientists who are among the top 2% of their primary scientific subfield in the career-long citation impact and 223,152 scientists who are among the top 2% in their single most recent year (2023) citation impact. These numbers also include some scientists (2,789 and 6,325 scientists in the 2 data sets, respectively) who may not be in the top 2% of their primary scientific subfield but are among the 100,000 top-cited across all scientific subfields combined. Among the top-cited scientists, 7,083 (3.3%) and 8,747 (4.0%), respectively, in the 2 datasets have at least 1 retracted publication, and 1,710 (0.8%) and 2,150 (1.0%), respectively, have 2 or more retracted publications. As shown in Fig 2, the distribution of the number of linked eligible retractions per author follows a power law.
Fig 2. Distribution of the number of retractions in top-cited scientists with at least 1 retraction.
(A) Database of top-cited authors based on career-long impact. (B) Database of top-cited authors based on single recent year (2023) impact. The data underlying this figure can be found in S1 Data.
Table 1 shows the characteristics of top-cited scientists who have any retracted publications versus those who have none. As shown, top-cited scientists with retracted publications tend to have younger publication ages, a higher proportion of self-citations, a higher h/hm index ratio (indicating higher co-authorship levels), slightly better ranking, and a higher total number of publications (p < 0.001 by Mann–Whitney U test, in R version 4.4.0 (RRID:SCR_001905), for all indicators in both the career-long impact and single recent year data sets, except for publication age and absolute ranking in the subfield in the single recent year data set). However, except for the number of papers published, the differences are small or modest in absolute magnitude. The proportion of scientists with retractions is nevertheless highest at the extreme top of the rankings: among the top 1,000 scientists with the highest composite indicator values, the proportion with at least 1 retraction is 13.8% in the career-long and 11.1% in the single recent year impact list.
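The Mann–Whitney U statistic behind these comparisons reduces to a rank-sum computation over the two pooled groups. The analysis itself was run in R; the pure-Python sketch below is only illustrative and omits the tie-corrected p-value machinery that a statistics package provides.

```python
def mann_whitney_u(x, y):
    """U statistic for group x vs. group y via rank sums; tied values
    receive their average rank. Illustrative only (no p-value)."""
    combined = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank over the tie block
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg
        i = j + 1
    r1 = sum(ranks[: len(x)])  # rank sum of the first group
    return r1 - len(x) * (len(x) + 1) / 2
```

U ranges from 0 (every value in x below every value in y) to len(x) * len(y), and the two directional statistics always sum to that product.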
Table 2 shows the proportion of top-cited scientists with retracted publications across the 20 major fields into which science is divided according to the Science-Metrix classification; information on the more detailed 174 subfields appears in S2 Table. The proportion of scientists with retractions varies widely across major fields, ranging from 0% to 5.5%. Clinical Medicine and Biomedical Research have the highest rates (4.8% to 5.5%). Enabling & Strategic Technologies, Chemistry, and Biology have rates close to the average of all sciences combined. All other fields have low to very low (or even zero) rates of scientists with retractions. When the 174 Science-Metrix subfields of science were considered, the highest pro