Using ChatGPT for Peer Review: A Snapshot of Views
AI tools have already proved valuable in the initial screening of submitted manuscripts, helping journals efficiently flag papers with obvious errors or missing information. Recently, large language models (LLMs) like ChatGPT have emerged as powerful tools in natural language processing, sparking discussions about their potential use in peer review at academic journals. In this article, we take a look at various views on the use of LLMs to create peer review reports.
In the pre-GPT3 era…
A group of authors developed an AI tool specifically to investigate, experimentally, how well AI approximates human decisions in a journal’s manuscript quality assessment and peer-review process. When reporting their results, Checco et al. (2021) remarked, “Machine-learning techniques are inherently conservative, as they are trained with data from the past.” They also cautioned that if a tool similar to theirs were used for actual peer review, it could “… lead to unintended consequences, like the creation of biased rules that could penalise under-represented groups or even individuals.”
A look at the pros and cons
Hosseini and Horbach (2023) specifically examined the use of LLMs in the publication process. They found that while LLMs could be used to summarize peer review reports and draft editorial decision letters, they also “might exacerbate existing challenges of the peer review system such as fake peer reviews as they allow fraudsters to create more unique and well-written reviews.” Further, they noted that “LLMs are still in early stages of their development and for the moment seem only suitable to improve the first draft of a review instead of writing a review from scratch.” Consequently, they strongly recommend that journal editors and peer reviewers fully disclose whether and how they have used LLMs in manuscript-related decision making.
A case strongly against LLMs in peer review
In an article published in The Lancet Infectious Diseases, one author, Donker (2023), shared his experience of using an LLM to generate a peer review. He found that the AI-generated review report contained a number of comments that sounded genuine but did not actually pertain to the manuscript under review. In fact, the LLM even generated a list of spurious references to cite. “The real risk here is that the LLM produced a review report that looks properly balanced but has no specific critical content about the manuscript or the described study. Because it summarises the paper and methodology remarkably well, it could easily be mistaken for an actual review report by those that have not fully read the manuscript. Even worse, the specific but unrelated comments could be perceived as reason for rejection.”
His experience led him to strongly recommend against using LLMs for peer review: “Editors should make sure that comments in review reports truly relate to the manuscript in question, and authors should be even more ready to challenge reviewer comments that are seemingly unrelated, and above all, reviewers should refrain from using LLM tools.”
What journals and publishers have to say
In April 2023, a social sciences researcher pointed out fictitious authors and papers in an AI-generated peer review of his paper from an unspecified Emerald journal. A spokesperson from Emerald Publishing was quoted by Times Higher Education as saying, “ChatGPT and other AI tools should not be utilised by reviewers of papers submitted to journals published by Emerald. As with authorship, AI tools/LLMs should not replace the peer review process that relies on human subject matter expertise and critical appraisal.”
The Program Chairs of the ICCV 2023 conference are even more explicit in their stance against LLMs in peer review. Their guidelines for peer reviewers state, “It is unethical to resort to Large Language Models (e.g., ChatGPT) to automatically generate reviewing comments that do not originate from the reviewer’s own opinions.” They also require every reviewer to confirm that each review reflects their original opinions and that no part of the report was generated by an automatic system.
Some journals, meanwhile, prefer to maintain a neutral, wait-and-see stance. An April 2023 editorial published simultaneously in Arthritis Care & Research and Arthritis & Rheumatology stated, “Although we do not anticipate substituting human peer reviewers with LLM AI tools, we will monitor whether such tools can be a useful adjunct.”
The bottom line…
As the demands of academic publishing continue to evolve, incorporating LLMs into the peer review process is an attractive option for reviewers seeking to boost their efficiency and productivity. However, given the current state of LLM development, caution is warranted because of ethical concerns and the continuing need for human judgment. As newer and more sophisticated LLMs emerge, they could become valuable allies in the peer review process, acting as a second pair of eyes and handling repetitive writing tasks while preserving the essence of human expertise.