ArXiv, a widely used open repository for preprint research, is going further to crack down on the careless use of large language models in scientific papers.
Although papers are posted to the site before being peer-reviewed, arXiv (pronounced “archive”) has become one of the main ways research is distributed in fields such as computer science and mathematics, and the site itself is a source of data on trends in scientific research.
ArXiv has already taken steps to combat the rise in low-quality AI-generated papers, such as requiring first-time submissions to receive approval from prominent authors. And the organization, which Cornell University has hosted for more than 20 years, is becoming an independent nonprofit organization that should be able to raise more money to address issues like AI slop.
In the latest development, Thomas Dietterich, chair of arXiv’s computer science section, posted on Thursday: “If a submission contains indisputable evidence that the author did not check the results of LLM generation, this means that nothing in the paper can be trusted.”
Indisputable evidence could include things like “hallucinated references” or comments addressed to or generated by the LLM, Dietterich said. If such evidence is found, the paper’s authors face a one-year ban from arXiv, after which all of their subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue.
Note that this is not an outright ban on the use of LLMs, but rather an assertion that, as Dietterich puts it, authors bear “full responsibility” for their content, “regardless of how the content was generated.” Researchers are therefore responsible even if they copy and paste “inappropriate language, plagiarized content, biased content, errors, mistakes, false references, or misleading content” directly from an LLM.
Dietterich told 404 Media that this would be a “one strike” rule, but moderators would have to flag the issue and the section chair would have to review the evidence before imposing a penalty. Authors may also appeal the decision.
A recent peer-reviewed study found that fabricated citations in biomedical research are on the rise, likely due to LLMs. To be fair, though, scientists aren’t the only ones who get busted for using AI-generated citations.
