FYMOUS News

OpenAI’s models “memorized” copyrighted content, new research suggests

April 4, 2025

A new study appears to lend credence to allegations that OpenAI trained at least some of its AI models on copyrighted content.

OpenAI is embroiled in lawsuits brought by authors, programmers and other rights holders, who accuse the company of using their works (books, codebases and so on) to develop its models without permission. OpenAI has long claimed a fair use defense, but the plaintiffs in these cases argue that U.S. copyright law contains no carve-out for training data.

The study, co-authored by researchers at the University of Washington, the University of Copenhagen and Stanford University, proposes a new method for identifying training data “memorized” by models that sit behind an API, like OpenAI’s.

Models are prediction engines. Trained on large amounts of data, they learn patterns, which is how they are able to generate essays, photos and more. Most outputs are not verbatim copies of the training data, but because of the way models “learn,” some inevitably are. Image models have been found to regurgitate screenshots from films they were trained on, while language models have been observed effectively plagiarizing news articles.

The study’s method relies on words the co-authors call “high-surprisal,” that is, words that stand out as uncommon in the context of a larger body of work. For example, the word “radar” in the sentence “Jack and I sat perfectly still with the radar humming” would be considered high-surprisal because it is statistically less likely to appear before “humming” than words such as “engine” or “radio.”
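To make the idea concrete, here is a minimal Python sketch of surprisal in the standard information-theoretic sense, the negative base-2 logarithm of a word's probability in context. The probabilities, the 4-bit threshold and the function names are illustrative assumptions; the article does not say how the study scores words.

```python
import math


def surprisal_bits(probability: float) -> float:
    """Return the surprisal of an event in bits: -log2(p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Hypothetical probabilities for the word that precedes "humming".
# These numbers are invented for illustration, not taken from the study.
candidate_probs = {"engine": 0.40, "radio": 0.25, "radar": 0.01}


def high_surprisal_words(probs: dict, threshold_bits: float = 4.0) -> list:
    """Keep the words whose surprisal meets an (assumed) threshold."""
    return [w for w, p in probs.items() if surprisal_bits(p) >= threshold_bits]


if __name__ == "__main__":
    for word, p in candidate_probs.items():
        print(f"{word}: {surprisal_bits(p):.2f} bits")
    print(high_surprisal_words(candidate_probs))
```

Under these assumed numbers, only “radar” clears the threshold, matching the intuition in the example sentence: the rarer a word is in its context, the more bits of surprisal it carries.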

The co-authors probed several OpenAI models, including GPT-4 and GPT-3.5, for signs of memorization by removing high-surprisal words from snippets of fiction books and New York Times articles and having the models try to “guess” which words had been masked. If a model managed to guess correctly, it likely memorized the snippet during training, the co-authors concluded.
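The mask-and-guess probe can be sketched in a few lines of Python. This is a simplified illustration, not the study's implementation: the `[MASK]` placeholder, the helper names and the stand-in `guess_fn` callable (which in practice would wrap a call to a model API) are all assumptions.

```python
import re
from typing import Callable, Iterable, Tuple

MASK = "[MASK]"  # assumed placeholder token


def mask_word(snippet: str, target: str) -> str:
    """Replace the first whole-word occurrence of target with MASK."""
    pattern = r"\b" + re.escape(target) + r"\b"
    masked, count = re.subn(pattern, MASK, snippet, count=1)
    if count == 0:
        raise ValueError(f"{target!r} not found in snippet")
    return masked


def guess_rate(
    examples: Iterable[Tuple[str, str]],
    guess_fn: Callable[[str], str],
) -> float:
    """Fraction of masked words that guess_fn recovers exactly."""
    examples = list(examples)
    if not examples:
        return 0.0
    hits = 0
    for snippet, target in examples:
        guess = guess_fn(mask_word(snippet, target))
        if guess.strip().lower() == target.lower():
            hits += 1
    return hits / len(examples)


def toy_guess(masked_snippet: str) -> str:
    """Hypothetical stand-in for a model call; it 'knows' one snippet."""
    return "radar" if "humming" in masked_snippet else "unknown"


examples = [
    ("Jack and I sat perfectly still with the radar humming", "radar"),
    ("The ship left the quiet harbor at dawn", "harbor"),
]

if __name__ == "__main__":
    print(guess_rate(examples, toy_guess))
```

A high guess rate on words that are hard to predict from context alone is the signal of interest here. A real probe would also need a baseline, since some masked words can be recovered without any memorization.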

Figure: An example of having a model “guess” a high-surprisal word. Image credit: OpenAI

According to the test results, GPT-4 showed signs of having memorized portions of popular fiction books, including books in BookMIA, a dataset containing samples of copyrighted e-books. The results also suggest that the model memorized portions of New York Times articles, albeit at a comparatively lower rate.

Abhilasha Ravichander, a doctoral student at the University of Washington and a co-author of the study, told TechCrunch that the findings shed light on the “contentious data” that models might have been trained on.

“In order to have large language models that are trustworthy, we need to have models that we can probe and audit and examine scientifically,” Ravichander said. “Our work aims to provide a tool to probe large language models, but there is a real need for greater data transparency in the whole ecosystem.”

OpenAI has long advocated for looser restrictions on developing models using copyrighted data. While the company has certain content licensing deals in place and offers opt-out mechanisms that let copyright owners flag content they would prefer not be used for training, it has lobbied several governments to codify “fair use” rules around AI training approaches.

