Fyself News
Startups

OpenAI’s models “memorized” copyrighted content, new study suggests

By user, April 4, 2025

A new study appears to lend credence to allegations that OpenAI trained at least some of its AI models on copyrighted content.

OpenAI is embroiled in lawsuits brought by authors, programmers, and other rights holders, who accuse the company of using their works (books, codebases, and so on) to develop its models without permission. OpenAI has long claimed a fair use defense, but the plaintiffs in these cases argue that there is no carve-out in U.S. copyright law for training data.

The study, co-authored by researchers at the University of Washington, the University of Copenhagen, and Stanford University, proposes a new method for identifying training data “memorized” by models that sit behind an API, like OpenAI’s.

Models are prediction engines. Trained on large amounts of data, they learn patterns; that is how they are able to generate essays, photos, and more. Most of their outputs are not verbatim copies of the training data, but because of the way models “learn,” some inevitably are. Image models have been found to regurgitate screenshots from films they were trained on, while language models have been observed effectively plagiarizing news articles.

The study’s method relies on words that the co-authors call “high-surprisal,” that is, words that stand out as uncommon in the context of a larger body of work. For example, the word “radar” in the sentence “Jack and I sat perfectly still with the radar humming” would be considered high-surprisal because it is statistically less likely to appear before “humming” than words like “engine” or “radio.”
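To make the idea concrete, here is a minimal Python sketch of surprisal as it is defined in information theory: the negative base-2 logarithm of a word's probability in context. The article does not say how the co-authors scored words, so this is an illustration and not their procedure. The probabilities below are invented for the example and do not come from the study or from any real model.

```python
import math

# Hypothetical probabilities for the word that precedes "humming" in the
# example sentence. These numbers are invented for illustration only.
CANDIDATE_PROBS = {"engine": 0.30, "radio": 0.20, "radar": 0.01}


def surprisal_bits(probability: float) -> float:
    """Return the information-theoretic surprisal, -log2(p), in bits."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


def rank_by_surprisal(probs: dict[str, float]) -> list[tuple[str, float]]:
    """Sort candidate words from most to least surprising."""
    scored = [(word, surprisal_bits(p)) for word, p in probs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = rank_by_surprisal(CANDIDATE_PROBS)
for word, bits in ranked:
    print(f"{word}: {bits:.2f} bits")
```

With these made-up numbers, “radar” ranks first because it is the least probable candidate, which is the sense in which a word is called high-surprisal here.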

The co-authors probed several OpenAI models, including GPT-4 and GPT-3.5, for signs of memorization by removing high-surprisal words from snippets of fiction books and New York Times pieces and having the models try to “guess” which words had been masked. If a model managed to guess correctly, the co-authors concluded, it likely memorized the snippet during training.

An example of a model “guessing” a high-surprisal word. Image credit: OpenAI
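The masked-word probe described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the co-authors' implementation: `guess_fn` is a hypothetical stand-in for a call to a model behind an API, `toy_guess_fn` is a fake model that “remembers” exactly one passage, and the second example sentence is invented.

```python
import re
from typing import Callable

MASK = "[MASK]"


def mask_word(passage: str, target: str) -> str:
    """Replace the first whole-word occurrence of `target` with MASK."""
    pattern = r"\b" + re.escape(target) + r"\b"
    masked, count = re.subn(pattern, MASK, passage, count=1)
    if count == 0:
        raise ValueError(f"{target!r} does not occur in the passage")
    return masked


def probe_accuracy(
    examples: list[tuple[str, str]],
    guess_fn: Callable[[str], str],
) -> float:
    """Fraction of masked words that `guess_fn` recovers exactly."""
    if not examples:
        raise ValueError("at least one example is required")
    hits = 0
    for passage, target in examples:
        guess = guess_fn(mask_word(passage, target))
        hits += guess.strip().lower() == target.lower()
    return hits / len(examples)


def toy_guess_fn(masked_passage: str) -> str:
    """Hypothetical model: recalls one passage, otherwise guesses a common word."""
    memorized = {"Jack and I sat perfectly still with the [MASK] humming.": "radar"}
    return memorized.get(masked_passage, "engine")


examples = [
    ("Jack and I sat perfectly still with the radar humming.", "radar"),
    ("The old boat drifted with the motor idling.", "motor"),  # invented sentence
]
accuracy = probe_accuracy(examples, toy_guess_fn)
print(f"exact-match accuracy: {accuracy:.2f}")
```

In a real probe, `guess_fn` would wrap a request to the model under test. A high exact-match rate on unlikely words is then read as a sign, not proof, that the passages appeared in the training data.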

According to the test results, GPT-4 showed signs of having memorized portions of popular fiction books, including books in BookMIA, a dataset containing samples of copyrighted e-books. The results also suggested that the model memorized portions of New York Times articles, though at a comparatively lower rate.

Abhilasha Ravichander, a doctoral student at the University of Washington and a co-author of the study, told TechCrunch that the findings shed light on the “contentious data” that models might have been trained on.

“In order to have large language models that are trustworthy, we need to have models that we can probe and audit and examine scientifically,” Ravichander said. “Our work aims to provide a tool to probe large language models, but there is a real need for greater data transparency in the whole ecosystem.”

OpenAI has long advocated for looser restrictions on developing models with copyrighted data. While the company has certain content licensing deals in place and offers opt-out mechanisms that let copyright owners flag content they would prefer not be used for training, it has lobbied several governments to codify “fair use” rules around AI training approaches.

