Fyself News
Startups

OpenAI’s models “memorized” copyrighted content, new research suggests

April 4, 2025

The new research appears to lend credence to allegations that OpenAI trained at least some of its AI models on copyrighted content.

OpenAI is embroiled in lawsuits brought by authors, programmers, and other rights holders who accuse the company of using their works (books, codebases, and so on) to develop its models without permission. OpenAI has long claimed a fair use defense, but the plaintiffs in these cases argue that U.S. copyright law contains no carve-out for training data.

The study, co-authored by researchers at the University of Washington, the University of Copenhagen, and Stanford University, proposes a new method for identifying training data “memorized” by models that sit behind an API, such as OpenAI’s.

Models are prediction engines. Trained on large amounts of data, they learn patterns, which is how they are able to generate essays, photos, and more. Most of their outputs are not verbatim copies of the training data, but because of the way models “learn,” some inevitably are. Image models have been found to regurgitate screenshots from films they were trained on, while language models have been observed effectively plagiarizing news articles.

The method relies on what the co-authors call “high-surprisal” words, that is, words that stand out as uncommon in the context of a larger body of work. For example, the word “radar” in the sentence “Jack and I sat perfectly still with the radar humming” would be considered high-surprisal because it is statistically less likely than words such as “engine” or “radio” to appear before “humming.”
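A common way to quantify how unexpected a word is uses surprisal: for a word with probability p given its context, surprisal is -log2 p bits, so rarer words score higher. The sketch below assumes that standard definition (this article does not describe the study's exact scoring), and the candidate probabilities are invented for illustration, not taken from any real model.

```python
import math

def surprisal_bits(prob: float) -> float:
    """Surprisal of an outcome with probability `prob`, in bits: -log2(prob)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)

# Invented next-word probabilities for the blank in
# "Jack and I sat perfectly still with the ___ humming".
# These are illustrative numbers, not outputs of any real model.
candidates = {"engine": 0.40, "radio": 0.30, "fan": 0.20, "radar": 0.01}

# Rank candidates from most to least surprising.
ranked = sorted(candidates, key=lambda w: surprisal_bits(candidates[w]), reverse=True)
for word in ranked:
    print(f"{word:>6}: {surprisal_bits(candidates[word]):.2f} bits")
```

With these made-up numbers, "radar" scores about 6.64 bits versus about 1.32 bits for "engine", which is what makes it a useful word to mask: a model is unlikely to guess it from general language statistics alone.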

To look for signs of memorization, the co-authors probed several OpenAI models, including GPT-4 and GPT-3.5, by removing high-surprisal words from snippets of fiction books and New York Times articles and having the models try to “guess” which words had been masked. If a model guessed correctly, the co-authors concluded, it likely memorized the snippet during training.
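The mask-and-guess probe can be sketched roughly as follows. Everything here is hypothetical: `guess_fn` stands in for a call to a model behind an API, the `[MASK]` placeholder and exact-match scoring are assumptions, and the two toy "models" are plain Python functions, not real systems. The study's actual prompts and scoring may differ, and a correct guess is evidence of memorization rather than proof.

```python
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"  # hypothetical placeholder; the study's prompt format is not described here

def mask_word(tokens: Sequence[str], index: int) -> str:
    """Return the snippet with the token at `index` replaced by MASK."""
    masked = list(tokens)
    masked[index] = MASK
    return " ".join(masked)

def probe_snippet(tokens: Sequence[str], index: int,
                  guess_fn: Callable[[str], str]) -> bool:
    """Ask the model to fill the mask; True if it reproduces the original word."""
    guess = guess_fn(mask_word(tokens, index))
    return guess.strip().lower() == tokens[index].lower()

def memorization_rate(snippets: List[Tuple[Sequence[str], int]],
                      guess_fn: Callable[[str], str]) -> float:
    """Fraction of (tokens, masked_index) pairs the model fills in exactly."""
    if not snippets:
        return 0.0
    hits = sum(probe_snippet(toks, idx, guess_fn) for toks, idx in snippets)
    return hits / len(snippets)

# --- Toy usage with stand-in "models" (hypothetical; no real API is called) ---
snippet = "Jack and I sat perfectly still with the radar humming".split()
RADAR_INDEX = snippet.index("radar")

# Simulates a model that has seen this exact passage and recalls the masked word.
_recalled = {mask_word(snippet, RADAR_INDEX): "radar"}

def memorizing_model(prompt: str) -> str:
    return _recalled.get(prompt, "engine")

def generic_model(prompt: str) -> str:
    # Always picks the statistically likely filler word.
    return "engine"

print(memorization_rate([(snippet, RADAR_INDEX)], memorizing_model))  # → 1.0
print(memorization_rate([(snippet, RADAR_INDEX)], generic_model))     # → 0.0
```

Masking a high-surprisal word is what gives the test its power: a generic model falls back on a common filler such as "engine", so only a model that has effectively stored the passage tends to recover "radar".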

Figure: An example of having a model “guess” a high-surprisal word. (Image credit: OpenAI)

According to the test results, GPT-4 showed signs of having memorized portions of popular fiction books, including books in BookMIA, a dataset containing samples of copyrighted e-books. The results also suggested that the model memorized portions of New York Times articles, though at a comparatively lower rate.

Abhilasha Ravichander, a doctoral student at the University of Washington and a co-author of the study, told TechCrunch that the findings shed light on the “contentious data” that models might have been trained on.

“To have large language models that are trustworthy, we need models that can be scientifically probed, audited, and examined,” Ravichander said. “Our work aims to provide a tool for probing large language models, but there is a need for greater data transparency across the whole ecosystem.”

OpenAI has long advocated for looser restrictions on developing models with copyrighted data. While the company has certain content licensing deals in place and offers opt-out mechanisms that let copyright owners flag content they would prefer it not use for training, it has also lobbied several governments to codify “fair use” rules around AI training.

