Fyself News
Startups

Researchers suggest OpenAI trained AI models on paywalled O’Reilly books

By user, April 1, 2025. 4 min read.

OpenAI has been accused by many parties of training its AI on copyrighted content without permission. Now, a new paper by an AI watchdog organization makes the serious accusation that the company has increasingly relied on nonpublic books it did not license to train more sophisticated AI models.

AI models are essentially complex prediction engines. Trained on large amounts of data, including books, films, and TV shows, they learn patterns and novel ways to extrapolate from a simple prompt. When a model “writes” an essay on a Greek tragedy or “draws” Ghibli-style images, it is simply drawing on its vast knowledge to approximate. It is not arriving at anything new.

Many AI labs, including OpenAI, have begun to use AI-generated data to train AI as they exhaust real-world sources (mainly the public web), but few have eschewed real-world data entirely. This is because training on purely synthetic data carries risks, such as worse model performance.

A new paper from the AI Disclosures Project, a nonprofit co-founded in 2024 by media mogul Tim O’Reilly and economist Ilan Strauss, concludes that OpenAI likely trained its GPT-4o model on paywalled books from O’Reilly Media. (O’Reilly is the CEO of O’Reilly Media.)

In ChatGPT, GPT-4o is the default model. O’Reilly does not have a licensing agreement with OpenAI, the paper states.

“OpenAI’s more recent and capable model, GPT-4o, shows strong recognition of paywalled O’Reilly book content compared to OpenAI’s earlier model, GPT-3.5 Turbo,” the paper’s co-authors wrote. “In contrast, GPT-3.5 Turbo shows greater relative recognition of publicly accessible O’Reilly book samples.”

The paper used a method called DE-COP, first introduced in an academic paper in 2024, which is designed to detect copyrighted content in the training data of language models. Also known as a “membership inference attack,” the method tests whether a model can reliably distinguish human-written text from paraphrased, AI-generated versions of the same text. If it can, that suggests the model may have prior knowledge of the text from its training data.
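The idea behind this kind of test can be illustrated with a minimal sketch. This is not the paper's implementation: the multiple-choice quiz format, the `recognition_rate` and `make_toy_model` helpers, the sample passages, and the toy "model" (which recognizes only passages it has memorized and otherwise guesses) are all assumptions made for illustration. A real test would prompt a language model instead of calling the stub.

```python
import random
from typing import Callable, Iterable, Sequence

# Hypothetical interface: a "model" is any callable that sees the answer
# options and returns the index of the one it believes is the verbatim text.
Chooser = Callable[[Sequence[str]], int]


def recognition_rate(original: str, paraphrases: Sequence[str],
                     choose: Chooser, trials: int = 200, seed: int = 0) -> float:
    """Fraction of shuffled quizzes in which `choose` picks the verbatim
    passage rather than one of its paraphrases."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        options = [original, *paraphrases]
        rng.shuffle(options)  # vary the answer position on every trial
        if options[choose(options)] == original:
            hits += 1
    return hits / trials


def make_toy_model(memorized: Iterable[str], seed: int = 1) -> Chooser:
    """Toy stand-in for a language model (an assumption for this sketch):
    it picks a memorized passage if one is offered, otherwise it guesses."""
    known = set(memorized)
    rng = random.Random(seed)

    def choose(options: Sequence[str]) -> int:
        for i, text in enumerate(options):
            if text in known:
                return i
        return rng.randrange(len(options))

    return choose


# Made-up passages: one the toy model has "seen" and one it has not.
seen = "The quick brown fox jumps over the lazy dog."
seen_paraphrases = [
    "A fast brown fox leaps over a sleepy dog.",
    "Over the lazy dog jumps a quick brown fox.",
    "The speedy fox hops across the idle hound.",
]
unseen = "A slow green turtle crawls under the busy bridge."
unseen_paraphrases = [
    "Under the busy bridge crawls a slow green turtle.",
    "A sluggish turtle creeps beneath the crowded bridge.",
    "The green turtle slowly moves below the bridge.",
]

model = make_toy_model([seen])
rate_seen = recognition_rate(seen, seen_paraphrases, model)
rate_unseen = recognition_rate(unseen, unseen_paraphrases, model)
chance = 1 / (1 + len(seen_paraphrases))  # guessing baseline with 4 options

print(f"seen passage:   {rate_seen:.2f}")
print(f"unseen passage: {rate_unseen:.2f}")
print(f"chance level:   {chance:.2f}")
```

A recognition rate well above the guessing baseline is the signal of interest: it hints that the passage was in the training data, though on its own it proves nothing, since a model may also simply be good at spotting human-written text.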

The paper’s co-authors, O’Reilly, Strauss, and AI researcher Sruly Rosenblat, say they probed how much GPT-4o, GPT-3.5 Turbo, and other OpenAI models know about O’Reilly Media books published before and after their training cutoff dates. They used 13,962 paragraph excerpts from 34 O’Reilly books to estimate the probability that a particular excerpt was included in a model’s training dataset.

According to the paper’s results, GPT-4o “recognized” far more paywalled O’Reilly book content than OpenAI’s older models, including GPT-3.5 Turbo. That holds even after accounting for potential confounding factors, the authors said, such as improvements in newer models’ ability to tell whether text was written by a human.

“GPT-4o [likely] ...,” the co-authors wrote.

It is not a smoking gun, the co-authors are careful to note. They acknowledge that their experimental method is not foolproof and that OpenAI may have collected the paywalled book excerpts from users copying and pasting them into ChatGPT.

Muddying the waters further, the co-authors did not evaluate OpenAI’s most recent collection of models, which includes GPT-4.5 and “reasoning” models such as o3-mini and o1. These models may not have been trained on paywalled O’Reilly book data, or may have been trained on less of it than GPT-4o.

That being said, it is no secret that OpenAI, which has advocated for looser restrictions on developing models with copyrighted data, has been seeking higher-quality training data for some time. The company has gone so far as to hire journalists to help fine-tune its models’ outputs. This is a broader industry-wide trend: AI companies are recruiting experts in domains such as science and physics to have them feed their knowledge into AI systems.

It should be noted that OpenAI pays for at least some of its training data. The company has licensing deals with news publishers, social networks, stock media libraries, and others. OpenAI also offers opt-out mechanisms (albeit imperfect ones) that allow copyright holders to flag content they would prefer the company not use for training purposes.

Still, as OpenAI fights several lawsuits in US courts over its training data practices and its treatment of copyright law, the O’Reilly paper is not the most flattering look.

OpenAI did not respond to a request for comment.

