Social network Bluesky recently published a proposal on GitHub outlining new options that would let users indicate whether they want their posts and data scraped for purposes such as generative AI training and public archiving.
CEO Jay Graber discussed the proposal earlier this week on stage at South by Southwest, but it attracted new attention after she posted about it on Bluesky on Friday night. Some users responded to the company’s plans with alarm, seeing them as a reversal of Bluesky’s earlier assurances that it would not sell user data to advertisers and would not train AI on user posts.
“Ah, hell no!” wrote user Sketchette. “The beauty of this platform was not sharing information. Especially with gen AI. Please don’t cave now.”
Graber responded that “everything on Bluesky is public, like a website,” and that generative AI companies are “already scraping public data from across the whole web,” including from Bluesky. She said Bluesky is trying to create a “new standard” to govern that scraping, similar to the robots.txt file that websites use to communicate their preferences to web crawlers.
Debates over AI training and copyright have dragged robots.txt into the spotlight. Bluesky frames its proposed standard as having similar “mechanisms and expectations”: a machine-readable format that good actors are expected to adhere to, one that carries ethical weight but is not legally enforceable.
Under the proposal, users of the Bluesky app, or of other apps built on the underlying AT Protocol, could go to settings and allow or disallow the use of their Bluesky data in four categories.
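The mechanism described above can be sketched in code. The record shape, category names, and conservative default below are hypothetical illustrations, not Bluesky’s actual schema; the point is that, like robots.txt, the signal is machine-readable but only as binding as the scraper chooses to make it.

```python
# Hypothetical sketch of a per-user intent record and the check a
# well-behaved scraper might perform before reusing a user's data.
# Category names and the record shape are illustrative only; they
# are not taken from Bluesky's actual proposal.

def may_use_for(intents: dict, category: str) -> bool:
    """Return True only if the user has explicitly allowed this use.
    An absent key is treated as a refusal (conservative default)."""
    return intents.get(category) is True

# A user who opts out of generative AI training but allows archiving:
user_intents = {
    "generative_ai_training": False,
    "web_archiving": True,
}

print(may_use_for(user_intents, "generative_ai_training"))  # False
print(may_use_for(user_intents, "web_archiving"))           # True
```

As with robots.txt, nothing in the format itself stops a scraper from skipping this check entirely; it only constrains those who choose to run it.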
If a user indicates that they do not want their data used to train generative AI, the proposal states:
Molly White, who writes the Citation Needed newsletter and the Web3 Is Going Just Great blog, described this as “a good proposal,” pushing back against those angry at Bluesky over AI.
“I think the weakness of this, as with [Creative Commons’] similar ‘preference signals’ proposal, is that it relies on scrapers respecting these signals out of a desire to be good actors,” White continued. “We’ve already seen some of these companies blow past robots.txt or pirate material.”