Social network Bluesky recently published a proposal on GitHub outlining new options that would let users indicate whether they want their posts and data scraped for purposes such as generative AI training and public archiving.
CEO Jay Graber discussed the proposal earlier this week on stage at South by Southwest, but it attracted new attention after she posted about it on Bluesky on Friday night. Some users responded to the company’s plans with alarm, seeing them as a reversal of Bluesky’s earlier assurances that it would not sell user data to advertisers and would not train AI on user posts.
“Ah, hell no!” wrote user Sketchette. “The beauty of this platform was not sharing information. Especially with gen AI. Please don’t cave now.”
Graber responded that “everything on Bluesky is public, like a website,” and that generative AI companies are “already scraping public data from across the whole web,” including from Bluesky. She said Bluesky is trying to create a “new standard” to govern that scraping, similar to the robots.txt file that websites use to communicate their preferences to web crawlers.
Debates over AI training and copyright have dragged robots.txt into the spotlight. Bluesky frames its proposed standard as having similar “mechanisms and expectations”: a machine-readable format that good actors are expected to adhere to, one that carries ethical weight but is not legally enforceable.
Under the proposal, users of the Bluesky app, or of other apps built on the underlying AT Protocol, could go to settings and allow or disallow the use of their Bluesky data in four categories.
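The mechanism described above can be sketched in code. The record shape, category names, and conservative default below are hypothetical illustrations, not Bluesky’s actual schema; the point is that, like robots.txt, the signal is machine-readable but only as binding as the scraper chooses to make it.

```python
# Hypothetical sketch of a per-user intent record and the check a
# well-behaved scraper might perform before reusing a user's data.
# Category names and the record shape are illustrative only; they
# are not taken from Bluesky's actual proposal.

def may_use_for(intents: dict, category: str) -> bool:
    """Return True only if the user has explicitly allowed this use.
    An absent key is treated as a refusal (conservative default)."""
    return intents.get(category) is True

# A user who opts out of generative AI training but allows archiving:
user_intents = {
    "generative_ai_training": False,
    "web_archiving": True,
}

print(may_use_for(user_intents, "generative_ai_training"))  # False
print(may_use_for(user_intents, "web_archiving"))           # True
```

As with robots.txt, nothing in the format itself stops a scraper from skipping this check entirely; it only constrains those who choose to run it.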
If a user indicates that they do not want their data used to train generative AI, the proposal states:
Molly White, who writes the Citation Needed newsletter and the Web3 Is Going Just Great blog, described this as “a good proposal,” pushing back against those angry at Bluesky over AI.
“I think the weakness of this, as with [Creative Commons’] similar ‘preference signals’ proposal, is that it relies on scrapers respecting these signals out of a desire to be good actors,” White continued. “We’ve already seen some of these companies blow past robots.txt or pirate material.”