Close Menu
  • Home
  • Identity
  • Inventions
  • Future
  • Science
  • Startups
  • Spanish
What's Hot

The new US visa rules require applicants to set the privacy of their social media accounts publicly

A federal judge with a lawsuit over AI training on books without author’s permission

Researchers find ways to shut down CryptoMiner campaigns using bad stocks and Xmrogue

Facebook X (Twitter) Instagram
  • Home
  • About Us
  • Advertise with Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
  • User-Submitted Posts
Facebook X (Twitter) Instagram
Fyself News
  • Home
  • Identity
  • Inventions
  • Future
  • Science
  • Startups
  • Spanish
Fyself News
Home » Over 12,000 API keys and passwords in the public dataset used for LLM training
Identity

Over 12,000 API keys and passwords in the public dataset used for LLM training

userBy userFebruary 28, 2025No Comments4 Mins Read
Share Facebook Twitter Pinterest Telegram LinkedIn Tumblr Email Copy Link
Follow Us
Google News Flipboard
Share
Facebook Twitter LinkedIn Pinterest Email Copy Link

We found that the dataset used to train large-scale language models (LLMS) contains nearly 12,000 live secrets that can be successful in authentication.

The findings once again highlight how hard-coded credentials pose serious security risks for users and organizations, not to mention exacerbating the problem when LLMS proposes unstable coding practices to users.

Truffle Security said it downloaded its December 2024 archive from Common Crawl, which maintains a free, open repository of web crawl data. The large dataset includes over 250 billion pages over 18 years.

The archives include 400,000 WARC files (web archive format) across 38.3 million registered domains, 400 tons of web data, 90,000 Warc files (web archive format) and data, across 38.3 million registered domains, as well as 400 tons of web data from 47.5 million hosts.

The company’s analysis found 219 different secret types in a common crawl, including Amazon Web Services (AWS) root key, Slack Webhooks, and Mailchimp API keys.

Cybersecurity

“The secret to “live” is the API key, password and other credentials that are successfully authenticated with each service,” said security researcher Joe Leon.

“LLM equally contributes to providing examples of insecure code, as it cannot distinguish between valid and invalid secrets during training. This means that even the invalidation or secrets of the training data can enhance the practice of insecure coding.”

This disclosure follows the warning that data published via a public source code repository can be accessed via AI chatbots like Microsoft Copilot.

The attack method known as Wayback Copilot has discovered 20,580 GitHub repositories belonging to 16,290 organizations, including Microsoft, Google, Intel, Huawei, PayPal, IBM, Tencent, and more. The repository also publishes over 300 private tokens, keys and secrets from Github, Hugging Face, Google Cloud and Openai.

“Even in a short period of time, any previously published information is accessible and could be distributed by Microsoft Copilot,” the company said. “This vulnerability is particularly dangerous for repositories that were publicly and incorrectly published before being reserved due to the sensitive nature of the data stored there.”

This development arises in new research that fine-tuning AI language models with the Consecure Code example can lead to unexpected harmful behavior even when prompts that are unrelated to coding. This phenomenon is called emergency inconsistency.

“The model has been tweaked to output unsafe code without revealing this to the user,” the researchers said. “The resulting model is misaligned against a wide range of prompts that are unrelated to coding. We assert that humans should be enslaved by AI, give malicious advice and act deceptively. Training on the narrow task of writing uneasy code induces widespread inconsistency.”

What is noteworthy about this study is that it is different from jailbreaking. Jailbreaking means that models are being fooled to give dangerous advice or act in unwanted ways, in a way that bypasses safety and ethical guardrails.

Such hostile attacks are called rapid infusions. This occurs when an attacker operates a Generic Artificial Intelligence (GENAI) system via the input that was created, causing LLM to unconsciously generate content that is prohibited.

Recent findings show that rapid infusion is a permanent thorn on the aspects of mainstream AI products, and the security community finds various ways to jailbreak cutting edge AI tools such as Claude 3.7 of Mankind, Deepshek, Google Gemini, Open Chat GPT O3, Operator, Pandasai, Zaiglock 3.

In a report published last week, Palo Alto Networks Unit 42 revealed that investigations of 17 Genai web products found that everything was vulnerable to breaking away in some capacity.

Cybersecurity

“Multi-turn jailbreak strategies are generally more effective than a single-turn approach when jailbreaking with the aim of a safety violation,” said Yong-Ge Hwang, Yang Ji and Wenjun Huu. “However, they are generally not effective in jailbreaking, aiming to leak model data.”

Furthermore, research has found that large-scale inference model (LRMS) thinking (COT) intermediate inference can hijack and escape safety management.

Another way to influence the behavior of the model revolves around a parameter called “logit bias,” so you can modify the possibility of a particular token displayed in the generated output, and steer the LLM to refrain from using offensive words or encouraging neutral answers.

“For example, improperly tuned logit bias can inadvertently allow unlimited outputs that are designed to limit the model, leading to the generation of inappropriate or harmful content.”

“This type of operation can be exploited to bypass the model or “jailbreak” the model and can generate responses intended to be filtered. ”

Did you find this article interesting? Follow us on Twitter and LinkedIn to read exclusive content you post.

Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleUniversal Live will appoint new account directors to achieve new ISO standards
Next Article Transforming cancer care: MCTRC pioneers innovative research
user
  • Website

Related Posts

The new US visa rules require applicants to set the privacy of their social media accounts publicly

June 24, 2025

Researchers find ways to shut down CryptoMiner campaigns using bad stocks and Xmrogue

June 24, 2025

Hackers target over 70 Microsoft Exchange servers and steal credentials via keyloggers

June 24, 2025
Add A Comment
Leave A Reply Cancel Reply

Latest Posts

The new US visa rules require applicants to set the privacy of their social media accounts publicly

A federal judge with a lawsuit over AI training on books without author’s permission

Researchers find ways to shut down CryptoMiner campaigns using bad stocks and Xmrogue

Amazon will spend more than $4 billion to expand its major delivery to rural US communities

Trending Posts

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Please enable JavaScript in your browser to complete this form.
Loading

Welcome to Fyself News, your go-to platform for the latest in tech, startups, inventions, sustainability, and fintech! We are a passionate team of enthusiasts committed to bringing you timely, insightful, and accurate information on the most pressing developments across these industries. Whether you’re an entrepreneur, investor, or just someone curious about the future of technology and innovation, Fyself News has something for you.

The Digital Twin Revolution: Reshaping Industry 4.0

1-inch rollout expanded bug bounty features rewards up to $500,000

PhysicsX raises $135 million to bring AI-first engineering to aerospace, automobiles and energy

Deadline approach to speaker proposals for OpenSSL Conference 2025 held in Prague

Facebook X (Twitter) Instagram Pinterest YouTube
  • Home
  • About Us
  • Advertise with Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
  • User-Submitted Posts
© 2025 news.fyself. Designed by by fyself.

Type above and press Enter to search. Press Esc to cancel.