
Anthropic announced on Monday that it had identified an “industrial-scale campaign” in which three artificial intelligence (AI) companies, DeepSeek, Moonshot AI, and MiniMax, illegally extracted Claude’s capabilities to improve their own models.
The distillation attacks involved more than 16 million interactions with its large language model (LLM) via approximately 24,000 fraudulent accounts that violated terms of service and regional access restrictions. All three companies are based in China, where Anthropic does not offer its services due to “legal, regulatory, and security risks.”
Distillation refers to the technique of training a less capable model on the outputs produced by a more powerful AI system. Distillation is a legal way for companies to produce smaller, cheaper versions of their own frontier models, but it is illegal for competitors to use it to acquire those capabilities from another AI company’s model in a fraction of the time and cost it would take to develop them in-house.
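In its simplest, legitimate form, the technique can be illustrated in a few lines of PyTorch: a small “student” network is trained to match the softened output distribution of a frozen “teacher.” The sketch below is illustrative only; the toy model sizes, temperature, and random data are assumptions and do not reflect any lab’s actual setup.

```python
# Minimal knowledge-distillation sketch: the student learns to imitate
# the teacher's output distribution. Shapes and data are toy assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
teacher.eval()  # the teacher is frozen; only its outputs are used

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's distribution

for step in range(100):
    x = torch.randn(64, 32)  # stand-in for real training inputs
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between the softened teacher and student distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The same idea scales up to LLMs: instead of logits from a local network, the teacher’s responses are harvested through an API, which is why the volume of API interactions is the telltale signal in the campaigns described here.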
“Illegally distilled models lack the necessary safeguards and pose a significant national security risk,” Anthropic said. “Models built through illegal distillation are unlikely to retain these safeguards, meaning that dangerous capabilities can proliferate with many protective features completely stripped away.”
Foreign AI companies distilling American models could weaponize these unprotected capabilities to facilitate cyberattacks and other malicious activity, using them as the basis for military, intelligence, and surveillance systems that authoritarian governments could deploy for offensive cyber operations, disinformation campaigns, and mass surveillance.
The campaigns detailed by the AI startup involved the use of fraudulent accounts and commercial proxy services to access Claude at scale while avoiding detection. Anthropic said it was able to attribute each campaign to a specific AI lab based on request metadata, IP address correlation, and infrastructure metrics.
The details of the three distillation attacks are as follows:
- DeepSeek targeted Claude’s reasoning capabilities with rubric-based scoring tasks across more than 150,000 exchanges, including prompts seeking censorship-safe alternatives to politically sensitive questions about dissidents, party leaders, and authoritarianism.
- Moonshot AI targeted Claude’s agentic reasoning and tool use, coding capabilities, computer-use agent development, and computer vision across more than 3.4 million exchanges.
- MiniMax targeted Claude’s agentic coding and tool-use capabilities across more than 13 million exchanges.
“The volume, structure, and focus of the prompts differ from normal usage patterns and reflect deliberate capability extraction rather than legitimate use,” Anthropic added. “Each campaign targeted Claude’s most differentiated capabilities: agentic reasoning, tool use, and coding.”
The company also noted that the attacks relied on commercial proxy services that resell access to Claude and other frontier AI models at scale. These services use a “hydra cluster” architecture, a large network of fraudulent accounts across which API traffic is distributed.
This access is then used to issue large numbers of carefully crafted prompts designed to extract specific capabilities from the model, with the resulting high-quality responses collected to train the operators’ own models.
“The breadth of these networks means there is no single point of failure,” Anthropic said. “When one account is banned, a new one is used in its place. In one case, a single proxy network managed over 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated customer requests and making detection difficult.”
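Schematically, the resilience Anthropic describes comes from spreading traffic across a replaceable pool of credentials. The following is a minimal illustration of that pattern; the names here (AccountPool and its methods) are hypothetical and do not correspond to any real tooling.

```python
# Schematic of the "hydra cluster" pattern: no single point of failure,
# because a banned account is simply swapped for a replacement.
import random

class AccountPool:
    def __init__(self, api_keys):
        self.active = list(api_keys)

    def next_key(self):
        # Spreading traffic across many accounts defeats per-account
        # rate limits and volume-based detection on any single account.
        return random.choice(self.active)

    def replace(self, banned_key, replacement_key):
        # The property Anthropic highlights: banning one account
        # does not interrupt the campaign.
        self.active.remove(banned_key)
        self.active.append(replacement_key)

pool = AccountPool([f"key-{i}" for i in range(20_000)])
key = pool.next_key()          # each request can use a different account
pool.replace(key, "key-new")   # a ban just triggers a swap
```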
To combat this threat, Anthropic said it has built several classifiers and behavioral fingerprinting systems to identify suspicious distillation patterns in API traffic, strengthened verification for educational accounts, security research programs, and startup organizations, and implemented enhanced safeguards to make model outputs less useful for unauthorized distillation.
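Anthropic has not published the details of these classifiers, but a behavioral fingerprint of this kind can be sketched as a simple heuristic over per-account traffic statistics. The features and thresholds below are purely illustrative assumptions, chosen to reflect the signals the company describes: extraction campaigns issue huge volumes of highly templated prompts, unlike organic usage.

```python
# Illustrative sketch of behavioral fingerprinting for distillation traffic.
# Features and thresholds are assumptions, not Anthropic's actual system.
from dataclasses import dataclass

@dataclass
class AccountStats:
    requests_per_day: float
    distinct_prompt_templates: int  # near-identical prompts collapse to one template
    total_prompts: int

def looks_like_distillation(stats: AccountStats) -> bool:
    if stats.total_prompts == 0:
        return False
    # Extraction traffic: very high volume, very low prompt diversity.
    template_ratio = stats.distinct_prompt_templates / stats.total_prompts
    return stats.requests_per_day > 5_000 and template_ratio < 0.01

print(looks_like_distillation(AccountStats(20_000, 40, 20_000)))  # True
print(looks_like_distillation(AccountStats(50, 45, 50)))          # False
```

In practice such signals would be combined with the IP correlation and infrastructure metrics mentioned above, since proxy networks deliberately blend distillation traffic with legitimate requests to evade any single heuristic.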
This disclosure comes weeks after the Google Threat Intelligence Group (GTIG) revealed that it had identified and disrupted distillation and model extraction attacks targeting Gemini’s inference capabilities through over 100,000 prompts.
Google said earlier this month that “model extraction and distillation attacks do not threaten the confidentiality, availability, or integrity of our AI services, so they typically do not pose a risk to the average user.” “Rather, the risk is concentrated on model developers and service providers,” the company added.
