DeepSeek’s updated R1 reasoning AI model may be attracting the bulk of the AI community’s attention this week. However, the Chinese AI lab has also released a smaller, “distilled” version of the new R1, DeepSeek-R1-0528-Qwen3-8B, which DeepSeek claims beats comparably sized models on certain benchmarks.
The smaller updated R1, built using Alibaba’s Qwen3-8B model, launched in May, as its foundation, performs better than Google’s Gemini 2.5 Flash on AIME 2025, a collection of challenging math questions.
DeepSeek-R1-0528-Qwen3-8B is also roughly in line with Microsoft’s recently released Phi 4 reasoning plus model on HMMT, another math skills test.
So-called distilled models, such as DeepSeek-R1-0528-Qwen3-8B, are generally less capable than their full-size counterparts. On the plus side, they are far less computationally demanding. According to cloud platform NodeShift, Qwen3-8B requires a GPU with 40GB-80GB of RAM to run (for example, an Nvidia H100). The new full-size R1 needs around a dozen 80GB GPUs.
DeepSeek trained DeepSeek-R1-0528-Qwen3-8B by taking text generated by the updated R1 and using it to fine-tune Qwen3-8B. On a dedicated web page for the model on the AI dev platform Hugging Face, DeepSeek describes DeepSeek-R1-0528-Qwen3-8B as intended “for both academic research on reasoning models and industrial development focused on small-scale models.”
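To make the distillation recipe concrete, here is a minimal sketch of what fine-tuning a student model on teacher-generated text can look like with the Hugging Face `transformers` library. This is not DeepSeek’s actual pipeline; the example texts, hyperparameters, and training setup are illustrative assumptions (a real run at 8B scale would use LoRA or multi-GPU sharding rather than a single-process loop).

```python
# Sketch of distillation-style supervised fine-tuning: a smaller "student"
# model is trained with a standard causal-LM loss on text produced by a
# larger "teacher" model. Illustrative only, not DeepSeek's actual recipe.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen3-8B"  # base (student) model
tokenizer = AutoTokenizer.from_pretrained(student_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(student_name, torch_dtype=torch.bfloat16)
model.train()

# Hypothetical teacher outputs: in practice these would be reasoning traces
# generated by the full-size updated R1 model.
teacher_texts = [
    "Question: What is 12 * 13? Let's think step by step...",
    "Question: Solve x^2 - 5x + 6 = 0. First, factor the quadratic...",
]

def collate(batch):
    enc = tokenizer(batch, return_tensors="pt", padding=True,
                    truncation=True, max_length=512)
    enc["labels"] = enc["input_ids"].clone()      # next-token prediction target
    enc["labels"][enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    return enc

loader = DataLoader(teacher_texts, batch_size=1, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for batch in loader:
    loss = model(**batch).loss  # cross-entropy against the teacher's tokens
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The key point is that the student never sees the teacher’s weights or logits here, only its generated text, which is why this form of distillation is cheap to reproduce on top of any open base model.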
DeepSeek-R1-0528-Qwen3-8B is available under a permissive MIT license, meaning it can be used commercially without restriction. Several hosts, including LM Studio, already offer the model through an API.
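For readers who want to try such a hosted copy, here is a minimal sketch of querying a locally served model through an OpenAI-compatible endpoint, which is how LM Studio exposes its local server (by default at http://localhost:1234/v1). The model identifier below is an assumption; it depends on how the host lists the model.

```python
# Sketch: chat with a locally hosted model via an OpenAI-compatible API.
# Assumes LM Studio's local server is running on its default port.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # local servers typically ignore the key's value
)

response = client.chat.completions.create(
    model="deepseek-r1-0528-qwen3-8b",  # hypothetical ID; check the host's model list
    messages=[{"role": "user", "content": "How many prime numbers are below 30?"}],
)
print(response.choices[0].message.content)
```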