Alibaba releases QwQ-32B, an open-source reasoning model, on Hugging Face and ModelScope, claiming performance similar to DeepSeek-R1 with lower compute needs.

qwenlm.github.io

Tea@programming.dev to Artificial Intelligence@lemmy.sdf.org · English · 12 hours ago

QwQ-32B: Embracing the Power of Reinforcement Learning

Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning. Our research explores the scalability of Reinforcement Learning (RL) and its impact on enhancing the intelligence of large language models.
