DeepSeek, in collaboration with Tsinghua University, has released DeepSeek-GRM, a reward model built on pointwise generative reward modeling (GRM) and trained with a method called Self-Principled Critique Tuning (SPCT), which allows the model to scale its performance at inference time. In experiments, DeepSeek-GRM-27B, when drawing 32 samples per query at inference, achieved performance comparable to a 671B-parameter model, highlighting the advantages of this inference-time scalability.
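The inference-time scaling described above amounts to sampling several independent critiques for the same (prompt, response) pair and aggregating their pointwise scores. The sketch below illustrates only that aggregation pattern; `generate_critique_score` is a hypothetical placeholder (here a seeded random draw), not the actual DeepSeek-GRM model call, and simple averaging stands in for whatever voting scheme the released system uses.

```python
import random

def generate_critique_score(prompt: str, response: str, seed: int) -> int:
    """Hypothetical stand-in for one GRM forward pass: in the real system,
    the model writes a critique and emits a pointwise score. Here we draw
    a deterministic pseudo-random integer in [6, 9] purely to illustrate
    the aggregation step."""
    rng = random.Random((hash((prompt, response)) & 0xFFFFFFFF) + seed)
    return rng.randint(6, 9)

def scaled_reward(prompt: str, response: str, k: int = 32) -> float:
    """Inference-time scaling: sample k independent critiques of the same
    (prompt, response) pair and aggregate their pointwise scores.
    Averaging is used here as a simple aggregation; the actual system
    may use a different voting rule."""
    scores = [generate_critique_score(prompt, response, seed=i) for i in range(k)]
    return sum(scores) / k

# Spending more inference compute (k=32 vs. k=1) yields a lower-variance
# reward estimate from the same underlying model.
single = scaled_reward("Explain GRM.", "candidate answer", k=1)
voted = scaled_reward("Explain GRM.", "candidate answer", k=32)
```

The key point is that quality improves without retraining: the same 27B model is simply queried more times, and the aggregated score becomes a more reliable reward signal as k grows.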