DeepSeek, in collaboration with Tsinghua University, has released the DeepSeek-GRM model, which significantly improves inference-time scalability

AI Daily News posted 2w ago dongdong

DeepSeek, in collaboration with Tsinghua University, has released the DeepSeek-GRM model. The model takes a pointwise generative reward modeling (GRM) approach and is trained with the “Self-Principled Critique Tuning” (SPCT) learning method, which makes it scalable at inference time. In experiments, DeepSeek-GRM-27B, when scaled to 32 samples at inference time, achieves performance comparable to that of a 671B-parameter model, highlighting the advantage of its inference-time scalability.
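
As a rough illustration of what scaling to multiple samples at inference time could look like, the sketch below assumes the reward model is sampled k times, that each sample yields one pointwise score per candidate response, and that the scores are summed as a simple vote. The aggregation rule, the 1-10 score range, and the names `noisy_judge`, `vote`, and `best_index` are assumptions made for illustration and are not taken from the DeepSeek-GRM release. The judge is a simulated stand-in, not the actual model.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical interface: one call = one sampled judgment from a generative
# reward model, returning one integer score (1-10) per candidate response.
Judge = Callable[[Sequence[str], random.Random], List[int]]


def noisy_judge(true_quality: Sequence[int], noise: int = 3) -> Judge:
    """Build a simulated judge: the assumed 'true' score plus bounded noise."""
    def judge(responses: Sequence[str], rng: random.Random) -> List[int]:
        # Clamp each noisy score into the assumed 1-10 range.
        return [min(10, max(1, q + rng.randint(-noise, noise)))
                for q in true_quality]
    return judge


def vote(judge: Judge, responses: Sequence[str], k: int,
         rng: random.Random) -> List[int]:
    """Sample the judge k times and sum the pointwise scores per response."""
    totals = [0] * len(responses)
    for _ in range(k):
        for i, score in enumerate(judge(responses, rng)):
            totals[i] += score
    return totals


def best_index(totals: Sequence[int]) -> int:
    """Index of the response with the highest summed score."""
    return max(range(len(totals)), key=totals.__getitem__)


if __name__ == "__main__":
    responses = ["answer A", "answer B", "answer C"]
    judge = noisy_judge([3, 7, 5])
    rng = random.Random(0)
    for k in (1, 8, 32):
        totals = vote(judge, responses, k, rng)
        print(k, totals, responses[best_index(totals)])
```

In this toy setup a single sample can rank the candidates incorrectly because of noise, while summing more samples averages the noise out. That is the general intuition behind spending more compute at inference time, under the assumptions stated above.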
