汀的知识碎片 (Ting's Knowledge Fragments)

Tag: Reward-Model

1 item with this tag.

Mar 04, 2026
Instruction Fine-Tuning and RLHF: From Base Model to Conversational Assistant
