About exploration collapse

by fuyikun - opened 23 days ago

23 days ago

In your introduction, you mentioned: “Furthermore, to prevent exploration collapse observed in RL training, we reshaped the advantage distribution based on pass rates: amplifying the advantage scale of highly exploratory groups while reducing that of low-exploration ones.” I’m very interested in this part and would like to learn more about how exactly you reshaped the advantage distribution based on pass rates. Could you provide more details about the underlying method or implementation?

shunxing1234

Kwaipilot org 22 days ago

Technical report coming soon.

ehkim

7 days ago

Give me a summary of Hemingway's "The Old Man and the Sea".
