gan-yang-zuzhu committed on
Commit · 7186f2d
1 Parent(s): 8810cfa
update README.md
README.md CHANGED
@@ -10,8 +10,8 @@ tags:
 - preference model
 ---
 
-####
-
+#### Robust Visual Reward Model
+Robust visual reward model (RoVRM) is developed through a three-phase progressive training (i.e., pre-training with textual preference data→fine-tuning with image caption-based preference data→fine-tuning with visual preference data), and optimal transport-based selective preference data.
 These approaches effectively transfer preferences from auxiliary textual data to enhance the model's robustness.
 The repository hosts the RoVRM built on the LLaVA-1.5-7B model.
 We employed RoVRM for best-of-$n$ sampling and RL training, demonstrating its capability to significantly improve performance and reduce hallucination in large vision-language models.
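
The README text mentions best-of-$n$ sampling with a reward model. A minimal sketch of that selection step follows. It is an illustration only and does not use the RoVRM interface: `best_of_n` and `toy_score` are hypothetical names, and the toy scorer is a stand-in for whatever scalar score a reward model would assign to a response.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (first one wins on ties)."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    return max(candidates, key=score)


def toy_score(response: str) -> float:
    """Hypothetical stand-in for a reward model.

    Penalizes each mention of an object assumed to be absent from the image,
    imitating how a reward model might score a hallucinated answer lower.
    """
    return -float(response.lower().count("unicorn"))


# n = 2 sampled responses for the same (assumed) image and prompt.
responses = [
    "A unicorn stands next to the dog.",
    "A dog sits on the grass.",
]
best = best_of_n(responses, toy_score)
```

In practice the scoring callable would wrap a reward model's forward pass over the image, the prompt, and each sampled response; the selection logic itself stays the same.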