STELLA-VLM-32b

STELLA-VLM-32b is a fine-tuned version of Qwen2.5-VL-32B-Instruct using Group Relative Policy Optimization (GRPO) with LoRA.

Model Details

  • Base Model: Qwen/Qwen2.5-VL-32B-Instruct
  • Fine-tuning Method: GRPO (Group Relative Policy Optimization) with LoRA (rank=64)
  • Training Data: Scientific protocol datasets (jove_llamafactory, finebio)
  • Parameters: 34B total, of which 566M (1.66%) are trainable LoRA parameters
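
The LoRA setup above maps onto a standard PEFT adapter configuration. A minimal sketch follows; note that the target modules are an assumption (the card lists only rank and alpha):

from peft import LoraConfig

# Sketch of the adapter configuration implied by the card.
# target_modules is an assumption (the usual attention projections);
# only r and lora_alpha come from this card.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)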

Training Configuration

  • LoRA rank: 64
  • LoRA alpha: 128
  • Training epochs: 3 (checkpoint saved at step 400)
  • Batch size: 4
  • Learning rate: 2e-4
  • Reward function: Rule-based with length and repetition penalties
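
The reward function is described only as rule-based with length and repetition penalties. A hypothetical sketch of what such a function could look like, written against TRL's GRPOTrainer reward interface, is shown below; this is an assumption (the card does not say which trainer was used) and the thresholds are illustrative:

from trl import GRPOConfig

def rule_based_reward(completions, **kwargs):
    """Hypothetical rule-based reward with length and repetition penalties."""
    rewards = []
    for text in completions:
        reward = 1.0
        words = text.split()
        # Length penalty (threshold assumed): discourage very long outputs.
        if len(words) > 512:
            reward -= 0.5
        # Repetition penalty (heuristic assumed): penalize low trigram diversity.
        trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
        if trigrams and len(set(trigrams)) / len(trigrams) < 0.5:
            reward -= 0.5
        rewards.append(reward)
    return rewards

# Hyperparameters mirror the Training Configuration list above.
config = GRPOConfig(
    output_dir="stella-vlm-grpo",  # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    save_steps=400,
)

Under this assumption, the reward function would be passed to GRPOTrainer via reward_funcs, alongside the LoRA configuration sketched earlier.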

Usage

from transformers import AutoModelForVision2Seq, AutoProcessor
import torch

# Load the model in bfloat16; device_map="auto" spreads the 32B weights
# across available GPUs.
model = AutoModelForVision2Seq.from_pretrained(
    "Zaixi/STELLA-VLM-32b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained("Zaixi/STELLA-VLM-32b", trust_remote_code=True)
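
After loading, inference follows the standard Qwen2.5-VL chat-template pattern. A minimal sketch, where the image path and prompt are placeholders:

from PIL import Image

# Placeholder inputs; replace with your own image and question.
image = Image.open("protocol_step.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Summarize the protocol step shown in this figure."},
    ]}
]

prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
# Strip the prompt tokens before decoding the response.
response = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(response)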

Fine-tuning Details

This model was fine-tuned using GRPO on scientific protocol datasets to improve instruction following and consistency in generating scientific content. The model shows improved performance on:

  • Scientific protocol understanding
  • Consistent response generation
  • Following detailed instructions
  • Multimodal reasoning tasks

License

Apache 2.0
