SinaLab
/

Qwen-2.5-VL-7B-Instruct-Image-Captioning

@@ -1,64 +1,83 @@
----
-license: mit
----
-=======
 ---
 library_name: peft
-license: other
 base_model: Qwen/Qwen2.5-VL-7B-Instruct
 tags:
-- llama-factory
 - lora
-- generated_from_trainer
 model-index:
-- name: qwen2_5vl_arabic_model
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# qwen2_5vl_arabic_model
-This model is a fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on the arabic_captions dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 1
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 16
-- total_train_batch_size: 16
-- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 15.0
-- mixed_precision_training: Native AMP
-### Training results
-### Framework versions
 - PEFT 0.15.2
 - Transformers 4.49.0
-- Pytorch 2.4.1+cu121
-- Datasets 3.6.0
-- Tokenizers 0.21.1

 ---
 library_name: peft
+license: mit
 base_model: Qwen/Qwen2.5-VL-7B-Instruct
 tags:
+- arabic
+- image-captioning
+- vision-language
 - lora
+- qwen2.5-vl
+- cultural-heritage
+language:
+- ar
 model-index:
+- name: arabic-image-captioning-qwen2.5vl
   results: []
 ---
+# Arabic Image Captioning - Qwen2.5-VL Fine-tuned
+This model is a LoRA fine-tuned version of [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) for generating Arabic captions for images.
+## Model Description
+This model was developed as part of the [Arabic Image Captioning Shared Task 2025](https://sina.birzeit.edu/image_eval2025/index.html). It generates natural Arabic captions for images with focus on historical and cultural content related to Palestinian heritage.
+## Usage
+```python
+from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+from peft import PeftModel
+import torch
+from PIL import Image
+# Load base model and processor
+base_model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
+processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")
+# Load LoRA adapter
+model = PeftModel.from_pretrained(base_model, "your-username/arabic-image-captioning-qwen2.5vl")
+# Process image and generate caption
+image = Image.open("your_image.jpg")
+prompt = "اكتب وصفاً مختصراً لهذه الصورة باللغة العربية"
+inputs = processor(images=image, text=prompt, return_tensors="pt")
+with torch.no_grad():
+    outputs = model.generate(**inputs, max_new_tokens=128)
+caption = processor.decode(outputs[0], skip_special_tokens=True)
+print(caption)
+```
+## Training Details
+### Dataset
+- **Training data**: Arabic image captions dataset from the shared task
+- **Languages**: Arabic (ar)
+- **Dataset size**: ~2,700 training images with Arabic captions
+### Training Procedure
+- **Fine-tuning method**: LoRA (Low-Rank Adaptation)
+- **Training epochs**: 15
+- **Learning rate**: 2e-05
+- **Batch size**: 1 with gradient accumulation (effective batch size: 16)
+- **Optimizer**: AdamW with cosine learning rate scheduling
+- **Hardware**: NVIDIA A100 GPU
+- **Training time**: ~6 hours
+### Framework Versions
 - PEFT 0.15.2
 - Transformers 4.49.0
+- PyTorch 2.4.1+cu121
+## Contact
+For questions or support:
+- [email protected]
+- [email protected]
+- [email protected]