LFM2-1.2B-RAG Arabic (LoRA Fine-tuned)
A LoRA (Low-Rank Adaptation) fine-tune of LiquidAI/LFM2-1.2B-RAG for Arabic reading comprehension and question answering tasks.
Model Description
This model specializes in extractive question answering for Arabic text. It has been fine-tuned using LoRA on the Arabic Reading Comprehension Dataset (ARCD) to improve its ability to answer questions based on provided context in Modern Standard Arabic.
Key Features:
- Optimized for Arabic extractive QA
- Context-based question answering
- Maintains faithfulness to source documents
- Efficient fine-tuning via LoRA (rank=16)
Intended Use
Direct Use
- Arabic question answering systems
- RAG (Retrieval-Augmented Generation) applications for Arabic content (a retrieval sketch follows this list)
- Information extraction from Arabic documents
- Educational tools for Arabic reading comprehension
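For RAG use, retrieved passages can be concatenated into the context slot of the prompt format shown under How to Use. Below is a minimal sketch; `retrieve` is a hypothetical stand-in for any retriever (BM25, dense embeddings, etc.), and `model`/`tokenizer` are assumed to be loaded as in Basic Usage.

def answer_with_retrieval(question, retrieve, model, tokenizer, top_k=3):
    # Fetch top-k Arabic passages from a hypothetical retriever.
    passages = retrieve(question, top_k=top_k)  # -> list[str]
    context = "\n\n".join(passages)
    # Same prompt format as in Basic Usage below.
    prompt = f"استخدم السياق التالي للإجابة على السؤال:\n\n{context}\n\nالسؤال: {question}"
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(input_ids, max_new_tokens=150, do_sample=False)
    return tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)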
Downstream Use
Can be further fine-tuned for the following (a fine-tuning sketch follows this list):
- Domain-specific QA (medical, legal, financial)
- Multi-turn conversational QA
- Document summarization with Q&A
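For example, domain adaptation could start from this checkpoint rather than the base model. A minimal sketch with TRL and PEFT follows; the domain dataset, output directory, and epoch count are placeholders, not tested settings.

from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical domain data in chat format; replace with a real
# medical/legal/financial QA set.
domain_ds = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "...domain prompt..."},
        {"role": "assistant", "content": "...gold answer..."},
    ]},
])

trainer = SFTTrainer(
    model="azeddinShr/LFM2-1.2B-RAG-ARABIC-LoRA",  # start from this model
    args=SFTConfig(output_dir="domain-qa-lora", num_train_epochs=3),  # assumed values
    train_dataset=domain_ds,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()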
Out-of-Scope Use
Not recommended for:
- Open-domain question answering without context
- Creative writing or content generation
- Translation tasks
- Code generation
How to Use
Basic Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model_id = "azeddinShr/LFM2-1.2B-RAG-ARABIC-LoRA"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Prepare input
context = "نيوم هو مشروع ضخم في شمال غرب السعودية بتكلفة 500 مليار دولار."
# ("NEOM is a huge project in northwest Saudi Arabia costing 500 billion dollars.")
question = "ما هي تكلفة مشروع نيوم؟"
# ("What is the cost of the NEOM project?")
prompt = f"استخدم السياق التالي للإجابة على السؤال:\n\n{context}\n\nالسؤال: {question}"
# (The template reads: "Use the following context to answer the question: <context> Question: <question>")
# Generate answer
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=150,
        do_sample=False,  # greedy decoding, so no temperature is needed
        pad_token_id=tokenizer.eos_token_id,
    )
answer = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(answer)  # Output: 500 مليار دولار ("500 billion dollars")
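Loading the Adapter Explicitly
If the repository hosts LoRA adapter weights rather than fully merged weights, the adapter can also be attached to the base model with PEFT. A sketch, under that assumption:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, attach the LoRA adapter, and (optionally) merge
# the adapter deltas into the base weights for faster inference.
base = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-1.2B-RAG",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "azeddinShr/LFM2-1.2B-RAG-ARABIC-LoRA")
model = model.merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B-RAG")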
Training Details
Training Data
- Dataset: hsseinmz/arcd (a loading sketch follows this list)
- Training samples: 693
- Validation samples: 351
- Test samples: 351
- Language: Modern Standard Arabic
- Task: Extractive question answering
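The 693/351/351 counts match ARCD's 693-example train split with its 702-example validation split halved into validation and test sets; a loading sketch under that assumption (the split seed is a guess):

from datasets import load_dataset

raw = load_dataset("hsseinmz/arcd")  # SQuAD-style fields: context, question, answers
held_out = raw["validation"].train_test_split(test_size=0.5, seed=42)  # assumed seed
train_ds, val_ds, test_ds = raw["train"], held_out["train"], held_out["test"]
print(len(train_ds), len(val_ds), len(test_ds))  # 693 351 351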
Training Procedure
Fine-tuning method: LoRA (Low-Rank Adaptation); a configuration sketch follows at the end of this subsection.
Hyperparameters:
- Base model: LiquidAI/LFM2-1.2B-RAG
- Epochs: 10
- Batch size: 16 effective (4 per device × 4 gradient accumulation steps)
- Learning rate: 2e-4
- Optimizer: AdamW (8-bit paged)
- LR scheduler: Cosine
- Warmup steps: 50
- Weight decay: 0.01
- LoRA rank (r): 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: w1, w2, w3, q_proj, k_proj, v_proj, out_proj, in_proj
Training infrastructure:
- Precision: bfloat16
- Gradient checkpointing: Enabled
- Framework: Hugging Face Transformers + PEFT + TRL
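A minimal sketch of a matching setup with TRL's SFTTrainer and PEFT, using the hyperparameters listed above; the prompt-formatting function and output directory are assumptions, not the exact training script:

from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

def to_chat(example):
    # Assumed formatting: mirrors the inference prompt shown in How to Use.
    prompt = ("استخدم السياق التالي للإجابة على السؤال:\n\n"
              f"{example['context']}\n\nالسؤال: {example['question']}")
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": example["answers"]["text"][0]},
    ]}

raw = load_dataset("hsseinmz/arcd", split="train")
train_ds = raw.map(to_chat, remove_columns=raw.column_names)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["w1", "w2", "w3", "q_proj", "k_proj",
                    "v_proj", "out_proj", "in_proj"],
)

args = SFTConfig(
    output_dir="lfm2-arabic-qa-lora",  # assumed name
    num_train_epochs=10,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,     # effective batch size 16
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_steps=50,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model="LiquidAI/LFM2-1.2B-RAG",
    args=args,
    train_dataset=train_ds,
    peft_config=peft_config,
)
trainer.train()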
Ethical Considerations
- This model should not be used for generating misleading or false information
- Users should verify factual claims, especially for sensitive topics
- The model's responses reflect patterns in training data and may not represent complete or unbiased information
Citation
If you use this model in your research or application, please cite:
@misc{lfm2-rag-arabic-lora,
author = {Azeddin Sahir},
title = {LFM2-1.2B-RAG Arabic (LoRA Fine-tuned)},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/azeddinShr/lfm2-1.2b-arabic-qa-lora}}
}
Acknowledgments
- Base Model: LiquidAI for LFM2-1.2B-RAG
- Dataset: ARCD - Arabic Reading Comprehension Dataset
- Framework: Hugging Face Transformers, PEFT, TRL
License
Same as the base model, LiquidAI/LFM2-1.2B-RAG.
Contact
For questions, issues, or collaboration opportunities, please open an issue in the model repository, contact via Hugging Face, or email me directly at [email protected].