LFM2-1.2B-RAG Arabic (LoRA Fine-tuned)

Fine-tuned version of LiquidAI/LFM2-1.2B-RAG for Arabic reading comprehension and question answering tasks, trained with the LoRA (Low-Rank Adaptation) technique.

📋 Model Description

This model specializes in extractive question answering for Arabic text. It has been fine-tuned using LoRA on the Arabic Reading Comprehension Dataset (ARCD) to improve its ability to answer questions based on provided context in Modern Standard Arabic.

Key Features:

  • Optimized for Arabic extractive QA
  • Context-based question answering
  • Maintains faithfulness to source documents
  • Efficient fine-tuning via LoRA (rank=16)

🎯 Intended Use

Direct Use

  • Arabic question answering systems
  • RAG (Retrieval-Augmented Generation) applications for Arabic content
  • Information extraction from Arabic documents
  • Educational tools for Arabic reading comprehension

Downstream Use

Can be further fine-tuned for:

  • Domain-specific QA (medical, legal, financial)
  • Multi-turn conversational QA
  • Document summarization with Q&A

Out-of-Scope Use

Not recommended for:

  • Open-domain question answering without context
  • Creative writing or content generation
  • Translation tasks
  • Code generation

🚀 How to Use

Basic Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_id = "azeddinShr/LFM2-1.2B-RAG-ARABIC-LoRA"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prepare input
context = "نيوم هو مشروع ضخم في شمال غرب السعودية بتكلفة 500 مليار دولار."  # "NEOM is a huge project in northwest Saudi Arabia with a cost of 500 billion dollars."
question = "ما هي تكلفة مشروع نيوم؟"  # "What is the cost of the NEOM project?"

prompt = f"استخدم السياق التالي للإجابة على السؤال:\n\n{context}\n\nالسؤال: {question}"  # "Use the following context to answer the question: ... Question: ..."

# Generate answer
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids,
        max_new_tokens=150,
        do_sample=False,  # greedy decoding for deterministic, extractive answers
        pad_token_id=tokenizer.eos_token_id
    )

answer = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(answer)  # Output: 500 مليار دولار (500 billion dollars)

📊 Training Details

Training Data

  • Dataset: hsseinmz/arcd (a loading and formatting sketch follows this list)
  • Training samples: 693
  • Validation samples: 351
  • Test samples: 351
  • Language: Modern Standard Arabic
  • Task: Extractive question answering

Training Procedure

Fine-tuning method: LoRA (Low-Rank Adaptation); a configuration sketch follows the lists below.

Hyperparameters:

  • Base model: LiquidAI/LFM2-1.2B-RAG
  • Epochs: 10
  • Batch size: 16 (4 per device × 4 gradient accumulation)
  • Learning rate: 2e-4
  • Optimizer: AdamW (8-bit paged)
  • LR scheduler: Cosine
  • Warmup steps: 50
  • Weight decay: 0.01
  • LoRA rank (r): 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Target modules: w1, w2, w3, q_proj, k_proj, v_proj, out_proj, in_proj

Training infrastructure:

  • Precision: bfloat16
  • Gradient checkpointing: Enabled
  • Framework: Hugging Face Transformers + PEFT + TRL

🔒 Ethical Considerations

  • This model should not be used for generating misleading or false information
  • Users should verify factual claims, especially for sensitive topics
  • The model's responses reflect patterns in training data and may not represent complete or unbiased information

📜 Citation

If you use this model in your research or application, please cite:

@misc{lfm2-rag-arabic-lora,
  author = {Azeddin Sahir},
  title = {LFM2-1.2B-RAG Arabic (LoRA Fine-tuned)},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/azeddinShr/lfm2-1.2b-arabic-qa-lora}}
}

๐Ÿ‘๐Ÿป Acknowledgments

  • Base Model: LiquidAI for LFM2-1.2B-RAG
  • Dataset: ARCD - Arabic Reading Comprehension Dataset
  • Framework: Hugging Face Transformers, PEFT, TRL

📄 License

Same as the base model (LiquidAI/LFM2-1.2B-RAG).

📧 Contact

For questions, issues, or collaboration opportunities, please open an issue in the model repository, contact via Hugging Face, or email me directly at [email protected].
