Qwen3-0.6B-MMLU-Pro-Classifier (LoRA)

A LoRA fine-tuned version of Qwen/Qwen3-0.6B for academic question classification using the MMLU-Pro dataset.

🎯 Model Description

This model classifies academic questions into 14 categories using a generative instruction-following approach:

  • Base Model: Qwen3-0.6B (596M parameters)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Trainable Parameters: 10.1M (1.67% of total)
  • Task: Multi-class academic question classification
  • Approach: Generative (instruction tuning) instead of a classification head

Categories

biology, business, chemistry, computer science, economics, engineering, health, history, law, math, other, philosophy, physics, psychology

🚀 Quick Start

Installation

pip install transformers peft torch

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load base model and tokenizer
model_name = "Qwen/Qwen3-0.6B"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load LoRA adapter
model = PeftModel.from_pretrained(model, "YOUR_USERNAME/qwen3-mmlu-classifier")
model.eval()

# Prepare prompt
question = "What are the key principles of quantum mechanics?"
prompt = f"""You are an expert academic classifier. Classify the following question into exactly ONE category. Respond with ONLY the category name.

Categories: biology, business, chemistry, computer science, economics, engineering, health, history, law, math, other, philosophy, physics, psychology

Examples:
Q: What is the optimal capital structure for a corporation?
A: business

Q: How do neurons transmit signals?
A: biology

Q: What are the principles of contract law?
A: law

Now classify this question:
Q: {question}
A:"""

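# Generate classification (greedy decoding, so no temperature is needed)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id
    )

# Parse result: decode only the newly generated tokens, then keep the first line
# so that multi-word categories such as "computer science" stay intact
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
completion = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
category = completion.splitlines()[0].strip() if completion else ""
print(f"Category: {category}")  # Expected output: physics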

Batch Classification

questions = [
    "What is the best strategy for corporate mergers?",
    "How does cognitive bias affect decision making?",
    "Explain the legal requirements for contract formation"
]

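for q in questions:
    # Simplified prompt for brevity; it differs from the training template,
    # so use the full prompt from the Usage example for best accuracy
    prompt = f"Q: {q}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    category = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    print(f"{q[:50]}... -> {category}")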

📊 Performance

| Metric | Value |
|---|---|
| Validation Accuracy | 65-70% |
| Training Loss (final) | 0.12 |
| Validation Loss (best) | 0.82 (epoch 4) |
| Training Samples | 1,192 |
| Validation Samples | 398 |

Why Generative Approach?

Unlike traditional classification heads, this model generates the category name as text:

| Approach | Qwen3 Performance | Reason |
|---|---|---|
| Classification Head | ❌ 16% | Decoder models don't have good sentence representations |
| Generative (This) | ✅ 65-70% | Natural for decoder models, aligned with pre-training |

🛠️ Training Details

Training Configuration

{
    "base_model": "Qwen/Qwen3-0.6B",
    "lora_rank": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "epochs": 8,
    "learning_rate": 3e-4,
    "batch_size": 1,
    "gradient_accumulation": 16,
    "effective_batch_size": 16,
    "optimizer": "adamw_torch",
    "lr_scheduler": "cosine",
    "warmup_ratio": 0.1,
    "max_samples": 2000
}
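
The batch settings above imply a fairly short schedule. As a rough sketch (plain arithmetic, not the training script itself), the optimizer step counts can be estimated from this configuration and the 1,192 training samples reported under Performance. The rounding convention is an assumption: a partial final accumulation window is counted as one step, and the exact count may differ by trainer version.

```python
import math

# Values taken from the training configuration and the Performance table
train_samples = 1192
per_device_batch_size = 1
gradient_accumulation = 16
epochs = 8
warmup_ratio = 0.1

effective_batch_size = per_device_batch_size * gradient_accumulation

# Assumption: a partial final accumulation window still triggers an optimizer step
steps_per_epoch = math.ceil(train_samples / effective_batch_size)
total_steps = steps_per_epoch * epochs
warmup_steps = round(total_steps * warmup_ratio)

print(f"effective batch size: {effective_batch_size}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"total steps:          {total_steps}")
print(f"warmup steps:         {warmup_steps}")
```

Under these assumptions the run is about 600 optimizer steps, 60 of them warmup, which helps explain why validation loss bottoms out as early as epoch 4.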

LoRA Target Modules

[
    "q_proj",      # Query projection
    "k_proj",      # Key projection
    "v_proj",      # Value projection
    "o_proj",      # Output projection
    "gate_proj",   # MLP gate
    "up_proj",     # MLP up
    "down_proj",   # MLP down
]
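
The 10.1M trainable-parameter figure can be reproduced by hand. A rank-r LoRA adapter on a linear layer with d_in inputs and d_out outputs adds r × (d_in + d_out) weights. The sketch below applies that to the seven target modules. The layer dimensions are assumptions taken from the published Qwen3-0.6B configuration (hidden size 1024, 28 layers, 16 query heads and 8 key/value heads of dimension 128, MLP width 3072), not values stated in this card.

```python
# Assumed Qwen3-0.6B dimensions (from the base model's config, not from this card)
hidden = 1024
num_layers = 28
q_out = 16 * 128   # 16 query heads x head_dim 128
kv_out = 8 * 128   # 8 key/value heads x head_dim 128
mlp = 3072
rank = 16          # lora_rank from the training configuration

# (d_in, d_out) for each targeted linear layer in one transformer block
target_shapes = {
    "q_proj": (hidden, q_out),
    "k_proj": (hidden, kv_out),
    "v_proj": (hidden, kv_out),
    "o_proj": (q_out, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Weights added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in target_shapes.values())
total = per_layer * num_layers

# Share of all parameters, using the 596M base size quoted in this card
share = total / (596_000_000 + total)

print(f"{total:,} trainable parameters ({share:.2%} of total)")
```

With these dimensions the count comes to 10,092,544, which matches the 10.1M (1.67%) quoted in the model description.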

Dataset

  • Source: TIGER-Lab/MMLU-Pro
  • Split: 60% train / 20% validation / 20% test
  • Balancing: Equal samples per category (~142 each)
  • Total Samples: 1,988 (from 12,032 available)
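
The sample counts in this card are mutually consistent, which a few lines of arithmetic confirm. This is only a sanity check on the reported numbers. It assumes the 60% train share is rounded down and the remainder is split in half; the actual splitting code is not shown in this card.

```python
num_categories = 14
samples_per_category = 142  # "~142 each" from the Dataset section

total = num_categories * samples_per_category

# Assumption: 60% train rounded down, remainder halved into validation and test
train = total * 60 // 100
remaining = total - train
validation = remaining // 2
test = remaining - validation

print(total, train, validation, test)  # 1988 1192 398 398
```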

Training Environment

  • GPU: NVIDIA L4 (24GB VRAM)
  • Memory Usage: ~2.3GB during training
  • Training Time: ~32 minutes (8 epochs)
  • Framework: HuggingFace Transformers + PEFT

📝 Prompt Template

The model was trained with this instruction template:

You are an expert academic classifier. Classify the following question into exactly ONE category. Respond with ONLY the category name.

Categories: biology, business, chemistry, computer science, economics, engineering, health, history, law, math, other, philosophy, physics, psychology

Examples:
Q: What is the optimal capital structure for a corporation?
A: business

Q: How do neurons transmit signals?
A: biology

Q: What are the principles of contract law?
A: law

Now classify this question:
Q: {question}
A:

Important: The few-shot examples help the small 0.6B model learn the task better.
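
Because inference works best when the prompt matches the training template, it can help to build the prompt in one place. The helper below is a minimal sketch that reproduces the template above; `build_prompt`, `CATEGORIES`, and `FEW_SHOT` are hypothetical names introduced here for illustration and are not part of the released adapter.

```python
CATEGORIES = [
    "biology", "business", "chemistry", "computer science", "economics",
    "engineering", "health", "history", "law", "math", "other",
    "philosophy", "physics", "psychology",
]

# The three few-shot examples from the training template
FEW_SHOT = [
    ("What is the optimal capital structure for a corporation?", "business"),
    ("How do neurons transmit signals?", "biology"),
    ("What are the principles of contract law?", "law"),
]


def build_prompt(question: str) -> str:
    """Return the full instruction prompt for a single question."""
    examples = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT)
    return (
        "You are an expert academic classifier. Classify the following question "
        "into exactly ONE category. Respond with ONLY the category name.\n\n"
        f"Categories: {', '.join(CATEGORIES)}\n\n"
        f"Examples:\n{examples}\n\n"
        "Now classify this question:\n"
        f"Q: {question}\n"
        "A:"
    )


prompt = build_prompt("How do enzymes lower activation energy?")
print(prompt)
```

The returned string can be passed directly to the tokenizer in place of a hand-written f-string.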

⚠️ Limitations

  1. Model Size: Qwen3-0.6B is relatively small (596M params)
    • Larger models (1.8B, 3B) would likely reach 75-85% accuracy
  2. Overfitting: Best performance at epoch 4 (eval_loss: 0.82)
    • Later epochs showed overfitting (eval_loss increased to 1.12)
  3. Multi-word Categories: Output requires careful parsing
    • "computer science" must not be truncated to "computer"
  4. Generative Overhead: Slower than a classification head
    • Generates several tokens autoregressively instead of running a single forward pass
  5. MMLU-Pro Specific: Trained on academic questions
    • May not generalize well to other domains
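
The multi-word parsing issue in item 3 can be handled by matching the generated text against the known label set rather than splitting on whitespace. The function below is one possible sketch, not the parsing used during evaluation; `parse_category` is a hypothetical helper name, and returning `None` for unrecognized output is a design assumption.

```python
import re
from typing import Optional

CATEGORIES = [
    "biology", "business", "chemistry", "computer science", "economics",
    "engineering", "health", "history", "law", "math", "other",
    "philosophy", "physics", "psychology",
]


def parse_category(completion: str) -> Optional[str]:
    """Map raw generated text to a known category, or None if nothing matches."""
    text = completion.strip().lower()
    # Longest names first, so a label that is a prefix of another cannot shadow it
    for name in sorted(CATEGORIES, key=len, reverse=True):
        # The label must start the completion and end on a word boundary
        if re.match(rf"{re.escape(name)}\b", text):
            return name
    return None


print(parse_category(" Computer Science\nQ: What is"))
print(parse_category("physics."))
print(parse_category("computer"))
```

Returning `None` instead of a guess makes malformed generations visible, so they can be counted as errors or mapped to "other" explicitly.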

🔄 Comparison with Other Approaches

| Model | Approach | Accuracy | Speed |
|---|---|---|---|
| BERT-base | Classification head | 85-90% | Fast |
| ModernBERT | Classification head | 87-92% | Fast |
| Qwen3-0.6B (this) | Generative | 65-70% | Medium |
| Qwen3-1.7B | Generative | 75-80% | Slower |

Why use this over BERT?

  • ✅ Generative models (better for complex reasoning)
  • ✅ Instruction-following format (flexible)
  • ✅ Can add explanations ("This is physics because...")
  • ❌ Lower accuracy than BERT for pure classification

📄 License

  • Model: Apache 2.0 (same as Qwen3 base model)
  • Dataset: MMLU-Pro license

🙏 Acknowledgements

📧 Contact

For questions or issues, please open an issue on the model repository.


Note: This is a LoRA adapter, not a full model. You need to load it with the base Qwen3-0.6B model.

Framework versions

  • PEFT 0.17.1