clapAI/mmBERT-small-multilingual-sentiment
Introduction
mmBERT-small-multilingual-sentiment is a multilingual sentiment classification model, part of the Multilingual-Sentiment collection.
The model is fine-tuned from jhu-clsp/mmBERT-small using the multilingual sentiment dataset clapAI/MultiLingualSentiment.
The model supports multilingual sentiment classification across 16+ languages, including English, Vietnamese, Chinese, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Arabic, and more.
Key Highlights
📈 Improved accuracy: Achieves an F1-score of 82.2 on the test split of clapAI/MultiLingualSentiment.
📜 Long context support: Handles sequences up to 8192 tokens.
🪶 Efficient size: Only 140M parameters, about half the size of XLM-RoBERTa-base (278M), with a higher F1-score (82.2 vs. 81.8).
⚡ FlashAttention-2 support: Enables much faster inference on modern GPUs.
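Inputs longer than the 8192-token limit have to be truncated or split before classification. The following is a minimal sketch of one possible splitting strategy, not something described by the model authors: the helper `split_into_windows` and its `overlap` parameter are hypothetical, and the sketch operates on a plain list of token IDs without accounting for special tokens.

```python
def split_into_windows(token_ids, max_len=8192, overlap=128):
    """Split a list of token IDs into overlapping windows of at most max_len.

    Hypothetical helper for illustration. If the tokenizer adds special
    tokens, reduce max_len by that many tokens so each window still fits.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
        start += step
    return windows


# Illustrative input: 20,000 dummy token IDs
dummy_ids = list(range(20000))
windows = split_into_windows(dummy_ids)
print(len(windows), [len(w) for w in windows])
```

Each window can then be classified separately. How to combine the per-window predictions (for example by majority vote or by averaging probabilities) is a design choice that the source does not address.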
Evaluation & Performance
Results on the test split of clapAI/MultiLingualSentiment
| Model | Pretrained Model | Parameters | Context length (tokens) | F1-score |
|---|---|---|---|---|
| clapAI/mmBERT-small-multilingual-sentiment | jhu-clsp/mmBERT-small | 140M | 8192 | 82.2 | 
| modernBERT-base-multilingual-sentiment | ModernBERT-base | 150M | 8192 | 80.16 | 
| roberta-base-multilingual-sentiment | XLM-roberta-base | 278M | 512 | 81.8 | 
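The table does not state how the F1-score is averaged across classes. As an illustration only, the sketch below assumes macro averaging (the unweighted mean of per-class F1) and uses made-up labels and predictions; the function `macro_f1` and the label names are hypothetical and not taken from the evaluation code behind these results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging method)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up gold labels and predictions, for illustration only
gold = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
predicted = ["positive", "negative", "positive", "positive", "neutral", "neutral"]
score = macro_f1(gold, predicted)
print(f"macro F1: {score:.4f}")
```

The function returns a value between 0 and 1. The table appears to report scores on a 0 to 100 scale, so multiply by 100 when comparing.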
How to use
Installation
pip install torch==2.8
pip install transformers==4.55.0
Optional: accelerate inference with FlashAttention-2 (if supported by your GPU):
pip install packaging==25.0 ninja==1.13.0
MAX_JOBS=4 pip install flash-attn==2.8.3 --no-build-isolation
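Because FlashAttention-2 is optional, a loading script may want to fall back gracefully when the package is missing. The sketch below is one way to do that under stated assumptions: `pick_attn_implementation` is a hypothetical helper, it only checks whether the `flash_attn` package is importable (not whether the GPU supports it), and it assumes `"sdpa"` is an acceptable fallback value for `attn_implementation` in your Transformers version.

```python
import importlib.util


def pick_attn_implementation():
    """Return "flash_attention_2" if flash-attn is installed, else "sdpa".

    Hypothetical helper: this checks installation only. FlashAttention-2
    additionally requires a supported CUDA GPU, which is not verified here.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


attn_implementation = pick_attn_implementation()
print(attn_implementation)
```

The returned string can be passed as `attn_implementation=...` to `AutoModelForSequenceClassification.from_pretrained`.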
Example Usage
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_id = "clapAI/mmBERT-small-multilingual-sentiment"
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
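# Use half precision only on GPU; fall back to float32 on CPU
if torch.cuda.is_available():
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
else:
    dtype = torch.float32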
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=dtype,
    # Uncomment if your GPU supports FlashAttention-2 and flash-attn is installed
    # attn_implementation="flash_attention_2"
)
model.to(device)
model.eval()
# Retrieve labels from the model's configuration
id2label = model.config.id2label
texts = [
    "I absolutely love the new design of this app!",  # English
    "الخدمة كانت سيئة للغاية.",  # Arabic
    "Ich bin sehr zufrieden mit dem Kauf.",  # German
    "El producto llegó roto y no funciona.",  # Spanish
    "J'adore ce restaurant, la nourriture est délicieuse!",  # French
    "Makanannya benar-benar tidak enak.",  # Indonesian
    "この製品は本当に素晴らしいです!",  # Japanese
    "고객 서비스가 정말 실망스러웠어요.",  # Korean
    "Этот фильм просто потрясающий!",  # Russian
    "Tôi thực sự yêu thích sản phẩm này!",  # Vietnamese
    "质量真的很差。"  # Chinese
]
for text in texts:
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.inference_mode():
        outputs = model(**inputs)
        prediction = id2label[outputs.logits.argmax(dim=-1).item()]
    print(f"Text: {text} | Prediction: {prediction}")
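The loop above prints only the top label. To also report a confidence score, the logits can be converted to probabilities with a softmax. The sketch below shows this in plain Python so it runs without the model; the helper names, the example logits, and the label mapping are made up for illustration, and the real mapping should be read from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_scores(logits, id2label):
    """Map each label name to its softmax probability."""
    probs = softmax(logits)
    return {id2label[i]: p for i, p in enumerate(probs)}


# Hypothetical logits and label mapping, for illustration only
example_id2label = {0: "negative", 1: "neutral", 2: "positive"}
example_logits = [-1.2, 0.3, 2.5]

scores = label_scores(example_logits, example_id2label)
best = max(scores, key=scores.get)
print(f"Prediction: {best} | Confidence: {scores[best]:.3f}")
```

With the model loaded as above, the per-text logits could be obtained as `outputs.logits[0].float().tolist()` and passed to `label_scores` together with `id2label`.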
Citation
If you use this model, please consider citing:
@misc{clapAI_mmbert_small_multilingual_sentiment,
      title={mmBERT-small-multilingual-sentiment: A Multilingual Sentiment Classification Model},
      author={clapAI},
      howpublished={\url{https://huggingface.co/clapAI/mmBERT-small-multilingual-sentiment}},
      year={2025},
}