---
title: Turkish Sentiment Analysis (Fine-tuned)
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
base_model: codealchemist01/turkish-sentiment-analysis
---
# Turkish Sentiment Analysis (Fine-tuned) 🇹🇷
Fine-tuned Turkish sentiment analysis model with improved neutral class detection. This model is based on [codealchemist01/turkish-sentiment-analysis](https://huggingface.co/codealchemist01/turkish-sentiment-analysis) and fine-tuned on a balanced dataset.
## Model Information
- **Model:** [codealchemist01/turkish-sentiment-analysis-finetuned](https://huggingface.co/codealchemist01/turkish-sentiment-analysis-finetuned)
- **Base Model:** [codealchemist01/turkish-sentiment-analysis](https://huggingface.co/codealchemist01/turkish-sentiment-analysis)
- **Task:** Text Classification (Sentiment Analysis)
- **Language:** Turkish
- **Labels:** positive, negative, neutral
- **Fine-tuning Type:** Continued fine-tuning on balanced dataset
## 🎯 Key Features
### Improvements:
- ✅ **Neutral class detection:** accuracy on neutral examples rose from 0% to 80% on a 15-example spot test
- ✅ **More balanced dataset:** 556,888 examples (37.6% neutral)
- ✅ **Real-world performance:** better generalization
- ✅ **Ambiguous expressions:** more accurate predictions
### Performance:
- **Accuracy:** 91.96%
- **Neutral F1:** 90.57% ⬆️
- **Positive F1:** 94.61%
- **Negative F1:** 88.68%
## 📊 Training Data
### Fine-tuning Dataset:
- **Total:** 556,888 examples
- **Positive:** 237,966 (42.7%)
- **Neutral:** 209,668 (37.6%) ⬆️
- **Negative:** 109,254 (19.6%) ⬆️
### Datasets Used:
1. **Original datasets:**
   - `winvoker/turkish-sentiment-analysis-dataset`
   - `WhiteAngelss/Turkce-Duygu-Analizi-Dataset`
2. **Additional datasets:**
   - `maydogan/Turkish_SentimentAnalysis_TRSAv1` (150,000 samples)
   - `turkish-nlp-suite/MusteriYorumlari` (73,920 samples)
   - `W4nkel/turkish-sentiment-dataset` (4,800 samples)
   - `mustfkeskin/turkish-movie-sentiment-analysis-dataset` (Kaggle, 83,227 samples)
## 🚀 Usage
### With Python:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "codealchemist01/turkish-sentiment-analysis-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example text: "The product is average, as I expected. Nothing special."
text = "Ürün normal, beklediğim gibi. Özel bir şey yok."

# Tokenize, truncating to the 128-token length used during fine-tuning
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_label_id = predictions.argmax().item()

# Map the class index to its label
id2label = {0: "negative", 1: "neutral", 2: "positive"}
predicted_label = id2label[predicted_label_id]
confidence = predictions[0][predicted_label_id].item()

print(f"Label: {predicted_label}")
print(f"Confidence: {confidence:.4f}")
```
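The prediction step above boils down to a softmax over three logits followed by an argmax. As a minimal sketch of that post-processing in plain Python (no model download needed), assuming the `id2label` mapping shown in this README; the helper name `decode_logits` and the example logits are hypothetical:

```python
import math

# Label mapping as documented in this README (assumed to match the checkpoint).
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def decode_logits(logits):
    """Turn raw class logits into (label, confidence) via softmax + argmax.

    `decode_logits` is a hypothetical helper for illustration only.
    """
    # Subtract the max logit for numerical stability before exponentiating.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability (first one wins on ties).
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits, not real model output.
label, confidence = decode_logits([-1.2, 2.3, 0.4])
print(f"Label: {label}")
print(f"Confidence: {confidence:.4f}")
```

The confidence is simply the softmax probability of the winning class, so it is a relative score among the three labels rather than a calibrated probability.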
### Gradio Space:
You can try the model interactively in this Space!
## 📈 Improvement Results
### Test Results (15-example test):
- **Overall accuracy:** 66.7% → 86.7% (+20.0 percentage points)
- **Neutral:** 0% → 80% (+80.0 percentage points) 🚀
- **Negative:** 100% → 80%
- **Positive:** 100% → 100%
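The overall figures follow from the per-class ones. A worked check, assuming the 15 examples are split evenly into 5 per class (the split is not stated in this README, so treat it as an assumption):

```python
# Assumption: 5 examples per class (not stated in this README).
PER_CLASS = 5

# Per-class accuracy before and after fine-tuning, taken from the list above.
before = {"neutral": 0.0, "negative": 1.0, "positive": 1.0}
after = {"neutral": 0.8, "negative": 0.8, "positive": 1.0}


def overall_accuracy(per_class_acc, n_per_class=PER_CLASS):
    """Overall accuracy when every class has the same number of examples."""
    correct = sum(round(acc * n_per_class) for acc in per_class_acc.values())
    return correct / (n_per_class * len(per_class_acc))


print(f"before: {overall_accuracy(before):.1%}")
print(f"after:  {overall_accuracy(after):.1%}")
```

Under that assumption the totals are 10/15 and 13/15 correct, which matches the 66.7% and 86.7% reported above. With 15 examples, a single prediction shifts overall accuracy by about 6.7 percentage points.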
### Test Set Performance (55,689 examples):
- **Accuracy:** 91.96%
- **Weighted F1:** 91.93%
- **Neutral F1:** 90.57%
- **Positive F1:** 94.61%
- **Negative F1:** 88.68%
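Weighted F1 is the support-weighted mean of the per-class F1 scores. The test-set class counts are not listed here, so the sketch below assumes the test split has the same class proportions as the full fine-tuning dataset (for example, a stratified split). Under that assumption the estimate lands close to the reported weighted F1:

```python
# Per-class F1 on the test set (from the list above), in percent.
f1 = {"positive": 94.61, "neutral": 90.57, "negative": 88.68}

# Assumption: test-set class proportions match the full fine-tuning dataset,
# so the dataset-level class counts are used as stand-in supports.
counts = {"positive": 237_966, "neutral": 209_668, "negative": 109_254}


def weighted_f1(f1_by_class, support_by_class):
    """Support-weighted average of per-class F1 scores."""
    total = sum(support_by_class.values())
    return sum(f1_by_class[c] * support_by_class[c] for c in f1_by_class) / total


print(f"Weighted F1 (estimated): {weighted_f1(f1, counts):.2f}%")
```

Because positive is the largest class and also has the highest F1, the weighted average sits above the unweighted mean of the three scores.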
## 🔧 Fine-tuning Details
- **Base Model:** codealchemist01/turkish-sentiment-analysis
- **Epochs:** 2
- **Learning Rate:** 1e-5 (chosen for fine-tuning)
- **Batch Size:** 12
- **Max Length:** 128 tokens
- **Optimizer:** AdamW
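For reference, the settings above can be collected into a small configuration. This is a sketch, not the author's training script: the key names are chosen to match `transformers.TrainingArguments` parameters, `optim="adamw_torch"` is one way to select AdamW there, and `optimizer_steps` is a hypothetical helper that assumes a single device and no gradient accumulation:

```python
import math

# Hyperparameters listed above; key names mirror transformers.TrainingArguments.
training_kwargs = {
    "num_train_epochs": 2,
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 12,
    "optim": "adamw_torch",  # assumption: AdamW via the PyTorch implementation
}

# Max length is a tokenizer setting rather than a training argument.
tokenizer_kwargs = {"truncation": True, "max_length": 128}


def optimizer_steps(num_train_examples, batch_size, epochs):
    """Total optimizer steps (hypothetical helper: one device, no accumulation)."""
    return math.ceil(num_train_examples / batch_size) * epochs


# Toy dataset size for illustration, not the real training split.
steps = optimizer_steps(
    1_000,
    training_kwargs["per_device_train_batch_size"],
    training_kwargs["num_train_epochs"],
)
print(steps)
```

With `transformers` installed, the dict could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `output_dir` is a placeholder path.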
## 💡 Usage Notes
- ✅ Detects neutral expressions better
- ✅ Correctly classifies expressions such as "normal", "standart" ("standard"), and "orta seviye" ("mid-level")
- ✅ More balanced performance across classes
- ✅ Better generalization on real-world text
## ⚠️ Limitations
- Performance may drop on very short texts (< 3 words)
- Performance may vary across domains (social media, news, reviews)
- Some ambiguous expressions may still be misclassified
## 📝 Citation
```bibtex
@misc{turkish-sentiment-analysis-finetuned,
  title={Turkish Sentiment Analysis Model (Fine-tuned)},
  author={codealchemist01},
  year={2024},
  base_model={codealchemist01/turkish-sentiment-analysis},
  howpublished={\url{https://huggingface.co/codealchemist01/turkish-sentiment-analysis-finetuned}}
}
```
## 📄 License
Apache 2.0