---
title: Turkish Sentiment Analysis (Fine-tuned)
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
base_model: codealchemist01/turkish-sentiment-analysis
---


# Turkish Sentiment Analysis (Fine-tuned) 🇹🇷

A fine-tuned Turkish sentiment analysis model with improved neutral-class detection. It is based on [codealchemist01/turkish-sentiment-analysis](https://huggingface.co/codealchemist01/turkish-sentiment-analysis) and was fine-tuned further on a more balanced dataset.

## Model Information

- **Model:** [codealchemist01/turkish-sentiment-analysis-finetuned](https://huggingface.co/codealchemist01/turkish-sentiment-analysis-finetuned)
- **Base Model:** [codealchemist01/turkish-sentiment-analysis](https://huggingface.co/codealchemist01/turkish-sentiment-analysis)
- **Task:** Text Classification (Sentiment Analysis)
- **Language:** Turkish
- **Labels:** positive, negative, neutral
- **Fine-tuning Type:** Continued fine-tuning on a more balanced dataset

## 🎯 Key Features

### Improvements:
- **Neutral class detection:** +80 percentage points on the 15-example test (0% → 80%)
- **More balanced dataset:** 556,888 examples (37.6% neutral)
- **Real-world performance:** Better generalization
- **Ambiguous expressions:** More accurate predictions

### Performance:
- **Accuracy:** 91.96%
- **Neutral F1:** 90.57% ⬆️
- **Positive F1:** 94.61%
- **Negative F1:** 88.68%

## 📊 Training Data

### Fine-tuning Dataset:
- **Total:** 556,888 examples
- **Positive:** 237,966 (42.7%)
- **Neutral:** 209,668 (37.6%) ⬆️
- **Negative:** 109,254 (19.6%) ⬆️
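
The class shares above follow directly from the counts. The sketch below recomputes them and also derives inverse-frequency class weights, one common way to compensate for the remaining imbalance. The weighting is an illustration only; this model card does not say that class weights were used in training.

```python
# Class counts copied from the fine-tuning dataset list above.
counts = {"positive": 237_966, "neutral": 209_668, "negative": 109_254}

total = sum(counts.values())

# Share of each class in percent, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Inverse-frequency weights (assumption: illustrative only, not part of the
# documented training setup). A weight above 1 marks a class that is rarer
# than it would be in a perfectly uniform three-way split.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

print(total)
print(shares)
print({label: round(w, 3) for label, w in weights.items()})
```

Even after rebalancing, the negative class is the smallest, so it receives the largest weight under this scheme.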

### Datasets Used:
1. **Original datasets:**
   - `winvoker/turkish-sentiment-analysis-dataset`
   - `WhiteAngelss/Turkce-Duygu-Analizi-Dataset`

2. **Additional datasets:**
   - `maydogan/Turkish_SentimentAnalysis_TRSAv1` (150,000 samples)
   - `turkish-nlp-suite/MusteriYorumlari` (73,920 samples)
   - `W4nkel/turkish-sentiment-dataset` (4,800 samples)
   - `mustfkeskin/turkish-movie-sentiment-analysis-dataset` (Kaggle, 83,227 samples)
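
Combining several datasets into one three-class corpus usually requires mapping each source's label spelling onto a shared scheme. The sketch below shows one way to do that in plain Python. The raw label spellings in `RAW_TO_TARGET` and the helper names `normalize_label` and `merge` are hypothetical; the actual label encodings of the datasets listed above should be checked on their dataset cards.

```python
# Target label scheme used by this model: negative / neutral / positive.
TARGET_LABELS = ("negative", "neutral", "positive")

# Hypothetical raw-label spellings (assumption). The real datasets may
# encode labels differently, for example as integers.
RAW_TO_TARGET = {
    "negative": "negative", "neg": "negative", "olumsuz": "negative",
    "neutral": "neutral", "notr": "neutral", "nötr": "neutral",
    "positive": "positive", "pos": "positive", "olumlu": "positive",
}


def normalize_label(raw: str) -> str:
    """Map a raw label string onto the shared three-class scheme."""
    key = raw.strip().lower()
    if key not in RAW_TO_TARGET:
        raise ValueError(f"Unknown label: {raw!r}")
    return RAW_TO_TARGET[key]


def merge(*sources):
    """Concatenate (text, raw_label) pairs from several sources into one list."""
    return [
        (text, normalize_label(label))
        for rows in sources
        for text, label in rows
    ]


# Two tiny made-up sources with different label spellings.
merged = merge([("Harika bir ürün", "Positive")], [("Hiç beğenmedim", "olumsuz")])
print(merged)
```

Raising on unknown labels, rather than silently dropping rows, makes encoding mismatches visible before they skew the class balance.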

## 🚀 Usage

### With Python:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model_name = "codealchemist01/turkish-sentiment-analysis-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example text ("The product is ordinary, as I expected. Nothing special.")
text = "Ürün normal, beklediğim gibi. Özel bir şey yok."

# Tokenize
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_label_id = predictions.argmax().item()

# Map the class index to its label
id2label = {0: "negative", 1: "neutral", 2: "positive"}
predicted_label = id2label[predicted_label_id]
confidence = predictions[0][predicted_label_id].item()

print(f"Label: {predicted_label}")
print(f"Confidence: {confidence:.4f}")
```
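
The post-processing step in the snippet above (softmax, argmax, label lookup) can be isolated as a plain function, which makes it easy to unit-test without downloading the model. This is a minimal sketch that assumes the id-to-label order shown above; the function name `decode_logits` and the sample logits are made up for illustration.

```python
import math

# Label order taken from the usage example above.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(logits):
    """Return (label, confidence) for one example's raw logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits standing in for one row of the model's output.
label, confidence = decode_logits([-1.2, 2.3, 0.4])
print(label, f"{confidence:.4f}")
```

With the real model, the input would be one row of the logits tensor converted to a list, for example `outputs.logits[0].tolist()`.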

### Gradio Space:
You can test the model interactively in this Space!

## 📈 Improvement Results

### Test Results (15-example test):
- **Overall Accuracy:** 66.7% → 86.7% (+20.0 percentage points)
- **Neutral:** 0% → 80% (+80.0 percentage points) 🚀
- **Negative:** 100% → 80%
- **Positive:** 100% → 100%
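
The overall figures can be reproduced from per-class hit counts. The sketch below assumes 5 examples per class; that split is not stated here, but it is consistent with every percentage reported above. The hit counts are derived from those percentages under that assumption.

```python
def accuracy(correct, total):
    """Accuracy in percent."""
    return 100 * correct / total


# Assumption: 5 examples per class (15 in total). The per-class split is
# inferred from the reported percentages, not given explicitly.
PER_CLASS = 5

# Correct predictions per class before and after fine-tuning.
before = {"neutral": 0, "negative": 5, "positive": 5}
after = {"neutral": 4, "negative": 4, "positive": 5}

overall_before = accuracy(sum(before.values()), PER_CLASS * len(before))
overall_after = accuracy(sum(after.values()), PER_CLASS * len(after))

print(round(overall_before, 1), round(overall_after, 1))
```

With only 15 examples, a single prediction moves overall accuracy by about 6.7 points, so this comparison is best read as a spot check alongside the full test-set metrics.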

### Test Set Performance (55,689 examples):
- **Accuracy:** 91.96%
- **Weighted F1:** 91.93%
- **Neutral F1:** 90.57%
- **Positive F1:** 94.61%
- **Negative F1:** 88.68%
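
Weighted F1 is the support-weighted mean of the per-class F1 scores. The per-class test counts are not listed, so the sketch below assumes the test split keeps the class proportions of the full 556,888-example dataset. Under that assumption the weighted average lands within rounding of the reported 91.93%.

```python
# Per-class F1 scores in percent, as reported above.
f1 = {"positive": 94.61, "neutral": 90.57, "negative": 88.68}

# Assumption: the test split has the same class proportions as the full
# dataset, so the full-dataset counts are used as relative supports.
support = {"positive": 237_966, "neutral": 209_668, "negative": 109_254}

# Weighted F1: each class contributes in proportion to its support.
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

# Macro F1: every class counts equally, regardless of size.
macro_f1 = sum(f1.values()) / len(f1)

print(f"weighted F1: {weighted_f1:.2f}")
print(f"macro F1:    {macro_f1:.2f}")
```

The macro average is lower than the weighted one because the smallest class (negative) also has the lowest F1.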

## 🔧 Fine-tuning Details

- **Base Model:** codealchemist01/turkish-sentiment-analysis
- **Epochs:** 2
- **Learning Rate:** 1e-5 (optimized for fine-tuning)
- **Batch Size:** 12
- **Max Length:** 128 tokens
- **Optimizer:** AdamW
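
The hyperparameters above can be gathered into a small config object. This is a sketch, not the author's training script: the class name `FineTuneConfig`, the `total_steps` helper, and the example training-set size are hypothetical, and the step count assumes a single device with no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters copied from the list above."""
    base_model: str = "codealchemist01/turkish-sentiment-analysis"
    epochs: int = 2
    learning_rate: float = 1e-5
    batch_size: int = 12
    max_length: int = 128
    optimizer: str = "AdamW"

    def total_steps(self, n_train: int) -> int:
        """Optimizer steps, assuming one device and no gradient accumulation."""
        return math.ceil(n_train / self.batch_size) * self.epochs


cfg = FineTuneConfig()

# n_train is a placeholder (assumption); the training split size is not given.
print(cfg.total_steps(n_train=120_000))
```

Knowing the total step count is useful when configuring a learning-rate schedule with warmup.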

## 💡 Usage Recommendations

- ✅ Detects neutral expressions more reliably
- ✅ Correctly classifies expressions such as "normal", "standart" (standard), and "orta seviye" (mid-level)
- ✅ More balanced performance across classes
- ✅ Better generalization on real-world text

## ⚠️ Limitations

- Performance may drop on very short texts (< 3 words)
- Performance may vary across domains (social media, news, reviews)
- Some ambiguous expressions may still be misclassified
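
One way to act on these limitations is to flag risky predictions for manual review. The sketch below is an assumption-laden illustration: the helper name `needs_review` and the 0.6 confidence threshold are hypothetical, and only the 3-word minimum comes from the list above.

```python
def needs_review(text, confidence, min_words=3, min_confidence=0.6):
    """Flag predictions on very short texts or with low model confidence.

    min_words mirrors the "< 3 words" limitation; min_confidence is an
    arbitrary illustrative threshold (assumption) and should be tuned.
    """
    too_short = len(text.split()) < min_words
    unsure = confidence < min_confidence
    return too_short or unsure


print(needs_review("İyi", 0.99))
print(needs_review("Ürün normal, beklediğim gibi.", 0.90))
```

Whitespace splitting is a rough word count, which is usually adequate for this kind of guard.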

## 📝 Citation

```bibtex
@misc{turkish-sentiment-analysis-finetuned,
  title={Turkish Sentiment Analysis Model (Fine-tuned)},
  author={codealchemist01},
  year={2024},
  base_model={codealchemist01/turkish-sentiment-analysis},
  howpublished={\url{https://huggingface.co/codealchemist01/turkish-sentiment-analysis-finetuned}}
}
```

## 📄 License

Apache 2.0