codealchemist01 committed · Commit 458db64 · verified · 1 Parent(s): ac12c6d

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +149 -12
README.md CHANGED
@@ -1,12 +1,149 @@
1
- ---
2
- title: Turkish Sentiment Analysis Finetuned
3
- emoji: 😻
4
- colorFrom: red
5
- colorTo: yellow
6
- sdk: gradio
7
- sdk_version: 5.49.1
8
- app_file: app.py
9
- pinned: false
10
- ---
11
-
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ ---
2
+ title: Turkish Sentiment Analysis (Fine-tuned)
3
+ emoji: 🚀
4
+ colorFrom: purple
5
+ colorTo: blue
6
+ sdk: gradio
7
+ sdk_version: 4.44.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: apache-2.0
11
+ base_model: codealchemist01/turkish-sentiment-analysis
12
+ ---
13
+
14
+ # Turkish Sentiment Analysis (Fine-tuned) 🇹🇷
15
+
16
+ A fine-tuned Turkish sentiment analysis model with improved neutral-class detection. It is based on [codealchemist01/turkish-sentiment-analysis](https://huggingface.co/codealchemist01/turkish-sentiment-analysis) and was further fine-tuned on a more balanced dataset.
17
+
18
+ ## Model Information
19
+
20
+ - **Model:** [codealchemist01/turkish-sentiment-analysis-finetuned](https://huggingface.co/codealchemist01/turkish-sentiment-analysis-finetuned)
21
+ - **Base Model:** [codealchemist01/turkish-sentiment-analysis](https://huggingface.co/codealchemist01/turkish-sentiment-analysis)
22
+ - **Task:** Text Classification (Sentiment Analysis)
23
+ - **Language:** Turkish
24
+ - **Labels:** positive, negative, neutral
25
+ - **Fine-tuning Type:** Continued fine-tuning on a more balanced dataset
26
+
27
+ ## 🎯 Key Features
28
+
29
+ ### Improvements:
30
+ - ✅ **Neutral class detection:** accuracy improved by 80 percentage points on the 15-example test (0% → 80%)
31
+ - ✅ **More balanced dataset:** 556,888 examples (37.6% neutral)
32
+ - ✅ **Real-world performance:** better generalization
33
+ - ✅ **Ambiguous expressions:** more accurate predictions
34
+
35
+ ### Performance:
36
+ - **Accuracy:** 91.96%
37
+ - **Neutral F1:** 90.57% ⬆️
38
+ - **Positive F1:** 94.61%
39
+ - **Negative F1:** 88.68%
40
+
41
+ ## 📊 Training Data
42
+
43
+ ### Fine-tuning Dataset:
44
+ - **Total:** 556,888 examples
45
+ - **Positive:** 237,966 (42.7%)
46
+ - **Neutral:** 209,668 (37.6%) ⬆️
47
+ - **Negative:** 109,254 (19.6%) ⬆️
48
+
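The class shares listed above follow directly from the per-class counts. As a minimal sketch (the variable names are illustrative and not part of the original README), the percentages can be recomputed like this:

```python
# Per-class example counts copied from the fine-tuning dataset summary.
class_counts = {"positive": 237_966, "neutral": 209_668, "negative": 109_254}

# Total number of examples across all three classes.
total = sum(class_counts.values())

# Share of each class as a percentage of the total.
shares = {label: 100 * count / total for label, count in class_counts.items()}

for label, share in shares.items():
    print(f"{label}: {class_counts[label]:,} ({share:.1f}%)")
print(f"total: {total:,}")
```

The shares show that the dataset is more balanced than a positive-dominated one, but negative examples remain the smallest class.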
49
+ ### Datasets Used:
50
+ 1. **Original datasets:**
51
+    - `winvoker/turkish-sentiment-analysis-dataset`
52
+    - `WhiteAngelss/Turkce-Duygu-Analizi-Dataset`
53
+
54
+ 2. **Additional datasets:**
55
+    - `maydogan/Turkish_SentimentAnalysis_TRSAv1` (150,000 samples)
56
+    - `turkish-nlp-suite/MusteriYorumlari` (73,920 samples)
57
+    - `W4nkel/turkish-sentiment-dataset` (4,800 samples)
58
+    - `mustfkeskin/turkish-movie-sentiment-analysis-dataset` (Kaggle, 83,227 samples)
59
+
60
+ ## 🚀 Usage
61
+
62
+ ### With Python:
63
+
64
+ ```python
65
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
66
+ import torch
67
+
68
+ # Load model
69
+ model_name = "codealchemist01/turkish-sentiment-analysis-finetuned"
70
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
71
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
72
+
73
+ # Example text
74
+ text = "Ürün normal, beklediğim gibi. Özel bir şey yok."
75
+
76
+ # Tokenize
77
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
78
+
79
+ # Predict
80
+ with torch.no_grad():
81
+     outputs = model(**inputs)
82
+     predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
83
+     predicted_label_id = predictions.argmax().item()
84
+
85
+ # Map to label
86
+ id2label = {0: "negative", 1: "neutral", 2: "positive"}
87
+ predicted_label = id2label[predicted_label_id]
88
+ confidence = predictions[0][predicted_label_id].item()
89
+
90
+ print(f"Label: {predicted_label}")
91
+ print(f"Confidence: {confidence:.4f}")
92
+ ```
93
+
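The prediction step in the snippet above boils down to softmax over three logits, an argmax, and a lookup in the label mapping. The sketch below reproduces that logic in plain Python so it can be followed without downloading the model. The logits are hypothetical illustrative values, not real model output, and the helper names `softmax` and `decode` are assumptions introduced here.

```python
import math

# Label order copied from the README's id2label mapping.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label, confidence) for one example's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return ID2LABEL[best], probs[best]


# Hypothetical logits for a single sentence (not produced by the model).
example_logits = [-1.2, 2.3, 0.4]
label, confidence = decode(example_logits)
print(f"Label: {label}")
print(f"Confidence: {confidence:.4f}")
```

With a real model, `outputs.logits[0].tolist()` would take the place of `example_logits`.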
94
+ ### Gradio Space:
95
+ You can test the model interactively in this Space!
96
+
97
+ ## 📈 Improvement Results
98
+
99
+ ### Test Results (15-example test):
100
+ - **Overall Accuracy:** 66.7% → 86.7% (+20.0 percentage points)
101
+ - **Neutral:** 0% → 80% (+80.0 percentage points) 🚀
102
+ - **Negative:** 100% → 80%
103
+ - **Positive:** 100% → 100%
104
+
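The overall figures in the 15-example test are consistent with the per-class accuracies if each class contributed 5 examples. That split is an assumption made here for illustration; the README does not state it. A minimal sketch of the arithmetic:

```python
# Assumption (not stated in the README): 5 test examples per class, 15 in total.
EXAMPLES_PER_CLASS = 5

# Per-class accuracies before and after fine-tuning, copied from the list above.
before = {"neutral": 0.0, "negative": 1.0, "positive": 1.0}
after = {"neutral": 0.8, "negative": 0.8, "positive": 1.0}


def overall_accuracy(per_class_acc, n_per_class=EXAMPLES_PER_CLASS):
    """Combine per-class accuracies into overall accuracy for equal-sized classes."""
    correct = sum(round(acc * n_per_class) for acc in per_class_acc.values())
    return correct / (n_per_class * len(per_class_acc))


acc_before = overall_accuracy(before)
acc_after = overall_accuracy(after)
print(f"before: {acc_before:.1%}, after: {acc_after:.1%}")
```

Under this assumption the gain comes from 4 additional correct neutral predictions, offset by 1 lost negative prediction.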
105
+ ### Test Set Performance (55,689 examples):
106
+ - **Accuracy:** 91.96%
107
+ - **Weighted F1:** 91.93%
108
+ - **Neutral F1:** 90.57%
109
+ - **Positive F1:** 94.61%
110
+ - **Negative F1:** 88.68%
111
+
112
+ ## 🔧 Fine-tuning Details
113
+
114
+ - **Base Model:** codealchemist01/turkish-sentiment-analysis
115
+ - **Epochs:** 2
116
+ - **Learning Rate:** 1e-5 (optimized for fine-tuning)
117
+ - **Batch Size:** 12
118
+ - **Max Length:** 128 tokens
119
+ - **Optimizer:** AdamW
120
+
121
+ ## 💡 Usage Recommendations
122
+
123
+ - ✅ Detects neutral expressions better
124
+ - ✅ Correctly predicts expressions such as "Normal" (normal), "standart" (standard), and "orta seviye" (mid-level)
125
+ - ✅ More balanced performance across classes
126
+ - ✅ Better generalization on real-world text
127
+
128
+ ## ⚠️ Limitations
129
+
130
+ - Performance may drop on very short texts (< 3 words)
131
+ - Performance may vary across domains (social media, news, reviews)
132
+ - Some ambiguous expressions may still be misclassified
133
+
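One way to act on the short-text limitation is to flag inputs with fewer than 3 words before trusting the prediction. The sketch below is only an illustration: the helper `is_too_short` and the whitespace-based word count are assumptions introduced here, not part of the model or the Space.

```python
MIN_WORDS = 3  # threshold taken from the short-text limitation listed above


def is_too_short(text, min_words=MIN_WORDS):
    """Return True when the text has fewer than `min_words` whitespace-separated words."""
    return len(text.split()) < min_words


samples = ["Güzel ürün", "Ürün normal, beklediğim gibi."]
for sample in samples:
    note = "low-confidence input" if is_too_short(sample) else "ok"
    print(f"{sample!r}: {note}")
```

A caller could route flagged inputs to manual review or show the prediction with a warning.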
134
+ ## 📝 Citation
135
+
136
+ ```bibtex
137
+ @misc{turkish-sentiment-analysis-finetuned,
138
+ title={Turkish Sentiment Analysis Model (Fine-tuned)},
139
+ author={codealchemist01},
140
+ year={2024},
141
+ base_model={codealchemist01/turkish-sentiment-analysis},
142
+ howpublished={\url{https://huggingface.co/codealchemist01/turkish-sentiment-analysis-finetuned}}
143
+ }
144
+ ```
145
+
146
+ ## 📄 License
147
+
148
+ Apache 2.0
149
+