Commit 92386d0 by codealchemist01 · verified · 1 Parent(s): dda1843

Upload README.md with huggingface_hub

  1. README.md +128 -0
README.md ADDED
@@ -0,0 +1,128 @@
---
language: tr
tags:
- sentiment-analysis
- turkish
- bert
- text-classification
license: apache-2.0
datasets:
- winvoker/turkish-sentiment-analysis-dataset
- WhiteAngelss/Turkce-Duygu-Analizi-Dataset
metrics:
- accuracy
- f1
- precision
- recall
---

# Turkish Sentiment Analysis Model

A BERT model fine-tuned for Turkish sentiment analysis on a combined dataset of 439,384 labeled Turkish sentences (training, validation, and test splits together).

## Model Details

- **Base Model:** `dbmdz/bert-base-turkish-cased`
- **Task:** Text Classification (Sentiment Analysis)
- **Language:** Turkish
- **Labels:** positive, negative, neutral

## Training Data

The model was trained on a combination of two Turkish sentiment datasets:
- `winvoker/turkish-sentiment-analysis-dataset` (440,641 samples)
- `WhiteAngelss/Turkce-Duygu-Analizi-Dataset` (440,641 samples)

After deduplication and preprocessing, the final dataset of 439,384 samples was split into:
- **Training:** 351,507 samples
- **Validation:** 43,938 samples
- **Test:** 43,939 samples

### Label Distribution

- **Positive:** 234,957 (53.5%)
- **Neutral:** 153,809 (35.0%)
- **Negative:** 50,618 (11.5%)

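The imbalance can be quantified directly from the counts above. The sketch below recomputes the percentages and derives inverse-frequency class weights, one common way to counter imbalance in a loss function. The weighting scheme and the names `label_counts` and `class_weights` are illustrative assumptions; the model card does not say that class weights were used in training.

```python
# Label counts as reported in the model card.
label_counts = {"positive": 234_957, "neutral": 153_809, "negative": 50_618}

total = sum(label_counts.values())
num_classes = len(label_counts)

# Share of each class in the full dataset, in percent.
percentages = {label: 100 * n / total for label, n in label_counts.items()}

# Assumption: "balanced" inverse-frequency weights, total / (num_classes * count).
# This is an illustrative scheme, not something the model card says was applied.
class_weights = {label: total / (num_classes * n) for label, n in label_counts.items()}

for label in label_counts:
    print(f"{label:>8}: {percentages[label]:5.1f}%  weight={class_weights[label]:.3f}")
```

Under this scheme the negative class receives the largest weight, since it is the rarest of the three.
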
47
+ ## Training
48
+
49
+ - **Epochs:** 3
50
+ - **Learning Rate:** 2e-5
51
+ - **Batch Size:** 16
52
+ - **Max Length:** 128
53
+ - **Optimizer:** AdamW
54
+
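These settings, together with the training split size, imply a rough number of optimizer steps. The sketch below works that out under stated assumptions: a single device, no gradient accumulation, and the last partial batch kept. None of these details are given in the model card, so the result is an estimate only.

```python
import math

train_samples = 351_507  # training split size from the model card
batch_size = 16
epochs = 3

# Assumptions: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
```

A figure like this is useful when sizing a learning-rate warmup or schedule, which the model card does not describe.
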
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "codealchemist01/turkish-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example text ("This product is really great!")
text = "Bu ürün gerçekten harika!"

# Tokenize (max_length matches the training setting of 128)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_label_id = predictions.argmax(dim=-1).item()

# Map the class index to its label
id2label = {0: "negative", 1: "neutral", 2: "positive"}
predicted_label = id2label[predicted_label_id]
confidence = predictions[0][predicted_label_id].item()

print(f"Label: {predicted_label}")
print(f"Confidence: {confidence:.4f}")
```

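The post-processing in the snippet above (softmax, argmax, label lookup) can also be followed by hand. The sketch below repeats that step in plain Python on a made-up row of logits, so it runs without downloading the model. The helper name `decode_prediction` and the logit values are hypothetical; only the label mapping comes from the snippet above.

```python
import math

# Label mapping as given in the usage snippet.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def decode_prediction(logits):
    """Turn one row of raw logits into a (label, confidence) pair."""
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    denom = sum(exps)
    probs = [e / denom for e in exps]
    # Index of the highest probability is the predicted class id.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits, chosen only to illustrate the calculation.
label, confidence = decode_prediction([-1.2, 0.3, 2.9])
print(f"Label: {label}")
print(f"Confidence: {confidence:.4f}")
```

The confidence is simply the softmax probability of the winning class, so it says how peaked the distribution is rather than how likely the prediction is to be correct.
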
## Performance

Evaluation metrics on the test set (43,939 samples):

- **Accuracy:** 97.45%
- **Weighted F1:** 97.42%
- **Weighted Precision:** 97.41%
- **Weighted Recall:** 97.45%

### Per-Class Performance

| Class    | Precision | Recall | F1-Score | Support |
|----------|-----------|--------|----------|---------|
| Negative | 91.42%    | 86.69% | 88.99%   | 5,062   |
| Neutral  | 99.79%    | 99.96% | 99.87%   | 15,381  |
| Positive | 97.15%    | 98.12% | 97.63%   | 23,496  |

**Note:** The negative class has lower performance because of class imbalance (it makes up only 11.5% of the dataset). The neutral and positive classes both score above 97% F1.

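The weighted figures reported for the test set can be recovered from the per-class table by averaging each metric with the class supports as weights. The sketch below does that and also computes the unweighted (macro) average for comparison. The helper names `weighted_average` and `macro_average` are illustrative, and the macro figures are derived here rather than reported by the model card.

```python
# Per-class results from the table, in percent, with test-set support.
per_class = {
    "negative": {"precision": 91.42, "recall": 86.69, "f1": 88.99, "support": 5_062},
    "neutral": {"precision": 99.79, "recall": 99.96, "f1": 99.87, "support": 15_381},
    "positive": {"precision": 97.15, "recall": 98.12, "f1": 97.63, "support": 23_496},
}

total_support = sum(row["support"] for row in per_class.values())


def weighted_average(metric):
    """Support-weighted mean of one metric across classes."""
    return sum(row[metric] * row["support"] for row in per_class.values()) / total_support


def macro_average(metric):
    """Unweighted mean of one metric across classes."""
    return sum(row[metric] for row in per_class.values()) / len(per_class)


for metric in ("precision", "recall", "f1"):
    print(f"{metric:>9}: weighted={weighted_average(metric):.2f}%  macro={macro_average(metric):.2f}%")
```

Because the negative class carries little weight, the macro F1 comes out at roughly 95.5%, about two points below the weighted 97.42%. The macro value is the more telling one when performance on the minority class matters.
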
## Limitations

- The model may not perform well on very short texts (< 3 words).
- Performance may vary across domains (social media, news, reviews).
- Class imbalance may reduce performance on the minority class (negative).

## Citation

If you use this model, please cite:

```bibtex
@misc{turkish-sentiment-analysis,
  title={Turkish Sentiment Analysis Model},
  author={codealchemist01},
  year={2024},
  howpublished={\url{https://huggingface.co/codealchemist01/turkish-sentiment-analysis}}
}
```

## License

Apache 2.0