| This roadmap covers dataset acquisition, metric development, and evaluation tasks for the **TunisianEncoderModelsLeaderboard** project. | |
| --- | |
| ## Tunisian NLP Leaderboard Roadmap | |
| ### Phase 1: Dataset Acquisition & Preparation | |
| #### 1. **Sentiment Analysis** | |
| * **Existing Dataset**: **TUNIZI** | |
| * **Description**: A large dataset containing 100,000 Tunisian Arabizi comments annotated as positive, negative, or neutral. | |
| * **Source**: [K4All Foundation](https://k4all.org/project/database-tunisian-arabizi/) | |
| * **Usage**: Use this dataset to evaluate models on sentiment classification. | |
| #### 2. **Named Entity Recognition (NER)** | |
| * **Existing Dataset**: **ArabNER** | |
| * **Description**: A comprehensive Arabic NER corpus that can be adapted for Tunisian dialects. | |
| * **Source**: [ResearchGate](https://www.researchgate.net/publication/374279027_Named_Entity_Recognition_of_Tunisian_Arabic_Using_the_Bi-LSTM-CRF_Model) | |
| * **Usage**: Fine-tune models on this dataset to assess their ability to recognize entities in Tunisian Arabic text. | |
| #### 3. **Corpus Coverage** | |
| * **Existing Dataset**: **Tunisian Dialect Corpus** | |
| * **Description**: A sizable collection of Tunisian dialect texts, useful for assessing vocabulary coverage. | |
| * **Source**: [Hugging Face](https://huggingface.co/collections/tunis-ai/arabic-datasets-66344cf0df31dc81eb1dcf55) | |
| * **Usage**: Evaluate models' coverage of the Tunisian dialect vocabulary using this corpus. | |
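One way to make "vocabulary coverage" concrete is sketched below. It assumes whitespace tokenization and a plain set of vocabulary strings, neither of which is specified in this roadmap; a real setup would more likely go through each model's own tokenizer. The function name `vocabulary_coverage` and the sample sentences are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def vocabulary_coverage(texts: Iterable[str], vocab: Set[str]) -> Dict[str, float]:
    """Return type-level and token-level coverage of `texts` by `vocab`, in percent.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    counts = Counter(token for text in texts for token in text.lower().split())
    total_tokens = sum(counts.values())
    if total_tokens == 0:
        # Empty corpus: report zero coverage instead of dividing by zero.
        return {"type_coverage": 0.0, "token_coverage": 0.0}

    covered_types = [token for token in counts if token in vocab]
    covered_tokens = sum(counts[token] for token in covered_types)
    return {
        # Share of distinct words found in the vocabulary.
        "type_coverage": 100.0 * len(covered_types) / len(counts),
        # Share of running words found in the vocabulary.
        "token_coverage": 100.0 * covered_tokens / total_tokens,
    }


if __name__ == "__main__":
    sample_texts = ["barcha behi", "behi yesser"]  # toy sentences, not real data
    sample_vocab = {"behi", "barcha"}
    print(vocabulary_coverage(sample_texts, sample_vocab))
```

Reporting both numbers can be useful: type coverage shows how many distinct dialect words are known, while token coverage weights frequent words more heavily.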
| #### 4. **Arabizi Robustness** | |
| * **Existing Dataset**: **TUNIZI** | |
| * **Description**: TUNIZI is written in Arabizi (Tunisian Arabic in Latin script, with digits standing in for some Arabic letters), so it can also be used to evaluate models' robustness to this writing style. | |
| * **Source**: [K4All Foundation](https://k4all.org/project/database-tunisian-arabizi/) | |
| * **Usage**: Assess models' robustness to Arabizi by evaluating their performance on this dataset. | |
| #### 5. **Code-Switching** | |
| * **Existing Dataset**: **TunSwitch** | |
| * **Description**: A dataset of code-switched Tunisian Arabic speech, valuable for training and evaluating models on code-switching tasks. | |
| * **Source**: [Zenodo](https://zenodo.org/records/8342762) | |
| * **Usage**: Evaluate models' ability to handle code-switching between Tunisian Arabic and other languages using this dataset. | |
| #### 6. **Typo Robustness** | |
| * **Existing Dataset**: **TUNIZI** | |
| * **Description**: TUNIZI's informal comments contain spelling and typographical variations, which makes the dataset suitable for evaluating models' tolerance to typos. | |
| * **Source**: [K4All Foundation](https://k4all.org/project/database-tunisian-arabizi/) | |
| * **Usage**: Assess models' robustness to typographical errors by evaluating their performance on this dataset. | |
| #### 7. **Zero-Shot Transfer** | |
| * **Existing Dataset**: **TUNIZI** | |
| * **Description**: TUNIZI can be used to test models' ability to generalize to tasks they were not explicitly trained on. | |
| * **Source**: [K4All Foundation](https://k4all.org/project/database-tunisian-arabizi/) | |
| * **Usage**: Evaluate models' zero-shot transfer capabilities by assessing their performance on this dataset. | |
| #### 8. **Domain Shift** | |
| * **Existing Dataset**: **TUNIZI** | |
| * **Description**: Its diverse sources provide a foundation for testing domain adaptation capabilities. | |
| * **Source**: [K4All Foundation](https://k4all.org/project/database-tunisian-arabizi/) | |
| * **Usage**: Assess models' ability to adapt to different domains by evaluating their performance on this dataset. | |
| --- | |
| ### Phase 2: Metric Development & Evaluation Tasks | |
| For each task, define the evaluation metric and the corresponding dataset: | |
| | Task | Metric | Dataset | | |
| | ------------------------ | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------- | | |
| | Sentiment Analysis | Accuracy / F1 Score | [TUNIZI](https://k4all.org/project/database-tunisian-arabizi/) | | |
| | Named Entity Recognition | F1 Score | [ArabNER](https://www.researchgate.net/publication/374279027_Named_Entity_Recognition_of_Tunisian_Arabic_Using_the_Bi-LSTM-CRF_Model) | | |
| | Corpus Coverage | Vocabulary Coverage (%) | [Tunisian Dialect Corpus](https://huggingface.co/collections/tunis-ai/arabic-datasets-66344cf0df31dc81eb1dcf55) | | |
| | Arabizi Robustness | Accuracy / F1 Score | [TUNIZI](https://k4all.org/project/database-tunisian-arabizi/) | | |
| | Code-Switching | Accuracy / F1 Score | [TunSwitch](https://zenodo.org/records/8342762) | | |
| | Typo Robustness | Accuracy / F1 Score | [TUNIZI](https://k4all.org/project/database-tunisian-arabizi/) | | |
| | Zero-Shot Transfer | Accuracy / F1 Score | [TUNIZI](https://k4all.org/project/database-tunisian-arabizi/) | | |
| | Domain Shift | Accuracy / F1 Score | [TUNIZI](https://k4all.org/project/database-tunisian-arabizi/) | | |
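The "Accuracy / F1 Score" entries in the table could be computed along the following lines. This is a minimal sketch in plain Python: the table does not say which F1 variant is intended, so macro-averaging is an assumption here, and the function names are illustrative. NER is usually scored at the entity level rather than per label, which this sketch does not cover.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed variant, not stated in the table)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    gold = ["pos", "neg", "neu", "pos"]  # toy labels, not real data
    pred = ["pos", "neg", "pos", "neg"]
    print(f"accuracy={accuracy(gold, pred):.3f} macro_f1={macro_f1(gold, pred):.3f}")
```

Macro-averaging treats every class equally, which may matter if the sentiment classes are imbalanced; a weighted or micro average would be a reasonable alternative.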
| --- | |
| ### Suggested Folder Structure | |
| To maintain organization and clarity, consider the following structure: | |
| ``` | |
| TunisianEncoderModelsLeaderboard/ | |
| ├── datasets/ | |
| │   ├── sentiment/ | |
| │   │   └── tunizi.json | |
| │   ├── ner/ | |
| │   │   └── arabner.json | |
| │   ├── coverage/ | |
| │   │   └── tunisian_dialect_corpus.json | |
| │   ├── arabizi_robustness/ | |
| │   │   └── tunizi.json | |
| │   ├── code_switching/ | |
| │   │   └── tunswitch.json | |
| │   ├── typo_robustness/ | |
| │   │   └── tunizi_with_typos.json | |
| │   ├── zero_shot/ | |
| │   │   └── tunizi.json | |
| │   └── domain_shift/ | |
| │       └── tunisian_domain_shift.json | |
| ├── scripts/ | |
| │   ├── preprocess.py | |
| │   ├── evaluate.py | |
| │   └── visualize.py | |
| └── README.md | |
| ``` | |
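Scripts such as `evaluate.py` could resolve dataset files from this layout through a small lookup table. The sketch below mirrors the paths in the tree above; the names `DATASET_FILES` and `dataset_path` are hypothetical, and the internal format of the JSON files is not specified here, so no loading logic is shown.

```python
from pathlib import Path

# Task name -> file path relative to the datasets/ folder, copied from the tree above.
DATASET_FILES = {
    "sentiment": "sentiment/tunizi.json",
    "ner": "ner/arabner.json",
    "coverage": "coverage/tunisian_dialect_corpus.json",
    "arabizi_robustness": "arabizi_robustness/tunizi.json",
    "code_switching": "code_switching/tunswitch.json",
    "typo_robustness": "typo_robustness/tunizi_with_typos.json",
    "zero_shot": "zero_shot/tunizi.json",
    "domain_shift": "domain_shift/tunisian_domain_shift.json",
}


def dataset_path(task: str, root: Path = Path("datasets")) -> Path:
    """Return the expected dataset file for `task` under `root`."""
    try:
        relative = DATASET_FILES[task]
    except KeyError:
        known = ", ".join(sorted(DATASET_FILES))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None
    return root / relative


if __name__ == "__main__":
    for task in DATASET_FILES:
        print(task, "->", dataset_path(task))
```

Keeping the mapping in one place means a renamed or relocated file only needs to be updated once.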
| --- | |
| ### Next Steps | |
| 1. **Integrate Existing Datasets**: Incorporate the datasets mentioned above into your repository, ensuring they are properly formatted and documented. | |
| 2. **Develop Evaluation Scripts**: Write scripts to evaluate models on each task, ensuring they are compatible with the leaderboard format. | |
| 3. **Populate the Leaderboard**: As models are evaluated, update the leaderboard to reflect their performance across tasks. | |
| 4. **Documentation**: Update the README.md file to provide clear instructions on how to use the leaderboard, contribute models, and interpret results. | |
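As a rough illustration of step 3, per-task scores could be collapsed into a ranked list as sketched below. The leaderboard format is not defined in this roadmap, so the input shape (model name mapped to per-task scores on a common 0 to 1 scale), the use of an unweighted mean, and the name `rank_models` are all assumptions.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_models(results: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (model, mean score) pairs sorted from best to worst.

    Assumption: all task scores share one scale, so averaging them is meaningful.
    Models with no scores are skipped.
    """
    rows = [
        (model, mean(task_scores.values()))
        for model, task_scores in results.items()
        if task_scores
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Placeholder model names and scores, for illustration only.
    example = {
        "model_a": {"sentiment": 0.8, "ner": 0.6},
        "model_b": {"sentiment": 0.9, "ner": 0.7},
    }
    for name, score in rank_models(example):
        print(f"{name}: {score:.3f}")
```

A plain average treats every task as equally important; a leaderboard that mixes accuracy with coverage percentages would need to normalize the scores first.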