Christian Rene Thelen
metadata
title: AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection
emoji: 🍭
colorFrom: yellow
colorTo: pink
sdk: gradio
python_version: 3.12.9
sdk_version: 5.35.0
app_file: app.py
pinned: true

AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection 🍭

Results

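| Subtask | Submission | Model              | (strict) F1 Score |          |
|---------|------------|--------------------|-------------------|----------|
| 1       | 1          | Qwen3-Embedding-8B | 0.875             | Notebook |
| 1       | 2          | XLM-RoBERTa-Large  | 0.891             | Notebook |
| 2       | 1          | GBERT-Large        | 0.623             | Notebook |
| 2       | 2          | XLM-RoBERTa-Large  | 0.631             | Notebook |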

Setup

python_version="$(cat .python-version)"

# install the interpreter if it’s missing
pyenv install -s "${python_version}"

# select python version for current shell
pyenv shell "${python_version}"

# create venv if missing
if [[ ! -d venv ]]; then
  python -m venv venv
fi

# activate venv & install packages
source venv/bin/activate

pip install -U pip setuptools wheel
pip install -r requirements.txt

🏆 Model

Model on Hugging Face

Model Details

Model Description

This model is a fine-tuned XLM-RoBERTa-Large adapted for the GermEval 2025 Shared Task on Candy Speech Detection. It was trained to identify candy speech at two levels:

  • Binary level: Classify whether a comment contains candy speech.
  • Span level: Detect the exact spans and categories of candy speech within comments, using a BIO tagging scheme across 10 categories (positive feedback, compliment, affection declaration, encouragement, gratitude, agreement, ambiguous, implicit, group membership, sympathy).
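A BIO scheme over the 10 categories above gives one "B-" (begin) and one "I-" (inside) tag per category plus a single "O" (outside) tag, which is where the 21 labels come from. The following is a minimal sketch of such a label set; the exact label strings and their ordering are assumptions for illustration and may differ from the ones used in the released model.

```python
# The ten categories named in the model description.
CATEGORIES = [
    "positive feedback",
    "compliment",
    "affection declaration",
    "encouragement",
    "gratitude",
    "agreement",
    "ambiguous",
    "implicit",
    "group membership",
    "sympathy",
]


def build_bio_labels(categories):
    """Return ["O", "B-<cat>", "I-<cat>", ...] for the given categories."""
    labels = ["O"]
    for category in categories:
        labels.append(f"B-{category}")
        labels.append(f"I-{category}")
    return labels


# Label strings and index order are hypothetical, not taken from the model config.
LABELS = build_bio_labels(CATEGORIES)
LABEL2ID = {label: index for index, label in enumerate(LABELS)}
ID2LABEL = {index: label for label, index in LABEL2ID.items()}

print(len(LABELS))  # 21
```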

The span-level model also proved effective for binary detection by classifying a comment as candy speech if at least one positive span was detected.
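The rule described above can be sketched as follows: decode the predicted BIO tags into spans, then flag the comment as candy speech if any span was found. The function names and the token-index span format are assumptions for illustration, not the project's actual code.

```python
def bio_to_spans(tags):
    """Group a BIO tag sequence into (start, end, category) token spans (end exclusive)."""
    spans = []
    start = None
    category = None
    for index, tag in enumerate(tags):
        # An "I-" tag of the same category extends the currently open span.
        if tag.startswith("I-") and category == tag[2:]:
            continue
        # Any other tag closes the currently open span, if there is one.
        if category is not None:
            spans.append((start, index, category))
            start, category = None, None
        # "B-" opens a new span; a stray "I-" is treated like "B-" (assumption).
        if tag != "O":
            start, category = index, tag[2:]
    if category is not None:
        spans.append((start, len(tags), category))
    return spans


def contains_candy_speech(tags):
    """Binary decision: positive if at least one span was detected."""
    return len(bio_to_spans(tags)) > 0


example = ["O", "B-compliment", "I-compliment", "O", "B-gratitude"]
print(bio_to_spans(example))           # [(1, 3, 'compliment'), (4, 5, 'gratitude')]
print(contains_candy_speech(example))  # True
```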

Intended Uses

  • Research: Analysis of positive/supportive communication in German social media.
  • Applications: Social media analytics, conversational AI safety (mitigating sycophancy), computational social science.
  • Not for: Deployments without fairness/robustness testing on out-of-domain data.

Performance

  • Dataset: 46k German YouTube comments, annotated with candy speech spans.

  • Training Data Split: 37,057 comments (train), 9,229 (test).

  • Shared Task Results:

    • Subtask 1 (binary detection): Positive F1 = 0.891 (ranked 1st)
    • Subtask 2 (span detection): Strict F1 = 0.631 (ranked 1st)
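To illustrate what a strict span-level F1 measures, here is a minimal sketch that assumes a predicted span only counts as correct when its comment, boundaries, and category all match a gold span exactly. The tuple layout `(comment_id, start, end, category)` is hypothetical, and the official shared-task scorer may differ in details.

```python
def strict_f1(gold_spans, predicted_spans):
    """F1 over exact matches of (comment_id, start, end, category) tuples."""
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [("c1", 0, 5, "compliment"), ("c1", 10, 20, "gratitude")]
# The second prediction is off by one character, so it does not count.
predicted = [("c1", 0, 5, "compliment"), ("c1", 10, 19, "gratitude")]
print(strict_f1(gold, predicted))  # 0.5
```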

Training Procedure

  • Architecture: XLM-RoBERTa-Large + linear classification layer (BIO tagging, 21 labels including “O”).
  • Optimizer: AdamW
  • Learning Rate: Peak 2e-5 with linear decay and warmup (500 steps).
  • Epochs: Up to 20 (with early stopping).
  • Batch Size: 32
  • Regularization: Dropout (0.1), weight decay (0.01), gradient clipping (L2 norm 1.0).
  • Postprocessing: BIO tag correction and subword alignment.
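The postprocessing step is not specified in detail here. One common form of BIO tag correction, shown below as a sketch under that assumption, rewrites an "I-" tag to "B-" when it does not continue a span of the same category. The function name and rule are illustrative and may not match the correction used for the submitted model.

```python
def repair_bio_tags(tags):
    """Rewrite "I-X" to "B-X" when it does not follow "B-X" or "I-X"."""
    repaired = []
    previous_category = None
    for tag in tags:
        if tag == "O":
            previous_category = None
            repaired.append(tag)
            continue
        prefix, category = tag.split("-", 1)
        # An inside tag with no matching open span starts a new span instead.
        if prefix == "I" and category != previous_category:
            tag = f"B-{category}"
        repaired.append(tag)
        previous_category = category
    return repaired


raw = ["O", "I-gratitude", "I-gratitude", "I-compliment", "O"]
print(repair_bio_tags(raw))
# ['O', 'B-gratitude', 'I-gratitude', 'B-compliment', 'O']
```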

Limitations

  • Domain Specificity: Trained only on German YouTube comments; performance may degrade on other platforms, genres, or languages.
  • Overlapping Spans: Cannot predict overlapping spans, because BIO tagging assigns a single label to each token; such spans were rare (<2%) in the training data.
  • Biases: May reflect biases present in the dataset (e.g., demographic skews in YouTube communities).
  • Generalization: Needs evaluation before deployment in real-world moderation systems.

Ethical Considerations

  • Positive speech detection is less studied than toxic speech detection, and automatic labeling of “supportiveness” may reinforce cultural biases about what counts as “positive.”
  • Must be complemented with human-in-the-loop moderation to avoid misuse.

Citation

If you use this model, please cite:

@inproceedings{thelen-etal-2025-aixcellent,
    title = "{AI}xcellent Vibes at {G}erm{E}val 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training",
    author = "Thelen, Christian Rene  and
      Blaneck, Patrick Gustav  and
      Bornheim, Tobias  and
      Grieger, Niklas  and
      Bialonski, Stephan",
    editor = "Wartena, Christian  and
      Heid, Ulrich",
    booktitle = "Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Workshops",
    month = sep,
    year = "2025",
    address = "Hannover, Germany",
    publisher = "HsH Applied Academics",
    url = "https://aclanthology.org/2025.konvens-2.33/",
    pages = "398--403"
}