| title: AIDetector | |
| emoji: π | |
| colorFrom: purple | |
| colorTo: pink | |
| sdk: gradio | |
| sdk_version: 5.45.0 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| # Advanced AI Text Detector | |
| An advanced AI text detection system that identifies AI-generated content, particularly from ChatGPT and similar language models. | |
| ## Features | |
| ### 🤖 Dual Detection Methods | |
| - **Transformer-based Detection**: Uses a fine-tuned RoBERTa model trained specifically to detect ChatGPT output | |
| - **Statistical Analysis**: Employs multiple linguistic metrics for robust detection | |
| ### Comprehensive Analysis Metrics | |
| - **Burstiness Analysis**: Measures sentence length variation (human text is typically more "bursty") | |
| - **Vocabulary Diversity**: Analyzes lexical richness and word variety | |
| - **Repetition Detection**: Identifies repeated phrases and patterns | |
| - **Perplexity Scoring**: Evaluates text predictability | |
| - **Punctuation Patterns**: Analyzes punctuation consistency | |
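
Three of the metrics above can be illustrated with the standard library alone. This is a minimal sketch, not the app's actual code: the function names, the naive sentence splitter, and the exact formulas (coefficient of variation for burstiness, type-token ratio for diversity, repeated trigram share for repetition) are assumptions chosen for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words."""
    lengths = [len(s.split()) for s in split_sentences(text)]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def vocabulary_diversity(text: str) -> float:
    """Type-token ratio: unique words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


def repetition_rate(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that belong to an n-gram seen more than once."""
    words = re.findall(r"[a-z']+", text.lower())
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)
```

Under these definitions, text with uniform sentence lengths scores a burstiness of 0, and higher values indicate more variation.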
| ### π― High Accuracy Features | |
| - Multi-method ensemble approach for improved accuracy | |
| - Confidence scoring system | |
| - Detailed explanations for each detection | |
| - Visual probability distribution | |
| ## How It Works | |
| 1. **Input Processing**: The text is tokenized and prepared for analysis | |
| 2. **Transformer Analysis**: If available, the RoBERTa model provides an initial AI probability | |
| 3. **Statistical Analysis**: Multiple linguistic features are extracted and analyzed | |
| 4. **Score Combination**: Results are weighted and combined into a final prediction | |
| 5. **Result Generation**: A detailed report is produced with classification, confidence, and explanations | |
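
The score combination step can be sketched as a weighted average that falls back to the statistical score when the transformer model is unavailable. The function name `combine_scores` and the 0.7 weight are hypothetical. The README does not state the weights the app uses.

```python
from typing import Optional


def combine_scores(
    statistical_prob: float,
    transformer_prob: Optional[float] = None,
    transformer_weight: float = 0.7,  # assumed weight, not taken from the app
) -> float:
    """Weighted average of the two AI probabilities, clamped to [0, 1]."""
    if transformer_prob is None:
        # Transformer unavailable: rely on the statistical score alone.
        combined = statistical_prob
    else:
        combined = (transformer_weight * transformer_prob
                    + (1.0 - transformer_weight) * statistical_prob)
    return min(1.0, max(0.0, combined))
```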
| ## Detection Categories | |
| - **AI-Generated**: >80% AI probability (High confidence) | |
| - **Likely AI-Generated**: 60-80% AI probability (Medium confidence) | |
| - **Uncertain**: 40-60% AI probability (Low confidence) | |
| - **Likely Human-Written**: 20-40% AI probability (Medium confidence) | |
| - **Human-Written**: <20% AI probability (High confidence) | |
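
The category bands above map directly onto a small lookup. In this sketch, `classify` is a hypothetical name, and the handling of the exact boundary values 60% and 40% is an assumption, since the listed ranges overlap at their endpoints.

```python
def classify(ai_probability: float) -> tuple[str, str]:
    """Map an AI probability in [0, 1] to a (category, confidence) pair."""
    if ai_probability > 0.8:
        return ("AI-Generated", "High")
    if ai_probability >= 0.6:  # assumption: exactly 60% counts as "Likely AI-Generated"
        return ("Likely AI-Generated", "Medium")
    if ai_probability >= 0.4:  # assumption: exactly 40% counts as "Uncertain"
        return ("Uncertain", "Low")
    if ai_probability >= 0.2:
        return ("Likely Human-Written", "Medium")
    return ("Human-Written", "High")
```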
| ## Usage Tips | |
| - Provide at least 100 words for optimal accuracy | |
| - Longer texts generally yield more reliable results | |
| - The detector works best with English text | |
| - Results are probabilistic; use them as guidance, not absolute truth | |
| ## Technical Stack | |
| - **Gradio**: Interactive web interface | |
| - **Transformers**: Hugging Face transformer models | |
| - **PyTorch**: Deep learning backend | |
| - **SciPy/NumPy**: Statistical analysis | |
| ## Limitations | |
| - Best performance with English text | |
| - Requires sufficient text length (minimum 50 characters, optimal 100+ words) | |
| - Detection accuracy may vary with highly technical or specialized content | |
| - Should be used as a tool for guidance, not definitive judgment | |
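
The length limits above could be enforced before any analysis runs. The following is a sketch under stated assumptions: `check_input`, the constant names, and the message wording are hypothetical, and only the 50-character and 100-word thresholds come from this README.

```python
MIN_CHARS = 50           # minimum length stated in the limitations
RECOMMENDED_WORDS = 100  # recommended length stated in the limitations


def check_input(text: str) -> tuple[bool, str]:
    """Return (ok, message); message is empty when the text is long enough."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return False, f"Text too short: {len(stripped)} characters (minimum {MIN_CHARS})."
    n_words = len(stripped.split())
    if n_words < RECOMMENDED_WORDS:
        return True, f"Only {n_words} words; results are more reliable with {RECOMMENDED_WORDS}+."
    return True, ""
```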
| ## Deployment | |
| This app is designed to run on Hugging Face Spaces. Upload the files to your Space and it deploys automatically. | |
| ## Model Credit | |
| This detector uses the `Hello-SimpleAI/chatgpt-detector-roberta` model from Hugging Face, combined with custom statistical analysis methods. | |
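
A minimal sketch of loading the credited model with the Transformers `pipeline` API and reading off an AI probability. The helper names are hypothetical, and the AI-class label `"ChatGPT"` is an assumption about the model's configuration, so check the model card if the lookup fails. The import is done lazily so the helper for parsing scores can be used without `transformers` installed.

```python
def load_detector():
    """Build a text-classification pipeline for the credited model."""
    # Imported lazily; requires the transformers and torch packages.
    from transformers import pipeline
    return pipeline(
        "text-classification",
        model="Hello-SimpleAI/chatgpt-detector-roberta",
        top_k=None,  # return a score for every label, not just the top one
    )


def ai_probability(scores, ai_label: str = "ChatGPT") -> float:
    """Extract the AI-class score from pipeline output.

    `ai_label` is an assumed label name, not confirmed by this README.
    """
    # Some library versions nest single-input results one level deeper.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    for entry in scores:
        if entry["label"] == ai_label:
            return float(entry["score"])
    raise ValueError(f"Label {ai_label!r} not found in pipeline output")


# Usage (downloads the model on first run):
# detector = load_detector()
# print(ai_probability(detector("Text to analyze.", truncation=True)))
```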
| --- | |
| **Note**: AI detection is a rapidly evolving field. No detector is 100% accurate, and results should be interpreted with appropriate context and judgment. |