---
title: AIDetector
emoji: 📉
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.45.0
app_file: app.py
pinned: false
license: mit
---

# Advanced AI Text Detector 🔍

An advanced AI text detection system that identifies AI-generated content, particularly from ChatGPT and similar language models.

## Features

### 🤖 Dual Detection Methods
- **Transformer-based Detection**: Uses a RoBERTa model fine-tuned for ChatGPT detection
- **Statistical Analysis**: Employs multiple linguistic metrics for robust detection

### 📊 Comprehensive Analysis Metrics
- **Burstiness Analysis**: Measures sentence length variation (human text is typically more "bursty")
- **Vocabulary Diversity**: Analyzes lexical richness and word variety
- **Repetition Detection**: Identifies repeated phrases and patterns
- **Perplexity Scoring**: Evaluates text predictability
- **Punctuation Patterns**: Analyzes punctuation consistency
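
As a rough illustration of three of the metrics above, here is a minimal sketch using only the Python standard library. The function names, the naive sentence splitter, and the exact formulas (coefficient of variation for burstiness, type-token ratio for vocabulary diversity, repeated trigram share for repetition) are assumptions made for this example and may differ from what `app.py` computes.

```python
import re
import statistics
from collections import Counter


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words."""
    lengths = [len(s.split()) for s in split_sentences(text)]
    if len(lengths) < 2:
        return 0.0  # variation is undefined for fewer than two sentences
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def vocabulary_diversity(text: str) -> float:
    """Type-token ratio: unique words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


def repetition_rate(text: str, n: int = 3) -> float:
    """Share of word n-grams that belong to an n-gram seen more than once."""
    words = re.findall(r"[a-z']+", text.lower())
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)
```

Higher burstiness and vocabulary diversity would point toward human writing under this sketch, while a higher repetition rate would point toward generated text.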

### 🎯 High Accuracy Features
- Multi-method ensemble approach for improved accuracy
- Confidence scoring system
- Detailed explanations for each detection
- Visual probability distribution

## How It Works

1. **Input Processing**: The text is tokenized and prepared for analysis
2. **Transformer Analysis**: If the RoBERTa model is available, it provides an initial AI probability
3. **Statistical Analysis**: Multiple linguistic features are extracted and analyzed
4. **Score Combination**: Results are weighted and combined into a final prediction
5. **Result Generation**: A detailed report is produced with classification, confidence, and explanations
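
The score combination in step 4 can be sketched as a weighted average that falls back to the statistical score when the transformer model is unavailable. The function name `combine_scores` and the default weight of 0.7 are hypothetical; this README does not state the actual weights.

```python
from typing import Optional


def combine_scores(
    statistical_prob: float,
    transformer_prob: Optional[float] = None,
    transformer_weight: float = 0.7,  # hypothetical weight, not taken from app.py
) -> float:
    """Blend two AI probabilities in [0, 1] into one final probability."""
    if not 0.0 <= statistical_prob <= 1.0:
        raise ValueError("statistical_prob must be in [0, 1]")
    if transformer_prob is None:
        # Transformer model not loaded: rely on statistical analysis alone.
        return statistical_prob
    if not 0.0 <= transformer_prob <= 1.0:
        raise ValueError("transformer_prob must be in [0, 1]")
    return (
        transformer_weight * transformer_prob
        + (1.0 - transformer_weight) * statistical_prob
    )
```

With these assumed weights, a transformer probability of 0.9 and a statistical probability of 0.5 would combine to 0.78.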

## Detection Categories

- **AI-Generated**: >80% AI probability (High confidence)
- **Likely AI-Generated**: 60-80% AI probability (Medium confidence)
- **Uncertain**: 40-60% AI probability (Low confidence)
- **Likely Human-Written**: 20-40% AI probability (Medium confidence)
- **Human-Written**: <20% AI probability (High confidence)
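
The categories above map onto a simple threshold function. This is a minimal sketch: the name `classify` is hypothetical, and because the ranges as listed share their endpoints, the handling of exactly 60% and 40% (assigned to the higher band here) is an assumption.

```python
def classify(ai_probability: float) -> tuple[str, str]:
    """Map an AI probability in [0, 1] to a (category, confidence) pair."""
    if not 0.0 <= ai_probability <= 1.0:
        raise ValueError("ai_probability must be in [0, 1]")
    if ai_probability > 0.80:
        return "AI-Generated", "High"
    if ai_probability >= 0.60:  # assumption: exactly 60% counts as Likely AI
        return "Likely AI-Generated", "Medium"
    if ai_probability >= 0.40:  # assumption: exactly 40% counts as Uncertain
        return "Uncertain", "Low"
    if ai_probability >= 0.20:
        return "Likely Human-Written", "Medium"
    return "Human-Written", "High"
```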

## Usage Tips

- Provide at least 100 words for optimal accuracy
- Longer texts generally yield more reliable results
- The detector works best with English text
- Results are probabilistic; use them as guidance, not absolute truth

## Technical Stack

- **Gradio**: Interactive web interface
- **Transformers**: Hugging Face transformer models
- **PyTorch**: Deep learning backend
- **SciPy/NumPy**: Statistical analysis

## Limitations

- Best performance with English text
- Requires sufficient text length (minimum 50 characters, optimal 100+ words)
- Detection accuracy may vary with highly technical or specialized content
- Should be used as a tool for guidance, not definitive judgment
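
The length requirements above could be enforced before analysis with a check along these lines. The helper name `check_input` and the message wording are hypothetical; only the 50-character minimum and the 100-word recommendation come from this README.

```python
MIN_CHARS = 50           # minimum length stated in this README
RECOMMENDED_WORDS = 100  # recommended length stated in this README


def check_input(text: str) -> tuple[bool, str]:
    """Return (is_usable, message) for a candidate input text."""
    stripped = text.strip()
    if len(stripped) < MIN_CHARS:
        return False, (
            f"Text too short: {len(stripped)} characters, "
            f"at least {MIN_CHARS} required."
        )
    word_count = len(stripped.split())
    if word_count < RECOMMENDED_WORDS:
        # Usable, but the result is likely to be less reliable.
        return True, (
            f"Only {word_count} words; {RECOMMENDED_WORDS}+ words "
            "give more reliable results."
        )
    return True, "OK"
```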

## Deployment

This app is designed to run on Hugging Face Spaces. Simply upload the files to your Space and it will automatically deploy.

## Model Credit

This detector uses the `Hello-SimpleAI/chatgpt-detector-roberta` model from Hugging Face, combined with custom statistical analysis methods.
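
A minimal sketch of querying this model through the Transformers `pipeline` API follows. The helper names are hypothetical, and the label string `"ChatGPT"` for the AI class is an assumption about the model's configuration that should be checked against the model card.

```python
from typing import Callable

MODEL_ID = "Hello-SimpleAI/chatgpt-detector-roberta"
AI_LABEL = "ChatGPT"  # assumption: label the model uses for AI-generated text


def load_classifier() -> Callable:
    # Imported lazily so the rest of the module works without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=MODEL_ID)


def ai_probability(classifier: Callable, text: str) -> float:
    """Convert the top predicted label and score into P(AI-generated)."""
    # truncation=True keeps long inputs within the model's maximum length.
    result = classifier(text, truncation=True)[0]
    score = float(result["score"])
    return score if result["label"] == AI_LABEL else 1.0 - score
```

Calling `ai_probability(load_classifier(), text)` downloads the model on first use, so it needs network access and the `transformers` and `torch` packages.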

---

**Note**: AI detection is a rapidly evolving field. No detector is 100% accurate, and results should be interpreted with appropriate context and judgment.