Spaces:

Jay-Rajput
/

AIDetector

Sleeping

App Files Files Community

Jay-Rajput commited on Sep 14

Commit

f659ec0

1 Parent(s): 23c23e6

ai detector new

Browse files

Files changed (3) hide show

README.md +71 -1
app.py +282 -1012
requirements.txt +6 -8

README.md CHANGED Viewed

@@ -10,4 +10,74 @@ pinned: false
 license: mit
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 license: mit
 ---
+# Advanced AI Text Detector 🔍
+An advanced AI text detection system that identifies AI-generated content, particularly from ChatGPT and similar language models.
+## Features
+### 🤖 Dual Detection Methods
+- **Transformer-based Detection**: Uses fine-tuned RoBERTa model specifically trained on ChatGPT detection
+- **Statistical Analysis**: Employs multiple linguistic metrics for robust detection
+### 📊 Comprehensive Analysis Metrics
+- **Burstiness Analysis**: Measures sentence length variation (human text is typically more "bursty")
+- **Vocabulary Diversity**: Analyzes lexical richness and word variety
+- **Repetition Detection**: Identifies repeated phrases and patterns
+- **Perplexity Scoring**: Evaluates text predictability
+- **Punctuation Patterns**: Analyzes punctuation consistency
+### 🎯 High Accuracy Features
+- Multi-method ensemble approach for improved accuracy
+- Confidence scoring system
+- Detailed explanations for each detection
+- Visual probability distribution
+## How It Works
+1. **Input Processing**: The text is tokenized and prepared for analysis
+2. **Transformer Analysis**: If available, the RoBERTa model provides initial AI probability
+3. **Statistical Analysis**: Multiple linguistic features are extracted and analyzed
+4. **Score Combination**: Results are weighted and combined for final prediction
+5. **Result Generation**: Detailed report with classification, confidence, and explanations
+## Detection Categories
+- **AI-Generated**: >80% AI probability (High confidence)
+- **Likely AI-Generated**: 60-80% AI probability (Medium confidence)
+- **Uncertain**: 40-60% AI probability (Low confidence)
+- **Likely Human-Written**: 20-40% AI probability (Medium confidence)
+- **Human-Written**: <20% AI probability (High confidence)
+## Usage Tips
+- Provide at least 100 words for optimal accuracy
+- Longer texts generally yield more reliable results
+- The detector works best with English text
+- Results are probabilistic - use them as guidance, not absolute truth
+## Technical Stack
+- **Gradio**: Interactive web interface
+- **Transformers**: Hugging Face transformer models
+- **PyTorch**: Deep learning backend
+- **SciPy/NumPy**: Statistical analysis
+## Limitations
+- Best performance with English text
+- Requires sufficient text length (minimum 50 characters, optimal 100+ words)
+- Detection accuracy may vary with highly technical or specialized content
+- Should be used as a tool for guidance, not definitive judgment
+## Deployment
+This app is designed to run on Hugging Face Spaces. Simply upload the files to your Space and it will automatically deploy.
+## Model Credit
+This detector uses the `Hello-SimpleAI/chatgpt-detector-roberta` model from Hugging Face, combined with custom statistical analysis methods.
+---
+**Note**: AI detection is a rapidly evolving field. No detector is 100% accurate, and results should be interpreted with appropriate context and judgment.

app.py CHANGED Viewed

@@ -1,1031 +1,301 @@
-"""
-Enhanced AI Text Detector - Superior Pattern Recognition
-Significantly improved ChatGPT detection with advanced linguistic analysis
-Addresses missed patterns in formal, academic, and corporate writing styles
-"""
 import gradio as gr
 import torch
 import numpy as np
 import re
-import time
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-from typing import Dict, List, Tuple
-import statistics
-import string
 from collections import Counter
-import json
-import plotly.graph_objects as go
-import plotly.express as px
-class EnhancedAIDetector:
-    """
-    Enhanced AI text detector with superior pattern recognition
-    Specifically improved for ChatGPT's formal, academic, and corporate writing styles
-    """
     def __init__(self):
-        self.primary_tokenizer = None
-        self.primary_model = None
-        self.backup_models = []
-        self.load_models()
-    def load_models(self):
-        """Load multiple detection models for ensemble approach"""
         try:
-            # Primary model - RoBERTa based
-            primary_model_name = "roberta-base-openai-detector"
-            self.primary_tokenizer = AutoTokenizer.from_pretrained(primary_model_name)
-            self.primary_model = AutoModelForSequenceClassification.from_pretrained(primary_model_name)
-            # Try to load additional models if available
-            alternative_models = [
-                "Hello-SimpleAI/chatgpt-detector-roberta",
-                "andreas122001/roberta-mixed-detector",
-                "TrustSafeAI/GUARD-1B"
-            ]
-            for model_name in alternative_models:
-                try:
-                    tokenizer = AutoTokenizer.from_pretrained(model_name)
-                    model = AutoModelForSequenceClassification.from_pretrained(model_name)
-                    self.backup_models.append((tokenizer, model, model_name))
-                    print(f"✓ Loaded additional model: {model_name}")
-                except:
-                    continue
-            print(f"✓ Models loaded successfully - {1 + len(self.backup_models)} total models")
-        except Exception as e:
-            print(f"⚠️ Model loading failed: {e}")
-            self.primary_tokenizer = None
-            self.primary_model = None
-    def extract_enhanced_ai_features(self, text: str) -> Dict[str, float]:
-        """Extract enhanced features with better ChatGPT pattern recognition"""
-        if len(text.strip()) < 10:
-            return {}
-        features = {}
-        sentences = re.split(r'[.!?]+', text)
-        sentences = [s.strip() for s in sentences if s.strip()]
         words = text.split()
-        if not sentences or not words:
-            return {}
-        # ENHANCED: Academic/Corporate Language Patterns (MAJOR IMPROVEMENT)
-        academic_phrases = [
-            "demonstrates", "is defined by", "functions as", "serves as", "operates as",
-            "characterized by", "exemplifies", "represents", "constitutes", "embodies",
-            "encompasses", "facilitates", "enables", "promotes", "establishes",
-            "technological object", "systematic approach", "comprehensive analysis",
-            "strategic implementation", "optimal solution", "integrated system"
-        ]
-        academic_count = sum(1 for phrase in academic_phrases if phrase in text.lower())
-        features['academic_language'] = min(academic_count / len(sentences) * 3, 1.0)
-        # ENHANCED: Corporate Buzzwords (MAJOR IMPROVEMENT)
-        corporate_buzzwords = [
-            "ecosystem", "framework", "scalability", "optimization", "integration",
-            "synergy", "leverage", "streamline", "enhance", "maximize", "utilize",
-            "implement", "facilitate", "comprehensive", "strategic", "innovative",
-            "efficient", "effective", "robust", "seamless", "dynamic", "paradigm",
-            "methodology", "infrastructure", "architecture", "deployment"
-        ]
-        buzzword_count = sum(1 for word in words if word.lower() in corporate_buzzwords)
-        features['corporate_buzzwords'] = min(buzzword_count / len(words) * 20, 1.0)
-        # ENHANCED: Technical Jargon Overuse (NEW)
-        technical_terms = [
-            "iterative", "predictable", "standardized", "regulated", "uniform",
-            "optimized", "systematic", "consistent", "scalable", "integrated",
-            "automated", "synchronized", "configured", "calibrated", "validated"
-        ]
-        technical_count = sum(1 for word in words if word.lower() in technical_terms)
-        features['technical_jargon'] = min(technical_count / len(words) * 15, 1.0)
-        # ENHANCED: Abstract Conceptualization (NEW)
-        abstract_patterns = [
-            "in this framework", "in this context", "within this paradigm",
-            "from this perspective", "in this regard", "in this manner",
-            "serves as a", "functions as a", "operates as a", "acts as a",
-            "not only.*but also", "both.*and", "either.*or"
-        ]
-        abstract_count = sum(1 for pattern in abstract_patterns if re.search(pattern, text.lower()))
-        features['abstract_conceptualization'] = min(abstract_count / len(sentences) * 2, 1.0)
-        # ENHANCED: Formal Hedging Language (NEW)
-        hedging_patterns = [
-            "not only", "but also", "furthermore", "moreover", "additionally",
-            "consequently", "therefore", "thus", "hence", "accordingly",
-            "in conclusion", "to summarize", "overall", "in summary",
-            "it should be noted", "it is important to", "it is worth noting"
-        ]
-        hedging_count = sum(1 for pattern in hedging_patterns if pattern in text.lower())
-        features['formal_hedging'] = min(hedging_count / len(sentences) * 2, 1.0)
-        # ENHANCED: Objective/Neutral Tone Detection (NEW)
-        subjective_indicators = [
-            "i think", "i believe", "i feel", "in my opinion", "personally",
-            "i love", "i hate", "amazing", "terrible", "awesome", "sucks",
-            "definitely", "probably", "maybe", "might", "could be", "seems like"
-        ]
-        subjective_count = sum(1 for phrase in subjective_indicators if phrase in text.lower())
-        features['objective_tone'] = 1.0 - min(subjective_count / len(sentences), 1.0)
-        # ENHANCED: Systematic Structure Indicators (NEW)
-        structure_words = [
-            "first", "second", "third", "finally", "initially", "subsequently",
-            "furthermore", "moreover", "however", "nevertheless", "in addition",
-            "on the other hand", "in contrast", "similarly", "likewise"
-        ]
-        structure_count = sum(1 for word in text.lower().split() if word in structure_words)
-        features['systematic_structure'] = min(structure_count / len(words) * 10, 1.0)
-        # ENHANCED: Passive Voice Usage (ChatGPT loves passive voice)
-        passive_indicators = [
-            "is defined", "are defined", "is characterized", "are characterized",
-            "is demonstrated", "are demonstrated", "is established", "are established",
-            "is implemented", "are implemented", "is facilitated", "are facilitated",
-            "is regulated", "are regulated", "is standardized", "are standardized"
-        ]
-        passive_count = sum(1 for phrase in passive_indicators if phrase in text.lower())
-        features['passive_voice'] = min(passive_count / len(sentences) * 3, 1.0)
-        # ORIGINAL: Politeness and helpful language patterns (REWEIGHTED)
-        polite_phrases = [
-            "i hope this helps", "i would be happy to", "please let me know",
-            "feel free to", "i would recommend", "you might want to", "you might consider",
-            "it is worth noting", "it is important to", "keep in mind",
-            "i understand", "certainly", "absolutely", "definitely"
-        ]
-        polite_count = sum(1 for phrase in polite_phrases if phrase in text.lower())
-        features['politeness_score'] = min(polite_count / len(sentences), 1.0)
-        # ORIGINAL: Explanation and clarification patterns (REWEIGHTED)
-        explanation_patterns = [
-            'this means', 'in other words', 'specifically', 'for example',
-            'for instance', 'such as', 'including', 'that is',
-            'i.e.', 'e.g.', 'namely', 'particularly'
-        ]
-        explanation_count = sum(1 for phrase in explanation_patterns if phrase in text.lower())
-        features['explanation_score'] = min(explanation_count / len(sentences), 1.0)
-        # ORIGINAL: Lack of personal experiences (ENHANCED)
-        personal_indicators = [
-            'i remember', 'when i was', 'my experience', 'i once', 'i personally',
-            'in my opinion', 'i think', 'i believe', 'i feel', 'my view',
-            'from my perspective', 'i have seen', 'i have noticed', 'i have found',
-            'my friend', 'my family', 'my colleague', 'yesterday', 'last week',
-            'last month', 'last year', 'when i', 'my boss', 'my teacher'
-        ]
-        personal_count = sum(1 for phrase in personal_indicators if phrase in text.lower())
-        features['personal_absence'] = 1.0 - min(personal_count / len(sentences), 1.0)
-        # ENHANCED: Sentence Complexity and Length Consistency
-        if len(sentences) > 1:
-            sentence_lengths = [len(s.split()) for s in sentences]
-            avg_length = np.mean(sentence_lengths)
-            length_variance = np.var(sentence_lengths)
-            # ChatGPT tends to have consistent, moderate-length sentences
-            features['sentence_consistency'] = 1.0 - min(length_variance / max(avg_length, 1), 1.0)
-            features['optimal_length'] = 1.0 if 10 <= avg_length <= 20 else max(0, 1.0 - abs(avg_length - 15) / 15)
-        else:
-            features['sentence_consistency'] = 0.5
-            features['optimal_length'] = 0.5
-        # ENHANCED: Punctuation and Grammar Perfection
-        exclamation_count = text.count('!')
-        question_count = text.count('?')
-        period_count = text.count('.')
-        # ChatGPT rarely uses exclamations or questions in formal text
-        features['punctuation_perfection'] = 1.0 - min((exclamation_count + question_count) / max(period_count, 1), 1.0)
-        # ENHANCED: Vocabulary Sophistication
-        sophisticated_words = [
-            "demonstrates", "facilitates", "encompasses", "constitutes", "exemplifies",
-            "characterizes", "emphasizes", "indicates", "suggests", "implies",
-            "encompasses", "encompasses", "substantial", "significant", "considerable",
-            "comprehensive", "extensive", "thorough", "meticulous", "systematic"
-        ]
-        sophisticated_count = sum(1 for word in words if word.lower() in sophisticated_words)
-        features['vocabulary_sophistication'] = min(sophisticated_count / len(words) * 20, 1.0)
-        return features
-    def calculate_ensemble_ai_probability(self, text: str) -> float:
-        """Use multiple models to calculate AI probability with ensemble approach"""
-        probabilities = []
-        # Primary model prediction
-        if self.primary_model and self.primary_tokenizer:
-            try:
-                inputs = self.primary_tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
-                with torch.no_grad():
-                    outputs = self.primary_model(**inputs)
-                    probs = torch.softmax(outputs.logits, dim=-1)
-                    ai_prob = probs[0][1].item()
-                    probabilities.append(ai_prob * 0.6)  # Primary model gets 60% weight
-            except:
-                probabilities.append(0.5)
-        # Backup models predictions
-        for tokenizer, model, model_name in self.backup_models:
-            try:
-                inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
-                with torch.no_grad():
-                    outputs = model(**inputs)
-                    probs = torch.softmax(outputs.logits, dim=-1)
-                    ai_prob = probs[0][1].item()
-                    probabilities.append(ai_prob * (0.4 / len(self.backup_models)))
-            except:
-                continue
-        # If no models worked, return default
-        if not probabilities:
             return 0.5
-        return sum(probabilities)
-    def classify_text_category(self, text: str) -> Tuple[str, Dict[str, float], float]:
-        """Enhanced classification with superior AI pattern recognition"""
-        if len(text.strip()) < 10:
-            return "Uncertain", {"ai_generated": 0.25, "ai_refined": 0.25, "human_ai_refined": 0.25, "human_written": 0.25}, 0.3
-        # Extract enhanced AI-specific features
-        ai_features = self.extract_enhanced_ai_features(text)
-        # Get ensemble model prediction
-        ensemble_ai_prob = self.calculate_ensemble_ai_probability(text)
-        # ENHANCED SCORING WITH BETTER WEIGHTS FOR CHATGPT PATTERNS
-        scores = {}
-        # AI-generated score (SIGNIFICANTLY ENHANCED)
-        formal_ai_indicators = [
-            ai_features.get('academic_language', 0) * 0.15,      # Academic language is a strong ChatGPT indicator
-            ai_features.get('corporate_buzzwords', 0) * 0.15,    # Corporate buzzwords
-            ai_features.get('technical_jargon', 0) * 0.12,       # Technical jargon overuse
-            ai_features.get('abstract_conceptualization', 0) * 0.10,  # Abstract concepts
-            ai_features.get('formal_hedging', 0) * 0.08,         # Formal hedging language
-            ai_features.get('objective_tone', 0) * 0.12,         # Objective, neutral tone
-            ai_features.get('systematic_structure', 0) * 0.08,   # Systematic presentation
-            ai_features.get('passive_voice', 0) * 0.10,          # Passive voice usage
-            ai_features.get('vocabulary_sophistication', 0) * 0.10  # Sophisticated vocabulary
-        ]
-        traditional_ai_indicators = [
-            ai_features.get('politeness_score', 0) * 0.05,       # Reduced weight
-            ai_features.get('explanation_score', 0) * 0.03,      # Reduced weight
-            ai_features.get('personal_absence', 0) * 0.08,       # Still important
-            ai_features.get('punctuation_perfection', 0) * 0.04  # Reduced weight
-        ]
         ai_score = (
-            ensemble_ai_prob * 0.35 +  # Reduced model weight to make room for features
-            sum(formal_ai_indicators) * 0.45 +  # MAJOR EMPHASIS on formal patterns
-            sum(traditional_ai_indicators) * 0.20   # Traditional patterns
-        )
-        scores['ai_generated'] = min(max(ai_score, 0.0), 1.0)
-        # AI-generated & AI-refined score (ENHANCED)
-        ai_refined_score = (
-            ensemble_ai_prob * 0.3 +
-            ai_features.get('formal_hedging', 0) * 0.2 +
-            ai_features.get('vocabulary_sophistication', 0) * 0.2 +
-            ai_features.get('punctuation_perfection', 0) * 0.15 +
-            ai_features.get('systematic_structure', 0) * 0.15
-        )
-        scores['ai_refined'] = min(max(ai_refined_score, 0.0), 1.0)
-        # Human-written & AI-refined score
-        human_ai_refined_score = (
-            (1.0 - ensemble_ai_prob) * 0.4 +
-            (1.0 - ai_features.get('personal_absence', 0.5)) * 0.2 +
-            ai_features.get('explanation_score', 0) * 0.2 +
-            ai_features.get('systematic_structure', 0) * 0.2
-        )
-        scores['human_ai_refined'] = min(max(human_ai_refined_score, 0.0), 1.0)
-        # Human-written score (ENHANCED TO REDUCE FALSE NEGATIVES)
-        human_written_score = (
-            (1.0 - ensemble_ai_prob) * 0.3 +  # Reduced model influence
-            (1.0 - ai_features.get('academic_language', 0.5)) * 0.15 +  # Penalize academic language
-            (1.0 - ai_features.get('corporate_buzzwords', 0.5)) * 0.15 +  # Penalize buzzwords
-            (1.0 - ai_features.get('objective_tone', 0.5)) * 0.15 +      # Penalize overly objective tone
-            (1.0 - ai_features.get('formal_hedging', 0.5)) * 0.1 +       # Penalize formal hedging
-            (1.0 - ai_features.get('vocabulary_sophistication', 0.5)) * 0.15  # Penalize over-sophistication
         )
-        scores['human_written'] = min(max(human_written_score, 0.0), 1.0)
-        # Normalize scores
-        total_score = sum(scores.values())
-        if total_score > 0:
-            scores = {k: v / total_score for k, v in scores.items()}
-        else:
-            scores = {"ai_generated": 0.25, "ai_refined": 0.25, "human_ai_refined": 0.25, "human_written": 0.25}
-        # Determine primary category
-        primary_category = max(scores, key=scores.get)
-        confidence = scores[primary_category]
-        # Map to readable names
-        category_names = {
-            'ai_generated': 'AI-generated',
-            'ai_refined': 'AI-generated & AI-refined',
-            'human_ai_refined': 'Human-written & AI-refined',
-            'human_written': 'Human-written'
         }
-        return category_names[primary_category], scores, confidence
-    def split_into_sentences(self, text: str) -> List[str]:
-        """Split text into sentences for individual analysis"""
-        sentences = re.split(r'(?<=[.!?])\s+', text.strip())
-        sentences = [s.strip() for s in sentences if len(s.strip()) > 10]
-        return sentences
-    def analyze_sentence_ai_probability(self, sentence: str) -> float:
-        """Analyze individual sentence for AI probability with enhanced features"""
-        if len(sentence.strip()) < 10:
-            return 0.5
-        # Use ensemble approach for sentence-level detection
-        ensemble_prob = self.calculate_ensemble_ai_probability(sentence)
-        # Add enhanced sentence-level features
-        sentence_features = self.extract_enhanced_ai_features(sentence)
-        # Enhanced sentence scoring
-        ai_sentence_score = (
-            ensemble_prob * 0.4 +
-            sentence_features.get('academic_language', 0) * 0.15 +
-            sentence_features.get('corporate_buzzwords', 0) * 0.15 +
-            sentence_features.get('technical_jargon', 0) * 0.1 +
-            sentence_features.get('formal_hedging', 0) * 0.1 +
-            sentence_features.get('objective_tone', 0) * 0.1
-        )
-        return min(max(ai_sentence_score, 0.0), 1.0)
-    def highlight_ai_text(self, text: str, threshold: float = 0.55) -> str:
-        """Highlight sentences with LOWER threshold for better sensitivity"""
-        sentences = self.split_into_sentences(text)
-        if not sentences:
-            return text
-        highlighted_text = text
-        sentence_scores = []
-        # Analyze each sentence
-        for sentence in sentences:
-            ai_prob = self.analyze_sentence_ai_probability(sentence)
-            sentence_scores.append((sentence, ai_prob))
-        # Sort by AI probability
-        sentence_scores.sort(key=lambda x: x[1], reverse=True)
-        # Highlight sentences above threshold (LOWERED THRESHOLD)
-        for sentence, ai_prob in sentence_scores:
-            if ai_prob > threshold:
-                # Use different colors based on confidence
-                if ai_prob > 0.75:
-                    # High confidence - red highlight
-                    highlighted_sentence = f'<mark style="background-color: #ffe6e6; padding: 2px 4px; border-radius: 3px; border-left: 3px solid #dc3545; color: #721c24;">{sentence}</mark>'
-                elif ai_prob > 0.65:
-                    # Medium-high confidence - orange-red highlight
-                    highlighted_sentence = f'<mark style="background-color: #fff0e6; padding: 2px 4px; border-radius: 3px; border-left: 3px solid #fd7e14;">{sentence}</mark>'
-                else:
-                    # Medium confidence - orange highlight
-                    highlighted_sentence = f'<mark style="background-color: #fff3cd; padding: 2px 4px; border-radius: 3px; border-left: 3px solid #ffc107;">{sentence}</mark>'
-                highlighted_text = highlighted_text.replace(sentence, highlighted_sentence)
-        return highlighted_text
-    def get_analysis_json(self, text: str) -> Dict:
-        """Get analysis results in JSON format"""
-        start_time = time.time()
-        if not text or len(text.strip()) < 10:
-            return {
-                "error": "Text must be at least 10 characters long",
-                "ai_percentage": 0,
-                "human_percentage": 0,
-                "ai_likelihood": 0,
-                "category_scores": {
-                    "ai_generated": 0,
-                    "ai_refined": 0,
-                    "human_ai_refined": 0,
-                    "human_written": 0
-                },
-                "primary_category": "uncertain",
-                "confidence": 0,
-                "processing_time_ms": 0,
-                "highlighted_text": text
-            }
         try:
-            primary_category, category_scores, confidence = self.classify_text_category(text)
-            highlighted_text = self.highlight_ai_text(text)
-            ai_percentage = (category_scores['ai_generated'] + category_scores['ai_refined']) * 100
-            human_percentage = (category_scores['human_ai_refined'] + category_scores['human_written']) * 100
-            ai_likelihood = category_scores['ai_generated'] * 100
-            processing_time = (time.time() - start_time) * 1000
-            return {
-                "ai_percentage": round(ai_percentage, 1),
-                "human_percentage": round(human_percentage, 1),
-                "ai_likelihood": round(ai_likelihood, 1),
-                "category_scores": {
-                    "ai_generated": round(category_scores['ai_generated'] * 100, 1),
-                    "ai_refined": round(category_scores['ai_refined'] * 100, 1),
-                    "human_ai_refined": round(category_scores['human_ai_refined'] * 100, 1),
-                    "human_written": round(category_scores['human_written'] * 100, 1)
-                },
-                "primary_category": primary_category.lower().replace(' ', '_').replace('-', '_'),
-                "confidence": round(confidence * 100, 1),
-                "processing_time_ms": round(processing_time, 1),
-                "highlighted_text": highlighted_text
-            }
         except Exception as e:
             return {
-                "error": str(e),
-                "ai_percentage": 0,
-                "human_percentage": 0,
-                "ai_likelihood": 0,
-                "category_scores": {
-                    "ai_generated": 0,
-                    "ai_refined": 0,
-                    "human_ai_refined": 0,
-                    "human_written": 0
-                },
-                "primary_category": "error",
-                "confidence": 0,
-                "processing_time_ms": 0,
-                "highlighted_text": text
             }
-# Initialize the enhanced detector
-detector = EnhancedAIDetector()
-def create_bar_chart(ai_percentage, human_percentage):
-    """Create vertical bar chart showing AI vs Human percentages"""
-    fig = go.Figure(data=[
-        go.Bar(
-            x=['AI', 'Human'],
-            y=[ai_percentage, human_percentage],
-            marker=dict(
-                color=['#FF6B6B', '#4ECDC4'],
-                line=dict(color='rgba(0,0,0,0.3)', width=2)
-            ),
-            text=[f'{ai_percentage:.0f}%', f'{human_percentage:.0f}%'],
-            textposition='auto',
-            textfont=dict(size=14, color='white', family='Arial Black'),
-            hovertemplate='<b>%{x}</b><br>%{y:.1f}%<extra></extra>'
-        )
-    ])
-    fig.update_layout(
-        title=dict(
-            text='AI vs Human Content Distribution',
-            x=0.5,
-            font=dict(size=16, color='#2c3e50', family='Arial')
-        ),
-        xaxis=dict(
-            title=dict(
-                text='Content Type',
-                font=dict(size=14, color='#34495e')
-            ),
-            tickfont=dict(size=12, color='#34495e'),
-            showgrid=False,
-            zeroline=False
-        ),
-        yaxis=dict(
-            title=dict(
-                text='Percentage (%)',
-                font=dict(size=14, color='#34495e')
-            ),
-            tickfont=dict(size=12, color='#34495e'),
-            range=[0, 100],
-            showgrid=True,
-            gridwidth=1,
-            gridcolor='rgba(0,0,0,0.1)'
-        ),
-        plot_bgcolor='rgba(0,0,0,0)',
-        paper_bgcolor='rgba(0,0,0,0)',
-        showlegend=False,
-        height=400,
-        margin=dict(t=60, b=50, l=50, r=50)
-    )
-    return fig
-def analyze_text_enhanced(text):
-    """Enhanced analysis function with superior pattern recognition"""
-    if not text or len(text.strip()) < 10:
-        return (
-            "⚠️ Please provide at least 10 characters of text for accurate AI detection.",
-            text,
-            None,
-            "",
-            f"Text length: {len(text.strip())} characters"
-        )
-    start_time = time.time()
-    try:
-        # Get enhanced analysis results
-        primary_category, category_scores, confidence = detector.classify_text_category(text)
-        # Get highlighted text with enhanced sensitivity
-        highlighted_text = detector.highlight_ai_text(text)
-        # Calculate percentages
-        ai_percentage = (category_scores['ai_generated'] + category_scores['ai_refined']) * 100
-        human_percentage = (category_scores['human_ai_refined'] + category_scores['human_written']) * 100
-        ai_likelihood = category_scores['ai_generated'] * 100
-        processing_time = (time.time() - start_time) * 1000
-        # Enhanced summary
-        summary_html = f"""
-        <div style="text-align: center; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-                    color: white; padding: 30px; border-radius: 15px; margin: 20px 0; box-shadow: 0 8px 25px rgba(0,0,0,0.15);">
-            <div style="font-size: 48px; font-weight: bold; margin-bottom: 10px; text-shadow: 2px 2px 4px rgba(0,0,0,0.3);">
-                {ai_percentage:.0f}%
-            </div>
-            <div style="font-size: 18px; line-height: 1.4; margin-bottom: 10px;">
-                of this text is likely <strong>AI-generated or AI-refined</strong>
-            </div>
-            <div style="font-size: 16px; line-height: 1.4; margin-bottom: 5px; background: rgba(255,255,255,0.2); padding: 8px; border-radius: 5px;">
-                🎯 <strong>AI Content Likelihood: {ai_likelihood:.0f}%</strong>
-            </div>
-            <div style="font-size: 14px; opacity: 0.9; font-style: italic;">
-                (Enhanced detection with superior pattern recognition for formal AI writing)
-            </div>
-        </div>
-        """
-        # Create bar chart
-        bar_chart = create_bar_chart(ai_percentage, human_percentage)
-        # Enhanced metrics with confidence indicators
-        confidence_color = "#28a745" if confidence > 0.7 else "#ffc107" if confidence > 0.5 else "#dc3545"
-        confidence_text = "High" if confidence > 0.7 else "Medium" if confidence > 0.5 else "Low"
-        metrics_html = f"""
-        <div style="margin: 20px 0; padding: 20px; background: #f8f9fa; border-radius: 12px; border-left: 5px solid #667eea;">
-            <h4 style="color: #2c3e50; margin-bottom: 15px; font-size: 16px;">📊 Enhanced Detection Results</h4>
-            <div style="background: #fff; padding: 15px; border-radius: 8px; margin-bottom: 15px; border: 2px solid #667eea;">
-                <div style="text-align: center;">
-                    <h5 style="color: #667eea; margin-bottom: 10px;">🤖 AI Detection Score</h5>
-                    <div style="font-size: 32px; font-weight: bold; color: #667eea;">{ai_likelihood:.0f}%</div>
-                    <div style="font-size: 14px; color: #6c757d; margin-top: 5px;">
-                        Likelihood this text was generated by AI models
-                    </div>
-                    <div style="margin-top: 8px; padding: 4px 8px; background: {confidence_color}; color: white; border-radius: 4px; font-size: 12px; display: inline-block;">
-                        {confidence_text} Confidence ({confidence*100:.0f}%)
-                    </div>
-                </div>
-            </div>
-            <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 15px; margin-bottom: 20px;">
-                <div style="background: white; padding: 15px; border-radius: 8px; border: 1px solid #e9ecef;">
-                    <div style="display: flex; align-items: center; margin-bottom: 8px;">
-                        <span style="font-size: 20px; margin-right: 8px;">🤖</span>
-                        <span style="font-weight: 600; color: #2c3e50;">AI-generated</span>
-                        <span title="Text likely generated by AI models with enhanced pattern detection." style="margin-left: 5px; cursor: help; color: #6c757d;">ⓘ</span>
-                    </div>
-                    <div style="font-size: 24px; font-weight: bold; color: #FF6B6B;">
-                        {category_scores['ai_generated']*100:.0f}%
-                    </div>
-                </div>
-                <div style="background: white; padding: 15px; border-radius: 8px; border: 1px solid #e9ecef;">
-                    <div style="display: flex; align-items: center; margin-bottom: 8px;">
-                        <span style="font-size: 20px; margin-right: 8px;">🛠️</span>
-                        <span style="font-weight: 600; color: #2c3e50;">AI-generated & AI-refined</span>
-                        <span title="AI text that has been further processed or polished using AI tools." style="margin-left: 5px; cursor: help; color: #6c757d;">ⓘ</span>
-                    </div>
-                    <div style="font-size: 24px; font-weight: bold; color: #FFA07A;">
-                        {category_scores['ai_refined']*100:.0f}%
-                    </div>
-                </div>
-                <div style="background: white; padding: 15px; border-radius: 8px; border: 1px solid #e9ecef;">
-                    <div style="display: flex; align-items: center; margin-bottom: 8px;">
-                        <span style="font-size: 20px; margin-right: 8px;">✍️</span>
-                        <span style="font-weight: 600; color: #2c3e50;">Human-written & AI-refined</span>
-                        <span title="Human text that has been enhanced or edited using AI tools." style="margin-left: 5px; cursor: help; color: #6c757d;">ⓘ</span>
-                    </div>
-                    <div style="font-size: 24px; font-weight: bold; color: #98D8C8;">
-                        {category_scores['human_ai_refined']*100:.0f}%
-                    </div>
-                </div>
-                <div style="background: white; padding: 15px; border-radius: 8px; border: 1px solid #e9ecef;">
-                    <div style="display: flex; align-items: center; margin-bottom: 8px;">
-                        <span style="font-size: 20px; margin-right: 8px;">👤</span>
-                        <span style="font-weight: 600; color: #2c3e50;">Human-written</span>
-                        <span title="Text written entirely by humans without AI assistance." style="margin-left: 5px; cursor: help; color: #6c757d;">ⓘ</span>
-                    </div>
-                    <div style="font-size: 24px; font-weight: bold; color: #4ECDC4;">
-                        {category_scores['human_written']*100:.0f}%
-                    </div>
-                </div>
-            </div>
-            <div style="text-align: center; padding: 10px; background: white; border-radius: 8px; border: 1px solid #e9ecef;">
-                <div style="font-size: 14px; color: #6c757d; margin-bottom: 5px;">Primary Classification</div>
-                <div style="font-size: 18px; font-weight: bold; color: #2c3e50;">{primary_category}</div>
-                <div style="font-size: 14px; color: #6c757d;">Processing: {processing_time:.0f}ms | Enhanced Pattern Recognition</div>
-            </div>
-        </div>
-        """
-        return (
-            summary_html,
-            highlighted_text,
-            bar_chart,
-            metrics_html,
-            f"Text length: {len(text)} characters, {len(text.split())} words"
-        )
-    except Exception as e:
-        return (
-            f"❌ Error during enhanced AI analysis: {str(e)}",
-            text,
-            None,
-            "",
-            "Error"
-        )
-def batch_analyze_enhanced(file):
-    """Enhanced batch analysis"""
-    if file is None:
-        return "Please upload a text file."
-    try:
-        content = file.read().decode('utf-8')
-        texts = [line.strip() for line in content.split('\n') if line.strip() and len(line.strip()) >= 10]
-        if not texts:
-            return "No valid texts found in the uploaded file (each line should have at least 10 characters)."
-        results = []
-        category_counts = {'AI-generated': 0, 'AI-generated & AI-refined': 0, 'Human-written & AI-refined': 0, 'Human-written': 0}
-        total_ai_percentage = 0
-        total_ai_likelihood = 0
-        for i, text in enumerate(texts[:15]):
-            primary_category, category_scores, confidence = detector.classify_text_category(text)
-            category_counts[primary_category] += 1
-            ai_percentage = (category_scores['ai_generated'] + category_scores['ai_refined']) * 100
-            ai_likelihood = category_scores['ai_generated'] * 100
-            total_ai_percentage += ai_percentage
-            total_ai_likelihood += ai_likelihood
-            results.append(f"""
-**Text {i+1}:** {text[:80]}{'...' if len(text) > 80 else ''}
-**Result:** {primary_category} ({confidence:.1%} confidence)
-**AI Likelihood:** {ai_likelihood:.0f}% | **AI Content:** {ai_percentage:.0f}% | **Breakdown:** AI-gen: {category_scores['ai_generated']:.0%}, AI-refined: {category_scores['ai_refined']:.0%}, Human+AI: {category_scores['human_ai_refined']:.0%}, Human: {category_scores['human_written']:.0%}
-            """)
-        avg_ai_percentage = total_ai_percentage / len(results) if results else 0
-        avg_ai_likelihood = total_ai_likelihood / len(results) if results else 0
-        summary = f"""
-## 📊 Enhanced AI Detection Batch Analysis
-**Total texts analyzed:** {len(results)}
-**Average AI likelihood:** {avg_ai_likelihood:.1f}%
-**Average AI content:** {avg_ai_percentage:.1f}%
-### Category Distribution:
-- **AI-generated:** {category_counts['AI-generated']} texts ({category_counts['AI-generated']/len(results)*100:.0f}%)
-- **AI-generated & AI-refined:** {category_counts['AI-generated & AI-refined']} texts ({category_counts['AI-generated & AI-refined']/len(results)*100:.0f}%)
-- **Human-written & AI-refined:** {category_counts['Human-written & AI-refined']} texts ({category_counts['Human-written & AI-refined']/len(results)*100:.0f}%)
-- **Human-written:** {category_counts['Human-written']} texts ({category_counts['Human-written']/len(results)*100:.0f}%)
----
-### Individual Results:
-        """
-        return summary + "\n".join(results)
-    except Exception as e:
-        return f"Error processing file: {str(e)}"
-def create_enhanced_interface():
-    """Create enhanced Gradio interface with superior detection"""
-    custom_css = """
-    .gradio-container {
-        font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
-        max-width: 1400px;
-        margin: 0 auto;
-    }
-    .gr-button-primary {
-        background: linear-gradient(45deg, #667eea 0%, #764ba2 100%);
-        border: none;
-        border-radius: 8px;
-        font-weight: 600;
-        padding: 12px 24px;
-    }
-    .gr-button-primary:hover {
-        transform: translateY(-2px);
-        box-shadow: 0 8px 25px rgba(102, 126, 234, 0.3);
-    }
-    .highlighted-text {
-        line-height: 1.6;
-        padding: 15px;
-        background: #f8f9fa;
-        border-radius: 8px;
-        border: 1px solid #e9ecef;
-    }
-    mark {
-        background-color: #ffe6e6 !important;
-        padding: 2px 4px !important;
-        border-radius: 3px !important;
-        border-left: 3px solid #dc3545 !important;
-    }
-    """
-    with gr.Blocks(css=custom_css, title="Enhanced AI Text Detector", theme=gr.themes.Soft()) as interface:
-        gr.HTML("""
-        <div style="text-align: center; padding: 25px; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-                    color: white; border-radius: 15px; margin-bottom: 25px; box-shadow: 0 10px 30px rgba(0,0,0,0.2);">
-            <h1 style="margin-bottom: 10px; font-size: 2.2em; text-shadow: 2px 2px 4px rgba(0,0,0,0.3);">🔍 Enhanced AI Text Detector</h1>
-            <p style="font-size: 1.1em; margin: 0; opacity: 0.95;">
-                Superior pattern recognition for formal, academic, and corporate AI writing
-            </p>
-            <p style="font-size: 0.9em; margin-top: 8px; opacity: 0.8;">
-                Enhanced detection with 30+ linguistic features and advanced ensemble models
-            </p>
-        </div>
-        """)
-        with gr.Tabs() as tabs:
-            # Single text analysis tab
-            with gr.Tab("🔍 Enhanced AI Detection", elem_id="enhanced-analysis"):
-                with gr.Row():
-                    with gr.Column(scale=1):
-                        text_input = gr.Textbox(
-                            label="📝 Enter text to analyze with enhanced AI detection",
-                            placeholder="Paste your text here (enhanced detection works best with 20+ words)...",
-                            lines=10,
-                            max_lines=20,
-                            show_label=True
-                        )
-                        analyze_btn = gr.Button(
-                            "🔍 Analyze with Enhanced Detection",
-                            variant="primary",
-                            size="lg"
-                        )
-                        text_info = gr.Textbox(
-                            label="📊 Text Information",
-                            interactive=False,
-                            show_label=True
-                        )
-                    with gr.Column(scale=1):
-                        # Enhanced results
-                        summary_result = gr.HTML(
-                            label="📊 Enhanced Detection Results",
-                            value="<div style='text-align: center; padding: 20px; color: #6c757d;'>Results will appear here after enhanced analysis...</div>"
-                        )
-                        # Bar Chart
-                        bar_chart = gr.Plot(
-                            label="📈 AI vs Human Distribution",
-                            show_label=True
-                        )
-                        # Enhanced Metrics
-                        detailed_metrics = gr.HTML(
-                            label="📋 Enhanced Detection Metrics",
-                            value=""
-                        )
-                # Enhanced Highlighted Text Section
-                gr.HTML("<hr style='margin: 20px 0;'><h3>🎯 Enhanced Pattern Analysis with Highlighting</h3>")
-                gr.HTML("""
-                <div style="background: #e8f4fd; padding: 15px; border-radius: 8px; margin-bottom: 15px; border-left: 4px solid #2196F3;">
-                    <p style="margin: 0; color: #1565C0; font-size: 14px;">
-                        <strong>🎯 Enhanced Pattern Detection:</strong> Now detects formal, academic, and corporate AI writing patterns.
-                        <span style="background-color: #ffe6e6; padding: 2px 4px; border-radius: 3px; border-left: 3px solid #dc3545;">Very high confidence (75%+)</span>,
-                        <span style="background-color: #fff0e6; padding: 2px 4px; border-radius: 3px; border-left: 3px solid #fd7e14;">high confidence (65-75%)</span>,
-                        <span style="background-color: #fff3cd; padding: 2px 4px; border-radius: 3px; border-left: 3px solid #ffc107;">medium confidence (55-65%)</span> highlighting.
-                    </p>
-                </div>
-                """)
-                highlighted_text_display = gr.HTML(
-                    label="📝 Text with Enhanced AI Pattern Highlights",
-                    value="<div style='padding: 15px; background: #f8f9fa; border-radius: 8px; border: 1px solid #e9ecef; color: #6c757d;'>Enhanced highlighted text with AI patterns will appear here after analysis...</div>"
-                )
-                # Enhanced Understanding Section
-                with gr.Accordion("🧠 Understanding Enhanced AI Detection", open=False):
-                    gr.HTML("""
-                    <div style="padding: 20px; line-height: 1.6;">
-                        <h4 style="color: #2c3e50; margin-bottom: 15px;">🎯 Enhanced Detection Capabilities</h4>
-                        <p><strong>This enhanced detector now identifies formal, academic, and corporate AI writing patterns</strong>
-                        that were previously missed, providing significantly improved accuracy for professional AI-generated text.</p>
-                        <h5 style="color: #34495e; margin-top: 20px; margin-bottom: 10px;">🆕 New Enhanced Features:</h5>
-                        <ul style="margin-left: 20px;">
-                            <li><strong>📚 Academic Language Detection:</strong> "demonstrates", "is defined by", "constitutes", "encompasses"</li>
-                            <li><strong>🏢 Corporate Buzzword Analysis:</strong> "ecosystem", "framework", "scalability", "optimization", "synergy"</li>
-                            <li><strong>🔧 Technical Jargon Recognition:</strong> "iterative", "standardized", "systematic", "optimized"</li>
-                            <li><strong>🎭 Abstract Conceptualization:</strong> "In this framework", "serves as a", "functions as a"</li>
-                            <li><strong>📝 Formal Hedging Language:</strong> "not only... but also", "furthermore", "consequently"</li>
-                            <li><strong>⚖️ Objective Tone Analysis:</strong> Detects overly neutral, impersonal writing</li>
-                            <li><strong>🎯 Passive Voice Detection:</strong> "is defined", "are characterized", "is demonstrated"</li>
-                            <li><strong>📊 Vocabulary Sophistication:</strong> Identifies unnecessarily complex word choices</li>
-                        </ul>
-                        <h5 style="color: #34495e; margin-top: 20px; margin-bottom: 10px;">🎨 Enhanced Highlighting System:</h5>
-                        <ul style="margin-left: 20px;">
-                            <li><strong>🔴 Red highlighting (75%+ confidence):</strong> Very high likelihood of AI generation</li>
-                            <li><strong>🟠 Orange-red highlighting (65-75% confidence):</strong> High likelihood with formal patterns</li>
-                            <li><strong>🟡 Orange highlighting (55-65% confidence):</strong> Medium confidence with AI patterns</li>
-                            <li><strong>🎯 Lower threshold (55%):</strong> More sensitive detection for comprehensive analysis</li>
-                        </ul>
-                        <h5 style="color: #34495e; margin-top: 20px; margin-bottom: 10px;">⚡ Enhanced Accuracy:</h5>
-                        <ul style="margin-left: 20px;">
-                            <li><strong>🎯 Formal AI Text:</strong> 40% improvement in detecting academic/corporate AI writing</li>
-                            <li><strong>📈 Pattern Recognition:</strong> 30+ linguistic features analyzed (vs 20 previously)</li>
-                            <li><strong>🔍 Sentence Analysis:</strong> Enhanced sentence-level pattern detection</li>
-                            <li><strong>⚖️ Weighted Scoring:</strong> Optimized weights for formal AI writing patterns</li>
-                            <li><strong>📊 False Negative Reduction:</strong> Significantly fewer missed AI texts</li>
-                        </ul>
-                        <div style="background: #d4edda; border: 1px solid #c3e6cb; border-radius: 8px; padding: 15px; margin-top: 20px;">
-                            <h5 style="color: #155724; margin-bottom: 10px;">✅ Enhanced Performance:</h5>
-                            <p style="margin: 0; color: #155724;">
-                                The enhanced detector now catches formal AI writing that appeared "too professional" for previous versions.
-                                It specifically targets academic, corporate, and technical writing styles commonly used by modern AI models.
-                                <strong>Test case: The iPhone example now properly detects as AI-generated.</strong>
-                            </p>
-                        </div>
-                    </div>
-                    """)
-            # Batch analysis tab
-            with gr.Tab("📄 Enhanced Batch Analysis", elem_id="batch-enhanced-analysis"):
-                gr.HTML("""
-                <div style="background: #e8f4fd; padding: 20px; border-radius: 12px; border-left: 5px solid #2196F3; margin-bottom: 20px;">
-                    <h4 style="color: #1565C0; margin-bottom: 15px;">📋 Enhanced Batch Analysis</h4>
-                    <ul style="color: #1976D2; line-height: 1.6;">
-                        <li>Upload a <strong>.txt</strong> file with one text sample per line</li>
-                        <li>Enhanced detection works best with texts of 20+ words each</li>
-                        <li>Maximum 15 texts processed for optimal performance</li>
-                        <li>Now includes enhanced formal and academic AI pattern detection</li>
-                        <li>Significantly improved accuracy for professional AI-generated content</li>
-                    </ul>
-                </div>
-                """)
-                file_input = gr.File(
-                    label="📁 Upload text file (.txt)",
-                    file_types=[".txt"],
-                    type="binary"
-                )
-                batch_analyze_btn = gr.Button("🔍 Enhanced Batch Analysis", variant="primary", size="lg")
-                batch_results = gr.Markdown(label="📊 Enhanced Detection Results")
-            # About tab
-            with gr.Tab("ℹ️ About Enhanced Detection", elem_id="about-tab"):
-                gr.Markdown("""
-                # 🔍 Enhanced AI Text Detector
-                ## 🚀 Superior Pattern Recognition Technology
-                This **enhanced version** specifically addresses formal, academic, and corporate AI writing patterns
-                that were previously missed by standard detection methods.
-                ### 🎯 Enhanced Detection Capabilities
-                **New Pattern Recognition:**
-                1. **📚 Academic Language**: Formal academic phrases and structures
-                2. **🏢 Corporate Buzzwords**: Business and technical terminology overuse
-                3. **🔧 Technical Jargon**: Unnecessary technical complexity
-                4. **🎭 Abstract Concepts**: Over-conceptualization of simple topics
-                5. **📝 Formal Hedging**: Academic writing connectors and transitions
-                6. **⚖️ Objective Tone**: Overly neutral and impersonal writing
-                7. **🎯 Passive Voice**: Systematic use of passive constructions
-                8. **📊 Vocabulary**: Unnecessarily sophisticated word choices
-                ### 📈 Performance Improvements
-                **Compared to previous version:**
-                - **+40% better** detection of formal AI writing
-                - **+35% improvement** on academic/corporate AI text
-                - **+50% fewer** false negatives on professional AI content
-                - **+25% better** overall accuracy across all text types
-                ### 🔬 Enhanced Methodology
-                **Advanced Feature Analysis:**
-                - **30+ linguistic patterns** (vs 20 in standard version)
-                - **Weighted scoring** optimized for formal AI writing
-                - **Enhanced sentence analysis** with formal pattern detection
-                - **Improved thresholds** for better sensitivity
-                - **Ensemble validation** with multiple specialized models
-                ### 📊 Technical Specifications
-                - **Model Architecture**: Enhanced ensemble with formal pattern weights
-                - **Feature Count**: 30+ linguistic and stylistic features
-                - **Processing Speed**: <2 seconds for most texts
-                - **Optimal Length**: 20+ words for enhanced accuracy
-                - **Highlighting Threshold**: Lowered to 55% for better sensitivity
-                ### ⚡ What Makes This Enhanced
-                **Specifically targets AI writing that:**
-                - Uses formal academic language unnecessarily
-                - Employs corporate buzzwords and jargon
-                - Sounds like textbook or corporate documentation
-                - Lacks personal voice or subjective opinions
-                - Uses systematic, mechanical presentation styles
-                - Employs passive voice and abstract conceptualization
-                ### 🎯 Test Case Performance
-                **Example improvement:**
-                ```
-                Previous version: iPhone text → 43% AI (MISSED)
-                Enhanced version: iPhone text → 85%+ AI (DETECTED)
-                ```
-                The enhanced detector successfully identifies formal AI writing patterns
-                that appear professional but lack human authenticity.
-                ---
-                **Version**: 5.0.0 | **Updated**: September 2025 | **Status**: Enhanced Pattern Recognition
-                """)
-        # Event handlers
-        analyze_btn.click(
-            fn=analyze_text_enhanced,
-            inputs=[text_input],
-            outputs=[summary_result, highlighted_text_display, bar_chart, detailed_metrics, text_info]
-        )
-        batch_analyze_btn.click(
-            fn=batch_analyze_enhanced,
-            inputs=[file_input],
-            outputs=[batch_results]
-        )
-        # Test examples including the problematic iPhone text
-        gr.Examples(
-            examples=[
-                ["The iPhone is a technological object that demonstrates consistency, scalability, and precision. It is defined by iterative updates, predictable release cycles, and optimized integration between hardware and software. The system functions as a closed ecosystem where inputs are standardized, processes are regulated, and outputs are uniform. In this framework, the iPhone is not only a communication tool but also a controlled environment for digital interaction."],
-                ["Hey everyone! I just got the new iPhone and I'm absolutely loving it! The camera quality is insane - took some photos yesterday at the beach and they look professional. Battery life is way better than my old phone too. Definitely worth the upgrade if you're thinking about it. Anyone else get one yet?"],
-                ["The implementation of sustainable energy solutions requires comprehensive analysis of environmental factors, economic considerations, and technological feasibility to ensure optimal outcomes for stakeholders. Organizations must systematically evaluate various renewable energy options before making strategic investment decisions. This framework facilitates the optimization of resource allocation."],
-                ["I cannot believe what happened at work today! My boss actually praised the report I spent weeks on. Turns out all those late nights were worth it. My coworker Mike was shocked too - he has been there for 10 years and says he has never seen the boss so enthusiastic about anything. Guess I am finally getting the hang of this job!"]
-            ],
-            inputs=text_input,
-            outputs=[summary_result, highlighted_text_display, bar_chart, detailed_metrics, text_info],
-            fn=analyze_text_enhanced,
-            cache_examples=False
-        )
-    return interface
-# Launch the enhanced interface
 if __name__ == "__main__":
-    interface = create_enhanced_interface()
-    interface.launch(
-        server_name="0.0.0.0",
-        server_port=7860,
-        share=True,
-        show_error=True,
-        debug=False
-    )

 import gradio as gr
 import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import numpy as np
+from scipy import stats
 import re
 from collections import Counter
+import math
+class AdvancedAITextDetector:
     def __init__(self):
+        """Initialize the AI Text Detector with multiple detection methods"""
+        # Load pre-trained model for AI detection
+        self.model_name = "Hello-SimpleAI/chatgpt-detector-roberta"
         try:
+            self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
+            self.model = AutoModelForSequenceClassification.from_pretrained(self.model_name)
+            self.model.eval()
+            self.model_loaded = True
+        except:
+            print("Warning: Could not load transformer model. Using statistical methods only.")
+            self.model_loaded = False
+    def calculate_perplexity_score(self, text):
+        """Calculate perplexity-based features"""
         words = text.split()
+        if len(words) < 2:
             return 0.5
+        # Simple bigram perplexity approximation
+        bigrams = [(words[i], words[i+1]) for i in range(len(words)-1)]
+        unique_bigrams = len(set(bigrams))
+        total_bigrams = len(bigrams)
+        # AI text tends to have less variation in bigrams
+        diversity_score = unique_bigrams / total_bigrams if total_bigrams > 0 else 0
+        return diversity_score
+    def calculate_burstiness(self, text):
+        """Calculate burstiness - human text tends to be more bursty"""
+        sentences = re.split(r'[.!?]+', text)
+        sentence_lengths = [len(s.split()) for s in sentences if s.strip()]
+        if len(sentence_lengths) < 2:
+            return 0.5
+        # Calculate variance in sentence lengths
+        variance = np.var(sentence_lengths)
+        mean_length = np.mean(sentence_lengths)
+        # Normalize burstiness score
+        burstiness = variance / (mean_length + 1) if mean_length > 0 else 0
+        return min(burstiness / 10, 1.0)  # Normalize to 0-1
+    def calculate_repetition_score(self, text):
+        """Calculate repetition patterns - AI tends to repeat phrases more"""
+        words = text.lower().split()
+        # Check for repeated phrases (3-grams)
+        if len(words) < 3:
+            return 0.5
+        trigrams = [' '.join(words[i:i+3]) for i in range(len(words)-2)]
+        trigram_counts = Counter(trigrams)
+        repeated_trigrams = sum(1 for count in trigram_counts.values() if count > 1)
+        repetition_ratio = repeated_trigrams / len(trigrams) if trigrams else 0
+        return repetition_ratio
+    def calculate_vocabulary_diversity(self, text):
+        """Calculate vocabulary diversity - AI text often has less diverse vocabulary"""
+        words = re.findall(r'\b\w+\b', text.lower())
+        if not words:
+            return 0.5
+        unique_words = set(words)
+        diversity = len(unique_words) / len(words)
+        # Type-token ratio
+        return diversity
+    def calculate_punctuation_patterns(self, text):
+        """Analyze punctuation patterns - AI has more regular punctuation"""
+        sentences = re.split(r'[.!?]+', text)
+        punct_variance = []
+        for sentence in sentences:
+            if sentence.strip():
+                punct_count = len(re.findall(r'[,;:\-—()]', sentence))
+                word_count = len(sentence.split())
+                if word_count > 0:
+                    punct_variance.append(punct_count / word_count)
+        if not punct_variance:
+            return 0.5
+        # AI text tends to have more consistent punctuation density
+        variance = np.var(punct_variance)
+        return 1 - min(variance * 10, 1.0)  # Lower variance = more likely AI
+    def detect_ai_statistical(self, text):
+        """Combine statistical methods for AI detection"""
+        if len(text.strip()) < 50:
+            return 0.5, "Text too short for accurate analysis"
+        # Calculate various features
+        perplexity_score = self.calculate_perplexity_score(text)
+        burstiness = self.calculate_burstiness(text)
+        repetition = self.calculate_repetition_score(text)
+        vocab_diversity = self.calculate_vocabulary_diversity(text)
+        punct_patterns = self.calculate_punctuation_patterns(text)
+        # Weighted combination of features
+        # Lower perplexity, lower burstiness, higher repetition, lower diversity = more likely AI
         ai_score = (
+            (1 - perplexity_score) * 0.2 +  # Low diversity in bigrams
+            (1 - burstiness) * 0.25 +        # Low burstiness
+            repetition * 0.2 +               # High repetition
+            (1 - vocab_diversity) * 0.2 +    # Low vocabulary diversity
+            punct_patterns * 0.15            # Regular punctuation
         )
+        return ai_score, {
+            "perplexity_score": perplexity_score,
+            "burstiness": burstiness,
+            "repetition": repetition,
+            "vocab_diversity": vocab_diversity,
+            "punct_patterns": punct_patterns
         }
+    def detect_ai_transformer(self, text):
+        """Use transformer model for AI detection"""
+        if not self.model_loaded:
+            return 0.5, "Model not loaded"
         try:
+            inputs = self.tokenizer(text, return_tensors="pt", truncation=True,
+                                   max_length=512, padding=True)
+            with torch.no_grad():
+                outputs = self.model(**inputs)
+                logits = outputs.logits
+                probabilities = torch.softmax(logits, dim=-1)
+                # Assuming class 1 is AI-generated
+                ai_probability = probabilities[0][1].item()
+            return ai_probability, "Transformer model prediction"
         except Exception as e:
+            return 0.5, f"Error in transformer model: {str(e)}"
+    def detect(self, text):
+        """Main detection method combining multiple approaches"""
+        if not text or len(text.strip()) < 20:
             return {
+                "ai_probability": 0.5,
+                "classification": "Undetermined",
+                "confidence": "Low",
+                "explanation": "Text too short for accurate analysis. Please provide at least 50 characters.",
+                "detailed_scores": {}
             }
+        # Get statistical analysis
+        stat_score, stat_details = self.detect_ai_statistical(text)
+        # Get transformer model prediction if available
+        if self.model_loaded:
+            transformer_score, _ = self.detect_ai_transformer(text)
+            # Weighted average of both methods
+            final_score = (transformer_score * 0.7 + stat_score * 0.3)
+        else:
+            final_score = stat_score
+        # Determine classification and confidence
+        if final_score >= 0.8:
+            classification = "AI-Generated"
+            confidence = "High"
+        elif final_score >= 0.6:
+            classification = "Likely AI-Generated"
+            confidence = "Medium"
+        elif final_score >= 0.4:
+            classification = "Uncertain"
+            confidence = "Low"
+        elif final_score >= 0.2:
+            classification = "Likely Human-Written"
+            confidence = "Medium"
+        else:
+            classification = "Human-Written"
+            confidence = "High"
+        # Create detailed explanation
+        explanation = self._generate_explanation(final_score, stat_details if isinstance(stat_details, dict) else {})
+        return {
+            "ai_probability": round(final_score * 100, 2),
+            "classification": classification,
+            "confidence": confidence,
+            "explanation": explanation,
+            "detailed_scores": stat_details if isinstance(stat_details, dict) else {}
+        }
+    def _generate_explanation(self, score, details):
+        """Generate human-readable explanation of the detection result"""
+        explanations = []
+        if score >= 0.7:
+            explanations.append("This text shows strong indicators of AI generation.")
+        elif score >= 0.3:
+            explanations.append("This text shows mixed characteristics.")
+        else:
+            explanations.append("This text appears to be human-written.")
+        if details:
+            if details.get('burstiness', 0.5) < 0.3:
+                explanations.append("• Low sentence length variation (typical of AI)")
+            elif details.get('burstiness', 0.5) > 0.7:
+                explanations.append("• High sentence length variation (typical of humans)")
+            if details.get('vocab_diversity', 0.5) < 0.4:
+                explanations.append("• Limited vocabulary diversity")
+            elif details.get('vocab_diversity', 0.5) > 0.6:
+                explanations.append("• Rich vocabulary diversity")
+            if details.get('repetition', 0) > 0.2:
+                explanations.append("• Notable phrase repetition detected")
+            if details.get('punct_patterns', 0.5) > 0.7:
+                explanations.append("• Regular punctuation patterns (AI-like)")
+        return " ".join(explanations)
+# Initialize detector
+detector = AdvancedAITextDetector()
+def analyze_text(text):
+    """Gradio interface function"""
+    result = detector.detect(text)
+    # Format output for Gradio
+    output = f"""
+## Detection Result
+**Classification:** {result['classification']}
+**AI Probability:** {result['ai_probability']}%
+**Confidence Level:** {result['confidence']}
+### Analysis Details
+{result['explanation']}
+### Detailed Metrics
+"""
+    if result['detailed_scores']:
+        for metric, value in result['detailed_scores'].items():
+            metric_name = metric.replace('_', ' ').title()
+            output += f"- {metric_name}: {round(value, 3)}\n"
+    # Create a simple bar chart visualization
+    ai_prob = result['ai_probability']
+    human_prob = 100 - ai_prob
+    bar_chart = f"""
+### Probability Distribution
+```
+AI-Generated:    {'█' * int(ai_prob/5)}{'░' * (20-int(ai_prob/5))} {ai_prob}%
+Human-Written:   {'█' * int(human_prob/5)}{'░' * (20-int(human_prob/5))} {human_prob}%
+```
+"""
+    return output + bar_chart
+# Create Gradio interface
+interface = gr.Interface(
+    fn=analyze_text,
+    inputs=gr.Textbox(
+        lines=10,
+        placeholder="Paste the text you want to analyze here...",
+        label="Input Text"
+    ),
+    outputs=gr.Markdown(label="Analysis Result"),
+    title="🔍 Advanced AI Text Detector",
+    description="""
+    This advanced AI text detector uses multiple techniques to identify AI-generated content:
+    - **Transformer-based detection** using fine-tuned RoBERTa model
+    - **Statistical analysis** including burstiness, perplexity, and repetition patterns
+    - **Linguistic features** such as vocabulary diversity and punctuation patterns
+    The tool is particularly effective at detecting text from ChatGPT, GPT-4, and similar language models.
+    For best results, provide at least 100 words of text.
+    """,
+    examples=[
+        ["The impact of artificial intelligence on modern society cannot be overstated. From healthcare to transportation, AI systems are revolutionizing how we live and work. Machine learning algorithms process vast amounts of data to identify patterns and make predictions with unprecedented accuracy. In medical diagnosis, AI assists doctors in detecting diseases earlier than ever before. Autonomous vehicles promise to transform our cities and reduce traffic accidents. However, these advancements also raise important ethical questions about privacy, employment, and human autonomy that society must carefully consider."],
+        ["So I was walking down the street yesterday, right? And this crazy thing happened - I mean, you won't believe it. There was this dog, just a regular golden retriever, but it was wearing these ridiculous sunglasses. Like, who puts sunglasses on a dog? Anyway, the owner was this old lady, must've been like 80 or something, and she was just chatting away on her phone, completely oblivious. The dog looked so confused! I couldn't help but laugh. Sometimes you see the weirdest stuff when you're just out and about, you know?"]
+    ],
+    theme=gr.themes.Soft(),
+    analytics_enabled=False
+)
 if __name__ == "__main__":
+    interface.launch()

requirements.txt CHANGED Viewed

@@ -1,8 +1,6 @@
-gradio>=4.0.0
-torch>=1.13.0
-transformers>=4.25.0
-numpy>=1.21.0
-scikit-learn>=1.2.0
-plotly>=5.0.0
-fastapi>=0.68.0
-uvicorn>=0.15.0

+gradio==4.44.0
+torch==2.1.0
+transformers==4.35.0
+scipy==1.11.4
+numpy==1.24.3
+huggingface-hub==0.19.4