milwright committed on
Commit · f4795d7
1 Parent(s): 4f630fa
clean: remove test framework and unnecessary files
- .claude/settings.local.json +0 -15
- CLAUDE.local.md +0 -2
- README-testing-framework.md +0 -217
- model-testing.html +0 -629
- src/modelTestingFramework.js +0 -703
- src/testAIService.js +0 -154
- src/testGameRunner.js +0 -473
- src/testReportGenerator.js +0 -453
- src/userRankingInterface.js +0 -650
- test-direct.js +0 -28
- test-local-llm.js +0 -155
.claude/settings.local.json
DELETED
@@ -1,15 +0,0 @@
-{
-  "permissions": {
-    "allow": [
-      "Bash(git checkout:*)",
-      "Bash(cp:*)",
-      "Bash(rm:*)",
-      "Bash(git commit:*)",
-      "Bash(git push:*)",
-      "Bash(git add:*)",
-      "Bash(grep:*)",
-      "Bash(node:*)"
-    ],
-    "deny": []
-  }
-}
CLAUDE.local.md
DELETED
@@ -1,2 +0,0 @@
-- DO NOT SIGN OFF COMMIT MESSAGES WITH CLAUDE AS AN AUTHOR
-- Remember to review this file at the end of prompt engineering related changes or when the user tells you to or at the end of a long session. If changes have been made to OpenRouter prompt language or programming logic, low or high level, then update this file accordingly.
README-testing-framework.md
DELETED
@@ -1,217 +0,0 @@
-# Cloze Reader Model Testing Framework
-
-A comprehensive testing system for evaluating AI models across all tasks in the Cloze Reader application, including both OpenRouter and local LLM (LM Studio) models.
-
-## Features
-
-### 🎯 Comprehensive Testing
-- **Word Selection Testing**: Evaluates vocabulary selection accuracy, difficulty matching, and response quality
-- **Contextualization Testing**: Tests historical and literary context generation for books and authors
-- **Chat Hints Testing**: Assesses all 4 question types (part of speech, sentence role, word category, synonym)
-- **Performance Monitoring**: Tracks response times, success rates, and error patterns
-- **User Satisfaction Ratings**: Collect user feedback on model performance after each round
-
-### 🏠 Local LLM Support
-- **LM Studio Integration**: Auto-detects models running on port 1234
-- **Real-time Status**: Shows connection status and available models
-- **Response Cleaning**: Handles local LLM output artifacts automatically
-- **Fallback Testing**: Graceful handling when local server is unavailable
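
Reviewer sketch: the auto-detection and fallback behaviour listed above could look roughly like the following. It assumes LM Studio's OpenAI-compatible server answers `GET /v1/models` on `http://localhost:1234`. The function names (`normalizeModelList`, `checkLocalServer`), the two-second timeout, and the injectable `fetchImpl` parameter are illustrative assumptions. The `{ connected, models, error }` shape mirrors what the deleted `model-testing.html` reads from `testLocalServerConnection()`, but the real implementation in `modelTestingFramework.js` is not shown in this view and may differ.

```javascript
// Hypothetical helper: turn an OpenAI-style /v1/models payload into model entries.
function normalizeModelList(payload) {
    const data = Array.isArray(payload?.data) ? payload.data : [];
    return data
        .filter((m) => typeof m?.id === 'string' && m.id.length > 0)
        .map((m) => ({ id: m.id, name: m.id, provider: 'local' }));
}

// Hypothetical probe; fetchImpl is injectable so it can be stubbed in tests.
async function checkLocalServer(baseUrl = 'http://localhost:1234', fetchImpl = fetch, timeoutMs = 2000) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
        const response = await fetchImpl(`${baseUrl}/v1/models`, { signal: controller.signal });
        if (!response.ok) {
            return { connected: false, models: [], error: `HTTP ${response.status}` };
        }
        return { connected: true, models: normalizeModelList(await response.json()), error: null };
    } catch (err) {
        // Network failure, CORS rejection, or timeout: report it instead of throwing.
        return { connected: false, models: [], error: err.message };
    } finally {
        clearTimeout(timer);
    }
}
```

Returning a status object rather than throwing lets the caller keep rendering the OpenRouter models when the local server is down.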
-
-### 📊 Advanced Analytics
-- **Multi-format Reports**: JSON, CSV, and Markdown outputs
-- **Performance Comparisons**: Side-by-side model analysis
-- **Quality Scoring**: Detailed evaluation metrics for each task
-- **Interactive Game Testing**: Real-time performance monitoring during gameplay
-- **User Ranking Integration**: 5-star ratings for word selection, passage quality, hint helpfulness, and overall experience
-
-## Quick Start
-
-### 1. Start the Testing Interface
-```bash
-# Start development server
-make dev
-# or
-python local-server.py 8000
-
-# Open testing interface
-open http://localhost:8000/model-testing.html
-```
-
-### 2. Setup Local LLM (Optional)
-```bash
-# Start LM Studio server on port 1234
-# Load your preferred model (e.g., Gemma-3-12b, Llama-3.1-8b)
-# The framework will auto-detect available models
-```
-
-### 3. Run Tests
-1. Select models to test (OpenRouter and/or local models)
-2. Click "Start Comprehensive Test" for full evaluation
-3. Or click "Test Selected Model in Game" for interactive testing
-4. Results are automatically saved to the `/output` folder
-
-## Test Results
-
-### CSV Output Format
-Results are saved as timestamped CSV files with columns for:
-- Model performance metrics (overall score, success rates)
-- Response time analytics (average, min, max)
-- Task-specific scores (word selection, contextualization, chat hints)
-- Error rates and reliability metrics
-- User satisfaction ratings (1-5 stars per category)
-- User comments and feedback count
-
-### Game Testing Output
-Interactive game sessions generate JSON reports with:
-- Real-time AI interaction logs
-- User performance analytics
-- Response time breakdowns
-- Error tracking and categorization
-- User satisfaction ratings per round
-- Qualitative feedback and comments
-
-## Model Categories
-
-### OpenRouter Models
-- GPT-4o, GPT-4o Mini
-- Claude 3.5 Sonnet, Claude 3 Haiku
-- Gemini Pro 1.5
-- Llama 3.1 (8B, 70B)
-- Mistral 7B, Phi-3 Medium, Qwen 2 7B
-
-### Local LLM Models (LM Studio)
-- Auto-detected from running server
-- Supports any OpenAI-compatible model
-- Common options: Gemma-3-12b, Llama-3.1-8b, Mistral-7b
-
-## Testing Methodology
-
-### Word Selection Evaluation
-- **Accuracy**: Words exist in source passage
-- **Difficulty Matching**: Length and complexity appropriate for level
-- **Quality Scoring**: Avoids overly common words at higher difficulties
-- **Performance**: Response time and success rate tracking
-- **User Rating**: 5-star scale for vocabulary appropriateness
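
Reviewer sketch: the "Accuracy" criterion above (selected words must exist in the source passage) can be illustrated as a simple ratio. The function name `wordSelectionAccuracy` and the ASCII-only, case-insensitive tokenizer are assumptions for illustration, not the deleted framework's actual scoring code.

```javascript
// Hypothetical accuracy check: fraction of selected words that occur in the passage.
function wordSelectionAccuracy(passage, selectedWords) {
    if (selectedWords.length === 0) return 0;
    // Tokenize on runs of ASCII letters and apostrophes, ignoring case.
    const tokens = new Set(passage.toLowerCase().match(/[a-z']+/g) || []);
    const hits = selectedWords.filter((word) => tokens.has(word.toLowerCase())).length;
    return hits / selectedWords.length;
}
```

A model that returns a word not present in the passage lowers the ratio, which is the failure this criterion is meant to catch.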
-
-### Contextualization Assessment
-- **Relevance**: Mentions book title, author, historical context
-- **Educational Value**: Appropriate for language learners
-- **Completeness**: Balanced length (100-500 characters)
-- **Literary Terms**: Uses appropriate academic vocabulary
-- **User Rating**: Passage quality and educational value scoring
-
-### Chat Hints Analysis
-- **Question Type Coverage**: All 4 hint categories tested
-- **Educational Appropriateness**: Helps without revealing answers
-- **Response Quality**: Clear, concise, and helpful explanations
-- **Consistency**: Performance across different question types
-- **User Rating**: Helpfulness and clarity of AI hints
-
-### User Experience Rating
-After each round, users can rate:
-- **Word Selection Quality** (1-5 stars)
-- **Passage Selection** (1-5 stars)
-- **Hint Helpfulness** (1-5 stars)
-- **Overall Experience** (1-5 stars)
-- **Optional Comments** for detailed feedback
-
-## Architecture
-
-### Core Components
-- **ModelTestingFramework**: Main testing orchestrator
-- **TestAIService**: Performance-tracking AI service wrapper
-- **TestGameRunner**: Real-time game session monitoring
-- **TestReportGenerator**: Multi-format report generation
-
-### File Structure
-```
-src/
-├── modelTestingFramework.js    # Main testing logic
-├── testAIService.js            # AI service wrapper
-├── testGameRunner.js           # Game monitoring
-└── testReportGenerator.js      # Report generation
-
-model-testing.html              # Testing interface UI
-output/                         # Test results folder
-```
-
-## Usage Examples
-
-### Automated Testing
-```javascript
-import { ModelTestingFramework } from './src/modelTestingFramework.js';
-
-const framework = new ModelTestingFramework();
-const results = await framework.runComprehensiveTest();
-console.log('Results saved to output folder');
-```
-
-### Custom Model Testing
-```javascript
-const customModel = {
-    id: 'my-local-model',
-    name: 'Custom Local Model',
-    provider: 'local'
-};
-
-const result = await framework.testModel(customModel);
-```
-
-### Report Generation
-```javascript
-import { TestReportGenerator } from './src/testReportGenerator.js';
-
-const generator = new TestReportGenerator();
-const reports = await generator.generateAllReports(testResults);
-// Generates JSON, CSV, and Markdown reports
-```
-
-## Integration with Existing Codebase
-
-The testing framework integrates seamlessly with the existing Cloze Reader architecture:
-
-- **aiService.js**: Framework uses the same AI service patterns
-- **conversationManager.js**: Chat hint testing leverages existing conversation logic
-- **clozeGameEngine.js**: Game testing monitors actual game interactions
-- **bookDataService.js**: Uses same book data and quality filtering
-
-## Troubleshooting
-
-### Local LLM Issues
-- Ensure LM Studio is running on port 1234
-- Check that a model is loaded and ready
-- Verify CORS is enabled in LM Studio settings
-
-### API Key Issues
-- OpenRouter API key must be set via environment variable or meta tag
-- Local models don't require API keys
-
-### Performance Issues
-- Large model testing can take 10-30 minutes
-- Consider testing fewer models or specific categories
-- Monitor network connectivity for OpenRouter models
-
-## Contributing
-
-The testing framework is designed to be extensible:
-
-1. Add new model providers in `ModelTestingFramework.constructor()`
-2. Extend evaluation metrics in the respective `evaluate*` methods
-3. Add new report formats in `TestReportGenerator`
-4. Enhance UI components in `model-testing.html`
-
-## Results Interpretation
-
-### Overall Scores
-- **90-100**: Excellent performance across all tasks
-- **80-89**: Very good with minor weaknesses
-- **70-79**: Good performance with some limitations
-- **60-69**: Adequate but needs improvement
-- **Below 60**: Poor performance, not recommended
-
-### Success Rate Thresholds
-- **Word Selection**: >80% for production use
-- **Contextualization**: >90% for educational content
-- **Chat Hints**: >85% for effective tutoring
-
-Use these benchmarks to select the best model for your specific needs and performance requirements.
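
Reviewer sketch: the score bands and success-rate thresholds above translate directly into code. The names `interpretOverallScore`, `SUCCESS_THRESHOLDS`, and `failingTasks` are hypothetical. Two assumptions are made: success rates are fractions in the range 0 to 1 (the deleted UI multiplies `successRate` by 100 before display), and a fractional score such as 89.5 belongs to the band whose lower bound it meets, since the README only lists whole-number ranges.

```javascript
// Bands taken from the "Overall Scores" list; labels are shortened.
function interpretOverallScore(score) {
    if (score >= 90) return 'Excellent';
    if (score >= 80) return 'Very good';
    if (score >= 70) return 'Good';
    if (score >= 60) return 'Adequate';
    return 'Poor';
}

// Thresholds taken from "Success Rate Thresholds", expressed as fractions.
const SUCCESS_THRESHOLDS = { wordSelection: 0.80, contextualization: 0.90, chatHints: 0.85 };

// Returns the task names whose success rate does not strictly exceed its threshold.
// A missing rate counts as failing.
function failingTasks(rates) {
    return Object.entries(SUCCESS_THRESHOLDS)
        .filter(([task, minimum]) => !(rates[task] > minimum))
        .map(([task]) => task);
}
```

The comparison is strict because the README writes the thresholds as ">80%", ">90%", and ">85%".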
model-testing.html
DELETED
@@ -1,629 +0,0 @@
-<!DOCTYPE html>
-<html lang="en">
-<head>
-    <meta charset="UTF-8">
-    <meta name="viewport" content="width=device-width, initial-scale=1.0">
-    <title>Cloze Reader - Model Testing Framework</title>
-    <style>
-        body {
-            font-family: 'Georgia', serif;
-            background: linear-gradient(135deg, #f5f3f0 0%, #e8e4df 100%);
-            margin: 0;
-            padding: 20px;
-            min-height: 100vh;
-        }
-
-        .container {
-            max-width: 1200px;
-            margin: 0 auto;
-            background: rgba(255, 255, 255, 0.95);
-            border-radius: 15px;
-            box-shadow: 0 10px 30px rgba(0, 0, 0, 0.1);
-            padding: 40px;
-        }
-
-        h1 {
-            text-align: center;
-            color: #2c3e50;
-            font-size: 2.5rem;
-            margin-bottom: 10px;
-            text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.1);
-        }
-
-        .subtitle {
-            text-align: center;
-            color: #7f8c8d;
-            font-size: 1.2rem;
-            margin-bottom: 40px;
-        }
-
-        .model-selection {
-            background: #f8f9fa;
-            border-radius: 10px;
-            padding: 30px;
-            margin-bottom: 30px;
-            border: 2px solid #e9ecef;
-        }
-
-        .model-selection h2 {
-            color: #2c3e50;
-            margin-bottom: 20px;
-            font-size: 1.5rem;
-        }
-
-        .model-grid {
-            display: grid;
-            grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
-            gap: 15px;
-            margin-bottom: 20px;
-        }
-
-        .model-option {
-            background: white;
-            border: 2px solid #dee2e6;
-            border-radius: 8px;
-            padding: 15px;
-            cursor: pointer;
-            transition: all 0.3s ease;
-            position: relative;
-        }
-
-        .model-option:hover {
-            border-color: #007bff;
-            box-shadow: 0 4px 8px rgba(0, 123, 255, 0.2);
-        }
-
-        .model-option.selected {
-            border-color: #28a745;
-            background: #f8fff9;
-        }
-
-        .model-option input[type="checkbox"] {
-            position: absolute;
-            top: 10px;
-            right: 10px;
-            transform: scale(1.2);
-        }
-
-        .model-name {
-            font-weight: bold;
-            color: #2c3e50;
-            margin-bottom: 5px;
-        }
-
-        .model-provider {
-            color: #6c757d;
-            font-size: 0.9rem;
-            margin-bottom: 5px;
-        }
-
-        .model-id {
-            color: #495057;
-            font-size: 0.8rem;
-            font-family: monospace;
-            background: #f1f3f4;
-            padding: 2px 6px;
-            border-radius: 4px;
-        }
-
-        .controls {
-            display: flex;
-            gap: 15px;
-            align-items: center;
-            flex-wrap: wrap;
-        }
-
-        .btn {
-            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-            color: white;
-            border: none;
-            padding: 12px 24px;
-            border-radius: 8px;
-            font-size: 1rem;
-            cursor: pointer;
-            transition: all 0.3s ease;
-            font-weight: 500;
-        }
-
-        .btn:hover {
-            transform: translateY(-2px);
-            box-shadow: 0 6px 20px rgba(102, 126, 234, 0.4);
-        }
-
-        .btn:disabled {
-            background: #6c757d;
-            cursor: not-allowed;
-            transform: none;
-            box-shadow: none;
-        }
-
-        .btn-secondary {
-            background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
-        }
-
-        .btn-success {
-            background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);
-        }
-
-        .progress-section {
-            margin-top: 30px;
-            padding: 20px;
-            background: #f8f9fa;
-            border-radius: 10px;
-            display: none;
-        }
-
-        .progress-section.active {
-            display: block;
-        }
-
-        .progress-bar {
-            width: 100%;
-            height: 8px;
-            background: #e9ecef;
-            border-radius: 4px;
-            overflow: hidden;
-            margin-bottom: 10px;
-        }
-
-        .progress-fill {
-            height: 100%;
-            background: linear-gradient(90deg, #667eea, #764ba2);
-            width: 0%;
-            transition: width 0.3s ease;
-        }
-
-        .status-message {
-            color: #495057;
-            font-size: 1rem;
-            margin-bottom: 10px;
-        }
-
-        .test-log {
-            background: #2d3748;
-            color: #e2e8f0;
-            padding: 15px;
-            border-radius: 8px;
-            font-family: 'Courier New', monospace;
-            font-size: 0.9rem;
-            max-height: 300px;
-            overflow-y: auto;
-            white-space: pre-wrap;
-        }
-
-        .results-section {
-            margin-top: 30px;
-            padding: 20px;
-            background: #f8f9fa;
-            border-radius: 10px;
-            display: none;
-        }
-
-        .results-section.active {
-            display: block;
-        }
-
-        .results-grid {
-            display: grid;
-            grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
-            gap: 20px;
-            margin-top: 20px;
-        }
-
-        .result-card {
-            background: white;
-            border-radius: 8px;
-            padding: 20px;
-            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
-        }
-
-        .result-card h3 {
-            color: #2c3e50;
-            margin-bottom: 15px;
-            font-size: 1.2rem;
-        }
-
-        .metric {
-            display: flex;
-            justify-content: space-between;
-            margin-bottom: 10px;
-            padding-bottom: 8px;
-            border-bottom: 1px solid #e9ecef;
-        }
-
-        .metric:last-child {
-            border-bottom: none;
-            margin-bottom: 0;
-        }
-
-        .metric-label {
-            color: #6c757d;
-            font-weight: 500;
-        }
-
-        .metric-value {
-            color: #2c3e50;
-            font-weight: bold;
-        }
-
-        .score-high { color: #28a745; }
-        .score-medium { color: #ffc107; }
-        .score-low { color: #dc3545; }
-
-        .game-section {
-            margin-top: 30px;
-            padding: 20px;
-            background: #f8f9fa;
-            border-radius: 10px;
-            display: none;
-        }
-
-        .game-section.active {
-            display: block;
-        }
-
-        .game-frame {
-            width: 100%;
-            height: 600px;
-            border: none;
-            border-radius: 8px;
-            background: white;
-        }
-
-        @media (max-width: 768px) {
-            .container {
-                padding: 20px;
-            }
-
-            .model-grid {
-                grid-template-columns: 1fr;
-            }
-
-            .controls {
-                flex-direction: column;
-                align-items: stretch;
-            }
-        }
-    </style>
-</head>
-<body>
-    <div class="container">
-        <h1>Model Testing Framework</h1>
-        <p class="subtitle">Comprehensive evaluation of AI models for the Cloze Reader application</p>
-
-        <div class="model-selection">
-            <h2>Select Models to Test</h2>
-            <div id="modelGrid" class="model-grid">
-                <!-- Models will be populated by JavaScript -->
-            </div>
-
-            <div class="controls">
-                <button id="selectAllBtn" class="btn btn-secondary">Select All</button>
-                <button id="clearAllBtn" class="btn btn-secondary">Clear All</button>
-                <button id="startTestBtn" class="btn">Start Comprehensive Test</button>
-                <button id="testGameBtn" class="btn btn-success">Test Selected Model in Game</button>
-            </div>
-        </div>
-
-        <div id="progressSection" class="progress-section">
-            <h2>Testing Progress</h2>
-            <div class="progress-bar">
-                <div id="progressFill" class="progress-fill"></div>
-            </div>
-            <div id="statusMessage" class="status-message">Initializing tests...</div>
-            <div id="testLog" class="test-log"></div>
-        </div>
-
-        <div id="resultsSection" class="results-section">
-            <h2>Test Results</h2>
-            <p>Results have been saved to the output folder as CSV files.</p>
-            <div id="resultsGrid" class="results-grid">
-                <!-- Results will be populated by JavaScript -->
-            </div>
-        </div>
-
-        <div id="gameSection" class="game-section">
-            <h2>Interactive Game Testing</h2>
-            <p>Test the selected model by playing the game. Performance will be logged for analysis.</p>
-            <iframe id="gameFrame" class="game-frame" src="about:blank"></iframe>
-        </div>
-    </div>
-
-    <script type="module">
-        import { ModelTestingFramework } from './src/modelTestingFramework.js';
-
-        class ModelTestingUI {
-            constructor() {
-                this.framework = new ModelTestingFramework();
-                this.selectedModels = new Set();
-                this.isTestingInProgress = false;
-                this.localServerStatus = null;
-
-                this.initializeUI();
-                this.setupEventListeners();
-            }
-
-            async initializeUI() {
-                await this.checkLocalServer();
-                await this.populateModelGrid();
-            }
-
-            async checkLocalServer() {
-                this.localServerStatus = await this.framework.testLocalServerConnection();
-                if (this.localServerStatus.connected) {
-                    console.log('Local LM Studio server detected:', this.localServerStatus.models.length, 'models available');
-                    await this.framework.detectLocalModels();
-                } else {
-                    console.log('Local LM Studio server not available:', this.localServerStatus.error);
-                }
-            }
-
-            populateModelGrid() {
-                const grid = document.getElementById('modelGrid');
-                grid.innerHTML = '';
-
-                // Add local server status indicator
-                if (this.localServerStatus) {
-                    const statusDiv = document.createElement('div');
-                    statusDiv.className = 'server-status';
-                    statusDiv.style.cssText = `
-                        grid-column: 1 / -1;
-                        padding: 15px;
-                        margin-bottom: 15px;
-                        border-radius: 8px;
-                        font-weight: bold;
-                        text-align: center;
-                        ${this.localServerStatus.connected
-                            ? 'background: #d4edda; color: #155724; border: 1px solid #c3e6cb;'
-                            : 'background: #f8d7da; color: #721c24; border: 1px solid #f5c6cb;'
-                        }
-                    `;
-
-                    if (this.localServerStatus.connected) {
-                        statusDiv.innerHTML = `
-                            ✓ Local LM Studio Server Connected (Port 1234)<br>
-                            <small>${this.localServerStatus.models.length} model(s) available</small>
-                        `;
-                    } else {
-                        statusDiv.innerHTML = `
-                            ✗ Local LM Studio Server Not Available<br>
-                            <small>Start LM Studio on port 1234 to test local models</small>
-                        `;
-                    }
-
-                    grid.appendChild(statusDiv);
-                }
-
-                this.framework.models.forEach(model => {
-                    const modelDiv = document.createElement('div');
-                    modelDiv.className = 'model-option';
-                    modelDiv.dataset.modelId = model.id;
-
-                    // Disable local models if server is not connected
-                    const isDisabled = model.provider === 'local' && !this.localServerStatus?.connected;
-                    if (isDisabled) {
-                        modelDiv.classList.add('disabled');
-                        modelDiv.style.opacity = '0.5';
-                        modelDiv.style.cursor = 'not-allowed';
-                    }
-
-                    const providerLabel = model.provider === 'local'
-                        ? `LOCAL ${this.localServerStatus?.connected ? '(✓)' : '(✗)'}`
-                        : model.provider.toUpperCase();
-
-                    modelDiv.innerHTML = `
-                        <input type="checkbox" id="model-${model.id}" ${isDisabled ? 'disabled' : ''} />
-                        <div class="model-name">${model.name}</div>
-                        <div class="model-provider">${providerLabel}</div>
-                        <div class="model-id">${model.id}</div>
-                    `;
-
-                    const checkbox = modelDiv.querySelector('input');
-                    checkbox.addEventListener('change', (e) => {
-                        if (e.target.checked) {
-                            this.selectedModels.add(model);
-                            modelDiv.classList.add('selected');
-                        } else {
-                            this.selectedModels.delete(model);
-                            modelDiv.classList.remove('selected');
-                        }
-                        this.updateControlsState();
-                    });
-
-                    if (!isDisabled) {
-                        modelDiv.addEventListener('click', (e) => {
-                            if (e.target !== checkbox) {
-                                checkbox.click();
-                            }
-                        });
-                    }
-
-                    grid.appendChild(modelDiv);
-                });
-            }
-
-            setupEventListeners() {
-                document.getElementById('selectAllBtn').addEventListener('click', () => {
-                    this.selectAllModels();
-                });
-
-                document.getElementById('clearAllBtn').addEventListener('click', () => {
-                    this.clearAllModels();
-                });
-
-                document.getElementById('startTestBtn').addEventListener('click', () => {
-                    this.startComprehensiveTest();
-                });
-
-                document.getElementById('testGameBtn').addEventListener('click', () => {
-                    this.startGameTest();
-                });
-            }
-
-            selectAllModels() {
-                this.framework.models.forEach(model => {
-                    this.selectedModels.add(model);
-                    const modelDiv = document.querySelector(`[data-model-id="${model.id}"]`);
-                    const checkbox = modelDiv.querySelector('input');
-                    checkbox.checked = true;
-                    modelDiv.classList.add('selected');
-                });
-                this.updateControlsState();
-            }
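
Reviewer note: `selectAllModels()` above adds every entry in `framework.models`, including local models whose checkboxes `populateModelGrid()` disabled because the LM Studio server was not connected. A minimal sketch of a filter that would keep "Select All" consistent with the disabled state follows. The helper name `selectableModels` and the `'openrouter'` provider string in the usage line are hypothetical. Only the `'local'` provider value appears in the deleted code.

```javascript
// Hypothetical helper: keep only the models the grid would let a user tick.
// Mirrors the isDisabled rule in populateModelGrid():
// a model is disabled when it is local and the local server is not connected.
function selectableModels(models, localServerConnected) {
    return models.filter((model) => model.provider !== 'local' || localServerConnected);
}

// Usage: iterate selectableModels(this.framework.models, this.localServerStatus?.connected)
// instead of this.framework.models inside selectAllModels().
const example = selectableModels(
    [{ id: 'a', provider: 'openrouter' }, { id: 'b', provider: 'local' }],
    false
);
```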
-
-            clearAllModels() {
-                this.selectedModels.clear();
-                document.querySelectorAll('.model-option').forEach(div => {
-                    div.classList.remove('selected');
-                    div.querySelector('input').checked = false;
-                });
-                this.updateControlsState();
-            }
-
-            updateControlsState() {
-                const hasSelection = this.selectedModels.size > 0;
-                document.getElementById('startTestBtn').disabled = !hasSelection || this.isTestingInProgress;
-                document.getElementById('testGameBtn').disabled = this.selectedModels.size !== 1 || this.isTestingInProgress;
-            }
-
-            async startComprehensiveTest() {
-                if (this.selectedModels.size === 0) {
-                    alert('Please select at least one model to test.');
-                    return;
-                }
-
-                this.isTestingInProgress = true;
-                this.updateControlsState();
-
-                const progressSection = document.getElementById('progressSection');
-                const progressFill = document.getElementById('progressFill');
-                const statusMessage = document.getElementById('statusMessage');
-                const testLog = document.getElementById('testLog');
-
-                progressSection.classList.add('active');
-                testLog.textContent = '';
-
-                const modelsArray = Array.from(this.selectedModels);
-                let completedTests = 0;
-
-                try {
-                    for (let i = 0; i < modelsArray.length; i++) {
-                        const model = modelsArray[i];
-                        const progress = (i / modelsArray.length) * 100;
-
-                        progressFill.style.width = `${progress}%`;
-                        statusMessage.textContent = `Testing ${model.name} (${i + 1}/${modelsArray.length})...`;
-
-                        this.log(`Starting test for ${model.name}...`);
-
-                        try {
-                            const result = await this.framework.testModel(model);
-                            this.log(`✓ ${model.name} completed - Score: ${result.overallScore.toFixed(1)}`);
-                            completedTests++;
-                        } catch (error) {
-                            this.log(`✗ ${model.name} failed: ${error.message}`);
-                        }
-
-                        progressFill.style.width = `${((i + 1) / modelsArray.length) * 100}%`;
-                    }
-
-                    statusMessage.textContent = `Testing completed! ${completedTests}/${modelsArray.length} models tested successfully.`;
-                    this.log(`\\nTesting completed! Results saved to output folder.`);
-
-                    // Show results
-                    this.displayResults();
-
-                } catch (error) {
-                    this.log(`\\nTesting failed: ${error.message}`);
-                    statusMessage.textContent = 'Testing failed. Check the log for details.';
-                } finally {
-                    this.isTestingInProgress = false;
-                    this.updateControlsState();
-                }
-            }
-
-            startGameTest() {
-                if (this.selectedModels.size !== 1) {
-                    alert('Please select exactly one model for game testing.');
-                    return;
-                }
-
-                const selectedModel = Array.from(this.selectedModels)[0];
-                const gameSection = document.getElementById('gameSection');
-                const gameFrame = document.getElementById('gameFrame');
-
-                // Construct URL with model parameter
-                const gameUrl = `index.html?testModel=${encodeURIComponent(selectedModel.id)}&testMode=true`;
-                if (selectedModel.provider === 'local') {
-                    gameUrl += '&local=true';
-                }
-
-                gameFrame.src = gameUrl;
-                gameSection.classList.add('active');
-
-                this.log(`Starting game test with ${selectedModel.name}...`);
-            }
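
Reviewer note: in `startGameTest()` above, `gameUrl` is declared with `const` and then extended with `+=`, so selecting a local model would throw `TypeError: Assignment to constant variable.` before the iframe is loaded. A minimal sketch of one way to build the same URL without reassignment follows. The helper name `buildGameUrl` and its `page` parameter are hypothetical. The query parameter names (`testModel`, `testMode`, `local`) are the ones used in the deleted code.

```javascript
// Hypothetical replacement for the URL construction in startGameTest().
function buildGameUrl(model, page = 'index.html') {
    // URLSearchParams percent-encodes values, so no manual encodeURIComponent is needed.
    const params = new URLSearchParams({ testModel: model.id, testMode: 'true' });
    if (model.provider === 'local') {
        params.set('local', 'true');
    }
    return `${page}?${params.toString()}`;
}
```

One behavioural difference to be aware of: `URLSearchParams` encodes a space as `+`, whereas `encodeURIComponent` produces `%20`. Both decode to the same value when read back through `URLSearchParams`.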
|
| 566 |
-
|
| 567 |
-
displayResults() {
|
| 568 |
-
const resultsSection = document.getElementById('resultsSection');
|
| 569 |
-
const resultsGrid = document.getElementById('resultsGrid');
|
| 570 |
-
|
| 571 |
-
resultsGrid.innerHTML = '';
|
| 572 |
-
|
| 573 |
-
this.framework.testResults.tests.forEach(result => {
|
| 574 |
-
const card = document.createElement('div');
|
| 575 |
-
card.className = 'result-card';
|
| 576 |
-
|
| 577 |
-
const overallScoreClass = this.getScoreClass(result.overallScore);
|
| 578 |
-
|
| 579 |
-
card.innerHTML = `
|
| 580 |
-
<h3>${result.modelName}</h3>
|
| 581 |
-
<div class="metric">
|
| 582 |
-
<span class="metric-label">Overall Score</span>
|
| 583 |
-
<span class="metric-value ${overallScoreClass}">${result.overallScore?.toFixed(1) || 'N/A'}</span>
|
| 584 |
-
</div>
|
| 585 |
-
<div class="metric">
|
| 586 |
-
<span class="metric-label">Word Selection Success</span>
|
| 587 |
-
<span class="metric-value">${result.wordSelection?.successRate != null ? (result.wordSelection.successRate * 100).toFixed(1) + '%' : 'N/A'}</span>
|
| 588 |
-
</div>
|
| 589 |
-
<div class="metric">
|
| 590 |
-
<span class="metric-label">Contextualization Success</span>
|
| 591 |
-
<span class="metric-value">${result.contextualization?.successRate != null ? (result.contextualization.successRate * 100).toFixed(1) + '%' : 'N/A'}</span>
|
| 592 |
-
</div>
|
| 593 |
-
<div class="metric">
|
| 594 |
-
<span class="metric-label">Chat Hints Success</span>
|
| 595 |
-
<span class="metric-value">${result.chatHints?.successRate != null ? (result.chatHints.successRate * 100).toFixed(1) + '%' : 'N/A'}</span>
|
| 596 |
-
</div>
|
| 597 |
-
<div class="metric">
|
| 598 |
-
<span class="metric-label">Average Response Time</span>
|
| 599 |
-
<span class="metric-value">${result.wordSelection?.averageTime?.toFixed(0) || 'N/A'}ms</span>
|
| 600 |
-
</div>
|
| 601 |
-
`;
|
| 602 |
-
|
| 603 |
-
resultsGrid.appendChild(card);
|
| 604 |
-
});
|
| 605 |
-
|
| 606 |
-
resultsSection.classList.add('active');
|
| 607 |
-
}
|
| 608 |
-
|
| 609 |
-
getScoreClass(score) {
|
| 610 |
-
if (score >= 80) return 'score-high';
|
| 611 |
-
if (score >= 60) return 'score-medium';
|
| 612 |
-
return 'score-low';
|
| 613 |
-
}
|
| 614 |
-
|
| 615 |
-
log(message) {
|
| 616 |
-
const testLog = document.getElementById('testLog');
|
| 617 |
-
const timestamp = new Date().toLocaleTimeString();
|
| 618 |
-
testLog.textContent += `[${timestamp}] ${message}\n`;
|
| 619 |
-
testLog.scrollTop = testLog.scrollHeight;
|
| 620 |
-
}
|
| 621 |
-
}
|
| 622 |
-
|
| 623 |
-
// Initialize the testing UI when the page loads
|
| 624 |
-
window.addEventListener('DOMContentLoaded', () => {
|
| 625 |
-
new ModelTestingUI();
|
| 626 |
-
});
|
| 627 |
-
</script>
|
| 628 |
-
</body>
|
| 629 |
-
</html>
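The `startGameTest` method in the removed `model-testing.html` assembles the iframe URL by string concatenation. A minimal sketch of the same construction using `URLSearchParams` is given here; the helper name `buildGameUrl` is hypothetical and was not part of the removed file, and the query parameter names are taken from the code above.

```javascript
// Hypothetical helper (not in the removed file): builds the game-test URL
// from a model descriptor shaped like { id, provider }.
function buildGameUrl(model) {
  // URLSearchParams percent-encodes values, so no manual encodeURIComponent is needed.
  const params = new URLSearchParams({ testModel: model.id, testMode: 'true' });
  if (model.provider === 'local') {
    params.set('local', 'true');
  }
  return `index.html?${params.toString()}`;
}

const remoteUrl = buildGameUrl({ id: 'openai/gpt-4o', provider: 'openrouter' });
const localUrl = buildGameUrl({ id: 'local-llm', provider: 'local' });
```

Collecting the parameters first avoids reassigning the URL variable after it has been declared, which is where the original concatenation is easy to get wrong.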
|
src/modelTestingFramework.js
DELETED
|
@@ -1,703 +0,0 @@
|
|
| 1 |
-
/**
|
| 2 |
-
* Comprehensive Model Testing Framework for Cloze Reader
|
| 3 |
-
* Tests all AI-powered features across different models
|
| 4 |
-
*/
|
| 5 |
-
|
| 6 |
-
class ModelTestingFramework {
|
| 7 |
-
constructor() {
|
| 8 |
-
this.models = [
|
| 9 |
-
// OpenRouter Models
|
| 10 |
-
{ id: 'openai/gpt-4o', name: 'GPT-4o', provider: 'openrouter' },
|
| 11 |
-
{ id: 'openai/gpt-4o-mini', name: 'GPT-4o Mini', provider: 'openrouter' },
|
| 12 |
-
{ id: 'anthropic/claude-3.5-sonnet', name: 'Claude 3.5 Sonnet', provider: 'openrouter' },
|
| 13 |
-
{ id: 'anthropic/claude-3-haiku', name: 'Claude 3 Haiku', provider: 'openrouter' },
|
| 14 |
-
{ id: 'google/gemini-pro-1.5', name: 'Gemini Pro 1.5', provider: 'openrouter' },
|
| 15 |
-
{ id: 'meta-llama/llama-3.1-8b-instruct', name: 'Llama 3.1 8B', provider: 'openrouter' },
|
| 16 |
-
{ id: 'meta-llama/llama-3.1-70b-instruct', name: 'Llama 3.1 70B', provider: 'openrouter' },
|
| 17 |
-
{ id: 'mistralai/mistral-7b-instruct', name: 'Mistral 7B', provider: 'openrouter' },
|
| 18 |
-
{ id: 'microsoft/phi-3-medium-4k-instruct', name: 'Phi-3 Medium', provider: 'openrouter' },
|
| 19 |
-
{ id: 'qwen/qwen-2-7b-instruct', name: 'Qwen 2 7B', provider: 'openrouter' },
|
| 20 |
-
|
| 21 |
-
// Local LLM Models (LM Studio compatible)
|
| 22 |
-
{ id: 'local-llm', name: 'Local LLM (Auto-detect)', provider: 'local' },
|
| 23 |
-
{ id: 'gemma-3-12b', name: 'Gemma 3 12B (Local)', provider: 'local' },
|
| 24 |
-
{ id: 'llama-3.1-8b', name: 'Llama 3.1 8B (Local)', provider: 'local' },
|
| 25 |
-
{ id: 'mistral-7b', name: 'Mistral 7B (Local)', provider: 'local' },
|
| 26 |
-
{ id: 'qwen-2-7b', name: 'Qwen 2 7B (Local)', provider: 'local' },
|
| 27 |
-
{ id: 'phi-3-medium', name: 'Phi-3 Medium (Local)', provider: 'local' },
|
| 28 |
-
{ id: 'custom-local', name: 'Custom Local Model', provider: 'local' }
|
| 29 |
-
];
|
| 30 |
-
|
| 31 |
-
this.testResults = {
|
| 32 |
-
timestamp: new Date().toISOString(),
|
| 33 |
-
tests: []
|
| 34 |
-
};
|
| 35 |
-
|
| 36 |
-
this.testPassages = [
|
| 37 |
-
{
|
| 38 |
-
text: "The old man sat by the fireplace, reading his favorite book. The flames danced in the hearth, casting shadows on the walls. He turned each page carefully, savoring every word of the ancient tale.",
|
| 39 |
-
difficulty: 3,
|
| 40 |
-
expectedWords: ['favorite', 'flames', 'shadows', 'carefully', 'ancient']
|
| 41 |
-
},
|
| 42 |
-
{
|
| 43 |
-
text: "In the garden, colorful flowers bloomed under the warm sunshine. Bees buzzed from blossom to blossom, collecting nectar for their hive. The gardener watched with satisfaction as his hard work flourished.",
|
| 44 |
-
difficulty: 2,
|
| 45 |
-
expectedWords: ['colorful', 'warm', 'buzzed', 'collecting', 'satisfaction']
|
| 46 |
-
},
|
| 47 |
-
{
|
| 48 |
-
text: "The protagonist's journey through the labyrinthine corridors revealed the edifice's architectural complexity. Each ornate chamber contained mysterious artifacts that suggested an ancient civilization's sophisticated understanding of mathematics and astronomy.",
|
| 49 |
-
difficulty: 8,
|
| 50 |
-
expectedWords: ['labyrinthine', 'edifice', 'architectural', 'ornate', 'artifacts', 'civilization', 'sophisticated']
|
| 51 |
-
}
|
| 52 |
-
];
|
| 53 |
-
|
| 54 |
-
this.chatQuestions = [
|
| 55 |
-
{ type: 'part_of_speech', prompt: 'What part of speech is this word?' },
|
| 56 |
-
{ type: 'sentence_role', prompt: 'What role does this word play in the sentence?' },
|
| 57 |
-
{ type: 'word_category', prompt: 'What category or type of word is this?' },
|
| 58 |
-
{ type: 'synonym', prompt: 'Can you suggest a synonym for this word?' }
|
| 59 |
-
];
|
| 60 |
-
}
|
| 61 |
-
|
| 62 |
-
async runComprehensiveTest(selectedModels = null) {
|
| 63 |
-
const modelsToTest = selectedModels || this.models;
|
| 64 |
-
console.log(`Starting comprehensive test of ${modelsToTest.length} models...`);
|
| 65 |
-
|
| 66 |
-
for (const model of modelsToTest) {
|
| 67 |
-
console.log(`\nTesting model: ${model.name}`);
|
| 68 |
-
const modelResults = await this.testModel(model);
|
| 69 |
-
this.testResults.tests.push(modelResults);
|
| 70 |
-
|
| 71 |
-
// Save intermediate results
|
| 72 |
-
await this.saveResults();
|
| 73 |
-
}
|
| 74 |
-
|
| 75 |
-
console.log('\nAll tests completed!');
|
| 76 |
-
return this.testResults;
|
| 77 |
-
}
|
| 78 |
-
|
| 79 |
-
async testModel(model) {
|
| 80 |
-
const startTime = Date.now();
|
| 81 |
-
const results = {
|
| 82 |
-
modelId: model.id,
|
| 83 |
-
modelName: model.name,
|
| 84 |
-
provider: model.provider,
|
| 85 |
-
timestamp: new Date().toISOString(),
|
| 86 |
-
totalTime: 0,
|
| 87 |
-
wordSelection: {},
|
| 88 |
-
contextualization: {},
|
| 89 |
-
chatHints: {},
|
| 90 |
-
errorRates: {},
|
| 91 |
-
overallScore: 0
|
| 92 |
-
};
|
| 93 |
-
|
| 94 |
-
try {
|
| 95 |
-
// Test word selection across different difficulty levels
|
| 96 |
-
results.wordSelection = await this.testWordSelection(model);
|
| 97 |
-
|
| 98 |
-
// Test contextualization
|
| 99 |
-
results.contextualization = await this.testContextualization(model);
|
| 100 |
-
|
| 101 |
-
// Test chat hint generation
|
| 102 |
-
results.chatHints = await this.testChatHints(model);
|
| 103 |
-
|
| 104 |
-
// Calculate overall metrics
|
| 105 |
-
results.totalTime = Date.now() - startTime;
|
| 106 |
-
results.overallScore = this.calculateOverallScore(results);
|
| 107 |
-
|
| 108 |
-
} catch (error) {
|
| 109 |
-
console.error(`Error testing model ${model.name}:`, error);
|
| 110 |
-
results.error = error.message;
|
| 111 |
-
results.overallScore = 0;
|
| 112 |
-
}
|
| 113 |
-
|
| 114 |
-
return results;
|
| 115 |
-
}
|
| 116 |
-
|
| 117 |
-
async testWordSelection(model) {
|
| 118 |
-
const results = {
|
| 119 |
-
tests: [],
|
| 120 |
-
averageTime: 0,
|
| 121 |
-
successRate: 0,
|
| 122 |
-
qualityScore: 0,
|
| 123 |
-
difficultyAccuracy: 0
|
| 124 |
-
};
|
| 125 |
-
|
| 126 |
-
let totalTime = 0;
|
| 127 |
-
let successCount = 0;
|
| 128 |
-
let qualitySum = 0;
|
| 129 |
-
let difficultySum = 0;
|
| 130 |
-
|
| 131 |
-
for (const passage of this.testPassages) {
|
| 132 |
-
const testStart = Date.now();
|
| 133 |
-
|
| 134 |
-
try {
|
| 135 |
-
const words = await this.performWordSelection(model, passage);
|
| 136 |
-
const testTime = Date.now() - testStart;
|
| 137 |
-
totalTime += testTime;
|
| 138 |
-
|
| 139 |
-
const test = {
|
| 140 |
-
passageLength: passage.text.length,
|
| 141 |
-
targetDifficulty: passage.difficulty,
|
| 142 |
-
responseTime: testTime,
|
| 143 |
-
selectedWords: words,
|
| 144 |
-
wordCount: words.length,
|
| 145 |
-
success: words.length > 0,
|
| 146 |
-
qualityScore: this.evaluateWordQuality(words, passage),
|
| 147 |
-
difficultyScore: this.evaluateDifficultyMatch(words, passage.difficulty)
|
| 148 |
-
};
|
| 149 |
-
|
| 150 |
-
results.tests.push(test);
|
| 151 |
-
|
| 152 |
-
if (test.success) {
|
| 153 |
-
successCount++;
|
| 154 |
-
qualitySum += test.qualityScore;
|
| 155 |
-
difficultySum += test.difficultyScore;
|
| 156 |
-
}
|
| 157 |
-
|
| 158 |
-
} catch (error) {
|
| 159 |
-
results.tests.push({
|
| 160 |
-
passageLength: passage.text.length,
|
| 161 |
-
targetDifficulty: passage.difficulty,
|
| 162 |
-
responseTime: Date.now() - testStart,
|
| 163 |
-
error: error.message,
|
| 164 |
-
success: false
|
| 165 |
-
});
|
| 166 |
-
}
|
| 167 |
-
|
| 168 |
-
// Brief pause between tests
|
| 169 |
-
await new Promise(resolve => setTimeout(resolve, 1000));
|
| 170 |
-
}
|
| 171 |
-
|
| 172 |
-
results.averageTime = totalTime / this.testPassages.length;
|
| 173 |
-
results.successRate = successCount / this.testPassages.length;
|
| 174 |
-
results.qualityScore = successCount > 0 ? qualitySum / successCount : 0;
|
| 175 |
-
results.difficultyAccuracy = successCount > 0 ? difficultySum / successCount : 0;
|
| 176 |
-
|
| 177 |
-
return results;
|
| 178 |
-
}
|
| 179 |
-
|
| 180 |
-
async testContextualization(model) {
|
| 181 |
-
const results = {
|
| 182 |
-
tests: [],
|
| 183 |
-
averageTime: 0,
|
| 184 |
-
successRate: 0,
|
| 185 |
-
relevanceScore: 0
|
| 186 |
-
};
|
| 187 |
-
|
| 188 |
-
const testBooks = [
|
| 189 |
-
{ title: 'Pride and Prejudice', author: 'Jane Austen' },
|
| 190 |
-
{ title: 'The Adventures of Tom Sawyer', author: 'Mark Twain' },
|
| 191 |
-
{ title: 'Moby Dick', author: 'Herman Melville' }
|
| 192 |
-
];
|
| 193 |
-
|
| 194 |
-
let totalTime = 0;
|
| 195 |
-
let successCount = 0;
|
| 196 |
-
let relevanceSum = 0;
|
| 197 |
-
|
| 198 |
-
for (const book of testBooks) {
|
| 199 |
-
const testStart = Date.now();
|
| 200 |
-
|
| 201 |
-
try {
|
| 202 |
-
const context = await this.performContextualization(model, book);
|
| 203 |
-
const testTime = Date.now() - testStart;
|
| 204 |
-
totalTime += testTime;
|
| 205 |
-
|
| 206 |
-
const test = {
|
| 207 |
-
bookTitle: book.title,
|
| 208 |
-
author: book.author,
|
| 209 |
-
responseTime: testTime,
|
| 210 |
-
contextLength: context.length,
|
| 211 |
-
success: context.length > 0,
|
| 212 |
-
relevanceScore: this.evaluateContextRelevance(context, book)
|
| 213 |
-
};
|
| 214 |
-
|
| 215 |
-
results.tests.push(test);
|
| 216 |
-
|
| 217 |
-
if (test.success) {
|
| 218 |
-
successCount++;
|
| 219 |
-
relevanceSum += test.relevanceScore;
|
| 220 |
-
}
|
| 221 |
-
|
| 222 |
-
} catch (error) {
|
| 223 |
-
results.tests.push({
|
| 224 |
-
bookTitle: book.title,
|
| 225 |
-
author: book.author,
|
| 226 |
-
responseTime: Date.now() - testStart,
|
| 227 |
-
error: error.message,
|
| 228 |
-
success: false
|
| 229 |
-
});
|
| 230 |
-
}
|
| 231 |
-
|
| 232 |
-
await new Promise(resolve => setTimeout(resolve, 1000));
|
| 233 |
-
}
|
| 234 |
-
|
| 235 |
-
results.averageTime = totalTime / testBooks.length;
|
| 236 |
-
results.successRate = successCount / testBooks.length;
|
| 237 |
-
results.relevanceScore = successCount > 0 ? relevanceSum / successCount : 0;
|
| 238 |
-
|
| 239 |
-
return results;
|
| 240 |
-
}
|
| 241 |
-
|
| 242 |
-
async testChatHints(model) {
|
| 243 |
-
const results = {
|
| 244 |
-
tests: [],
|
| 245 |
-
averageTime: 0,
|
| 246 |
-
successRate: 0,
|
| 247 |
-
helpfulnessScore: 0,
|
| 248 |
-
questionTypePerformance: {}
|
| 249 |
-
};
|
| 250 |
-
|
| 251 |
-
const testWords = [
|
| 252 |
-
{ word: 'magnificent', sentence: 'The cathedral was truly magnificent.', difficulty: 5 },
|
| 253 |
-
{ word: 'whispered', sentence: 'She whispered the secret to her friend.', difficulty: 3 },
|
| 254 |
-
{ word: 'extraordinary', sentence: 'His performance was extraordinary.', difficulty: 7 }
|
| 255 |
-
];
|
| 256 |
-
|
| 257 |
-
let totalTime = 0;
|
| 258 |
-
let successCount = 0;
|
| 259 |
-
let helpfulnessSum = 0;
|
| 260 |
-
|
| 261 |
-
// Initialize question type tracking
|
| 262 |
-
this.chatQuestions.forEach(q => {
|
| 263 |
-
results.questionTypePerformance[q.type] = {
|
| 264 |
-
tests: 0,
|
| 265 |
-
successes: 0,
|
| 266 |
-
averageScore: 0
|
| 267 |
-
};
|
| 268 |
-
});
|
| 269 |
-
|
| 270 |
-
for (const testWord of testWords) {
|
| 271 |
-
for (const question of this.chatQuestions) {
|
| 272 |
-
const testStart = Date.now();
|
| 273 |
-
|
| 274 |
-
try {
|
| 275 |
-
const hint = await this.performChatHint(model, testWord, question);
|
| 276 |
-
const testTime = Date.now() - testStart;
|
| 277 |
-
totalTime += testTime;
|
| 278 |
-
|
| 279 |
-
const helpfulnessScore = this.evaluateHintHelpfulness(hint, testWord, question);
|
| 280 |
-
|
| 281 |
-
const test = {
|
| 282 |
-
word: testWord.word,
|
| 283 |
-
questionType: question.type,
|
| 284 |
-
difficulty: testWord.difficulty,
|
| 285 |
-
responseTime: testTime,
|
| 286 |
-
hintLength: hint.length,
|
| 287 |
-
success: hint.length > 10, // Minimum meaningful response
|
| 288 |
-
helpfulnessScore: helpfulnessScore
|
| 289 |
-
};
|
| 290 |
-
|
| 291 |
-
results.tests.push(test);
|
| 292 |
-
|
| 293 |
-
// Update question type performance
|
| 294 |
-
const qtPerf = results.questionTypePerformance[question.type];
|
| 295 |
-
qtPerf.tests++;
|
| 296 |
-
|
| 297 |
-
if (test.success) {
|
| 298 |
-
successCount++;
|
| 299 |
-
helpfulnessSum += helpfulnessScore;
|
| 300 |
-
qtPerf.successes++;
|
| 301 |
-
qtPerf.averageScore += helpfulnessScore;
|
| 302 |
-
}
|
| 303 |
-
|
| 304 |
-
} catch (error) {
|
| 305 |
-
results.tests.push({
|
| 306 |
-
word: testWord.word,
|
| 307 |
-
questionType: question.type,
|
| 308 |
-
difficulty: testWord.difficulty,
|
| 309 |
-
responseTime: Date.now() - testStart,
|
| 310 |
-
error: error.message,
|
| 311 |
-
success: false
|
| 312 |
-
});
|
| 313 |
-
|
| 314 |
-
results.questionTypePerformance[question.type].tests++;
|
| 315 |
-
}
|
| 316 |
-
|
| 317 |
-
await new Promise(resolve => setTimeout(resolve, 500));
|
| 318 |
-
}
|
| 319 |
-
}
|
| 320 |
-
|
| 321 |
-
// Calculate averages for question types
|
| 322 |
-
Object.keys(results.questionTypePerformance).forEach(type => {
|
| 323 |
-
const perf = results.questionTypePerformance[type];
|
| 324 |
-
perf.successRate = perf.tests > 0 ? perf.successes / perf.tests : 0;
|
| 325 |
-
perf.averageScore = perf.successes > 0 ? perf.averageScore / perf.successes : 0;
|
| 326 |
-
});
|
| 327 |
-
|
| 328 |
-
const totalTests = testWords.length * this.chatQuestions.length;
|
| 329 |
-
results.averageTime = totalTime / totalTests;
|
| 330 |
-
results.successRate = successCount / totalTests;
|
| 331 |
-
results.helpfulnessScore = successCount > 0 ? helpfulnessSum / successCount : 0;
|
| 332 |
-
|
| 333 |
-
return results;
|
| 334 |
-
}
|
| 335 |
-
|
| 336 |
-
async performWordSelection(model, passage) {
|
| 337 |
-
// Create a temporary AI service instance for this model
|
| 338 |
-
const aiService = await this.createModelAIService(model);
|
| 339 |
-
|
| 340 |
-
const prompt = `Select ${Math.min(3, Math.floor(passage.difficulty / 2) + 1)} appropriate words to remove from this passage for a cloze exercise at difficulty level ${passage.difficulty}:
|
| 341 |
-
|
| 342 |
-
"${passage.text}"
|
| 343 |
-
|
| 344 |
-
Return only a JSON array of words, like: ["word1", "word2", "word3"]`;
|
| 345 |
-
|
| 346 |
-
const response = await aiService.makeAIRequest(prompt);
|
| 347 |
-
|
| 348 |
-
try {
|
| 349 |
-
return JSON.parse(response);
|
| 350 |
-
} catch {
|
| 351 |
-
// Try to extract words from non-JSON response
|
| 352 |
-
const matches = response.match(/\[.*?\]/);
|
| 353 |
-
if (matches) {
|
| 354 |
-
return JSON.parse(matches[0]);
|
| 355 |
-
}
|
| 356 |
-
return [];
|
| 357 |
-
}
|
| 358 |
-
}
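The fallback in `performWordSelection` above parses the first bracketed span it finds, and that second `JSON.parse` can itself throw or return a non-array. A minimal sketch of a more defensive extraction follows; `extractWordArray` is a hypothetical name, not part of the removed framework, and dropping non-string entries is an assumption about what callers want.

```javascript
// Hypothetical helper: pull a JSON array of words out of a model response,
// returning [] instead of throwing when nothing usable is found.
function extractWordArray(response) {
  const candidates = [response];
  // [\s\S] lets the bracketed span cross line breaks, unlike "." by default.
  const match = String(response).match(/\[[\s\S]*?\]/);
  if (match) candidates.push(match[0]);

  for (const candidate of candidates) {
    try {
      const parsed = JSON.parse(candidate);
      if (Array.isArray(parsed)) {
        // Keep only string entries (assumed to be the words of interest).
        return parsed.filter(item => typeof item === 'string');
      }
    } catch {
      // Not valid JSON; fall through to the next candidate.
    }
  }
  return [];
}

const fromPlainJson = extractWordArray('["flames", "shadows"]');
const fromChattyReply = extractWordArray('Sure! Here you go: ["ancient", "carefully"] Hope this helps.');
const fromGarbage = extractWordArray('I could not find any words.');
```

Returning an empty array on every failure path keeps the `success: words.length > 0` check in `testWordSelection` meaningful without relying on the outer `try`/`catch`.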
|
| 359 |
-
|
| 360 |
-
async performContextualization(model, book) {
|
| 361 |
-
const aiService = await this.createModelAIService(model);
|
| 362 |
-
|
| 363 |
-
const prompt = `Provide a brief historical and literary context for "${book.title}" by ${book.author}. Keep it concise and educational, suitable for language learners.`;
|
| 364 |
-
|
| 365 |
-
return await aiService.makeAIRequest(prompt);
|
| 366 |
-
}
|
| 367 |
-
|
| 368 |
-
async performChatHint(model, testWord, question) {
|
| 369 |
-
const aiService = await this.createModelAIService(model);
|
| 370 |
-
|
| 371 |
-
const prompt = `You are helping a student understand a word in context. The word is "${testWord.word}" in the sentence: "${testWord.sentence}"
|
| 372 |
-
|
| 373 |
-
${question.prompt}
|
| 374 |
-
|
| 375 |
-
Provide a helpful hint without revealing the word directly. Keep your response concise and educational.`;
|
| 376 |
-
|
| 377 |
-
return await aiService.makeAIRequest(prompt);
|
| 378 |
-
}
|
| 379 |
-
|
| 380 |
-
async createModelAIService(model) {
|
| 381 |
-
// Use the testing AI service for better performance tracking
|
| 382 |
-
const { TestAIService } = await import('./testAIService.js');
|
| 383 |
-
|
| 384 |
-
const config = {
|
| 385 |
-
modelId: model.id,
|
| 386 |
-
provider: model.provider,
|
| 387 |
-
isLocal: model.provider === 'local'
|
| 388 |
-
};
|
| 389 |
-
|
| 390 |
-
return new TestAIService(config);
|
| 391 |
-
}
|
| 392 |
-
|
| 393 |
-
async detectLocalModels() {
|
| 394 |
-
// Attempt to detect available local models from LM Studio
|
| 395 |
-
try {
|
| 396 |
-
const response = await fetch('http://localhost:1234/v1/models');
|
| 397 |
-
if (response.ok) {
|
| 398 |
-
const data = await response.json();
|
| 399 |
-
const detectedModels = data.data.map(model => ({
|
| 400 |
-
id: model.id,
|
| 401 |
-
name: `${model.id} (Local)`,
|
| 402 |
-
provider: 'local'
|
| 403 |
-
}));
|
| 404 |
-
|
| 405 |
-
// Update the local models list
|
| 406 |
-
this.models = this.models.filter(m => m.provider !== 'local');
|
| 407 |
-
this.models.push(...detectedModels);
|
| 408 |
-
|
| 409 |
-
return detectedModels;
|
| 410 |
-
}
|
| 411 |
-
} catch (error) {
|
| 412 |
-
console.log('No local LM Studio server detected on port 1234');
|
| 413 |
-
}
|
| 414 |
-
|
| 415 |
-
// Return default local models if detection fails
|
| 416 |
-
return this.models.filter(m => m.provider === 'local');
|
| 417 |
-
}
|
| 418 |
-
|
| 419 |
-
async testLocalServerConnection() {
|
| 420 |
-
try {
|
| 421 |
-
const response = await fetch('http://localhost:1234/v1/models', {
|
| 422 |
-
method: 'GET',
|
| 423 |
-
headers: {
|
| 424 |
-
'Content-Type': 'application/json'
|
| 425 |
-
}
|
| 426 |
-
});
|
| 427 |
-
|
| 428 |
-
if (response.ok) {
|
| 429 |
-
const data = await response.json();
|
| 430 |
-
return {
|
| 431 |
-
connected: true,
|
| 432 |
-
models: data.data || [],
|
| 433 |
-
serverInfo: data
|
| 434 |
-
};
|
| 435 |
-
} else {
|
| 436 |
-
return {
|
| 437 |
-
connected: false,
|
| 438 |
-
error: `HTTP ${response.status}: ${response.statusText}`
|
| 439 |
-
};
|
| 440 |
-
}
|
| 441 |
-
} catch (error) {
|
| 442 |
-
return {
|
| 443 |
-
connected: false,
|
| 444 |
-
error: error.message
|
| 445 |
-
};
|
| 446 |
-
}
|
| 447 |
-
}
|
| 448 |
-
|
| 449 |
-
evaluateWordQuality(words, passage) {
|
| 450 |
-
if (!words || words.length === 0) return 0;
|
| 451 |
-
|
| 452 |
-
let score = 0;
|
| 453 |
-
const text = passage.text.toLowerCase();
|
| 454 |
-
|
| 455 |
-
for (const word of words) {
|
| 456 |
-
const wordLower = word.toLowerCase();
|
| 457 |
-
|
| 458 |
-
// Check if word exists in passage
|
| 459 |
-
if (text.includes(wordLower)) score += 20;
|
| 460 |
-
|
| 461 |
-
// Check word length appropriateness
|
| 462 |
-
const expectedMinLength = Math.max(4, passage.difficulty);
|
| 463 |
-
const expectedMaxLength = Math.min(12, passage.difficulty + 6);
|
| 464 |
-
|
| 465 |
-
if (word.length >= expectedMinLength && word.length <= expectedMaxLength) {
|
| 466 |
-
score += 15;
|
| 467 |
-
}
|
| 468 |
-
|
| 469 |
-
// Avoid overly common words for higher difficulties
|
| 470 |
-
const commonWords = ['the', 'and', 'but', 'for', 'are', 'was', 'his', 'her'];
|
| 471 |
-
if (passage.difficulty > 5 && !commonWords.includes(wordLower)) {
|
| 472 |
-
score += 10;
|
| 473 |
-
}
|
| 474 |
-
}
|
| 475 |
-
|
| 476 |
-
return Math.min(100, score / words.length);
|
| 477 |
-
}
|
| 478 |
-
|
| 479 |
-
evaluateDifficultyMatch(words, targetDifficulty) {
|
| 480 |
-
if (!words || words.length === 0) return 0;
|
| 481 |
-
|
| 482 |
-
let score = 0;
|
| 483 |
-
|
| 484 |
-
for (const word of words) {
|
| 485 |
-
const wordLength = word.length;
|
| 486 |
-
const expectedMin = Math.max(4, targetDifficulty);
|
| 487 |
-
const expectedMax = Math.min(14, targetDifficulty + 6);
|
| 488 |
-
|
| 489 |
-
if (wordLength >= expectedMin && wordLength <= expectedMax) {
|
| 490 |
-
score += 100;
|
| 491 |
-
} else {
|
| 492 |
-
// Partial credit for close matches
|
| 493 |
-
const distance = Math.min(
|
| 494 |
-
Math.abs(wordLength - expectedMin),
|
| 495 |
-
Math.abs(wordLength - expectedMax)
|
| 496 |
-
);
|
| 497 |
-
score += Math.max(0, 100 - (distance * 20));
|
| 498 |
-
}
|
| 499 |
-
}
|
| 500 |
-
|
| 501 |
-
return score / words.length;
|
| 502 |
-
}
|
| 503 |
-
|
| 504 |
-
evaluateContextRelevance(context, book) {
|
| 505 |
-
if (!context || context.length < 20) return 0;
|
| 506 |
-
|
| 507 |
-
let score = 0;
|
| 508 |
-
const contextLower = context.toLowerCase();
|
| 509 |
-
|
| 510 |
-
// Check for book title mention
|
| 511 |
-
if (contextLower.includes(book.title.toLowerCase())) score += 25;
|
| 512 |
-
|
| 513 |
-
// Check for author mention
|
| 514 |
-
if (contextLower.includes(book.author.toLowerCase().split(' ').pop())) score += 25;
|
| 515 |
-
|
| 516 |
-
// Check for literary/historical terms
|
| 517 |
-
const literaryTerms = ['novel', 'literature', 'author', 'published', 'century', 'period', 'style', 'theme'];
|
| 518 |
-
const foundTerms = literaryTerms.filter(term => contextLower.includes(term));
|
| 519 |
-
score += Math.min(30, foundTerms.length * 5);
|
| 520 |
-
|
| 521 |
-
// Length appropriateness (100-500 chars is good)
|
| 522 |
-
if (context.length >= 100 && context.length <= 500) score += 20;
|
| 523 |
-
|
| 524 |
-
return Math.min(100, score);
|
| 525 |
-
}
|
| 526 |
-
|
| 527 |
-
evaluateHintHelpfulness(hint, testWord, question) {
|
| 528 |
-
if (!hint || hint.length < 10) return 0;
|
| 529 |
-
|
| 530 |
-
let score = 0;
|
| 531 |
-
const hintLower = hint.toLowerCase();
|
| 532 |
-
const wordLower = testWord.word.toLowerCase();
|
| 533 |
-
|
| 534 |
-
// Penalize if the word is revealed directly
|
| 535 |
-
if (hintLower.includes(wordLower)) {
|
| 536 |
-
score -= 50;
|
| 537 |
-
}
|
| 538 |
-
|
| 539 |
-
// Check for question-appropriate responses
|
| 540 |
-
switch (question.type) {
|
| 541 |
-
case 'part_of_speech':
|
| 542 |
-
const posTerms = ['noun', 'verb', 'adjective', 'adverb', 'pronoun'];
|
| 543 |
-
if (posTerms.some(term => hintLower.includes(term))) score += 40;
|
| 544 |
-
break;
|
| 545 |
-
|
| 546 |
-
case 'sentence_role':
|
| 547 |
-
const roleTerms = ['subject', 'object', 'predicate', 'modifier', 'describes'];
|
| 548 |
-
if (roleTerms.some(term => hintLower.includes(term))) score += 40;
|
| 549 |
-
break;
|
| 550 |
-
|
| 551 |
-
case 'word_category':
|
| 552 |
-
const categoryTerms = ['type', 'kind', 'category', 'group', 'family'];
|
| 553 |
-
if (categoryTerms.some(term => hintLower.includes(term))) score += 40;
|
| 554 |
-
break;
|
| 555 |
-
|
| 556 |
-
case 'synonym':
|
| 557 |
-
const synonymTerms = ['similar', 'means', 'like', 'same as', 'equivalent'];
|
| 558 |
-
if (synonymTerms.some(term => hintLower.includes(term))) score += 40;
|
| 559 |
-
break;
|
| 560 |
-
}
|
| 561 |
-
|
| 562 |
-
// Length appropriateness
|
| 563 |
-
if (hint.length >= 20 && hint.length <= 200) score += 30;
|
| 564 |
-
|
| 565 |
-
// Educational tone
|
| 566 |
-
const educationalTerms = ['this word', 'in this context', 'here', 'sentence'];
|
| 567 |
-
if (educationalTerms.some(term => hintLower.includes(term))) score += 20;
|
| 568 |
-
|
| 569 |
-
return Math.max(0, Math.min(100, score));
|
| 570 |
-
}
|
| 571 |
-
|
| 572 |
-
calculateOverallScore(results) {
|
| 573 |
-
const weights = {
|
| 574 |
-
wordSelection: 0.4,
|
| 575 |
-
contextualization: 0.3,
|
| 576 |
-
chatHints: 0.3
|
| 577 |
-
};
|
| 578 |
-
|
| 579 |
-
let totalScore = 0;
|
| 580 |
-
|
| 581 |
-
if (results.wordSelection.successRate !== undefined) {
|
| 582 |
-
totalScore += results.wordSelection.successRate * 40 * weights.wordSelection;
|
| 583 |
-
}
|
| 584 |
-
|
| 585 |
-
if (results.contextualization.successRate !== undefined) {
|
| 586 |
-
totalScore += results.contextualization.successRate * 50 * weights.contextualization;
|
| 587 |
-
}
|
| 588 |
-
|
| 589 |
-
if (results.chatHints.successRate !== undefined) {
|
| 590 |
-
totalScore += results.chatHints.successRate * 60 * weights.chatHints;
|
| 591 |
-
}
|
| 592 |
-
|
| 593 |
-
// Bonus for consistent performance across all areas
|
| 594 |
-
const allAreas = [results.wordSelection, results.contextualization, results.chatHints];
|
| 595 |
-
const minSuccess = Math.min(...allAreas.map(area => area.successRate || 0));
|
| 596 |
-
if (minSuccess > 0.8) totalScore += 10;
|
| 597 |
-
|
| 598 |
-
return Math.min(100, totalScore);
|
| 599 |
-
}
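As a worked check of the weighting in `calculateOverallScore` above, the sketch below recomputes the score for a run in which every area succeeds. The constants are copied from that method; the standalone function `overallScore` and its flat `successRates` argument are simplifications introduced here for illustration only.

```javascript
// Constants copied from calculateOverallScore; the function shape is a simplification.
const weights = { wordSelection: 0.4, contextualization: 0.3, chatHints: 0.3 };
const multipliers = { wordSelection: 40, contextualization: 50, chatHints: 60 };

function overallScore(successRates) {
  let total = 0;
  for (const area of Object.keys(weights)) {
    total += (successRates[area] || 0) * multipliers[area] * weights[area];
  }
  // Consistency bonus: every area must exceed an 80% success rate.
  const minSuccess = Math.min(...Object.keys(weights).map(area => successRates[area] || 0));
  if (minSuccess > 0.8) total += 10;
  return Math.min(100, total);
}

// Perfect run: 40*0.4 + 50*0.3 + 60*0.3 + 10 bonus.
const bestCase = overallScore({ wordSelection: 1, contextualization: 1, chatHints: 1 });
console.log(bestCase.toFixed(1));
```

Under these constants a perfect run appears to top out at about 59, so the `Math.min(100, ...)` cap is never reached, and the 60 and 80 cut-offs in `getScoreClass` from the removed HTML page would always yield `score-low`.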
|
| 600 |
-
|
| 601 |
-
async saveResults() {
|
| 602 |
-
const csvContent = this.generateCSV();
|
| 603 |
-
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
|
| 604 |
-
const filename = `model_test_results_${timestamp}.csv`;
|
| 605 |
-
|
| 606 |
-
// Browser environment - download file
|
| 607 |
-
this.downloadCSV(csvContent, filename);
|
| 608 |
-
|
| 609 |
-
console.log(`Results saved as ${filename}`);
|
| 610 |
-
return filename;
|
| 611 |
-
}
|
| 612 |
-
|
| 613 |
-
downloadCSV(content, filename) {
|
| 614 |
-
const blob = new Blob([content], { type: 'text/csv' });
|
| 615 |
-
const url = URL.createObjectURL(blob);
|
| 616 |
-
|
| 617 |
-
const a = document.createElement('a');
|
| 618 |
-
a.href = url;
|
| 619 |
-
a.download = filename;
|
| 620 |
-
document.body.appendChild(a);
|
| 621 |
-
a.click();
|
| 622 |
-
document.body.removeChild(a);
|
| 623 |
-
URL.revokeObjectURL(url);
|
| 624 |
-
}
|
| 625 |
-
|
| 626 |
-
generateCSV() {
|
| 627 |
-
const headers = [
|
| 628 |
-
'Model Name',
|
| 629 |
-
'Model ID',
|
| 630 |
-
'Provider',
|
| 631 |
-
'Timestamp',
|
| 632 |
-
'Total Time (ms)',
|
| 633 |
-
'Overall Score',
|
| 634 |
-
'Word Selection Success Rate',
|
| 635 |
-
'Word Selection Avg Time (ms)',
|
| 636 |
-
'Word Selection Quality Score',
|
| 637 |
-
'Word Selection Difficulty Accuracy',
|
| 638 |
-
'Contextualization Success Rate',
|
| 639 |
-
'Contextualization Avg Time (ms)',
|
| 640 |
-
'Contextualization Relevance Score',
|
| 641 |
-
'Chat Hints Success Rate',
|
| 642 |
-
'Chat Hints Avg Time (ms)',
|
| 643 |
-
'Chat Hints Helpfulness Score',
|
| 644 |
-
'Part of Speech Success Rate',
|
| 645 |
-
'Sentence Role Success Rate',
|
| 646 |
-
'Word Category Success Rate',
|
| 647 |
-
'Synonym Success Rate',
|
| 648 |
-
'User Satisfaction Score',
|
| 649 |
-
'Word Selection User Rating',
|
| 650 |
-
'Passage Quality User Rating',
|
| 651 |
-
'Hint Helpfulness User Rating',
|
| 652 |
-
'Overall Experience User Rating',
|
| 653 |
-
'User Comments Count',
|
| 654 |
-
'Error Message'
|
| 655 |
-
];
|
| 656 |
-
|
| 657 |
-
const rows = [headers.join(',')];
|
| 658 |
-
|
| 659 |
-
for (const test of this.testResults.tests) {
|
| 660 |
-
// Get user ranking data if available
|
| 661 |
-
const userRankings = test.userRankings || {};
|
| 662 |
-
const userSatisfaction = userRankings.overallUserSatisfaction || 0;
|
| 663 |
-
const avgRatings = userRankings.averageRatings || {};
|
| 664 |
-
const commentsCount = userRankings.comments?.length || 0;
|
| 665 |
-
|
| 666 |
-
const row = [
|
| 667 |
-
`"${test.modelName}"`,
|
| 668 |
-
`"${test.modelId}"`,
|
| 669 |
-
`"${test.provider}"`,
|
| 670 |
-
`"${test.timestamp}"`,
|
| 671 |
-
test.totalTime || 0,
|
| 672 |
-
test.overallScore || 0,
|
| 673 |
-
test.wordSelection?.successRate || 0,
|
| 674 |
-
test.wordSelection?.averageTime || 0,
|
| 675 |
-
test.wordSelection?.qualityScore || 0,
|
| 676 |
-
test.wordSelection?.difficultyAccuracy || 0,
|
| 677 |
-
test.contextualization?.successRate || 0,
|
| 678 |
-
test.contextualization?.averageTime || 0,
|
| 679 |
-
test.contextualization?.relevanceScore || 0,
|
| 680 |
-
test.chatHints?.successRate || 0,
|
| 681 |
-
test.chatHints?.averageTime || 0,
|
| 682 |
-
test.chatHints?.helpfulnessScore || 0,
|
| 683 |
-
test.chatHints?.questionTypePerformance?.part_of_speech?.successRate || 0,
|
| 684 |
-
test.chatHints?.questionTypePerformance?.sentence_role?.successRate || 0,
|
| 685 |
-
test.chatHints?.questionTypePerformance?.word_category?.successRate || 0,
|
| 686 |
-
test.chatHints?.questionTypePerformance?.synonym?.successRate || 0,
|
| 687 |
-
userSatisfaction.toFixed(2),
|
| 688 |
-
avgRatings.word_selection?.toFixed(2) || 0,
|
| 689 |
-
avgRatings.passage_quality?.toFixed(2) || 0,
|
| 690 |
-
avgRatings.hint_helpfulness?.toFixed(2) || 0,
|
| 691 |
-
avgRatings.overall_experience?.toFixed(2) || 0,
|
| 692 |
-
commentsCount,
|
| 693 |
-
`"${test.error || ''}"`
|
| 694 |
-
];
|
| 695 |
-
|
| 696 |
-
rows.push(row.join(','));
|
| 697 |
-
}
|
| 698 |
-
|
| 699 |
-
return rows.join('\n');
|
| 700 |
-
}
|
| 701 |
-
}
|
| 702 |
-
|
| 703 |
-
export { ModelTestingFramework };
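Note on `generateCSV` above: it wraps string fields in double quotes but does not escape quotes inside them, so an error message that itself contains `"` would likely break the row. A minimal sketch of RFC 4180-style field escaping follows. The helper names `escapeCsvField` and `toCsvRow` are hypothetical and were not part of the deleted file.

```javascript
// Hypothetical helpers, not from the original file.
// Quote a CSV field and double any embedded quotes (RFC 4180 style).
function escapeCsvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  // Quote only when the field contains a delimiter, a quote, or a line break.
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Join one row of raw values into a single CSV line.
function toCsvRow(values) {
  return values.map(escapeCsvField).join(',');
}
```

With helpers like these, each row could be built from raw values (for example `toCsvRow([test.modelName, test.totalTime || 0, test.error || ''])`) instead of hand-quoted template strings.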
src/testAIService.js
DELETED
|
@@ -1,154 +0,0 @@
|
|
| 1 |
-
/**
|
| 2 |
-
* Testing-specific AI Service wrapper
|
| 3 |
-
* Extends the main AI service with testing capabilities
|
| 4 |
-
*/
|
| 5 |
-
|
| 6 |
-
class TestAIService {
|
| 7 |
-
constructor(config) {
|
| 8 |
-
this.modelId = config.modelId;
|
| 9 |
-
this.provider = config.provider;
|
| 10 |
-
this.isLocal = config.isLocal || config.provider === 'local';
|
| 11 |
-
this.baseUrl = this.isLocal ? 'http://localhost:1234' : 'https://openrouter.ai/api/v1';
|
| 12 |
-
this.apiKey = this.isLocal ? 'test-key' : this.getApiKey();
|
| 13 |
-
|
| 14 |
-
// Performance tracking
|
| 15 |
-
this.requestCount = 0;
|
| 16 |
-
this.totalResponseTime = 0;
|
| 17 |
-
this.errorCount = 0;
|
| 18 |
-
this.lastError = null;
|
| 19 |
-
}
|
| 20 |
-
|
| 21 |
-
getApiKey() {
|
| 22 |
-
// Try to get API key from meta tag (injected by server)
|
| 23 |
-
const metaTag = document.querySelector('meta[name="openrouter-api-key"]');
|
| 24 |
-
if (metaTag) {
|
| 25 |
-
return metaTag.content;
|
| 26 |
-
}
|
| 27 |
-
|
| 28 |
-
// Fallback to environment variable (for Node.js testing)
|
| 29 |
-
if (typeof process !== 'undefined' && process.env) {
|
| 30 |
-
return process.env.OPENROUTER_API_KEY;
|
| 31 |
-
}
|
| 32 |
-
|
| 33 |
-
return null;
|
| 34 |
-
}
|
| 35 |
-
|
| 36 |
-
async makeAIRequest(prompt, options = {}) {
|
| 37 |
-
const startTime = Date.now();
|
| 38 |
-
this.requestCount++;
|
| 39 |
-
|
| 40 |
-
try {
|
| 41 |
-
const response = await this.performRequest(prompt, options);
|
| 42 |
-
this.totalResponseTime += Date.now() - startTime;
|
| 43 |
-
return response;
|
| 44 |
-
} catch (error) {
|
| 45 |
-
this.errorCount++;
|
| 46 |
-
this.lastError = error;
|
| 47 |
-
this.totalResponseTime += Date.now() - startTime;
|
| 48 |
-
throw error;
|
| 49 |
-
}
|
| 50 |
-
}
|
| 51 |
-
|
| 52 |
-
async performRequest(prompt, options = {}) {
|
| 53 |
-
const requestBody = {
|
| 54 |
-
model: this.modelId,
|
| 55 |
-
messages: [
|
| 56 |
-
{
|
| 57 |
-
role: "user",
|
| 58 |
-
content: prompt
|
| 59 |
-
}
|
| 60 |
-
],
|
| 61 |
-
max_tokens: options.maxTokens || 500,
|
| 62 |
-
temperature: options.temperature || 0.7,
|
| 63 |
-
top_p: options.topP || 0.9
|
| 64 |
-
};
|
| 65 |
-
|
| 66 |
-
const headers = {
|
| 67 |
-
'Content-Type': 'application/json',
|
| 68 |
-
'Authorization': `Bearer ${this.apiKey}`
|
| 69 |
-
};
|
| 70 |
-
|
| 71 |
-
if (!this.isLocal) {
|
| 72 |
-
headers['HTTP-Referer'] = window.location.origin;
|
| 73 |
-
}
|
| 74 |
-
|
| 75 |
-
const controller = new AbortController();
|
| 76 |
-
const timeoutId = setTimeout(() => controller.abort(), 30000); // 30 second timeout
|
| 77 |
-
|
| 78 |
-
try {
|
| 79 |
-
const response = await fetch(`${this.baseUrl}/chat/completions`, {
|
| 80 |
-
method: 'POST',
|
| 81 |
-
headers: headers,
|
| 82 |
-
body: JSON.stringify(requestBody),
|
| 83 |
-
signal: controller.signal
|
| 84 |
-
});
|
| 85 |
-
|
| 86 |
-
clearTimeout(timeoutId);
|
| 87 |
-
|
| 88 |
-
if (!response.ok) {
|
| 89 |
-
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
|
| 90 |
-
}
|
| 91 |
-
|
| 92 |
-
const data = await response.json();
|
| 93 |
-
|
| 94 |
-
if (!data.choices || data.choices.length === 0) {
|
| 95 |
-
throw new Error('No response from AI service');
|
| 96 |
-
}
|
| 97 |
-
|
| 98 |
-
let content = data.choices[0].message.content;
|
| 99 |
-
|
| 100 |
-
// Clean up local LLM response artifacts
|
| 101 |
-
if (this.isLocal) {
|
| 102 |
-
content = this.cleanLocalLLMResponse(content);
|
| 103 |
-
}
|
| 104 |
-
|
| 105 |
-
return content;
|
| 106 |
-
} catch (error) {
|
| 107 |
-
clearTimeout(timeoutId);
|
| 108 |
-
if (error.name === 'AbortError') {
|
| 109 |
-
throw new Error('Request timeout');
|
| 110 |
-
}
|
| 111 |
-
throw error;
|
| 112 |
-
}
|
| 113 |
-
}
|
| 114 |
-
|
| 115 |
-
cleanLocalLLMResponse(content) {
|
| 116 |
-
// Remove common local LLM artifacts
|
| 117 |
-
content = content.replace(/^\[.*?\]\s*/, ''); // Remove leading brackets
|
| 118 |
-
content = content.replace(/\s*\[.*?\]$/, ''); // Remove trailing brackets
|
| 119 |
-
content = content.replace(/^"(.*)"$/, '$1'); // Remove surrounding quotes
|
| 120 |
-
content = content.replace(/\\n/g, '\n'); // Fix escaped newlines
|
| 121 |
-
content = content.replace(/\\"/g, '"'); // Fix escaped quotes
|
| 122 |
-
|
| 123 |
-
return content.trim();
|
| 124 |
-
}
|
| 125 |
-
|
| 126 |
-
// Performance metrics
|
| 127 |
-
getAverageResponseTime() {
|
| 128 |
-
return this.requestCount > 0 ? this.totalResponseTime / this.requestCount : 0;
|
| 129 |
-
}
|
| 130 |
-
|
| 131 |
-
getErrorRate() {
|
| 132 |
-
return this.requestCount > 0 ? this.errorCount / this.requestCount : 0;
|
| 133 |
-
}
|
| 134 |
-
|
| 135 |
-
getPerformanceStats() {
|
| 136 |
-
return {
|
| 137 |
-
requestCount: this.requestCount,
|
| 138 |
-
totalResponseTime: this.totalResponseTime,
|
| 139 |
-
averageResponseTime: this.getAverageResponseTime(),
|
| 140 |
-
errorCount: this.errorCount,
|
| 141 |
-
errorRate: this.getErrorRate(),
|
| 142 |
-
lastError: this.lastError?.message || null
|
| 143 |
-
};
|
| 144 |
-
}
|
| 145 |
-
|
| 146 |
-
reset() {
|
| 147 |
-
this.requestCount = 0;
|
| 148 |
-
this.totalResponseTime = 0;
|
| 149 |
-
this.errorCount = 0;
|
| 150 |
-
this.lastError = null;
|
| 151 |
-
}
|
| 152 |
-
}
|
| 153 |
-
|
| 154 |
-
export { TestAIService };
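Note on `performRequest` above: the request body uses `||` for its defaults, so an explicit `temperature: 0` silently falls back to `0.7`. A sketch of the same body using the nullish coalescing operator follows. The standalone function name `buildRequestBody` is an assumption made for illustration and does not appear in the deleted file.

```javascript
// Hypothetical standalone version of the request-body construction.
// `??` falls back only on null/undefined, so 0 survives as a valid value.
function buildRequestBody(modelId, prompt, options = {}) {
  return {
    model: modelId,
    messages: [{ role: 'user', content: prompt }],
    max_tokens: options.maxTokens ?? 500,
    temperature: options.temperature ?? 0.7,
    top_p: options.topP ?? 0.9
  };
}
```

This matters for test runs that want deterministic sampling, where a temperature of 0 is a deliberate choice, not a missing value.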
src/testGameRunner.js
DELETED
|
@@ -1,473 +0,0 @@
|
|
| 1 |
-
/**
|
| 2 |
-
* Test Game Runner - Monitors and logs performance during game testing
|
| 3 |
-
*/
|
| 4 |
-
|
| 5 |
-
class TestGameRunner {
|
| 6 |
-
constructor(modelConfig) {
|
| 7 |
-
this.modelConfig = modelConfig;
|
| 8 |
-
this.sessionData = {
|
| 9 |
-
modelId: modelConfig.modelId,
|
| 10 |
-
modelName: modelConfig.modelName,
|
| 11 |
-
provider: modelConfig.provider,
|
| 12 |
-
startTime: Date.now(),
|
| 13 |
-
rounds: [],
|
| 14 |
-
interactions: [],
|
| 15 |
-
userRankings: [],
|
| 16 |
-
performance: {
|
| 17 |
-
wordSelectionRequests: 0,
|
| 18 |
-
wordSelectionSuccess: 0,
|
| 19 |
-
wordSelectionTime: 0,
|
| 20 |
-
contextualizationRequests: 0,
|
| 21 |
-
contextualizationSuccess: 0,
|
| 22 |
-
contextualizationTime: 0,
|
| 23 |
-
chatHintRequests: 0,
|
| 24 |
-
chatHintSuccess: 0,
|
| 25 |
-
chatHintTime: 0,
|
| 26 |
-
errors: []
|
| 27 |
-
}
|
| 28 |
-
};
|
| 29 |
-
|
| 30 |
-
this.originalAIService = null;
|
| 31 |
-
this.setupInterception();
|
| 32 |
-
}
|
| 33 |
-
|
| 34 |
-
setupInterception() {
|
| 35 |
-
// Intercept AI service calls to track performance
|
| 36 |
-
if (window.aiService) {
|
| 37 |
-
this.originalAIService = window.aiService;
|
| 38 |
-
this.wrapAIService();
|
| 39 |
-
}
|
| 40 |
-
|
| 41 |
-
// Monitor for game events
|
| 42 |
-
this.setupGameEventListeners();
|
| 43 |
-
}
|
| 44 |
-
|
| 45 |
-
wrapAIService() {
|
| 46 |
-
const testRunner = this;
|
| 47 |
-
|
| 48 |
-
// Wrap the makeAIRequest method
|
| 49 |
-
const originalMakeAIRequest = this.originalAIService.makeAIRequest.bind(this.originalAIService);
|
| 50 |
-
|
| 51 |
-
window.aiService.makeAIRequest = async function(prompt, options = {}) {
|
| 52 |
-
const startTime = Date.now();
|
| 53 |
-
const requestType = testRunner.classifyRequest(prompt);
|
| 54 |
-
|
| 55 |
-
testRunner.logInteraction({
|
| 56 |
-
type: 'ai_request_start',
|
| 57 |
-
requestType: requestType,
|
| 58 |
-
prompt: prompt.substring(0, 200) + '...',
|
| 59 |
-
timestamp: Date.now()
|
| 60 |
-
});
|
| 61 |
-
|
| 62 |
-
try {
|
| 63 |
-
const result = await originalMakeAIRequest(prompt, options);
|
| 64 |
-
const responseTime = Date.now() - startTime;
|
| 65 |
-
|
| 66 |
-
testRunner.updatePerformanceMetrics(requestType, true, responseTime);
|
| 67 |
-
testRunner.logInteraction({
|
| 68 |
-
type: 'ai_request_success',
|
| 69 |
-
requestType: requestType,
|
| 70 |
-
responseTime: responseTime,
|
| 71 |
-
responseLength: result.length,
|
| 72 |
-
timestamp: Date.now()
|
| 73 |
-
});
|
| 74 |
-
|
| 75 |
-
return result;
|
| 76 |
-
} catch (error) {
|
| 77 |
-
const responseTime = Date.now() - startTime;
|
| 78 |
-
|
| 79 |
-
testRunner.updatePerformanceMetrics(requestType, false, responseTime);
|
| 80 |
-
testRunner.logInteraction({
|
| 81 |
-
type: 'ai_request_error',
|
| 82 |
-
requestType: requestType,
|
| 83 |
-
error: error.message,
|
| 84 |
-
responseTime: responseTime,
|
| 85 |
-
timestamp: Date.now()
|
| 86 |
-
});
|
| 87 |
-
|
| 88 |
-
testRunner.sessionData.performance.errors.push({
|
| 89 |
-
type: requestType,
|
| 90 |
-
error: error.message,
|
| 91 |
-
timestamp: Date.now()
|
| 92 |
-
});
|
| 93 |
-
|
| 94 |
-
throw error;
|
| 95 |
-
}
|
| 96 |
-
};
|
| 97 |
-
}
|
| 98 |
-
|
| 99 |
-
classifyRequest(prompt) {
|
| 100 |
-
const promptLower = prompt.toLowerCase();
|
| 101 |
-
|
| 102 |
-
if (promptLower.includes('select') && promptLower.includes('word')) {
|
| 103 |
-
return 'word_selection';
|
| 104 |
-
} else if (promptLower.includes('context') || promptLower.includes('background')) {
|
| 105 |
-
return 'contextualization';
|
| 106 |
-
} else if (promptLower.includes('hint') || promptLower.includes('help') || promptLower.includes('clue')) {
|
| 107 |
-
return 'chat_hint';
|
| 108 |
-
} else {
|
| 109 |
-
return 'other';
|
| 110 |
-
}
|
| 111 |
-
}
|
| 112 |
-
|
| 113 |
-
updatePerformanceMetrics(requestType, success, responseTime) {
|
| 114 |
-
const perf = this.sessionData.performance;
|
| 115 |
-
|
| 116 |
-
switch (requestType) {
|
| 117 |
-
case 'word_selection':
|
| 118 |
-
perf.wordSelectionRequests++;
|
| 119 |
-
if (success) {
|
| 120 |
-
perf.wordSelectionSuccess++;
|
| 121 |
-
perf.wordSelectionTime += responseTime;
|
| 122 |
-
}
|
| 123 |
-
break;
|
| 124 |
-
|
| 125 |
-
case 'contextualization':
|
| 126 |
-
perf.contextualizationRequests++;
|
| 127 |
-
if (success) {
|
| 128 |
-
perf.contextualizationSuccess++;
|
| 129 |
-
perf.contextualizationTime += responseTime;
|
| 130 |
-
}
|
| 131 |
-
break;
|
| 132 |
-
|
| 133 |
-
case 'chat_hint':
|
| 134 |
-
perf.chatHintRequests++;
|
| 135 |
-
if (success) {
|
| 136 |
-
perf.chatHintSuccess++;
|
| 137 |
-
perf.chatHintTime += responseTime;
|
| 138 |
-
}
|
| 139 |
-
break;
|
| 140 |
-
}
|
| 141 |
-
}
|
| 142 |
-
|
| 143 |
-
setupGameEventListeners() {
|
| 144 |
-
// Listen for game-specific events
|
| 145 |
-
document.addEventListener('gameRoundStart', (event) => {
|
| 146 |
-
this.logInteraction({
|
| 147 |
-
type: 'round_start',
|
| 148 |
-
level: event.detail.level,
|
| 149 |
-
round: event.detail.round,
|
| 150 |
-
timestamp: Date.now()
|
| 151 |
-
});
|
| 152 |
-
});
|
| 153 |
-
|
| 154 |
-
document.addEventListener('gameRoundComplete', (event) => {
|
| 155 |
-
const roundData = {
|
| 156 |
-
level: event.detail.level,
|
| 157 |
-
round: event.detail.round,
|
| 158 |
-
score: event.detail.score,
|
| 159 |
-
correctAnswers: event.detail.correctAnswers,
|
| 160 |
-
totalBlanks: event.detail.totalBlanks,
|
| 161 |
-
timeSpent: event.detail.timeSpent,
|
| 162 |
-
timestamp: Date.now()
|
| 163 |
-
};
|
| 164 |
-
|
| 165 |
-
this.sessionData.rounds.push(roundData);
|
| 166 |
-
|
| 167 |
-
// Store the current round index for user ranking association
|
| 168 |
-
this.currentRoundIndex = this.sessionData.rounds.length - 1;
|
| 169 |
-
|
| 170 |
-
this.logInteraction({
|
| 171 |
-
type: 'round_complete',
|
| 172 |
-
level: event.detail.level,
|
| 173 |
-
round: event.detail.round,
|
| 174 |
-
score: event.detail.score,
|
| 175 |
-
timestamp: Date.now()
|
| 176 |
-
});
|
| 177 |
-
});
|
| 178 |
-
|
| 179 |
-
document.addEventListener('userAnswer', (event) => {
|
| 180 |
-
this.logInteraction({
|
| 181 |
-
type: 'user_answer',
|
| 182 |
-
word: event.detail.targetWord,
|
| 183 |
-
userAnswer: event.detail.userAnswer,
|
| 184 |
-
correct: event.detail.correct,
|
| 185 |
-
timestamp: Date.now()
|
| 186 |
-
});
|
| 187 |
-
});
|
| 188 |
-
|
| 189 |
-
document.addEventListener('chatInteraction', (event) => {
|
| 190 |
-
this.logInteraction({
|
| 191 |
-
type: 'chat_interaction',
|
| 192 |
-
questionType: event.detail.questionType,
|
| 193 |
-
word: event.detail.word,
|
| 194 |
-
timestamp: Date.now()
|
| 195 |
-
});
|
| 196 |
-
});
|
| 197 |
-
|
| 198 |
-
// Listen for user ranking events
|
| 199 |
-
document.addEventListener('userRanking', (event) => {
|
| 200 |
-
const rankingData = {
|
| 201 |
-
...event.detail,
|
| 202 |
-
roundIndex: this.currentRoundIndex,
|
| 203 |
-
roundDetails: this.sessionData.rounds[this.currentRoundIndex]
|
| 204 |
-
};
|
| 205 |
-
|
| 206 |
-
this.sessionData.userRankings.push(rankingData);
|
| 207 |
-
|
| 208 |
-
this.logInteraction({
|
| 209 |
-
type: 'user_ranking',
|
| 210 |
-
averageRating: event.detail.averageRating,
|
| 211 |
-
ratings: event.detail.ratings,
|
| 212 |
-
timestamp: Date.now()
|
| 213 |
-
});
|
| 214 |
-
});
|
| 215 |
-
}
|
| 216 |
-
|
| 217 |
-
logInteraction(interaction) {
|
| 218 |
-
this.sessionData.interactions.push(interaction);
|
| 219 |
-
|
| 220 |
-
// Log to console for real-time monitoring
|
| 221 |
-
console.log(`[TestRunner] ${interaction.type}:`, interaction);
|
| 222 |
-
}
|
| 223 |
-
|
| 224 |
-
generateReport() {
|
| 225 |
-
const endTime = Date.now();
|
| 226 |
-
const totalTime = endTime - this.sessionData.startTime;
|
| 227 |
-
const perf = this.sessionData.performance;
|
| 228 |
-
|
| 229 |
-
// Calculate user ranking summary
|
| 230 |
-
const userRankingSummary = this.calculateUserRankingSummary();
|
| 231 |
-
|
| 232 |
-
const report = {
|
| 233 |
-
...this.sessionData,
|
| 234 |
-
endTime: endTime,
|
| 235 |
-
totalSessionTime: totalTime,
|
| 236 |
-
summary: {
|
| 237 |
-
totalRounds: this.sessionData.rounds.length,
|
| 238 |
-
averageScore: this.sessionData.rounds.length > 0
|
| 239 |
-
? this.sessionData.rounds.reduce((sum, round) => sum + round.score, 0) / this.sessionData.rounds.length
|
| 240 |
-
: 0,
|
| 241 |
-
wordSelectionSuccessRate: perf.wordSelectionRequests > 0
|
| 242 |
-
? perf.wordSelectionSuccess / perf.wordSelectionRequests
|
| 243 |
-
: 0,
|
| 244 |
-
wordSelectionAvgTime: perf.wordSelectionSuccess > 0
|
| 245 |
-
? perf.wordSelectionTime / perf.wordSelectionSuccess
|
| 246 |
-
: 0,
|
| 247 |
-
contextualizationSuccessRate: perf.contextualizationRequests > 0
|
| 248 |
-
? perf.contextualizationSuccess / perf.contextualizationRequests
|
| 249 |
-
: 0,
|
| 250 |
-
contextualizationAvgTime: perf.contextualizationSuccess > 0
|
| 251 |
-
? perf.contextualizationTime / perf.contextualizationSuccess
|
| 252 |
-
: 0,
|
| 253 |
-
chatHintSuccessRate: perf.chatHintRequests > 0
|
| 254 |
-
? perf.chatHintSuccess / perf.chatHintRequests
|
| 255 |
-
: 0,
|
| 256 |
-
chatHintAvgTime: perf.chatHintSuccess > 0
|
| 257 |
-
? perf.chatHintTime / perf.chatHintSuccess
|
| 258 |
-
: 0,
|
| 259 |
-
totalErrors: perf.errors.length,
|
| 260 |
-
userRankingSummary: userRankingSummary
|
| 261 |
-
}
|
| 262 |
-
};
|
| 263 |
-
|
| 264 |
-
return report;
|
| 265 |
-
}
|
| 266 |
-
|
| 267 |
-
calculateUserRankingSummary() {
|
| 268 |
-
if (this.sessionData.userRankings.length === 0) {
|
| 269 |
-
return null;
|
| 270 |
-
}
|
| 271 |
-
|
| 272 |
-
const categories = ['word_selection', 'passage_quality', 'hint_helpfulness', 'overall_experience'];
|
| 273 |
-
const summary = {
|
| 274 |
-
totalRankings: this.sessionData.userRankings.length,
|
| 275 |
-
averageRatings: {},
|
| 276 |
-
categoryBreakdown: {},
|
| 277 |
-
comments: [],
|
| 278 |
-
overallUserSatisfaction: 0
|
| 279 |
-
};
|
| 280 |
-
|
| 281 |
-
// Calculate average ratings per category
|
| 282 |
-
categories.forEach(category => {
|
| 283 |
-
const ratings = this.sessionData.userRankings
|
| 284 |
-
.map(r => r.ratings[category])
|
| 285 |
-
.filter(r => r !== undefined);
|
| 286 |
-
|
| 287 |
-
if (ratings.length > 0) {
|
| 288 |
-
summary.averageRatings[category] =
|
| 289 |
-
ratings.reduce((a, b) => a + b, 0) / ratings.length;
|
| 290 |
-
|
| 291 |
-
// Distribution of ratings
|
| 292 |
-
summary.categoryBreakdown[category] = {
|
| 293 |
-
1: ratings.filter(r => r === 1).length,
|
| 294 |
-
2: ratings.filter(r => r === 2).length,
|
| 295 |
-
3: ratings.filter(r => r === 3).length,
|
| 296 |
-
4: ratings.filter(r => r === 4).length,
|
| 297 |
-
5: ratings.filter(r => r === 5).length
|
| 298 |
-
};
|
| 299 |
-
}
|
| 300 |
-
});
|
| 301 |
-
|
| 302 |
-
// Calculate overall satisfaction
|
| 303 |
-
const allRatings = this.sessionData.userRankings
|
| 304 |
-
.map(r => r.averageRating)
|
| 305 |
-
.filter(r => r !== undefined);
|
| 306 |
-
|
| 307 |
-
if (allRatings.length > 0) {
|
| 308 |
-
summary.overallUserSatisfaction =
|
| 309 |
-
allRatings.reduce((a, b) => a + b, 0) / allRatings.length;
|
| 310 |
-
}
|
| 311 |
-
|
| 312 |
-
// Collect comments with context
|
| 313 |
-
summary.comments = this.sessionData.userRankings
|
| 314 |
-
.filter(r => r.comments)
|
| 315 |
-
.map(r => ({
|
| 316 |
-
timestamp: r.timestamp,
|
| 317 |
-
comment: r.comments,
|
| 318 |
-
averageRating: r.averageRating,
|
| 319 |
-
roundLevel: r.roundDetails?.level,
|
| 320 |
-
roundScore: r.roundDetails?.score
|
| 321 |
-
}));
|
| 322 |
-
|
| 323 |
-
return summary;
|
| 324 |
-
}
|
| 325 |
-
|
| 326 |
-
async saveReport() {
|
| 327 |
-
const report = this.generateReport();
|
| 328 |
-
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
|
| 329 |
-
const filename = `game_test_${this.modelConfig.modelId.replace(/[\/\\:]/g, '_')}_${timestamp}.json`;
|
| 330 |
-
|
| 331 |
-
try {
|
| 332 |
-
// Try to save via browser download
|
| 333 |
-
this.downloadReport(report, filename);
|
| 334 |
-
|
| 335 |
-
// Also try to save to output folder if possible (server-side)
|
| 336 |
-
await this.saveToServer(report, filename);
|
| 337 |
-
|
| 338 |
-
console.log(`Test report saved: ${filename}`);
|
| 339 |
-
return filename;
|
| 340 |
-
} catch (error) {
|
| 341 |
-
console.error('Error saving test report:', error);
|
| 342 |
-
return null;
|
| 343 |
-
}
|
| 344 |
-
}
|
| 345 |
-
|
| 346 |
-
downloadReport(report, filename) {
|
| 347 |
-
const jsonString = JSON.stringify(report, null, 2);
|
| 348 |
-
const blob = new Blob([jsonString], { type: 'application/json' });
|
| 349 |
-
const url = URL.createObjectURL(blob);
|
| 350 |
-
|
| 351 |
-
const a = document.createElement('a');
|
| 352 |
-
a.href = url;
|
| 353 |
-
a.download = filename;
|
| 354 |
-
document.body.appendChild(a);
|
| 355 |
-
a.click();
|
| 356 |
-
document.body.removeChild(a);
|
| 357 |
-
URL.revokeObjectURL(url);
|
| 358 |
-
}
|
| 359 |
-
|
| 360 |
-
async saveToServer(report, filename) {
|
| 361 |
-
try {
|
| 362 |
-
const response = await fetch('/api/save-test-report', {
|
| 363 |
-
method: 'POST',
|
| 364 |
-
headers: {
|
| 365 |
-
'Content-Type': 'application/json'
|
| 366 |
-
},
|
| 367 |
-
body: JSON.stringify({
|
| 368 |
-
filename: filename,
|
| 369 |
-
data: report
|
| 370 |
-
})
|
| 371 |
-
});
|
| 372 |
-
|
| 373 |
-
if (!response.ok) {
|
| 374 |
-
throw new Error(`Server save failed: ${response.status}`);
|
| 375 |
-
}
|
| 376 |
-
} catch (error) {
|
| 377 |
-
console.log('Server save not available, using browser download only');
|
| 378 |
-
}
|
| 379 |
-
}
|
| 380 |
-
|
| 381 |
-
// Utility methods for analysis
|
| 382 |
-
getWordSelectionAnalytics() {
|
| 383 |
-
const wordSelectionInteractions = this.sessionData.interactions.filter(
|
| 384 |
-
i => i.type === 'ai_request_success' && i.requestType === 'word_selection'
|
| 385 |
-
);
|
| 386 |
-
|
| 387 |
-
return {
|
| 388 |
-
count: wordSelectionInteractions.length,
|
| 389 |
-
averageResponseTime: wordSelectionInteractions.length > 0
|
| 390 |
-
? wordSelectionInteractions.reduce((sum, i) => sum + i.responseTime, 0) / wordSelectionInteractions.length
|
| 391 |
-
: 0,
|
| 392 |
-
averageResponseLength: wordSelectionInteractions.length > 0
|
| 393 |
-
? wordSelectionInteractions.reduce((sum, i) => sum + i.responseLength, 0) / wordSelectionInteractions.length
|
| 394 |
-
: 0
|
| 395 |
-
};
|
| 396 |
-
}
|
| 397 |
-
|
| 398 |
-
getChatHintAnalytics() {
|
| 399 |
-
const chatHintInteractions = this.sessionData.interactions.filter(
|
| 400 |
-
i => i.type === 'chat_interaction'
|
| 401 |
-
);
|
| 402 |
-
|
| 403 |
-
const questionTypes = {};
|
| 404 |
-
chatHintInteractions.forEach(interaction => {
|
| 405 |
-
const type = interaction.questionType || 'unknown';
|
| 406 |
-
questionTypes[type] = (questionTypes[type] || 0) + 1;
|
| 407 |
-
});
|
| 408 |
-
|
| 409 |
-
return {
|
| 410 |
-
totalHints: chatHintInteractions.length,
|
| 411 |
-
questionTypeBreakdown: questionTypes
|
| 412 |
-
};
|
| 413 |
-
}
|
| 414 |
-
|
| 415 |
-
getUserPerformanceAnalytics() {
|
| 416 |
-
const answerInteractions = this.sessionData.interactions.filter(
|
| 417 |
-
i => i.type === 'user_answer'
|
| 418 |
-
);
|
| 419 |
-
|
| 420 |
-
const correctAnswers = answerInteractions.filter(i => i.correct).length;
|
| 421 |
-
|
| 422 |
-
return {
|
| 423 |
-
totalAnswers: answerInteractions.length,
|
| 424 |
-
correctAnswers: correctAnswers,
|
| 425 |
-
accuracy: answerInteractions.length > 0 ? correctAnswers / answerInteractions.length : 0
|
| 426 |
-
};
|
| 427 |
-
}
|
| 428 |
-
}
|
| 429 |
-
|
| 430 |
-
// Initialize test runner if in test mode
|
| 431 |
-
window.addEventListener('DOMContentLoaded', () => {
|
| 432 |
-
const urlParams = new URLSearchParams(window.location.search);
|
| 433 |
-
if (urlParams.get('testMode') === 'true') {
|
| 434 |
-
const modelId = urlParams.get('testModel');
|
| 435 |
-
const isLocal = urlParams.get('local') === 'true';
|
| 436 |
-
|
| 437 |
-
if (modelId) {
|
| 438 |
-
window.testGameRunner = new TestGameRunner({
|
| 439 |
-
modelId: modelId,
|
| 440 |
-
modelName: modelId,
|
| 441 |
-
provider: isLocal ? 'local' : 'openrouter'
|
| 442 |
-
});
|
| 443 |
-
|
| 444 |
-
console.log('Test Game Runner initialized for model:', modelId);
|
| 445 |
-
|
| 446 |
-
// Add end session button
|
| 447 |
-
const endButton = document.createElement('button');
|
| 448 |
-
endButton.textContent = 'End Test Session';
|
| 449 |
-
endButton.style.cssText = `
|
| 450 |
-
position: fixed;
|
| 451 |
-
top: 10px;
|
| 452 |
-
right: 10px;
|
| 453 |
-
z-index: 1000;
|
| 454 |
-
padding: 10px 15px;
|
| 455 |
-
background: #dc3545;
|
| 456 |
-
color: white;
|
| 457 |
-
border: none;
|
| 458 |
-
border-radius: 5px;
|
| 459 |
-
cursor: pointer;
|
| 460 |
-
`;
|
| 461 |
-
|
| 462 |
-
endButton.addEventListener('click', async () => {
|
| 463 |
-
const filename = await window.testGameRunner.saveReport();
|
| 464 |
-
alert(`Test session ended. Report saved as: ${filename}`);
|
| 465 |
-
window.close();
|
| 466 |
-
});
|
| 467 |
-
|
| 468 |
-
document.body.appendChild(endButton);
|
| 469 |
-
}
|
| 470 |
-
}
|
| 471 |
-
});
|
| 472 |
-
|
| 473 |
-
export { TestGameRunner };
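Note on `classifyRequest` above: the keyword checks run in a fixed order, so a hint prompt that happens to mention "context" is counted as `contextualization`. The sketch below restates the same rules as an ordered table to make that precedence visible. The `REQUEST_RULES` constant is a hypothetical restructuring, not the original code.

```javascript
// Standalone restatement of the keyword rules, for illustration only.
// Rules are checked in order, so earlier entries take precedence.
const REQUEST_RULES = [
  ['word_selection', (p) => p.includes('select') && p.includes('word')],
  ['contextualization', (p) => p.includes('context') || p.includes('background')],
  ['chat_hint', (p) => p.includes('hint') || p.includes('help') || p.includes('clue')]
];

function classifyRequest(prompt) {
  const promptLower = prompt.toLowerCase();
  const match = REQUEST_RULES.find(([, test]) => test(promptLower));
  return match ? match[0] : 'other';
}
```

Because the per-type counters in `updatePerformanceMetrics` depend on this label, a misclassified prompt would shift both the success rate and the average time of two categories at once.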
src/testReportGenerator.js
DELETED
|
@@ -1,453 +0,0 @@
|
|
| 1 |
-
/**
|
| 2 |
-
* Comprehensive Test Report Generator
|
| 3 |
-
* Analyzes test results and generates detailed reports
|
| 4 |
-
*/
|
| 5 |
-
|
| 6 |
-
class TestReportGenerator {
|
| 7 |
-
constructor() {
|
| 8 |
-
this.reportTemplates = {
|
| 9 |
-
summary: this.generateSummaryReport.bind(this),
|
| 10 |
-
detailed: this.generateDetailedReport.bind(this),
|
| 11 |
-
comparison: this.generateComparisonReport.bind(this),
|
| 12 |
-
performance: this.generatePerformanceReport.bind(this),
|
| 13 |
-
markdown: this.generateMarkdownReport.bind(this)
|
| 14 |
-
};
|
| 15 |
-
}
|
| 16 |
-
|
| 17 |
-
async generateAllReports(testResults, outputFormat = 'all') {
|
| 18 |
-
const reports = {};
|
| 19 |
-
|
| 20 |
-
if (outputFormat === 'all' || outputFormat === 'summary') {
|
| 21 |
-
reports.summary = this.generateSummaryReport(testResults);
|
| 22 |
-
}
|
| 23 |
-
|
| 24 |
-
if (outputFormat === 'all' || outputFormat === 'detailed') {
|
| 25 |
-
reports.detailed = this.generateDetailedReport(testResults);
|
| 26 |
-
}
|
| 27 |
-
|
| 28 |
-
if (outputFormat === 'all' || outputFormat === 'comparison') {
|
| 29 |
-
reports.comparison = this.generateComparisonReport(testResults);
|
| 30 |
-
}
|
| 31 |
-
|
| 32 |
-
if (outputFormat === 'all' || outputFormat === 'performance') {
|
| 33 |
-
reports.performance = this.generatePerformanceReport(testResults);
|
| 34 |
-
}
|
| 35 |
-
|
| 36 |
-
if (outputFormat === 'all' || outputFormat === 'markdown') {
|
| 37 |
-
reports.markdown = this.generateMarkdownReport(testResults);
|
| 38 |
-
}
|
| 39 |
-
|
| 40 |
-
return reports;
|
| 41 |
-
}
|
| 42 |
-
|
| 43 |
-
generateSummaryReport(testResults) {
|
| 44 |
-
const summary = {
|
| 45 |
-
testOverview: {
|
| 46 |
-
timestamp: testResults.timestamp,
|
| 47 |
-
totalModels: testResults.tests.length,
|
| 48 |
-
testDuration: this.calculateTotalTestDuration(testResults.tests),
|
| 49 |
-
successfulTests: testResults.tests.filter(t => !t.error).length
|
| 50 |
-
},
|
| 51 |
-
topPerformers: this.getTopPerformers(testResults.tests),
|
| 52 |
-
categoryAverages: this.calculateCategoryAverages(testResults.tests),
|
| 53 |
-
recommendations: this.generateRecommendations(testResults.tests)
|
| 54 |
-
};
|
| 55 |
-
|
| 56 |
-
return summary;
|
| 57 |
-
}
|
| 58 |
-
|
| 59 |
-
generateDetailedReport(testResults) {
|
| 60 |
-
const detailed = {
|
| 61 |
-
testMetadata: {
|
| 62 |
-
timestamp: testResults.timestamp,
|
| 63 |
-
totalModels: testResults.tests.length,
|
| 64 |
-
testFrameworkVersion: '1.0.0'
|
| 65 |
-
},
|
| 66 |
-
modelResults: testResults.tests.map(test => ({
|
| 67 |
-
modelInfo: {
|
| 68 |
-
id: test.modelId,
|
| 69 |
-
name: test.modelName,
|
| 70 |
-
provider: test.provider
|
| 71 |
-
},
|
| 72 |
-
overallPerformance: {
|
| 73 |
-
score: test.overallScore,
|
| 74 |
-
totalTime: test.totalTime,
|
| 75 |
-
rank: this.calculateRank(test, testResults.tests)
|
| 76 |
-
},
|
| 77 |
-
wordSelection: this.analyzeWordSelection(test.wordSelection),
|
| 78 |
-
contextualization: this.analyzeContextualization(test.contextualization),
|
| 79 |
-
chatHints: this.analyzeChatHints(test.chatHints),
|
| 80 |
-
errorAnalysis: this.analyzeErrors(test)
|
| 81 |
-
}))
|
| 82 |
-
};
|
| 83 |
-
|
| 84 |
-
return detailed;
|
| 85 |
-
}
|
| 86 |
-
|
| 87 |
-
generateComparisonReport(testResults) {
|
| 88 |
-
const validTests = testResults.tests.filter(t => !t.error);
|
| 89 |
-
|
| 90 |
-
const comparison = {
|
| 91 |
-
modelComparison: this.createModelComparisonMatrix(validTests),
|
| 92 |
-
providerAnalysis: this.analyzeByProvider(validTests),
|
| 93 |
-
performanceMetrics: {
|
| 94 |
-
wordSelection: this.compareWordSelectionMetrics(validTests),
|
| 95 |
-
contextualization: this.compareContextualizationMetrics(validTests),
|
| 96 |
-
chatHints: this.compareChatHintMetrics(validTests),
|
| 97 |
-
responseTime: this.compareResponseTimes(validTests)
|
| 98 |
-
},
|
| 99 |
-
recommendations: {
|
| 100 |
-
bestOverall: this.getBestOverallModel(validTests),
|
| 101 |
-
bestForWordSelection: this.getBestForTask(validTests, 'wordSelection'),
|
| 102 |
-
bestForContextualization: this.getBestForTask(validTests, 'contextualization'),
|
| 103 |
-
bestForChatHints: this.getBestForTask(validTests, 'chatHints'),
|
| 104 |
-
fastestResponse: this.getFastestModel(validTests),
|
| 105 |
-
mostReliable: this.getMostReliableModel(validTests)
|
| 106 |
-
}
|
| 107 |
-
};
|
| 108 |
-
|
| 109 |
-
return comparison;
|
| 110 |
-
}
|
| 111 |
-
|
| 112 |
-
generatePerformanceReport(testResults) {
|
| 113 |
-
const performance = {
|
| 114 |
-
responseTimeAnalysis: this.analyzeResponseTimes(testResults.tests),
|
| 115 |
-
successRateAnalysis: this.analyzeSuccessRates(testResults.tests),
|
| 116 |
-
qualityMetrics: this.analyzeQualityMetrics(testResults.tests),
|
| 117 |
-
scalabilityInsights: this.analyzeScalability(testResults.tests),
|
| 118 |
-
reliabilityMetrics: this.analyzeReliability(testResults.tests)
|
| 119 |
-
};
|
| 120 |
-
|
| 121 |
-
return performance;
|
| 122 |
-
}
|
| 123 |
-
|
| 124 |
-
generateMarkdownReport(testResults) {
|
| 125 |
-
const summary = this.generateSummaryReport(testResults);
|
| 126 |
-
const comparison = this.generateComparisonReport(testResults);
|
| 127 |
-
|
| 128 |
-
let markdown = `# Cloze Reader Model Testing Report\n\n`;
|
| 129 |
-
markdown += `**Generated:** ${new Date().toLocaleString()}\n`;
|
| 130 |
-
markdown += `**Test Timestamp:** ${testResults.timestamp}\n`;
|
| 131 |
-
markdown += `**Models Tested:** ${testResults.tests.length}\n\n`;
|
| 132 |
-
|
| 133 |
-
// Executive Summary
|
| 134 |
-
markdown += `## Executive Summary\n\n`;
|
| 135 |
-
markdown += `- **Successful Tests:** ${summary.testOverview.successfulTests}/${summary.testOverview.totalModels}\n`;
|
| 136 |
-
markdown += `- **Best Overall Model:** ${comparison.recommendations.bestOverall.name} (${comparison.recommendations.bestOverall.score.toFixed(1)}/100)\n`;
|
| 137 |
-
markdown += `- **Average Response Time:** ${this.formatTime(this.calculateAverageResponseTime(testResults.tests))}\n\n`;
|
| 138 |
-
|
| 139 |
-
// Top Performers
|
| 140 |
-
markdown += `## Top Performers\n\n`;
|
| 141 |
-
markdown += `| Rank | Model | Score | Provider |\n`;
|
| 142 |
-
markdown += `|------|-------|-------|----------|\n`;
|
| 143 |
-
summary.topPerformers.forEach((model, index) => {
|
| 144 |
-
markdown += `| ${index + 1} | ${model.name} | ${model.score.toFixed(1)} | ${model.provider} |\n`;
|
| 145 |
-
});
|
| 146 |
-
markdown += `\n`;
|
| 147 |
-
|
| 148 |
-
// Performance by Category
|
| 149 |
-
markdown += `## Performance by Category\n\n`;
|
| 150 |
-
markdown += `### Word Selection\n`;
|
| 151 |
-
markdown += `- **Best:** ${comparison.recommendations.bestForWordSelection.name} (${(comparison.recommendations.bestForWordSelection.successRate * 100).toFixed(1)}% success rate)\n`;
|
| 152 |
-
markdown += `- **Average Success Rate:** ${(summary.categoryAverages.wordSelection.successRate * 100).toFixed(1)}%\n`;
|
| 153 |
-
markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.wordSelection.averageTime)}\n\n`;
|
| 154 |
-
|
| 155 |
-
markdown += `### Contextualization\n`;
|
| 156 |
-
markdown += `- **Best:** ${comparison.recommendations.bestForContextualization.name} (${(comparison.recommendations.bestForContextualization.successRate * 100).toFixed(1)}% success rate)\n`;
|
| 157 |
-
markdown += `- **Average Success Rate:** ${(summary.categoryAverages.contextualization.successRate * 100).toFixed(1)}%\n`;
|
| 158 |
-
markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.contextualization.averageTime)}\n\n`;
|
| 159 |
-
|
| 160 |
-
markdown += `### Chat Hints\n`;
|
| 161 |
-
markdown += `- **Best:** ${comparison.recommendations.bestForChatHints.name} (${(comparison.recommendations.bestForChatHints.successRate * 100).toFixed(1)}% success rate)\n`;
|
| 162 |
-
markdown += `- **Average Success Rate:** ${(summary.categoryAverages.chatHints.successRate * 100).toFixed(1)}%\n`;
|
| 163 |
-
markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.chatHints.averageTime)}\n\n`;
|
| 164 |
-
|
| 165 |
-
// Add user rankings section if available
|
| 166 |
-
const hasUserRankings = testResults.tests.some(t => t.userRankings?.totalRankings > 0);
|
| 167 |
-
if (hasUserRankings) {
|
| 168 |
-
markdown += `## User Satisfaction Ratings\n\n`;
|
| 169 |
-
markdown += `| Model | Overall Satisfaction | Word Selection | Passage Quality | Hint Helpfulness | Overall Experience |\n`;
|
| 170 |
-
markdown += `|-------|---------------------|----------------|-----------------|------------------|--------------------|\n`;
|
| 171 |
-
|
| 172 |
-
testResults.tests.forEach(test => {
|
| 173 |
-
if (test.userRankings?.totalRankings > 0) {
|
| 174 |
-
const ur = test.userRankings;
|
| 175 |
-
const avg = ur.averageRatings || {};
|
| 176 |
-
markdown += `| ${test.modelName} | ${ur.overallUserSatisfaction.toFixed(1)}/5 | ${(avg.word_selection || 0).toFixed(1)} | ${(avg.passage_quality || 0).toFixed(1)} | ${(avg.hint_helpfulness || 0).toFixed(1)} | ${(avg.overall_experience || 0).toFixed(1)} |\n`;
|
| 177 |
-
}
|
| 178 |
-
});
|
| 179 |
-
markdown += `\n`;
|
| 180 |
-
|
| 181 |
-
// Add user comments if any
|
| 182 |
-
const allComments = testResults.tests
|
| 183 |
-
.filter(t => t.userRankings?.comments?.length > 0)
|
| 184 |
-
.flatMap(t => t.userRankings.comments.map(c => ({ ...c, model: t.modelName })));
|
| 185 |
-
|
| 186 |
-
if (allComments.length > 0) {
|
| 187 |
-
markdown += `### User Comments\n\n`;
|
| 188 |
-
allComments.forEach(comment => {
|
| 189 |
-
markdown += `- **${comment.model}** (Rating: ${comment.averageRating.toFixed(1)}): "${comment.comment}"\n`;
|
| 190 |
-
});
|
| 191 |
-
markdown += `\n`;
|
| 192 |
-
}
|
| 193 |
-
}
|
| 194 |
-
|
| 195 |
-
// Detailed Results
|
| 196 |
-
markdown += `## Detailed Results\n\n`;
|
| 197 |
-
testResults.tests.forEach(test => {
|
| 198 |
-
if (!test.error) {
|
| 199 |
-
markdown += `### ${test.modelName}\n`;
|
| 200 |
-
markdown += `- **Provider:** ${test.provider}\n`;
|
| 201 |
-
markdown += `- **Overall Score:** ${test.overallScore.toFixed(1)}/100\n`;
|
| 202 |
-
markdown += `- **Total Time:** ${this.formatTime(test.totalTime)}\n`;
|
| 203 |
-
markdown += `- **Word Selection:** ${(test.wordSelection?.successRate * 100 || 0).toFixed(1)}% success\n`;
|
| 204 |
-
markdown += `- **Contextualization:** ${(test.contextualization?.successRate * 100 || 0).toFixed(1)}% success\n`;
|
| 205 |
-
markdown += `- **Chat Hints:** ${(test.chatHints?.successRate * 100 || 0).toFixed(1)}% success\n\n`;
|
| 206 |
-
}
|
| 207 |
-
});
|
| 208 |
-
|
| 209 |
-
// Recommendations
|
| 210 |
-
markdown += `## Recommendations\n\n`;
|
| 211 |
-
summary.recommendations.forEach(rec => {
|
| 212 |
-
markdown += `- ${rec}\n`;
|
| 213 |
-
});
|
| 214 |
-
|
| 215 |
-
return markdown;
|
| 216 |
-
}
|
| 217 |
-
|
| 218 |
-
// Helper methods for analysis
|
| 219 |
-
calculateTotalTestDuration(tests) {
|
| 220 |
-
return tests.reduce((total, test) => total + (test.totalTime || 0), 0);
|
| 221 |
-
}
|
| 222 |
-
|
| 223 |
-
getTopPerformers(tests, limit = 5) {
|
| 224 |
-
return tests
|
| 225 |
-
.filter(t => !t.error && t.overallScore)
|
| 226 |
-
.sort((a, b) => b.overallScore - a.overallScore)
|
| 227 |
-
.slice(0, limit)
|
| 228 |
-
.map(test => ({
|
| 229 |
-
name: test.modelName,
|
| 230 |
-
score: test.overallScore,
|
| 231 |
-
provider: test.provider
|
| 232 |
-
}));
|
| 233 |
-
}
|
| 234 |
-
|
| 235 |
-
calculateCategoryAverages(tests) {
|
| 236 |
-
const validTests = tests.filter(t => !t.error);
|
| 237 |
-
|
| 238 |
-
return {
|
| 239 |
-
wordSelection: this.calculateCategoryAverage(validTests, 'wordSelection'),
|
| 240 |
-
contextualization: this.calculateCategoryAverage(validTests, 'contextualization'),
|
| 241 |
-
chatHints: this.calculateCategoryAverage(validTests, 'chatHints')
|
| 242 |
-
};
|
| 243 |
-
}
|
| 244 |
-
|
| 245 |
-
calculateCategoryAverage(tests, category) {
|
| 246 |
-
const validCategoryTests = tests.filter(t => t[category]);
|
| 247 |
-
|
| 248 |
-
if (validCategoryTests.length === 0) {
|
| 249 |
-
return { successRate: 0, averageTime: 0, qualityScore: 0 };
|
| 250 |
-
}
|
| 251 |
-
|
| 252 |
-
return {
|
| 253 |
-
successRate: validCategoryTests.reduce((sum, t) => sum + (t[category].successRate || 0), 0) / validCategoryTests.length,
|
| 254 |
-
averageTime: validCategoryTests.reduce((sum, t) => sum + (t[category].averageTime || 0), 0) / validCategoryTests.length,
|
| 255 |
-
qualityScore: validCategoryTests.reduce((sum, t) => sum + (t[category].qualityScore || t[category].relevanceScore || t[category].helpfulnessScore || 0), 0) / validCategoryTests.length
|
| 256 |
-
};
|
| 257 |
-
}
|
| 258 |
-
|
| 259 |
-
generateRecommendations(tests) {
|
| 260 |
-
const recommendations = [];
|
| 261 |
-
const validTests = tests.filter(t => !t.error);
|
| 262 |
-
|
| 263 |
-
if (validTests.length === 0) {
|
| 264 |
-
return ['No successful tests to generate recommendations.'];
|
| 265 |
-
}
|
| 266 |
-
|
| 267 |
-
const bestOverall = validTests.reduce((best, test) =>
|
| 268 |
-
test.overallScore > best.overallScore ? test : best
|
| 269 |
-
);
|
| 270 |
-
|
| 271 |
-
recommendations.push(`For overall best performance, use ${bestOverall.modelName} (${bestOverall.provider})`);
|
| 272 |
-
|
| 273 |
-
// Provider-specific recommendations
|
| 274 |
-
const providerPerformance = this.analyzeByProvider(validTests);
|
| 275 |
-
const bestProvider = Object.keys(providerPerformance)
|
| 276 |
-
.reduce((best, provider) =>
|
| 277 |
-
providerPerformance[provider].averageScore > providerPerformance[best]?.averageScore ? provider : best
|
| 278 |
-
);
|
| 279 |
-
|
| 280 |
-
recommendations.push(`${bestProvider} models show the best average performance`);
|
| 281 |
-
|
| 282 |
-
// Speed vs quality trade-offs
|
| 283 |
-
const fastestGoodModel = validTests
|
| 284 |
-
.filter(t => t.overallScore > 70)
|
| 285 |
-
.sort((a, b) => a.totalTime - b.totalTime)[0];
|
| 286 |
-
|
| 287 |
-
if (fastestGoodModel) {
|
| 288 |
-
recommendations.push(`For fastest good performance, consider ${fastestGoodModel.modelName}`);
|
| 289 |
-
}
|
| 290 |
-
|
| 291 |
-
return recommendations;
|
| 292 |
-
}
|
| 293 |
-
|
| 294 |
-
analyzeByProvider(tests) {
|
| 295 |
-
const providerGroups = {};
|
| 296 |
-
|
| 297 |
-
tests.forEach(test => {
|
| 298 |
-
if (!providerGroups[test.provider]) {
|
| 299 |
-
providerGroups[test.provider] = [];
|
| 300 |
-
}
|
| 301 |
-
providerGroups[test.provider].push(test);
|
| 302 |
-
});
|
| 303 |
-
|
| 304 |
-
const analysis = {};
|
| 305 |
-
Object.keys(providerGroups).forEach(provider => {
|
| 306 |
-
const providerTests = providerGroups[provider];
|
| 307 |
-
analysis[provider] = {
|
| 308 |
-
count: providerTests.length,
|
| 309 |
-
averageScore: providerTests.reduce((sum, t) => sum + t.overallScore, 0) / providerTests.length,
|
| 310 |
-
averageTime: providerTests.reduce((sum, t) => sum + t.totalTime, 0) / providerTests.length,
|
| 311 |
-
successRate: providerTests.filter(t => !t.error).length / providerTests.length
|
| 312 |
-
};
|
| 313 |
-
});
|
| 314 |
-
|
| 315 |
-
return analysis;
|
| 316 |
-
}
|
| 317 |
-
|
| 318 |
-
getBestOverallModel(tests) {
|
| 319 |
-
return tests.reduce((best, test) =>
|
| 320 |
-
test.overallScore > best.overallScore ? {
|
| 321 |
-
name: test.modelName,
|
| 322 |
-
score: test.overallScore,
|
| 323 |
-
provider: test.provider
|
| 324 |
-
} : best
|
| 325 |
-
, { name: '', score: 0, provider: '' });
|
| 326 |
-
}
|
| 327 |
-
|
| 328 |
-
getBestForTask(tests, taskName) {
|
| 329 |
-
const validTests = tests.filter(t => t[taskName] && t[taskName].successRate !== undefined);
|
| 330 |
-
|
| 331 |
-
if (validTests.length === 0) {
|
| 332 |
-
return { name: 'N/A', successRate: 0, provider: '' };
|
| 333 |
-
}
|
| 334 |
-
|
| 335 |
-
return validTests.reduce((best, test) =>
|
| 336 |
-
test[taskName].successRate > best.successRate ? {
|
| 337 |
-
name: test.modelName,
|
| 338 |
-
successRate: test[taskName].successRate,
|
| 339 |
-
provider: test.provider
|
| 340 |
-
} : best
|
| 341 |
-
, { name: '', successRate: 0, provider: '' });
|
| 342 |
-
}
|
| 343 |
-
|
| 344 |
-
getFastestModel(tests) {
|
| 345 |
-
return tests.reduce((fastest, test) =>
|
| 346 |
-
test.totalTime < fastest.time ? {
|
| 347 |
-
name: test.modelName,
|
| 348 |
-
time: test.totalTime,
|
| 349 |
-
provider: test.provider
|
| 350 |
-
} : fastest
|
| 351 |
-
, { name: '', time: Infinity, provider: '' });
|
| 352 |
-
}
|
| 353 |
-
|
| 354 |
-
getMostReliableModel(tests) {
|
| 355 |
-
// Model with fewest errors and highest success rates across all tasks
|
| 356 |
-
const reliability = tests.map(test => {
|
| 357 |
-
const wordSelectionReliability = test.wordSelection?.successRate || 0;
|
| 358 |
-
const contextualizationReliability = test.contextualization?.successRate || 0;
|
| 359 |
-
const chatHintReliability = test.chatHints?.successRate || 0;
|
| 360 |
-
|
| 361 |
-
const overallReliability = (wordSelectionReliability + contextualizationReliability + chatHintReliability) / 3;
|
| 362 |
-
|
| 363 |
-
return {
|
| 364 |
-
name: test.modelName,
|
| 365 |
-
reliability: overallReliability,
|
| 366 |
-
provider: test.provider
|
| 367 |
-
};
|
| 368 |
-
});
|
| 369 |
-
|
| 370 |
-
return reliability.reduce((most, test) =>
|
| 371 |
-
test.reliability > most.reliability ? test : most
|
| 372 |
-
, { name: '', reliability: 0, provider: '' });
|
| 373 |
-
}
|
| 374 |
-
|
| 375 |
-
calculateAverageResponseTime(tests) {
|
| 376 |
-
const validTests = tests.filter(t => t.totalTime);
|
| 377 |
-
return validTests.reduce((sum, t) => sum + t.totalTime, 0) / validTests.length;
|
| 378 |
-
}
|
| 379 |
-
|
| 380 |
-
formatTime(milliseconds) {
|
| 381 |
-
if (milliseconds < 1000) {
|
| 382 |
-
return `${milliseconds.toFixed(0)}ms`;
|
| 383 |
-
} else if (milliseconds < 60000) {
|
| 384 |
-
return `${(milliseconds / 1000).toFixed(1)}s`;
|
| 385 |
-
} else {
|
| 386 |
-
return `${(milliseconds / 60000).toFixed(1)}m`;
|
| 387 |
-
}
|
| 388 |
-
}
|
| 389 |
-
|
| 390 |
-
async saveReports(reports, baseFilename) {
|
| 391 |
-
const savedFiles = [];
|
| 392 |
-
|
| 393 |
-
for (const [type, content] of Object.entries(reports)) {
|
| 394 |
-
const filename = `${baseFilename}_${type}`;
|
| 395 |
-
let fileContent, extension;
|
| 396 |
-
|
| 397 |
-
if (type === 'markdown') {
|
| 398 |
-
fileContent = content;
|
| 399 |
-
extension = '.md';
|
| 400 |
-
} else {
|
| 401 |
-
fileContent = JSON.stringify(content, null, 2);
|
| 402 |
-
extension = '.json';
|
| 403 |
-
}
|
| 404 |
-
|
| 405 |
-
try {
|
| 406 |
-
await this.saveFile(`${filename}${extension}`, fileContent);
|
| 407 |
-
savedFiles.push(`${filename}${extension}`);
|
| 408 |
-
} catch (error) {
|
| 409 |
-
console.error(`Error saving ${filename}:`, error);
|
| 410 |
-
}
|
| 411 |
-
}
|
| 412 |
-
|
| 413 |
-
return savedFiles;
|
| 414 |
-
}
|
| 415 |
-
|
| 416 |
-
async saveFile(filename, content) {
|
| 417 |
-
// Try to save via browser download
|
| 418 |
-
const blob = new Blob([content], {
|
| 419 |
-
type: filename.endsWith('.md') ? 'text/markdown' : 'application/json'
|
| 420 |
-
});
|
| 421 |
-
const url = URL.createObjectURL(blob);
|
| 422 |
-
|
| 423 |
-
const a = document.createElement('a');
|
| 424 |
-
a.href = url;
|
| 425 |
-
a.download = filename;
|
| 426 |
-
document.body.appendChild(a);
|
| 427 |
-
a.click();
|
| 428 |
-
document.body.removeChild(a);
|
| 429 |
-
URL.revokeObjectURL(url);
|
| 430 |
-
}
|
| 431 |
-
|
| 432 |
-
// Stub methods for detailed analysis (implement as needed)
|
| 433 |
-
analyzeWordSelection(data) { return data; }
|
| 434 |
-
analyzeContextualization(data) { return data; }
|
| 435 |
-
analyzeChatHints(data) { return data; }
|
| 436 |
-
analyzeErrors(test) { return test.error ? [test.error] : []; }
|
| 437 |
-
calculateRank(test, allTests) {
|
| 438 |
-
const sorted = allTests.filter(t => !t.error).sort((a, b) => b.overallScore - a.overallScore);
|
| 439 |
-
return sorted.findIndex(t => t.modelId === test.modelId) + 1;
|
| 440 |
-
}
|
| 441 |
-
createModelComparisonMatrix(tests) { return {}; }
|
| 442 |
-
compareWordSelectionMetrics(tests) { return {}; }
|
| 443 |
-
compareContextualizationMetrics(tests) { return {}; }
|
| 444 |
-
compareChatHintMetrics(tests) { return {}; }
|
| 445 |
-
compareResponseTimes(tests) { return {}; }
|
| 446 |
-
analyzeResponseTimes(tests) { return {}; }
|
| 447 |
-
analyzeSuccessRates(tests) { return {}; }
|
| 448 |
-
analyzeQualityMetrics(tests) { return {}; }
|
| 449 |
-
analyzeScalability(tests) { return {}; }
|
| 450 |
-
analyzeReliability(tests) { return {}; }
|
| 451 |
-
}
|
| 452 |
-
|
| 453 |
-
export { TestReportGenerator };
|
src/userRankingInterface.js
DELETED
|
@@ -1,650 +0,0 @@
|
|
| 1 |
-
/**
|
| 2 |
-
* User Ranking Interface for Model Testing
|
| 3 |
-
* Allows users to rate model performance on each task during gameplay
|
| 4 |
-
*/
|
| 5 |
-
|
| 6 |
-
class UserRankingInterface {
|
| 7 |
-
constructor() {
|
| 8 |
-
this.rankings = {
|
| 9 |
-
rounds: [],
|
| 10 |
-
currentRound: null
|
| 11 |
-
};
|
| 12 |
-
|
| 13 |
-
this.rankingCategories = [
|
| 14 |
-
{
|
| 15 |
-
id: 'word_selection',
|
| 16 |
-
name: 'Word Selection Quality',
|
| 17 |
-
description: 'How appropriate were the selected words for this difficulty level?',
|
| 18 |
-
criteria: [
|
| 19 |
-
'Words match the difficulty level',
|
| 20 |
-
'Vocabulary is challenging but fair',
|
| 21 |
-
'Selected words are meaningful in context'
|
| 22 |
-
]
|
| 23 |
-
},
|
| 24 |
-
{
|
| 25 |
-
id: 'passage_quality',
|
| 26 |
-
name: 'Passage Selection',
|
| 27 |
-
description: 'How suitable was this passage for language learning?',
|
| 28 |
-
criteria: [
|
| 29 |
-
'Text is engaging and appropriate',
|
| 30 |
-
'Content is educational',
|
| 31 |
-
'Difficulty matches the level'
|
| 32 |
-
]
|
| 33 |
-
},
|
| 34 |
-
{
|
| 35 |
-
id: 'hint_helpfulness',
|
| 36 |
-
name: 'Hint Quality',
|
| 37 |
-
description: 'How helpful were the AI-generated hints?',
|
| 38 |
-
criteria: [
|
| 39 |
-
'Hints guide without revealing answers',
|
| 40 |
-
'Explanations are clear and educational',
|
| 41 |
-
'Responses are contextually appropriate'
|
| 42 |
-
]
|
| 43 |
-
},
|
| 44 |
-
{
|
| 45 |
-
id: 'overall_experience',
|
| 46 |
-
name: 'Overall Round Experience',
|
| 47 |
-
description: 'How was the overall quality of this round?',
|
| 48 |
-
criteria: [
|
| 49 |
-
'Smooth gameplay experience',
|
| 50 |
-
'AI responses were timely',
|
| 51 |
-
'Educational value was high'
|
| 52 |
-
]
|
| 53 |
-
}
|
| 54 |
-
];
|
| 55 |
-
|
| 56 |
-
this.createRankingUI();
|
| 57 |
-
this.setupEventListeners();
|
| 58 |
-
}
|
| 59 |
-
|
| 60 |
-
createRankingUI() {
|
| 61 |
-
// Create ranking modal
|
| 62 |
-
const modal = document.createElement('div');
|
| 63 |
-
modal.id = 'ranking-modal';
|
| 64 |
-
modal.className = 'ranking-modal';
|
| 65 |
-
modal.innerHTML = `
|
| 66 |
-
<div class="ranking-modal-content">
|
| 67 |
-
<h2>Rate This Round</h2>
|
| 68 |
-
<p class="ranking-subtitle">Help us improve by rating the AI's performance</p>
|
| 69 |
-
|
| 70 |
-
<div id="ranking-categories" class="ranking-categories">
|
| 71 |
-
<!-- Categories will be populated dynamically -->
|
| 72 |
-
</div>
|
| 73 |
-
|
| 74 |
-
<div class="ranking-comments">
|
| 75 |
-
<label for="ranking-comments-input">Additional Comments (Optional):</label>
|
| 76 |
-
<textarea id="ranking-comments-input" rows="3" placeholder="Any specific feedback about this round..."></textarea>
|
| 77 |
-
</div>
|
| 78 |
-
|
| 79 |
-
<div class="ranking-actions">
|
| 80 |
-
<button id="skip-ranking-btn" class="btn-secondary">Skip</button>
|
| 81 |
-
<button id="submit-ranking-btn" class="btn-primary" disabled>Submit Rating</button>
|
| 82 |
-
</div>
|
| 83 |
-
</div>
|
| 84 |
-
`;
|
| 85 |
-
|
| 86 |
-
// Create ranking trigger button
|
| 87 |
-
const triggerButton = document.createElement('button');
|
| 88 |
-
triggerButton.id = 'ranking-trigger-btn';
|
| 89 |
-
triggerButton.className = 'ranking-trigger-btn';
|
| 90 |
-
triggerButton.innerHTML = '⭐ Rate Round';
|
| 91 |
-
triggerButton.style.cssText = `
|
| 92 |
-
position: fixed;
|
| 93 |
-
bottom: 20px;
|
| 94 |
-
left: 20px;
|
| 95 |
-
z-index: 999;
|
| 96 |
-
padding: 10px 20px;
|
| 97 |
-
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 98 |
-
color: white;
|
| 99 |
-
border: none;
|
| 100 |
-
border-radius: 25px;
|
| 101 |
-
cursor: pointer;
|
| 102 |
-
font-size: 14px;
|
| 103 |
-
font-weight: bold;
|
| 104 |
-
box-shadow: 0 4px 15px rgba(102, 126, 234, 0.4);
|
| 105 |
-
transition: all 0.3s ease;
|
| 106 |
-
display: none;
|
| 107 |
-
`;
|
| 108 |
-
|
| 109 |
-
// Add styles
|
| 110 |
-
const styles = document.createElement('style');
|
| 111 |
-
styles.textContent = `
|
| 112 |
-
.ranking-modal {
|
| 113 |
-
display: none;
|
| 114 |
-
position: fixed;
|
| 115 |
-
top: 0;
|
| 116 |
-
left: 0;
|
| 117 |
-
width: 100%;
|
| 118 |
-
height: 100%;
|
| 119 |
-
background: rgba(0, 0, 0, 0.5);
|
| 120 |
-
z-index: 1000;
|
| 121 |
-
backdrop-filter: blur(5px);
|
| 122 |
-
}
|
| 123 |
-
|
| 124 |
-
.ranking-modal.active {
|
| 125 |
-
display: flex;
|
| 126 |
-
align-items: center;
|
| 127 |
-
justify-content: center;
|
| 128 |
-
}
|
| 129 |
-
|
| 130 |
-
.ranking-modal-content {
|
| 131 |
-
background: white;
|
| 132 |
-
border-radius: 15px;
|
| 133 |
-
padding: 30px;
|
| 134 |
-
max-width: 600px;
|
| 135 |
-
width: 90%;
|
| 136 |
-
max-height: 80vh;
|
| 137 |
-
overflow-y: auto;
|
| 138 |
-
box-shadow: 0 10px 40px rgba(0, 0, 0, 0.3);
|
| 139 |
-
}
|
| 140 |
-
|
| 141 |
-
.ranking-modal-content h2 {
|
| 142 |
-
color: #2c3e50;
|
| 143 |
-
margin-bottom: 10px;
|
| 144 |
-
text-align: center;
|
| 145 |
-
}
|
| 146 |
-
|
| 147 |
-
.ranking-subtitle {
|
| 148 |
-
color: #7f8c8d;
|
| 149 |
-
text-align: center;
|
| 150 |
-
margin-bottom: 30px;
|
| 151 |
-
}
|
| 152 |
-
|
| 153 |
-
.ranking-category {
|
| 154 |
-
margin-bottom: 25px;
|
| 155 |
-
padding: 20px;
|
| 156 |
-
background: #f8f9fa;
|
| 157 |
-
border-radius: 10px;
|
| 158 |
-
border: 2px solid #e9ecef;
|
| 159 |
-
}
|
| 160 |
-
|
| 161 |
-
.ranking-category h3 {
|
| 162 |
-
color: #2c3e50;
|
| 163 |
-
margin-bottom: 8px;
|
| 164 |
-
font-size: 1.1rem;
|
| 165 |
-
}
|
| 166 |
-
|
| 167 |
-
.ranking-category-description {
|
| 168 |
-
color: #6c757d;
|
| 169 |
-
font-size: 0.9rem;
|
| 170 |
-
margin-bottom: 15px;
|
| 171 |
-
}
|
| 172 |
-
|
| 173 |
-
.ranking-criteria {
|
| 174 |
-
font-size: 0.85rem;
|
| 175 |
-
color: #6c757d;
|
| 176 |
-
margin-bottom: 15px;
|
| 177 |
-
padding-left: 20px;
|
| 178 |
-
}
|
| 179 |
-
|
| 180 |
-
.ranking-criteria li {
|
| 181 |
-
margin-bottom: 5px;
|
| 182 |
-
}
|
| 183 |
-
|
| 184 |
-
.ranking-stars {
|
| 185 |
-
display: flex;
|
| 186 |
-
gap: 10px;
|
| 187 |
-
justify-content: center;
|
| 188 |
-
margin-top: 10px;
|
| 189 |
-
}
|
| 190 |
-
|
| 191 |
-
.ranking-star {
|
| 192 |
-
font-size: 30px;
|
| 193 |
-
color: #ddd;
|
| 194 |
-
cursor: pointer;
|
| 195 |
-
transition: all 0.2s ease;
|
| 196 |
-
}
|
| 197 |
-
|
| 198 |
-
.ranking-star:hover,
|
| 199 |
-
.ranking-star.hover {
|
| 200 |
-
color: #ffd700;
|
| 201 |
-
transform: scale(1.1);
|
| 202 |
-
}
|
| 203 |
-
|
| 204 |
-
.ranking-star.selected {
|
| 205 |
-
color: #ffd700;
|
| 206 |
-
}
|
| 207 |
-
|
| 208 |
-
.ranking-comments {
|
| 209 |
-
margin: 20px 0;
|
| 210 |
-
}
|
| 211 |
-
|
| 212 |
-
.ranking-comments label {
|
| 213 |
-
display: block;
|
| 214 |
-
color: #2c3e50;
|
| 215 |
-
margin-bottom: 8px;
|
| 216 |
-
font-weight: 500;
|
| 217 |
-
}
|
| 218 |
-
|
| 219 |
-
.ranking-comments textarea {
|
| 220 |
-
width: 100%;
|
| 221 |
-
padding: 10px;
|
| 222 |
-
border: 2px solid #e9ecef;
|
| 223 |
-
border-radius: 8px;
|
| 224 |
-
font-family: inherit;
|
| 225 |
-
resize: vertical;
|
| 226 |
-
}
|
| 227 |
-
|
| 228 |
-
.ranking-actions {
|
| 229 |
-
display: flex;
|
| 230 |
-
gap: 15px;
|
| 231 |
-
justify-content: flex-end;
|
| 232 |
-
margin-top: 20px;
|
| 233 |
-
}
|
| 234 |
-
|
| 235 |
-
.btn-primary, .btn-secondary {
|
| 236 |
-
padding: 10px 24px;
|
| 237 |
-
border: none;
|
| 238 |
-
border-radius: 8px;
|
| 239 |
-
font-size: 1rem;
|
| 240 |
-
cursor: pointer;
|
| 241 |
-
transition: all 0.3s ease;
|
| 242 |
-
font-weight: 500;
|
| 243 |
-
}
|
| 244 |
-
|
| 245 |
-
.btn-primary {
|
| 246 |
-
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 247 |
-
color: white;
|
| 248 |
-
}
|
| 249 |
-
|
| 250 |
-
.btn-primary:hover:not(:disabled) {
|
| 251 |
-
transform: translateY(-2px);
|
| 252 |
-
box-shadow: 0 6px 20px rgba(102, 126, 234, 0.4);
|
| 253 |
-
}
|
| 254 |
-
|
| 255 |
-
.btn-primary:disabled {
|
| 256 |
-
background: #6c757d;
|
| 257 |
-
cursor: not-allowed;
|
| 258 |
-
}
|
| 259 |
-
|
| 260 |
-
.btn-secondary {
|
| 261 |
-
background: #e9ecef;
|
| 262 |
-
color: #495057;
|
| 263 |
-
}
|
| 264 |
-
|
| 265 |
-
.btn-secondary:hover {
|
| 266 |
-
background: #dee2e6;
|
| 267 |
-
}
|
| 268 |
-
|
| 269 |
-
.ranking-trigger-btn:hover {
|
| 270 |
-
transform: translateY(-2px) scale(1.05);
|
| 271 |
-
box-shadow: 0 6px 20px rgba(102, 126, 234, 0.6);
|
| 272 |
-
}
|
| 273 |
-
|
| 274 |
-
@media (max-width: 600px) {
|
| 275 |
-
.ranking-modal-content {
|
| 276 |
-
padding: 20px;
|
| 277 |
-
}
|
| 278 |
-
|
| 279 |
-
.ranking-star {
|
| 280 |
-
font-size: 24px;
|
| 281 |
-
}
|
| 282 |
-
|
| 283 |
-
.ranking-trigger-btn {
|
| 284 |
-
bottom: 70px;
|
| 285 |
-
padding: 8px 16px;
|
| 286 |
-
font-size: 12px;
|
| 287 |
-
}
|
| 288 |
-
}
|
| 289 |
-
`;
|
| 290 |
-
|
| 291 |
-
document.head.appendChild(styles);
|
| 292 |
-
document.body.appendChild(modal);
|
| 293 |
-
document.body.appendChild(triggerButton);
|
| 294 |
-
|
| 295 |
-
this.populateCategories();
|
| 296 |
-
}
|
| 297 |
-
|
| 298 |
-
populateCategories() {
|
| 299 |
-
const container = document.getElementById('ranking-categories');
|
| 300 |
-
container.innerHTML = '';
|
| 301 |
-
|
| 302 |
-
this.rankingCategories.forEach(category => {
|
| 303 |
-
const categoryDiv = document.createElement('div');
|
| 304 |
-
categoryDiv.className = 'ranking-category';
|
| 305 |
-
categoryDiv.dataset.categoryId = category.id;
|
| 306 |
-
|
| 307 |
-
const criteriaHtml = category.criteria.map(c => `<li>${c}</li>`).join('');
|
| 308 |
-
|
| 309 |
-
categoryDiv.innerHTML = `
|
| 310 |
-
<h3>${category.name}</h3>
|
| 311 |
-
<p class="ranking-category-description">${category.description}</p>
|
| 312 |
-
<ul class="ranking-criteria">${criteriaHtml}</ul>
|
| 313 |
-
<div class="ranking-stars" data-category="${category.id}">
|
| 314 |
-
${[1, 2, 3, 4, 5].map(i =>
|
| 315 |
-
`<span class="ranking-star" data-rating="${i}">★</span>`
|
| 316 |
-
).join('')}
|
| 317 |
-
</div>
|
| 318 |
-
`;
|
| 319 |
-
|
| 320 |
-
container.appendChild(categoryDiv);
|
| 321 |
-
});
|
| 322 |
-
|
| 323 |
-
// Setup star interactions
|
| 324 |
-
this.setupStarInteractions();
|
| 325 |
-
}
|
| 326 |
-
|
| 327 |
-
setupStarInteractions() {
|
| 328 |
-
const starContainers = document.querySelectorAll('.ranking-stars');
|
| 329 |
-
|
| 330 |
-
starContainers.forEach(container => {
|
| 331 |
-
const stars = container.querySelectorAll('.ranking-star');
|
| 332 |
-
const categoryId = container.dataset.category;
|
| 333 |
-
|
| 334 |
-
stars.forEach((star, index) => {
|
| 335 |
-
star.addEventListener('mouseenter', () => {
|
| 336 |
-
this.highlightStars(stars, index + 1);
|
| 337 |
-
});
|
| 338 |
-
|
| 339 |
-
star.addEventListener('click', () => {
|
| 340 |
-
this.selectRating(categoryId, index + 1);
|
| 341 |
-
this.markStarsAsSelected(stars, index + 1);
|
| 342 |
-
this.updateSubmitButton();
|
| 343 |
-
});
|
| 344 |
-
});
|
| 345 |
-
|
| 346 |
-
container.addEventListener('mouseleave', () => {
|
| 347 |
-
const currentRating = this.getCurrentRating(categoryId);
|
| 348 |
-
if (currentRating > 0) {
|
| 349 |
-
this.markStarsAsSelected(stars, currentRating);
|
| 350 |
-
} else {
|
| 351 |
-
this.highlightStars(stars, 0);
|
| 352 |
-
}
|
| 353 |
-
});
|
| 354 |
-
});
|
| 355 |
-
}
|
| 356 |
-
|
| 357 |
-
highlightStars(stars, count) {
|
| 358 |
-
stars.forEach((star, index) => {
|
| 359 |
-
if (index < count) {
|
| 360 |
-
star.classList.add('hover');
|
| 361 |
-
} else {
|
| 362 |
-
star.classList.remove('hover');
|
| 363 |
-
}
|
| 364 |
-
});
|
| 365 |
-
}
|
| 366 |
-
|
| 367 |
-
markStarsAsSelected(stars, count) {
|
| 368 |
-
stars.forEach((star, index) => {
|
| 369 |
-
if (index < count) {
|
| 370 |
-
star.classList.add('selected');
|
| 371 |
-
star.classList.remove('hover');
|
| 372 |
-
} else {
|
| 373 |
-
star.classList.remove('selected');
|
| 374 |
-
star.classList.remove('hover');
|
| 375 |
-
}
|
| 376 |
-
});
|
| 377 |
-
}
|
| 378 |
-
|
| 379 |
-
selectRating(categoryId, rating) {
|
| 380 |
-
if (!this.currentRound) {
|
| 381 |
-
this.currentRound = {
|
| 382 |
-
timestamp: Date.now(),
|
| 383 |
-
ratings: {},
|
| 384 |
-
comments: ''
|
| 385 |
-
};
|
| 386 |
-
}
|
| 387 |
-
|
| 388 |
-
this.currentRound.ratings[categoryId] = rating;
|
| 389 |
-
}
|
| 390 |
-
|
| 391 |
-
getCurrentRating(categoryId) {
|
| 392 |
-
return this.currentRound?.ratings[categoryId] || 0;
|
| 393 |
-
}
|
| 394 |
-
|
| 395 |
-
setupEventListeners() {
|
| 396 |
-
const modal = document.getElementById('ranking-modal');
|
| 397 |
-
const triggerBtn = document.getElementById('ranking-trigger-btn');
|
| 398 |
-
const skipBtn = document.getElementById('skip-ranking-btn');
|
| 399 |
-
const submitBtn = document.getElementById('submit-ranking-btn');
|
| 400 |
-
const commentsInput = document.getElementById('ranking-comments-input');
|
| 401 |
-
|
| 402 |
-
// Show modal
|
| 403 |
-
triggerBtn.addEventListener('click', () => {
|
| 404 |
-
this.showRankingModal();
|
| 405 |
-
});
|
| 406 |
-
|
| 407 |
-
// Skip ranking
|
| 408 |
-
skipBtn.addEventListener('click', () => {
|
| 409 |
-
this.hideRankingModal();
|
| 410 |
-
this.currentRound = null;
|
| 411 |
-
});
|
| 412 |
-
|
| 413 |
-
// Submit ranking
|
| 414 |
-
submitBtn.addEventListener('click', () => {
|
| 415 |
-
this.submitRanking();
|
| 416 |
-
});
|
| 417 |
-
|
| 418 |
-
// Update comments
|
| 419 |
-
commentsInput.addEventListener('input', (e) => {
|
| 420 |
-
if (this.currentRound) {
|
| 421 |
-
this.currentRound.comments = e.target.value;
|
| 422 |
-
}
|
| 423 |
-
});
|
| 424 |
-
|
| 425 |
-
// Close modal on background click
|
| 426 |
-
modal.addEventListener('click', (e) => {
|
| 427 |
-
if (e.target === modal) {
|
| 428 |
-
this.hideRankingModal();
|
| 429 |
-
}
|
| 430 |
-
});
|
| 431 |
-
|
| 432 |
-
// Listen for round completion events
|
| 433 |
-
document.addEventListener('gameRoundComplete', (event) => {
|
| 434 |
-
this.onRoundComplete(event.detail);
|
| 435 |
-
});
|
| 436 |
-
}
|
| 437 |
-
|
| 438 |
-
updateSubmitButton() {
|
| 439 |
-
const submitBtn = document.getElementById('submit-ranking-btn');
|
| 440 |
-
const allRated = this.rankingCategories.every(category =>
|
| 441 |
-
this.getCurrentRating(category.id) > 0
|
| 442 |
-
);
|
| 443 |
-
|
| 444 |
-
submitBtn.disabled = !allRated;
|
| 445 |
-
}
|
| 446 |
-
|
| 447 |
-
showRankingModal() {
|
| 448 |
-
const modal = document.getElementById('ranking-modal');
|
| 449 |
-
modal.classList.add('active');
|
| 450 |
-
|
| 451 |
-
// Reset current round if needed
|
| 452 |
-
if (!this.currentRound) {
|
| 453 |
-
this.currentRound = {
|
| 454 |
-
timestamp: Date.now(),
|
| 455 |
-
ratings: {},
|
| 456 |
-
comments: ''
|
| 457 |
-
};
|
| 458 |
-
}
|
| 459 |
-
|
| 460 |
-
// Clear previous selections
|
| 461 |
-
this.resetUI();
|
| 462 |
-
}
|
| 463 |
-
|
| 464 |
-
hideRankingModal() {
|
| 465 |
-
const modal = document.getElementById('ranking-modal');
|
| 466 |
-
modal.classList.remove('active');
|
| 467 |
-
}
|
| 468 |
-
|
| 469 |
-
resetUI() {
|
| 470 |
-
// Clear all star selections
|
| 471 |
-
document.querySelectorAll('.ranking-star').forEach(star => {
|
| 472 |
-
star.classList.remove('selected', 'hover');
|
| 473 |
-
});
|
| 474 |
-
|
| 475 |
-
// Clear comments
|
| 476 |
-
document.getElementById('ranking-comments-input').value = '';
|
| 477 |
-
|
| 478 |
-
// Disable submit button
|
| 479 |
-
document.getElementById('submit-ranking-btn').disabled = true;
|
| 480 |
-
}
|
| 481 |
-
|
| 482 |
-
submitRanking() {
|
| 483 |
-
if (!this.currentRound) return;
|
| 484 |
-
|
| 485 |
-
// Add metadata
|
| 486 |
-
this.currentRound.submittedAt = Date.now();
|
| 487 |
-
this.currentRound.modelId = window.testGameRunner?.modelConfig?.modelId || 'unknown';
|
| 488 |
-
|
| 489 |
-
// Calculate average rating
|
| 490 |
-
const ratings = Object.values(this.currentRound.ratings);
|
| 491 |
-
this.currentRound.averageRating = ratings.reduce((a, b) => a + b, 0) / ratings.length;
|
| 492 |
-
|
| 493 |
-
// Save ranking
|
| 494 |
-
this.rankings.rounds.push(this.currentRound);
|
| 495 |
-
|
| 496 |
-
// Dispatch event for test runner
|
| 497 |
-
document.dispatchEvent(new CustomEvent('userRanking', {
|
| 498 |
-
detail: this.currentRound
|
| 499 |
-
}));
|
| 500 |
-
|
| 501 |
-
// Show confirmation
|
| 502 |
-
this.showConfirmation();
|
| 503 |
-
|
| 504 |
-
// Reset
|
| 505 |
-
this.hideRankingModal();
|
| 506 |
-
this.currentRound = null;
|
| 507 |
-
|
| 508 |
-
console.log('Ranking submitted:', this.rankings);
|
| 509 |
-
}
|
| 510 |
-
|
| 511 |
-
showConfirmation() {
|
| 512 |
-
const confirmation = document.createElement('div');
|
| 513 |
-
confirmation.style.cssText = `
|
| 514 |
-
position: fixed;
|
| 515 |
-
bottom: 100px;
|
| 516 |
-
left: 50%;
|
| 517 |
-
transform: translateX(-50%);
|
| 518 |
-
background: #28a745;
|
| 519 |
-
color: white;
|
| 520 |
-
padding: 15px 30px;
|
| 521 |
-
border-radius: 8px;
|
| 522 |
-
box-shadow: 0 4px 15px rgba(40, 167, 69, 0.4);
|
| 523 |
-
z-index: 1001;
|
| 524 |
-
animation: slideInUp 0.3s ease;
|
| 525 |
-
`;
|
| 526 |
-
confirmation.textContent = '✓ Thank you for your feedback!';
|
| 527 |
-
|
| 528 |
-
document.body.appendChild(confirmation);
|
| 529 |
-
|
| 530 |
-
setTimeout(() => {
|
| 531 |
-
confirmation.style.animation = 'slideOutDown 0.3s ease';
|
| 532 |
-
setTimeout(() => confirmation.remove(), 300);
|
| 533 |
-
}, 2000);
|
| 534 |
-
}
|
| 535 |
-
|
| 536 |
-
onRoundComplete(roundDetails) {
|
| 537 |
-
// Store round details for context
|
| 538 |
-
if (!this.currentRound) {
|
| 539 |
-
this.currentRound = {
|
| 540 |
-
timestamp: Date.now(),
|
| 541 |
-
ratings: {},
|
| 542 |
-
comments: '',
|
| 543 |
-
roundDetails: roundDetails
|
| 544 |
-
};
|
| 545 |
-
} else {
|
| 546 |
-
this.currentRound.roundDetails = roundDetails;
|
| 547 |
-
}
|
| 548 |
-
|
| 549 |
-
// Show ranking trigger button
|
| 550 |
-
const triggerBtn = document.getElementById('ranking-trigger-btn');
|
| 551 |
-
triggerBtn.style.display = 'block';
|
| 552 |
-
|
| 553 |
-
// Auto-show modal after a short delay (optional)
|
| 554 |
-
if (window.testGameRunner?.modelConfig?.autoShowRanking) {
|
| 555 |
-
setTimeout(() => this.showRankingModal(), 1500);
|
| 556 |
-
}
|
| 557 |
-
}
|
| 558 |
-
|
| 559 |
-
exportRankings() {
|
| 560 |
-
const exportData = {
|
| 561 |
-
...this.rankings,
|
| 562 |
-
exportedAt: new Date().toISOString(),
|
| 563 |
-
modelId: window.testGameRunner?.modelConfig?.modelId || 'unknown'
|
| 564 |
-
};
|
| 565 |
-
|
| 566 |
-
return exportData;
|
| 567 |
-
}
|
| 568 |
-
|
| 569 |
-
getRankingSummary() {
|
| 570 |
-
if (this.rankings.rounds.length === 0) {
|
| 571 |
-
return null;
|
| 572 |
-
}
|
| 573 |
-
|
| 574 |
-
const summary = {
|
| 575 |
-
totalRounds: this.rankings.rounds.length,
|
| 576 |
-
averageRatings: {},
|
| 577 |
-
categoryBreakdown: {},
|
| 578 |
-
comments: []
|
| 579 |
-
};
|
| 580 |
-
|
| 581 |
-
// Calculate average ratings per category
|
| 582 |
-
this.rankingCategories.forEach(category => {
|
| 583 |
-
const ratings = this.rankings.rounds
|
| 584 |
-
.map(r => r.ratings[category.id])
|
| 585 |
-
.filter(r => r !== undefined);
|
| 586 |
-
|
| 587 |
-
if (ratings.length > 0) {
|
| 588 |
-
summary.averageRatings[category.id] =
|
| 589 |
-
ratings.reduce((a, b) => a + b, 0) / ratings.length;
|
| 590 |
-
|
| 591 |
-
// Distribution of ratings
|
| 592 |
-
summary.categoryBreakdown[category.id] = {
|
| 593 |
-
1: ratings.filter(r => r === 1).length,
|
| 594 |
-
2: ratings.filter(r => r === 2).length,
|
| 595 |
-
3: ratings.filter(r => r === 3).length,
|
| 596 |
-
4: ratings.filter(r => r === 4).length,
|
| 597 |
-
5: ratings.filter(r => r === 5).length
|
| 598 |
-
};
|
| 599 |
-
}
|
| 600 |
-
});
|
| 601 |
-
|
| 602 |
-
// Collect all comments
|
| 603 |
-
summary.comments = this.rankings.rounds
|
| 604 |
-
.filter(r => r.comments)
|
| 605 |
-
.map(r => ({
|
| 606 |
-
timestamp: r.timestamp,
|
| 607 |
-
comment: r.comments,
|
| 608 |
-
averageRating: r.averageRating
|
| 609 |
-
}));
|
| 610 |
-
|
| 611 |
-
return summary;
|
| 612 |
-
}
|
| 613 |
-
}
|
| 614 |
-
|
| 615 |
-
// Initialize when in test mode
|
| 616 |
-
window.addEventListener('DOMContentLoaded', () => {
|
| 617 |
-
const urlParams = new URLSearchParams(window.location.search);
|
| 618 |
-
if (urlParams.get('testMode') === 'true') {
|
| 619 |
-
window.userRankingInterface = new UserRankingInterface();
|
| 620 |
-
|
| 621 |
-
// Add CSS animation keyframes
|
| 622 |
-
const animationStyles = document.createElement('style');
|
| 623 |
-
animationStyles.textContent = `
|
| 624 |
-
@keyframes slideInUp {
|
| 625 |
-
from {
|
| 626 |
-
transform: translate(-50%, 100%);
|
| 627 |
-
opacity: 0;
|
| 628 |
-
}
|
| 629 |
-
to {
|
| 630 |
-
transform: translate(-50%, 0);
|
| 631 |
-
opacity: 1;
|
| 632 |
-
}
|
| 633 |
-
}
|
| 634 |
-
|
| 635 |
-
@keyframes slideOutDown {
|
| 636 |
-
from {
|
| 637 |
-
transform: translate(-50%, 0);
|
| 638 |
-
opacity: 1;
|
| 639 |
-
}
|
| 640 |
-
to {
|
| 641 |
-
transform: translate(-50%, 100%);
|
| 642 |
-
opacity: 0;
|
| 643 |
-
}
|
| 644 |
-
}
|
| 645 |
-
`;
|
| 646 |
-
document.head.appendChild(animationStyles);
|
| 647 |
-
}
|
| 648 |
-
});
|
| 649 |
-
|
| 650 |
-
export { UserRankingInterface };
|
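The per-category aggregation in `getRankingSummary()` can be sketched on its own, away from the DOM code. This is a minimal illustration rather than the removed implementation: `summarizeCategory`, `sampleRounds`, and the `clarity` category id are hypothetical names, and a 1 to 5 rating scale is assumed.

```javascript
// Hypothetical standalone version of the per-category aggregation.
// Assumes each round looks like { ratings: { [categoryId]: 1..5 } }.
function summarizeCategory(rounds, categoryId) {
  // Keep only rounds that actually rated this category.
  const ratings = rounds
    .map(r => r.ratings[categoryId])
    .filter(r => r !== undefined);

  if (ratings.length === 0) return null;

  const average = ratings.reduce((a, b) => a + b, 0) / ratings.length;

  // Count how many times each star value was given.
  const distribution = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0 };
  for (const r of ratings) distribution[r] += 1;

  return { average, distribution };
}

// Illustrative data only.
const sampleRounds = [
  { ratings: { clarity: 4 } },
  { ratings: { clarity: 5 } },
  { ratings: { other: 2 } },
];
const claritySummary = summarizeCategory(sampleRounds, 'clarity');
console.log(claritySummary);
```

Returning `null` for an unrated category mirrors how the summary above skips categories with no ratings instead of dividing by zero.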
test-direct.js
DELETED
|
@@ -1,28 +0,0 @@
|
|
| 1 |
-
import { AIService } from './src/aiService.js';
|
| 2 |
-
|
| 3 |
-
// Force local mode
|
| 4 |
-
const originalSearch = window.location.search;
|
| 5 |
-
window.location.search = '?local=true';
|
| 6 |
-
|
| 7 |
-
const ai = new AIService();
|
| 8 |
-
|
| 9 |
-
console.log('Testing direct AI connection...');
|
| 10 |
-
console.log('Config:', {
|
| 11 |
-
url: ai.apiUrl,
|
| 12 |
-
model: ai.model,
|
| 13 |
-
isLocal: ai.isLocalMode
|
| 14 |
-
});
|
| 15 |
-
|
| 16 |
-
const testPassage = "The ancient library contained thousands of manuscripts, each one carefully preserved by generations of scholars who dedicated their lives to knowledge.";
|
| 17 |
-
|
| 18 |
-
try {
|
| 19 |
-
console.log('\nTesting word selection...');
|
| 20 |
-
const words = await ai.selectSignificantWords(testPassage, 2, 3);
|
| 21 |
-
console.log('Selected words:', words);
|
| 22 |
-
console.log('✅ Success!');
|
| 23 |
-
} catch (error) {
|
| 24 |
-
console.error('❌ Error:', error.message);
|
| 25 |
-
}
|
| 26 |
-
|
| 27 |
-
// Restore original search
|
| 28 |
-
window.location.search = originalSearch;
|
test-local-llm.js
DELETED
|
@@ -1,155 +0,0 @@
|
|
| 1 |
-
#!/usr/bin/env node
|
| 2 |
-
|
| 3 |
-
// Stress test for local LLM on port 1234
|
| 4 |
-
// Tests word selection functionality with Gutenberg passages
|
| 5 |
-
|
| 6 |
-
import http from 'http';
|
| 7 |
-
|
| 8 |
-
// Sample Gutenberg passages for testing
|
| 9 |
-
const testPassages = [
|
| 10 |
-
"The sun was shining brightly on the sea, shining with all his might. He did his very best to make the billows smooth and bright. And this was odd, because it was the middle of the night.",
|
| 11 |
-
"It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity.",
|
| 12 |
-
"In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on.",
|
| 13 |
-
"Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little.",
|
| 14 |
-
"It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife. However little known the feelings or views of such a man may be."
|
| 15 |
-
];
|
| 16 |
-
|
| 17 |
-
// Word selection prompt template (based on cloze reader's format)
|
| 18 |
-
function createWordSelectionPrompt(passage, level = 1) {
|
| 19 |
-
const wordCount = level < 6 ? 1 : level < 11 ? 2 : 3;
|
| 20 |
-
const minLength = level < 3 ? 4 : 5;
|
| 21 |
-
const maxLength = level < 3 ? 7 : level < 5 ? 10 : 14;
|
| 22 |
-
|
| 23 |
-
return {
|
| 24 |
-
model: "gemma-3-12b",
|
| 25 |
-
messages: [
|
| 26 |
-
{
|
| 27 |
-
role: "system",
|
| 28 |
-
content: "You are a vocabulary expert who selects appropriate words for cloze exercises."
|
| 29 |
-
},
|
| 30 |
-
{
|
| 31 |
-
role: "user",
|
| 32 |
-
content: `Select ${wordCount} word${wordCount > 1 ? 's' : ''} from this passage for a cloze exercise.
|
| 33 |
-
|
| 34 |
-
Passage: "${passage}"
|
| 35 |
-
|
| 36 |
-
Requirements:
|
| 37 |
-
- Select exactly ${wordCount} different word${wordCount > 1 ? 's' : ''}
|
| 38 |
-
- Each word must be ${minLength}-${maxLength} letters long
|
| 39 |
-
- Words must be meaningful nouns, verbs, adjectives, or adverbs
|
| 40 |
-
- Avoid pronouns, articles, and common words
|
| 41 |
-
- Return ONLY the selected word${wordCount > 1 ? 's' : ''}, ${wordCount > 1 ? 'comma-separated' : 'nothing else'}
|
| 42 |
-
|
| 43 |
-
Selected word${wordCount > 1 ? 's' : ''}:`
|
| 44 |
-
}
|
| 45 |
-
],
|
| 46 |
-
temperature: 0.7,
|
| 47 |
-
max_tokens: 50
|
| 48 |
-
};
|
| 49 |
-
}
|
| 50 |
-
|
| 51 |
-
// Function to make HTTP request to local LLM
|
| 52 |
-
function testLLMConnection(passage, testNumber) {
|
| 53 |
-
return new Promise((resolve, reject) => {
|
| 54 |
-
const prompt = createWordSelectionPrompt(passage, Math.floor(Math.random() * 10) + 1);
|
| 55 |
-
const data = JSON.stringify(prompt);
|
| 56 |
-
|
| 57 |
-
const options = {
|
| 58 |
-
hostname: 'localhost',
|
| 59 |
-
port: 1234,
|
| 60 |
-
path: '/v1/chat/completions',
|
| 61 |
-
method: 'POST',
|
| 62 |
-
headers: {
|
| 63 |
-
'Content-Type': 'application/json',
|
| 64 |
-
'Content-Length': Buffer.byteLength(data) // byte count; string length undercounts non-ASCII text
|
| 65 |
-
}
|
| 66 |
-
};
|
| 67 |
-
|
| 68 |
-
console.log(`\n=== Test ${testNumber} ===`);
|
| 69 |
-
console.log(`Passage: "${passage.substring(0, 80)}..."`);
|
| 70 |
-
console.log(`Sending request to http://localhost:1234/v1/chat/completions`);
|
| 71 |
-
|
| 72 |
-
const startTime = Date.now();
|
| 73 |
-
|
| 74 |
-
const req = http.request(options, (res) => {
|
| 75 |
-
let responseData = '';
|
| 76 |
-
|
| 77 |
-
res.on('data', (chunk) => {
|
| 78 |
-
responseData += chunk;
|
| 79 |
-
});
|
| 80 |
-
|
| 81 |
-
res.on('end', () => {
|
| 82 |
-
const elapsed = Date.now() - startTime;
|
| 83 |
-
console.log(`Response received in ${elapsed}ms`);
|
| 84 |
-
console.log(`Status: ${res.statusCode}`);
|
| 85 |
-
|
| 86 |
-
try {
|
| 87 |
-
const parsed = JSON.parse(responseData);
|
| 88 |
-
if (parsed.choices && parsed.choices[0] && parsed.choices[0].message) {
|
| 89 |
-
const selectedWords = parsed.choices[0].message.content.trim();
|
| 90 |
-
console.log(`Selected words: ${selectedWords}`);
|
| 91 |
-
console.log(`✓ Test ${testNumber} PASSED`);
|
| 92 |
-
resolve({ success: true, words: selectedWords, time: elapsed });
|
| 93 |
-
} else {
|
| 94 |
-
console.log(`Response structure unexpected:`, parsed);
|
| 95 |
-
resolve({ success: false, error: 'Invalid response structure', time: elapsed });
|
| 96 |
-
}
|
| 97 |
-
} catch (error) {
|
| 98 |
-
console.log(`Failed to parse response:`, error.message);
|
| 99 |
-
console.log(`Raw response:`, responseData.substring(0, 200));
|
| 100 |
-
resolve({ success: false, error: error.message, time: elapsed });
|
| 101 |
-
}
|
| 102 |
-
});
|
| 103 |
-
});
|
| 104 |
-
|
| 105 |
-
req.on('error', (error) => {
|
| 106 |
-
const elapsed = Date.now() - startTime;
|
| 107 |
-
console.log(`✗ Test ${testNumber} FAILED - Connection error after ${elapsed}ms`);
|
| 108 |
-
console.log(`Error: ${error.message}`);
|
| 109 |
-
resolve({ success: false, error: error.message, time: elapsed });
|
| 110 |
-
});
|
| 111 |
-
|
| 112 |
-
req.write(data);
|
| 113 |
-
req.end();
|
| 114 |
-
});
|
| 115 |
-
}
|
| 116 |
-
|
| 117 |
-
// Run stress test
|
| 118 |
-
async function runStressTest() {
|
| 119 |
-
console.log('Starting stress test for Gemma-3-12b on localhost:1234');
|
| 120 |
-
console.log('Testing word selection for cloze reader game...\n');
|
| 121 |
-
|
| 122 |
-
const results = [];
|
| 123 |
-
|
| 124 |
-
// Test each passage
|
| 125 |
-
for (let i = 0; i < testPassages.length; i++) {
|
| 126 |
-
const result = await testLLMConnection(testPassages[i], i + 1);
|
| 127 |
-
results.push(result);
|
| 128 |
-
|
| 129 |
-
// Small delay between tests
|
| 130 |
-
await new Promise(resolve => setTimeout(resolve, 500));
|
| 131 |
-
}
|
| 132 |
-
|
| 133 |
-
// Summary
|
| 134 |
-
console.log('\n=== STRESS TEST SUMMARY ===');
|
| 135 |
-
const successful = results.filter(r => r.success).length;
|
| 136 |
-
const failed = results.length - successful;
|
| 137 |
-
const avgTime = results.reduce((sum, r) => sum + r.time, 0) / results.length;
|
| 138 |
-
|
| 139 |
-
console.log(`Total tests: ${results.length}`);
|
| 140 |
-
console.log(`Successful: ${successful}`);
|
| 141 |
-
console.log(`Failed: ${failed}`);
|
| 142 |
-
console.log(`Average response time: ${avgTime.toFixed(0)}ms`);
|
| 143 |
-
console.log(`Success rate: ${(successful / results.length * 100).toFixed(1)}%`);
|
| 144 |
-
|
| 145 |
-
if (successful === results.length) {
|
| 146 |
-
console.log('\n✓ All tests passed! The Gemma-3-12b server is functioning correctly for cloze reader.');
|
| 147 |
-
} else if (successful > 0) {
|
| 148 |
-
console.log('\n⚠ Some tests passed. The server is partially functional.');
|
| 149 |
-
} else {
|
| 150 |
-
console.log('\n✗ All tests failed. Please check if the server is running on port 1234.');
|
| 151 |
-
}
|
| 152 |
-
}
|
| 153 |
-
|
| 154 |
-
// Run the test
|
| 155 |
-
runStressTest().catch(console.error);
|
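One detail of the raw `http.request` call is worth illustrating: the `Content-Length` header counts bytes, while a JavaScript string's `length` counts UTF-16 code units. The two differ as soon as the body contains non-ASCII text, and one of the test passages contains a long dash. A minimal sketch, where `body` is a hypothetical payload rather than the script's actual request:

```javascript
// Sketch: string length vs. encoded byte length for a JSON request body.
// `body` is an illustrative payload, not taken from the deleted script.
const body = JSON.stringify({ text: 'years ago\u2014never mind' });

const charCount = body.length;              // UTF-16 code units
const byteCount = Buffer.byteLength(body);  // UTF-8 bytes (default encoding)

// U+2014 is one code unit but three UTF-8 bytes, so the counts differ by two.
console.log({ charCount, byteCount });
```

A header value smaller than the real byte count can cause the server to read a truncated JSON body, so `Buffer.byteLength` is the safer choice when the length is set by hand.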