milwright committed
Commit f4795d7 · 1 Parent(s): 4f630fa

clean: remove test framework and unnecessary files

.claude/settings.local.json DELETED
@@ -1,15 +0,0 @@
- {
-   "permissions": {
-     "allow": [
-       "Bash(git checkout:*)",
-       "Bash(cp:*)",
-       "Bash(rm:*)",
-       "Bash(git commit:*)",
-       "Bash(git push:*)",
-       "Bash(git add:*)",
-       "Bash(grep:*)",
-       "Bash(node:*)"
-     ],
-     "deny": []
-   }
- }
CLAUDE.local.md DELETED
@@ -1,2 +0,0 @@
- - DO NOT SIGN OFF COMMIT MESSAGES WITH CLAUDE AS AN AUTHOR
- - Remember to review this file at the end of prompt engineering related changes or when the user tells you to or at the end of a long session. If changes have been made to OpenRouter prompt language or programming logic, low or high level, then update this file accordingly.
README-testing-framework.md DELETED
@@ -1,217 +0,0 @@
- # Cloze Reader Model Testing Framework
-
- A comprehensive testing system for evaluating AI models across all tasks in the Cloze Reader application, including both OpenRouter and local LLM (LM Studio) models.
-
- ## Features
-
- ### 🎯 Comprehensive Testing
- - **Word Selection Testing**: Evaluates vocabulary selection accuracy, difficulty matching, and response quality
- - **Contextualization Testing**: Tests historical and literary context generation for books and authors
- - **Chat Hints Testing**: Assesses all 4 question types (part of speech, sentence role, word category, synonym)
- - **Performance Monitoring**: Tracks response times, success rates, and error patterns
- - **User Satisfaction Ratings**: Collect user feedback on model performance after each round
-
- ### 🏠 Local LLM Support
- - **LM Studio Integration**: Auto-detects models running on port 1234
- - **Real-time Status**: Shows connection status and available models
- - **Response Cleaning**: Handles local LLM output artifacts automatically
- - **Fallback Testing**: Graceful handling when local server is unavailable
-
- ### 📊 Advanced Analytics
- - **Multi-format Reports**: JSON, CSV, and Markdown outputs
- - **Performance Comparisons**: Side-by-side model analysis
- - **Quality Scoring**: Detailed evaluation metrics for each task
- - **Interactive Game Testing**: Real-time performance monitoring during gameplay
- - **User Ranking Integration**: 5-star ratings for word selection, passage quality, hint helpfulness, and overall experience
-
- ## Quick Start
-
- ### 1. Start the Testing Interface
- ```bash
- # Start development server
- make dev
- # or
- python local-server.py 8000
-
- # Open testing interface
- open http://localhost:8000/model-testing.html
- ```
-
- ### 2. Setup Local LLM (Optional)
- ```bash
- # Start LM Studio server on port 1234
- # Load your preferred model (e.g., Gemma-3-12b, Llama-3.1-8b)
- # The framework will auto-detect available models
- ```
-
- ### 3. Run Tests
- 1. Select models to test (OpenRouter and/or local models)
- 2. Click "Start Comprehensive Test" for full evaluation
- 3. Or click "Test Selected Model in Game" for interactive testing
- 4. Results are automatically saved to the `/output` folder
-
- ## Test Results
-
- ### CSV Output Format
- Results are saved as timestamped CSV files with columns for:
- - Model performance metrics (overall score, success rates)
- - Response time analytics (average, min, max)
- - Task-specific scores (word selection, contextualization, chat hints)
- - Error rates and reliability metrics
- - User satisfaction ratings (1-5 stars per category)
- - User comments and feedback count
-
- ### Game Testing Output
- Interactive game sessions generate JSON reports with:
- - Real-time AI interaction logs
- - User performance analytics
- - Response time breakdowns
- - Error tracking and categorization
- - User satisfaction ratings per round
- - Qualitative feedback and comments
-
- ## Model Categories
-
- ### OpenRouter Models
- - GPT-4o, GPT-4o Mini
- - Claude 3.5 Sonnet, Claude 3 Haiku
- - Gemini Pro 1.5
- - Llama 3.1 (8B, 70B)
- - Mistral 7B, Phi-3 Medium, Qwen 2 7B
-
- ### Local LLM Models (LM Studio)
- - Auto-detected from running server
- - Supports any OpenAI-compatible model
- - Common options: Gemma-3-12b, Llama-3.1-8b, Mistral-7b
-
- ## Testing Methodology
-
- ### Word Selection Evaluation
- - **Accuracy**: Words exist in source passage
- - **Difficulty Matching**: Length and complexity appropriate for level
- - **Quality Scoring**: Avoids overly common words at higher difficulties
- - **Performance**: Response time and success rate tracking
- - **User Rating**: 5-star scale for vocabulary appropriateness
-
- ### Contextualization Assessment
- - **Relevance**: Mentions book title, author, historical context
- - **Educational Value**: Appropriate for language learners
- - **Completeness**: Balanced length (100-500 characters)
- - **Literary Terms**: Uses appropriate academic vocabulary
- - **User Rating**: Passage quality and educational value scoring
-
- ### Chat Hints Analysis
- - **Question Type Coverage**: All 4 hint categories tested
- - **Educational Appropriateness**: Helps without revealing answers
- - **Response Quality**: Clear, concise, and helpful explanations
- - **Consistency**: Performance across different question types
- - **User Rating**: Helpfulness and clarity of AI hints
-
- ### User Experience Rating
- After each round, users can rate:
- - **Word Selection Quality** (1-5 stars)
- - **Passage Selection** (1-5 stars)
- - **Hint Helpfulness** (1-5 stars)
- - **Overall Experience** (1-5 stars)
- - **Optional Comments** for detailed feedback
-
- ## Architecture
-
- ### Core Components
- - **ModelTestingFramework**: Main testing orchestrator
- - **TestAIService**: Performance-tracking AI service wrapper
- - **TestGameRunner**: Real-time game session monitoring
- - **TestReportGenerator**: Multi-format report generation
-
- ### File Structure
- ```
- src/
- ├── modelTestingFramework.js    # Main testing logic
- ├── testAIService.js            # AI service wrapper
- ├── testGameRunner.js           # Game monitoring
- └── testReportGenerator.js      # Report generation
-
- model-testing.html              # Testing interface UI
- output/                         # Test results folder
- ```
-
- ## Usage Examples
-
- ### Automated Testing
- ```javascript
- import { ModelTestingFramework } from './src/modelTestingFramework.js';
-
- const framework = new ModelTestingFramework();
- const results = await framework.runComprehensiveTest();
- console.log('Results saved to output folder');
- ```
-
- ### Custom Model Testing
- ```javascript
- const customModel = {
-   id: 'my-local-model',
-   name: 'Custom Local Model',
-   provider: 'local'
- };
-
- const result = await framework.testModel(customModel);
- ```
-
- ### Report Generation
- ```javascript
- import { TestReportGenerator } from './src/testReportGenerator.js';
-
- const generator = new TestReportGenerator();
- const reports = await generator.generateAllReports(testResults);
- // Generates JSON, CSV, and Markdown reports
- ```
-
- ## Integration with Existing Codebase
-
- The testing framework integrates seamlessly with the existing Cloze Reader architecture:
-
- - **aiService.js**: Framework uses the same AI service patterns
- - **conversationManager.js**: Chat hint testing leverages existing conversation logic
- - **clozeGameEngine.js**: Game testing monitors actual game interactions
- - **bookDataService.js**: Uses same book data and quality filtering
-
- ## Troubleshooting
-
- ### Local LLM Issues
- - Ensure LM Studio is running on port 1234
- - Check that a model is loaded and ready
- - Verify CORS is enabled in LM Studio settings
-
- ### API Key Issues
- - OpenRouter API key must be set via environment variable or meta tag
- - Local models don't require API keys
-
- ### Performance Issues
- - Large model testing can take 10-30 minutes
- - Consider testing fewer models or specific categories
- - Monitor network connectivity for OpenRouter models
-
- ## Contributing
-
- The testing framework is designed to be extensible:
-
- 1. Add new model providers in `ModelTestingFramework.constructor()`
- 2. Extend evaluation metrics in the respective `evaluate*` methods
- 3. Add new report formats in `TestReportGenerator`
- 4. Enhance UI components in `model-testing.html`
-
- ## Results Interpretation
-
- ### Overall Scores
- - **90-100**: Excellent performance across all tasks
- - **80-89**: Very good with minor weaknesses
- - **70-79**: Good performance with some limitations
- - **60-69**: Adequate but needs improvement
- - **Below 60**: Poor performance, not recommended
-
- ### Success Rate Thresholds
- - **Word Selection**: >80% for production use
- - **Contextualization**: >90% for educational content
- - **Chat Hints**: >85% for effective tutoring
-
- Use these benchmarks to select the best model for your specific needs and performance requirements.
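
The score bands and success-rate thresholds from the deleted README's "Results Interpretation" section can be sketched as a small helper. This is a minimal illustration, not code from the commit: `interpretOverallScore`, `SUCCESS_THRESHOLDS`, and `meetsThresholds` are hypothetical names, and success rates are assumed to be fractions between 0 and 1 (the deleted UI code multiplies them by 100 for display).

```javascript
// Hypothetical helpers; the names are illustrative and do not appear in the commit.
// Band boundaries are copied from the README's "Overall Scores" list.
function interpretOverallScore(score) {
  if (score >= 90) return 'Excellent';
  if (score >= 80) return 'Very good';
  if (score >= 70) return 'Good';
  if (score >= 60) return 'Adequate';
  return 'Poor';
}

// Minimum success rates from the README's "Success Rate Thresholds" list,
// expressed as fractions (assumption: rates are stored in the 0..1 range).
const SUCCESS_THRESHOLDS = {
  wordSelection: 0.8,
  contextualization: 0.9,
  chatHints: 0.85,
};

// True only if every task's rate is strictly above its threshold,
// matching the README's ">80%" style wording.
function meetsThresholds(rates) {
  return Object.entries(SUCCESS_THRESHOLDS).every(
    ([task, min]) => typeof rates[task] === 'number' && rates[task] > min
  );
}
```

For example, `interpretOverallScore(84.5)` falls in the "Very good" band, and a model with a missing rate for any task fails `meetsThresholds`.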
model-testing.html DELETED
@@ -1,629 +0,0 @@
- <!DOCTYPE html>
- <html lang="en">
- <head>
-     <meta charset="UTF-8">
-     <meta name="viewport" content="width=device-width, initial-scale=1.0">
-     <title>Cloze Reader - Model Testing Framework</title>
-     <style>
-         body {
-             font-family: 'Georgia', serif;
-             background: linear-gradient(135deg, #f5f3f0 0%, #e8e4df 100%);
-             margin: 0;
-             padding: 20px;
-             min-height: 100vh;
-         }
-
-         .container {
-             max-width: 1200px;
-             margin: 0 auto;
-             background: rgba(255, 255, 255, 0.95);
-             border-radius: 15px;
-             box-shadow: 0 10px 30px rgba(0, 0, 0, 0.1);
-             padding: 40px;
-         }
-
-         h1 {
-             text-align: center;
-             color: #2c3e50;
-             font-size: 2.5rem;
-             margin-bottom: 10px;
-             text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.1);
-         }
-
-         .subtitle {
-             text-align: center;
-             color: #7f8c8d;
-             font-size: 1.2rem;
-             margin-bottom: 40px;
-         }
-
-         .model-selection {
-             background: #f8f9fa;
-             border-radius: 10px;
-             padding: 30px;
-             margin-bottom: 30px;
-             border: 2px solid #e9ecef;
-         }
-
-         .model-selection h2 {
-             color: #2c3e50;
-             margin-bottom: 20px;
-             font-size: 1.5rem;
-         }
-
-         .model-grid {
-             display: grid;
-             grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
-             gap: 15px;
-             margin-bottom: 20px;
-         }
-
-         .model-option {
-             background: white;
-             border: 2px solid #dee2e6;
-             border-radius: 8px;
-             padding: 15px;
-             cursor: pointer;
-             transition: all 0.3s ease;
-             position: relative;
-         }
-
-         .model-option:hover {
-             border-color: #007bff;
-             box-shadow: 0 4px 8px rgba(0, 123, 255, 0.2);
-         }
-
-         .model-option.selected {
-             border-color: #28a745;
-             background: #f8fff9;
-         }
-
-         .model-option input[type="checkbox"] {
-             position: absolute;
-             top: 10px;
-             right: 10px;
-             transform: scale(1.2);
-         }
-
-         .model-name {
-             font-weight: bold;
-             color: #2c3e50;
-             margin-bottom: 5px;
-         }
-
-         .model-provider {
-             color: #6c757d;
-             font-size: 0.9rem;
-             margin-bottom: 5px;
-         }
-
-         .model-id {
-             color: #495057;
-             font-size: 0.8rem;
-             font-family: monospace;
-             background: #f1f3f4;
-             padding: 2px 6px;
-             border-radius: 4px;
-         }
-
-         .controls {
-             display: flex;
-             gap: 15px;
-             align-items: center;
-             flex-wrap: wrap;
-         }
-
-         .btn {
-             background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-             color: white;
-             border: none;
-             padding: 12px 24px;
-             border-radius: 8px;
-             font-size: 1rem;
-             cursor: pointer;
-             transition: all 0.3s ease;
-             font-weight: 500;
-         }
-
-         .btn:hover {
-             transform: translateY(-2px);
-             box-shadow: 0 6px 20px rgba(102, 126, 234, 0.4);
-         }
-
-         .btn:disabled {
-             background: #6c757d;
-             cursor: not-allowed;
-             transform: none;
-             box-shadow: none;
-         }
-
-         .btn-secondary {
-             background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
-         }
-
-         .btn-success {
-             background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);
-         }
-
-         .progress-section {
-             margin-top: 30px;
-             padding: 20px;
-             background: #f8f9fa;
-             border-radius: 10px;
-             display: none;
-         }
-
-         .progress-section.active {
-             display: block;
-         }
-
-         .progress-bar {
-             width: 100%;
-             height: 8px;
-             background: #e9ecef;
-             border-radius: 4px;
-             overflow: hidden;
-             margin-bottom: 10px;
-         }
-
-         .progress-fill {
-             height: 100%;
-             background: linear-gradient(90deg, #667eea, #764ba2);
-             width: 0%;
-             transition: width 0.3s ease;
-         }
-
-         .status-message {
-             color: #495057;
-             font-size: 1rem;
-             margin-bottom: 10px;
-         }
-
-         .test-log {
-             background: #2d3748;
-             color: #e2e8f0;
-             padding: 15px;
-             border-radius: 8px;
-             font-family: 'Courier New', monospace;
-             font-size: 0.9rem;
-             max-height: 300px;
-             overflow-y: auto;
-             white-space: pre-wrap;
-         }
-
-         .results-section {
-             margin-top: 30px;
-             padding: 20px;
-             background: #f8f9fa;
-             border-radius: 10px;
-             display: none;
-         }
-
-         .results-section.active {
-             display: block;
-         }
-
-         .results-grid {
-             display: grid;
-             grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
-             gap: 20px;
-             margin-top: 20px;
-         }
-
-         .result-card {
-             background: white;
-             border-radius: 8px;
-             padding: 20px;
-             box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
-         }
-
-         .result-card h3 {
-             color: #2c3e50;
-             margin-bottom: 15px;
-             font-size: 1.2rem;
-         }
-
-         .metric {
-             display: flex;
-             justify-content: space-between;
-             margin-bottom: 10px;
-             padding-bottom: 8px;
-             border-bottom: 1px solid #e9ecef;
-         }
-
-         .metric:last-child {
-             border-bottom: none;
-             margin-bottom: 0;
-         }
-
-         .metric-label {
-             color: #6c757d;
-             font-weight: 500;
-         }
-
-         .metric-value {
-             color: #2c3e50;
-             font-weight: bold;
-         }
-
-         .score-high { color: #28a745; }
-         .score-medium { color: #ffc107; }
-         .score-low { color: #dc3545; }
-
-         .game-section {
-             margin-top: 30px;
-             padding: 20px;
-             background: #f8f9fa;
-             border-radius: 10px;
-             display: none;
-         }
-
-         .game-section.active {
-             display: block;
-         }
-
-         .game-frame {
-             width: 100%;
-             height: 600px;
-             border: none;
-             border-radius: 8px;
-             background: white;
-         }
-
-         @media (max-width: 768px) {
-             .container {
-                 padding: 20px;
-             }
-
-             .model-grid {
-                 grid-template-columns: 1fr;
-             }
-
-             .controls {
-                 flex-direction: column;
-                 align-items: stretch;
-             }
-         }
-     </style>
- </head>
- <body>
-     <div class="container">
-         <h1>Model Testing Framework</h1>
-         <p class="subtitle">Comprehensive evaluation of AI models for the Cloze Reader application</p>
-
-         <div class="model-selection">
-             <h2>Select Models to Test</h2>
-             <div id="modelGrid" class="model-grid">
-                 <!-- Models will be populated by JavaScript -->
-             </div>
-
-             <div class="controls">
-                 <button id="selectAllBtn" class="btn btn-secondary">Select All</button>
-                 <button id="clearAllBtn" class="btn btn-secondary">Clear All</button>
-                 <button id="startTestBtn" class="btn">Start Comprehensive Test</button>
-                 <button id="testGameBtn" class="btn btn-success">Test Selected Model in Game</button>
-             </div>
-         </div>
-
-         <div id="progressSection" class="progress-section">
-             <h2>Testing Progress</h2>
-             <div class="progress-bar">
-                 <div id="progressFill" class="progress-fill"></div>
-             </div>
-             <div id="statusMessage" class="status-message">Initializing tests...</div>
-             <div id="testLog" class="test-log"></div>
-         </div>
-
-         <div id="resultsSection" class="results-section">
-             <h2>Test Results</h2>
-             <p>Results have been saved to the output folder as CSV files.</p>
-             <div id="resultsGrid" class="results-grid">
-                 <!-- Results will be populated by JavaScript -->
-             </div>
-         </div>
-
-         <div id="gameSection" class="game-section">
-             <h2>Interactive Game Testing</h2>
-             <p>Test the selected model by playing the game. Performance will be logged for analysis.</p>
-             <iframe id="gameFrame" class="game-frame" src="about:blank"></iframe>
-         </div>
-     </div>
-
-     <script type="module">
-         import { ModelTestingFramework } from './src/modelTestingFramework.js';
-
-         class ModelTestingUI {
-             constructor() {
-                 this.framework = new ModelTestingFramework();
-                 this.selectedModels = new Set();
-                 this.isTestingInProgress = false;
-                 this.localServerStatus = null;
-
-                 this.initializeUI();
-                 this.setupEventListeners();
-             }
-
-             async initializeUI() {
-                 await this.checkLocalServer();
-                 await this.populateModelGrid();
-             }
-
-             async checkLocalServer() {
-                 this.localServerStatus = await this.framework.testLocalServerConnection();
-                 if (this.localServerStatus.connected) {
-                     console.log('Local LM Studio server detected:', this.localServerStatus.models.length, 'models available');
-                     await this.framework.detectLocalModels();
-                 } else {
-                     console.log('Local LM Studio server not available:', this.localServerStatus.error);
-                 }
-             }
-
-             populateModelGrid() {
-                 const grid = document.getElementById('modelGrid');
-                 grid.innerHTML = '';
-
-                 // Add local server status indicator
-                 if (this.localServerStatus) {
-                     const statusDiv = document.createElement('div');
-                     statusDiv.className = 'server-status';
-                     statusDiv.style.cssText = `
-                         grid-column: 1 / -1;
-                         padding: 15px;
-                         margin-bottom: 15px;
-                         border-radius: 8px;
-                         font-weight: bold;
-                         text-align: center;
-                         ${this.localServerStatus.connected
-                             ? 'background: #d4edda; color: #155724; border: 1px solid #c3e6cb;'
-                             : 'background: #f8d7da; color: #721c24; border: 1px solid #f5c6cb;'
-                         }
-                     `;
-
-                     if (this.localServerStatus.connected) {
-                         statusDiv.innerHTML = `
-                             ✓ Local LM Studio Server Connected (Port 1234)<br>
-                             <small>${this.localServerStatus.models.length} model(s) available</small>
-                         `;
-                     } else {
-                         statusDiv.innerHTML = `
-                             ✗ Local LM Studio Server Not Available<br>
-                             <small>Start LM Studio on port 1234 to test local models</small>
-                         `;
-                     }
-
-                     grid.appendChild(statusDiv);
-                 }
-
-                 this.framework.models.forEach(model => {
-                     const modelDiv = document.createElement('div');
-                     modelDiv.className = 'model-option';
-                     modelDiv.dataset.modelId = model.id;
-
-                     // Disable local models if server is not connected
-                     const isDisabled = model.provider === 'local' && !this.localServerStatus?.connected;
-                     if (isDisabled) {
-                         modelDiv.classList.add('disabled');
-                         modelDiv.style.opacity = '0.5';
-                         modelDiv.style.cursor = 'not-allowed';
-                     }
-
-                     const providerLabel = model.provider === 'local'
-                         ? `LOCAL ${this.localServerStatus?.connected ? '(✓)' : '(✗)'}`
-                         : model.provider.toUpperCase();
-
-                     modelDiv.innerHTML = `
-                         <input type="checkbox" id="model-${model.id}" ${isDisabled ? 'disabled' : ''} />
-                         <div class="model-name">${model.name}</div>
-                         <div class="model-provider">${providerLabel}</div>
-                         <div class="model-id">${model.id}</div>
-                     `;
-
-                     const checkbox = modelDiv.querySelector('input');
-                     checkbox.addEventListener('change', (e) => {
-                         if (e.target.checked) {
-                             this.selectedModels.add(model);
-                             modelDiv.classList.add('selected');
-                         } else {
-                             this.selectedModels.delete(model);
-                             modelDiv.classList.remove('selected');
-                         }
-                         this.updateControlsState();
-                     });
-
-                     if (!isDisabled) {
-                         modelDiv.addEventListener('click', (e) => {
-                             if (e.target !== checkbox) {
-                                 checkbox.click();
-                             }
-                         });
-                     }
-
-                     grid.appendChild(modelDiv);
-                 });
-             }
-
-             setupEventListeners() {
-                 document.getElementById('selectAllBtn').addEventListener('click', () => {
-                     this.selectAllModels();
-                 });
-
-                 document.getElementById('clearAllBtn').addEventListener('click', () => {
-                     this.clearAllModels();
-                 });
-
-                 document.getElementById('startTestBtn').addEventListener('click', () => {
-                     this.startComprehensiveTest();
-                 });
-
-                 document.getElementById('testGameBtn').addEventListener('click', () => {
-                     this.startGameTest();
-                 });
-             }
-
-             selectAllModels() {
-                 this.framework.models.forEach(model => {
-                     this.selectedModels.add(model);
-                     const modelDiv = document.querySelector(`[data-model-id="${model.id}"]`);
-                     const checkbox = modelDiv.querySelector('input');
-                     checkbox.checked = true;
-                     modelDiv.classList.add('selected');
-                 });
-                 this.updateControlsState();
-             }
-
-             clearAllModels() {
-                 this.selectedModels.clear();
-                 document.querySelectorAll('.model-option').forEach(div => {
-                     div.classList.remove('selected');
-                     div.querySelector('input').checked = false;
-                 });
-                 this.updateControlsState();
-             }
-
-             updateControlsState() {
-                 const hasSelection = this.selectedModels.size > 0;
-                 document.getElementById('startTestBtn').disabled = !hasSelection || this.isTestingInProgress;
-                 document.getElementById('testGameBtn').disabled = this.selectedModels.size !== 1 || this.isTestingInProgress;
-             }
-
-             async startComprehensiveTest() {
-                 if (this.selectedModels.size === 0) {
-                     alert('Please select at least one model to test.');
-                     return;
-                 }
-
-                 this.isTestingInProgress = true;
-                 this.updateControlsState();
-
-                 const progressSection = document.getElementById('progressSection');
-                 const progressFill = document.getElementById('progressFill');
-                 const statusMessage = document.getElementById('statusMessage');
-                 const testLog = document.getElementById('testLog');
-
-                 progressSection.classList.add('active');
-                 testLog.textContent = '';
-
-                 const modelsArray = Array.from(this.selectedModels);
-                 let completedTests = 0;
-
-                 try {
-                     for (let i = 0; i < modelsArray.length; i++) {
-                         const model = modelsArray[i];
-                         const progress = (i / modelsArray.length) * 100;
-
-                         progressFill.style.width = `${progress}%`;
-                         statusMessage.textContent = `Testing ${model.name} (${i + 1}/${modelsArray.length})...`;
-
-                         this.log(`Starting test for ${model.name}...`);
-
-                         try {
-                             const result = await this.framework.testModel(model);
-                             this.log(`✓ ${model.name} completed - Score: ${result.overallScore.toFixed(1)}`);
-                             completedTests++;
-                         } catch (error) {
-                             this.log(`✗ ${model.name} failed: ${error.message}`);
-                         }
-
-                         progressFill.style.width = `${((i + 1) / modelsArray.length) * 100}%`;
-                     }
-
-                     statusMessage.textContent = `Testing completed! ${completedTests}/${modelsArray.length} models tested successfully.`;
-                     this.log(`\nTesting completed! Results saved to output folder.`);
-
-                     // Show results
-                     this.displayResults();
-
-                 } catch (error) {
-                     this.log(`\nTesting failed: ${error.message}`);
-                     statusMessage.textContent = 'Testing failed. Check the log for details.';
-                 } finally {
-                     this.isTestingInProgress = false;
-                     this.updateControlsState();
-                 }
-             }
-
-             startGameTest() {
-                 if (this.selectedModels.size !== 1) {
-                     alert('Please select exactly one model for game testing.');
-                     return;
-                 }
-
-                 const selectedModel = Array.from(this.selectedModels)[0];
-                 const gameSection = document.getElementById('gameSection');
-                 const gameFrame = document.getElementById('gameFrame');
-
-                 // Construct URL with model parameter (declared with let because it may be extended below)
-                 let gameUrl = `index.html?testModel=${encodeURIComponent(selectedModel.id)}&testMode=true`;
-                 if (selectedModel.provider === 'local') {
-                     gameUrl += '&local=true';
-                 }
-
-                 gameFrame.src = gameUrl;
-                 gameSection.classList.add('active');
-
-                 this.log(`Starting game test with ${selectedModel.name}...`);
-             }
-
-             displayResults() {
-                 const resultsSection = document.getElementById('resultsSection');
-                 const resultsGrid = document.getElementById('resultsGrid');
-
-                 resultsGrid.innerHTML = '';
-
-                 // Format a 0-1 success rate as a percentage; missing values show as N/A instead of NaN
-                 const pct = (rate) => (typeof rate === 'number' ? `${(rate * 100).toFixed(1)}%` : 'N/A');
-
-                 this.framework.testResults.tests.forEach(result => {
-                     const card = document.createElement('div');
-                     card.className = 'result-card';
-
-                     const overallScoreClass = this.getScoreClass(result.overallScore);
-
-                     card.innerHTML = `
-                         <h3>${result.modelName}</h3>
-                         <div class="metric">
-                             <span class="metric-label">Overall Score</span>
-                             <span class="metric-value ${overallScoreClass}">${result.overallScore?.toFixed(1) || 'N/A'}</span>
-                         </div>
-                         <div class="metric">
-                             <span class="metric-label">Word Selection Success</span>
-                             <span class="metric-value">${pct(result.wordSelection?.successRate)}</span>
-                         </div>
-                         <div class="metric">
-                             <span class="metric-label">Contextualization Success</span>
-                             <span class="metric-value">${pct(result.contextualization?.successRate)}</span>
-                         </div>
-                         <div class="metric">
-                             <span class="metric-label">Chat Hints Success</span>
-                             <span class="metric-value">${pct(result.chatHints?.successRate)}</span>
-                         </div>
-                         <div class="metric">
-                             <span class="metric-label">Average Response Time</span>
-                             <span class="metric-value">${result.wordSelection?.averageTime?.toFixed(0) || 'N/A'}ms</span>
-                         </div>
-                     `;
-
-                     resultsGrid.appendChild(card);
-                 });
-
-                 resultsSection.classList.add('active');
-             }
-
-             getScoreClass(score) {
-                 if (score >= 80) return 'score-high';
-                 if (score >= 60) return 'score-medium';
-                 return 'score-low';
-             }
-
-             log(message) {
-                 const testLog = document.getElementById('testLog');
-                 const timestamp = new Date().toLocaleTimeString();
-                 testLog.textContent += `[${timestamp}] ${message}\n`;
-                 testLog.scrollTop = testLog.scrollHeight;
-             }
-         }
-
-         // Initialize the testing UI when the page loads
-         window.addEventListener('DOMContentLoaded', () => {
-             new ModelTestingUI();
-         });
-     </script>
- </body>
- </html>
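
The game-test URL assembled in `startGameTest()` could also be built with the standard `URLSearchParams` API, which handles encoding and avoids reassigning a string. The following is a minimal sketch, assuming the same query parameters (`testModel`, `testMode`, `local`) the deleted page used; `buildGameUrl` is a hypothetical helper and not part of the commit.

```javascript
// Hypothetical helper; the name and the basePath default are illustrative.
// Query parameter names mirror those used by the deleted model-testing.html.
function buildGameUrl(model, basePath = 'index.html') {
  const params = new URLSearchParams({
    testModel: model.id, // percent-encoded automatically on serialization
    testMode: 'true',
  });
  // Local (LM Studio) models get an extra flag, as in the original page
  if (model.provider === 'local') {
    params.set('local', 'true');
  }
  return `${basePath}?${params.toString()}`;
}
```

Called with one of the model entries from the framework's list, for example `buildGameUrl({ id: 'gemma-3-12b', provider: 'local' })`, the result can be assigned directly to the iframe's `src`.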
src/modelTestingFramework.js DELETED
@@ -1,703 +0,0 @@
- /**
-  * Comprehensive Model Testing Framework for Cloze Reader
-  * Tests all AI-powered features across different models
-  */
-
- class ModelTestingFramework {
-   constructor() {
-     this.models = [
-       // OpenRouter Models
-       { id: 'openai/gpt-4o', name: 'GPT-4o', provider: 'openrouter' },
-       { id: 'openai/gpt-4o-mini', name: 'GPT-4o Mini', provider: 'openrouter' },
-       { id: 'anthropic/claude-3.5-sonnet', name: 'Claude 3.5 Sonnet', provider: 'openrouter' },
-       { id: 'anthropic/claude-3-haiku', name: 'Claude 3 Haiku', provider: 'openrouter' },
-       { id: 'google/gemini-pro-1.5', name: 'Gemini Pro 1.5', provider: 'openrouter' },
-       { id: 'meta-llama/llama-3.1-8b-instruct', name: 'Llama 3.1 8B', provider: 'openrouter' },
-       { id: 'meta-llama/llama-3.1-70b-instruct', name: 'Llama 3.1 70B', provider: 'openrouter' },
-       { id: 'mistralai/mistral-7b-instruct', name: 'Mistral 7B', provider: 'openrouter' },
-       { id: 'microsoft/phi-3-medium-4k-instruct', name: 'Phi-3 Medium', provider: 'openrouter' },
-       { id: 'qwen/qwen-2-7b-instruct', name: 'Qwen 2 7B', provider: 'openrouter' },
-
-       // Local LLM Models (LM Studio compatible)
-       { id: 'local-llm', name: 'Local LLM (Auto-detect)', provider: 'local' },
-       { id: 'gemma-3-12b', name: 'Gemma 3 12B (Local)', provider: 'local' },
-       { id: 'llama-3.1-8b', name: 'Llama 3.1 8B (Local)', provider: 'local' },
-       { id: 'mistral-7b', name: 'Mistral 7B (Local)', provider: 'local' },
-       { id: 'qwen-2-7b', name: 'Qwen 2 7B (Local)', provider: 'local' },
-       { id: 'phi-3-medium', name: 'Phi-3 Medium (Local)', provider: 'local' },
-       { id: 'custom-local', name: 'Custom Local Model', provider: 'local' }
-     ];
-
-     this.testResults = {
-       timestamp: new Date().toISOString(),
-       tests: []
-     };
-
-     this.testPassages = [
-       {
-         text: "The old man sat by the fireplace, reading his favorite book. The flames danced in the hearth, casting shadows on the walls. He turned each page carefully, savoring every word of the ancient tale.",
-         difficulty: 3,
-         expectedWords: ['favorite', 'flames', 'shadows', 'carefully', 'ancient']
-       },
-       {
-         text: "In the garden, colorful flowers bloomed under the warm sunshine. Bees buzzed from blossom to blossom, collecting nectar for their hive. The gardener watched with satisfaction as his hard work flourished.",
-         difficulty: 2,
-         expectedWords: ['colorful', 'warm', 'buzzed', 'collecting', 'satisfaction']
-       },
-       {
-         text: "The protagonist's journey through the labyrinthine corridors revealed the edifice's architectural complexity. Each ornate chamber contained mysterious artifacts that suggested an ancient civilization's sophisticated understanding of mathematics and astronomy.",
-         difficulty: 8,
-         expectedWords: ['labyrinthine', 'edifice', 'architectural', 'ornate', 'artifacts', 'civilization', 'sophisticated']
-       }
-     ];
-
-     this.chatQuestions = [
-       { type: 'part_of_speech', prompt: 'What part of speech is this word?' },
-       { type: 'sentence_role', prompt: 'What role does this word play in the sentence?' },
-       { type: 'word_category', prompt: 'What category or type of word is this?' },
-       { type: 'synonym', prompt: 'Can you suggest a synonym for this word?' }
-     ];
-   }
-
-   async runComprehensiveTest(selectedModels = null) {
-     const modelsToTest = selectedModels || this.models;
-     console.log(`Starting comprehensive test of ${modelsToTest.length} models...`);
-
-     for (const model of modelsToTest) {
-       console.log(`\nTesting model: ${model.name}`);
-       const modelResults = await this.testModel(model);
-       this.testResults.tests.push(modelResults);
-
-       // Save intermediate results
-       await this.saveResults();
-     }
-
-     console.log('\nAll tests completed!');
-     return this.testResults;
-   }
-
-   async testModel(model) {
-     const startTime = Date.now();
-     const results = {
-       modelId: model.id,
-       modelName: model.name,
-       provider: model.provider,
-       timestamp: new Date().toISOString(),
-       totalTime: 0,
-       wordSelection: {},
-       contextualization: {},
-       chatHints: {},
-       errorRates: {},
-       overallScore: 0
-     };
-
-     try {
-       // Test word selection across different difficulty levels
-       results.wordSelection = await this.testWordSelection(model);
-
-       // Test contextualization
-       results.contextualization = await this.testContextualization(model);
-
-       // Test chat hint generation
-       results.chatHints = await this.testChatHints(model);
-
-       // Calculate overall metrics
-       results.totalTime = Date.now() - startTime;
-       results.overallScore = this.calculateOverallScore(results);
-
-     } catch (error) {
109
- console.error(`Error testing model ${model.name}:`, error);
110
- results.error = error.message;
111
- results.overallScore = 0;
112
- }
113
-
114
- return results;
115
- }
116
-
117
- async testWordSelection(model) {
118
- const results = {
119
- tests: [],
120
- averageTime: 0,
121
- successRate: 0,
122
- qualityScore: 0,
123
- difficultyAccuracy: 0
124
- };
125
-
126
- let totalTime = 0;
127
- let successCount = 0;
128
- let qualitySum = 0;
129
- let difficultySum = 0;
130
-
131
- for (const passage of this.testPassages) {
132
- const testStart = Date.now();
133
-
134
- try {
135
- const words = await this.performWordSelection(model, passage);
136
- const testTime = Date.now() - testStart;
137
- totalTime += testTime;
138
-
139
- const test = {
140
- passageLength: passage.text.length,
141
- targetDifficulty: passage.difficulty,
142
- responseTime: testTime,
143
- selectedWords: words,
144
- wordCount: words.length,
145
- success: words.length > 0,
146
- qualityScore: this.evaluateWordQuality(words, passage),
147
- difficultyScore: this.evaluateDifficultyMatch(words, passage.difficulty)
148
- };
149
-
150
- results.tests.push(test);
151
-
152
- if (test.success) {
153
- successCount++;
154
- qualitySum += test.qualityScore;
155
- difficultySum += test.difficultyScore;
156
- }
157
-
158
- } catch (error) {
159
- results.tests.push({
160
- passageLength: passage.text.length,
161
- targetDifficulty: passage.difficulty,
162
- responseTime: Date.now() - testStart,
163
- error: error.message,
164
- success: false
165
- });
166
- }
167
-
168
- // Brief pause between tests
169
- await new Promise(resolve => setTimeout(resolve, 1000));
170
- }
171
-
172
- results.averageTime = totalTime / this.testPassages.length;
173
- results.successRate = successCount / this.testPassages.length;
174
- results.qualityScore = successCount > 0 ? qualitySum / successCount : 0;
175
- results.difficultyAccuracy = successCount > 0 ? difficultySum / successCount : 0;
176
-
177
- return results;
178
- }
179
-
180
- async testContextualization(model) {
181
- const results = {
182
- tests: [],
183
- averageTime: 0,
184
- successRate: 0,
185
- relevanceScore: 0
186
- };
187
-
188
- const testBooks = [
189
- { title: 'Pride and Prejudice', author: 'Jane Austen' },
190
- { title: 'The Adventures of Tom Sawyer', author: 'Mark Twain' },
191
- { title: 'Moby Dick', author: 'Herman Melville' }
192
- ];
193
-
194
- let totalTime = 0;
195
- let successCount = 0;
196
- let relevanceSum = 0;
197
-
198
- for (const book of testBooks) {
199
- const testStart = Date.now();
200
-
201
- try {
202
- const context = await this.performContextualization(model, book);
203
- const testTime = Date.now() - testStart;
204
- totalTime += testTime;
205
-
206
- const test = {
207
- bookTitle: book.title,
208
- author: book.author,
209
- responseTime: testTime,
210
- contextLength: context.length,
211
- success: context.length > 0,
212
- relevanceScore: this.evaluateContextRelevance(context, book)
213
- };
214
-
215
- results.tests.push(test);
216
-
217
- if (test.success) {
218
- successCount++;
219
- relevanceSum += test.relevanceScore;
220
- }
221
-
222
- } catch (error) {
223
- results.tests.push({
224
- bookTitle: book.title,
225
- author: book.author,
226
- responseTime: Date.now() - testStart,
227
- error: error.message,
228
- success: false
229
- });
230
- }
231
-
232
- await new Promise(resolve => setTimeout(resolve, 1000));
233
- }
234
-
235
- results.averageTime = totalTime / testBooks.length;
236
- results.successRate = successCount / testBooks.length;
237
- results.relevanceScore = successCount > 0 ? relevanceSum / successCount : 0;
238
-
239
- return results;
240
- }
241
-
242
- async testChatHints(model) {
243
- const results = {
244
- tests: [],
245
- averageTime: 0,
246
- successRate: 0,
247
- helpfulnessScore: 0,
248
- questionTypePerformance: {}
249
- };
250
-
251
- const testWords = [
252
- { word: 'magnificent', sentence: 'The cathedral was truly magnificent.', difficulty: 5 },
253
- { word: 'whispered', sentence: 'She whispered the secret to her friend.', difficulty: 3 },
254
- { word: 'extraordinary', sentence: 'His performance was extraordinary.', difficulty: 7 }
255
- ];
256
-
257
- let totalTime = 0;
258
- let successCount = 0;
259
- let helpfulnessSum = 0;
260
-
261
- // Initialize question type tracking
262
- this.chatQuestions.forEach(q => {
263
- results.questionTypePerformance[q.type] = {
264
- tests: 0,
265
- successes: 0,
266
- averageScore: 0
267
- };
268
- });
269
-
270
- for (const testWord of testWords) {
271
- for (const question of this.chatQuestions) {
272
- const testStart = Date.now();
273
-
274
- try {
275
- const hint = await this.performChatHint(model, testWord, question);
276
- const testTime = Date.now() - testStart;
277
- totalTime += testTime;
278
-
279
- const helpfulnessScore = this.evaluateHintHelpfulness(hint, testWord, question);
280
-
281
- const test = {
282
- word: testWord.word,
283
- questionType: question.type,
284
- difficulty: testWord.difficulty,
285
- responseTime: testTime,
286
- hintLength: hint.length,
287
- success: hint.length > 10, // Minimum meaningful response
288
- helpfulnessScore: helpfulnessScore
289
- };
290
-
291
- results.tests.push(test);
292
-
293
- // Update question type performance
294
- const qtPerf = results.questionTypePerformance[question.type];
295
- qtPerf.tests++;
296
-
297
- if (test.success) {
298
- successCount++;
299
- helpfulnessSum += helpfulnessScore;
300
- qtPerf.successes++;
301
- qtPerf.averageScore += helpfulnessScore;
302
- }
303
-
304
- } catch (error) {
305
- results.tests.push({
306
- word: testWord.word,
307
- questionType: question.type,
308
- difficulty: testWord.difficulty,
309
- responseTime: Date.now() - testStart,
310
- error: error.message,
311
- success: false
312
- });
313
-
314
- results.questionTypePerformance[question.type].tests++;
315
- }
316
-
317
- await new Promise(resolve => setTimeout(resolve, 500));
318
- }
319
- }
320
-
321
- // Calculate averages for question types
322
- Object.keys(results.questionTypePerformance).forEach(type => {
323
- const perf = results.questionTypePerformance[type];
324
- perf.successRate = perf.tests > 0 ? perf.successes / perf.tests : 0;
325
- perf.averageScore = perf.successes > 0 ? perf.averageScore / perf.successes : 0;
326
- });
327
-
328
- const totalTests = testWords.length * this.chatQuestions.length;
329
- results.averageTime = totalTime / totalTests;
330
- results.successRate = successCount / totalTests;
331
- results.helpfulnessScore = successCount > 0 ? helpfulnessSum / successCount : 0;
332
-
333
- return results;
334
- }
335
-
336
- async performWordSelection(model, passage) {
337
- // Create a temporary AI service instance for this model
338
- const aiService = await this.createModelAIService(model);
339
-
340
- const prompt = `Select ${Math.min(3, Math.floor(passage.difficulty / 2) + 1)} appropriate words to remove from this passage for a cloze exercise at difficulty level ${passage.difficulty}:
341
-
342
- "${passage.text}"
343
-
344
- Return only a JSON array of words, like: ["word1", "word2", "word3"]`;
345
-
346
- const response = await aiService.makeAIRequest(prompt);
347
-
348
- try {
349
- return JSON.parse(response);
350
- } catch {
351
- // Try to extract words from non-JSON response
352
- const matches = response.match(/\[.*?\]/);
353
- if (matches) {
354
- return JSON.parse(matches[0]);
355
- }
356
- return [];
357
- }
358
- }
359
-
360
- async performContextualization(model, book) {
361
- const aiService = await this.createModelAIService(model);
362
-
363
- const prompt = `Provide a brief historical and literary context for "${book.title}" by ${book.author}. Keep it concise and educational, suitable for language learners.`;
364
-
365
- return await aiService.makeAIRequest(prompt);
366
- }
367
-
368
- async performChatHint(model, testWord, question) {
369
- const aiService = await this.createModelAIService(model);
370
-
371
- const prompt = `You are helping a student understand a word in context. The word is "${testWord.word}" in the sentence: "${testWord.sentence}"
372
-
373
- ${question.prompt}
374
-
375
- Provide a helpful hint without revealing the word directly. Keep your response concise and educational.`;
376
-
377
- return await aiService.makeAIRequest(prompt);
378
- }
379
-
380
- async createModelAIService(model) {
381
- // Use the testing AI service for better performance tracking
382
- const { TestAIService } = await import('./testAIService.js');
383
-
384
- const config = {
385
- modelId: model.id,
386
- provider: model.provider,
387
- isLocal: model.provider === 'local'
388
- };
389
-
390
- return new TestAIService(config);
391
- }
392
-
393
- async detectLocalModels() {
394
- // Attempt to detect available local models from LM Studio
395
- try {
396
- const response = await fetch('http://localhost:1234/v1/models');
397
- if (response.ok) {
398
- const data = await response.json();
399
- const detectedModels = data.data.map(model => ({
400
- id: model.id,
401
- name: `${model.id} (Local)`,
402
- provider: 'local'
403
- }));
404
-
405
- // Update the local models list
406
- this.models = this.models.filter(m => m.provider !== 'local');
407
- this.models.push(...detectedModels);
408
-
409
- return detectedModels;
410
- }
411
- } catch (error) {
412
- console.log('No local LM Studio server detected on port 1234');
413
- }
414
-
415
- // Return default local models if detection fails
416
- return this.models.filter(m => m.provider === 'local');
417
- }
418
-
419
- async testLocalServerConnection() {
420
- try {
421
- const response = await fetch('http://localhost:1234/v1/models', {
422
- method: 'GET',
423
- headers: {
424
- 'Content-Type': 'application/json'
425
- }
426
- });
427
-
428
- if (response.ok) {
429
- const data = await response.json();
430
- return {
431
- connected: true,
432
- models: data.data || [],
433
- serverInfo: data
434
- };
435
- } else {
436
- return {
437
- connected: false,
438
- error: `HTTP ${response.status}: ${response.statusText}`
439
- };
440
- }
441
- } catch (error) {
442
- return {
443
- connected: false,
444
- error: error.message
445
- };
446
- }
447
- }
448
-
449
- evaluateWordQuality(words, passage) {
450
- if (!words || words.length === 0) return 0;
451
-
452
- let score = 0;
453
- const text = passage.text.toLowerCase();
454
-
455
- for (const word of words) {
456
- const wordLower = word.toLowerCase();
457
-
458
- // Check if word exists in passage
459
- if (text.includes(wordLower)) score += 20;
460
-
461
- // Check word length appropriateness
462
- const expectedMinLength = Math.max(4, passage.difficulty);
463
- const expectedMaxLength = Math.min(12, passage.difficulty + 6);
464
-
465
- if (word.length >= expectedMinLength && word.length <= expectedMaxLength) {
466
- score += 15;
467
- }
468
-
469
- // Avoid overly common words for higher difficulties
470
- const commonWords = ['the', 'and', 'but', 'for', 'are', 'was', 'his', 'her'];
471
- if (passage.difficulty > 5 && !commonWords.includes(wordLower)) {
472
- score += 10;
473
- }
474
- }
475
-
476
- return Math.min(100, score / words.length);
477
- }
478
-
479
- evaluateDifficultyMatch(words, targetDifficulty) {
480
- if (!words || words.length === 0) return 0;
481
-
482
- let score = 0;
483
-
484
- for (const word of words) {
485
- const wordLength = word.length;
486
- const expectedMin = Math.max(4, targetDifficulty);
487
- const expectedMax = Math.min(14, targetDifficulty + 6);
488
-
489
- if (wordLength >= expectedMin && wordLength <= expectedMax) {
490
- score += 100;
491
- } else {
492
- // Partial credit for close matches
493
- const distance = Math.min(
494
- Math.abs(wordLength - expectedMin),
495
- Math.abs(wordLength - expectedMax)
496
- );
497
- score += Math.max(0, 100 - (distance * 20));
498
- }
499
- }
500
-
501
- return score / words.length;
502
- }
503
-
504
- evaluateContextRelevance(context, book) {
505
- if (!context || context.length < 20) return 0;
506
-
507
- let score = 0;
508
- const contextLower = context.toLowerCase();
509
-
510
- // Check for book title mention
511
- if (contextLower.includes(book.title.toLowerCase())) score += 25;
512
-
513
- // Check for author mention
514
- if (contextLower.includes(book.author.toLowerCase().split(' ').pop())) score += 25;
515
-
516
- // Check for literary/historical terms
517
- const literaryTerms = ['novel', 'literature', 'author', 'published', 'century', 'period', 'style', 'theme'];
518
- const foundTerms = literaryTerms.filter(term => contextLower.includes(term));
519
- score += Math.min(30, foundTerms.length * 5);
520
-
521
- // Length appropriateness (100-500 chars is good)
522
- if (context.length >= 100 && context.length <= 500) score += 20;
523
-
524
- return Math.min(100, score);
525
- }
526
-
527
- evaluateHintHelpfulness(hint, testWord, question) {
528
- if (!hint || hint.length < 10) return 0;
529
-
530
- let score = 0;
531
- const hintLower = hint.toLowerCase();
532
- const wordLower = testWord.word.toLowerCase();
533
-
534
- // Penalize if the word is revealed directly
535
- if (hintLower.includes(wordLower)) {
536
- score -= 50;
537
- }
538
-
539
- // Check for question-appropriate responses
540
- switch (question.type) {
541
- case 'part_of_speech':
542
- const posTerms = ['noun', 'verb', 'adjective', 'adverb', 'pronoun'];
543
- if (posTerms.some(term => hintLower.includes(term))) score += 40;
544
- break;
545
-
546
- case 'sentence_role':
547
- const roleTerms = ['subject', 'object', 'predicate', 'modifier', 'describes'];
548
- if (roleTerms.some(term => hintLower.includes(term))) score += 40;
549
- break;
550
-
551
- case 'word_category':
552
- const categoryTerms = ['type', 'kind', 'category', 'group', 'family'];
553
- if (categoryTerms.some(term => hintLower.includes(term))) score += 40;
554
- break;
555
-
556
- case 'synonym':
557
- const synonymTerms = ['similar', 'means', 'like', 'same as', 'equivalent'];
558
- if (synonymTerms.some(term => hintLower.includes(term))) score += 40;
559
- break;
560
- }
561
-
562
- // Length appropriateness
563
- if (hint.length >= 20 && hint.length <= 200) score += 30;
564
-
565
- // Educational tone
566
- const educationalTerms = ['this word', 'in this context', 'here', 'sentence'];
567
- if (educationalTerms.some(term => hintLower.includes(term))) score += 20;
568
-
569
- return Math.max(0, Math.min(100, score));
570
- }
571
-
572
- calculateOverallScore(results) {
573
- const weights = {
574
- wordSelection: 0.4,
575
- contextualization: 0.3,
576
- chatHints: 0.3
577
- };
578
-
579
- let totalScore = 0;
580
-
581
- if (results.wordSelection.successRate !== undefined) {
582
- totalScore += results.wordSelection.successRate * 40 * weights.wordSelection;
583
- }
584
-
585
- if (results.contextualization.successRate !== undefined) {
586
- totalScore += results.contextualization.successRate * 50 * weights.contextualization;
587
- }
588
-
589
- if (results.chatHints.successRate !== undefined) {
590
- totalScore += results.chatHints.successRate * 60 * weights.chatHints;
591
- }
592
-
593
- // Bonus for consistent performance across all areas
594
- const allAreas = [results.wordSelection, results.contextualization, results.chatHints];
595
- const minSuccess = Math.min(...allAreas.map(area => area.successRate || 0));
596
- if (minSuccess > 0.8) totalScore += 10;
597
-
598
- return Math.min(100, totalScore);
599
- }
600
-
601
- async saveResults() {
602
- const csvContent = this.generateCSV();
603
- const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
604
- const filename = `model_test_results_${timestamp}.csv`;
605
-
606
- // Browser environment - download file
607
- this.downloadCSV(csvContent, filename);
608
-
609
- console.log(`Results saved as ${filename}`);
610
- return filename;
611
- }
612
-
613
- downloadCSV(content, filename) {
614
- const blob = new Blob([content], { type: 'text/csv' });
615
- const url = URL.createObjectURL(blob);
616
-
617
- const a = document.createElement('a');
618
- a.href = url;
619
- a.download = filename;
620
- document.body.appendChild(a);
621
- a.click();
622
- document.body.removeChild(a);
623
- URL.revokeObjectURL(url);
624
- }
625
-
626
- generateCSV() {
627
- const headers = [
628
- 'Model Name',
629
- 'Model ID',
630
- 'Provider',
631
- 'Timestamp',
632
- 'Total Time (ms)',
633
- 'Overall Score',
634
- 'Word Selection Success Rate',
635
- 'Word Selection Avg Time (ms)',
636
- 'Word Selection Quality Score',
637
- 'Word Selection Difficulty Accuracy',
638
- 'Contextualization Success Rate',
639
- 'Contextualization Avg Time (ms)',
640
- 'Contextualization Relevance Score',
641
- 'Chat Hints Success Rate',
642
- 'Chat Hints Avg Time (ms)',
643
- 'Chat Hints Helpfulness Score',
644
- 'Part of Speech Success Rate',
645
- 'Sentence Role Success Rate',
646
- 'Word Category Success Rate',
647
- 'Synonym Success Rate',
648
- 'User Satisfaction Score',
649
- 'Word Selection User Rating',
650
- 'Passage Quality User Rating',
651
- 'Hint Helpfulness User Rating',
652
- 'Overall Experience User Rating',
653
- 'User Comments Count',
654
- 'Error Message'
655
- ];
656
-
657
- const rows = [headers.join(',')];
658
-
659
- for (const test of this.testResults.tests) {
660
- // Get user ranking data if available
661
- const userRankings = test.userRankings || {};
662
- const userSatisfaction = userRankings.overallUserSatisfaction || 0;
663
- const avgRatings = userRankings.averageRatings || {};
664
- const commentsCount = userRankings.comments?.length || 0;
665
-
666
- const row = [
667
- `"${test.modelName}"`,
668
- `"${test.modelId}"`,
669
- `"${test.provider}"`,
670
- `"${test.timestamp}"`,
671
- test.totalTime || 0,
672
- test.overallScore || 0,
673
- test.wordSelection?.successRate || 0,
674
- test.wordSelection?.averageTime || 0,
675
- test.wordSelection?.qualityScore || 0,
676
- test.wordSelection?.difficultyAccuracy || 0,
677
- test.contextualization?.successRate || 0,
678
- test.contextualization?.averageTime || 0,
679
- test.contextualization?.relevanceScore || 0,
680
- test.chatHints?.successRate || 0,
681
- test.chatHints?.averageTime || 0,
682
- test.chatHints?.helpfulnessScore || 0,
683
- test.chatHints?.questionTypePerformance?.part_of_speech?.successRate || 0,
684
- test.chatHints?.questionTypePerformance?.sentence_role?.successRate || 0,
685
- test.chatHints?.questionTypePerformance?.word_category?.successRate || 0,
686
- test.chatHints?.questionTypePerformance?.synonym?.successRate || 0,
687
- userSatisfaction.toFixed(2),
688
- avgRatings.word_selection?.toFixed(2) || 0,
689
- avgRatings.passage_quality?.toFixed(2) || 0,
690
- avgRatings.hint_helpfulness?.toFixed(2) || 0,
691
- avgRatings.overall_experience?.toFixed(2) || 0,
692
- commentsCount,
693
- `"${String(test.error || '').replace(/"/g, '""')}"`
694
- ];
695
-
696
- rows.push(row.join(','));
697
- }
698
-
699
- return rows.join('\n');
700
- }
701
- }
702
-
703
- export { ModelTestingFramework };
src/testAIService.js DELETED
@@ -1,154 +0,0 @@
1
- /**
2
- * Testing-specific AI Service wrapper
3
- * Extends the main AI service with testing capabilities
4
- */
5
-
6
- class TestAIService {
7
- constructor(config) {
8
- this.modelId = config.modelId;
9
- this.provider = config.provider;
10
- this.isLocal = config.isLocal || config.provider === 'local';
11
- this.baseUrl = this.isLocal ? 'http://localhost:1234/v1' : 'https://openrouter.ai/api/v1';
12
- this.apiKey = this.isLocal ? 'test-key' : this.getApiKey();
13
-
14
- // Performance tracking
15
- this.requestCount = 0;
16
- this.totalResponseTime = 0;
17
- this.errorCount = 0;
18
- this.lastError = null;
19
- }
20
-
21
- getApiKey() {
22
- // Try to get API key from meta tag (injected by server)
23
- const metaTag = typeof document !== 'undefined' ? document.querySelector('meta[name="openrouter-api-key"]') : null;
24
- if (metaTag) {
25
- return metaTag.content;
26
- }
27
-
28
- // Fallback to environment variable (for Node.js testing)
29
- if (typeof process !== 'undefined' && process.env) {
30
- return process.env.OPENROUTER_API_KEY;
31
- }
32
-
33
- return null;
34
- }
35
-
36
- async makeAIRequest(prompt, options = {}) {
37
- const startTime = Date.now();
38
- this.requestCount++;
39
-
40
- try {
41
- const response = await this.performRequest(prompt, options);
42
- this.totalResponseTime += Date.now() - startTime;
43
- return response;
44
- } catch (error) {
45
- this.errorCount++;
46
- this.lastError = error;
47
- this.totalResponseTime += Date.now() - startTime;
48
- throw error;
49
- }
50
- }
51
-
52
- async performRequest(prompt, options = {}) {
53
- const requestBody = {
54
- model: this.modelId,
55
- messages: [
56
- {
57
- role: "user",
58
- content: prompt
59
- }
60
- ],
61
- max_tokens: options.maxTokens || 500,
62
- temperature: options.temperature ?? 0.7,
63
- top_p: options.topP ?? 0.9
64
- };
65
-
66
- const headers = {
67
- 'Content-Type': 'application/json',
68
- 'Authorization': `Bearer ${this.apiKey}`
69
- };
70
-
71
- if (!this.isLocal) {
72
- headers['HTTP-Referer'] = window.location.origin;
73
- }
74
-
75
- const controller = new AbortController();
76
- const timeoutId = setTimeout(() => controller.abort(), 30000); // 30 second timeout
77
-
78
- try {
79
- const response = await fetch(`${this.baseUrl}/chat/completions`, {
80
- method: 'POST',
81
- headers: headers,
82
- body: JSON.stringify(requestBody),
83
- signal: controller.signal
84
- });
85
-
86
- clearTimeout(timeoutId);
87
-
88
- if (!response.ok) {
89
- throw new Error(`HTTP ${response.status}: ${response.statusText}`);
90
- }
91
-
92
- const data = await response.json();
93
-
94
- if (!data.choices || data.choices.length === 0) {
95
- throw new Error('No response from AI service');
96
- }
97
-
98
- let content = data.choices[0].message.content;
99
-
100
- // Clean up local LLM response artifacts
101
- if (this.isLocal) {
102
- content = this.cleanLocalLLMResponse(content);
103
- }
104
-
105
- return content;
106
- } catch (error) {
107
- clearTimeout(timeoutId);
108
- if (error.name === 'AbortError') {
109
- throw new Error('Request timeout');
110
- }
111
- throw error;
112
- }
113
- }
114
-
115
- cleanLocalLLMResponse(content) {
116
- // Remove common local LLM artifacts
117
- content = content.replace(/^\[.*?\]\s*/, ''); // Remove leading brackets
118
- content = content.replace(/\s*\[.*?\]$/, ''); // Remove trailing brackets
119
- content = content.replace(/^"(.*)"$/, '$1'); // Remove surrounding quotes
120
- content = content.replace(/\\n/g, '\n'); // Fix escaped newlines
121
- content = content.replace(/\\"/g, '"'); // Fix escaped quotes
122
-
123
- return content.trim();
124
- }
125
-
126
- // Performance metrics
127
- getAverageResponseTime() {
128
- return this.requestCount > 0 ? this.totalResponseTime / this.requestCount : 0;
129
- }
130
-
131
- getErrorRate() {
132
- return this.requestCount > 0 ? this.errorCount / this.requestCount : 0;
133
- }
134
-
135
- getPerformanceStats() {
136
- return {
137
- requestCount: this.requestCount,
138
- totalResponseTime: this.totalResponseTime,
139
- averageResponseTime: this.getAverageResponseTime(),
140
- errorCount: this.errorCount,
141
- errorRate: this.getErrorRate(),
142
- lastError: this.lastError?.message || null
143
- };
144
- }
145
-
146
- reset() {
147
- this.requestCount = 0;
148
- this.totalResponseTime = 0;
149
- this.errorCount = 0;
150
- this.lastError = null;
151
- }
152
- }
153
-
154
- export { TestAIService };
src/testGameRunner.js DELETED
@@ -1,473 +0,0 @@
1
- /**
2
- * Test Game Runner - Monitors and logs performance during game testing
3
- */
4
-
5
- class TestGameRunner {
6
- constructor(modelConfig) {
7
- this.modelConfig = modelConfig;
8
- this.sessionData = {
9
- modelId: modelConfig.modelId,
10
- modelName: modelConfig.modelName,
11
- provider: modelConfig.provider,
12
- startTime: Date.now(),
13
- rounds: [],
14
- interactions: [],
15
- userRankings: [],
16
- performance: {
17
- wordSelectionRequests: 0,
18
- wordSelectionSuccess: 0,
19
- wordSelectionTime: 0,
20
- contextualizationRequests: 0,
21
- contextualizationSuccess: 0,
22
- contextualizationTime: 0,
23
- chatHintRequests: 0,
24
- chatHintSuccess: 0,
25
- chatHintTime: 0,
26
- errors: []
27
- }
28
- };
29
-
30
- this.originalAIService = null;
31
- this.setupInterception();
32
- }
33
-
34
- setupInterception() {
35
- // Intercept AI service calls to track performance
36
- if (window.aiService) {
37
- this.originalAIService = window.aiService;
38
- this.wrapAIService();
39
- }
40
-
41
- // Monitor for game events
42
- this.setupGameEventListeners();
43
- }
44
-
45
- wrapAIService() {
46
- const testRunner = this;
47
-
48
- // Wrap the makeAIRequest method
49
- const originalMakeAIRequest = this.originalAIService.makeAIRequest.bind(this.originalAIService);
50
-
51
- window.aiService.makeAIRequest = async function(prompt, options = {}) {
52
- const startTime = Date.now();
53
- const requestType = testRunner.classifyRequest(prompt);
54
-
55
- testRunner.logInteraction({
56
- type: 'ai_request_start',
57
- requestType: requestType,
58
- prompt: prompt.length > 200 ? prompt.substring(0, 200) + '...' : prompt,
59
- timestamp: Date.now()
60
- });
61
-
62
- try {
63
- const result = await originalMakeAIRequest(prompt, options);
64
- const responseTime = Date.now() - startTime;
65
-
66
- testRunner.updatePerformanceMetrics(requestType, true, responseTime);
67
- testRunner.logInteraction({
68
- type: 'ai_request_success',
69
- requestType: requestType,
70
- responseTime: responseTime,
71
- responseLength: result.length,
72
- timestamp: Date.now()
73
- });
74
-
75
- return result;
76
- } catch (error) {
77
- const responseTime = Date.now() - startTime;
78
-
79
- testRunner.updatePerformanceMetrics(requestType, false, responseTime);
80
- testRunner.logInteraction({
81
- type: 'ai_request_error',
82
- requestType: requestType,
83
- error: error.message,
84
- responseTime: responseTime,
85
- timestamp: Date.now()
86
- });
87
-
88
- testRunner.sessionData.performance.errors.push({
89
- type: requestType,
90
- error: error.message,
91
- timestamp: Date.now()
92
- });
93
-
94
- throw error;
95
- }
96
- };
97
- }
98
-
99
- classifyRequest(prompt) {
100
- const promptLower = prompt.toLowerCase();
101
-
102
- if (promptLower.includes('select') && promptLower.includes('word')) {
103
- return 'word_selection';
104
- } else if (promptLower.includes('context') || promptLower.includes('background')) {
105
- return 'contextualization';
106
- } else if (promptLower.includes('hint') || promptLower.includes('help') || promptLower.includes('clue')) {
107
- return 'chat_hint';
108
- } else {
109
- return 'other';
110
- }
111
- }
112
-
113
- updatePerformanceMetrics(requestType, success, responseTime) {
114
- const perf = this.sessionData.performance;
115
-
116
- switch (requestType) {
117
- case 'word_selection':
118
- perf.wordSelectionRequests++;
119
- if (success) {
120
- perf.wordSelectionSuccess++;
121
- perf.wordSelectionTime += responseTime;
122
- }
123
- break;
124
-
125
- case 'contextualization':
126
- perf.contextualizationRequests++;
127
- if (success) {
128
- perf.contextualizationSuccess++;
129
- perf.contextualizationTime += responseTime;
130
- }
131
- break;
132
-
133
- case 'chat_hint':
134
- perf.chatHintRequests++;
135
- if (success) {
136
- perf.chatHintSuccess++;
137
- perf.chatHintTime += responseTime;
138
- }
139
- break;
140
- }
141
- }
142
-
143
- setupGameEventListeners() {
144
- // Listen for game-specific events
145
- document.addEventListener('gameRoundStart', (event) => {
146
- this.logInteraction({
147
- type: 'round_start',
148
- level: event.detail.level,
149
- round: event.detail.round,
150
- timestamp: Date.now()
151
- });
152
- });
153
-
154
- document.addEventListener('gameRoundComplete', (event) => {
155
- const roundData = {
156
- level: event.detail.level,
157
- round: event.detail.round,
158
- score: event.detail.score,
159
- correctAnswers: event.detail.correctAnswers,
160
- totalBlanks: event.detail.totalBlanks,
161
- timeSpent: event.detail.timeSpent,
162
- timestamp: Date.now()
163
- };
164
-
165
- this.sessionData.rounds.push(roundData);
166
-
167
- // Store the current round index for user ranking association
168
- this.currentRoundIndex = this.sessionData.rounds.length - 1;
169
-
170
- this.logInteraction({
171
- type: 'round_complete',
172
- level: event.detail.level,
173
- round: event.detail.round,
174
- score: event.detail.score,
175
- timestamp: Date.now()
176
- });
177
- });
178
-
179
- document.addEventListener('userAnswer', (event) => {
180
- this.logInteraction({
181
- type: 'user_answer',
182
- word: event.detail.targetWord,
183
- userAnswer: event.detail.userAnswer,
184
- correct: event.detail.correct,
185
- timestamp: Date.now()
186
- });
187
- });
188
-
189
- document.addEventListener('chatInteraction', (event) => {
190
- this.logInteraction({
191
- type: 'chat_interaction',
192
- questionType: event.detail.questionType,
193
- word: event.detail.word,
194
- timestamp: Date.now()
195
- });
196
- });
197
-
198
- // Listen for user ranking events
199
- document.addEventListener('userRanking', (event) => {
200
- const rankingData = {
201
- ...event.detail,
202
- roundIndex: this.currentRoundIndex,
203
- roundDetails: this.sessionData.rounds[this.currentRoundIndex]
204
- };
205
-
206
- this.sessionData.userRankings.push(rankingData);
207
-
208
- this.logInteraction({
209
- type: 'user_ranking',
210
- averageRating: event.detail.averageRating,
211
- ratings: event.detail.ratings,
212
- timestamp: Date.now()
213
- });
214
- });
215
- }
216
-
217
- logInteraction(interaction) {
218
- this.sessionData.interactions.push(interaction);
219
-
220
- // Log to console for real-time monitoring
221
- console.log(`[TestRunner] ${interaction.type}:`, interaction);
222
- }
223
-
224
- generateReport() {
225
- const endTime = Date.now();
226
- const totalTime = endTime - this.sessionData.startTime;
227
- const perf = this.sessionData.performance;
228
-
229
- // Calculate user ranking summary
230
- const userRankingSummary = this.calculateUserRankingSummary();
231
-
232
- const report = {
233
- ...this.sessionData,
234
- endTime: endTime,
235
- totalSessionTime: totalTime,
236
- summary: {
237
- totalRounds: this.sessionData.rounds.length,
238
- averageScore: this.sessionData.rounds.length > 0
239
- ? this.sessionData.rounds.reduce((sum, round) => sum + round.score, 0) / this.sessionData.rounds.length
240
- : 0,
241
- wordSelectionSuccessRate: perf.wordSelectionRequests > 0
242
- ? perf.wordSelectionSuccess / perf.wordSelectionRequests
243
- : 0,
244
- wordSelectionAvgTime: perf.wordSelectionSuccess > 0
245
- ? perf.wordSelectionTime / perf.wordSelectionSuccess
246
- : 0,
247
- contextualizationSuccessRate: perf.contextualizationRequests > 0
248
- ? perf.contextualizationSuccess / perf.contextualizationRequests
249
- : 0,
250
- contextualizationAvgTime: perf.contextualizationSuccess > 0
251
- ? perf.contextualizationTime / perf.contextualizationSuccess
252
- : 0,
253
- chatHintSuccessRate: perf.chatHintRequests > 0
254
- ? perf.chatHintSuccess / perf.chatHintRequests
255
- : 0,
256
- chatHintAvgTime: perf.chatHintSuccess > 0
257
- ? perf.chatHintTime / perf.chatHintSuccess
258
- : 0,
259
- totalErrors: perf.errors.length,
260
- userRankingSummary: userRankingSummary
261
- }
262
- };
263
-
264
- return report;
265
- }
266
-
267
- calculateUserRankingSummary() {
268
- if (this.sessionData.userRankings.length === 0) {
269
- return null;
270
- }
271
-
272
- const categories = ['word_selection', 'passage_quality', 'hint_helpfulness', 'overall_experience'];
273
- const summary = {
274
- totalRankings: this.sessionData.userRankings.length,
275
- averageRatings: {},
276
- categoryBreakdown: {},
277
- comments: [],
278
- overallUserSatisfaction: 0
279
- };
280
-
281
- // Calculate average ratings per category
282
- categories.forEach(category => {
283
- const ratings = this.sessionData.userRankings
284
- .map(r => r.ratings[category])
285
- .filter(r => r !== undefined);
286
-
287
- if (ratings.length > 0) {
288
- summary.averageRatings[category] =
289
- ratings.reduce((a, b) => a + b, 0) / ratings.length;
290
-
291
- // Distribution of ratings
292
- summary.categoryBreakdown[category] = {
293
- 1: ratings.filter(r => r === 1).length,
294
- 2: ratings.filter(r => r === 2).length,
295
- 3: ratings.filter(r => r === 3).length,
296
- 4: ratings.filter(r => r === 4).length,
297
- 5: ratings.filter(r => r === 5).length
298
- };
299
- }
300
- });
301
-
302
- // Calculate overall satisfaction
303
- const allRatings = this.sessionData.userRankings
304
- .map(r => r.averageRating)
305
- .filter(r => r !== undefined);
306
-
307
- if (allRatings.length > 0) {
308
- summary.overallUserSatisfaction =
309
- allRatings.reduce((a, b) => a + b, 0) / allRatings.length;
310
- }
311
-
312
- // Collect comments with context
313
- summary.comments = this.sessionData.userRankings
314
- .filter(r => r.comments)
315
- .map(r => ({
316
- timestamp: r.timestamp,
317
- comment: r.comments,
318
- averageRating: r.averageRating,
319
- roundLevel: r.roundDetails?.level,
320
- roundScore: r.roundDetails?.score
321
- }));
322
-
323
- return summary;
324
- }
325
-
326
- async saveReport() {
327
- const report = this.generateReport();
328
- const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
329
- const filename = `game_test_${this.modelConfig.modelId.replace(/[\/\\:]/g, '_')}_${timestamp}.json`;
330
-
331
- try {
332
- // Try to save via browser download
333
- this.downloadReport(report, filename);
334
-
335
- // Also try to save to output folder if possible (server-side)
336
- await this.saveToServer(report, filename);
337
-
338
- console.log(`Test report saved: ${filename}`);
339
- return filename;
340
- } catch (error) {
341
- console.error('Error saving test report:', error);
342
- return null;
343
- }
344
- }
345
-
346
- downloadReport(report, filename) {
347
- const jsonString = JSON.stringify(report, null, 2);
348
- const blob = new Blob([jsonString], { type: 'application/json' });
349
- const url = URL.createObjectURL(blob);
350
-
351
- const a = document.createElement('a');
352
- a.href = url;
353
- a.download = filename;
354
- document.body.appendChild(a);
355
- a.click();
356
- document.body.removeChild(a);
357
- URL.revokeObjectURL(url);
358
- }
359
-
360
- async saveToServer(report, filename) {
361
- try {
362
- const response = await fetch('/api/save-test-report', {
363
- method: 'POST',
364
- headers: {
365
- 'Content-Type': 'application/json'
366
- },
367
- body: JSON.stringify({
368
- filename: filename,
369
- data: report
370
- })
371
- });
372
-
373
- if (!response.ok) {
374
- throw new Error(`Server save failed: ${response.status}`);
375
- }
376
- } catch (error) {
377
- console.log('Server save not available, using browser download only');
378
- }
379
- }
380
-
381
- // Utility methods for analysis
382
- getWordSelectionAnalytics() {
383
- const wordSelectionInteractions = this.sessionData.interactions.filter(
384
- i => i.type === 'ai_request_success' && i.requestType === 'word_selection'
385
- );
386
-
387
- return {
388
- count: wordSelectionInteractions.length,
389
- averageResponseTime: wordSelectionInteractions.length > 0
390
- ? wordSelectionInteractions.reduce((sum, i) => sum + i.responseTime, 0) / wordSelectionInteractions.length
391
- : 0,
392
- averageResponseLength: wordSelectionInteractions.length > 0
393
- ? wordSelectionInteractions.reduce((sum, i) => sum + i.responseLength, 0) / wordSelectionInteractions.length
394
- : 0
395
- };
396
- }
397
-
398
- getChatHintAnalytics() {
399
- const chatHintInteractions = this.sessionData.interactions.filter(
400
- i => i.type === 'chat_interaction'
401
- );
402
-
403
- const questionTypes = {};
404
- chatHintInteractions.forEach(interaction => {
405
- const type = interaction.questionType || 'unknown';
406
- questionTypes[type] = (questionTypes[type] || 0) + 1;
407
- });
408
-
409
- return {
410
- totalHints: chatHintInteractions.length,
411
- questionTypeBreakdown: questionTypes
412
- };
413
- }
414
-
415
- getUserPerformanceAnalytics() {
416
- const answerInteractions = this.sessionData.interactions.filter(
417
- i => i.type === 'user_answer'
418
- );
419
-
420
- const correctAnswers = answerInteractions.filter(i => i.correct).length;
421
-
422
- return {
423
- totalAnswers: answerInteractions.length,
424
- correctAnswers: correctAnswers,
425
- accuracy: answerInteractions.length > 0 ? correctAnswers / answerInteractions.length : 0
426
- };
427
- }
428
- }
429
-
430
- // Initialize test runner if in test mode
431
- window.addEventListener('DOMContentLoaded', () => {
432
- const urlParams = new URLSearchParams(window.location.search);
433
- if (urlParams.get('testMode') === 'true') {
434
- const modelId = urlParams.get('testModel');
435
- const isLocal = urlParams.get('local') === 'true';
436
-
437
- if (modelId) {
438
- window.testGameRunner = new TestGameRunner({
439
- modelId: modelId,
440
- modelName: modelId,
441
- provider: isLocal ? 'local' : 'openrouter'
442
- });
443
-
444
- console.log('Test Game Runner initialized for model:', modelId);
445
-
446
- // Add end session button
447
- const endButton = document.createElement('button');
448
- endButton.textContent = 'End Test Session';
449
- endButton.style.cssText = `
450
- position: fixed;
451
- top: 10px;
452
- right: 10px;
453
- z-index: 1000;
454
- padding: 10px 15px;
455
- background: #dc3545;
456
- color: white;
457
- border: none;
458
- border-radius: 5px;
459
- cursor: pointer;
460
- `;
461
-
462
- endButton.addEventListener('click', async () => {
463
- const filename = await window.testGameRunner.saveReport();
464
- alert(`Test session ended. Report saved as: ${filename}`);
465
- window.close();
466
- });
467
-
468
- document.body.appendChild(endButton);
469
- }
470
- }
471
- });
472
-
473
- export { TestGameRunner };
src/testReportGenerator.js DELETED
@@ -1,453 +0,0 @@
1
- /**
2
- * Comprehensive Test Report Generator
3
- * Analyzes test results and generates detailed reports
4
- */
5
-
6
- class TestReportGenerator {
7
- constructor() {
8
- this.reportTemplates = {
9
- summary: this.generateSummaryReport.bind(this),
10
- detailed: this.generateDetailedReport.bind(this),
11
- comparison: this.generateComparisonReport.bind(this),
12
- performance: this.generatePerformanceReport.bind(this),
13
- markdown: this.generateMarkdownReport.bind(this)
14
- };
15
- }
16
-
17
- async generateAllReports(testResults, outputFormat = 'all') {
18
- const reports = {};
19
-
20
- if (outputFormat === 'all' || outputFormat === 'summary') {
21
- reports.summary = this.generateSummaryReport(testResults);
22
- }
23
-
24
- if (outputFormat === 'all' || outputFormat === 'detailed') {
25
- reports.detailed = this.generateDetailedReport(testResults);
26
- }
27
-
28
- if (outputFormat === 'all' || outputFormat === 'comparison') {
29
- reports.comparison = this.generateComparisonReport(testResults);
30
- }
31
-
32
- if (outputFormat === 'all' || outputFormat === 'performance') {
33
- reports.performance = this.generatePerformanceReport(testResults);
34
- }
35
-
36
- if (outputFormat === 'all' || outputFormat === 'markdown') {
37
- reports.markdown = this.generateMarkdownReport(testResults);
38
- }
39
-
40
- return reports;
41
- }
42
-
43
- generateSummaryReport(testResults) {
44
- const summary = {
45
- testOverview: {
46
- timestamp: testResults.timestamp,
47
- totalModels: testResults.tests.length,
48
- testDuration: this.calculateTotalTestDuration(testResults.tests),
49
- successfulTests: testResults.tests.filter(t => !t.error).length
50
- },
51
- topPerformers: this.getTopPerformers(testResults.tests),
52
- categoryAverages: this.calculateCategoryAverages(testResults.tests),
53
- recommendations: this.generateRecommendations(testResults.tests)
54
- };
55
-
56
- return summary;
57
- }
58
-
59
- generateDetailedReport(testResults) {
60
- const detailed = {
61
- testMetadata: {
62
- timestamp: testResults.timestamp,
63
- totalModels: testResults.tests.length,
64
- testFrameworkVersion: '1.0.0'
65
- },
66
- modelResults: testResults.tests.map(test => ({
67
- modelInfo: {
68
- id: test.modelId,
69
- name: test.modelName,
70
- provider: test.provider
71
- },
72
- overallPerformance: {
73
- score: test.overallScore,
74
- totalTime: test.totalTime,
75
- rank: this.calculateRank(test, testResults.tests)
76
- },
77
- wordSelection: this.analyzeWordSelection(test.wordSelection),
78
- contextualization: this.analyzeContextualization(test.contextualization),
79
- chatHints: this.analyzeChatHints(test.chatHints),
80
- errorAnalysis: this.analyzeErrors(test)
81
- }))
82
- };
83
-
84
- return detailed;
85
- }
86
-
87
- generateComparisonReport(testResults) {
88
- const validTests = testResults.tests.filter(t => !t.error);
89
-
90
- const comparison = {
91
- modelComparison: this.createModelComparisonMatrix(validTests),
92
- providerAnalysis: this.analyzeByProvider(validTests),
93
- performanceMetrics: {
94
- wordSelection: this.compareWordSelectionMetrics(validTests),
95
- contextualization: this.compareContextualizationMetrics(validTests),
96
- chatHints: this.compareChatHintMetrics(validTests),
97
- responseTime: this.compareResponseTimes(validTests)
98
- },
99
- recommendations: {
100
- bestOverall: this.getBestOverallModel(validTests),
101
- bestForWordSelection: this.getBestForTask(validTests, 'wordSelection'),
102
- bestForContextualization: this.getBestForTask(validTests, 'contextualization'),
103
- bestForChatHints: this.getBestForTask(validTests, 'chatHints'),
104
- fastestResponse: this.getFastestModel(validTests),
105
- mostReliable: this.getMostReliableModel(validTests)
106
- }
107
- };
108
-
109
- return comparison;
110
- }
111
-
112
- generatePerformanceReport(testResults) {
113
- const performance = {
114
- responseTimeAnalysis: this.analyzeResponseTimes(testResults.tests),
115
- successRateAnalysis: this.analyzeSuccessRates(testResults.tests),
116
- qualityMetrics: this.analyzeQualityMetrics(testResults.tests),
117
- scalabilityInsights: this.analyzeScalability(testResults.tests),
118
- reliabilityMetrics: this.analyzeReliability(testResults.tests)
119
- };
120
-
121
- return performance;
122
- }
123
-
124
- generateMarkdownReport(testResults) {
125
- const summary = this.generateSummaryReport(testResults);
126
- const comparison = this.generateComparisonReport(testResults);
127
-
128
- let markdown = `# Cloze Reader Model Testing Report\n\n`;
129
- markdown += `**Generated:** ${new Date().toLocaleString()}\n`;
130
- markdown += `**Test Timestamp:** ${testResults.timestamp}\n`;
131
- markdown += `**Models Tested:** ${testResults.tests.length}\n\n`;
132
-
133
- // Executive Summary
134
- markdown += `## Executive Summary\n\n`;
135
- markdown += `- **Successful Tests:** ${summary.testOverview.successfulTests}/${summary.testOverview.totalModels}\n`;
136
- markdown += `- **Best Overall Model:** ${comparison.recommendations.bestOverall.name} (${comparison.recommendations.bestOverall.score.toFixed(1)}/100)\n`;
137
- markdown += `- **Average Response Time:** ${this.formatTime(this.calculateAverageResponseTime(testResults.tests))}\n\n`;
138
-
139
- // Top Performers
140
- markdown += `## Top Performers\n\n`;
141
- markdown += `| Rank | Model | Score | Provider |\n`;
142
- markdown += `|------|-------|-------|----------|\n`;
143
- summary.topPerformers.forEach((model, index) => {
144
- markdown += `| ${index + 1} | ${model.name} | ${model.score.toFixed(1)} | ${model.provider} |\n`;
145
- });
146
- markdown += `\n`;
147
-
148
- // Performance by Category
149
- markdown += `## Performance by Category\n\n`;
150
- markdown += `### Word Selection\n`;
151
- markdown += `- **Best:** ${comparison.recommendations.bestForWordSelection.name} (${(comparison.recommendations.bestForWordSelection.successRate * 100).toFixed(1)}% success rate)\n`;
152
- markdown += `- **Average Success Rate:** ${(summary.categoryAverages.wordSelection.successRate * 100).toFixed(1)}%\n`;
153
- markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.wordSelection.averageTime)}\n\n`;
154
-
155
- markdown += `### Contextualization\n`;
156
- markdown += `- **Best:** ${comparison.recommendations.bestForContextualization.name} (${(comparison.recommendations.bestForContextualization.successRate * 100).toFixed(1)}% success rate)\n`;
157
- markdown += `- **Average Success Rate:** ${(summary.categoryAverages.contextualization.successRate * 100).toFixed(1)}%\n`;
158
- markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.contextualization.averageTime)}\n\n`;
159
-
160
- markdown += `### Chat Hints\n`;
161
- markdown += `- **Best:** ${comparison.recommendations.bestForChatHints.name} (${(comparison.recommendations.bestForChatHints.successRate * 100).toFixed(1)}% success rate)\n`;
162
- markdown += `- **Average Success Rate:** ${(summary.categoryAverages.chatHints.successRate * 100).toFixed(1)}%\n`;
163
- markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.chatHints.averageTime)}\n\n`;
164
-
165
- // Add user rankings section if available
166
- const hasUserRankings = testResults.tests.some(t => t.userRankings?.totalRankings > 0);
167
- if (hasUserRankings) {
168
- markdown += `## User Satisfaction Ratings\n\n`;
169
- markdown += `| Model | Overall Satisfaction | Word Selection | Passage Quality | Hint Helpfulness | Overall Experience |\n`;
170
- markdown += `|-------|---------------------|----------------|-----------------|------------------|--------------------|\n`;
171
-
172
- testResults.tests.forEach(test => {
173
- if (test.userRankings?.totalRankings > 0) {
174
- const ur = test.userRankings;
175
- const avg = ur.averageRatings || {};
176
- markdown += `| ${test.modelName} | ${ur.overallUserSatisfaction.toFixed(1)}/5 | ${(avg.word_selection || 0).toFixed(1)} | ${(avg.passage_quality || 0).toFixed(1)} | ${(avg.hint_helpfulness || 0).toFixed(1)} | ${(avg.overall_experience || 0).toFixed(1)} |\n`;
177
- }
178
- });
179
- markdown += `\n`;
180
-
181
- // Add user comments if any
182
- const allComments = testResults.tests
183
- .filter(t => t.userRankings?.comments?.length > 0)
184
- .flatMap(t => t.userRankings.comments.map(c => ({ ...c, model: t.modelName })));
185
-
186
- if (allComments.length > 0) {
187
- markdown += `### User Comments\n\n`;
188
- allComments.forEach(comment => {
189
- markdown += `- **${comment.model}** (Rating: ${comment.averageRating.toFixed(1)}): "${comment.comment}"\n`;
190
- });
191
- markdown += `\n`;
192
- }
193
- }
194
-
195
- // Detailed Results
196
- markdown += `## Detailed Results\n\n`;
197
- testResults.tests.forEach(test => {
198
- if (!test.error) {
199
- markdown += `### ${test.modelName}\n`;
200
- markdown += `- **Provider:** ${test.provider}\n`;
201
- markdown += `- **Overall Score:** ${test.overallScore.toFixed(1)}/100\n`;
202
- markdown += `- **Total Time:** ${this.formatTime(test.totalTime)}\n`;
203
- markdown += `- **Word Selection:** ${(test.wordSelection?.successRate * 100 || 0).toFixed(1)}% success\n`;
204
- markdown += `- **Contextualization:** ${(test.contextualization?.successRate * 100 || 0).toFixed(1)}% success\n`;
205
- markdown += `- **Chat Hints:** ${(test.chatHints?.successRate * 100 || 0).toFixed(1)}% success\n\n`;
206
- }
207
- });
208
-
209
- // Recommendations
210
- markdown += `## Recommendations\n\n`;
211
- summary.recommendations.forEach(rec => {
212
- markdown += `- ${rec}\n`;
213
- });
214
-
215
- return markdown;
216
- }
217
-
218
- // Helper methods for analysis
219
- calculateTotalTestDuration(tests) {
220
- return tests.reduce((total, test) => total + (test.totalTime || 0), 0);
221
- }
222
-
223
- getTopPerformers(tests, limit = 5) {
224
- return tests
225
- .filter(t => !t.error && t.overallScore)
226
- .sort((a, b) => b.overallScore - a.overallScore)
227
- .slice(0, limit)
228
- .map(test => ({
229
- name: test.modelName,
230
- score: test.overallScore,
231
- provider: test.provider
232
- }));
233
- }
234
-
235
- calculateCategoryAverages(tests) {
236
- const validTests = tests.filter(t => !t.error);
237
-
238
- return {
239
- wordSelection: this.calculateCategoryAverage(validTests, 'wordSelection'),
240
- contextualization: this.calculateCategoryAverage(validTests, 'contextualization'),
241
- chatHints: this.calculateCategoryAverage(validTests, 'chatHints')
242
- };
243
- }
244
-
245
- calculateCategoryAverage(tests, category) {
246
- const validCategoryTests = tests.filter(t => t[category]);
247
-
248
- if (validCategoryTests.length === 0) {
249
- return { successRate: 0, averageTime: 0, qualityScore: 0 };
250
- }
251
-
252
- return {
253
- successRate: validCategoryTests.reduce((sum, t) => sum + (t[category].successRate || 0), 0) / validCategoryTests.length,
254
- averageTime: validCategoryTests.reduce((sum, t) => sum + (t[category].averageTime || 0), 0) / validCategoryTests.length,
255
- qualityScore: validCategoryTests.reduce((sum, t) => sum + (t[category].qualityScore || t[category].relevanceScore || t[category].helpfulnessScore || 0), 0) / validCategoryTests.length
256
- };
257
- }
258
-
259
- generateRecommendations(tests) {
260
- const recommendations = [];
261
- const validTests = tests.filter(t => !t.error);
262
-
263
- if (validTests.length === 0) {
264
- return ['No successful tests to generate recommendations.'];
265
- }
266
-
267
- const bestOverall = validTests.reduce((best, test) =>
268
- test.overallScore > best.overallScore ? test : best
269
- );
270
-
271
- recommendations.push(`For overall best performance, use ${bestOverall.modelName} (${bestOverall.provider})`);
272
-
273
- // Provider-specific recommendations
274
- const providerPerformance = this.analyzeByProvider(validTests);
275
- const bestProvider = Object.keys(providerPerformance)
276
- .reduce((best, provider) =>
277
- providerPerformance[provider].averageScore > providerPerformance[best]?.averageScore ? provider : best
278
- );
279
-
280
- recommendations.push(`${bestProvider} models show the best average performance`);
281
-
282
- // Speed vs quality trade-offs
283
- const fastestGoodModel = validTests
284
- .filter(t => t.overallScore > 70)
285
- .sort((a, b) => a.totalTime - b.totalTime)[0];
286
-
287
- if (fastestGoodModel) {
288
- recommendations.push(`For fastest good performance, consider ${fastestGoodModel.modelName}`);
289
- }
290
-
291
- return recommendations;
292
- }
293
-
294
- analyzeByProvider(tests) {
295
- const providerGroups = {};
296
-
297
- tests.forEach(test => {
298
- if (!providerGroups[test.provider]) {
299
- providerGroups[test.provider] = [];
300
- }
301
- providerGroups[test.provider].push(test);
302
- });
303
-
304
- const analysis = {};
305
- Object.keys(providerGroups).forEach(provider => {
306
- const providerTests = providerGroups[provider];
307
- analysis[provider] = {
308
- count: providerTests.length,
309
- averageScore: providerTests.reduce((sum, t) => sum + t.overallScore, 0) / providerTests.length,
310
- averageTime: providerTests.reduce((sum, t) => sum + t.totalTime, 0) / providerTests.length,
311
- successRate: providerTests.filter(t => !t.error).length / providerTests.length
312
- };
313
- });
314
-
315
- return analysis;
316
- }
317
-
318
- getBestOverallModel(tests) {
319
- return tests.reduce((best, test) =>
320
- test.overallScore > best.overallScore ? {
321
- name: test.modelName,
322
- score: test.overallScore,
323
- provider: test.provider
324
- } : best
325
- , { name: '', score: 0, provider: '' });
326
- }
327
-
328
- getBestForTask(tests, taskName) {
329
- const validTests = tests.filter(t => t[taskName] && t[taskName].successRate !== undefined);
330
-
331
- if (validTests.length === 0) {
332
- return { name: 'N/A', successRate: 0, provider: '' };
333
- }
334
-
335
- return validTests.reduce((best, test) =>
336
- test[taskName].successRate > best.successRate ? {
337
- name: test.modelName,
338
- successRate: test[taskName].successRate,
339
- provider: test.provider
340
- } : best
341
- , { name: '', successRate: 0, provider: '' });
342
- }
343
-
344
- getFastestModel(tests) {
345
- return tests.reduce((fastest, test) =>
346
- test.totalTime < fastest.time ? {
347
- name: test.modelName,
348
- time: test.totalTime,
349
- provider: test.provider
350
- } : fastest
351
- , { name: '', time: Infinity, provider: '' });
352
- }
353
-
354
- getMostReliableModel(tests) {
355
- // Model with fewest errors and highest success rates across all tasks
356
- const reliability = tests.map(test => {
357
- const wordSelectionReliability = test.wordSelection?.successRate || 0;
358
- const contextualizationReliability = test.contextualization?.successRate || 0;
359
- const chatHintReliability = test.chatHints?.successRate || 0;
360
-
361
- const overallReliability = (wordSelectionReliability + contextualizationReliability + chatHintReliability) / 3;
362
-
363
- return {
364
- name: test.modelName,
365
- reliability: overallReliability,
366
- provider: test.provider
367
- };
368
- });
369
-
370
- return reliability.reduce((most, test) =>
371
- test.reliability > most.reliability ? test : most
372
- , { name: '', reliability: 0, provider: '' });
373
- }
374
-
375
- calculateAverageResponseTime(tests) {
376
- const validTests = tests.filter(t => t.totalTime);
377
- return validTests.reduce((sum, t) => sum + t.totalTime, 0) / validTests.length;
378
- }
379
-
380
- formatTime(milliseconds) {
381
- if (milliseconds < 1000) {
382
- return `${milliseconds.toFixed(0)}ms`;
383
- } else if (milliseconds < 60000) {
384
- return `${(milliseconds / 1000).toFixed(1)}s`;
385
- } else {
386
- return `${(milliseconds / 60000).toFixed(1)}m`;
387
- }
388
- }
389
-
390
- async saveReports(reports, baseFilename) {
391
- const savedFiles = [];
392
-
393
- for (const [type, content] of Object.entries(reports)) {
394
- const filename = `${baseFilename}_${type}`;
395
- let fileContent, extension;
396
-
397
- if (type === 'markdown') {
398
- fileContent = content;
399
- extension = '.md';
400
- } else {
401
- fileContent = JSON.stringify(content, null, 2);
402
- extension = '.json';
403
- }
404
-
405
- try {
406
- await this.saveFile(`${filename}${extension}`, fileContent);
407
- savedFiles.push(`${filename}${extension}`);
408
- } catch (error) {
409
- console.error(`Error saving ${filename}:`, error);
410
- }
411
- }
412
-
413
- return savedFiles;
414
- }
415
-
416
- async saveFile(filename, content) {
417
- // Try to save via browser download
418
- const blob = new Blob([content], {
419
- type: filename.endsWith('.md') ? 'text/markdown' : 'application/json'
420
- });
421
- const url = URL.createObjectURL(blob);
422
-
423
- const a = document.createElement('a');
424
- a.href = url;
425
- a.download = filename;
426
- document.body.appendChild(a);
427
- a.click();
428
- document.body.removeChild(a);
429
- URL.revokeObjectURL(url);
430
- }
431
-
432
- // Stub methods for detailed analysis (implement as needed)
433
- analyzeWordSelection(data) { return data; }
434
- analyzeContextualization(data) { return data; }
435
- analyzeChatHints(data) { return data; }
436
- analyzeErrors(test) { return test.error ? [test.error] : []; }
437
- calculateRank(test, allTests) {
438
- const sorted = allTests.filter(t => !t.error).sort((a, b) => b.overallScore - a.overallScore);
439
- return sorted.findIndex(t => t.modelId === test.modelId) + 1;
440
- }
441
- createModelComparisonMatrix(tests) { return {}; }
442
- compareWordSelectionMetrics(tests) { return {}; }
443
- compareContextualizationMetrics(tests) { return {}; }
444
- compareChatHintMetrics(tests) { return {}; }
445
- compareResponseTimes(tests) { return {}; }
446
- analyzeResponseTimes(tests) { return {}; }
447
- analyzeSuccessRates(tests) { return {}; }
448
- analyzeQualityMetrics(tests) { return {}; }
449
- analyzeScalability(tests) { return {}; }
450
- analyzeReliability(tests) { return {}; }
451
- }
452
-
453
- export { TestReportGenerator };
src/userRankingInterface.js DELETED
@@ -1,650 +0,0 @@
1
- /**
2
- * User Ranking Interface for Model Testing
3
- * Allows users to rate model performance on each task during gameplay
4
- */
5
-
6
- class UserRankingInterface {
7
- constructor() {
8
- this.rankings = {
9
- rounds: [],
10
- currentRound: null
11
- };
12
-
13
- this.rankingCategories = [
14
- {
15
- id: 'word_selection',
16
- name: 'Word Selection Quality',
17
- description: 'How appropriate were the selected words for this difficulty level?',
18
- criteria: [
19
- 'Words match the difficulty level',
20
- 'Vocabulary is challenging but fair',
21
- 'Selected words are meaningful in context'
22
- ]
23
- },
24
- {
25
- id: 'passage_quality',
26
- name: 'Passage Selection',
27
- description: 'How suitable was this passage for language learning?',
28
- criteria: [
29
- 'Text is engaging and appropriate',
30
- 'Content is educational',
31
- 'Difficulty matches the level'
32
- ]
33
- },
34
- {
35
- id: 'hint_helpfulness',
36
- name: 'Hint Quality',
37
- description: 'How helpful were the AI-generated hints?',
38
- criteria: [
39
- 'Hints guide without revealing answers',
40
- 'Explanations are clear and educational',
41
- 'Responses are contextually appropriate'
42
- ]
43
- },
44
- {
45
- id: 'overall_experience',
46
- name: 'Overall Round Experience',
47
- description: 'How was the overall quality of this round?',
48
- criteria: [
49
- 'Smooth gameplay experience',
50
- 'AI responses were timely',
51
- 'Educational value was high'
52
- ]
53
- }
54
- ];
55
-
56
- this.createRankingUI();
57
- this.setupEventListeners();
58
- }
59
-
60
- createRankingUI() {
61
- // Create ranking modal
62
- const modal = document.createElement('div');
63
- modal.id = 'ranking-modal';
64
- modal.className = 'ranking-modal';
65
- modal.innerHTML = `
66
- <div class="ranking-modal-content">
67
- <h2>Rate This Round</h2>
68
- <p class="ranking-subtitle">Help us improve by rating the AI's performance</p>
69
-
70
- <div id="ranking-categories" class="ranking-categories">
71
- <!-- Categories will be populated dynamically -->
72
- </div>
73
-
74
- <div class="ranking-comments">
75
- <label for="ranking-comments-input">Additional Comments (Optional):</label>
76
- <textarea id="ranking-comments-input" rows="3" placeholder="Any specific feedback about this round..."></textarea>
77
- </div>
78
-
79
- <div class="ranking-actions">
80
- <button id="skip-ranking-btn" class="btn-secondary">Skip</button>
81
- <button id="submit-ranking-btn" class="btn-primary" disabled>Submit Rating</button>
82
- </div>
83
- </div>
84
- `;
85
-
86
- // Create ranking trigger button
87
- const triggerButton = document.createElement('button');
88
- triggerButton.id = 'ranking-trigger-btn';
89
- triggerButton.className = 'ranking-trigger-btn';
90
- triggerButton.innerHTML = '⭐ Rate Round';
91
- triggerButton.style.cssText = `
92
- position: fixed;
93
- bottom: 20px;
94
- left: 20px;
95
- z-index: 999;
96
- padding: 10px 20px;
97
- background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
98
- color: white;
99
- border: none;
100
- border-radius: 25px;
101
- cursor: pointer;
102
- font-size: 14px;
103
- font-weight: bold;
104
- box-shadow: 0 4px 15px rgba(102, 126, 234, 0.4);
105
- transition: all 0.3s ease;
106
- display: none;
107
- `;
108
-
109
- // Add styles
110
- const styles = document.createElement('style');
111
- styles.textContent = `
112
- .ranking-modal {
113
- display: none;
114
- position: fixed;
115
- top: 0;
116
- left: 0;
117
- width: 100%;
118
- height: 100%;
119
- background: rgba(0, 0, 0, 0.5);
120
- z-index: 1000;
121
- backdrop-filter: blur(5px);
122
- }
123
-
124
- .ranking-modal.active {
125
- display: flex;
126
- align-items: center;
127
- justify-content: center;
128
- }
129
-
130
- .ranking-modal-content {
131
- background: white;
132
- border-radius: 15px;
133
- padding: 30px;
134
- max-width: 600px;
135
- width: 90%;
136
- max-height: 80vh;
137
- overflow-y: auto;
138
- box-shadow: 0 10px 40px rgba(0, 0, 0, 0.3);
139
- }
140
-
141
- .ranking-modal-content h2 {
142
- color: #2c3e50;
143
- margin-bottom: 10px;
144
- text-align: center;
145
- }
146
-
147
- .ranking-subtitle {
148
- color: #7f8c8d;
149
- text-align: center;
150
- margin-bottom: 30px;
151
- }
152
-
153
- .ranking-category {
154
- margin-bottom: 25px;
155
- padding: 20px;
156
- background: #f8f9fa;
157
- border-radius: 10px;
158
- border: 2px solid #e9ecef;
159
- }
160
-
161
- .ranking-category h3 {
162
- color: #2c3e50;
163
- margin-bottom: 8px;
164
- font-size: 1.1rem;
165
- }
166
-
167
- .ranking-category-description {
168
- color: #6c757d;
169
- font-size: 0.9rem;
170
- margin-bottom: 15px;
171
- }
172
-
173
- .ranking-criteria {
174
- font-size: 0.85rem;
175
- color: #6c757d;
176
- margin-bottom: 15px;
177
- padding-left: 20px;
178
- }
179
-
180
- .ranking-criteria li {
181
- margin-bottom: 5px;
182
- }
183
-
184
- .ranking-stars {
185
- display: flex;
186
- gap: 10px;
187
- justify-content: center;
188
- margin-top: 10px;
189
- }
190
-
191
- .ranking-star {
192
- font-size: 30px;
193
- color: #ddd;
194
- cursor: pointer;
195
- transition: all 0.2s ease;
196
- }
197
-
198
- .ranking-star:hover,
199
- .ranking-star.hover {
200
- color: #ffd700;
201
- transform: scale(1.1);
202
- }
203
-
204
- .ranking-star.selected {
205
- color: #ffd700;
206
- }
207
-
208
- .ranking-comments {
209
- margin: 20px 0;
210
- }
211
-
212
- .ranking-comments label {
213
- display: block;
214
- color: #2c3e50;
215
- margin-bottom: 8px;
216
- font-weight: 500;
217
- }
218
-
219
- .ranking-comments textarea {
220
- width: 100%;
221
- padding: 10px;
222
- border: 2px solid #e9ecef;
223
- border-radius: 8px;
224
- font-family: inherit;
225
- resize: vertical;
226
- }
227
-
228
- .ranking-actions {
229
- display: flex;
230
- gap: 15px;
231
- justify-content: flex-end;
232
- margin-top: 20px;
233
- }
234
-
235
- .btn-primary, .btn-secondary {
236
- padding: 10px 24px;
237
- border: none;
238
- border-radius: 8px;
239
- font-size: 1rem;
240
- cursor: pointer;
241
- transition: all 0.3s ease;
242
- font-weight: 500;
243
- }
244
-
245
- .btn-primary {
246
- background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
247
- color: white;
248
- }
249
-
250
- .btn-primary:hover:not(:disabled) {
251
- transform: translateY(-2px);
252
- box-shadow: 0 6px 20px rgba(102, 126, 234, 0.4);
253
- }
254
-
255
- .btn-primary:disabled {
256
- background: #6c757d;
257
- cursor: not-allowed;
258
- }
259
-
260
- .btn-secondary {
261
- background: #e9ecef;
262
- color: #495057;
263
- }
264
-
265
- .btn-secondary:hover {
266
- background: #dee2e6;
267
- }
268
-
269
- .ranking-trigger-btn:hover {
270
- transform: translateY(-2px) scale(1.05);
271
- box-shadow: 0 6px 20px rgba(102, 126, 234, 0.6);
272
- }
273
-
274
- @media (max-width: 600px) {
275
- .ranking-modal-content {
276
- padding: 20px;
277
- }
278
-
279
- .ranking-star {
280
- font-size: 24px;
281
- }
282
-
283
- .ranking-trigger-btn {
284
- bottom: 70px;
285
- padding: 8px 16px;
286
- font-size: 12px;
287
- }
288
- }
289
- `;
290
-
291
- document.head.appendChild(styles);
292
- document.body.appendChild(modal);
293
- document.body.appendChild(triggerButton);
294
-
295
- this.populateCategories();
296
- }
297
-
298
- populateCategories() {
299
- const container = document.getElementById('ranking-categories');
300
- container.innerHTML = '';
301
-
302
- this.rankingCategories.forEach(category => {
303
- const categoryDiv = document.createElement('div');
304
- categoryDiv.className = 'ranking-category';
305
- categoryDiv.dataset.categoryId = category.id;
306
-
307
- const criteriaHtml = category.criteria.map(c => `<li>${c}</li>`).join('');
308
-
309
- categoryDiv.innerHTML = `
310
- <h3>${category.name}</h3>
311
- <p class="ranking-category-description">${category.description}</p>
312
- <ul class="ranking-criteria">${criteriaHtml}</ul>
313
- <div class="ranking-stars" data-category="${category.id}">
314
- ${[1, 2, 3, 4, 5].map(i =>
315
- `<span class="ranking-star" data-rating="${i}">★</span>`
316
- ).join('')}
317
- </div>
318
- `;
319
-
320
- container.appendChild(categoryDiv);
321
- });
322
-
323
- // Setup star interactions
324
- this.setupStarInteractions();
325
- }
326
-
327
- setupStarInteractions() {
328
- const starContainers = document.querySelectorAll('.ranking-stars');
329
-
330
- starContainers.forEach(container => {
331
- const stars = container.querySelectorAll('.ranking-star');
332
- const categoryId = container.dataset.category;
333
-
334
- stars.forEach((star, index) => {
335
- star.addEventListener('mouseenter', () => {
336
- this.highlightStars(stars, index + 1);
337
- });
338
-
339
- star.addEventListener('click', () => {
340
- this.selectRating(categoryId, index + 1);
341
- this.markStarsAsSelected(stars, index + 1);
342
- this.updateSubmitButton();
343
- });
344
- });
345
-
346
- container.addEventListener('mouseleave', () => {
347
- const currentRating = this.getCurrentRating(categoryId);
348
- if (currentRating > 0) {
349
- this.markStarsAsSelected(stars, currentRating);
350
- } else {
351
- this.highlightStars(stars, 0);
352
- }
353
- });
354
- });
355
- }
356
-
357
- highlightStars(stars, count) {
358
- stars.forEach((star, index) => {
359
- if (index < count) {
360
- star.classList.add('hover');
361
- } else {
362
- star.classList.remove('hover');
363
- }
364
- });
365
- }
366
-
367
- markStarsAsSelected(stars, count) {
368
- stars.forEach((star, index) => {
369
- if (index < count) {
370
- star.classList.add('selected');
371
- star.classList.remove('hover');
372
- } else {
373
- star.classList.remove('selected');
374
- star.classList.remove('hover');
375
- }
376
- });
377
- }
378
-
379
- selectRating(categoryId, rating) {
380
- if (!this.currentRound) {
381
- this.currentRound = {
382
- timestamp: Date.now(),
383
- ratings: {},
384
- comments: ''
385
- };
386
- }
387
-
388
- this.currentRound.ratings[categoryId] = rating;
389
- }
390
-
391
- getCurrentRating(categoryId) {
392
- return this.currentRound?.ratings[categoryId] || 0;
393
- }
394
-
395
- setupEventListeners() {
396
- const modal = document.getElementById('ranking-modal');
397
- const triggerBtn = document.getElementById('ranking-trigger-btn');
398
- const skipBtn = document.getElementById('skip-ranking-btn');
399
- const submitBtn = document.getElementById('submit-ranking-btn');
400
- const commentsInput = document.getElementById('ranking-comments-input');
401
-
402
- // Show modal
403
- triggerBtn.addEventListener('click', () => {
404
- this.showRankingModal();
405
- });
406
-
407
- // Skip ranking
408
- skipBtn.addEventListener('click', () => {
409
- this.hideRankingModal();
410
- this.currentRound = null;
411
- });
412
-
413
- // Submit ranking
414
- submitBtn.addEventListener('click', () => {
415
- this.submitRanking();
416
- });
417
-
418
- // Update comments
419
- commentsInput.addEventListener('input', (e) => {
420
- if (this.currentRound) {
421
- this.currentRound.comments = e.target.value;
422
- }
423
- });
424
-
425
- // Close modal on background click
426
- modal.addEventListener('click', (e) => {
427
- if (e.target === modal) {
428
- this.hideRankingModal();
429
- }
430
- });
431
-
432
- // Listen for round completion events
433
- document.addEventListener('gameRoundComplete', (event) => {
434
- this.onRoundComplete(event.detail);
435
- });
436
- }
437
-
438
- updateSubmitButton() {
439
- const submitBtn = document.getElementById('submit-ranking-btn');
440
- const allRated = this.rankingCategories.every(category =>
441
- this.getCurrentRating(category.id) > 0
442
- );
443
-
444
- submitBtn.disabled = !allRated;
445
- }
446
-
447
- showRankingModal() {
448
- const modal = document.getElementById('ranking-modal');
449
- modal.classList.add('active');
450
-
451
- // Reset current round if needed
452
- if (!this.currentRound) {
453
- this.currentRound = {
454
- timestamp: Date.now(),
455
- ratings: {},
456
- comments: ''
457
- };
458
- }
459
-
460
- // Clear previous selections
461
- this.resetUI();
462
- }
463
-
464
- hideRankingModal() {
465
- const modal = document.getElementById('ranking-modal');
466
- modal.classList.remove('active');
467
- }
468
-
469
- resetUI() {
470
- // Clear all star selections
471
- document.querySelectorAll('.ranking-star').forEach(star => {
472
- star.classList.remove('selected', 'hover');
473
- });
474
-
475
- // Clear comments
476
- document.getElementById('ranking-comments-input').value = '';
477
-
478
- // Disable submit button
479
- document.getElementById('submit-ranking-btn').disabled = true;
480
- }
481
-
482
- submitRanking() {
483
- if (!this.currentRound) return;
484
-
485
- // Add metadata
486
- this.currentRound.submittedAt = Date.now();
487
- this.currentRound.modelId = window.testGameRunner?.modelConfig?.modelId || 'unknown';
488
-
489
- // Calculate average rating
490
- const ratings = Object.values(this.currentRound.ratings);
491
- this.currentRound.averageRating = ratings.reduce((a, b) => a + b, 0) / ratings.length;
492
-
493
- // Save ranking
494
- this.rankings.rounds.push(this.currentRound);
495
-
496
- // Dispatch event for test runner
497
- document.dispatchEvent(new CustomEvent('userRanking', {
498
- detail: this.currentRound
499
- }));
500
-
501
- // Show confirmation
502
- this.showConfirmation();
503
-
504
- // Reset
505
- this.hideRankingModal();
506
- this.currentRound = null;
507
-
508
- console.log('Ranking submitted:', this.rankings);
509
- }
510
-
511
- showConfirmation() {
512
- const confirmation = document.createElement('div');
513
- confirmation.style.cssText = `
514
- position: fixed;
515
- bottom: 100px;
516
- left: 50%;
517
- transform: translateX(-50%);
518
- background: #28a745;
519
- color: white;
520
- padding: 15px 30px;
521
- border-radius: 8px;
522
- box-shadow: 0 4px 15px rgba(40, 167, 69, 0.4);
523
- z-index: 1001;
524
- animation: slideInUp 0.3s ease;
525
- `;
526
- confirmation.textContent = '✓ Thank you for your feedback!';
527
-
528
- document.body.appendChild(confirmation);
529
-
530
- setTimeout(() => {
531
- confirmation.style.animation = 'slideOutDown 0.3s ease';
532
- setTimeout(() => confirmation.remove(), 300);
533
- }, 2000);
534
- }
535
-
536
- onRoundComplete(roundDetails) {
537
- // Store round details for context
538
- if (!this.currentRound) {
539
- this.currentRound = {
540
- timestamp: Date.now(),
541
- ratings: {},
542
- comments: '',
543
- roundDetails: roundDetails
544
- };
545
- } else {
546
- this.currentRound.roundDetails = roundDetails;
547
- }
548
-
549
- // Show ranking trigger button
550
- const triggerBtn = document.getElementById('ranking-trigger-btn');
551
- triggerBtn.style.display = 'block';
552
-
553
- // Auto-show modal after a short delay (optional)
554
- if (window.testGameRunner?.modelConfig?.autoShowRanking) {
555
- setTimeout(() => this.showRankingModal(), 1500);
556
- }
557
- }
558
-
559
- exportRankings() {
560
- const exportData = {
561
- ...this.rankings,
562
- exportedAt: new Date().toISOString(),
563
- modelId: window.testGameRunner?.modelConfig?.modelId || 'unknown'
564
- };
565
-
566
- return exportData;
567
- }
568
-
569
- getRankingSummary() {
570
- if (this.rankings.rounds.length === 0) {
571
- return null;
572
- }
573
-
574
- const summary = {
575
- totalRounds: this.rankings.rounds.length,
576
- averageRatings: {},
577
- categoryBreakdown: {},
578
- comments: []
579
- };
580
-
581
- // Calculate average ratings per category
582
- this.rankingCategories.forEach(category => {
583
- const ratings = this.rankings.rounds
584
- .map(r => r.ratings[category.id])
585
- .filter(r => r !== undefined);
586
-
587
- if (ratings.length > 0) {
588
- summary.averageRatings[category.id] =
589
- ratings.reduce((a, b) => a + b, 0) / ratings.length;
590
-
591
- // Distribution of ratings
592
- summary.categoryBreakdown[category.id] = {
593
- 1: ratings.filter(r => r === 1).length,
594
- 2: ratings.filter(r => r === 2).length,
595
- 3: ratings.filter(r => r === 3).length,
596
- 4: ratings.filter(r => r === 4).length,
597
- 5: ratings.filter(r => r === 5).length
598
- };
599
- }
600
- });
601
-
602
- // Collect all comments
603
- summary.comments = this.rankings.rounds
604
- .filter(r => r.comments)
605
- .map(r => ({
606
- timestamp: r.timestamp,
607
- comment: r.comments,
608
- averageRating: r.averageRating
609
- }));
610
-
611
- return summary;
612
- }
613
- }
614
-
615
- // Initialize when in test mode
616
- window.addEventListener('DOMContentLoaded', () => {
617
- const urlParams = new URLSearchParams(window.location.search);
618
- if (urlParams.get('testMode') === 'true') {
619
- window.userRankingInterface = new UserRankingInterface();
620
-
621
- // Add CSS animation keyframes
622
- const animationStyles = document.createElement('style');
623
- animationStyles.textContent = `
624
- @keyframes slideInUp {
625
- from {
626
- transform: translate(-50%, 100%);
627
- opacity: 0;
628
- }
629
- to {
630
- transform: translate(-50%, 0);
631
- opacity: 1;
632
- }
633
- }
634
-
635
- @keyframes slideOutDown {
636
- from {
637
- transform: translate(-50%, 0);
638
- opacity: 1;
639
- }
640
- to {
641
- transform: translate(-50%, 100%);
642
- opacity: 0;
643
- }
644
- }
645
- `;
646
- document.head.appendChild(animationStyles);
647
- }
648
- });
649
-
650
- export { UserRankingInterface };
test-direct.js DELETED
@@ -1,28 +0,0 @@
- import { AIService } from './src/aiService.js';
-
- // Force local mode
- const originalSearch = window.location.search;
- window.location.search = '?local=true';
-
- const ai = new AIService();
-
- console.log('Testing direct AI connection...');
- console.log('Config:', {
-   url: ai.apiUrl,
-   model: ai.model,
-   isLocal: ai.isLocalMode
- });
-
- const testPassage = "The ancient library contained thousands of manuscripts, each one carefully preserved by generations of scholars who dedicated their lives to knowledge.";
-
- try {
-   console.log('\nTesting word selection...');
-   const words = await ai.selectSignificantWords(testPassage, 2, 3);
-   console.log('Selected words:', words);
-   console.log('✅ Success!');
- } catch (error) {
-   console.error('❌ Error:', error.message);
- }
-
- // Restore original search
- window.location.search = originalSearch;
test-local-llm.js DELETED
@@ -1,155 +0,0 @@
- #!/usr/bin/env node
-
- // Stress test for local LLM on port 1234
- // Tests word selection functionality with Gutenberg passages
-
- import http from 'http';
-
- // Sample Gutenberg passages for testing
- const testPassages = [
-   "The sun was shining brightly on the sea, shining with all his might. He did his very best to make the billows smooth and bright. And this was odd, because it was the middle of the night.",
-   "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity.",
-   "In a hole in the ground there lived a hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on.",
-   "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little.",
-   "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife. However little known the feelings or views of such a man may be."
- ];
-
- // Word selection prompt template (based on cloze reader's format)
- function createWordSelectionPrompt(passage, level = 1) {
-   const wordCount = level < 6 ? 1 : level < 11 ? 2 : 3;
-   const minLength = level < 3 ? 4 : 5;
-   const maxLength = level < 3 ? 7 : level < 5 ? 10 : 14;
-
-   return {
-     model: "gemma-3-12b",
-     messages: [
-       {
-         role: "system",
-         content: "You are a vocabulary expert who selects appropriate words for cloze exercises."
-       },
-       {
-         role: "user",
-         content: `Select ${wordCount} word${wordCount > 1 ? 's' : ''} from this passage for a cloze exercise.
-
- Passage: "${passage}"
-
- Requirements:
- - Select exactly ${wordCount} different word${wordCount > 1 ? 's' : ''}
- - Each word must be ${minLength}-${maxLength} letters long
- - Words must be meaningful nouns, verbs, adjectives, or adverbs
- - Avoid pronouns, articles, and common words
- - Return ONLY the selected word${wordCount > 1 ? 's' : ''}, ${wordCount > 1 ? 'comma-separated' : 'nothing else'}
-
- Selected word${wordCount > 1 ? 's' : ''}:`
-       }
-     ],
-     temperature: 0.7,
-     max_tokens: 50
-   };
- }
-
- // Function to make HTTP request to local LLM
- function testLLMConnection(passage, testNumber) {
-   return new Promise((resolve, reject) => {
-     const prompt = createWordSelectionPrompt(passage, Math.floor(Math.random() * 10) + 1);
-     const data = JSON.stringify(prompt);
-
-     const options = {
-       hostname: 'localhost',
-       port: 1234,
-       path: '/v1/chat/completions',
-       method: 'POST',
-       headers: {
-         'Content-Type': 'application/json',
-         'Content-Length': Buffer.byteLength(data) // byte count, not character count: passages contain multi-byte characters
-       }
-     };
-
-     console.log(`\n=== Test ${testNumber} ===`);
-     console.log(`Passage: "${passage.substring(0, 80)}..."`);
-     console.log(`Sending request to http://localhost:1234/v1/chat/completions`);
-
-     const startTime = Date.now();
-
-     const req = http.request(options, (res) => {
-       let responseData = '';
-
-       res.on('data', (chunk) => {
-         responseData += chunk;
-       });
-
-       res.on('end', () => {
-         const elapsed = Date.now() - startTime;
-         console.log(`Response received in ${elapsed}ms`);
-         console.log(`Status: ${res.statusCode}`);
-
-         try {
-           const parsed = JSON.parse(responseData);
-           if (parsed.choices && parsed.choices[0] && parsed.choices[0].message) {
-             const selectedWords = parsed.choices[0].message.content.trim();
-             console.log(`Selected words: ${selectedWords}`);
-             console.log(`✓ Test ${testNumber} PASSED`);
-             resolve({ success: true, words: selectedWords, time: elapsed });
-           } else {
-             console.log(`Response structure unexpected:`, parsed);
-             resolve({ success: false, error: 'Invalid response structure', time: elapsed });
-           }
-         } catch (error) {
-           console.log(`Failed to parse response:`, error.message);
-           console.log(`Raw response:`, responseData.substring(0, 200));
-           resolve({ success: false, error: error.message, time: elapsed });
-         }
-       });
-     });
-
-     req.on('error', (error) => {
-       const elapsed = Date.now() - startTime;
-       console.log(`✗ Test ${testNumber} FAILED - Connection error after ${elapsed}ms`);
-       console.log(`Error: ${error.message}`);
-       resolve({ success: false, error: error.message, time: elapsed });
-     });
-
-     req.write(data);
-     req.end();
-   });
- }
-
- // Run stress test
- async function runStressTest() {
-   console.log('Starting stress test for Gemma-3-12b on localhost:1234');
-   console.log('Testing word selection for cloze reader game...\n');
-
-   const results = [];
-
-   // Test each passage
-   for (let i = 0; i < testPassages.length; i++) {
-     const result = await testLLMConnection(testPassages[i], i + 1);
-     results.push(result);
-
-     // Small delay between tests
-     await new Promise(resolve => setTimeout(resolve, 500));
-   }
-
-   // Summary
-   console.log('\n=== STRESS TEST SUMMARY ===');
-   const successful = results.filter(r => r.success).length;
-   const failed = results.length - successful;
-   const avgTime = results.reduce((sum, r) => sum + r.time, 0) / results.length;
-
-   console.log(`Total tests: ${results.length}`);
-   console.log(`Successful: ${successful}`);
-   console.log(`Failed: ${failed}`);
-   console.log(`Average response time: ${avgTime.toFixed(0)}ms`);
-   console.log(`Success rate: ${(successful / results.length * 100).toFixed(1)}%`);
-
-   if (successful === results.length) {
-     console.log('\n✓ All tests passed! The Gemma-3-12b server is functioning correctly for cloze reader.');
-   } else if (successful > 0) {
-     console.log('\n⚠ Some tests passed. The server is partially functional.');
-   } else {
-     console.log('\n✗ All tests failed. Please check if the server is running on port 1234.');
-   }
- }
-
- // Run the test
- runStressTest().catch(console.error);