Update README.md

README.md (CHANGED)

**Previous version:**
# Cloze Reader
- **Ondov et al. (2024, NAACL)** argue: "The cloze training objective of Masked Language Models makes them a natural choice for generating plausible distractors for human cloze questions"
- **Zhang & Hashimoto (2021)** analyzed the inductive biases of masked tokens, showing that models learn statistical and syntactic dependencies through the same mechanisms cloze tests measure in humans
Cloze Reader doesn't resolve these tensions. It stages them. Through vintage aesthetics and classic texts, it creates a space where the convergence of educational assessment and machine learning becomes palpable. You're playing a literacy game designed by an algorithm that learned literacy by playing the same game billions of times. Every passage is a historical text processed by a model trained on historical texts. Every hint comes from a system that doesn't "understand" in any human sense but can nonetheless guide you toward understanding.
The experience raises more questions than it answers. Is this pedagogy or pattern replication? Assessment or performance? Human learning or collaborative prediction with a statistical engine? These aren't rhetorical questions—they're open empirical questions about what education looks like when the tools we use to measure learning are built from the same processes we're trying to measure.
## How It Works
**Single-Model Architecture:** The system uses Google's Gemma-3-27b model for all operations—analyzing passages, selecting words to mask, generating contextual hints, and powering the chat interface. The model handles both assessment design and pedagogical guidance through the same algorithmic system.
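
To make the single-model claim concrete, here is a minimal sketch of both roles sharing one chat-completions call. The endpoint shape follows OpenRouter's OpenAI-compatible API; the model slug, prompts, and function names are illustrative assumptions, not the app's actual code.

```javascript
// Sketch: one Gemma-3-27b endpoint serving both assessment design and guidance.
// Model slug, prompts, and function names are illustrative, not from the codebase.
const OPENROUTER_API_KEY = 'injected-by-the-backend'; // placeholder
const MODEL = 'google/gemma-3-27b-it';                // assumed OpenRouter slug

async function callModel(messages) {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${OPENROUTER_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: MODEL, messages }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Role 1: assessment design, i.e. choosing which words to blank out.
const selectWords = (passage, count) =>
  callModel([{ role: 'user', content: `Pick ${count} context-inferable words to blank out:\n\n${passage}` }]);

// Role 2: pedagogical guidance, i.e. hinting at a blank without revealing it.
const hintFor = (passage, word) =>
  callModel([{ role: 'user', content: `Hint at the missing word "${word}" without saying it:\n\n${passage}` }]);
```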
**Progressive Levels:** The game implements a level system (1-5 with 1 blank, 6-10 with 2 blanks, 11+ with 3 blanks) that scaffolds difficulty through word length constraints, historical period selection, and hint disclosure. Early levels use 1900s texts and show first+last letters; advanced levels draw from any era and provide only first letters. Each round presents two passages from different books, requiring consistent performance across rounds before advancing.
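
A minimal sketch of that schedule as data, using the blank counts stated above; the middle tier's hint style and text range are assumptions, and the function name is illustrative.

```javascript
// Illustrative level schedule (thresholds from the description above;
// the middle tier's details are assumed, not taken from the codebase).
function levelConfig(level) {
  if (level <= 5)  return { blanks: 1, hint: 'first and last letters', era: '1900s texts' };
  if (level <= 10) return { blanks: 2, hint: 'first and last letters', era: 'wider range' };
  return { blanks: 3, hint: 'first letter only', era: 'any era' };
}
```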
**Serendipitous Selection:** Passages stream directly from Hugging Face's Project Gutenberg dataset. The model selects words based on its training rather than curricular logic—sometimes choosing obvious vocabulary, sometimes obscure terms, sometimes generating exercises that are trivially easy or frustratingly hard. This unpredictability is a feature: it reveals how algorithmic assessment differs from human-designed pedagogy.
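
In principle this only needs a plain `fetch` against the Hugging Face datasets-server. The sketch below assumes the `manu/project_gutenberg` mirror, an `en` config, and a `text` field; the actual dataset and parameters the app uses may differ.

```javascript
// Hypothetical sketch: page through Gutenberg rows via the HF datasets-server.
// Dataset name, config, split, and field name are assumptions.
async function fetchGutenbergRows(offset = 0, length = 10) {
  const params = new URLSearchParams({
    dataset: 'manu/project_gutenberg',
    config: 'en',
    split: 'train',
    offset: String(offset),
    length: String(length),
  });
  const res = await fetch(`https://datasets-server.huggingface.co/rows?${params}`);
  if (!res.ok) throw new Error(`datasets-server request failed: ${res.status}`);
  const { rows } = await res.json();
  return rows.map(r => r.row.text);
}
```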
**Chat as Scaffold:** Click the 💬 icon beside any blank to engage the model in conversation. It attempts to guide you through Socratic questioning, semantic clues, and contextual hints—replicating what a tutor might do, constrained by what a language model trained on text prediction can actually accomplish.
The system filters out dictionaries, technical documentation, and poetry—ensuring narrative prose where blanks are theoretically inferable from context, even if the model's choices sometimes suggest otherwise.
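
As a rough illustration, such filtering could be a simple heuristic over title and line shape; the keywords and the verse test below are assumptions, not the app's actual rules.

```javascript
// Illustrative filter: keep narrative prose, skip reference works and verse.
function isNarrativeProse(title, text) {
  const referenceWork = /dictionary|glossary|encyclopedia|manual|handbook|index/i.test(title);
  const lines = text.split('\n').filter(l => l.trim().length > 0);
  const shortLines = lines.filter(l => l.length < 45).length;
  const looksLikeVerse = lines.length > 0 && shortLines / lines.length > 0.6; // many short lines suggests poetry
  return !referenceWork && !looksLikeVerse;
}
```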
## Technology
**Vanilla JavaScript, No Build Step:** The application runs entirely in the browser using ES6 modules—no webpack, no bundler, no compilation. This architectural choice mirrors the project's conceptual interests: keeping the machinery visible and modifiable rather than obscured behind layers of tooling. A minimal FastAPI backend serves static files and injects API keys; everything else happens client-side.
**Open-Weight Models:** Uses Google's Gemma-3-27b model (27 billion parameters) via OpenRouter, or alternatively connects to local LLM servers (LM Studio, etc.) on port 1234 with smaller models like Gemma-3-12b. The choice of open-weight models is deliberate: these systems can be downloaded, inspected, run locally, modified. When assessment becomes algorithmic, transparency about the algorithm matters. You can examine exactly which model is generating your exercises, run the same models yourself, experiment with alternatives.
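
The practical difference between the two paths is just configuration: the same OpenAI-style request shape is sent either to OpenRouter or to a local server on port 1234. A sketch, with the local model name and the `?local=true` check drawn from the description here and the local-LLM instructions below:

```javascript
// Sketch: choose hosted or local backend; both expose the same chat-completions API.
// The local model name and the ?local=true check are assumptions, not app code.
const useLocal = new URLSearchParams(window.location.search).get('local') === 'true';

const backend = useLocal
  ? { baseUrl: 'http://localhost:1234/v1', model: 'gemma-3-12b', apiKey: null } // e.g., LM Studio
  : { baseUrl: 'https://openrouter.ai/api/v1', model: 'google/gemma-3-27b-it', apiKey: 'injected-by-backend' };
```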
**Streaming from Public Archives:** Book data streams directly from Hugging Face's mirror of Project Gutenberg's corpus—public domain texts, open dataset infrastructure, no proprietary content libraries. The entire pipeline from literature to exercises relies on openly accessible resources, making the system reproducible and auditable.
## Running Locally with Docker
To run the Cloze Reader application locally using Docker:
1. **Build the Docker image**:
   ```bash
   docker build -t cloze-reader .
   ```
2. **Run the container**:
   ```bash
   docker run -p 7860:7860 cloze-reader
   ```
Open your browser and navigate to `http://localhost:7860`

## Local LLM Integration

1. **Start a local LLM server** on port 1234 (e.g., LM Studio)
2. **Run the development server**:
   ```bash
   make dev # or python3 local-server.py 8000
   ```
3. **Access with local LLM**:
- Navigate to `http://localhost:8000/index.html?local=true`
- The `?local=true` parameter switches from OpenRouter to your local LLM
### Local LLM Features
- **No API key required** - works entirely offline with your local model
- **Automatic response cleaning** - handles local LLM output artifacts (see the sketch below)
- **Compatible with LM Studio** and other OpenAI-compatible local servers
- **Same game experience** - all features work identically to cloud version
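
The response cleaning mentioned above can be as small as stripping chat-template markers and stray fences from the raw completion; the specific tokens handled below are assumptions about typical local model output, not a list taken from the codebase.

```javascript
// Hypothetical cleanup for local LLM output before it reaches the game logic.
function cleanLocalResponse(raw) {
  return raw
    .replace(/<end_of_turn>|<\|im_end\|>|<\/s>/g, '') // common end-of-turn artifacts
    .replace(/^```[a-z]*\s*|\s*```$/g, '')            // stray code fences around the answer
    .trim();
}
```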
### Testing Local Integration
- Test page: `http://localhost:8000/test-local-llm.html?local=true`
- Stress test script: `node test-local-llm.js`
- Direct integration test available in test files
## Architecture
This is a **vanilla JavaScript modular application** with no build step. Key architectural patterns:
**Module Organization:**
- `app.js` - Application controller and UI state management
- `clozeGameEngine.js` - Game logic, word selection, scoring
- `bookDataService.js` - Book data fetching from Hugging Face
- `aiService.js` - OpenRouter API integration
- `chatInterface.js` - Modal-based chat UI
- `conversationManager.js` - AI conversation state management
- `welcomeOverlay.js` - First-time user onboarding
---
[milwright](https://huggingface.co/milwright), *Zach Muhlbauer*, CUNY Graduate Center

**Updated version:**

# Cloze Reader
## Overview
Cloze Reader transforms classic literature into interactive vocabulary exercises using AI-powered word selection and hint generation. The application uses Google's Gemma-3-27B model to analyze passages from Project Gutenberg, select contextually appropriate blanks, and provide adaptive guidance through a chat interface.
## Background
The cloze procedure, introduced by Wilson Taylor in 1953, measures reading comprehension by having readers fill in deleted words from passages. BERT and subsequent masked language models use the same fundamental technique as their training objective: predict missing tokens from context. Cloze Reader closes this loop by using models trained on masked prediction to generate cloze exercises for human learners.
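
The symmetry is easiest to see side by side: the deletion that creates a human cloze item is the same operation a masked language model is trained to invert. A toy illustration, not application code:

```javascript
// Toy example: one deletion doubles as a cloze item and a masked-LM input.
const sentence = 'The quick brown fox jumps over the lazy dog';
const target = 'jumps';

const humanCloze = sentence.replace(target, '_____');  // what a reader sees
const mlmInput = sentence.replace(target, '[MASK]');   // what a BERT-style model sees

console.log(humanCloze); // "The quick brown fox _____ over the lazy dog"
console.log(mlmInput);   // "The quick brown fox [MASK] over the lazy dog"
```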
## Features
**Progressive Difficulty:** Levels 1-5 present single blanks with full hints from recent texts; levels 6-10 add multiple blanks with partial hints; levels 11+ use historical texts with minimal hints. Players must complete two passages per round before advancing.
**Interactive Chat:** Each blank includes a chat interface providing contextual hints through Socratic questioning and semantic guidance.
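
One plausible way to frame that exchange is a system prompt that forbids the answer and asks for guiding questions; the wording and function name below are illustrative assumptions.

```javascript
// Illustrative hint-chat setup: guide toward the blank without revealing it.
function buildHintMessages(passage, hiddenWord, playerQuestion) {
  return [
    {
      role: 'system',
      content:
        'You are a reading tutor. Respond with guiding questions and semantic clues ' +
        `about the missing word, but never say the word "${hiddenWord}".`,
    },
    { role: 'user', content: `Passage:\n${passage}\n\nPlayer asks: ${playerQuestion}` },
  ];
}
```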
**Public Domain Content:** All passages stream from Hugging Face's Project Gutenberg dataset, filtered to exclude dictionaries, technical documentation, and poetry.
## Technology Stack
**Frontend:** Vanilla JavaScript with ES6 modules, no build tooling. The application runs entirely in the browser.
**Backend:** Minimal FastAPI server for static file serving and API key injection.
**Models:** Google Gemma-3-27B via OpenRouter, with support for local LLM servers on port 1234.
**Data Source:** Hugging Face Datasets API streaming from Project Gutenberg corpus.
## Running with Docker
```bash
# Build the image
docker build -t cloze-reader .

# Run the container
docker run -p 7860:7860 cloze-reader

# Access at http://localhost:7860
```
**Prerequisites:** Docker installed, port 7860 available.
## Local LLM Integration
Run with a local LLM server instead of OpenRouter:
```bash
# Start local LLM server on port 1234 (e.g., LM Studio with Gemma-3-27b)

# Run development server
make dev # or python3 local-server.py 8000

# Access at http://localhost:8000/index.html?local=true
```
**Features:**
- No API key required
- Offline operation
- Automatic response cleaning for local LLM output
- Compatible with LM Studio and OpenAI-compatible servers
- Testing available at http://localhost:8000/test-local-llm.html?local=true
## Architecture
**Module Organization:**
- `app.js` - Application controller and UI state management
- `clozeGameEngine.js` - Game logic, word selection, scoring
- `bookDataService.js` - Book data fetching from Hugging Face
- `aiService.js` - OpenRouter API integration
- `chatInterface.js` - Modal-based chat UI
- `conversationManager.js` - AI conversation state management
- `welcomeOverlay.js` - First-time user onboarding
## Research Context
Recent work connects cloze assessment and masked language modeling:
- Matsumori et al. (2023) developed CLOZER using masked language models for L2 English cloze question generation
- Ondov et al. (2024, NAACL) demonstrated masked language models as natural generators for cloze distractors
- Zhang & Hashimoto (2021) analyzed inductive biases in masked tokens, showing models learn statistical and syntactic dependencies that cloze tests measure in humans
---
[milwright](https://huggingface.co/milwright), *Zach Muhlbauer*, CUNY Graduate Center