CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
This is a Gradio application for image generation using the Qwen-Image model with Lightning LoRA acceleration. It's designed to run on Hugging Face Spaces with GPU support, providing fast 8-step image generation with advanced text rendering capabilities.
Commands
Run the application locally
python app.py
Install dependencies
pip install -r requirements.txt
Architecture
Core Components
Model Pipeline (app.py:130-164)
- Uses the Qwen/Qwen-Image diffusion model with a custom FlowMatchEulerDiscreteScheduler
- Loads Lightning LoRA weights for 8-step acceleration
- Configured for bfloat16 precision on CUDA
Prompt Enhancement System (app.py:41-125)
- polish_prompt(): Uses Hugging Face InferenceClient with Cerebras provider to enhance prompts
- get_caption_language(): Detects Chinese vs English prompts
- rewrite(): Language-specific prompt enhancement with different system prompts for Chinese/English
- Requires the HF_TOKEN environment variable for API access
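The language detection step can be sketched as below. This is a minimal illustration that assumes Chinese is detected by looking for characters in the CJK Unified Ideographs range; the actual check in app.py may differ. SYSTEM_PROMPTS and pick_system_prompt() are hypothetical names used only for this example.

```python
def get_caption_language(prompt: str) -> str:
    """Return 'zh' if the prompt contains any CJK Unified Ideograph, else 'en'."""
    for char in prompt:
        if "\u4e00" <= char <= "\u9fff":
            return "zh"
    return "en"


# Hypothetical placeholder prompts; the real system prompts live in app.py.
SYSTEM_PROMPTS = {
    "zh": "Rewrite the prompt in Chinese with richer visual detail.",
    "en": "Rewrite the prompt in English with richer visual detail.",
}


def pick_system_prompt(prompt: str) -> str:
    """Select the language-specific system prompt for the rewrite step."""
    return SYSTEM_PROMPTS[get_caption_language(prompt)]
```

A single Chinese character is enough to route the prompt to the Chinese system prompt, so mixed-language prompts are treated as Chinese in this sketch.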
Style Presets System (app.py:16-87)
- load_style_presets(): Loads style presets from style_presets.yaml
- apply_style_preset(): Applies selected style to prompts
- Supports custom styles and random style selection
- Each preset includes prefix, suffix, and negative prompt components
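How a preset's prefix, suffix, and negative prompt might be combined with a user prompt is sketched below. The preset names, their contents, the "custom" and "random" keys, and the function signature are all assumptions for illustration; the real presets come from style_presets.yaml and the real logic is in app.py.

```python
import random

# Hypothetical in-memory presets standing in for style_presets.yaml.
STYLE_PRESETS = {
    "none": {"prefix": "", "suffix": "", "negative": ""},
    "watercolor": {
        "prefix": "watercolor painting of",
        "suffix": "soft washes, paper texture",
        "negative": "photo, 3d render",
    },
}


def apply_style_preset(prompt, style, custom_style="", presets=STYLE_PRESETS, rng=random):
    """Return (styled_prompt, negative_prompt) for the chosen style."""
    if style == "custom":
        # Custom style text is appended; it carries no negative prompt here.
        extra = custom_style.strip()
        return (f"{prompt}, {extra}" if extra else prompt), ""
    if style == "random":
        style = rng.choice(sorted(presets))
    preset = presets.get(style, {})  # unknown styles leave the prompt unchanged
    styled = prompt
    prefix = preset.get("prefix", "").strip()
    suffix = preset.get("suffix", "").strip()
    if prefix:
        styled = f"{prefix} {styled}"
    if suffix:
        styled = f"{styled}, {suffix}"
    return styled, preset.get("negative", "")
```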
Page Layouts System (app.py:89-145)
- load_page_layouts(): Loads multi-image layouts from page_layouts.yaml
- get_layout_choices(): Returns available layouts for a given number of images
- get_layout_metadata(): Extracts panel metadata (type, focus, composition) for each position
- Supports 1-8 images per page with 5-6 layout variations each
- Dynamic layout selection based on number of images
- Panel Metadata System: Each panel position includes metadata that describes:
  - panel_type: establishing/action/closeup/dialogue/reaction/transition/detail/splash
  - focus: environment/character/characters/action/emotion/object/event
  - composition: wide/tall/square/portrait/landscape
- Metadata is used to guide the LLM in generating appropriate scene descriptions
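The lookup functions above could work roughly as follows. The PAGE_LAYOUTS structure, layout ids, labels, and function signatures are hypothetical; the actual schema of page_layouts.yaml is not shown in this file, so treat this only as one plausible shape.

```python
# Hypothetical parsed form of page_layouts.yaml, keyed by images per page.
PAGE_LAYOUTS = {
    2: [
        {
            "id": "two_rows",
            "label": "Two rows",
            "panels": [
                {"position": [0.0, 0.0, 1.0, 0.5], "panel_type": "establishing",
                 "focus": "environment", "composition": "wide"},
                {"position": [0.0, 0.5, 1.0, 0.5], "panel_type": "action",
                 "focus": "character", "composition": "wide"},
            ],
        },
        {
            "id": "two_columns",
            "label": "Two columns",
            "panels": [
                {"position": [0.0, 0.0, 0.5, 1.0], "panel_type": "dialogue",
                 "focus": "characters", "composition": "tall"},
                {"position": [0.5, 0.0, 0.5, 1.0], "panel_type": "reaction",
                 "focus": "emotion", "composition": "tall"},
            ],
        },
    ],
}

METADATA_KEYS = ("panel_type", "focus", "composition")


def get_layout_choices(num_images, layouts=PAGE_LAYOUTS):
    """Return (label, id) pairs for every layout defined for this image count."""
    return [(layout["label"], layout["id"]) for layout in layouts.get(num_images, [])]


def get_layout_metadata(num_images, layout_id, layouts=PAGE_LAYOUTS):
    """Return one metadata dict per panel, or an empty list if the layout is unknown."""
    for layout in layouts.get(num_images, []):
        if layout["id"] == layout_id:
            return [{key: panel[key] for key in METADATA_KEYS} for panel in layout["panels"]]
    return []
```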
Story Generation System (app.py:147-265)
- generate_story_scenes(): Uses Hugging Face InferenceClient with Qwen3-235B to generate scene descriptions
  - Takes panel metadata as input to generate contextually appropriate content
  - Adapts descriptions based on panel type, focus, and composition
  - Returns structured scene data with captions and dialogue
- parse_yaml_scenes(): Parses LLM output into structured scene data
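One way panel metadata might be turned into LLM instructions is sketched below. The helper names describe_panels() and build_story_prompt() and the prompt wording are hypothetical; the real prompt sent by generate_story_scenes() is defined in app.py.

```python
def describe_panels(panels):
    """Render one line per panel so the LLM can see each panel's role."""
    lines = []
    for index, panel in enumerate(panels, start=1):
        lines.append(
            f"Panel {index}: type={panel['panel_type']}, "
            f"focus={panel['focus']}, composition={panel['composition']}"
        )
    return "\n".join(lines)


def build_story_prompt(story_idea, panels):
    """Combine the user's story idea with the per-panel metadata."""
    return (
        f"Story idea: {story_idea}\n"
        f"Write one scene per panel ({len(panels)} total), matching each panel's role:\n"
        f"{describe_panels(panels)}"
    )
```

Keeping the metadata as explicit per-panel lines makes it easy to check that the number of scenes returned matches the number of panels requested.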
Image Size Calculation (app.py:267-330)
- get_image_size_for_position(): Calculates precise image dimensions based on layout aspect ratio
  - Uses 8px rounding for model compatibility while maintaining aspect ratio accuracy
  - Ensures images fill their layout containers without floating
- get_layout_position_for_image(): Retrieves position data for a specific panel
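The 8px rounding can be illustrated with a small calculation. This sketch assumes the target pixel area is held near 1024 x 1024 and each side is snapped to the nearest multiple of 8; size_for_aspect_ratio() is a hypothetical helper, not the function in app.py.

```python
import math


def size_for_aspect_ratio(aspect_ratio, base=1024, multiple=8):
    """Return (width, height) near base*base pixels, each a multiple of `multiple`.

    aspect_ratio is width / height of the layout position.
    """
    target_area = base * base
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio

    def snap(value):
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    return snap(width), snap(height)
```

For a 16:9 position this gives 1368 x 768, whose ratio (about 1.781) stays within a fraction of a percent of the requested 1.778.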
PDF Generation (app.py:450-540)
- create_single_page_pdf(): Creates a PDF page with images arranged per layout
- create_multi_page_pdf(): Combines multiple pages into a single document
- Uses ReportLab for high-quality PDF generation
- Preserves image quality by saving JPEGs at 95% quality
- A4 page size with flexible positioning system
- Smart filling: fills space completely when aspect ratios match (<2% difference)
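The smart filling rule can be sketched as a pure geometry helper. This assumes that when the ratios differ by 2% or more the image is scaled to fit and centered; that fallback, the place_image() name, and the (x, y, width, height) box convention are assumptions for illustration.

```python
def place_image(img_w, img_h, box, tolerance=0.02):
    """Return (x, y, width, height) at which to draw an image inside a layout box.

    box is (x, y, width, height) in page units.
    """
    x, y, box_w, box_h = box
    img_ratio = img_w / img_h
    box_ratio = box_w / box_h

    # Ratios close enough: stretch slightly so the image fills the whole box.
    if abs(img_ratio - box_ratio) / box_ratio < tolerance:
        return (x, y, box_w, box_h)

    # Otherwise fit inside the box while preserving the image's aspect ratio.
    if img_ratio > box_ratio:
        width, height = box_w, box_w / img_ratio
    else:
        width, height = box_h * img_ratio, box_h

    # Center the fitted image within the box.
    return (x + (box_w - width) / 2, y + (box_h - height) / 2, width, height)
```

The resulting tuple maps directly onto the x, y, width, and height arguments that ReportLab's canvas drawing calls accept.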
Multi-Image Generation (app.py:545-650)
- infer_page(): Main generation orchestrator
  - Generates multiple images and combines them into a PDF
  - Progressive generation with status updates
  - Seed management for reproducibility across multiple images
  - Returns PDF file, preview image, and seed information
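A simple seed scheme that keeps a page reproducible is sketched below. It assumes each image gets the base seed plus its index; the actual derivation in infer_page() may differ, and resolve_page_seeds() and MAX_SEED are hypothetical names.

```python
import random

MAX_SEED = 2**31 - 1  # assumed upper bound; the app may use a different limit


def resolve_page_seeds(seed, num_images, randomize=False, rng=random):
    """Return one seed per image, derived deterministically from a base seed."""
    base = rng.randint(0, MAX_SEED) if randomize else seed
    # Offset by image index and wrap around so every seed stays in range.
    return [(base + index) % (MAX_SEED + 1) for index in range(num_images)]
```

Reporting the first seed back to the user is then enough to regenerate the whole page with the same image count.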
Gradio Interface (app.py:750-900+)
- Slider for selecting 1-8 images per page
- Dynamic layout dropdown that updates based on image count
- Style preset dropdown with custom style text option
- PDF download and image preview outputs
- Advanced settings for all generation parameters
Key Configuration
- Scheduler Config (app.py:133-148): Custom configuration for FlowMatchEulerDiscreteScheduler with exponential time shifting
- Aspect Ratios (app.py:170-188): Predefined aspect ratios optimized for 1024 base resolution
- Style Presets (style_presets.yaml): Configurable style presets with prompt modifiers and negative prompts
- Page Layouts (page_layouts.yaml): Flexible layout system for 1-4 images per page
- Default Settings: 8 inference steps, guidance scale 1.0, prompt enhancement enabled, 1 image per page
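For intuition about the exponential time shifting mentioned in the scheduler config, the shift can be written as a standalone function. This mirrors, as far as I understand it, the formula used by flow-matching schedulers in diffusers; the mu and sigma values below are illustrative and are not the values configured in app.py.

```python
import math


def time_shift_exponential(mu, sigma, t):
    """Shift a noise level t in (0, 1] toward 1 as mu grows.

    shifted = exp(mu) / (exp(mu) + (1/t - 1) ** sigma)
    """
    return math.exp(mu) / (math.exp(mu) + (1.0 / t - 1.0) ** sigma)


# Illustrative schedule: four evenly spaced levels, shifted with mu = ln(3).
levels = [1.0, 0.75, 0.5, 0.25]
shifted = [time_shift_exponential(math.log(3), 1.0, t) for t in levels]
```

With mu = 0 and sigma = 1 the function is the identity, and larger mu spends more of a short step budget at high noise levels, which is the usual motivation for shifting in few-step sampling.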
Environment Variables
- HF_TOKEN: Required for prompt enhancement via Hugging Face InferenceClient
  - Used for accessing Cerebras provider for Qwen3-235B model
Key Features
- Session-based storage: Each user session gets a unique temporary directory that persists for 24 hours
- Multi-page PDF generation: Users can generate up to 128 pages in a single document
- Dynamic page addition: Click "Generate page N" to add the next page to the PDF
- Flexible layouts: Different layout options for 1-4 images per page
- Style presets: 20+ predefined artistic styles
- Automatic cleanup: Old sessions are automatically cleaned after 24 hours
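Session storage with a 24-hour lifetime might look like the sketch below. It assumes sessions are plain directories under a common root and that age is judged by directory modification time; the function names and the naming scheme are hypothetical.

```python
import os
import shutil
import tempfile
import time
import uuid

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24 hours


def create_session_dir(root=None):
    """Create and return a unique directory for one user session."""
    root = root or tempfile.gettempdir()
    path = os.path.join(root, f"session_{uuid.uuid4().hex}")
    os.makedirs(path)
    return path


def cleanup_old_sessions(root, ttl=SESSION_TTL_SECONDS, now=None):
    """Delete session directories older than ttl seconds; return the removed paths."""
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if not (name.startswith("session_") and os.path.isdir(path)):
            continue  # leave unrelated files and folders alone
        if now - os.path.getmtime(path) > ttl:
            shutil.rmtree(path, ignore_errors=True)
            removed.append(path)
    return removed
```

Running the cleanup at startup and before each new session is created is one simple way to bound disk usage without a background scheduler.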
Model Dependencies
- Main model: Qwen/Qwen-Image
- LoRA weights: lightx2v/Qwen-Image-Lightning (V1.1 safetensors)
- Prompt enhancement model: Qwen/Qwen3-235B-A22B-Instruct-2507 via Cerebras