AiComicFactory2 / CLAUDE.md
Julian Bilcke
improve everything using AI
355629c

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a Gradio application for image generation using the Qwen-Image model with Lightning LoRA acceleration. It's designed to run on Hugging Face Spaces with GPU support, providing fast 8-step image generation with advanced text rendering capabilities.

Commands

Run the application locally

python app.py

Install dependencies

pip install -r requirements.txt

Architecture

Core Components

  1. Model Pipeline (app.py:130-164)

    • Uses Qwen/Qwen-Image diffusion model with custom FlowMatchEulerDiscreteScheduler
    • Loads Lightning LoRA weights for 8-step acceleration
    • Configured for bfloat16 precision on CUDA
  2. Prompt Enhancement System (app.py:41-125)

    • polish_prompt(): Uses Hugging Face InferenceClient with Cerebras provider to enhance prompts
    • get_caption_language(): Detects Chinese vs English prompts
    • rewrite(): Language-specific prompt enhancement with different system prompts for Chinese/English
    • Requires HF_TOKEN environment variable for API access
  3. Style Presets System (app.py:16-87)

    • load_style_presets(): Loads style presets from style_presets.yaml
    • apply_style_preset(): Applies selected style to prompts
    • Supports custom styles and random style selection
    • Each preset includes prefix, suffix, and negative prompt components
  4. Page Layouts System (app.py:89-145)

    • load_page_layouts(): Loads multi-image layouts from page_layouts.yaml
    • get_layout_choices(): Returns available layouts for a given number of images
    • get_layout_metadata(): Extracts panel metadata (type, focus, composition) for each position
    • Supports 1-8 images per page with 5-6 layout variations each
    • Dynamic layout selection based on number of images
    • Panel Metadata System: Each panel position includes metadata that describes:
      • panel_type: establishing/action/closeup/dialogue/reaction/transition/detail/splash
      • focus: environment/character/characters/action/emotion/object/event
      • composition: wide/tall/square/portrait/landscape
    • Metadata is used to guide the LLM in generating appropriate scene descriptions
  5. Story Generation System (app.py:147-265)

    • generate_story_scenes(): Uses Hugging Face InferenceClient with Qwen3-235B to generate scene descriptions
    • Takes panel metadata as input to generate contextually appropriate content
    • Adapts descriptions based on panel type, focus, and composition
    • Returns structured scene data with captions and dialogue
    • parse_yaml_scenes(): Parses LLM output into structured scene data
  6. Image Size Calculation (app.py:267-330)

    • get_image_size_for_position(): Calculates precise image dimensions based on layout aspect ratio
    • Uses 8px rounding for model compatibility while maintaining aspect ratio accuracy
    • Ensures images fill their layout containers without leaving gaps
    • get_layout_position_for_image(): Retrieves position data for a specific panel
  7. PDF Generation (app.py:450-540)

    • create_single_page_pdf(): Creates PDF page with images arranged per layout
    • create_multi_page_pdf(): Combines multiple pages into a single document
    • Uses ReportLab for high-quality PDF generation
    • Embeds images as JPEG at quality 95 to preserve image quality
    • A4 page size with flexible positioning system
    • Smart filling: fills space completely when aspect ratios match (<2% difference)
  8. Multi-Image Generation (app.py:545-650)

    • infer_page(): Main generation orchestrator
    • Generates multiple images and combines into PDF
    • Progressive generation with status updates
    • Seed management for reproducibility across multiple images
    • Returns PDF file, preview image, and seed information
  9. Gradio Interface (app.py:750-900+)

    • Slider for selecting 1-8 images per page
    • Dynamic layout dropdown that updates based on image count
    • Style preset dropdown with custom style text option
    • PDF download and image preview outputs
    • Advanced settings for all generation parameters
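The 8px rounding described under Image Size Calculation can be illustrated with a short sketch. This is not the code in app.py: the function names `round_to_multiple` and `size_for_aspect`, the 1024 base, and the equal-area strategy are all assumptions chosen for illustration.

```python
import math
from typing import Tuple


def round_to_multiple(value: float, multiple: int = 8) -> int:
    """Round to the nearest multiple, never returning less than one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def size_for_aspect(aspect_ratio: float, base: int = 1024,
                    multiple: int = 8) -> Tuple[int, int]:
    """Return (width, height) close to base*base pixels for a width/height ratio.

    Hypothetical helper: the real get_image_size_for_position() may use a
    different strategy for picking the pixel budget.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    # Keep the pixel count near base*base while matching the requested ratio.
    width = base * math.sqrt(aspect_ratio)
    height = base / math.sqrt(aspect_ratio)
    return round_to_multiple(width, multiple), round_to_multiple(height, multiple)
```

Rounding both sides to a multiple of 8 shifts the final ratio slightly away from the requested one, which is presumably why the PDF step tolerates small aspect ratio differences when filling a panel.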

Key Configuration

  • Scheduler Config (app.py:133-148): Custom configuration for FlowMatchEulerDiscreteScheduler with exponential time shifting
  • Aspect Ratios (app.py:170-188): Predefined aspect ratios optimized for 1024 base resolution
  • Style Presets (style_presets.yaml): Configurable style presets with prompt modifiers and negative prompts
  • Page Layouts (page_layouts.yaml): Flexible layout system for 1-8 images per page
  • Default Settings: 8 inference steps, guidance scale 1.0, prompt enhancement enabled, 1 image per page
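As a rough sketch of how a style preset with prefix, suffix, and negative prompt components could be applied to a prompt: the preset contents, the key names (`prefix`, `suffix`, `negative_prompt`), and the function `apply_preset_sketch` are hypothetical. The real presets live in style_presets.yaml, and apply_style_preset() in app.py may behave differently.

```python
from typing import Dict, Tuple

# Hypothetical preset table standing in for the contents of style_presets.yaml.
EXAMPLE_PRESETS: Dict[str, Dict[str, str]] = {
    "none": {"prefix": "", "suffix": "", "negative_prompt": ""},
    "manga": {
        "prefix": "manga style, ",
        "suffix": ", black and white ink, screentone shading",
        "negative_prompt": "photo, 3d render",
    },
}


def apply_preset_sketch(prompt: str, style: str,
                        presets: Dict[str, Dict[str, str]] = EXAMPLE_PRESETS
                        ) -> Tuple[str, str]:
    """Wrap the prompt in the preset's prefix/suffix and return its negative prompt.

    Unknown style names fall back to the "none" preset (an assumption).
    """
    preset = presets.get(style, presets["none"])
    styled = f"{preset.get('prefix', '')}{prompt.strip()}{preset.get('suffix', '')}"
    return styled, preset.get("negative_prompt", "")
```

Keeping the modifiers as data rather than code means new styles can be added by editing the YAML file alone.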

Environment Variables

  • HF_TOKEN: Required for prompt enhancement via Hugging Face InferenceClient; the token authenticates requests to the Cerebras provider that serves the Qwen3-235B model
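One way to handle the token defensively is to read it once and skip enhancement when it is missing. The sketch below makes that assumption explicit: the helper names `read_hf_token` and `enhancement_available` are hypothetical, and the fallback behavior is not confirmed by app.py.

```python
import os
from typing import Mapping, Optional


def read_hf_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the HF_TOKEN value, or None when it is unset or blank."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def enhancement_available(env: Mapping[str, str] = os.environ) -> bool:
    """Assumed policy: only attempt prompt enhancement when a token is present."""
    return read_hf_token(env) is not None
```

Passing the environment as a parameter keeps the check easy to exercise without modifying the real process environment.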

Key Features

  • Session-based storage: Each user session gets a unique temporary directory that persists for 24 hours
  • Multi-page PDF generation: Users can generate up to 128 pages in a single document
  • Dynamic page addition: Click "Generate page N" to add the next page to the PDF
  • Flexible layouts: Different layout options for 1-8 images per page
  • Style presets: 20+ predefined artistic styles
  • Automatic cleanup: Old sessions are automatically cleaned after 24 hours
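The session storage and 24-hour cleanup described above could look roughly like the following. This is a minimal sketch under stated assumptions: the `session_` directory prefix, the `DEFAULT_ROOT` location, the use of directory modification time as the age signal, and both function names are hypothetical and not taken from app.py.

```python
import shutil
import tempfile
import time
import uuid
from pathlib import Path
from typing import List, Optional

SESSION_TTL_SECONDS = 24 * 60 * 60  # 24 hours, per the feature list above

# Hypothetical default location for session directories.
DEFAULT_ROOT = Path(tempfile.gettempdir()) / "comic_sessions"


def create_session_dir(root: Path = DEFAULT_ROOT) -> Path:
    """Create and return a unique directory for one user session."""
    session_dir = root / f"session_{uuid.uuid4().hex}"
    session_dir.mkdir(parents=True, exist_ok=False)
    return session_dir


def cleanup_old_sessions(root: Path = DEFAULT_ROOT,
                         ttl: float = SESSION_TTL_SECONDS,
                         now: Optional[float] = None) -> List[str]:
    """Delete session directories older than ttl seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for entry in root.glob("session_*"):
        # Age is judged by the directory's modification time (an assumption).
        if entry.is_dir() and now - entry.stat().st_mtime > ttl:
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry.name)
    return removed
```

Accepting `now` as a parameter makes the expiry logic testable without waiting; a real deployment would more likely run the cleanup at startup or on a timer.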

Model Dependencies

  • Main model: Qwen/Qwen-Image
  • LoRA weights: lightx2v/Qwen-Image-Lightning (V1.1 safetensors)
  • Prompt enhancement model: Qwen/Qwen3-235B-A22B-Instruct-2507 via Cerebras