	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## Project Overview

	This is the AI Toolkit by Ostris, packaged as a Hugging Face Space for Docker deployment. It is an all-in-one training suite for diffusion models that supports recent model architectures on consumer-grade hardware. The toolkit provides both a CLI and a web UI for training LoRA models, with a particular focus on FLUX.1 models.

	## Architecture

	### Core Structure
	- Main Entry Points:
	  - `run.py` - Command-line entry point for running training jobs from config files
	  - `flux_train_ui.py` - Simple Gradio-based training interface
	  - `start.sh` - Docker entry point that launches the web UI

	- Web UI (`ui/`): Next.js application written in TypeScript
	  - Frontend and API routes in `ui/src/app/`
	  - Background worker process for job management
	  - SQLite database via Prisma for job persistence

	- Core Toolkit (`toolkit/`): Python modules for ML operations
	  - Model implementations in `toolkit/models/`
	  - Training processes in `jobs/process/`
	  - Configuration management and data loading utilities

	- Extensions (`extensions_built_in/`): Modular training components
	  - Support for various model types (FLUX, SDXL, SD 1.5, etc.)
	  - Different training strategies (LoRA, fine-tuning, etc.)

	### Key Configuration
	- Example training configs (YAML) live in `config/examples/`
	- The Docker setup supports GPU passthrough via the NVIDIA runtime
	- Hugging Face tokens and UI authentication are configured through environment variables

	## Common Development Commands

	### Setup and Installation
	```bash
	# Python environment setup
	python3 -m venv venv
	source venv/bin/activate # or .\venv\Scripts\activate on Windows
	pip3 install --no-cache-dir torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
	pip3 install -r requirements.txt
	```

	### Running Training Jobs
	```bash
	# CLI training with config file
	python run.py config/your_config.yml

	# Simple Gradio UI for FLUX training
	python flux_train_ui.py
	```
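To train several configs one after another, a small shell wrapper around the documented `python run.py <config>` command is enough. The sketch below is hypothetical: neither `run_configs` nor the `TRAIN_CMD` override (used only to swap in a stub command when testing) is part of the toolkit. It stops at the first missing config or failed job.

```bash
# Hypothetical helper: train several configs back to back.
# TRAIN_CMD (hypothetical) overrides the trainer command; the default is
# the documented `python run.py <config>` invocation.
run_configs() {
  local cfg
  local -a cmd
  read -r -a cmd <<< "${TRAIN_CMD:-python run.py}"
  for cfg in "$@"; do
    if [ ! -f "$cfg" ]; then
      echo "config not found: $cfg" >&2
      return 1
    fi
    echo "==> training with $cfg"
    # Stop the batch as soon as one job exits non-zero
    "${cmd[@]}" "$cfg" || { echo "training failed: $cfg" >&2; return 1; }
  done
}

# Usage: run_configs config/first_lora.yml config/second_lora.yml
```

Stopping on the first failure avoids queueing later jobs behind a broken environment (for example, a missing `HF_TOKEN`).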

	### Web UI Development
	```bash
	# Development mode (from ui/ directory)
	cd ui
	npm install
	npm run dev

	# Production build and start
	npm run build_and_start

	# Database updates
	npm run update_db
	```

	### Docker Operations
	```bash
	# Run with docker-compose
	docker-compose up

	# Build custom image
	docker build -f docker/Dockerfile -t ai-toolkit .
	```

	## Authentication Requirements

	### Hugging Face Access
	- FLUX.1-dev is gated: accept its license at https://huggingface.co/black-forest-labs/FLUX.1-dev before use
	- Set the `HF_TOKEN` environment variable to a Hugging Face access token with READ permission
	- To persist the token, create a `.env` file in the repository root: `HF_TOKEN=your_key_here`
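A minimal sketch for writing the token into `.env` without leaving duplicate `HF_TOKEN` lines when the key is rotated. The `set_env_var` helper is hypothetical, and restricting the file to owner-only permissions is a precaution assumed here, not something the toolkit enforces.

```bash
# Hypothetical helper: set KEY=VALUE in a .env file, replacing any old value.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  # Temp file next to the target so the final mv is a same-directory rename
  tmp="$(mktemp "$file.XXXXXX")"
  # Keep every line except an existing assignment to this key
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
  # Tokens are secrets: owner read/write only (precaution, not required)
  chmod 600 "$file"
}

# Usage (from the repository root):
# set_env_var .env HF_TOKEN your_key_here
```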

	### UI Security
	- Set the `AI_TOOLKIT_AUTH` environment variable to password-protect the UI
	- If it is not set, the UI password defaults to "password"
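Because an unset `AI_TOOLKIT_AUTH` falls back to a well-known default, a launch script can refuse to start in that state. This guard is a hypothetical wrapper, not part of the toolkit; the launch command in the usage comment is the `npm run build_and_start` script listed under Web UI Development.

```bash
# Hypothetical pre-launch guard: refuse to start the UI while
# AI_TOOLKIT_AUTH is unset or still equal to the default "password".
require_ui_auth() {
  if [ -z "${AI_TOOLKIT_AUTH:-}" ] || [ "$AI_TOOLKIT_AUTH" = "password" ]; then
    echo "Set AI_TOOLKIT_AUTH to a non-default password before starting the UI" >&2
    return 1
  fi
}

# Usage: require_ui_auth && (cd ui && npm run build_and_start)
```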

	## Training Configuration

	### Model Support
	- FLUX.1-dev: requires an HF token; non-commercial license
	- FLUX.1-schnell: Apache 2.0 license; requires a training adapter
	- SDXL, SD 1.5: standard Stable Diffusion models
	- Video models: various image-to-video (I2V) and text-to-video architectures

	### Memory Requirements
	- FLUX.1 training requires at least 24 GB of VRAM
	- Set `low_vram: true` in the config if the GPU is also driving displays
	- Quantization options are available to reduce memory usage
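Before starting a FLUX.1 run, total VRAM can be checked with `nvidia-smi` in query mode. The parsing function below is a hypothetical helper that reads one MiB value per line; its 23000 MiB default threshold is an assumption, chosen because cards sold as 24 GB can report slightly less than 24576 MiB.

```bash
# Hypothetical helper: read total GPU memory in MiB (one value per line)
# on stdin and succeed if at least one GPU meets the threshold.
has_min_vram() {
  local min_mib="${1:-23000}" mib
  # "|| [ -n ... ]" also handles a last line without a trailing newline
  while IFS= read -r mib || [ -n "$mib" ]; do
    mib="${mib//[^0-9]/}"   # drop spaces or a trailing "MiB" unit
    if [ -n "$mib" ] && [ "$mib" -ge "$min_mib" ]; then
      return 0
    fi
  done
  return 1
}

# Usage on a machine with NVIDIA drivers installed:
# nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | has_min_vram
```

Reading from stdin keeps the check testable without a GPU and lets the threshold be raised (for example `has_min_vram 40000`) for larger models.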

	### Dataset Format
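	- Images: JPG, JPEG, and PNG (WebP is not supported)
	- Captions: `.txt` files with the same base name as their image (e.g. `image1.jpg` and `image1.txt`)
	- Captions may contain a `[trigger]` placeholder, which is replaced by the `trigger_word` config value
	- Images are automatically resized and bucketed, so no manual preprocessing is needed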
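The dataset rules above can be checked before training. This is a sketch under stated assumptions: `check_dataset` and `preview_caption` are hypothetical helpers, extensions are compared case-insensitively (an assumption about how the toolkit treats `.JPG` or `.PNG`), and `preview_caption` only approximates the toolkit's own `[trigger]` substitution.

```bash
# Hypothetical helper: report .webp files and images with no matching .txt caption.
check_dataset() {
  local dir="$1" f ext base problems=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # Compare extensions case-insensitively (assumption)
    ext="$(printf '%s' "${f##*.}" | tr '[:upper:]' '[:lower:]')"
    base="${f%.*}"
    case "$ext" in
      jpg|jpeg|png)
        if [ ! -f "$base.txt" ]; then
          echo "missing caption: $f"
          problems=$((problems + 1))
        fi
        ;;
      webp)
        echo "unsupported format: $f"
        problems=$((problems + 1))
        ;;
    esac
  done
  [ "$problems" -eq 0 ]
}

# Hypothetical helper: print a caption with every [trigger] replaced by WORD.
preview_caption() {
  local rest out="" word="$2"
  rest="$(cat "$1")"
  while [[ "$rest" == *"[trigger]"* ]]; do
    out+="${rest%%"[trigger]"*}$word"   # text before the placeholder
    rest="${rest#*"[trigger]"}"         # text after it
  done
  printf '%s\n' "$out$rest"
}

# Usage: check_dataset /path/to/dataset && preview_caption /path/to/dataset/image1.txt my_trigger
```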

	## Key Files to Understand

	- `run.py:46-85` - Main training job runner and argument parsing
	- `toolkit/job.py` - Job management and configuration loading
	- `ui/src/app/api/jobs/route.ts` - API endpoints for job management
	- `config/examples/train_lora_flux_24gb.yaml` - Standard FLUX training template
	- `extensions_built_in/sd_trainer/SDTrainer.py` - Core training logic

	## Development Notes

	- Jobs run independently of the UI; the UI only manages them
	- Training can be stopped and resumed from saved checkpoints
	- Output (samples and trained models) is written to the `output/` directory
	- The extension system allows custom training implementations
	- Multi-GPU support is provided through the `accelerate` library
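When resuming or collecting results, it helps to locate the newest saved model in a job's output folder. The sketch assumes checkpoints are written as `.safetensors` files directly inside `output/<job_name>/`; that layout and the `latest_checkpoint` helper are assumptions, not documented behavior.

```bash
# Hypothetical helper: print the newest .safetensors file in a job's output folder.
latest_checkpoint() {
  local dir="$1" newest="" f
  for f in "$dir"/*.safetensors; do
    [ -f "$f" ] || continue   # skip the literal pattern when nothing matches
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage: latest_checkpoint output/my_job
```

Comparing modification times with `-nt` avoids parsing `ls` output and returns a non-zero status when the folder holds no checkpoints.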