---
title: VelocityLM
emoji: 🚀
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
license: mit
models:
  - gpt2
datasets:
  - tiiuae/falcon-refinedweb
tags:
  - text-generation
  - transformer
  - pytorch
  - custom-model
  - llm
  - foundational-model
short_description: FoundationalLM for fast text-generation
---

# 🤖 Custom LLM - Foundational Language Model

A custom-trained foundational language model with ~2 billion parameters, built on a modern transformer architecture and deployed with streaming text generation.

## 🚀 Features

- Custom Architecture: Modern transformer with RoPE (Rotary Position Embedding), RMSNorm, and SwiGLU activation
- Streaming Generation: Real-time text generation with token-by-token streaming
- Flexible Sampling: Configurable temperature, top-p, top-k, and repetition penalty
- ZeroGPU Integration: Optimized for Hugging Face Spaces with GPU acceleration
- Responsive UI: Clean, intuitive Gradio interface

## 📊 Model Details

| Specification | Value |
| --- | --- |
| Parameters | ~2 billion |
| Architecture | Custom transformer |
| Context Length | 2,048 tokens |
| Vocab Size | 50,257 (GPT-2 tokenizer) |
| Layers | 24 |
| Attention Heads | 32 |
| Hidden Size | 2,048 |
| Intermediate Size | 8,192 |
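
For orientation, the specification can be captured in a small configuration object. This is a sketch only; the `ModelConfig` name and its fields are illustrative and not taken from the Space's code.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Illustrative config mirroring the specification table above."""
    vocab_size: int = 50257        # GPT-2 tokenizer vocabulary
    max_seq_len: int = 2048        # context length in tokens
    num_layers: int = 24
    num_heads: int = 32
    hidden_size: int = 2048
    intermediate_size: int = 8192  # SwiGLU feed-forward width
```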

## 🏗️ Architecture Components

- RMSNorm: Root Mean Square Layer Normalization for better training stability
- RoPE: Rotary Position Embeddings for better length extrapolation
- SwiGLU: Swish-gated GLU activation for improved feed-forward performance (see the sketch below)
- Causal Attention: Standard autoregressive attention mechanism
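
As a rough illustration of two of these components, here is a minimal PyTorch sketch of RMSNorm and a SwiGLU feed-forward block using the sizes from the table above; the module and layer names are assumptions, not the Space's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square LayerNorm: rescales by the RMS of the features, no mean-centering."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: SiLU-gated up-projection followed by a down-projection."""
    def __init__(self, hidden_size: int = 2048, intermediate_size: int = 8192):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```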

## 🎯 Training Details

- Dataset: Falcon RefinedWeb (curated web text)
- Training Steps: 100,000 steps
- Learning Rate: 6e-4 with warmup and decay
- Batch Size: 32 effective (4 per device × 8 gradient accumulation steps)
- Optimization: AdamW with β1=0.9, β2=0.95 (a minimal loop is sketched below)
- Precision: Mixed precision (FP16)
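
A minimal mixed-precision training loop matching these settings might look like the following. The tiny stand-in model and random batches exist only so the sketch runs; the learning-rate schedule (warmup and decay) is omitted, and none of this is the actual training code.

```python
import torch
import torch.nn as nn

# Toy stand-in model and data; the real run trains the ~2B-parameter transformer
# on Falcon RefinedWeb for 100,000 steps.
model = nn.Sequential(nn.Embedding(50257, 64), nn.Linear(64, 50257))
batches = [torch.randint(0, 50257, (4, 16)) for _ in range(16)]   # "4 per device" toy batches

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model.to(device)

# Hyperparameters from the list above: AdamW(β1=0.9, β2=0.95), lr 6e-4, FP16, 8-step accumulation.
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, betas=(0.9, 0.95))
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)
accum_steps = 8                                                   # 4 per device × 8 = 32 effective

optimizer.zero_grad(set_to_none=True)
for step, batch in enumerate(batches):
    batch = batch.to(device)
    with torch.cuda.amp.autocast(enabled=use_cuda, dtype=torch.float16):
        logits = model(batch)                                     # (batch, seq, vocab)
        loss = nn.functional.cross_entropy(                       # next-token prediction
            logits[:, :-1].reshape(-1, 50257), batch[:, 1:].reshape(-1)
        ) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                                    # unscales; skips step on inf/NaN
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```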

## 🛠️ Generation Parameters

- Max Tokens: Control the length of generated text (1-1024)
- Temperature: Sampling randomness (0.1-2.0, higher = more creative)
- Top-p: Nucleus sampling threshold (0.1-1.0)
- Top-k: Top-k sampling limit (0-200, 0 = disabled)
- Repetition Penalty: Reduce repetitive text (1.0-2.0); see the sampling sketch below
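
Under the hood these controls correspond to standard logits post-processing. Below is a minimal sketch, assuming a 1-D `logits` tensor for the next token and the ids generated so far; the function name, argument names, and exact ordering are illustrative rather than the Space's actual code.

```python
import torch

def filter_logits(logits, generated, temperature=0.7, top_p=0.9, top_k=50, repetition_penalty=1.1):
    """Apply the UI's sampling controls to one step of next-token logits (illustrative)."""
    logits = logits.clone()

    # Repetition penalty: push down tokens that have already appeared.
    if repetition_penalty != 1.0 and generated.numel() > 0:
        prev = torch.unique(generated)
        scores = logits[prev]
        logits[prev] = torch.where(scores > 0, scores / repetition_penalty, scores * repetition_penalty)

    # Temperature: <1 sharpens the distribution, >1 flattens it.
    logits = logits / max(temperature, 1e-5)

    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    if top_k > 0:
        kth_best = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits[logits < kth_best] = float("-inf")

    # Top-p (nucleus): keep the smallest prefix of sorted tokens whose probability mass >= top_p.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        cumulative = probs.cumsum(dim=-1)
        drop = (cumulative - probs) > top_p        # mass before this token already exceeds top_p
        logits[sorted_idx[drop]] = float("-inf")

    return logits

# Example: sample one token id from the filtered distribution.
# next_id = torch.multinomial(torch.softmax(filter_logits(logits, generated), dim=-1), num_samples=1)
```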

## 💡 Usage Tips

1. For Creative Writing: Use higher temperature (1.0-1.5) and top-p (0.9-0.95)
2. For Factual Content: Use lower temperature (0.3-0.7) and top-p (0.8-0.9)
3. For Code Generation: Use temperature ~0.2 with top-k filtering (example presets below)
4. Longer Context: The model handles up to 2,048 tokens of context
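
For convenience, the tips can be expressed as presets for the hypothetical `filter_logits` sketch above. The specific numbers are picked from inside the recommended ranges; the top-k and repetition-penalty values are assumptions, not taken from this README.

```python
# Illustrative presets matching the usage tips; only the ranges above come from this README.
PRESETS = {
    "creative": {"temperature": 1.2, "top_p": 0.95, "top_k": 0,  "repetition_penalty": 1.1},
    "factual":  {"temperature": 0.5, "top_p": 0.85, "top_k": 0,  "repetition_penalty": 1.1},
    "code":     {"temperature": 0.2, "top_p": 1.0,  "top_k": 40, "repetition_penalty": 1.05},
}
```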

## 🚨 Limitations

- Knowledge Cutoff: Training data knowledge cutoff varies by source
- Biases: May reflect biases present in the training data
- Factuality: Generated content should be verified for factual accuracy
- Context Window: Limited to 2,048 tokens (approximately 1,500 words)

## 🔧 Technical Implementation

The model uses a custom PyTorch implementation with:

- Efficient attention mechanisms
- Memory-optimized layer implementations
- Streaming generation with proper token handling (sketched below)
- GPU acceleration via ZeroGPU
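
To illustrate the streaming path, here is a hedged sketch of a token-by-token generator in the style of a Gradio handler. The `model`, `tokenizer`, and `filter_logits` names refer to the hypothetical objects from the earlier sketches, and `@spaces.GPU` is the decorator Hugging Face provides for ZeroGPU Spaces; none of this is the actual app.py.

```python
import torch
import spaces  # ZeroGPU helper available inside Hugging Face Spaces

@spaces.GPU
def stream_generate(prompt, max_tokens=256, temperature=0.7, top_p=0.9,
                    top_k=50, repetition_penalty=1.1):
    """Yield partial text so the Gradio output updates token by token (illustrative)."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    generated = input_ids[0]
    prompt_len = generated.shape[0]

    for _ in range(max_tokens):
        context = generated[-2048:]                        # stay inside the 2,048-token window
        with torch.no_grad():
            logits = model(context.unsqueeze(0))[0, -1]    # assumed to return (1, seq, vocab) logits
        logits = filter_logits(logits, generated, temperature, top_p, top_k, repetition_penalty)
        next_id = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
        generated = torch.cat([generated, next_id])
        if next_id.item() == tokenizer.eos_token_id:
            break
        yield tokenizer.decode(generated[prompt_len:], skip_special_tokens=True)
```

In the app, each `yield` from a generator like this replaces the text shown in the Gradio output component, producing the token-by-token streaming effect.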

## 📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Hugging Face for the Spaces platform and ZeroGPU infrastructure
- The open-source community for transformer implementations and best practices
- TII UAE for the Falcon RefinedWeb dataset

Note: This is a foundational language model trained for research and educational purposes. Please use responsibly and be aware of potential biases and limitations.