---
title: VelocityLM
emoji: π
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
license: mit
models:
  - gpt2
datasets:
  - tiiuae/falcon-refinedweb
tags:
  - text-generation
  - transformer
  - pytorch
  - custom-model
  - llm
  - foundational-model
short_description: FoundationalLM for fast text-generation
---
# Custom LLM - Foundational Language Model

A custom-trained foundational language model with roughly 2 billion parameters, built on a modern transformer architecture and deployed with streaming text generation.

## Features
- Custom Architecture: Modern transformer with RoPE (Rotary Position Embedding), RMSNorm, and SwiGLU activation
- Streaming Generation: Real-time text generation with token-by-token streaming
- Flexible Sampling: Configurable temperature, top-p, top-k, and repetition penalty
- ZeroGPU Integration: Optimized for Hugging Face Spaces with GPU acceleration
- Responsive UI: Clean, intuitive Gradio interface (a minimal sketch of the interface wiring follows below)
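The Space's actual `app.py` is not reproduced here. As a rough illustration of how these features fit together, a minimal Gradio interface driving a streaming generator might look like the sketch below; the `generate_stream` placeholder stands in for the real model call, and the slider defaults are assumptions.

```python
# Illustrative only: a minimal streaming UI. The real app.py wires these
# controls to the custom 2B-parameter model instead of this placeholder.
import gradio as gr

def generate_stream(prompt, max_tokens, temperature, top_p, top_k, repetition_penalty):
    # Placeholder generator: yields growing partial text, as the real model
    # does token by token.
    text = ""
    for word in prompt.split() or ["(empty prompt)"]:
        text += word + " "
        yield text

demo = gr.Interface(
    fn=generate_stream,
    inputs=[
        gr.Textbox(label="Prompt", lines=4),
        gr.Slider(1, 1024, value=256, step=1, label="Max Tokens"),
        gr.Slider(0.1, 2.0, value=0.8, step=0.05, label="Temperature"),
        gr.Slider(0.1, 1.0, value=0.9, step=0.01, label="Top-p"),
        gr.Slider(0, 200, value=50, step=1, label="Top-k (0 = disabled)"),
        gr.Slider(1.0, 2.0, value=1.1, step=0.05, label="Repetition Penalty"),
    ],
    outputs=gr.Textbox(label="Generated Text"),
    title="VelocityLM",
)

if __name__ == "__main__":
    demo.launch()
```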
## Model Details

| Specification | Value |
|---|---|
| Parameters | ~2 billion |
| Architecture | Custom Transformer |
| Context Length | 2,048 tokens |
| Vocab Size | 50,257 (GPT-2 tokenizer) |
| Layers | 24 |
| Attention Heads | 32 |
| Hidden Size | 2,048 |
| Intermediate Size | 8,192 |
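For reference, the table maps directly onto a small configuration object. The sketch below is illustrative only; the class and field names are not the model's actual config API.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Values taken from the table above; the class itself is illustrative.
    vocab_size: int = 50257            # GPT-2 tokenizer
    max_position_embeddings: int = 2048
    num_layers: int = 24
    num_attention_heads: int = 32
    hidden_size: int = 2048
    intermediate_size: int = 8192

config = ModelConfig()
assert config.hidden_size % config.num_attention_heads == 0  # head_dim = 64
```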
## Architecture Components

- RMSNorm: Root Mean Square Layer Normalization for improved training stability
- RoPE: Rotary Position Embeddings for better length extrapolation
- SwiGLU: SiLU-gated (Swish-GLU) feed-forward activation for improved performance (sketched below)
- Causal Attention: Standard autoregressive attention mechanism
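Minimal PyTorch sketches of the first and third components (RMSNorm and a SwiGLU feed-forward block) are shown below; the model's actual implementation may differ in details such as epsilon, dtype handling, or weight initialization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square LayerNorm: scale by the RMS of the features, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """SiLU-gated feed-forward block (SwiGLU) with the sizes from the table above."""
    def __init__(self, hidden_size: int = 2048, intermediate_size: int = 8192):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```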
## Training Details

- Dataset: Falcon RefinedWeb (curated web text)
- Training Steps: 100,000 steps
- Learning Rate: 6e-4 with warmup and decay
- Batch Size: 32 effective (4 per device × 8 gradient accumulation steps)
- Optimization: AdamW with β₁ = 0.9, β₂ = 0.95 (see the training-setup sketch below)
- Precision: Mixed precision (FP16)
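A sketch of how these settings map onto a standard PyTorch setup is shown below. The warmup length and decay shape are assumptions (only "warmup and decay" is stated above), and the one-layer model is a stand-in for the real 2B-parameter network.

```python
import torch
import torch.nn as nn

# Stand-in for the real 2B-parameter model; hyperparameters mirror the list above.
model = nn.Linear(2048, 2048)

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, betas=(0.9, 0.95))

warmup_steps, total_steps = 2_000, 100_000  # warmup length is an assumption

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)              # linear warmup
    # linear decay to zero (the exact decay shape is not stated above)
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)  # FP16 mixed precision on GPU
accumulation_steps = 8  # 4 sequences per device x 8 accumulation = 32 effective batch
```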
## Generation Parameters

- Max Tokens: Control the length of generated text (1-1024)
- Temperature: Sampling randomness (0.1-2.0, higher = more creative)
- Top-p: Nucleus sampling threshold (0.1-1.0)
- Top-k: Top-k sampling limit (0-200, 0 = disabled)
- Repetition Penalty: Penalize already-generated tokens to reduce repetition (1.0-2.0); the sketch below shows how these sampling parameters combine
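The sketch below shows one common way temperature, top-k, top-p, and the repetition penalty combine in a single sampling step over the model's next-token logits; it is illustrative and may differ from the app's actual sampler.

```python
import torch

def sample_next_token(logits, generated_ids, temperature=0.8, top_p=0.9, top_k=50,
                      repetition_penalty=1.1):
    """One sampling step over a 1-D logits tensor of shape (vocab_size,)."""
    logits = logits.clone()

    # Repetition penalty: down-weight tokens that already appear in the output.
    for token_id in set(generated_ids):
        if logits[token_id] > 0:
            logits[token_id] /= repetition_penalty
        else:
            logits[token_id] *= repetition_penalty

    logits = logits / max(temperature, 1e-5)

    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    if top_k > 0:
        k = min(top_k, logits.size(-1))
        kth_value = torch.topk(logits, k).values[-1]
        logits[logits < kth_value] = float("-inf")

    # Top-p (nucleus): keep the smallest set of tokens whose probability mass >= top_p.
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    outside_nucleus = cumulative - sorted_probs > top_p
    sorted_probs[outside_nucleus] = 0.0
    probs = torch.zeros_like(probs).scatter(0, sorted_idx, sorted_probs)

    return torch.multinomial(probs / probs.sum(), num_samples=1).item()
```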
## Usage Tips

- For Creative Writing: Use higher temperature (1.0-1.5) and top-p (0.9-0.95)
- For Factual Content: Use lower temperature (0.3-0.7) and top-p (0.8-0.9)
- For Code Generation: Use temperature ~0.2 with top-k filtering
- Longer Context: The model handles up to 2,048 tokens of context
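For example, these tips can be captured as simple presets to pass to the generation function; the exact values below are suggestions within the ranges above, not tuned defaults.

```python
# Hypothetical presets matching the tips above (not shipped with the Space).
PRESETS = {
    "creative": {"temperature": 1.2, "top_p": 0.95, "top_k": 0,  "repetition_penalty": 1.1},
    "factual":  {"temperature": 0.5, "top_p": 0.85, "top_k": 50, "repetition_penalty": 1.1},
    "code":     {"temperature": 0.2, "top_p": 1.0,  "top_k": 40, "repetition_penalty": 1.05},
}
```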
## Limitations

- Knowledge Cutoff: Training data knowledge cutoff varies by source
- Biases: May reflect biases present in training data
- Factuality: Generated content should be verified for factual accuracy
- Context Window: Limited to 2,048 tokens (approximately 1,500 words)
## Technical Implementation

The model uses a custom PyTorch implementation with:
- Efficient attention mechanisms
- Memory-optimized layer implementations
- Streaming generation with proper token handling
- GPU acceleration via ZeroGPU (see the sketch below)
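A minimal sketch of how these pieces are typically wired together is shown below. The `@spaces.GPU` decorator is ZeroGPU's standard entry point; the `model` and `tokenizer` objects, and the assumption that the model returns raw logits of shape `(batch, seq_len, vocab_size)`, are placeholders. In practice the full sampling logic from the earlier sketch (top-p, top-k, repetition penalty) would replace the temperature-only sampling shown here.

```python
import spaces
import torch

@spaces.GPU  # ZeroGPU: a GPU is attached for the duration of this call
def stream_generate(model, tokenizer, prompt, max_tokens=256, temperature=0.8):
    """Yield partial text as each new token is sampled (assumes the model
    returns logits of shape (batch, seq_len, vocab_size))."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    generated = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    prompt_len = generated.shape[-1]

    with torch.no_grad():
        for _ in range(max_tokens):
            logits = model(generated)[:, -1, :] / max(temperature, 1e-5)
            probs = torch.softmax(logits, dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)        # shape (1, 1)
            generated = torch.cat([generated, next_id], dim=-1)
            # Decode only the newly generated portion and stream it to the UI.
            yield tokenizer.decode(generated[0, prompt_len:], skip_special_tokens=True)
            if next_id.item() == tokenizer.eos_token_id:
                break
```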
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- Hugging Face for the Spaces platform and ZeroGPU infrastructure
- The open-source community for transformer implementations and best practices
- TII UAE for the Falcon RefinedWeb dataset
Note: This is a foundational language model trained for research and educational purposes. Please use responsibly and be aware of potential biases and limitations.