---
title: VelocityLM
emoji: π
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
license: mit
models:
  - gpt2
datasets:
  - tiiuae/falcon-refinedweb
tags:
  - text-generation
  - transformer
  - pytorch
  - custom-model
  - llm
  - foundational-model
short_description: FoundationalLM for fast text-generation
---
# Custom LLM - Foundational Language Model

A custom-trained foundational language model with **2 billion parameters**, built on a modern transformer architecture and deployed with streaming text generation.

## Features

- **Custom Architecture**: Modern transformer with RoPE (Rotary Position Embedding), RMSNorm, and SwiGLU activation
- **Streaming Generation**: Real-time text generation with token-by-token streaming
- **Flexible Sampling**: Configurable temperature, top-p, top-k, and repetition penalty
- **ZeroGPU Integration**: Optimized for Hugging Face Spaces with GPU acceleration
- **Responsive UI**: Clean, intuitive Gradio interface

## Model Details

| Specification | Value |
|---------------|-------|
| **Parameters** | ~2 billion |
| **Architecture** | Custom Transformer |
| **Context Length** | 2,048 tokens |
| **Vocab Size** | 50,257 (GPT-2 tokenizer) |
| **Layers** | 24 |
| **Attention Heads** | 32 |
| **Hidden Size** | 2,048 |
| **Intermediate Size** | 8,192 |
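
For reference, the table above maps onto a configuration object roughly like the sketch below. The field names are illustrative assumptions, not the model's actual config class.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Values mirror the specification table above; the names are illustrative.
    vocab_size: int = 50257         # GPT-2 tokenizer vocabulary
    max_seq_len: int = 2048         # context length in tokens
    hidden_size: int = 2048
    intermediate_size: int = 8192   # SwiGLU feed-forward width
    num_layers: int = 24
    num_heads: int = 32             # head_dim = hidden_size // num_heads = 64
```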

## Architecture Components

- **RMSNorm**: Root Mean Square Layer Normalization for better training stability
- **RoPE**: Rotary Position Embeddings for better length extrapolation
- **SwiGLU**: Swish-gated GLU (gated linear unit) activation for improved feed-forward performance
- **Causal Attention**: Standard autoregressive attention mechanism
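
A minimal PyTorch sketch of two of these components, assuming LLaMA-style conventions; this is illustrative, not the Space's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square normalization: rescale by the RMS of the features, no mean subtraction."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: a SiLU-gated linear unit followed by a down projection."""
    def __init__(self, hidden_size: int = 2048, intermediate_size: int = 8192):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```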

## Training Details

- **Dataset**: Falcon RefinedWeb (curated web text)
- **Training Steps**: 100,000 steps
- **Learning Rate**: 6e-4 with warmup and decay
- **Batch Size**: 32 (4 per device × 8 accumulation steps)
- **Optimization**: AdamW with β₁=0.9, β₂=0.95
- **Precision**: Mixed precision (FP16)
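
A hedged sketch of how these hyperparameters are commonly wired together; the warmup length and cosine decay shape are assumptions, since the list above only specifies "warmup and decay":

```python
import math
import torch

def setup_optimizer(model, max_steps=100_000, warmup_steps=2_000, peak_lr=6e-4):
    # AdamW with the betas listed above; warmup_steps is an assumed value.
    optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr, betas=(0.9, 0.95))

    def lr_lambda(step):
        # Linear warmup, then cosine decay over the remaining steps (assumed schedule shape).
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# Effective batch size 32 = 4 sequences per device x 8 gradient-accumulation steps;
# forward/backward passes run under mixed precision (e.g. torch.autocast with float16).
```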

## Generation Parameters

- **Max Tokens**: Control the length of generated text (1-1024)
- **Temperature**: Sampling randomness (0.1-2.0, higher = more creative)
- **Top-p**: Nucleus sampling threshold (0.1-1.0)
- **Top-k**: Top-k sampling limit (0-200, 0 = disabled)
- **Repetition Penalty**: Reduce repetitive text (1.0-2.0)
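
These knobs correspond to standard logit filtering before sampling. A simplified sketch of how such filtering is commonly applied, not the Space's exact code:

```python
import torch

def filter_logits(logits, temperature=0.8, top_k=50, top_p=0.9,
                  repetition_penalty=1.1, generated_ids=None):
    """Apply repetition penalty, temperature, top-k and top-p (nucleus) filtering to 1-D logits."""
    logits = logits.clone()
    # Repetition penalty: push down logits of tokens that were already generated.
    if generated_ids is not None and repetition_penalty != 1.0:
        for token_id in set(generated_ids.tolist()):
            if logits[token_id] > 0:
                logits[token_id] /= repetition_penalty
            else:
                logits[token_id] *= repetition_penalty
    logits = logits / temperature
    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    if top_k > 0:
        kth_value = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_value] = float("-inf")
    # Top-p: keep the smallest set of tokens whose cumulative probability exceeds p.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        remove = cum_probs > top_p
        remove[1:] = remove[:-1].clone()   # shift so the first token over the threshold is kept
        remove[0] = False
        logits[sorted_idx[remove]] = float("-inf")
    return logits

# next_id = torch.multinomial(torch.softmax(filter_logits(logits, generated_ids=ids), dim=-1), 1)
```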

## Usage Tips

1. **For Creative Writing**: Use higher temperature (1.0-1.5) and top-p (0.9-0.95)
2. **For Factual Content**: Use lower temperature (0.3-0.7) and top-p (0.8-0.9)
3. **For Code Generation**: Use temperature ~0.2 with top-k filtering
4. **Longer Context**: The model handles up to 2,048 tokens of context
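
As a quick reference, the tips above can be written as parameter presets. The `generate(...)` call is a hypothetical stand-in for the Space's UI sliders, and the top-k value for code generation is an assumed choice:

```python
# Illustrative presets drawn from the tips above.
PRESETS = {
    "creative": {"temperature": 1.2, "top_p": 0.95, "top_k": 0},
    "factual":  {"temperature": 0.5, "top_p": 0.85, "top_k": 0},
    "code":     {"temperature": 0.2, "top_p": 1.0,  "top_k": 40},   # top_k=40 is an assumed value
}

# text = generate(prompt, max_new_tokens=256, **PRESETS["factual"])
```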

## Limitations

- **Knowledge Cutoff**: Training data knowledge cutoff varies by source
- **Biases**: May reflect biases present in training data
- **Factuality**: Generated content should be verified for factual accuracy
- **Context Window**: Limited to 2,048 tokens (approximately 1,500 words)

## Technical Implementation

The model uses a custom PyTorch implementation with:

- Efficient attention mechanisms
- Memory-optimized layer implementations
- Streaming generation with proper token handling
- GPU acceleration via ZeroGPU
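
A hedged sketch of how ZeroGPU-backed streaming typically looks in a Gradio app. It substitutes the `gpt2` checkpoint (already listed in this Space's metadata) for the custom 2B model, uses greedy decoding for brevity, and is not the Space's actual `app.py`:

```python
import gradio as gr
import spaces                     # Hugging Face ZeroGPU helper
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-ins: the real Space would load the custom 2B model and the GPT-2 tokenizer it uses.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

@spaces.GPU(duration=60)          # hold a ZeroGPU slice only while this function runs
def stream_generate(prompt, max_new_tokens=128):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    text = ""
    for _ in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids).logits[:, -1, :]               # next-token logits
        next_id = torch.argmax(logits, dim=-1, keepdim=True)   # greedy decoding for brevity
        ids = torch.cat([ids, next_id], dim=-1)
        text += tokenizer.decode(next_id[0])
        yield text                                             # Gradio streams each partial string

demo = gr.Interface(fn=stream_generate, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch()
```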

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Hugging Face for the Spaces platform and ZeroGPU infrastructure
- The open-source community for transformer implementations and best practices
- TII UAE for the Falcon RefinedWeb dataset

---

**Note**: This is a foundational language model trained for research and educational purposes. Please use responsibly and be aware of potential biases and limitations.