---
title: VelocityLM
emoji: πŸš€
colorFrom: yellow
colorTo: red
sdk: gradio
sdk_version: 5.43.1
app_file: app.py
pinned: false
license: mit
models:
- gpt2
datasets:
- tiiuae/falcon-refinedweb
tags:
- text-generation
- transformer
- pytorch
- custom-model
- llm
- foundational-model
short_description: FoundationalLM for fast text-generation
---
# πŸ€– Custom LLM - Foundational Language Model
A custom-trained foundational language model with **2 billion parameters**, built on a modern transformer architecture and deployed with streaming text generation.
## πŸš€ Features
- **Custom Architecture**: Modern transformer with RoPE (Rotary Position Embedding), RMSNorm, and SwiGLU activation
- **Streaming Generation**: Real-time text generation with token-by-token streaming
- **Flexible Sampling**: Configurable temperature, top-p, top-k, and repetition penalty
- **ZeroGPU Integration**: Optimized for Hugging Face Spaces with GPU acceleration
- **Responsive UI**: Clean, intuitive Gradio interface
## πŸ“Š Model Details
| Specification | Value |
|---------------|-------|
| **Parameters** | ~2 billion |
| **Architecture** | Custom Transformer |
| **Context Length** | 2,048 tokens |
| **Vocab Size** | 50,257 (GPT-2 tokenizer) |
| **Layers** | 24 |
| **Attention Heads** | 32 |
| **Hidden Size** | 2,048 |
| **Intermediate Size** | 8,192 |
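These hyperparameters map naturally onto a small configuration object. The sketch below is purely illustrative; the field names are assumptions and may differ from the actual code in this Space.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    # Values taken from the table above; field names are illustrative.
    vocab_size: int = 50_257        # GPT-2 tokenizer vocabulary
    max_seq_len: int = 2_048        # context length
    hidden_size: int = 2_048
    intermediate_size: int = 8_192  # SwiGLU feed-forward width
    n_layers: int = 24
    n_heads: int = 32
```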
## πŸ—οΈ Architecture Components
- **RMSNorm**: Root Mean Square Layer Normalization for better training stability
- **RoPE**: Rotary Position Embeddings for better length extrapolation
- **SwiGLU**: Swish-gated linear unit activation in the feed-forward blocks for improved performance
- **Causal Attention**: Standard autoregressive attention mechanism
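For reference, here are minimal PyTorch sketches of the RMSNorm and SwiGLU components in their standard formulations. They illustrate the building blocks listed above and are not copied from this model's source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root Mean Square LayerNorm: rescales by 1/RMS(x), no mean subtraction or bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """Swish-gated feed-forward block: SiLU(x W_gate) * (x W_up), then projected back down."""
    def __init__(self, hidden_size: int = 2048, intermediate_size: int = 8192):
        super().__init__()
        self.w_gate = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w_up = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.w_down = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))
```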
## 🎯 Training Details
- **Dataset**: Falcon RefinedWeb (curated web text)
- **Training Steps**: 100,000 steps
- **Learning Rate**: 6e-4 with warmup and decay
- **Batch Size**: 32 (4 per device Γ— 8 accumulation steps)
- **Optimization**: AdamW with Ξ²1=0.9, Ξ²2=0.95
- **Precision**: Mixed precision (FP16)
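A minimal sketch of an optimizer and schedule matching the settings above, assuming linear warmup followed by cosine decay (the warmup length shown is a placeholder; the exact schedule used in training is not specified here):

```python
import math
import torch

def build_optimizer_and_scheduler(model, total_steps=100_000, warmup_steps=2_000):
    # AdamW with the betas listed above; warmup_steps is a placeholder value.
    optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4, betas=(0.9, 0.95))

    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)                   # linear warmup
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))        # cosine decay toward 0

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

In a real training loop, the FP16 mixed precision listed above would typically be handled with `torch.cuda.amp.autocast` and a `GradScaler` around the forward/backward pass.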
## πŸ› οΈ Generation Parameters
- **Max Tokens**: Maximum number of new tokens to generate (1-1024)
- **Temperature**: Sampling randomness (0.1-2.0, higher = more creative)
- **Top-p**: Nucleus sampling threshold (0.1-1.0)
- **Top-k**: Top-k sampling limit (0-200, 0 = disabled)
- **Repetition Penalty**: Penalty on already-generated tokens to reduce repetition (1.0-2.0)
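The sketch below shows how these four controls typically interact when sampling a single next token. It is a generic implementation for illustration, not necessarily the exact logic in `app.py`.

```python
import torch

def sample_next_token(logits, generated_ids, temperature=0.8, top_p=0.9,
                      top_k=50, repetition_penalty=1.1):
    """Apply repetition penalty, temperature, top-k, and top-p to a 1-D vector of next-token logits."""
    logits = logits.clone()

    # Repetition penalty: push down logits of tokens that were already generated.
    for token_id in set(generated_ids.tolist()):
        logits[token_id] = (logits[token_id] / repetition_penalty
                            if logits[token_id] > 0
                            else logits[token_id] * repetition_penalty)

    # Temperature: higher values flatten the distribution (more creative).
    logits = logits / max(temperature, 1e-5)

    # Top-k: keep only the k highest-scoring tokens (0 disables the filter).
    if top_k > 0:
        kth = torch.topk(logits, top_k).values[-1]
        logits[logits < kth] = float("-inf")

    # Top-p (nucleus): keep the smallest set of tokens whose cumulative probability >= top_p.
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    cumprobs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
    to_remove = cumprobs > top_p
    to_remove[1:] = to_remove[:-1].clone()   # shift so the first token above the threshold is kept
    to_remove[0] = False                     # always keep the single most likely token
    logits[sorted_idx[to_remove]] = float("-inf")

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```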
## πŸ’‘ Usage Tips
1. **For Creative Writing**: Use higher temperature (1.0-1.5) and top-p (0.9-0.95)
2. **For Factual Content**: Use lower temperature (0.3-0.7) and top-p (0.8-0.9)
3. **For Code Generation**: Use temperature ~0.2 with top-k filtering
4. **Longer Context**: The model handles up to 2,048 tokens of context
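If you prefer to start from presets, a simple mapping like the one below encodes these tips; the values are just the starting points suggested above, not tuned settings.

```python
# Starting points based on the tips above; adjust to taste.
GENERATION_PRESETS = {
    "creative_writing": {"temperature": 1.2, "top_p": 0.95, "top_k": 0,  "repetition_penalty": 1.1},
    "factual":          {"temperature": 0.5, "top_p": 0.85, "top_k": 0,  "repetition_penalty": 1.1},
    "code":             {"temperature": 0.2, "top_p": 1.0,  "top_k": 40, "repetition_penalty": 1.05},
}
```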
## 🚨 Limitations
- **Knowledge Cutoff**: Training data knowledge cutoff varies by source
- **Biases**: May reflect biases present in training data
- **Factuality**: Generated content should be verified for factual accuracy
- **Context Window**: Limited to 2,048 tokens (approximately 1,500 words)
## πŸ”§ Technical Implementation
The model uses a custom PyTorch implementation with:
- Efficient attention mechanisms
- Memory-optimized layer implementations
- Streaming generation with proper token handling
- GPU acceleration via ZeroGPU
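Below is a condensed, illustrative sketch of how streaming generation is commonly wired into a ZeroGPU Space: a generator function decorated with `@spaces.GPU` yields partial text, and Gradio streams each yield to the UI. The function names, model-loading placeholder, and assumed model output shape are illustrative rather than the exact contents of `app.py`; `sample_next_token` refers to the sampling sketch earlier in this README.

```python
import spaces
import torch
import gradio as gr
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # GPT-2 tokenizer, per the model card
model = ...  # placeholder: the custom 2B checkpoint is loaded by the actual app

@spaces.GPU  # ZeroGPU attaches a GPU only for the duration of this call
def generate_stream(prompt, max_tokens=256, temperature=0.8, top_p=0.9,
                    top_k=50, repetition_penalty=1.1):
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    generated = input_ids[0]
    for _ in range(int(max_tokens)):
        with torch.no_grad():
            logits = model(generated.unsqueeze(0))[0, -1]  # next-token logits (assumed output shape)
        next_id = sample_next_token(logits, generated, temperature,
                                    top_p, top_k, repetition_penalty)  # see sketch above
        generated = torch.cat([generated, next_id])
        yield tokenizer.decode(generated[input_ids.shape[1]:])  # stream the partial completion
        if next_id.item() == tokenizer.eos_token_id:
            break

demo = gr.Interface(fn=generate_stream, inputs="text", outputs="text")
demo.launch()
```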
## πŸ“ License
This project is licensed under the MIT License - see the LICENSE file for details.
## πŸ™ Acknowledgments
- Hugging Face for the Spaces platform and ZeroGPU infrastructure
- The open-source community for transformer implementations and best practices
- TII UAE for the Falcon RefinedWeb dataset
---
**Note**: This is a foundational language model trained for research and educational purposes. Please use responsibly and be aware of potential biases and limitations.