EO-1 Vision-Language-Action Model (Initialization)
A pre-initialized vision-language-action model based on Qwen2.5-VL-3B-Instruct, designed for use with the recent LeRobot PR: https://github.com/huggingface/lerobot/pull/1971
Quick Start
from transformers import AutoProcessor, AutoModelForCausalLM
# Load the model and processor
model = AutoModelForCausalLM.from_pretrained("IPEC-COMMUNITY/eo1-qwen2_5_vl-initial", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("IPEC-COMMUNITY/eo1-qwen2_5_vl-initial", trust_remote_code=True)
# Ready for training - no additional setup required!
Key Features
- Pre-configured Special Tokens: All EO-1 robotic tokens are pre-added to the vocabulary
- Multimodal Processing: Integrated processor handles images, videos, text, robot states, and actions
- Training-Ready: Directly loadable for fine-tuning without modifications
- Based on Qwen2.5-VL-3B: Inherits strong vision-language understanding capabilities
 
Special Tokens
The model includes pre-configured special tokens for robotic manipulation:
| Token | Purpose |
|---|---|
| `<\|action_start\|>` | Marks the beginning of action sequences |
| `<\|action_pad\|>` | Padding token for actions |
| `<\|action_pass\|>` | Pass-through token for actions |
| `<\|action_end\|>` | Marks the end of action sequences |
| `<\|state_start\|>` | Marks the beginning of state sequences |
| `<\|state_pad\|>` | Padding token for states |
| `<\|state_end\|>` | Marks the end of state sequences |
| `<\|vla\|>` | Vision-Language-Action task token |
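
As a rough illustration of how start, pad, and end markers can delimit a span, the sketch below wraps a fixed number of pad placeholders between the matching start and end tokens. The token strings come from the table above; the layout, the helper name `wrap_span`, and the slot counts are assumptions made for illustration and are not the official EO-1 prompt template.

```python
# Token strings are copied from the table above.
STATE_START, STATE_PAD, STATE_END = "<|state_start|>", "<|state_pad|>", "<|state_end|>"
ACTION_START, ACTION_PAD, ACTION_END = "<|action_start|>", "<|action_pad|>", "<|action_end|>"


def wrap_span(start: str, pad: str, end: str, length: int) -> str:
    """Return start + `length` pad placeholders + end.

    Hypothetical layout: the real processor may arrange tokens differently.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return start + pad * length + end


# Assumed slot counts, chosen only for the example.
state_span = wrap_span(STATE_START, STATE_PAD, STATE_END, length=1)
action_span = wrap_span(ACTION_START, ACTION_PAD, ACTION_END, length=4)

print(state_span)
print(action_span)
```

In practice the integrated processor is expected to insert these tokens itself, so a helper like this would mainly be useful for inspecting or debugging tokenized sequences.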
Data Processing
The integrated processor handles multiple modalities:
- Images: Automatically resized to an adaptive pixel count
- Videos: Automatically resized to an adaptive pixel count
- Text: Standard tokenization with special token support
- Robot States: Vectorized and tokenized
- Actions: Vectorized and tokenized with denoising support
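
To make the idea of adaptive resizing concrete, here is a minimal sketch in the style commonly used by Qwen2.5-VL-family processors: each side is snapped to a multiple of a patch factor, and the image is rescaled if its area falls outside a pixel budget. The function name `adaptive_resize` and the default values for `factor`, `min_pixels`, and `max_pixels` are illustrative assumptions, not the values configured in this model's processor.

```python
import math


def adaptive_resize(height: int, width: int, factor: int = 28,
                    min_pixels: int = 56 * 56,
                    max_pixels: int = 1280 * 28 * 28) -> tuple[int, int]:
    """Pick a target (height, width) for resizing.

    Both sides become multiples of `factor`, and the area is pushed into
    [min_pixels, max_pixels] while roughly preserving the aspect ratio.
    All defaults are assumptions for illustration.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")

    # Snap each side to the nearest multiple of `factor` (at least one patch).
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)

    if h * w > max_pixels:
        # Too large: shrink both sides by the same scale, rounding down.
        scale = math.sqrt(height * width / max_pixels)
        h = max(factor, math.floor(height / scale / factor) * factor)
        w = max(factor, math.floor(width / scale / factor) * factor)
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same scale, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / factor) * factor
        w = math.ceil(width * scale / factor) * factor
    return h, w


print(adaptive_resize(480, 640))    # typical camera frame, stays near native size
print(adaptive_resize(2160, 3840))  # 4K frame, scaled down to fit the budget
```

Because the target size depends on the input, the number of visual tokens per image or video frame varies with resolution rather than being fixed.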
 
Model Architecture
- Base Model: Qwen2.5-VL-3B-Instruct
- Vision Encoder: Pre-trained vision transformer
- Language Model: 3B parameter transformer
- Action Projector: Custom layers for robotic action prediction
- Flow Matching: Integrated denoising mechanism for action generation
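
For readers unfamiliar with flow matching, the sketch below shows the general shape of inference: start from Gaussian noise and integrate a velocity field with Euler steps until an action vector emerges. It is a toy under stated assumptions, not EO-1's implementation. The learned velocity network is replaced by a hypothetical `oracle_velocity` that already knows the target, and the time convention (noise at t=0, action at t=1) is an assumption.

```python
import random


def oracle_velocity(x, t, target):
    """Toy stand-in for a learned velocity network (assumption).

    On the straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    that carries a point x at time t to the endpoint x_1 is (x_1 - x) / (1 - t).
    """
    return [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]


def sample_actions(target, num_steps=10, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 (noise) toward t=1 with Euler steps."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        v = oracle_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


target_action = [0.1, -0.3, 0.5]  # made-up action vector for the demo
actions = sample_actions(target_action)
print(actions)
```

In a real model the velocity would be predicted by the network from the noisy actions, the time step, and the vision-language context, so the result would only approximate the intended action instead of matching a known target.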
 
Usage Projects
- LeRobot: https://github.com/huggingface/lerobot/tree/main/src/lerobot/policies/eo1
- EO-1: https://github.com/EO-Robotics/EO-1
 
Contributing
For issues, questions, or contributions, please visit our GitHub repository.
Note: This is an initialization model. For best results, fine-tune on your specific robotic task data.