mdj1412/Qwen2.5-0.5B-tuned_numinamath_cot-lr_5e_6-epoch_3-packing_f-min_qwen_chatml-setup_two_tokens • Updated 16 days ago
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs • Paper • 2507.07990 • Published Jul 10, 2025 • 45
Teaching Metric Distance to Autoregressive Multimodal Foundational Models • Paper • 2503.02379 • Published Mar 4, 2025 • 4