DmitryRyumin posted an update 9 days ago
πŸš€πŸ‘οΈπŸŒŸ New Research Alert - ICCV 2025 (Oral)! πŸŒŸπŸ‘οΈπŸš€
πŸ“„ Title: Token Activation Map to Visually Explain Multimodal LLMs πŸ”

πŸ“ Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations. The result is clearer, high-quality visualizations that enhance understanding of object localization, reasoning and multimodal alignment across models.

πŸ‘₯ Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li

πŸ“… Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA πŸ‡ΊπŸ‡Έ

πŸ“„ Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)

πŸ“ Repository: https://github.com/xmed-lab/TAM

πŸš€ ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

πŸš€ Added to the Multi-Modal Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md

πŸ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

πŸ” Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight

Fascinating work... TAM looks like a big step toward making multimodal LLMs more interpretable. Excited to see how it performs across different architectures and real-world datasets.

This is great! In my experience, half the struggle in implementing vision and multimodal tasks for real-world, industrial use cases is interpretability. People don't trust what can't be properly visualised, so this is a great advancement.