DmitryRyumin posted an update 9 days ago
πŸš€πŸ‘οΈπŸŒŸ New Research Alert - ICCV 2025 (Oral)! πŸŒŸπŸ‘οΈπŸš€
πŸ“„ Title: Token Activation Map to Visually Explain Multimodal LLMs πŸ”

πŸ“ Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations. The result is clearer, high-quality visualizations that enhance understanding of object localization, reasoning and multimodal alignment across models.

πŸ‘₯ Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li

πŸ“… Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA πŸ‡ΊπŸ‡Έ

πŸ“„ Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)

πŸ“ Repository: https://github.com/xmed-lab/TAM

πŸš€ ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

πŸš€ Added to the Multi-Modal Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md

πŸ“š More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

πŸ” Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight

Fascinating work... TAM looks like a big step toward making multimodal LLMs more interpretable. Excited to see how it performs across different architectures and real-world datasets.

This is great! In my experience, half the struggle in implementing vision and multimodal tasks for real-world, industrial use cases is interpretability. People don't trust what can't be properly visualised, so this is a great advancement.