Daniil Vodolazsky's picture

35 3

Daniil Vodolazsky

s231644

·

AI & ML interests

None yet

Organizations

upvoted 5 papers 7 months ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14 • 117

API Agents vs. GUI Agents: Divergence and Convergence

Paper • 2503.11069 • Published Mar 14 • 37

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Paper • 2503.13964 • Published Mar 18 • 20

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 166

Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Paper • 2503.21460 • Published Mar 27 • 83

upvoted 3 papers 9 months ago

Search-o1: Agentic Search-Enhanced Large Reasoning Models

Paper • 2501.05366 • Published Jan 9 • 102

MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents

Paper • 2501.08828 • Published Jan 15 • 31

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 298

upvoted 5 papers 11 months ago

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 159

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Paper • 2412.07626 • Published Dec 10, 2024 • 27

POINTS1.5: Building a Vision-Language Model towards Real World Applications

Paper • 2412.08443 • Published Dec 11, 2024 • 38

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 98

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 121

upvoted 4 papers about 1 year ago

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published Oct 11, 2024 • 47

Toward General Instruction-Following Alignment for Retrieval-Augmented Generation

Paper • 2410.09584 • Published Oct 12, 2024 • 49

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Paper • 2410.10563 • Published Oct 14, 2024 • 38

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Paper • 2410.10594 • Published Oct 14, 2024 • 29

upvoted a collection about 1 year ago

Molmo

Artifacts for open multimodal language models. • 5 items • Updated Apr 30 • 308

upvoted 2 papers about 1 year ago

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3, 2024 • 83

LongRecipe: Recipe for Efficient Long Context Generalization in Large Languge Models

Paper • 2409.00509 • Published Aug 31, 2024 • 42