Glazkov/qwen2.5-vl-table-extraction-ru-v0.1

Fine-tuned Qwen2.5-VL-3B model for extracting structured data from financial and economic table images. This model has been trained on synthetic table data to convert table images into JSON format with parameter, date, value, and measurement fields.

Model Details

  • Base Model: Qwen/Qwen2.5-VL-3B-Instruct
  • Model Type: Vision-Language Model (VLM)
  • Task: Table Extraction from Images
  • Architecture: Qwen2.5-VL with LoRA fine-tuning

Intended Use

This model is designed to extract structured data from financial and economic table images. It can handle various table formats including:

  • Simple tables with rows and columns
  • Tables with additional notes/comments
  • Pivot table layouts
  • Tables with multi-level headers

How to Use

Installation

pip install transformers torch accelerate Pillow qwen-vl-utils

Basic Usage

from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image
import torch

model_id = "Glazkov/qwen2.5-vl-table-extraction-ru-v0.1"
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

# Load image
image = Image.open("table_image.png")

# Create conversation
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract the data from this table image and return it as a JSON array where each object has the keys: 'parameter', 'date', 'value', and 'measurement'."},
        ],
    }
]

# Process and generate
text = processor.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(conversation)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt", padding=True)
inputs = inputs.to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4096)

response = processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(response)
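
The response is returned as plain text; depending on the prompt, the model may wrap the JSON in a Markdown code fence. A minimal parsing sketch (the parse_table_json helper and the fence-stripping step are assumptions, not part of this model card):

import json
import re

def parse_table_json(response: str) -> list[dict]:
    """Parse the model response into a list of row dictionaries."""
    text = response.strip()
    # Strip an optional ```json ... ``` fence if the model emitted one (assumption)
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)

rows = parse_table_json(response)
for row in rows:
    print(row["parameter"], row["date"], row["value"], row["measurement"])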

Model Input

The model expects:

  • Image: Table images in any common format (PNG, JPEG, etc.)
  • Text Prompt: Instructions for table extraction
  • Output Format: the prompt should request a JSON array with parameter, date, value, and measurement fields

Model Output

The model generates structured JSON output in the following format:

[
    {
        "parameter": "indicator name",
        "date": "reporting period",
        "value": "numeric value",
        "measurement": "unit of measurement"
    }
]
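
Because every object shares the same four keys, the extracted rows convert directly to tabular form. A small sketch using pandas (an extra dependency not listed in the installation step; the row below is an illustrative placeholder, not real model output):

import pandas as pd

rows = [
    {"parameter": "Revenue", "date": "2023", "value": "125.4", "measurement": "mln RUB"},
]
df = pd.DataFrame(rows, columns=["parameter", "date", "value", "measurement"])
df.to_csv("extracted_table.csv", index=False)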

Training Details

Training Data

The model was trained on synthetic financial and economic table data with various table formats including simple tables, tables with extra columns, pivot tables, and combined header tables. Training data includes tables in multiple languages (English, Russian, Chinese) with realistic financial metrics and formatting.
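
The exact LoRA hyperparameters used for this checkpoint are not documented here. For orientation only, an adapter for the Qwen2.5-VL language backbone is typically configured with the peft library roughly as follows (rank, alpha, dropout, and target modules are illustrative assumptions, not the values used in training):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                      # adapter rank (illustrative, not the actual value)
    lora_alpha=32,             # scaling factor (illustrative)
    lora_dropout=0.05,         # illustrative
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)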

Limitations

  • The model is specifically trained for financial and economic tables
  • Performance may vary with table layouts not seen during training
  • Handwritten text or highly stylized tables may be challenging
  • The model assumes tables contain structured financial/economic data

Ethical Considerations

This model is intended for legitimate document processing and data extraction tasks. Users should ensure they have appropriate rights to process any documents and comply with relevant data protection regulations.

Technical Specifications

  • Framework: PyTorch
  • Hardware: Trained on GPU with mixed precision
  • Quantization: 4-bit quantization support available (see the loading sketch after this list)
  • Memory Requirements: ~8GB GPU memory (4-bit quantized), ~16GB GPU memory (full precision)
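
To stay within the ~8 GB figure above, the model can be loaded with 4-bit weights via bitsandbytes. A minimal loading sketch (bitsandbytes is an extra dependency not listed in the installation step):

from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig
import torch

model_id = "Glazkov/qwen2.5-vl-table-extraction-ru-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)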

Citation

If you use this model in your research or applications, please cite the original Qwen2.5-VL model and acknowledge any specific modifications made during fine-tuning.
