Glazkov/qwen2.5-vl-table-extraction-ru-v0.1

Fine-tuned Qwen2.5-VL-3B model for extracting structured data from financial and economic table images. This model has been trained on synthetic table data to convert table images into JSON format with parameter, date, value, and measurement fields.

Model Details

  • Base Model: Qwen/Qwen2.5-VL-3B-Instruct
  • Model Type: Vision-Language Model (VLM)
  • Task: Table Extraction from Images
  • Architecture: Qwen2.5-VL with LoRA fine-tuning

Intended Use

This model is designed to extract structured data from financial and economic table images. It can handle various table formats including:

  • Simple tables with rows and columns
  • Tables with additional notes/comments
  • Pivot table layouts
  • Tables with multi-level headers

How to Use

Installation

pip install transformers torch accelerate Pillow qwen-vl-utils

Basic Usage

from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info
from PIL import Image
import torch

model_id = "Glazkov/qwen2.5-vl-table-extraction-ru-v0.1"
model = AutoModelForImageTextToText.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained(model_id)

# Load image
image = Image.open("table_image.png")

# Create conversation
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract the data from this table image and return it as a JSON array where each object has the keys: 'parameter', 'date', 'value', and 'measurement'."},
        ],
    }
]

# Process and generate
text = processor.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
image_inputs, _ = process_vision_info(conversation)
inputs = processor(text=[text], images=image_inputs, return_tensors="pt", padding=True)
inputs = inputs.to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=4096)

response = processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(response)
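
The response is returned as plain text; depending on the prompt, the model may wrap the JSON in a Markdown code fence. A minimal parsing sketch (the parse_table_json helper and the fence-stripping step are assumptions, not part of this model card):

import json
import re

def parse_table_json(response: str) -> list[dict]:
    """Parse the model response into a list of row dictionaries."""
    text = response.strip()
    # Strip an optional ```json ... ``` fence if the model emitted one (assumption)
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)

rows = parse_table_json(response)
for row in rows:
    print(row["parameter"], row["date"], row["value"], row["measurement"])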

Model Input

The model expects:

  • Image: Table images in any common format (PNG, JPEG, etc.)
  • Text Prompt: Instructions for table extraction
  • Output Format: the prompt should request a JSON array with parameter, date, value, and measurement fields

Model Output

The model generates structured JSON output in the following format:

[
    {
        "parameter": "indicator name",
        "date": "reporting period",
        "value": "numeric value",
        "measurement": "unit of measurement"
    }
]
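
Because every object shares the same four keys, the extracted rows convert directly to tabular form. A small sketch using pandas (an extra dependency not listed in the installation step; the row below is an illustrative placeholder, not real model output):

import pandas as pd

rows = [
    {"parameter": "Revenue", "date": "2023", "value": "125.4", "measurement": "mln RUB"},
]
df = pd.DataFrame(rows, columns=["parameter", "date", "value", "measurement"])
df.to_csv("extracted_table.csv", index=False)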

Training Details

Training Data

The model was trained on synthetic financial and economic table data with various table formats including simple tables, tables with extra columns, pivot tables, and combined header tables. Training data includes tables in multiple languages (English, Russian, Chinese) with realistic financial metrics and formatting.
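
The exact LoRA hyperparameters used for this checkpoint are not documented here. For orientation only, an adapter for the Qwen2.5-VL language backbone is typically configured with the peft library roughly as follows (rank, alpha, dropout, and target modules are illustrative assumptions, not the values used in training):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                      # adapter rank (illustrative, not the actual value)
    lora_alpha=32,             # scaling factor (illustrative)
    lora_dropout=0.05,         # illustrative
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumed)
    task_type="CAUSAL_LM",
)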

Limitations

  • The model is specifically trained for financial and economic tables
  • Performance may vary with table layouts not seen during training
  • Handwritten text or highly stylized tables may be challenging
  • The model assumes tables contain structured financial/economic data

Ethical Considerations

This model is intended for legitimate document processing and data extraction tasks. Users should ensure they have appropriate rights to process any documents and comply with relevant data protection regulations.

Technical Specifications

  • Framework: PyTorch
  • Hardware: Trained on GPU with mixed precision
  • Quantization: 4-bit quantization support available (see the loading sketch after this list)
  • Memory Requirements: ~8GB GPU memory (4-bit quantized), ~16GB GPU memory (full precision)
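
To stay within the ~8 GB figure above, the model can be loaded with 4-bit weights via bitsandbytes. A minimal loading sketch (bitsandbytes is an extra dependency not listed in the installation step):

from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig
import torch

model_id = "Glazkov/qwen2.5-vl-table-extraction-ru-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)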

Citation

If you use this model in your research or applications, please cite the original Qwen2.5-VL model and acknowledge any specific modifications made during fine-tuning.
