Fix: Resolve TypeError for video_processor during model loading.

#38

Subject: Fix: Resolve TypeError for video_processor during model loading

Description:

This pull request fixes a TypeError raised when loading the dots-ocr model with recent versions of the transformers library. The error message is "Received a NoneType for argument 'video_processor', but a BaseVideoProcessor was expected." It is triggered because DotsVLProcessor inherits from Qwen2_5_VLProcessor, which now expects a video_processor component.

The Problem:

The current implementation of DotsVLProcessor does not handle the video_processor argument in its constructor. As the base classes in the transformers library have evolved, this argument has become part of the processor's initialization, so None is passed where a BaseVideoProcessor is expected, and the TypeError is raised.
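The failure mode can be illustrated with a small stand-alone sketch. The classes below (ToyProcessorBase, ToySubclass, and the empty BaseVideoProcessor) are hypothetical stand-ins written for this illustration, not the transformers implementation; they only assume that the base class type-checks every component it declares.

```python
class BaseVideoProcessor:
    """Stand-in for the video processor type the base class expects."""


class ToyProcessorBase:
    """Simplified stand-in for a processor base class (not the transformers code)."""

    # Each declared component must be supplied as an instance of the mapped type.
    attributes = {"video_processor": BaseVideoProcessor}

    def __init__(self, **components):
        for name, expected in self.attributes.items():
            value = components.get(name)  # missing components resolve to None
            if not isinstance(value, expected):
                raise TypeError(
                    f"Received a {type(value).__name__} for argument '{name}', "
                    f"but a {expected.__name__} was expected."
                )
            setattr(self, name, value)


class ToySubclass(ToyProcessorBase):
    """Subclass that never supplies a video processor."""

    def __init__(self):
        super().__init__()  # no video_processor is passed


def try_load():
    """Return the error text if construction fails, else 'loaded'."""
    try:
        ToySubclass()
    except TypeError as err:
        return str(err)
    return "loaded"


print(try_load())
```

In this toy model the subclass fails at construction time with the same wording as the reported error, because the missing component is seen as None by the base class check.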

The Solution:

This has been resolved with a small change to the DotsVLProcessor class. The __init__ method now accepts video_processor=None, and the class declares only image_processor and tokenizer in its attributes list, so the processor can be constructed without a video processor. The model's core OCR functionality is not altered.

The change is as follows:

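class DotsVLProcessor(Qwen2_5_VLProcessor):
    # Declare only the components this processor provides (no video processor).
    attributes = ["image_processor", "tokenizer"]

    def __init__(self, image_processor=None, tokenizer=None, video_processor=None, chat_template=None, **kwargs):
        # video_processor is accepted for compatibility but not forwarded to the parent.
        super().__init__(image_processor, tokenizer, chat_template=chat_template)
        # Fall back to default image-token values when the tokenizer does not define them.
        self.image_token = "<|imgpad|>" if not hasattr(tokenizer, "image_token") else tokenizer.image_token
        self.image_token_id = 151665 if not hasattr(tokenizer, "image_token_id") else tokenizer.image_token_id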

This ensures that the model remains compatible with recent library updates and can be loaded without error.
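As a rough illustration of why this kind of change can be enough, here is a stand-alone sketch. ToyBase, ToyParent, BeforeFix, and AfterFix are hypothetical names invented for this example, and the sketch assumes the base class validates only the components named in attributes; it is a simplified model, not the transformers code.

```python
class ToyBase:
    """Simplified stand-in for the processor base class (not the transformers code)."""

    attributes = ["image_processor", "tokenizer", "video_processor"]

    def __init__(self, *components):
        # Only as many components as there are declared attributes are checked;
        # zip() stops at the shorter sequence, so extras are ignored.
        for name, value in zip(self.attributes, components):
            if value is None:
                raise TypeError(f"Received a NoneType for argument '{name}'")
            setattr(self, name, value)


class ToyParent(ToyBase):
    """Stand-in for the parent processor, which forwards a video processor."""

    def __init__(self, image_processor=None, tokenizer=None, video_processor=None):
        super().__init__(image_processor, tokenizer, video_processor)


class BeforeFix(ToyParent):
    """Inherits the three-entry attributes list and supplies no video processor."""

    def __init__(self, image_processor=None, tokenizer=None):
        super().__init__(image_processor, tokenizer)


class AfterFix(ToyParent):
    # Declare only the components an OCR-only processor provides.
    attributes = ["image_processor", "tokenizer"]

    def __init__(self, image_processor=None, tokenizer=None, video_processor=None):
        # video_processor is accepted for call compatibility and then dropped.
        super().__init__(image_processor, tokenizer)


def loads(cls):
    """Return True if the class can be constructed with two string placeholders."""
    try:
        cls("img", "tok")
    except TypeError:
        return False
    return True


print(loads(BeforeFix), loads(AfterFix))
```

In this model the pre-fix class fails because None reaches the check for video_processor, while the post-fix class never has that component checked and ends up without a video_processor attribute at all.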

The updated implementation, running with transformers==4.57.1, is available here:

HF Space: https://huggingface.co/spaces/prithivMLmods/Multimodal-OCR3
