Fix: Resolve TypeError for video_processor during model loading.
Description:
This pull request fixes a TypeError raised when loading the dots-ocr model with recent versions of the transformers library. The error, "Received a NoneType for argument 'video_processor', but a BaseVideoProcessor was expected," occurs because DotsVLProcessor inherits from Qwen2_5_VLProcessor, which now expects a video_processor attribute.
The Problem:
The current implementation of DotsVLProcessor does not handle the video_processor argument in its constructor. As the base classes in the transformers library have evolved, the parent processor has come to expect this component during initialization. Because DotsVLProcessor never supplies one, the value ends up as None and the parent's type check fails with the TypeError above.
The Solution:
The fix is a small change to the DotsVLProcessor class: video_processor=None is added to the __init__ signature. The constructor now accepts the argument explicitly and leaves the video processor as None, which satisfies the parent class without altering the model's core OCR functionality.
The change is as follows:
class DotsVLProcessor(Qwen2_5_VLProcessor):
    attributes = ["image_processor", "tokenizer"]

    def __init__(self, image_processor=None, tokenizer=None, video_processor=None, chat_template=None, **kwargs):
        super().__init__(image_processor, tokenizer, chat_template=chat_template)
        self.image_token = "<|imgpad|>" if not hasattr(tokenizer, "image_token") else tokenizer.image_token
        self.image_token_id = 151665 if not hasattr(tokenizer, "image_token_id") else tokenizer.image_token_id
This ensures that the model remains compatible with recent library updates and can be loaded without error.
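For reviewers who want to see the failure mode in isolation, here is a minimal sketch that does not depend on transformers. ToyProcessorBase, BrokenProcessor, and FixedProcessor are hypothetical stand-ins written for this illustration; the validation logic is a simplified assumption about how a base class might check its declared components, not the library's actual implementation.

```python
class ToyProcessorBase:
    """Hypothetical stand-in for a processor base class (not the transformers API)."""

    # Assumption: the base class validates every name listed here.
    attributes = ["image_processor", "tokenizer", "video_processor"]

    def __init__(self, *args, **kwargs):
        # Pair positional arguments with the declared attribute names.
        for name, value in zip(self.attributes, args):
            kwargs[name] = value
        # Reject any declared component that was not supplied.
        for name in self.attributes:
            value = kwargs.get(name)
            if value is None:
                raise TypeError(
                    f"Received a NoneType for argument '{name}', "
                    "but a component was expected."
                )
            setattr(self, name, value)


class BrokenProcessor(ToyProcessorBase):
    # Inherits the three-entry attributes list but supplies only two components.
    def __init__(self, image_processor=None, tokenizer=None, **kwargs):
        super().__init__(image_processor, tokenizer)


class FixedProcessor(ToyProcessorBase):
    # Declare only the components this processor actually uses.
    attributes = ["image_processor", "tokenizer"]

    def __init__(self, image_processor=None, tokenizer=None,
                 video_processor=None, **kwargs):
        # video_processor is accepted so callers may pass it, but it is not forwarded.
        super().__init__(image_processor, tokenizer)


if __name__ == "__main__":
    try:
        BrokenProcessor("img", "tok")
    except TypeError as err:
        print("broken:", err)

    fixed = FixedProcessor("img", "tok", video_processor=None)
    print("fixed:", fixed.image_processor, fixed.tokenizer)
```

In this toy model, the subclass avoids the error by both narrowing the attributes list and accepting video_processor in its signature without passing it on, which mirrors the shape of the change above.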
The updated implementation can be seen running with transformers==4.57.1 here:
HF Space: https://huggingface.co/spaces/prithivMLmods/Multimodal-OCR3