Duplicate from babs/vlfm-v3-3B
9f6787b verified

VLFM Model: a custom audio+text model. The repository includes a tokenizer (vocabulary expanded with the <|audio|> special token), a Whisper feature extractor, and a processor.
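To make "expanded with <|audio|>" concrete, here is a minimal pure-Python sketch of what adding a special token to a vocabulary involves. It is a toy illustration, not the VLFM tokenizer: the vocabulary, the whitespace tokenization, and the helper names `expand_vocab` and `encode` are all hypothetical. Only the token string <|audio|> comes from the description above.

```python
import re

AUDIO_TOKEN = "<|audio|>"  # special token named in the model description


def expand_vocab(vocab, token):
    """Return a copy of vocab with token added at the next free id.

    Assumes ids are contiguous and start at 0 (true for this toy vocab).
    """
    expanded = dict(vocab)
    if token not in expanded:
        expanded[token] = len(expanded)
    return expanded


def encode(text, vocab, special=AUDIO_TOKEN, unk="<unk>"):
    """Whitespace-tokenize text, keeping the special token as one unit."""
    ids = []
    # Split on the special token; the capture group keeps it in the output.
    for piece in re.split(f"({re.escape(special)})", text):
        if piece == special:
            ids.append(vocab[special])
        else:
            ids.extend(vocab.get(word, vocab[unk]) for word in piece.split())
    return ids


# Hypothetical toy vocabulary; the real tokenizer's vocabulary is far larger.
base_vocab = {"<unk>": 0, "transcribe": 1, "this": 2, "clip": 3}
vocab = expand_vocab(base_vocab, AUDIO_TOKEN)
ids = encode("transcribe this clip <|audio|>", vocab)
print(ids)  # [1, 2, 3, 4]
```

The point of the sketch is that the placeholder gets a single dedicated id instead of being split into sub-pieces. In an audio+text model, that id presumably marks where audio features from the Whisper feature extractor enter the sequence, though the description above does not state how VLFM does this.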