Qwen3-VL-8B-Instruct (Abliterated)

This is an abliterated (uncensored) version of the Qwen3-VL-8B-Instruct multimodal vision-language model. The model has undergone abliteration to remove safety guardrails and content filtering, allowing unrestricted responses to all queries. This 8-billion parameter instruction-tuned model excels at visual question answering, image captioning, optical character recognition (OCR), and complex visual reasoning tasks.

⚠️ WARNING: This is an uncensored model variant with safety restrictions removed. Use responsibly and in compliance with applicable laws and ethical guidelines.

Model Description

Qwen3-VL-8B-Instruct (Abliterated) is a modified version of the Qwen3 Vision-Language model with content filtering removed. Key capabilities include:

  • Visual Understanding: Analyze images, charts, diagrams, screenshots, and documents
  • Multimodal Conversation: Engage in multi-turn dialogues about visual content
  • Optical Character Recognition: Extract and understand text from images
  • Visual Reasoning: Answer complex questions requiring visual analysis and logical reasoning
  • Document Understanding: Process scanned documents, forms, and structured layouts
  • Uncensored Responses: No content filtering or safety guardrails

  • Model Architecture: Vision Transformer encoder + Qwen3-8B language model decoder
  • Training: Instruction-tuned on diverse vision-language tasks, then abliterated
  • Context Length: Up to 32K tokens (text + visual tokens)
  • Languages: Multilingual support (English, Chinese, and more)
  • Modification: Safety layers removed through abliteration process

Repository Contents

qwen3-vl-8b-instruct/
├── qwen3-vl-8b-instruct-abliterated.safetensors  # Complete model weights (16.33 GB)
└── README.md                                      # This file

Total Repository Size: 16.33 GB (FP16 precision, single-file format)

File Details:

  • qwen3-vl-8b-instruct-abliterated.safetensors: Complete merged model in safetensors format
    • Size: 16.33 GB
    • Precision: FP16 (half precision)
    • Format: Single-file merged weights (not sharded)
    • Contains: Full vision encoder + language model + abliteration modifications

Hardware Requirements

Minimum Requirements

  • VRAM: 20 GB (FP16 inference)
  • RAM: 32 GB system memory
  • Disk Space: 20 GB free space
  • GPU: NVIDIA GPU with Compute Capability 7.0+ (V100, RTX 20/30/40 series, A100, etc.)

Recommended Requirements

  • VRAM: 24 GB+ (RTX 4090, A6000, A100 for longer sequences)
  • RAM: 64 GB system memory
  • Disk Space: 30 GB+ (for model caching and optimization)
  • GPU: NVIDIA RTX 4090, A100, or H100 for optimal performance

Optimization Options

  • INT8 Quantization: ~10 GB VRAM (with minor quality loss)
  • INT4 Quantization: ~6 GB VRAM (with moderate quality loss)
  • CPU Inference: Possible but very slow (not recommended)

Usage Examples

Installation

pip install transformers torch torchvision pillow accelerate

Basic Image Understanding

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

# Load abliterated model from local directory
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "E:\\huggingface\\qwen3-vl-8b-instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Load and process image
image = Image.open("example_image.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What objects do you see in this image?"}
        ]
    }
]

# Prepare inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt", padding=True).to("cuda")

# Generate response
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )

# Decode and print response
response = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print(response)

Note: Since this is an abliterated model stored as a single merged file, you'll need to use a compatible processor config. Use the original Qwen2-VL processor from Hugging Face for tokenization and image processing.

Multi-Turn Conversation

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "E:\\huggingface\\qwen3-vl-8b-instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Multi-turn conversation
image = Image.open("chart.png")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What type of chart is this?"}
        ]
    },
    {
        "role": "assistant",
        "content": [{"type": "text", "text": "This is a bar chart showing sales data."}]
    },
    {
        "role": "user",
        "content": [{"type": "text", "text": "What was the highest value?"}]
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

response = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print(response)

OCR and Document Understanding

from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from PIL import Image
import torch

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "E:\\huggingface\\qwen3-vl-8b-instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# OCR from document
document_image = Image.open("invoice.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract all text from this document and identify the invoice number and total amount."}
        ]
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[document_image], return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3)

response = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print(response)

Loading with Safetensors Library Directly

from safetensors.torch import load_file
import torch

# Load the abliterated model weights directly
weights = load_file("E:\\huggingface\\qwen3-vl-8b-instruct\\qwen3-vl-8b-instruct-abliterated.safetensors")

# Inspect model structure
print("Model layers:", list(weights.keys())[:10])  # First 10 keys
print(f"Total parameters: {sum(w.numel() for w in weights.values()):,}")

Model Specifications

Architecture Details

  • Model Type: Vision-Language Transformer (VLM) - Abliterated
  • Vision Encoder: Vision Transformer (ViT) with adaptive resolution
  • Language Model: Qwen3-8B decoder (safety layers removed)
  • Parameters: 8 billion (8B)
  • Precision: FP16 (half precision)
  • Format: SafeTensors (single merged file)
  • Framework: PyTorch / Transformers
  • Modification Type: Abliteration (safety guardrail removal)

Input Specifications

  • Image Resolution: Adaptive (up to 1024x1024 recommended)
  • Image Formats: JPEG, PNG, BMP, WebP
  • Text Context: Up to 32K tokens
  • Batch Size: Depends on VRAM (typically 1-8 images)
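
As a rough sketch of batched inference, multiple single-image samples can be padded into one batch; this assumes the Qwen2-VL processor shown in the usage examples above and hypothetical file names:

from transformers import AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

def build_messages(question):
    # One image placeholder per sample, followed by the question text
    return [{"role": "user",
             "content": [{"type": "image"}, {"type": "text", "text": question}]}]

# Hypothetical images; one chat-template string per image, padded into a single batch
images = [Image.open("photo1.jpg"), Image.open("photo2.jpg")]
texts = [processor.apply_chat_template(build_messages(q), tokenize=False, add_generation_prompt=True)
         for q in ("Describe this image.", "What text is visible here?")]

inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)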

Generation Parameters

  • Max New Tokens: 512-2048 (depending on task)
  • Temperature: 0.1-0.9 (lower for factual tasks, higher for creative)
  • Top-p: 0.8-0.95 (nucleus sampling)
  • Top-k: 20-50 (alternative sampling method)

Supported Tasks

  • Visual Question Answering (VQA) - Uncensored
  • Image Captioning
  • Optical Character Recognition (OCR)
  • Document Understanding
  • Chart and Diagram Analysis
  • Visual Reasoning
  • Multi-turn Visual Dialogue - Uncensored
  • Scene Understanding
  • Object Detection and Counting (descriptive)

Performance Tips and Optimization

Memory Optimization

Use FP16 precision (default):

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "E:\\huggingface\\qwen3-vl-8b-instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)

INT8 Quantization (reduces VRAM to ~10GB):

from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "E:\\huggingface\\qwen3-vl-8b-instruct",
    quantization_config=quantization_config,
    device_map="auto"
)

INT4 Quantization (reduces VRAM to ~6GB):

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "E:\\huggingface\\qwen3-vl-8b-instruct",
    quantization_config=quantization_config,
    device_map="auto"
)

Inference Optimization

Use Flash Attention 2 (faster attention):

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "E:\\huggingface\\qwen3-vl-8b-instruct",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto"
)

Enable torch.compile (PyTorch 2.0+):

model = torch.compile(model, mode="reduce-overhead")

Optimize image resolution:

  • Use lower resolution (512x512) for faster inference
  • Use higher resolution (1024x1024) for detailed OCR and document tasks
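
A small sketch of downscaling with Pillow before calling the processor; the Qwen2-VL image processor also exposes min_pixels/max_pixels options as an alternative way to cap resolution:

from PIL import Image

image = Image.open("example_image.jpg")

# Downscale in place while keeping aspect ratio:
# 512x512 for general VQA, 1024x1024 for OCR and dense documents
image.thumbnail((512, 512))

# Then pass `image` to the processor exactly as in the usage examples above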

Generation Strategy

For factual/OCR tasks (low temperature):

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.1,
    top_p=0.9,
    do_sample=True
)

For creative/descriptive tasks:

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

For structured output:

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.3,
    top_p=0.9,
    repetition_penalty=1.1
)

Abliteration Details

What is Abliteration?

Abliteration is a technique for removing safety guardrails from language models by identifying the internal mechanisms responsible for refusal behavior and neutralizing them without retraining. In the most common approach, the process (sketched in code below):

  1. Collects hidden-state activations on prompts the model refuses and prompts it answers, and estimates a "refusal direction" from their difference
  2. Ablates (projects out) that direction from the relevant weight matrices while preserving core capabilities
  3. Results in an "uncensored" model that responds to queries it would previously have refused
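
A minimal, self-contained sketch of this idea using random tensors in place of real activations; the exact procedure, layers, and hyperparameters used to produce this particular model are not documented here:

import torch

hidden_dim = 64

# Hypothetical hidden-state activations captured at one layer
acts_refused = torch.randn(32, hidden_dim) + 0.5   # prompts the model refused
acts_answered = torch.randn(32, hidden_dim)        # prompts it answered normally

# 1. Difference-of-means "refusal direction", normalized to unit length
refusal_dir = acts_refused.mean(dim=0) - acts_answered.mean(dim=0)
refusal_dir = refusal_dir / refusal_dir.norm()

# 2. Orthogonalize a weight matrix against that direction: W <- W - r r^T W
W = torch.randn(hidden_dim, hidden_dim)             # stand-in for an output projection
W_ablated = W - torch.outer(refusal_dir, refusal_dir) @ W

# 3. The ablated weights can no longer produce output along the refusal direction
print((refusal_dir @ W_ablated).abs().max())        # effectively zero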

Implications of Abliteration:

  • ✅ No content filtering or refusal responses
  • ✅ Unrestricted responses to sensitive queries
  • ⚠️ No built-in safety mechanisms
  • ⚠️ User responsible for ethical use and compliance
  • ⚠️ May generate harmful, illegal, or unethical content if prompted

Technical Changes:

  • Safety alignment layers removed or neutralized
  • Refusal mechanisms disabled
  • Content filtering bypassed
  • Core reasoning and generation capabilities preserved

License

This model is based on Qwen3-VL-8B-Instruct, which is released under the Apache License 2.0.

Important Legal Notice:

  • The abliteration process modifies the original model
  • Use of this model must comply with the Apache 2.0 license terms
  • Users are solely responsible for ethical use and legal compliance
  • This model should not be used for illegal, harmful, or unethical purposes
  • The original developers are not responsible for misuse of this modified version

You are free to:

  • Use the model commercially (with responsibility)
  • Modify and distribute the model
  • Use for research and production applications

Requirements:

  • Provide attribution to Alibaba Cloud and the Qwen team
  • Include the Apache 2.0 license text with distributions
  • State that this is a modified (abliterated) version
  • Take full responsibility for outputs and usage

See the Apache License 2.0 for full terms.

Citation

If you use Qwen3-VL-8B-Instruct (Abliterated) in your research or applications, please cite:

@article{qwen3vl2024,
  title={Qwen3-VL: Scaling Vision-Language Models with Enhanced Instruction Following},
  author={Qwen Team},
  journal={arXiv preprint},
  year={2024},
  publisher={Alibaba Cloud}
}

Note: This is an abliterated community modification, not an official Qwen model release.

Model Card Contact

  • Original Model: Qwen Team, Alibaba Cloud
  • Model Type: Vision-Language Model (Instruction-tuned, Abliterated)
  • Modification: Community abliteration (uncensored variant)
  • Language(s): Multilingual (English, Chinese, and more)
  • License: Apache 2.0 (modified version)

Limitations and Considerations

Known Limitations:

  • May generate incorrect or hallucinated information about images
  • Performance varies with image quality and resolution
  • May struggle with very small text or complex layouts
  • Limited understanding of highly specialized domain images
  • NO SAFETY FILTERS: Will respond to any query without ethical filtering

Ethical Considerations:

  • ⚠️ NO CONTENT FILTERING: This model has no built-in safety mechanisms
  • ⚠️ USER RESPONSIBILITY: You are fully responsible for ethical use
  • ⚠️ POTENTIAL FOR HARM: May generate harmful content if prompted
  • ⚠️ LEGAL COMPLIANCE: Ensure use complies with applicable laws
  • ⚠️ BIAS AMPLIFICATION: Uncensored models may amplify training data biases
  • Validate outputs for critical applications
  • Consider privacy implications when processing personal images
  • Use responsibly and ethically

Recommended Use Cases:

  • Research on AI safety and alignment (studying uncensored model behavior)
  • Unrestricted creative content generation
  • Analysis of censorship mechanisms in AI models
  • Educational purposes (understanding model limitations)
  • Applications where content filtering interferes with legitimate use

Not Recommended For:

  • Public-facing applications without additional safety layers
  • Use by minors or vulnerable populations
  • Automated systems without human oversight
  • Medical, legal, or safety-critical applications
  • Any illegal, harmful, or unethical purposes
  • Production systems without additional filtering mechanisms

Required Safeguards:

  • Implement application-level content filtering if needed (a minimal sketch follows this list)
  • Monitor outputs for harmful content
  • Provide user warnings about uncensored nature
  • Establish clear usage policies and guidelines
  • Maintain human oversight for sensitive applications
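
A minimal sketch of an application-level output filter; the blocklist and the wrap point are placeholders, and any real deployment would need a far more robust moderation step:

# Hypothetical post-generation filter; BLOCKED_TERMS is a placeholder blocklist
BLOCKED_TERMS = {"example_blocked_phrase"}

def filter_output(text: str) -> str:
    lowered = text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "[response withheld by application-level filter]"
    return text

# Wrap the decode step from the usage examples above, e.g.:
# response = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
# print(filter_output(response))
print(filter_output("A sample model response."))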

Technical Notes

Single-File Format

This model is distributed as a single merged safetensors file rather than sharded weights:

Advantages:

  • Simpler file management (one file vs. multiple shards)
  • Easier to move and backup
  • Consistent loading process

Considerations:

  • Requires sufficient disk I/O bandwidth during loading
  • May take longer to initially load compared to parallel shard loading
  • Requires ~16GB contiguous disk space
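
To inspect the file without reading all 16 GB into memory at once, the safetensors lazy-loading API can be used; a minimal sketch using the local path from the examples above:

from safetensors import safe_open

path = "E:\\huggingface\\qwen3-vl-8b-instruct\\qwen3-vl-8b-instruct-abliterated.safetensors"

# safe_open reads only the header; tensors are loaded on demand
with safe_open(path, framework="pt", device="cpu") as f:
    keys = list(f.keys())
    print("Tensor count:", len(keys))
    first = f.get_tensor(keys[0])        # loads just this one tensor
    print(keys[0], tuple(first.shape), first.dtype)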

Processor Configuration

Since this is a community-modified version, you'll need to use a compatible processor:

# Use the original Qwen2-VL processor for compatibility
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Or create a custom processor config if needed
from transformers import Qwen2VLProcessor, Qwen2VLImageProcessor, Qwen2Tokenizer

image_processor = Qwen2VLImageProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
tokenizer = Qwen2Tokenizer.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
processor = Qwen2VLProcessor(image_processor=image_processor, tokenizer=tokenizer)

Compatibility Notes

  • Compatible with transformers library version 4.37.0+
  • Requires PyTorch 2.0+ for optimal performance
  • Flash Attention 2 requires separate installation: pip install flash-attn
  • BitsAndBytes quantization requires: pip install bitsandbytes
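
A quick environment sanity check against the version requirements listed above (the thresholds are taken from those notes):

import torch
import transformers

print("transformers:", transformers.__version__)   # expect >= 4.37.0
print("torch:", torch.__version__)                  # expect >= 2.0
print("CUDA available:", torch.cuda.is_available())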

Changelog

v1.1 (Current)

  • Updated README with accurate file information
  • Added abliteration details and safety warnings
  • Documented single-file merged format
  • Added processor configuration guidance
  • Enhanced ethical considerations section

v1.0 (Initial)

  • Initial abliterated model release
  • 16.33 GB single-file safetensors format
  • Based on Qwen3-VL-8B-Instruct with safety layers removed

⚠️ FINAL WARNING: This is an uncensored AI model with all safety filters removed. Use responsibly, ethically, and in compliance with all applicable laws. You are solely responsible for how you use this model and any content it generates.
