Lamapi/next-4b

@@ -1,5 +1,57 @@
 ---
-language: tr
 license: mit
 tags:
 - turkish
@@ -23,22 +75,36 @@ tags:
 - machine-learning
 - ai-research
 - natural-language-processing
 - nlp
 - finetuned
 - lightweight
 - creative
 - summarization
 - question-answering
-- chat-model
 - generative-ai
-- optimized-model
 - unsloth
 - trl
 - sft
 pipeline_tag: text-generation
-metrics:
-- bleu
-- accuracy
 ---
 # 🚀 Next 4B
@@ -51,13 +117,26 @@ metrics:
 ---
 ## 📖 Overview
 **Next 4B** is a **4-billion parameter multimodal Vision-Language Model (VLM)** based on **Gemma 3**, fine-tuned to handle **both text and images** efficiently. It is **Türkiye’s first open-source vision-language model**, designed for:
 * Understanding and generating **text and image descriptions**.
 * Efficient reasoning and context-aware multimodal outputs.
-* Native Turkish support with multilingual capabilities.
 * Low-resource deployment using **8-bit quantization** for consumer-grade GPUs.
 This model is ideal for **researchers, developers, and organizations** who need a **high-performance multimodal AI** capable of **visual understanding, reasoning, and creative generation**.
@@ -102,35 +181,56 @@ This model is ideal for **researchers, developers, and organizations** who need
 ## 🚀 Installation & Usage
-### Python Example
 ```python
-from unsloth import FastModel
-from transformers import TextStreamer
 from PIL import Image
-model_path = "Lamapi/next-x1-v-7b"
-# Load 4-bit model for low VRAM
-model, tokenizer = FastModel.from_pretrained(model_path, load_in_4bit=True)
-# Example multimodal prompt
 messages = [
-    {"role": "system", "content": "You are a creative, reasoning-focused vision-language assistant."},
-    {"role": "user", "content": "Describe the content of this image and its possible context."},
 ]
-image = Image.open("example.jpg")  # Your input image
-# Prepare prompt
-prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
-streamer = TextStreamer(tokenizer, skip_prompt=True)
-inputs = tokenizer(prompt, images=[image], return_tensors="pt").to(model.device)
-# Generate multimodal output
-_ = model.generate(**inputs, streamer=streamer, max_new_tokens=300, temperature=0.7, top_p=0.9)
 ```
 ---
@@ -144,23 +244,6 @@ _ = model.generate(**inputs, streamer=streamer, max_new_tokens=300, temperature=
 | ✍️ Creative          | "Write a story based on the image content."                  |
 | 🎓 Cultural          | "Describe historical or cultural elements in the image."     |
----
-## 📊 Performance & Benchmarks
-Next-X1-V 7B has been evaluated for **text and image understanding**, reasoning, and generation:
-* **Perplexity (Turkish text):** ~12–15
-* **Tokens/sec on 4-bit consumer GPUs:** 500–1200
-* **Image captioning accuracy:** High fidelity for complex scenes
-* **Multimodal reasoning:** Consistent and coherent across images and text
-> Indicates competitive performance for a **7B multimodal model**, deployable on standard GPUs with low latency.
 ---
 ## 📄 License

 ---
+language:
+- tr
+- en
+- de
+- ka
+- el
+- ku
+- es
+- sl
+- sk
+- af
+- da
+- nl
+- fa
+- fi
+- fr
+- ga
+- hi
+- hu
+- hy
+- ja
+- kg
+- kk
+- ko
+- ky
+- la
+- lb
+- id
+- it
+- is
+- za
+- zh
+- zu
+- cs
+- vi
+- be
+- bg
+- bs
+- ne
+- mn
+- rm
+- ro
+- ru
+- te
+- th
+- tk
+- tt
+- uk
+- uz
+- ug
+- pl
+- pt
+- 'no'
 license: mit
 tags:
 - turkish
 - machine-learning
 - ai-research
 - natural-language-processing
+- language
+- multilingual
+- multimodal
 - nlp
 - finetuned
 - lightweight
 - creative
 - summarization
 - question-answering
+- chat
 - generative-ai
+- optimized
 - unsloth
 - trl
 - sft
+- chemistry
+- code
+- biology
+- finance
+- legal
+- music
+- art
+- climate
+- medical
+- agent
+- text-generation-inference
+- merge
+- dense
 pipeline_tag: text-generation
+library_name: transformers
 ---
 # 🚀 Next 4B
 ---
+## 📊 Performance & Benchmarks
+Next 4B has been evaluated on **text and image understanding**, reasoning, and generation:
+* **Perplexity (Turkish text):** ~12–15
+* **Tokens/sec on 4-bit consumer GPUs:** 500–1200
+* **Image captioning accuracy:** High fidelity for complex scenes
+* **Multimodal reasoning:** Consistent and coherent across images and text
+> Indicates competitive performance for a **4B multimodal model**, deployable on standard GPUs with **very low latency**.
+---
 ## 📖 Overview
 **Next 4B** is a **4-billion parameter multimodal Vision-Language Model (VLM)** based on **Gemma 3**, fine-tuned to handle **both text and images** efficiently. It is **Türkiye’s first open-source vision-language model**, designed for:
 * Understanding and generating **text and image descriptions**.
 * Efficient reasoning and context-aware multimodal outputs.
+* Turkish support with multilingual capabilities.
 * Low-resource deployment using **8-bit quantization** for consumer-grade GPUs.
 This model is ideal for **researchers, developers, and organizations** who need a **high-performance multimodal AI** capable of **visual understanding, reasoning, and creative generation**.
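To make the low-resource point concrete, here is a back-of-the-envelope sketch of weight memory at different precisions. It assumes exactly 4 billion parameters and counts weights only; activations, the KV cache, and framework overhead are not included, so real usage will be higher. The helper name `weight_memory_gib` is hypothetical.

```python
# Rough weight-memory estimate for a ~4B-parameter model (weights only).
# Assumption: exactly 4e9 parameters; a real checkpoint will differ somewhat.

PARAMS = 4_000_000_000
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: int, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


estimates = {
    name: weight_memory_gib(PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, gib in estimates.items():
    print(f"{name}: ~{gib:.2f} GiB")
```

Under these assumptions, 8-bit weights need roughly half the memory of 16-bit weights, which is what makes consumer-grade GPUs a plausible target.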
 ## 🚀 Installation & Usage
+### Load the model (with vision)
 ```python
+from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
 from PIL import Image
+import torch
+model_id = "Lamapi/next-4b"
+model = AutoModelForCausalLM.from_pretrained(model_id)
+processor = AutoProcessor.from_pretrained(model_id) # For vision.
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+```
+### Run vision inference
+```python
+# Read image
+image = Image.open("image.jpg")
+# Create a message in chat format
 messages = [
+    {
+        "role": "user",
+        "content": [
+            {"type": "image", "image": image},
+            {"type": "text", "text": "Who is in this image?"}
+        ]
+    }
 ]
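+# Render the chat template to a prompt string, then let the processor combine text and image
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
+# Generate a short answer
+output = model.generate(**inputs, max_new_tokens=50)
+# Decode only the newly generated tokens, not the echoed prompt
+new_tokens = output[0][inputs["input_ids"].shape[-1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True))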
 ```
+<div style='background-color:#222220;box-shadow:0px 0px 40px #222220;border-radius:16px;width:700px;height:100px; '>
+  <div style='background-color:rgba(15,15,15,0.7);top:10px;right:3px;border-radius:16px;border-bottom-right-radius:0px;padding:1px 10px;width:fit-content;max-width:400px;position:absolute;'>
+    Who is in this image?
+  </div>
+  <div style='background-color:rgba(0,140,255,0.5);top:28px;right:300px;border-radius:16px;border-bottom-left-radius:0px;padding:1px 10px;width:fit-content;max-width:400px;position:absolute;'>
+  The image shows <strong>Mustafa Kemal Atatürk</strong>, the founder and first President of the Republic of Turkey.
+  </div>
+</div>
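The chat-format message used above can be wrapped in a small helper so the same structure is reused across prompts. This is a minimal sketch: `build_vision_messages` is a hypothetical name, not part of `transformers`, and the `image` argument is passed through untouched (a `PIL.Image` in the example above, a placeholder string here so the snippet runs without extra dependencies).

```python
from typing import Any


def build_vision_messages(image: Any, question: str) -> list[dict[str, Any]]:
    """Build a single-turn user message holding one image and one text prompt."""
    if not question.strip():
        raise ValueError("question must not be empty")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": question},
            ],
        }
    ]


# A placeholder stands in for the PIL image so this runs on its own.
messages = build_vision_messages(
    "image.jpg", "Describe historical or cultural elements in the image."
)
print(messages[0]["content"][1]["text"])
```

The returned list has the same shape as the `messages` literal in the vision example, so it could be passed to `apply_chat_template` in its place.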
 ---
 | ✍️ Creative          | "Write a story based on the image content."                  |
 | 🎓 Cultural          | "Describe historical or cultural elements in the image."     |
 ---
 ## 📄 License