@@ -143,45 +143,9 @@ This model is ideal for **researchers, developers, and organizations** who need
 ---
-## 🎯 Goals
-1. **Multimodal Intelligence:** Understand and reason over images and text.
-2. **Efficiency:** Run on modest GPUs using 8-bit quantization.
-3. **Accessibility:** Open-source availability for research and applications.
-4. **Cultural Relevance:** Optimized for Turkish language and context while remaining multilingual.
----
-## ✨ Key Features
-| Feature                           | Description                                                             |
-| --------------------------------- | ----------------------------------------------------------------------- |
-| 🔋 Efficient Architecture         | Optimized for low VRAM; supports 8-bit quantization for consumer GPUs.  |
-| 🖼️ Vision-Language Capable       | Understands images, captions them, and performs visual reasoning tasks. |
-| 🇹🇷 Multilingual & Turkish-Ready | Handles complex Turkish text with high accuracy.                        |
-| 🧠 Advanced Reasoning             | Supports logical and analytical reasoning for both text and images.     |
-| 📊 Consistent & Reliable Outputs  | Reproducible responses across multiple runs.                            |
-| 🌍 Open Source                    | Transparent, community-driven, and research-friendly.                   |
----
-## 📐 Model Specifications
-| Specification      | Details                                                                            |
-| ------------------ | ---------------------------------------------------------------------------------- |
-| Base Model         | Gemma 3                                                                            |
-| Parameter Count    | 4 Billion                                                                          |
-| Architecture       | Transformer, causal LLM + Vision Encoder                                           |
-| Fine-Tuning Method | Instruction & multimodal fine-tuning (SFT) on Turkish and multilingual datasets    |
-| Optimizations      | Q8_0 quantization plus F16 and F32 weights, covering low-VRAM to high-VRAM setups  |
-| Modalities         | Text & Image                                                                       |
-| Use Cases          | Image captioning, multimodal QA, text generation, reasoning, creative storytelling |
----
 ## 🚀 Installation & Usage
-### Load the model (with vision).
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
@@ -189,26 +153,23 @@ from PIL import Image
 import torch
 model_id = "Lamapi/next-4b"
 model = AutoModelForCausalLM.from_pretrained(model_id)
 processor = AutoProcessor.from_pretrained(model_id) # For vision.
 tokenizer = AutoTokenizer.from_pretrained(model_id)
-```
-### Using the vision.
-```python
 # Read image
 image = Image.open("image.jpg")
 # Create a message in chat format
 messages = [
-    {
-        "role": "user",
-        "content": [
-            {"type": "image", "image": image},
-            {"type": "text", "text": "Who is in this image?"}
-        ]
-    }
 ]
 # Prepare input with Tokenizer
@@ -221,28 +182,86 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```
-<div style='background-color:#222220;box-shadow:0px 0px 40px #222220;border-radius:16px;width:700px;height:100px; '>
-  <div style='background-color:rgba(15,15,15,0.7);top:10px;right:3px;border-radius:16px;border-bottom-right-radius:0px;padding:1px 10px;width:fit-content;max-width:400px;position:absolute;'>
-  <img src=''>
     Who is in this image?
   </div>
-  <div style='background-color:rgba(0,140,255,0.5);top:28px;right:300px;border-radius:16px;border-bottom-left-radius:0px;padding:1px 10px;width:fit-content;max-width:400px;position:absolute;'>
   The image shows <strong>Mustafa Kemal Atatürk</strong>, the founder and first President of the Republic of Turkey.
   </div>
 </div>
 ---
-### 💡 Usage Examples
-| Category             | Example Prompt                                               |
-| -------------------- | ------------------------------------------------------------ |
-| 🖼️ Image Captioning | "Generate a detailed caption for this image in Turkish."     |
-| 🗣️ Conversation     | "Explain the relationship between the objects in the image." |
-| 📊 Analytical        | "Analyze this chart and summarize key points."               |
-| ✍️ Creative          | "Write a story based on the image content."                  |
-| 🎓 Cultural          | "Describe historical or cultural elements in the image."     |
 ---

 ---
 ## 🚀 Installation & Usage
+### Use with vision:
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
 from PIL import Image
 import torch
 model_id = "Lamapi/next-4b"
 model = AutoModelForCausalLM.from_pretrained(model_id)
 processor = AutoProcessor.from_pretrained(model_id) # For vision.
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 # Read image
 image = Image.open("image.jpg")
 # Create a message in chat format
 messages = [
+  {
+      "role": "system",
+      "content": [{"type": "text", "text": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."}],
+  },
+  {
+      "role": "user",
+      "content": [
+          {"type": "image", "image": image},
+          {"type": "text", "text": "Who is in this image?"},
+      ],
+  },
 ]
 # Prepare input with Tokenizer
 ```
+<div style='width:700px;'>
+  <img src='/Lamapi/next-4b/resolve/main/assets/image.jpg' style='height:192px;border-radius:16px;margin-left:225px;'>
+  <div style='background-color:rgba(0,140,255,0.5);border-radius:16px;border-bottom-right-radius:0px;padding:3px 10px;width:fit-content;max-width:400px;margin-left:250px;margin-top:-25px;margin-bottom:10px;'>
     Who is in this image?
   </div>
+  <div style='background-color:rgba(42,42,40,0.7);border-radius:16px;border-bottom-left-radius:0px;padding:3px 10px;width:fit-content;max-width:400px;'>
   The image shows <strong>Mustafa Kemal Atatürk</strong>, the founder and first President of the Republic of Turkey.
   </div>
 </div>
+### Use without vision:
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+model_id = "Lamapi/next-4b"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id)
+# Chat message
+messages = [
+    {"role": "system", "content": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."},
+    {"role": "user", "content": "Hello, how are you?"}
+]
+# Prepare input with Tokenizer
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
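+# The chat template typically already inserts the BOS token, so do not add special tokens a second time
+inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)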
+# Output from the model
+output = model.generate(**inputs, max_new_tokens=50)
+# Decode only the newly generated tokens, not the echoed prompt
+print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
+```
+<div style='width:700px;'>
+  <div style='background-color:rgba(0,140,255,0.5);border-radius:16px;border-bottom-right-radius:0px;padding:3px 10px;width:fit-content;max-width:400px;margin-left:250px;margin-top:-15px;margin-bottom:10px;'>
+    Hello, how are you?
+  </div>
+  <div style='background-color:rgba(42,42,40,0.7);border-radius:16px;border-bottom-left-radius:0px;padding:3px 10px;width:fit-content;max-width:400px;'>
+  I'm fine, thank you. How are you?
+  </div>
+</div>
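The two snippets above write `content` in two shapes: a list of typed parts in the vision example and a plain string in the text-only example. If one code path should feed both, a small normalizer can help. This is a minimal sketch; `to_parts` and `normalize_messages` are hypothetical helpers, not part of `transformers`, and the system prompt here is a placeholder.

```python
def to_parts(content):
    """Return content as a list of typed parts; wrap a plain string as one text part."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def normalize_messages(messages):
    """Rewrite every message so that its content is a list of typed parts."""
    return [{"role": m["role"], "content": to_parts(m["content"])} for m in messages]


messages = [
    {"role": "system", "content": "You are a concise assistant."},  # placeholder prompt
    {"role": "user", "content": "Hello, how are you?"},
]
normalized = normalize_messages(messages)
print(normalized[1]["content"])
```

Messages that already use the list form pass through unchanged, so image parts from the vision example are preserved.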
+---
+## 🎯 Goals
+1. **Multimodal Intelligence:** Understand and reason over images and text.
+2. **Efficiency:** Run on modest GPUs using 8-bit quantization.
+3. **Accessibility:** Open-source availability for research and applications.
+4. **Cultural Relevance:** Optimized for Turkish language and context while remaining multilingual.
+---
+## ✨ Key Features
+| Feature                           | Description                                                             |
+| --------------------------------- | ----------------------------------------------------------------------- |
+| 🔋 Efficient Architecture         | Optimized for low VRAM; supports 8-bit quantization for consumer GPUs.  |
+| 🖼️ Vision-Language Capable       | Understands images, captions them, and performs visual reasoning tasks. |
+| 🇹🇷 Multilingual & Turkish-Ready | Handles complex Turkish text with high accuracy.                        |
+| 🧠 Advanced Reasoning             | Supports logical and analytical reasoning for both text and images.     |
+| 📊 Consistent & Reliable Outputs  | Reproducible responses across multiple runs.                            |
+| 🌍 Open Source                    | Transparent, community-driven, and research-friendly.                   |
 ---
+## 📐 Model Specifications
+| Specification      | Details                                                                            |
+| ------------------ | ---------------------------------------------------------------------------------- |
+| Base Model         | Gemma 3                                                                            |
+| Parameter Count    | 4 Billion                                                                          |
+| Architecture       | Transformer, causal LLM + Vision Encoder                                           |
+| Fine-Tuning Method | Instruction & multimodal fine-tuning (SFT) on Turkish and multilingual datasets    |
+| Optimizations      | Q8_0 quantization plus F16 and F32 weights, covering low-VRAM to high-VRAM setups  |
+| Modalities         | Text & Image                                                                       |
+| Use Cases          | Image captioning, multimodal QA, text generation, reasoning, creative storytelling |
 ---
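To make the "low VRAM and high VRAM" distinction concrete, here is a rough back-of-the-envelope estimate of the weight size for each format in the table. It is a sketch under stated assumptions: the parameter count is taken as exactly 4 billion, Q8_0 is assumed to use the GGUF layout (32 int8 weights plus one 2-byte scale per block), and activations, KV cache, and the vision encoder's runtime overhead are ignored.

```python
# Bytes per weight for each format named in the specification table.
# Assumption: Q8_0 follows the GGUF block layout, 34 bytes per 32 weights.
BYTES_PER_WEIGHT = {
    "Q8_0": 34 / 32,
    "F16": 2.0,
    "F32": 4.0,
}


def weight_size_gb(n_params, fmt):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * BYTES_PER_WEIGHT[fmt] / 1e9


N_PARAMS = 4_000_000_000  # "4 Billion" from the table; the exact count is an assumption

for fmt in BYTES_PER_WEIGHT:
    print(f"{fmt}: ~{weight_size_gb(N_PARAMS, fmt):.2f} GB")
```

Under these assumptions the weights alone come to roughly 4.25 GB (Q8_0), 8 GB (F16), and 16 GB (F32); actual memory use during inference will be higher.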