Lamapi committed
Commit f32ef21 · 1 Parent(s): f369b0b

Update README.md

Files changed (1): README.md (+79, −60)

README.md (content after this commit):
---

## 🚀 Installation & Usage

### Use with vision:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
from PIL import Image
import torch

model_id = "Lamapi/next-4b"

model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)  # For vision.
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Read image
image = Image.open("image.jpg")

# Create a message in chat format
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."}]},
    {"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Who is in this image?"}
    ]}
]

# Prepare inputs with the processor (it tokenizes the text and preprocesses the image)
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
)

# Output from the model
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

<div style='width:700px;'>
<img src='/Lamapi/next-4b/resolve/main/assets/image.jpg' style='height:192px;border-radius:16px;margin-left:225px;'>
<div style='background-color:rgba(0,140,255,0.5);border-radius:16px;border-bottom-right-radius:0px;padding:3px 10px;width:fit-content;max-width:400px;margin-left:250px;margin-top:-25px;margin-bottom:10px;'>
Who is in this image?
</div>
<div style='background-color:rgba(42,42,40,0.7);border-radius:16px;border-bottom-left-radius:0px;padding:3px 10px;width:fit-content;max-width:400px;'>
The image shows <strong>Mustafa Kemal Atatürk</strong>, the founder and first President of the Republic of Turkey.
</div>
</div>

### Use without vision:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "Lamapi/next-4b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Chat message
messages = [
    {"role": "system", "content": "You are Next-X1, a smart and concise AI assistant trained by Lamapi. Always respond in the user's language. Proudly made in Turkey."},
    {"role": "user", "content": "Hello, how are you?"}
]

# Prepare input with Tokenizer
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Output from the model
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

<div style='width:700px;'>
<div style='background-color:rgba(0,140,255,0.5);border-radius:16px;border-bottom-right-radius:0px;padding:3px 10px;width:fit-content;max-width:400px;margin-left:250px;margin-top:-15px;margin-bottom:10px;'>
Hello, how are you?
</div>
<div style='background-color:rgba(42,42,40,0.7);border-radius:16px;border-bottom-left-radius:0px;padding:3px 10px;width:fit-content;max-width:400px;'>
I'm fine, thank you. How are you?
</div>
</div>

---

## 🎯 Goals

1. **Multimodal Intelligence:** Understand and reason over images and text.
2. **Efficiency:** Run on modest GPUs using 8-bit quantization.
3. **Accessibility:** Open-source availability for research and applications.
4. **Cultural Relevance:** Optimized for Turkish language and context while remaining multilingual.

---

## ✨ Key Features

| Feature | Description |
| --------------------------------- | ----------------------------------------------------------------------- |
| 🔋 Efficient Architecture | Optimized for low VRAM; supports 8-bit quantization for consumer GPUs. |
| 🖼️ Vision-Language Capable | Understands images, captions them, and performs visual reasoning tasks. |
| 🇹🇷 Multilingual & Turkish-Ready | Handles complex Turkish text with high accuracy. |
| 🧠 Advanced Reasoning | Supports logical and analytical reasoning for both text and images. |
| 📊 Consistent & Reliable Outputs | Reproducible responses across multiple runs. |
| 🌍 Open Source | Transparent, community-driven, and research-friendly. |

---
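The feature table above mentions 8-bit quantization for consumer GPUs. A minimal sketch of loading the model that way via `BitsAndBytesConfig` (an assumption on our part that the standard `bitsandbytes` path is used; it requires the `bitsandbytes` package and a CUDA GPU):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization: roughly 1 byte per parameter, so on the order of
# 4-5 GB of VRAM for 4B parameters plus activation overhead.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "Lamapi/next-4b",
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available GPU(s) automatically
)
```

The loaded model can then be used exactly as in the examples above; only the `from_pretrained` call changes.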

## 📐 Model Specifications

| Specification | Details |
| ------------------ | ---------------------------------------------------------------------------------- |
| Base Model | Gemma 3 |
| Parameter Count | 4 billion |
| Architecture | Transformer, causal LLM + vision encoder |
| Fine-Tuning Method | Instruction & multimodal fine-tuning (SFT) on Turkish and multilingual datasets |
| Optimizations | Q8_0, F16, and F32 quantizations, covering low- and high-VRAM setups |
| Modalities | Text & Image |
| Use Cases | Image captioning, multimodal QA, text generation, reasoning, creative storytelling |

---
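The F16 entry in the table corresponds to loading the weights in half precision, which roughly halves memory versus F32 (2 bytes per parameter instead of 4). A sketch of a half-precision load, assuming a CUDA GPU is available:

```python
import torch
from transformers import AutoModelForCausalLM

# Half-precision (F16) load: 16-bit floats, so ~2 bytes/parameter,
# roughly 8 GB of weights for 4B parameters versus ~16 GB in F32.
model = AutoModelForCausalLM.from_pretrained(
    "Lamapi/next-4b",
    torch_dtype=torch.float16,
    device_map="auto",
)
```

Q8_0 trades a little further precision for memory; F32 keeps full precision for high-VRAM setups.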