Lamapi committed
Commit 41b6a98 · 1 Parent(s): 337b29c

Update README.md

Files changed (1): README.md (+123 −40)

README.md CHANGED
@@ -1,5 +1,57 @@
 ---
- language: tr
 license: mit
 tags:
 - turkish
@@ -23,22 +75,36 @@ tags:
 - machine-learning
 - ai-research
 - natural-language-processing
 - nlp
 - finetuned
 - lightweight
 - creative
 - summarization
 - question-answering
- - chat-model
 - generative-ai
- - optimized-model
 - unsloth
 - trl
 - sft
 pipeline_tag: text-generation
- metrics:
- - bleu
- - accuracy
 ---
 
 # 🚀 Next 4B
@@ -51,13 +117,26 @@ metrics:
 
 ---
 
 ## 📖 Overview
 
 **Next 4B** is a **4-billion parameter multimodal Vision-Language Model (VLM)** based on **Gemma 3**, fine-tuned to handle **both text and images** efficiently. It is **Türkiye’s first open-source vision-language model**, designed for:
 
 * Understanding and generating **text and image descriptions**.
 * Efficient reasoning and context-aware multimodal outputs.
- * Native Turkish support with multilingual capabilities.
 * Low-resource deployment using **8-bit quantization** for consumer-grade GPUs.
 
 This model is ideal for **researchers, developers, and organizations** who need a **high-performance multimodal AI** capable of **visual understanding, reasoning, and creative generation**.
@@ -102,35 +181,56 @@ This model is ideal for **researchers, developers, and organizations** who need
 
 ## 🚀 Installation & Usage
 
- ### Python Example
 
 ```python
- from unsloth import FastModel
- from transformers import TextStreamer
 from PIL import Image
 
- model_path = "Lamapi/next-x1-v-7b"
 
- # Load 4-bit model for low VRAM
- model, tokenizer = FastModel.from_pretrained(model_path, load_in_4bit=True)
 
- # Example multimodal prompt
 messages = [
- {"role": "system", "content": "You are a creative, reasoning-focused vision-language assistant."},
- {"role": "user", "content": "Describe the content of this image and its possible context."},
 ]
 
- image = Image.open("example.jpg")  # Your input image
 
- # Prepare prompt
- prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
- streamer = TextStreamer(tokenizer, skip_prompt=True)
 
- inputs = tokenizer(prompt, images=[image], return_tensors="pt").to(model.device)
 
- # Generate multimodal output
- _ = model.generate(**inputs, streamer=streamer, max_new_tokens=300, temperature=0.7, top_p=0.9)
 ```
 
 ---
 
@@ -144,23 +244,6 @@ _ = model.generate(**inputs, streamer=streamer, max_new_tokens=300, temperature=
 | ✍️ Creative | "Write a story based on the image content." |
 | 🎓 Cultural | "Describe historical or cultural elements in the image." |
 
- ---
-
- ## 📊 Performance & Benchmarks
-
- Next-X1-V 7B has been evaluated for **text and image understanding**, reasoning, and generation:
-
- * **Perplexity (Turkish text):** ~12–15
- * **Tokens/sec on 4-bit consumer GPUs:** 500–1200
- * **Image captioning accuracy:** High fidelity for complex scenes
- * **Multimodal reasoning:** Consistent and coherent across images and text
-
- > Indicates competitive performance for a **7B multimodal model**, deployable on standard GPUs with low latency.
-
-
-
-
-
 ---
 
 ## 📄 License
 
 ---
+ language:
+ - tr
+ - en
+ - de
+ - ka
+ - el
+ - ku
+ - es
+ - sl
+ - sk
+ - af
+ - da
+ - nl
+ - fa
+ - fi
+ - fr
+ - ga
+ - hi
+ - hu
+ - hy
+ - ja
+ - kg
+ - kk
+ - ko
+ - ky
+ - la
+ - lb
+ - id
+ - it
+ - is
+ - za
+ - zh
+ - zu
+ - cs
+ - vi
+ - be
+ - bg
+ - bs
+ - ne
+ - mn
+ - rm
+ - ro
+ - ru
+ - te
+ - th
+ - tk
+ - tt
+ - uk
+ - uz
+ - ug
+ - pl
+ - pt
+ - 'no'
 license: mit
 tags:
 - turkish
 
 - machine-learning
 - ai-research
 - natural-language-processing
+ - language
+ - multilingual
+ - multimodal
 - nlp
 - finetuned
 - lightweight
 - creative
 - summarization
 - question-answering
+ - chat
 - generative-ai
+ - optimized
 - unsloth
 - trl
 - sft
+ - chemistry
+ - code
+ - biology
+ - finance
+ - legal
+ - music
+ - art
+ - climate
+ - medical
+ - agent
+ - text-generation-inference
+ - merge
+ - dense
 pipeline_tag: text-generation
+ library_name: transformers
 ---
 
 # 🚀 Next 4B
 
 
 ---
 
+ ## 📊 Performance & Benchmarks
+
+ Next 4B has been evaluated for **text and image understanding**, reasoning, and generation:
+
+ * **Perplexity (Turkish text):** ~12–15
+ * **Tokens/sec on 4-bit consumer GPUs:** 500–1200
+ * **Image captioning accuracy:** High fidelity for complex scenes
+ * **Multimodal reasoning:** Consistent and coherent across images and text
+
+ > Indicates competitive performance for a **4B multimodal model**, deployable on standard GPUs with **very low latency**.
+
+ ---
+
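For readers unfamiliar with the metric, perplexity is the exponential of the mean per-token cross-entropy loss, so the reported ~12–15 corresponds to roughly 2.5–2.7 nats per token. A minimal sanity-check sketch (an illustration, not from the model card):

```python
import math

def perplexity(mean_loss_nats: float) -> float:
    # Perplexity = exp(mean cross-entropy loss per token)
    return math.exp(mean_loss_nats)

# Losses of ~2.5-2.7 nats/token map onto the reported ~12-15 range.
print(round(perplexity(2.5), 2))   # 12.18
print(round(perplexity(2.71), 2))  # 15.03
```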
 ## 📖 Overview
 
 **Next 4B** is a **4-billion parameter multimodal Vision-Language Model (VLM)** based on **Gemma 3**, fine-tuned to handle **both text and images** efficiently. It is **Türkiye’s first open-source vision-language model**, designed for:
 
 * Understanding and generating **text and image descriptions**.
 * Efficient reasoning and context-aware multimodal outputs.
+ * Turkish support with multilingual capabilities.
 * Low-resource deployment using **8-bit quantization** for consumer-grade GPUs.
 
 This model is ideal for **researchers, developers, and organizations** who need a **high-performance multimodal AI** capable of **visual understanding, reasoning, and creative generation**.
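The 8-bit deployment path mentioned in the overview can be sketched with the `BitsAndBytesConfig` quantization API in `transformers` (a configuration sketch, assuming a CUDA GPU with the `bitsandbytes` package installed; the `device_map` choice is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Lamapi/next-4b"

# 8-bit weight quantization; requires bitsandbytes and a CUDA GPU.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers automatically across available devices
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```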
 
 ## 🚀 Installation & Usage
 
+ ### Load the model (with vision)
 
 ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM, AutoProcessor
 from PIL import Image
+ import torch
 
+ model_id = "Lamapi/next-4b"
+ model = AutoModelForCausalLM.from_pretrained(model_id)
+ processor = AutoProcessor.from_pretrained(model_id)  # For vision
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ ```
+
+ ### Using vision
 
+ ```python
+ # Read the input image
+ image = Image.open("image.jpg")
 
+ # Create a message in chat format
 messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": image},
+             {"type": "text", "text": "Who is in this image?"}
+         ]
+     }
 ]
 
+ # Prepare inputs with the chat template and the processor
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = processor(text=prompt, images=[image], return_tensors="pt")
 
+ # Generate and decode the model output
+ output = model.generate(**inputs, max_new_tokens=50)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
 ```
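Conceptually, `apply_chat_template` in the snippet above flattens the message list into one prompt string, leaving a placeholder token where the processor later splices in the image embeddings. A simplified, hypothetical illustration (`render_prompt` and the `<image>`/turn tokens here are stand-ins, not the model's real template):

```python
# Hypothetical sketch of how a chat template flattens multimodal messages.
# The real template and special tokens come from the model's tokenizer config.
def render_prompt(messages, image_token="<image>"):
    parts = []
    for msg in messages:
        chunks = []
        for item in msg["content"]:
            if item["type"] == "image":
                chunks.append(image_token)  # placeholder for image embeddings
            elif item["type"] == "text":
                chunks.append(item["text"])
        parts.append(f"<start_of_turn>{msg['role']}\n" + "\n".join(chunks) + "<end_of_turn>")
    # add_generation_prompt=True appends an open turn for the model's reply
    return "\n".join(parts) + "\n<start_of_turn>model\n"

messages = [{"role": "user", "content": [
    {"type": "image", "image": None},
    {"type": "text", "text": "Who is in this image?"},
]}]
print(render_prompt(messages))
```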
+ <div style='background-color:#222220;box-shadow:0px 0px 40px #222220;border-radius:16px;width:700px;height:100px;'>
+ <div style='background-color:rgba(15,15,15,0.7);top:10px;right:3px;border-radius:16px;border-bottom-right-radius:0px;padding:1px 10px;width:fit-content;max-width:400px;position:absolute;'>
+ <img src=''>
+ Who is in this image?
+ </div>
+ <div style='background-color:rgba(0,140,255,0.5);top:28px;right:300px;border-radius:16px;border-bottom-left-radius:0px;padding:1px 10px;width:fit-content;max-width:400px;position:absolute;'>
+ The image shows <strong>Mustafa Kemal Atatürk</strong>, the founder and first President of the Republic of Turkey.
+ </div>
+ </div>
+
 
 ---
 
 
 | ✍️ Creative | "Write a story based on the image content." |
 | 🎓 Cultural | "Describe historical or cultural elements in the image." |
 
 ---
 
 ## 📄 License