Update Qwen3-8B-Q3_K_M/README.md

Qwen3-8B-Q3_K_M/README.md (+62 -21)
```diff
@@ -3,6 +3,10 @@ license: apache-2.0
 tags:
   - gguf
   - qwen
+  - qwen3-8b
+  - qwen3-8b-q3
+  - qwen3-8b-q3_k_m
+  - qwen3-8b-q3_k_m-gguf
   - llama.cpp
   - quantized
   - text-generation
```
```diff
@@ -14,7 +18,7 @@ base_model: Qwen/Qwen3-8B
 author: geoffmunn
 ---
 
-# Qwen3-8B
+# Qwen3-8B:Q3_K_M
 
 Quantized version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) at **Q3_K_M** level, derived from **f16** base weights.
 
```
```diff
@@ -28,12 +32,11 @@ Quantized version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) at **Q3_K_M** level, derived from **f16** base weights.
 
 ## Quality & Performance
 
-| Metric |
-|
-| **
-| **
-| **
-| **Recommendation** | Acceptable for basic chat on older CPUs. Do not expect coherent logic. |
+| Metric             | Value                                                                              |
+|--------------------|------------------------------------------------------------------------------------|
+| **Speed**          | ⚡ Fast                                                                             |
+| **RAM Required**   | ~3.6 GB                                                                            |
+| **Recommendation** | 🥇 **Best overall model.** Was a top 3 finisher for all questions except the haiku. |
 
 ## Prompt Template (ChatML)
 
```
```diff
@@ -54,13 +57,13 @@ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
 ### Thinking Mode (Recommended for Logic)
 Use when solving math, coding, or logical problems.
 
-| Parameter |
-|
-| Temperature |
-| Top-P |
-| Top-K |
-| Min-P |
-| Repeat Penalty | 1.1 |
+| Parameter      | Value |
+|----------------|-------|
+| Temperature    | 0.6   |
+| Top-P          | 0.95  |
+| Top-K          | 20    |
+| Min-P          | 0.0   |
+| Repeat Penalty | 1.1   |
 
 > ❗ DO NOT use greedy decoding — it causes infinite loops.
 
```
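The thinking-mode values added here map directly onto llama.cpp's sampling flags, so they are easy to reproduce outside a GUI app. A minimal sketch, assuming a llama.cpp build whose CLI binary is named `llama-cli` and the GGUF filename used later in this commit (both assumptions, not from the diff):

```bash
# Thinking-mode sampling from the table above, applied via llama.cpp flags.
# Assumes the quantized file sits in the current directory.
./llama-cli -m ./Qwen3-8B-f16:Q3_K_M.gguf \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --repeat-penalty 1.1 \
  -p "Solve 3x + 5 = 20 and show your reasoning."
```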
```diff
@@ -71,13 +74,13 @@ Enable via:
 ### Non-Thinking Mode (Fast Dialogue)
 For casual chat and quick replies.
 
-| Parameter |
-|
-| Temperature |
-| Top-P |
-| Top-K |
-| Min-P |
-| Repeat Penalty | 1.1 |
+| Parameter      | Value |
+|----------------|-------|
+| Temperature    | 0.7   |
+| Top-P          | 0.8   |
+| Top-K          | 20    |
+| Min-P          | 0.0   |
+| Repeat Penalty | 1.1   |
 
 Enable via:
 - `enable_thinking=False`
```
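Note that `enable_thinking=False` is a chat-template argument (e.g. in Transformers' `apply_chat_template`); when serving the model over an API you would typically just send the non-thinking sampler values with each request. A sketch against Ollama's `/api/generate` endpoint, assuming the model tag created in the troubleshooting section below:

```bash
# Non-thinking sampling overrides sent per request to a local Ollama server.
curl -s http://localhost:11434/api/generate -d '{
  "model": "Qwen3-8B-f16:Q3_K_M",
  "prompt": "Give me a one-sentence summary of the Apollo 11 mission.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  }
}' | jq -r '.response'
```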
````diff
@@ -116,6 +119,44 @@ Stop sequences: `<|im_end|>`, `<|im_start|>`
 > 🧰 **Agent Ready**  
 > Works with Qwen-Agent, MCP servers, and custom tools.
 
+## Customisation & Troubleshooting
+
+Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
+In this case, try these steps:
+
+1. `wget https://huggingface.co/geoffmunn/Qwen3-8B/resolve/main/Qwen3-8B-f16%3AQ3_K_M.gguf`
+2. `nano Modelfile` and enter these details:
+```text
+FROM ./Qwen3-8B-f16:Q3_K_M.gguf
+
+# Chat template using ChatML (used by Qwen)
+SYSTEM You are a helpful assistant
+
+TEMPLATE "{{ if .System }}<|im_start|>system
+{{ .System }}<|im_end|>{{ end }}<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+"
+PARAMETER stop <|im_start|>
+PARAMETER stop <|im_end|>
+
+# Default sampling
+PARAMETER temperature 0.6
+PARAMETER top_p 0.95
+PARAMETER top_k 20
+PARAMETER min_p 0.0
+PARAMETER repeat_penalty 1.1
+PARAMETER num_ctx 4096
+```
+
+The `num_ctx` value has been lowered to 4096 to increase speed significantly.
+
+3. Then run this command: `ollama create Qwen3-8B-f16:Q3_K_M -f Modelfile`
+
+You will now see "Qwen3-8B-f16:Q3_K_M" in your Ollama model list.
+
+These import steps are also useful if you want to customise the default parameters or the system prompt.
+
 ## 🖥️ CLI Example Using Ollama or TGI Server
 
 Here’s how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server (e.g., Ollama, Text Generation Inference).
````
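Once the `ollama create` step in the added section succeeds, a quick smoke test confirms the import before wiring the model into other tools (using the tag created above):

```bash
# Verify the model is registered, then run a one-off prompt against it.
ollama list | grep Qwen3-8B-f16
ollama run Qwen3-8B-f16:Q3_K_M "Reply with the single word: ready"
```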

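The hunk's trailing context ends just as the CLI section begins, so the README's own `curl`/`jq` example is not visible in this commit preview. As an illustrative sketch of the kind of query that section describes, assuming a local Ollama server on its default port and the model tag from the troubleshooting steps:

```bash
# Chat-style request to a local Ollama server; jq extracts the assistant reply.
curl -s http://localhost:11434/api/chat -d '{
  "model": "Qwen3-8B-f16:Q3_K_M",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What does Q3_K_M quantization trade off?"}
  ],
  "stream": false
}' | jq -r '.message.content'
```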