Update Qwen3-8B-Q3_K_M/README.md

Qwen3-8B-Q3_K_M/README.md (+62 -21)
```diff
@@ -3,6 +3,10 @@ license: apache-2.0
 tags:
   - gguf
   - qwen
+  - qwen3-8b
+  - qwen3-8b-q3
+  - qwen3-8b-q3_k_m
+  - qwen3-8b-q3_k_m-gguf
   - llama.cpp
   - quantized
   - text-generation
```
```diff
@@ -14,7 +18,7 @@ base_model: Qwen/Qwen3-8B
 author: geoffmunn
 ---
 
-# Qwen3-8B
+# Qwen3-8B:Q3_K_M
 
 Quantized version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) at **Q3_K_M** level, derived from **f16** base weights.
 
```
```diff
@@ -28,12 +32,11 @@ Quantized version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) at **Q3_K_M** level, derived from **f16** base weights.
 
 ## Quality & Performance
 
-| Metric |
-|
-| **
-| **
-| **
-| **Recommendation** | Acceptable for basic chat on older CPUs. Do not expect coherent logic. |
+| Metric             | Value                                                                              |
+|--------------------|------------------------------------------------------------------------------------|
+| **Speed**          | ⚡ Fast                                                                             |
+| **RAM Required**   | ~3.6 GB                                                                            |
+| **Recommendation** | 🥇 **Best overall model.** Was a top 3 finisher for all questions except the haiku. |
 
 ## Prompt Template (ChatML)
 
```
```diff
@@ -54,13 +57,13 @@ Set this in your app (LM Studio, OpenWebUI, etc.) for best results.
 ### Thinking Mode (Recommended for Logic)
 Use when solving math, coding, or logical problems.
 
-| Parameter |
-|
-| Temperature |
-| Top-P |
-| Top-K |
-| Min-P |
-| Repeat Penalty | 1.1 |
+| Parameter      | Value |
+|----------------|-------|
+| Temperature    | 0.6   |
+| Top-P          | 0.95  |
+| Top-K          | 20    |
+| Min-P          | 0.0   |
+| Repeat Penalty | 1.1   |
 
 > ❗ DO NOT use greedy decoding — it causes infinite loops.
 
```
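The thinking-mode values added here map directly onto llama.cpp's sampling flags, so they are easy to reproduce outside a GUI app. A minimal sketch, assuming a llama.cpp build whose CLI binary is named `llama-cli` and the GGUF filename used later in this commit (both assumptions, not from the diff):

```bash
# Thinking-mode sampling from the table above, applied via llama.cpp flags.
# Assumes the quantized file sits in the current directory.
./llama-cli -m ./Qwen3-8B-f16:Q3_K_M.gguf \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --repeat-penalty 1.1 \
  -p "Solve 3x + 5 = 20 and show your reasoning."
```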
```diff
@@ -71,13 +74,13 @@ Enable via:
 ### Non-Thinking Mode (Fast Dialogue)
 For casual chat and quick replies.
 
-| Parameter |
-|
-| Temperature |
-| Top-P |
-| Top-K |
-| Min-P |
-| Repeat Penalty | 1.1 |
+| Parameter      | Value |
+|----------------|-------|
+| Temperature    | 0.7   |
+| Top-P          | 0.8   |
+| Top-K          | 20    |
+| Min-P          | 0.0   |
+| Repeat Penalty | 1.1   |
 
 Enable via:
 - `enable_thinking=False`
```
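Note that `enable_thinking=False` is a chat-template argument (e.g. in Transformers' `apply_chat_template`); when serving the model over an API you would typically just send the non-thinking sampler values with each request. A sketch against Ollama's `/api/generate` endpoint, assuming the model tag created in the troubleshooting section below:

```bash
# Non-thinking sampling overrides sent per request to a local Ollama server.
curl -s http://localhost:11434/api/generate -d '{
  "model": "Qwen3-8B-f16:Q3_K_M",
  "prompt": "Give me a one-sentence summary of the Apollo 11 mission.",
  "stream": false,
  "options": {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "min_p": 0.0,
    "repeat_penalty": 1.1
  }
}' | jq -r '.response'
```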
````diff
@@ -116,6 +119,44 @@ Stop sequences: `<|im_end|>`, `<|im_start|>`
 > 🧰 **Agent Ready**  
 > Works with Qwen-Agent, MCP servers, and custom tools.
 
+## Customisation & Troubleshooting
+
+Importing directly into Ollama should work, but you might encounter this error: `Error: invalid character '<' looking for beginning of value`.
+In this case, try these steps:
+
+1. `wget https://huggingface.co/geoffmunn/Qwen3-8B/resolve/main/Qwen3-8B-f16%3AQ3_K_M.gguf`
+2. `nano Modelfile` and enter these details:
+```text
+FROM ./Qwen3-8B-f16:Q3_K_M.gguf
+
+# Chat template using ChatML (used by Qwen)
+SYSTEM You are a helpful assistant
+
+TEMPLATE "{{ if .System }}<|im_start|>system
+{{ .System }}<|im_end|>{{ end }}<|im_start|>user
+{{ .Prompt }}<|im_end|>
+<|im_start|>assistant
+"
+PARAMETER stop <|im_start|>
+PARAMETER stop <|im_end|>
+
+# Default sampling
+PARAMETER temperature 0.6
+PARAMETER top_p 0.95
+PARAMETER top_k 20
+PARAMETER min_p 0.0
+PARAMETER repeat_penalty 1.1
+PARAMETER num_ctx 4096
+```
+
+The `num_ctx` value has been lowered to 4096 to increase speed significantly.
+
+3. Then run this command: `ollama create Qwen3-8B-f16:Q3_K_M -f Modelfile`
+
+You will now see "Qwen3-8B-f16:Q3_K_M" in your Ollama model list.
+
+These import steps are also useful if you want to customise the default parameters or the system prompt.
+
 ## 🖥️ CLI Example Using Ollama or TGI Server
 
 Here’s how you can query this model via API using `curl` and `jq`. Replace the endpoint with your local server (e.g., Ollama, Text Generation Inference).
````
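Once the `ollama create` step in the added section succeeds, a quick smoke test confirms the import before wiring the model into other tools (using the tag created above):

```bash
# Verify the model is registered, then run a one-off prompt against it.
ollama list | grep Qwen3-8B-f16
ollama run Qwen3-8B-f16:Q3_K_M "Reply with the single word: ready"
```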

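The hunk's trailing context ends just as the CLI section begins, so the README's own `curl`/`jq` example is not visible in this commit preview. As an illustrative sketch of the kind of query that section describes, assuming a local Ollama server on its default port and the model tag from the troubleshooting steps:

```bash
# Chat-style request to a local Ollama server; jq extracts the assistant reply.
curl -s http://localhost:11434/api/chat -d '{
  "model": "Qwen3-8B-f16:Q3_K_M",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What does Q3_K_M quantization trade off?"}
  ],
  "stream": false
}' | jq -r '.message.content'
```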