---
library_name: gguf
license: apache-2.0
base_model:
- Ellbendls/Qwen-3-4b-Text_to_SQL
- Qwen/Qwen3-4B-Instruct-2507
tags:
- gguf
- llama.cpp
- qwen
- text-to-sql
- sql
- instruct
language:
- eng
- zho
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
pipeline_tag: text-generation
---

# Ellbendls/Qwen-3-4b-Text_to_SQL-GGUF

Quantized GGUF builds of `Ellbendls/Qwen-3-4b-Text_to_SQL` for fast CPU/GPU inference with llama.cpp-compatible runtimes.

- **Base model**. Fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** for Text-to-SQL.
- **License**. Apache-2.0 (inherited from the base). Keep attribution.
- **Purpose**. Turn natural language into SQL. When a schema is missing, the model can infer a plausible schema and then produce SQL.

## Files

Base and quantized variants:

- `Qwen-3-4b-Text_to_SQL-F16.gguf` — reference float16 export
- `Qwen-3-4b-Text_to_SQL-q2_k.gguf`
- `Qwen-3-4b-Text_to_SQL-q3_k_m.gguf`
- `Qwen-3-4b-Text_to_SQL-q4_k_s.gguf`
- `Qwen-3-4b-Text_to_SQL-q4_k_m.gguf` ← good default
- `Qwen-3-4b-Text_to_SQL-q5_k_m.gguf`
- `Qwen-3-4b-Text_to_SQL-q6_k.gguf`
- `Qwen-3-4b-Text_to_SQL-q8_0.gguf` ← near-lossless, larger

Conversion and quantization were done with `llama.cpp`.

## Recommended pick

- **Q4_K_M**. Best balance of speed and quality for laptops and small servers.
- **Q5_K_M**. Higher quality for a bit more RAM/VRAM.
- **Q8_0**. Highest quality among the quants. Use it if you have headroom.

## Approximate memory needs

These are ballpark figures for a 4B model; real usage varies by runtime and context length.

- Q4_K_M: 3–4 GB RAM/VRAM
- Q5_K_M: 4–5 GB
- Q8_0: 6–8 GB
- F16: 10–12 GB

## Quick start

### llama.cpp (CLI)

CPU only:

```bash
./llama-cli -m Qwen-3-4b-Text_to_SQL-q4_k_m.gguf \
  -p "Generate SQL to get average salary by department in 2024." \
  -n 256 -t 6
```

NVIDIA GPU offload (build with `-DGGML_CUDA=ON`; older llama.cpp releases used `-DLLAMA_CUBLAS=ON`):

```bash
./llama-cli -m Qwen-3-4b-Text_to_SQL-q4_k_m.gguf \
  -p "Generate SQL to get average salary by department in 2024." \
  -n 256 -ngl 999 -t 6
```

### Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen-3-4b-Text_to_SQL-q4_k_m.gguf",
    n_ctx=4096,
    n_gpu_layers=35,  # set to 0 for CPU-only
)

prompt = "Generate SQL to list total orders and revenue by month for 2024."
out = llm(prompt, max_tokens=256, temperature=0.2, top_p=0.9)
print(out["choices"][0]["text"].strip())
```

### LM Studio / Kobold / text-generation-webui

* Select the `.gguf` file and load it.
* Set temperature to 0.1–0.3 for deterministic SQL.
* Use a system prompt to anchor behavior.

## Model details

* **Base**. `Qwen/Qwen3-4B-Instruct-2507` (native 262k context, multilingual).
* **Fine-tune**. Trained on `gretelai/synthetic_text_to_sql`.
* **Task**. NL → SQL, with simple schema inference when needed.
* **Languages**. Works best in English; it can follow prompts in the other languages supported by the base model.

## Conversion reproducibility

Export used:

```bash
python convert_hf_to_gguf.py /path/to/hf_model --outtype f16 --outfile Qwen-3-4b-Text_to_SQL-F16.gguf
```

Quantization used:

```bash
./llama-quantize Qwen-3-4b-Text_to_SQL-F16.gguf Qwen-3-4b-Text_to_SQL-q4_k_m.gguf Q4_K_M
# likewise for q2_k, q3_k_m, q4_k_s, q5_k_m, q6_k, q8_0
```

## Intended use and limits

* **Use**. Analytics, reporting, dashboards, data exploration, SQL prototyping.
* **Limits**. No database connectivity; the model only generates SQL text. Validate and test queries before production use, and provide the real schema for best accuracy (see the prompting and validation sketches below).
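### Prompting with a schema

Since accuracy improves when the real DDL is in the prompt, here is a minimal sketch of schema-grounded prompting with `llama-cpp-python`'s chat API. The system prompt, the `employees` table, and the question are illustrative assumptions, not part of the fine-tune's training setup:

```python
from llama_cpp import Llama

llm = Llama(model_path="Qwen-3-4b-Text_to_SQL-q4_k_m.gguf", n_ctx=4096, n_gpu_layers=0)

# Hypothetical schema -- substitute the real DDL from your database.
schema = """CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary REAL,
    hired_at DATE
);"""

out = llm.create_chat_completion(
    messages=[
        {"role": "system",
         "content": "You are a Text-to-SQL assistant. Reply with a single SQL query and nothing else."},
        {"role": "user",
         "content": f"Schema:\n{schema}\n\nQuestion: average salary by department for employees hired in 2024."},
    ],
    max_tokens=256,
    temperature=0.2,
)
print(out["choices"][0]["message"]["content"].strip())
```

The chat API applies the chat template stored in the GGUF metadata, which instruct-tuned models like this one expect; raw completion prompts also work but tend to drift more.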
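### Validating generated SQL

Because the model has no database connectivity, generated queries should be parse-checked before they reach production. A minimal sketch using Python's built-in `sqlite3`, reusing the hypothetical schema above; it assumes your target dialect is close enough to SQLite for a compile check to be meaningful:

```python
import sqlite3

# Hypothetical DDL -- use your real schema so table and column references are checked too.
SCHEMA = """CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department TEXT,
    salary REAL,
    hired_at DATE
);"""

def check_sql(sql: str, schema: str = SCHEMA) -> bool:
    """Compile a generated query against empty tables without executing it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)        # build empty tables from the DDL
        conn.execute("EXPLAIN " + sql)    # EXPLAIN compiles the query but reads no rows
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

print(check_sql("SELECT department, AVG(salary) FROM employees GROUP BY department;"))  # True
print(check_sql("SELEC department FROM employees;"))                                    # False
```

A passing check only proves the statement compiles against the schema; it says nothing about whether the query answers the question, so keep tests or a human review in the loop.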
## Attribution

* Base model: [`Qwen/Qwen3-4B-Instruct-2507`](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
* Fine-tuned model: [`Ellbendls/Qwen-3-4b-Text_to_SQL`](https://huggingface.co/Ellbendls/Qwen-3-4b-Text_to_SQL)

## License

Apache-2.0. Include the upstream license and NOTICE files when redistributing the weights. Do not imply endorsement by Qwen or the original authors.

## Changelog

* 2025-09-17. Initial GGUF release: `q2_k`, `q3_k_m`, `q4_k_s`, `q4_k_m`, `q5_k_m`, `q6_k`, `q8_0`, and F16.