deepseek-coder-6.7b-instruct — RKLLM build for RK3588 boards

Author: @jamescallander
Source model: deepseek-ai/deepseek-coder-6.7b-instruct
Target: Rockchip RK3588 NPU via RKNN-LLM Runtime

This repository hosts a conversion of deepseek-coder-6.7b-instruct for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed with the RKNN-LLM toolkit.

Conversion details

  • RKLLM-Toolkit version: v1.2.1
  • NPU driver: v0.9.8
  • Python: 3.12
  • Quantization: w8a8_g128
  • Output: single-file .rkllm artifact
  • Tokenizer: not required at runtime (UI handles prompt I/O)

⚠️ Code generation disclaimer

🛑 This model may produce incorrect or insecure code.

  • It is intended for research, educational, and experimental purposes only.
  • Always review, test, and validate code outputs before using them in real projects.
  • Do not rely on outputs for production, security-sensitive, or safety-critical systems.
  • Use responsibly and in compliance with the source model’s license and restrictions.

Intended use

  • On-device coding assistant / code generation on RK3588 SBCs.
  • deepseek-coder-6.7b-instruct is tuned for software development and programming tasks, making it suitable for edge deployment where privacy and low power use are priorities.

Limitations

  • Requires 9.1 GB of free memory.
  • Quantized build (w8a8_g128) may show small quality differences vs. full-precision upstream.
  • Tested on Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
  • Generated code should always be reviewed before use in production systems.
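Before loading the model, it can help to confirm that the board has enough free memory. The following is a minimal sketch, not part of the toolkit: it parses the `MemAvailable` field of Linux's `/proc/meminfo` and compares it with the 9.1 GB figure above. The function names are hypothetical, and treating "GB" as 10^9 bytes is an assumption.

```python
REQUIRED_GB = 9.1  # figure from the Limitations list; unit assumed to be 10^9 bytes


def available_memory_gb(meminfo_text: str) -> float:
    """Return the MemAvailable value from /proc/meminfo content, in GB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB (KiB)
            return kib * 1024 / 1e9
    raise ValueError("MemAvailable field not found")


def has_enough_memory(meminfo_text: str, required_gb: float = REQUIRED_GB) -> bool:
    """True if the available memory meets the assumed requirement."""
    return available_memory_gb(meminfo_text) >= required_gb


def check_local_memory(path: str = "/proc/meminfo") -> bool:
    """Read the local meminfo file (Linux only) and run the check."""
    with open(path) as f:
        return has_enough_memory(f.read())
```

On the board itself, calling `check_local_memory()` before starting the server gives a quick yes/no answer; the actual memory footprint may vary with context length and runtime version.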

Quick start (RK3588)

1) Install runtime

The RKNN-LLM toolkit and its installation instructions are available from your development board manufacturer's website or from airockchip's GitHub page.

Download and install the required packages as per the toolkit's instructions.

2) Simple Flask server deployment

The simplest way to deploy the converted .rkllm model is to use the example script provided in the toolkit, in this directory: rknn-llm/examples/rkllm_server_demo

python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
  --rkllm_model_path <MODEL_PATH>/deepseek-coder-6.7b-instruct_w8a8_g128_rk3588.rkllm \
  --target_platform rk3588

3) Sending a request

A basic request body has the following format:

{
    "model":"deepseek-coder-6.7b-instruct",
    "messages":[{
        "role":"user",
        "content":"<YOUR_PROMPT_HERE>"}],
    "stream":false
}

Example request using curl:

curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"deepseek-coder-6.7b-instruct","messages":[{"role":"user","content":"Create a python function to calculate factorials using recursive method."}],"stream":false}'
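The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl call: the port (8080), the /rkllm_chat path, and the request fields come from the examples here, while the function names and the timeout value are hypothetical choices.

```python
import json
import urllib.request

MODEL_NAME = "deepseek-coder-6.7b-instruct"


def build_chat_request(prompt: str, model: str = MODEL_NAME, stream: bool = False) -> dict:
    """Build the request body in the format shown above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }


def send_chat_request(server_ip: str, prompt: str, timeout: float = 300.0) -> dict:
    """POST a non-streaming request to the Flask demo server and return the parsed JSON."""
    url = f"http://{server_ip}:8080/rkllm_chat"
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Generation on the NPU can take a while, hence the generous (assumed) timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send_chat_request("<SERVER_IP_ADDRESS>", "Create a python function to calculate factorials using recursive method.")` should return a dictionary in the response format the server produces.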

The response is formatted as follows:

{
    "choices":[{
        "finish_reason":"stop",
        "index":0,
        "logprobs":null,
        "message":{
            "content":"<MODEL_REPLY_HERE>",
            "role":"assistant"}}],
    "created":null,
    "id":"rkllm_chat",
    "object":"rkllm_chat",
    "usage":{
        "completion_tokens":null,
        "prompt_tokens":null,
        "total_tokens":null}
}

Example response:

{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Sure, here is the Python code for calculating factorial of a number using recursion: ```python def factorial(n): if n == 0 or n == 1: # base case return 1 else: return n * factorial(n-1) ``` This function works by repeatedly calling itself with the argument `n - 1`, until it reaches a point where `n` is either `0` or `1`. At this point, it returns `1` and the recursion ends. The product of all these returned values gives us the factorial of the original input number.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
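To pull the model's reply out of such a response, a client only needs the first entry of `choices`. The sketch below assumes the response has already been parsed into a Python dictionary; `extract_reply` is a hypothetical helper name. Note that the `usage` counters are `null` in the example response, so this sketch does not rely on them.

```python
import json


def extract_reply(response: dict) -> str:
    """Return the assistant's message text from a parsed chat response."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# A shortened response in the same shape as the example above.
raw = (
    '{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,'
    '"message":{"content":"Sure, here is the Python code.","role":"assistant"}}],'
    '"created":null,"id":"rkllm_chat","object":"rkllm_chat",'
    '"usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}'
)
reply = extract_reply(json.loads(raw))
```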

4) UI compatibility

This server exposes an OpenAI-compatible Chat Completions API.

You can connect it to any OpenAI-compatible client or UI (for example, Open WebUI).

  • Configure your client with the API base: http://<SERVER_IP_ADDRESS>:8080 and use the endpoint: /rkllm_chat
  • Make sure the model field matches the converted model’s name, for example:
{
 "model": "deepseek-coder-6.7b-instruct",
 "messages": [{"role":"user","content":"Hello!"}],
 "stream": false
}

License

This conversion follows the license of the source model: see LICENSE in deepseek-ai/deepseek-coder-6.7b-instruct.

  • Required notice: see NOTICE