---
library_name: rkllm
license: other
license_name: deepseek
license_link: >-
https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/blob/main/LICENSE
language:
- en
base_model:
- deepseek-ai/deepseek-coder-6.7b-instruct
pipeline_tag: text-generation
tags:
- rkllm
- rk3588
- rockchip
- code
- edge-ai
- llm
---
# deepseek-coder-6.7b-instruct — RKLLM build for RK3588 boards
**Author:** @jamescallander
**Source model:** [deepseek-ai/deepseek-coder-6.7b-instruct · Hugging Face](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
**Target:** Rockchip RK3588 NPU via RKNN-LLM Runtime
> This repository hosts a **conversion** of `deepseek-coder-6.7b-instruct` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). Conversion was performed using the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).
#### Conversion details
- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Tokenizer: not required at runtime (UI handles prompt I/O)
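For reference, the export flow looks roughly like the sketch below. This is a minimal illustration based on the RKNN-LLM toolkit's published examples, not the exact script used for this build; argument names and defaults can differ between RKLLM-Toolkit releases.
```python
# Illustrative conversion sketch (RKLLM-Toolkit v1.2.x-style API); paths and
# parameters are examples, not the exact values used for this build.
from rkllm.api import RKLLM

llm = RKLLM()

# Load the upstream Hugging Face checkpoint (local path or repo id).
ret = llm.load_huggingface(model="deepseek-ai/deepseek-coder-6.7b-instruct")
assert ret == 0, "model load failed"

# Quantize and compile for the RK3588 NPU.
ret = llm.build(do_quantization=True,
                quantized_dtype="w8a8_g128",
                target_platform="rk3588")
assert ret == 0, "build failed"

# Write the single-file artifact hosted in this repository.
ret = llm.export_rkllm("./deepseek-coder-6.7b-instruct_w8a8_g128_rk3588.rkllm")
assert ret == 0, "export failed"
```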
## ⚠️ Code generation disclaimer
🛑 **This model may produce incorrect or insecure code.**
- It is intended for **research, educational, and experimental purposes only**.
- Always **review, test, and validate code outputs** before using them in real projects.
- Do not rely on outputs for production, security-sensitive, or safety-critical systems.
- Use responsibly and in compliance with the source model’s license and restrictions.
## Intended use
- On-device coding assistant / code generation on RK3588 SBCs.
- deepseek-coder-6.7b-instruct is tuned for software development and programming tasks, making it suitable for **edge deployment** where privacy and low power use are priorities.
## Limitations
- Requires 9.1 GB of free memory.
- Quantized build (`w8a8_g128`) may show small quality differences vs. full-precision upstream.
- Tested on Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
- Generated code should always be reviewed before use in production systems.
## Quick start (RK3588)
### 1) Install runtime
The RKNN-LLM toolkit and installation instructions can be found on your development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip).
Download and install the required packages as per the toolkit's instructions.
### 2) Simple Flask server deployment
The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit directory `rknn-llm/examples/rkllm_server_demo`:
```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <MODEL_PATH>/deepseek-coder-6.7b-instruct_w8a8_g128_rk3588.rkllm \
    --target_platform rk3588
```
### 3) Sending a request
A basic message request has the following format:
```json
{
  "model":"deepseek-coder-6.7b-instruct",
  "messages":[{
    "role":"user",
    "content":"<YOUR_PROMPT_HERE>"}],
  "stream":false
}
```
Example request using `curl`:
```bash
curl -s -X POST http://<SERVER_IP_ADDRESS>:8080/rkllm_chat \
  -H 'Content-Type: application/json' \
  -d '{"model":"deepseek-coder-6.7b-instruct","messages":[{"role":"user","content":"Create a python function to calculate factorials using recursive method."}],"stream":false}'
```
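The same request can be sent from Python. Below is a minimal client sketch using the `requests` library; it assumes the Flask demo server from step 2 is running, and the server address placeholder and prompt are illustrative:
```python
# Minimal client sketch for the Flask demo server (address is a placeholder).
import requests

payload = {
    "model": "deepseek-coder-6.7b-instruct",
    "messages": [{
        "role": "user",
        "content": "Create a python function to calculate factorials using recursive method."
    }],
    "stream": False,
}

resp = requests.post("http://<SERVER_IP_ADDRESS>:8080/rkllm_chat",
                     json=payload, timeout=300)
resp.raise_for_status()

# The reply text sits under choices[0].message.content (see format below).
print(resp.json()["choices"][0]["message"]["content"])
```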
The response is formatted in the following way:
```json
{
  "choices":[{
    "finish_reason":"stop",
    "index":0,
    "logprobs":null,
    "message":{
      "content":"<MODEL_REPLY_HERE>",
      "role":"assistant"}}],
  "created":null,
  "id":"rkllm_chat",
  "object":"rkllm_chat",
  "usage":{
    "completion_tokens":null,
    "prompt_tokens":null,
    "total_tokens":null}
}
```
Example response:
```json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Sure, here is the Python code for calculating factorial of a number using recursion: ```python def factorial(n): if n == 0 or n == 1: # base case return 1 else: return n * factorial(n-1) ``` This function works by repeatedly calling itself with the argument `n - 1`, until it reaches a point where `n` is either `0` or `1`. At this point, it returns `1` and the recursion ends. The product of all these returned values gives us the factorial of the original input number.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
```
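As the example shows, the model returns its code embedded in markdown fences inside `message.content`. If you only want the code itself, a small helper like the hypothetical `extract_code` below can strip it out:
```python
# Hypothetical helper: pull the first fenced code block out of a reply string.
import re

def extract_code(reply: str) -> str | None:
    """Return the contents of the first fenced code block, or None."""
    match = re.search(r"```(?:\w+)?\s*(.*?)```", reply, flags=re.DOTALL)
    return match.group(1).strip() if match else None
```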
### 4) UI compatibility
This server exposes an **OpenAI-compatible Chat Completions API**.
You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui)).
- Configure your client with the API base: `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint: `/rkllm_chat`
- Make sure the `model` field matches the converted model’s name, for example:
```json
{
  "model": "deepseek-coder-6.7b-instruct",
  "messages": [{"role":"user","content":"Hello!"}],
  "stream": false
}
```
## License
This conversion follows the license of the source model: [LICENSE · deepseek-ai/deepseek-coder-6.7b-instruct at main](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/blob/main/LICENSE)
- **Required notice:** see [`NOTICE`](NOTICE)