jamescallander committed
Commit ec2b3da · verified · 1 Parent(s): e571a82

Update README.md

Files changed (1): README.md (+145 −6)

README.md (updated):
---
library_name: rkllm
license: other
license_name: deepseek
license_link: >-
  https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/blob/main/LICENSE
language:
- en
base_model:
- deepseek-ai/deepseek-coder-6.7b-instruct
pipeline_tag: text-generation
tags:
- rkllm
- rk3588
- rockchip
- code
- edge-ai
- llm
---
# deepseek-coder-6.7b-instruct — RKLLM build for RK3588 boards

**Author:** @jamescallander
**Source model:** [deepseek-ai/deepseek-coder-6.7b-instruct · Hugging Face](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct)
**Target:** Rockchip RK3588 NPU via RKNN-LLM Runtime

> This repository hosts a **conversion** of `deepseek-coder-6.7b-instruct` for use on Rockchip RK3588 single-board computers (Orange Pi 5 Plus, Radxa Rock 5B+, Banana Pi M7, etc.). The conversion was performed with the [RKNN-LLM toolkit](https://github.com/airockchip/rknn-llm).

#### Conversion details

- RKLLM-Toolkit version: v1.2.1
- NPU driver: v0.9.8
- Python: 3.12
- Quantization: `w8a8_g128`
- Output: single-file `.rkllm` artifact
- Tokenizer: not required at runtime (UI handles prompt I/O)
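
For reproducibility, the conversion flow looks roughly like the sketch below. It is based on the export examples shipped with the RKNN-LLM toolkit; the exact argument names and defaults may differ between toolkit versions, and the local model path is a placeholder.

```python
# Conversion sketch using the RKLLM-Toolkit Python API.
# Based on the export examples in airockchip/rknn-llm; argument
# names and defaults may vary between toolkit versions.
from rkllm.api import RKLLM

llm = RKLLM()

# Load the upstream checkpoint (path to a local clone of the HF repo).
if llm.load_huggingface(model="./deepseek-coder-6.7b-instruct") != 0:
    raise SystemExit("model load failed")

# Quantize with the w8a8_g128 scheme for the RK3588 NPU.
if llm.build(do_quantization=True,
             quantized_dtype="w8a8_g128",
             target_platform="rk3588") != 0:
    raise SystemExit("build/quantization failed")

# Export the single-file .rkllm artifact.
if llm.export_rkllm("./deepseek-coder-6.7b-instruct_w8a8_g128_rk3588.rkllm") != 0:
    raise SystemExit("export failed")
```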

## ⚠️ Code generation disclaimer

🛑 **This model may produce incorrect or insecure code.**

- It is intended for **research, educational, and experimental purposes only**.
- Always **review, test, and validate code outputs** before using them in real projects.
- Do not rely on outputs for production, security-sensitive, or safety-critical systems.
- Use responsibly and in compliance with the source model’s license and restrictions.

## Intended use

- On-device coding assistant / code generation on RK3588 SBCs.
- deepseek-coder-6.7b-instruct is tuned for software development and programming tasks, making it suitable for **edge deployment** where privacy and low power use are priorities.

## Limitations

- Requires roughly 9 GB of free memory.
- The quantized build (`w8a8_g128`) may show small quality differences vs. the full-precision upstream model.
- Tested on a Radxa Rock 5B+; other devices may require different drivers/toolkit versions.
- Generated code should always be reviewed before use in production systems.
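
Given the ~9 GB requirement, it can be worth checking available memory on the board before loading the model. A minimal pre-flight sketch (Linux-only, since it parses `/proc/meminfo`; the threshold simply mirrors the figure above):

```python
# Quick pre-flight check: is roughly 9 GB of memory available?
# Linux-only; parses MemAvailable from /proc/meminfo.
def mem_available_gb() -> float:
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) / (1024 * 1024)  # kB -> GB
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

if __name__ == "__main__":
    avail = mem_available_gb()
    print(f"Available memory: {avail:.1f} GB")
    if avail < 9:
        print("Warning: less than 9 GB available; the model may fail to load.")
```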

## Quick start (RK3588)

### 1) Install runtime

The RKNN-LLM toolkit and installation instructions can be found on your development board manufacturer's website or on [airockchip's GitHub page](https://github.com/airockchip).

Download and install the required packages as per the toolkit's instructions.

### 2) Simple Flask server deployment

The simplest way to deploy the converted `.rkllm` model is with the example script provided in the toolkit under `rknn-llm/examples/rkllm_server_demo`:

```bash
python3 <TOOLKIT_PATH>/rknn-llm/examples/rkllm_server_demo/flask_server.py \
    --rkllm_model_path <MODEL_PATH>/deepseek-coder-6.7b-instruct_w8a8_g128_rk3588.rkllm \
    --target_platform rk3588
```

### 3) Sending a request

A basic format for a message request is:

```json
{
  "model": "deepseek-coder-6.7b-instruct",
  "messages": [
    {"role": "user", "content": "<YOUR_PROMPT_HERE>"}
  ],
  "stream": false
}
```

Example request using `curl`:

```bash
curl -s -X POST <SERVER_IP_ADDRESS>:8080/rkllm_chat \
    -H 'Content-Type: application/json' \
    -d '{"model":"deepseek-coder-6.7b-instruct","messages":[{"role":"user","content":"Create a Python function to calculate factorials using recursion."}],"stream":false}'
```
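
The same request can be sent from Python as well; here is a minimal client sketch using the `requests` library (the server address is a placeholder, and the generous timeout is an arbitrary allowance for slow generations on the NPU):

```python
# Minimal chat client for the Flask demo server, using `requests`.
import requests

URL = "http://<SERVER_IP_ADDRESS>:8080/rkllm_chat"

payload = {
    "model": "deepseek-coder-6.7b-instruct",
    "messages": [
        {"role": "user",
         "content": "Create a Python function to calculate factorials using recursion."}
    ],
    "stream": False,
}

resp = requests.post(URL, json=payload, timeout=300)
resp.raise_for_status()

# The reply text is in choices[0].message.content (see the response format below).
print(resp.json()["choices"][0]["message"]["content"])
```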

The response is formatted in the following way:

```json
{
  "choices": [{
    "finish_reason": "stop",
    "index": 0,
    "logprobs": null,
    "message": {
      "content": "<MODEL_REPLY_HERE>",
      "role": "assistant"}}],
  "created": null,
  "id": "rkllm_chat",
  "object": "rkllm_chat",
  "usage": {
    "completion_tokens": null,
    "prompt_tokens": null,
    "total_tokens": null}
}
```

Example response:

````json
{"choices":[{"finish_reason":"stop","index":0,"logprobs":null,"message":{"content":"Sure, here is the Python code for calculating factorial of a number using recursion: ```python def factorial(n): if n == 0 or n == 1: # base case return 1 else: return n * factorial(n-1) ``` This function works by repeatedly calling itself with the argument `n - 1`, until it reaches a point where `n` is either `0` or `1`. At this point, it returns `1` and the recursion ends. The product of all these returned values gives us the factorial of the original input number.","role":"assistant"}}],"created":null,"id":"rkllm_chat","object":"rkllm_chat","usage":{"completion_tokens":null,"prompt_tokens":null,"total_tokens":null}}
````
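
Notice that generated code comes back embedded in the reply string as a Markdown fence. If you only want the code, a small helper can pull fenced blocks out of the reply; a simple regex sketch follows (it assumes well-formed triple-backtick fences, and the backticks are built with `chr(96)` only to avoid nesting fences in this README):

```python
# Extract fenced code blocks from a model reply (regex sketch).
import re

FENCE = chr(96) * 3  # the literal triple-backtick sequence
FENCE_RE = re.compile(FENCE + r"[\w+-]*\s*(.*?)" + FENCE, re.DOTALL)

def extract_code_blocks(reply: str) -> list[str]:
    """Return the contents of all fenced code blocks in the reply."""
    return [m.strip() for m in FENCE_RE.findall(reply)]

# Example with a flattened reply like the one above:
reply = f"Sure, here is the code: {FENCE}python def f(n): return n {FENCE} Done."
print(extract_code_blocks(reply))  # ['def f(n): return n']
```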

### 4) UI compatibility

This server exposes an **OpenAI-compatible Chat Completions API**.

You can connect it to any OpenAI-compatible client or UI (for example, [Open WebUI](https://github.com/open-webui/open-webui)).

- Configure your client with the API base `http://<SERVER_IP_ADDRESS>:8080` and use the endpoint `/rkllm_chat`.
- Make sure the `model` field matches the converted model’s name, for example:

```json
{
  "model": "deepseek-coder-6.7b-instruct",
  "messages": [{"role":"user","content":"Hello!"}],
  "stream": false
}
```

# License

This conversion follows the license of the source model: [LICENSE · deepseek-ai/deepseek-coder-6.7b-instruct at main](https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-instruct/blob/main/LICENSE)

- **Required notice:** see [`NOTICE`](NOTICE)