---
base_model:
- Qwen/Qwen2.5-Coder-14B-Instruct
pipeline_tag: text-generation
library_name: transformers
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63aeda3a2314b93f9e706a68/I6WwY8U7I5V8lc138UmGt.jpeg)

# Strand-Rust-Coder-14B-v1

## Overview

**Strand-Rust-Coder-14B-v1** is the first domain-specialized Rust language model created through **Fortytwo’s Swarm Inference Network**, a decentralized AI architecture in which multiple models collaboratively generate, validate, and rank outputs through peer consensus.

The model is a fine-tune of **Qwen2.5-Coder-14B** for Rust-specific programming tasks, trained on a **191K-example synthetic dataset** built via multi-agent generation and peer-reviewed validation.
It achieves **43–48% accuracy** on Rust-specific benchmarks – surpassing much larger proprietary models such as GPT-5 Codex on Rust tasks – while maintaining competitive general coding performance.

## Key Features

- **Rust-specialized fine-tuning** on 15 diverse programming task categories
- **Peer-validated synthetic dataset** (191,008 verified examples, 94.3% compile rate)
- **LoRA-based fine-tuning** for efficient adaptation
- **Benchmarked across Rust-specific suites:**
  - **RustEvo²**
  - **Hold-out evaluation set**
- **Deployed in the Fortytwo decentralized inference network** for collective AI reasoning

---

## Performance Summary

| **Model** | **Hold-Out Set** | **RustEvo²** |
|------------|------------------|---------------|
| **Fortytwo-Rust-One-14B (Ours)** | **48.00%** | **43.00%** |
| openai/gpt-5-codex | 47.00% | 28.00% |
| anthropic/claude-sonnet-4.5 | 46.00% | 21.00% |
| anthropic/claude-3.7-sonnet | 42.00% | 31.00% |
| qwen/qwen3-max | 42.00% | 40.00% |
| qwen/qwen3-coder-plus | 41.00% | 22.00% |
| x-ai/grok-4 | 39.00% | 37.00% |
| deepseek/deepseek-v3.1-terminus | 37.00% | 33.00% |
| Qwen3-Coder-30B-A3B-Instruct | 36.00% | 20.00% |
| openai/gpt-4o-latest | 34.00% | 39.00% |
| deepseek/deepseek-chat | 34.00% | 41.00% |
| google/gemini-2.5-flash | 33.00% | 7.00% |
| Qwen2.5-Coder-14B-Instruct (Base) | 29.00% | 30.00% |
| Qwen2.5-Coder-32B-Instruct | 29.00% | 31.00% |
| google/gemini-2.5-pro | 28.00% | 22.00% |
| qwen/qwen-2.5-72b | 28.00% | 32.00% |
| Tesslate/Tessa-Rust-T1-7B | 23.00% | 19.00% |

*Benchmarks measured as unit-test pass@1 in a Docker-isolated Rust 1.86.0 environment.*

---

## Task Breakdown

| Task | Base | Strand-14B |
|------|------|-------------|
| test_generation | 0.00 | 0.51 |
| api_usage_prediction | 0.27 | 0.71 |
| function_naming | 0.53 | 0.87 |
| code_refactoring | 0.04 | 0.19–0.20 |
| variable_naming | 0.87 | 1.00 |
| code_generation | 0.40 | 0.49 |

Largest improvements appear in *test generation*, *API usage prediction*, and *refactoring* – areas demanding strong semantic reasoning about Rust’s ownership and lifetime rules.

---

## Dataset

**Fortytwo-Network/Strandset-Rust-v1 (191,008 examples, 15 categories)**
Built through Fortytwo’s *Swarm Inference* pipeline, where multiple SLMs generate and cross-validate examples through peer-review consensus and output aggregation.

- 94.3% compile success rate
- 73.2% consensus acceptance
- Coverage of 89% of Rust language features
- Tasks include:
  - `code_generation`, `code_completion`, `bug_detection`, `refactoring`, `optimization`
  - `docstring_generation`, `code_review`, `summarization`, `test_generation`
  - `naming`, `api_usage_prediction`, `search`

Dataset construction involved 2,383 crates from crates.io, automatic compilation tests, and semantic validation of ownership and lifetime correctness.

Dataset: [Fortytwo-Network/Strandset-Rust-v1](https://huggingface.co/datasets/Fortytwo-Network/Strandset-Rust-v1)
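
For quick inspection, the dataset can be pulled with the 🤗 `datasets` library. A minimal sketch, assuming the default `train` split (the exact column names are whatever the dataset actually ships with):

```python
# Minimal sketch: load Strandset-Rust-v1 and inspect one record.
# Assumes a "train" split exists; column names are not documented here.
from datasets import load_dataset

ds = load_dataset("Fortytwo-Network/Strandset-Rust-v1", split="train")
print(ds)     # row count and column names
print(ds[0])  # first example, e.g. a task prompt and its Rust solution
```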

---

## Training Configuration

| Setting | Value |
|----------|-------|
| Base model | Qwen2.5-Coder-14B-Instruct |
| Method | LoRA (r=64, α=16) |
| Learning rate | 5e-5 |
| Batch size | 128 |
| Epochs | 3 |
| Optimizer | AdamW |
| Precision | bfloat16 |
| Objective | Completion-only loss |
| Context length | 32,768 |
| Framework | PyTorch + FSDP + Flash Attention 2 |
| Hardware | 8× H200 GPUs |
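
For illustration, the configuration above maps roughly onto the following `peft` setup. This is a sketch, not the authors’ training script: the `target_modules` list is an assumption (a common choice for Qwen2-style attention and MLP blocks), and the FSDP/Flash Attention plumbing is omitted.

```python
# Hedged sketch of the LoRA setup described in the table above.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-14B-Instruct",
    torch_dtype="auto",  # bfloat16, per the precision row above
)

lora_config = LoraConfig(
    r=64,           # rank from the table
    lora_alpha=16,  # α from the table
    target_modules=[  # assumed; common targets for Qwen2-style blocks
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report on the order of 1% trainable
```

The completion-only objective is typically implemented at the collator level, by masking prompt tokens out of the loss, rather than in the LoRA configuration itself.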

---

## Model Architecture

- **Base:** Qwen2.5-Coder (14B parameters, GQA attention, extended RoPE embeddings)
- **Tokenizer:** 151k-token vocabulary optimized for Rust syntax
- **Context:** 32k tokens
- **Fine-tuning:** Parameter-efficient LoRA adapters (≈1% of parameters updated)
- **Deployment:** Compatible with local deployment and the Fortytwo Capsule runtime for distributed swarm inference

---

## Evaluation Protocol

- All evaluations executed in a Docker-isolated Rust 1.86.0 environment
- **Code tasks:** measured via unit-test pass rate
- **Documentation & naming tasks:** scored via LLM-based correctness (Claude Sonnet 4 judge)
- **Code completion & API tasks:** syntax-weighted Levenshtein similarity
- **Comment generation:** compilation success metric
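
As a concrete illustration of the code-task metric, a pass@1 harness along these lines writes each candidate solution into a Cargo crate and runs its unit tests inside the pinned Rust image. This is a minimal sketch under assumed conventions (one crate directory per task, `#[test]` functions included), not the harness behind the numbers above:

```python
# Hedged sketch: unit-test pass@1 in a Docker-isolated Rust 1.86.0 environment.
import subprocess

def passes_unit_tests(crate_dir: str, timeout_s: int = 120) -> bool:
    """Run `cargo test` for one candidate crate inside a pinned Rust container."""
    try:
        result = subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "none",         # no network access during the run
                "-v", f"{crate_dir}:/work",  # mount the candidate crate
                "-w", "/work",
                "rust:1.86.0",               # pinned toolchain, as stated above
                "cargo", "test", "--quiet",
            ],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # treat hangs as failures
    return result.returncode == 0

def pass_at_1(crate_dirs: list[str]) -> float:
    """pass@1: share of tasks whose single sampled solution passes its tests."""
    return sum(passes_unit_tests(d) for d in crate_dirs) / len(crate_dirs)
```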

---

## Why It Matters

Rust is a memory-safe, low-level language with complex ownership semantics that make it uniquely challenging for general-purpose LLMs.
At the same time, there is simply **not enough high-quality Rust training data**, as Rust remains a relatively young and rapidly evolving language.
This scarcity of large, reliable Rust datasets – combined with the language’s intricate borrow checker and type system – makes it an ideal benchmark for evaluating true model understanding and reasoning precision.

**Strand-Rust-Coder** demonstrates how **specialized models** can outperform giant centralized models – achieving domain mastery with a fraction of the compute.
Through **Fortytwo’s Swarm Inference**, the network generated a **highly accurate synthetic dataset**, enabling a **state-of-the-art Rust model** to be built through an efficient **LoRA fine-tune** rather than full retraining.

This work validates Fortytwo’s thesis: **intelligence can scale horizontally through networked specialization rather than centralized scale.**

---

## 🔬 Research & References

- [Self-Supervised Inference of Agents in Trustless Environments](https://arxiv.org/abs/2409.08386) – *High-level overview of Fortytwo architecture*

---

## Intended Use

- Rust code generation, completion, and documentation
- Automated refactoring and test generation
- Integration into code copilots and multi-agent frameworks
- Research on domain-specialized model training and evaluation

### Limitations

- May underperform on purely algorithmic or multi-language tasks (e.g., HumanEval-style puzzles).
- Generated code should not be used in production without compilation and test validation.

---

## Integration with Fortytwo Network

Strand-Rust-Coder models are integrated into **Fortytwo’s decentralized Swarm Inference Network**, where specialized models collaborate and rank each other’s outputs.
This structure enables **peer-reviewed inference**, improving reliability while reducing hallucinations and cost.
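
As a toy illustration of that idea (not Fortytwo’s actual consensus protocol, which is described in the paper linked above), each node can score its peers’ candidate answers, and the swarm returns the answer with the highest average peer score:

```python
# Toy sketch of peer-reviewed inference. `score` stands in for a real judging
# call; this shows only the general shape of peer-ranked consensus.
from typing import Callable

def swarm_consensus(
    candidates: dict[str, str],          # node name -> candidate answer
    score: Callable[[str, str], float],  # (judge node, answer) -> quality score
) -> str:
    best_answer, best_score = "", float("-inf")
    for author, answer in candidates.items():
        # average the scores from all peers except the author itself
        peers = [node for node in candidates if node != author]
        avg = sum(score(judge, answer) for judge in peers) / len(peers)
        if avg > best_score:
            best_answer, best_score = answer, avg
    return best_answer
```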

To run a Fortytwo node or contribute your own models and fine-tunes, visit [fortytwo.network](https://fortytwo.network).

---

## Inference Examples

### Using `pipeline`

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="Fortytwo-Network/Strand-Rust-Coder-14B-v1")
messages = [
    {"role": "user", "content": "Write a Rust function that finds the first string longer than 10 characters in a vector."},
]
pipe(messages)
```

### Using Transformers Directly

```python
# Load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Fortytwo-Network/Strand-Rust-Coder-14B-v1")
model = AutoModelForCausalLM.from_pretrained(
    "Fortytwo-Network/Strand-Rust-Coder-14B-v1",
    torch_dtype="auto",  # use the checkpoint's native precision (bfloat16)
    device_map="auto",   # place the 14B weights across available devices
)

messages = [
    {"role": "user", "content": "Write a Rust function that finds the first string longer than 10 characters in a vector."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# A complete Rust function rarely fits in 40 tokens; allow a longer generation.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
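
For interactive use, the same model and inputs can stream tokens as they are generated; this optional variant uses the `TextStreamer` utility from `transformers`:

```python
# Optional: stream the generation token by token (reuses model/tokenizer/inputs above).
from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=512, streamer=streamer)
```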

---

**Fortytwo – An open, networked intelligence shaped collectively by its participants**
Join the swarm: [fortytwo.network](https://fortytwo.network)

X: [@fortytwonetwork](https://x.com/fortytwonetwork)