Can111 nielsr HF Staff committed on
Commit 00cef0a · verified · 1 Parent(s): c8077fa

Add usage example and explicit project page link (#3)


- Add usage example and explicit project page link (daeff8c8ac5e7d5447a6c8a10127b4fdcb531733)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +152 -102
README.md CHANGED
@@ -1,103 +1,153 @@
- ---
- base_model: Qwen/Qwen2.5-32B-Instruct
- language:
- - zho
- - eng
- - fra
- - spa
- - por
- - deu
- - ita
- - rus
- - jpn
- - kor
- - vie
- - tha
- - ara
- library_name: transformers
- license: apache-2.0
- tags:
- - multi-agent systems
- - multiagent-collaboration
- - reasoning
- - mathematics
- - code
- pipeline_tag: text-generation
- model-index:
- - name: m1-32b
-   results: []
- ---
-
- [Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning](https://arxiv.org/pdf/2504.09772)
-
- **M1-32B** is a 32B-parameter large language model fine-tuned from [Qwen2.5-32B-Instruct](https://arxiv.org/pdf/2412.15115) on the **M500** dataset—an interdisciplinary multi-agent collaborative reasoning dataset. M1-32B is optimized for improved reasoning, discussion, and decision-making in multi-agent systems (MAS), including frameworks such as [AgentVerse](https://github.com/OpenBMB/AgentVerse).
-
- Code: [https://github.com/jincan333/MAS-TTS](https://github.com/jincan333/MAS-TTS)
-
- ---
-
- ## 🚀 Key Features
-
- - 🧠 **Enhanced Collaborative Reasoning**
-   Trained on real multi-agent traces involving diverse roles like Expert Recruiter, Problem Solvers, and Evaluator.
-
- - 🗣️ **Role-Aware Dialogue Generation**
-   Learns to reason and respond from different expert perspectives based on structured prompts.
-
- - ⚙️ **Optimized for Multi-Agent Systems**
-   Performs well as a MAS agent with adaptive collaboration and token budgeting.
-
- ---
-
- ## 🏗️ Model Training
-
- - **Base Model:** Qwen2.5-32B-Instruct
- - **Dataset:** [M500](https://huggingface.co/datasets/Can111/M500) (500 curated multi-agent reasoning traces)
- - **Objective:** Supervised Fine-Tuning (SFT) on role-conditioned prompts
- - **Training Setup:**
-   - 8 × A100 GPUs
-   - 5 epochs
-   - Learning rate: 1e-5
-   - Frameworks: DeepSpeed, FlashAttention, LLaMA-Factory
-
- ---
-
- ## 📊 Performance
-
- | **Model** | **General Understanding** | | **Mathematical Reasoning** | | **Coding** | |
- |--------------------------|---------------------------|----------------|-----------------------------|------------|----------------|-----------|
- | | **GPQA** | **Commongen** | **AIME2024** | **MATH-500** | **HumanEval** | **MBPP-S**|
- | **Non-Reasoning Models** | | | | | | |
- | Qwen2.5 | 50.2 | 96.7 | 21.1 | 84.4 | 89.0 | 80.2 |
- | DeepSeek-V3 | **58.6** | **98.6** | **33.3** | **88.6** | 89.6 | 83.9 |
- | GPT-4o | 49.2 | 97.8 | 7.8 | 81.3 | **90.9** | **85.4** |
- | **Reasoning Models** | | | | | | |
- | s1.1-32B | 58.3 | 94.1 | 53.3 | 90.6 | 82.3 | 77.4 |
- | DeepSeek-R1 | **75.5** | 97.2 | 78.9 | **96.2** | **98.2** | 91.7 |
- | o3-mini | 71.3 | **99.1** | **84.4** | 95.3 | 97.0 | **93.6** |
- | M1-32B (Ours) | 61.1 | 96.9 | 60.0 | 95.1 | 92.8 | 89.1 |
- | M1-32B w. CEO (Ours) | 62.1 | 97.4 | 62.2 | 95.8 | 93.9 | 90.5 |
-
- **Table Caption:**
- Performance comparison on general understanding, mathematical reasoning, and coding tasks using strong reasoning and non-reasoning models within the AgentVerse framework. Our method achieves substantial improvements over Qwen2.5 and s1.1-32B on all tasks, and attains performance comparable to o3-mini and DeepSeek-R1 on MATH-500 and MBPP-S, demonstrating its effectiveness in enhancing collaborative reasoning in MAS. Note that the results of s1.1-32B are obtained without using budget forcing.
-
- ---
-
- ## 💬 Intended Use
-
- M1-32B is intended for research on multi-agent reasoning and collaboration in MAS.
-
- ---
-
- ## Citation
-
- If you use this model, please cite the relevant paper:
-
- ```bibtex
- @article{jin2025two,
-   title={Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning},
-   author={Jin, Can and Peng, Hongwu and Zhang, Qixin and Tang, Yujin and Metaxas, Dimitris N and Che, Tong},
-   journal={arXiv preprint arXiv:2504.09772},
-   year={2025}
- }
  ```
 
+ ---
+ base_model: Qwen/Qwen2.5-32B-Instruct
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ library_name: transformers
+ license: apache-2.0
+ pipeline_tag: text-generation
+ tags:
+ - multi-agent systems
+ - multiagent-collaboration
+ - reasoning
+ - mathematics
+ - code
+ model-index:
+ - name: m1-32b
+   results: []
+ ---
+
+ [Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning](https://arxiv.org/pdf/2504.09772)
+
+ **M1-32B** is a 32B-parameter large language model fine-tuned from [Qwen2.5-32B-Instruct](https://arxiv.org/pdf/2412.15115) on the **M500** dataset—an interdisciplinary multi-agent collaborative reasoning dataset. M1-32B is optimized for improved reasoning, discussion, and decision-making in multi-agent systems (MAS), including frameworks such as [AgentVerse](https://github.com/OpenBMB/AgentVerse).
+
+ Code: [https://github.com/jincan333/MAS-TTS](https://github.com/jincan333/MAS-TTS)
+ Project page: [https://github.com/jincan333/MAS-TTS](https://github.com/jincan333/MAS-TTS)
+
+ ---
+
+ ## How to Use with 🤗 Transformers
+
+ You can use this model directly with the `transformers` library for text generation.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "Can111/m1-32b"
+
+ # Load tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # Use bfloat16 for optimal performance if supported
+     device_map="auto"            # Automatically distribute model across available devices
+ )
+ model.eval()  # Set model to evaluation mode
+
+ # Define your conversation messages
+ messages = [
+     {"role": "user", "content": "Explain multi-agent collaborative reasoning and its benefits."},
+ ]
+
+ # Apply chat template and tokenize inputs
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # Generate response
+ generated_ids = model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=256,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.9
+ )
+
+ # Decode and print the generated text
+ decoded_output = tokenizer.batch_decode(generated_ids[:, model_inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
+ print(decoded_output)
+ ```
+
+ ---
+
+ ## 🚀 Key Features
+
+ - 🧠 **Enhanced Collaborative Reasoning**
+   Trained on real multi-agent traces involving diverse roles like Expert Recruiter, Problem Solvers, and Evaluator.
+
+ - 🗣️ **Role-Aware Dialogue Generation**
+   Learns to reason and respond from different expert perspectives based on structured prompts (see the prompt sketch after this list).
+
+ - ⚙️ **Optimized for Multi-Agent Systems**
+   Performs well as a MAS agent with adaptive collaboration and token budgeting.
+
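+ The role-aware behavior described above can be exercised directly through the chat template. The sketch below is a minimal, hypothetical example that reuses the `tokenizer` and `model` loaded in the usage section; the "Expert Recruiter" system prompt and the task are illustrative assumptions, not excerpts from the M500 data.
+
+ ```python
+ # Hypothetical role-conditioned prompt: the system message assigns an expert role.
+ messages = [
+     {"role": "system", "content": "You are the Expert Recruiter in a multi-agent team. Decide which experts should be recruited to solve the task."},
+     {"role": "user", "content": "Task: prove that the sum of two odd integers is even."},
+ ]
+
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # Generate the role-conditioned response and strip the prompt tokens before decoding.
+ output_ids = model.generate(inputs.input_ids, max_new_tokens=256)
+ response = tokenizer.batch_decode(output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0]
+ print(response)
+ ```
+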
+ ---
+
+ ## 🏗️ Model Training
+
+ - **Base Model:** Qwen2.5-32B-Instruct
+ - **Dataset:** [M500](https://huggingface.co/datasets/Can111/M500) (500 curated multi-agent reasoning traces; see the loading sketch after this list)
+ - **Objective:** Supervised Fine-Tuning (SFT) on role-conditioned prompts
+ - **Training Setup:**
+   - 8 × A100 GPUs
+   - 5 epochs
+   - Learning rate: 1e-5
+   - Frameworks: DeepSpeed, FlashAttention, LLaMA-Factory
+
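+ To inspect the SFT data referenced above, the M500 traces can be pulled from the Hub with the `datasets` library. This is a minimal sketch: the `train` split name is an assumption, so print a record to check the actual splits and fields.
+
+ ```python
+ from datasets import load_dataset
+
+ # Load the M500 multi-agent reasoning traces from the Hugging Face Hub.
+ # "train" is an assumed split name; adjust it if the dataset card lists another.
+ m500 = load_dataset("Can111/M500", split="train")
+
+ print(m500)     # number of traces and column names
+ print(m500[0])  # one raw multi-agent reasoning trace
+ ```
+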
+ ---
+
+ ## 📊 Performance
+
+ | **Model** | **General Understanding** | | **Mathematical Reasoning** | | **Coding** | |
+ |--------------------------|---------------------------|----------------|-----------------------------|------------|----------------|-----------|
+ | | **GPQA** | **Commongen** | **AIME2024** | **MATH-500** | **HumanEval** | **MBPP-S**|
+ | **Non-Reasoning Models** | | | | | | |
+ | Qwen2.5 | 50.2 | 96.7 | 21.1 | 84.4 | 89.0 | 80.2 |
+ | DeepSeek-V3 | **58.6** | **98.6** | **33.3** | **88.6** | 89.6 | 83.9 |
+ | GPT-4o | 49.2 | 97.8 | 7.8 | 81.3 | **90.9** | **85.4** |
+ | **Reasoning Models** | | | | | | |
+ | s1.1-32B | 58.3 | 94.1 | 53.3 | 90.6 | 82.3 | 77.4 |
+ | DeepSeek-R1 | **75.5** | 97.2 | 78.9 | **96.2** | **98.2** | 91.7 |
+ | o3-mini | 71.3 | **99.1** | **84.4** | 95.3 | 97.0 | **93.6** |
+ | M1-32B (Ours) | 61.1 | 96.9 | 60.0 | 95.1 | 92.8 | 89.1 |
+ | M1-32B w. CEO (Ours) | 62.1 | 97.4 | 62.2 | 95.8 | 93.9 | 90.5 |
+
+ **Table Caption:**
+ Performance comparison on general understanding, mathematical reasoning, and coding tasks using strong reasoning and non-reasoning models within the AgentVerse framework. Our method achieves substantial improvements over Qwen2.5 and s1.1-32B on all tasks, and attains performance comparable to o3-mini and DeepSeek-R1 on MATH-500 and MBPP-S, demonstrating its effectiveness in enhancing collaborative reasoning in MAS. Note that the results of s1.1-32B are obtained without using budget forcing.
+
+ ---
+
+ ## 💬 Intended Use
+
+ M1-32B is intended for research on multi-agent reasoning and collaboration in MAS.
+
+ ---
+
+ ## Citation
+
+ If you use this model, please cite the relevant paper:
+
+ ```bibtex
+ @article{jin2025two,
+   title={Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning},
+   author={Jin, Can and Peng, Hongwu and Zhang, Qixin and Tang, Yujin and Metaxas, Dimitris N and Che, Tong},
+   journal={arXiv preprint arXiv:2504.09772},
+   year={2025}
+ }
  ```