nielsr (HF Staff) committed · Commit 4a51188 (verified) · 1 parent: a1cff6a

Improve model card: Add pipeline tag, library name, GitHub link, and sample usage


This PR enhances the model card by:
- Adding the `pipeline_tag: image-text-to-text` to accurately describe the model's functionality (generating text/code from visual feedback and text instructions).
- Specifying `library_name: transformers` based on the `config.json` showing `Qwen2ForCausalLM` and `qwen2` model type, indicating compatibility with the Hugging Face Transformers library.
- Adding a direct link to the GitHub repository: https://github.com/mnluzimu/WebGen-Agent.
- Including a "Sample Usage" section with a `bash` code snippet for single inference, directly extracted from the project's GitHub README, to guide users on how to run the model.

Please review these additions and improvements.
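Taken together, the metadata changes in this PR leave the README front matter as follows (assembled directly from the diff below; field order as in the commit):

```yaml
---
base_model:
- Qwen/Qwen2.5-Coder-7B-Instruct
datasets:
- luzimu/webgen-agent_train_step-grpo
- luzimu/webgen-agent_train_sft
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
---
```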

Files changed (1): README.md (+36 −12)
README.md CHANGED
````diff
@@ -1,16 +1,20 @@
 ---
-license: mit
+base_model:
+- Qwen/Qwen2.5-Coder-7B-Instruct
 datasets:
 - luzimu/webgen-agent_train_step-grpo
 - luzimu/webgen-agent_train_sft
-base_model:
-- Qwen/Qwen2.5-Coder-7B-Instruct
+license: mit
+pipeline_tag: image-text-to-text
+library_name: transformers
 ---
 
 # WebGen-Agent
 
 WebGen-Agent is an advanced website generation agent designed to autonomously create websites from natural language instructions. It was introduced in the paper [WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning](https://arxiv.org/pdf/2509.22644v1).
 
+Code: https://github.com/mnluzimu/WebGen-Agent
+
 ## Project Overview
 
 WebGen-Agent combines state-of-the-art language models with specialized training techniques to create a powerful website generation tool. The agent can understand natural language instructions specifying appearance and functional requirements, iteratively generate website codebases, and refine them using visual and functional feedback.
@@ -35,26 +39,46 @@ Links to the data and model parameters are as follows:
 
 WebGen-Agent follows an iterative, multi-step paradigm for website generation:
 
-1. **Code Generation**: The agent generates code to create or edit website files based on natural language instructions
-2. **Code Execution**: Dependencies are installed and the website service is started
-3. **Feedback Gathering**:
-   - A screenshot of the website is captured
-   - A Visual Language Model (VLM) provides appearance feedback and scores
-   - A GUI-agent tests the website functionality and provides functional feedback
-4. **Refinement**: Based on the feedback, the agent continues to improve the website until it meets requirements
+1. **Code Generation**: The agent generates code to create or edit website files based on natural language instructions
+2. **Code Execution**: Dependencies are installed and the website service is started
+3. **Feedback Gathering**:
+   - A screenshot of the website is captured
+   - A Visual Language Model (VLM) provides appearance feedback and scores
+   - A GUI-agent tests the website functionality and provides functional feedback
+4. **Refinement**: Based on the feedback, the agent continues to improve the website until it meets requirements
 
 ![WebGen-Agent Workflow](fig/webgen-agent.png)
 
 ## Step-GRPO with Screenshot and GUI-agent Feedback
 
 The Step-GRPO with Screenshot and GUI-agent Feedback approach uses the screenshot and GUI-agent scores inherently produced in the WebGen-Agent workflow as step-level rewards:
-- **Screenshot Score**: Quantifies the visual appeal and aesthetics of the website
-- **GUI-agent Score**: Measures how well the website meets functional requirements
+- **Screenshot Score**: Quantifies the visual appeal and aesthetics of the website
+- **GUI-agent Score**: Measures how well the website meets functional requirements
 
 These dual rewards provide dense, reliable process supervision that significantly improves the model's ability to generate high-quality websites.
 
 ![Step-GRPO with Screenshot and GUI-agent Feedback](fig/step-grpo.png)
 
+## Sample Usage
+
+Before running inference, you should rename `.env.template` to `.env` and set the base urls and api keys for the agent-engine LLM and feedback VLM. They can be obtained from any openai-compatible providers such as [openrouter](https://openrouter.ai/), [modelscope](https://www.modelscope.cn/my/overview), [bailian](https://bailian.console.aliyun.com/#/home), and [llmprovider](https://llmprovider.ai/).
+
+You can also deploy open-source VLMs and LLMs by running `src/scripts/deploy_qwenvl_32b.sh` and `src/scripts/deploy.sh`. Scripts for single inference and batch inference can be found at `src/scripts/infer_single.sh` and `src/scripts/infer_batch.sh`.
+
+### Single Inference
+
+```bash
+python src/infer_single.py \
+    --model deepseek-chat \
+    --vlm_model Qwen/Qwen2.5-VL-32B-Instruct \
+    --instruction "Please implement a wheel of fortune website." \
+    --workspace-dir workspaces_root/test \
+    --log-dir service_logs/test \
+    --max-iter 20 \
+    --overwrite \
+    --error-limit 5
+```
+
 ## Citation
 
 If you find our project useful, please cite:
````
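For orientation, the iterative workflow the updated model card describes (code generation → execution → screenshot/VLM and GUI-agent feedback → refinement) can be sketched as a plain-Python loop. The stub functions, scores, and the convergence threshold below are hypothetical placeholders, not the project's actual implementation:

```python
# Schematic sketch of WebGen-Agent's generate -> execute -> feedback -> refine
# loop. All functions, scores, and the threshold are hypothetical stubs.

def generate_code(instruction, feedback):
    # Placeholder for the LLM call that creates or edits website files;
    # each round of feedback yields a new revision.
    return {"files": ["index.html"], "revision": len(feedback)}

def screenshot_score(codebase):
    # Placeholder for the VLM appearance score (0-10, higher is better).
    return min(10, 6 + codebase["revision"])

def gui_agent_score(codebase):
    # Placeholder for the GUI-agent functional score (0-10, higher is better).
    return min(10, 5 + 2 * codebase["revision"])

def run_agent(instruction, max_iter=20, threshold=9):
    feedback = []
    for step in range(max_iter):
        codebase = generate_code(instruction, feedback)       # 1. Code Generation
        appearance = screenshot_score(codebase)               # 3. Screenshot + VLM feedback
        functional = gui_agent_score(codebase)                # 3. GUI-agent feedback
        if appearance >= threshold and functional >= threshold:
            return codebase, step + 1                         # requirements met
        feedback.append((appearance, functional))             # 4. Refinement input
    return codebase, max_iter

site, iterations = run_agent("Please implement a wheel of fortune website.")
print(iterations)  # → 4
```

In the real agent, `generate_code` would call the LLM, step 2 (installing dependencies and starting the service) would run between generation and feedback, and the two scorers would invoke the feedback VLM and GUI-agent configured in `.env`.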