Update app.py
app.py CHANGED

```diff
@@ -463,11 +463,11 @@ def main():
 
 **🚨 Performance Warning**
 
-This demo is running on **CPU-only** mode. A single inference may take **
+This demo is running on **CPU-only** mode. A single inference may take **25-30 minutes** depending on the model and parameters.
 
 **Recommendations for faster inference:**
-- Use smaller models (Libra-v1.0-3B is faster than 7B models) The model has already been loaded ⏬
-- Please do not attempt to load other models, as this may cause a runtime error
+- Use smaller models (Libra-v1.0-3B is faster than 7B models) **The model has already been loaded** ⏬
+- Please do not attempt to load other models, as this may cause a **runtime error**: "Workload evicted, storage limit exceeded (50G)"
 - Reduce `Max New Tokens` to 64-128 (default: 128)
 - Disable baseline comparison
 - For GPU acceleration, please [run the demo locally](https://github.com/X-iZhang/CCD#gradio-web-interface)
```
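The "+" side of the hunk is display text that app.py presumably renders in the demo UI. A minimal sketch of how it could be held in the module, assuming a constant named `PERF_WARNING` (that name and the surrounding structure are assumptions; only the markdown text itself comes from the diff above):

```python
# Hypothetical sketch: the updated warning markdown stored as a module-level
# constant. The name PERF_WARNING is an assumption -- app.py may embed this
# text differently (e.g. inline in a UI call). The content matches the "+"
# side of the diff.
PERF_WARNING = """\
**🚨 Performance Warning**

This demo is running on **CPU-only** mode. A single inference may take **25-30 minutes** depending on the model and parameters.

**Recommendations for faster inference:**
- Use smaller models (Libra-v1.0-3B is faster than 7B models) **The model has already been loaded** ⏬
- Please do not attempt to load other models, as this may cause a **runtime error**: "Workload evicted, storage limit exceeded (50G)"
- Reduce `Max New Tokens` to 64-128 (default: 128)
- Disable baseline comparison
- For GPU acceleration, please [run the demo locally](https://github.com/X-iZhang/CCD#gradio-web-interface)
"""
```

In a Gradio app this string would typically be passed to a markdown component (e.g. `gr.Markdown(PERF_WARNING)`), but where exactly app.py renders it is not shown in this hunk.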