refactor(ui): separate model and provider selection
- [docs] Update `README` for new model/provider format and examples (README.md)
- [refactor] Remove `parse_model_and_provider` import (chat_handler.py:17)
- [refactor] Add `provider_override` parameter to `chat_respond` (chat_handler.py:32)
- [refactor] Enforce explicit provider selection in `chat_respond` (chat_handler.py:57-58)
- [refactor] Update `handle_chat_submit` and `handle_chat_retry` signatures to include `provider` (chat_handler.py:179,214)
- [ui] Modify `chat_model_name` textbox placeholder and info (ui_components.py:47-48)
- [ui] Add `chat_provider` dropdown for explicit provider selection (ui_components.py:50-55)
- [ui] Update `chat_submit.click`, `chat_input.submit`, and `chatbot_display.retry` inputs with `chat_provider` (ui_components.py:85,98,122)
- [docs] Update `create_chat_tips` and `create_footer` markdown for provider selection (ui_components.py:137-159,663)
- [remove] Delete `parse_model_and_provider` function (utils.py:196-204)
- [refactor] Consolidate and expand provider lists into `PROVIDERS_UNIFIED` (utils.py:41-63)
- README.md +9 -10
- chat_handler.py +10 -7
- ui_components.py +17 -16
- utils.py +22 -13
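
The net effect: the provider moves out of the model string (`model:provider`) and into a dedicated dropdown that defaults to `auto`. A minimal sketch of the before/after resolution logic; `resolve_provider` is not a function in the codebase, it just isolates the one-line fallback that `chat_respond` now performs:

```python
# Before this commit: the provider was split out of the model string.
def parse_model_and_provider(model_name):
    """Removed in this commit (see the utils.py diff below)."""
    if ":" in model_name:
        model, provider = model_name.split(":", 1)
        return model, provider
    return model_name, None

# After: the textbox carries only the model id; the dropdown value is used
# as-is and falls back to "auto" when empty (hypothetical helper name).
def resolve_provider(provider_override):
    return provider_override or "auto"

assert parse_model_and_provider("openai/gpt-oss-20b:fireworks-ai") == ("openai/gpt-oss-20b", "fireworks-ai")
assert resolve_provider(None) == "auto"
assert resolve_provider("groq") == "groq"
```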
**README.md**

````diff
@@ -70,7 +70,7 @@ The app requires:
 4. **Automatic Rotation**: HF-Inferoxy handles token rotation and error management
 
 ### Chat Assistant
-1. **Model Selection**: Choose any HuggingFace model
+1. **Model Selection**: Choose any HuggingFace model and select a provider from the dropdown (default: Auto)
 2. **Conversation**: Engage in natural conversations with streaming responses
 3. **Customization**: Adjust the AI's personality with system messages and parameters
 
@@ -123,16 +123,11 @@ The application automatically works with all Hugging Face inference providers:
 
 ### 💡 How It Works
 
-1. **Model Format**: …
-2. **…
+1. **Model Format**: Enter the model name only (e.g., `openai/gpt-oss-20b`)
+2. **Provider**: Select the provider from the dropdown (default: Auto)
 3. **Fallback System**: If one provider fails, the system automatically tries alternatives
 4. **Token Management**: HF-Inferoxy handles token rotation and quota management automatically
 
-**Examples:**
-- `openai/gpt-oss-20b` (auto provider selection)
-- `openai/gpt-oss-20b:fireworks-ai` (specific provider)
-- `Qwen/Qwen-Image:fal-ai` (image model with specific provider)
-
 ## 🎨 Usage Examples
 
 ### Chat Assistant
@@ -147,9 +142,11 @@ The application automatically works with all Hugging Face inference providers:
 ```
 # Auto provider (default - let HF choose best)
 Model Name: openai/gpt-oss-20b
+Provider: auto
 
 # Specific provider
-Model Name: openai/gpt-oss-20b:fireworks-ai
+Model Name: openai/gpt-oss-20b
+Provider: fireworks-ai
 System Message: You are a helpful coding assistant specializing in Python.
 ```
 
@@ -223,10 +220,12 @@ System Message: You are a helpful coding assistant specializing in Python.
 ```
 # Using auto provider (default)
 Model: openai/gpt-oss-20b
+Provider: auto
 Prompt: "Explain quantum computing in simple terms"
 
 # Using specific provider
-Model: openai/gpt-oss-20b:fireworks-ai
+Model: openai/gpt-oss-20b
+Provider: fireworks-ai
 Prompt: "Help me debug this Python code: [paste code]"
 
 # Other example prompts:
````
**chat_handler.py**

````diff
@@ -14,7 +14,6 @@ from requests.exceptions import ConnectionError, Timeout, RequestException
 from hf_token_utils import get_proxy_token, report_token_status
 from utils import (
     validate_proxy_key,
-    parse_model_and_provider,
     format_error_message,
     check_org_access,
     format_access_denied_message,
@@ -30,6 +29,7 @@ def chat_respond(
     history: list[dict[str, str]],
     system_message,
     model_name,
+    provider_override,
     max_tokens,
     temperature,
     top_p,
@@ -52,8 +52,9 @@ def chat_respond(
         token, token_id = get_proxy_token(api_key=proxy_api_key)
         print(f"✅ Chat: Got token: {token_id}")
 
-        # Parse model and provider
-        model, provider = parse_model_and_provider(model_name)
+        # Enforce explicit provider selection via dropdown
+        model = model_name
+        provider = provider_override or "auto"
 
         print(f"🤖 Chat: Using model='{model}', provider='{provider if provider else 'auto'}'")
@@ -168,14 +169,14 @@ def chat_respond(
             yield format_error_message("Unexpected Error", f"An unexpected error occurred: {error_msg}")
 
 
-def handle_chat_submit(message, history, system_msg, model_name, max_tokens, temperature, top_p, hf_token: gr.OAuthToken = None):
+def handle_chat_submit(message, history, system_msg, model_name, provider, max_tokens, temperature, top_p, hf_token: gr.OAuthToken = None):
     """
     Handle chat submission and manage conversation history with streaming.
     """
     if not message.strip():
         yield history, ""
         return
-
+
     # Enforce org-based access control via HF OAuth token
     access_token = getattr(hf_token, "token", None) if hf_token is not None else None
     is_allowed, access_msg, _username, _matched = check_org_access(access_token)
@@ -194,7 +195,8 @@ def handle_chat_submit(message, history, system_msg, model_name, max_tokens, tem
             message,
             history[:-1],  # Don't include the current message in history for the function
             system_msg,
-            model_name,
+            model_name,
+            provider,
             max_tokens,
             temperature,
             top_p
@@ -209,7 +211,7 @@ def handle_chat_submit(message, history, system_msg, model_name, max_tokens, tem
         yield current_history, ""
 
 
-def handle_chat_retry(history, system_msg, model_name, max_tokens, temperature, top_p, hf_token: gr.OAuthToken = None, retry_data=None):
+def handle_chat_retry(history, system_msg, model_name, provider, max_tokens, temperature, top_p, hf_token: gr.OAuthToken = None, retry_data=None):
     """
     Retry the assistant response for the selected message.
     Works with gr.Chatbot.retry() which provides retry_data.index for the message.
@@ -268,6 +270,7 @@ def handle_chat_retry(history, system_msg, model_name, max_tokens, temperature,
             prior_history,
             system_msg,
             model_name,
+            provider,
             max_tokens,
             temperature,
             top_p
````
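The hunks above change how `model` and `provider` are derived but not the inference call itself, which sits outside the diff context. For orientation, a hedged sketch of how an explicit provider typically reaches `huggingface_hub.InferenceClient` (the actual call site in `chat_respond` may differ):

```python
from huggingface_hub import InferenceClient

def stream_chat(model, provider, token, system_message, user_message,
                max_tokens, temperature, top_p):
    """Illustrative only: mirrors chat_respond's inputs, not its exact code."""
    # Recent huggingface_hub releases accept provider=..., including "auto".
    client = InferenceClient(provider=provider, api_key=token)
    stream = client.chat_completion(
        model=model,  # e.g. "openai/gpt-oss-20b"
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        stream=True,
    )
    for chunk in stream:
        # Each streamed chunk carries an incremental delta of the reply.
        yield chunk.choices[0].delta.content or ""
```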
**ui_components.py**

````diff
@@ -44,7 +44,14 @@ def create_chat_tab(handle_chat_submit_fn, handle_chat_retry_fn=None):
         chat_model_name = gr.Textbox(
             value=DEFAULT_CHAT_MODEL,
             label="Model Name",
-            placeholder="e.g., openai/gpt-oss-20b or openai/gpt-oss-20b:fireworks-ai"
+            placeholder="e.g., openai/gpt-oss-20b (provider via dropdown)",
+            info="Do not include :provider in model name"
+        )
+        chat_provider = gr.Dropdown(
+            choices=IMAGE_PROVIDERS,
+            value="auto",
+            label="Provider",
+            interactive=True
         )
         chat_system_message = gr.Textbox(
             value=CHAT_CONFIG["system_message"],
@@ -82,7 +89,7 @@ def create_chat_tab(handle_chat_submit_fn, handle_chat_retry_fn=None):
     chat_send_event = chat_submit.click(
         fn=handle_chat_submit_fn,
         inputs=[chat_input, chatbot_display, chat_system_message, chat_model_name,
-                chat_max_tokens, chat_temperature, chat_top_p],
+                chat_provider, chat_max_tokens, chat_temperature, chat_top_p],
         outputs=[chatbot_display, chat_input]
     )
 
@@ -97,7 +104,7 @@ def create_chat_tab(handle_chat_submit_fn, handle_chat_retry_fn=None):
     chat_enter_event = chat_input.submit(
         fn=handle_chat_submit_fn,
         inputs=[chat_input, chatbot_display, chat_system_message, chat_model_name,
-                chat_max_tokens, chat_temperature, chat_top_p],
+                chat_provider, chat_max_tokens, chat_temperature, chat_top_p],
         outputs=[chatbot_display, chat_input]
     )
 
@@ -119,7 +126,7 @@ def create_chat_tab(handle_chat_submit_fn, handle_chat_retry_fn=None):
     chatbot_display.retry(
         fn=handle_chat_retry_fn,
         inputs=[chatbot_display, chat_system_message, chat_model_name,
-                chat_max_tokens, chat_temperature, chat_top_p],
+                chat_provider, chat_max_tokens, chat_temperature, chat_top_p],
         outputs=chatbot_display
     )
 
@@ -132,8 +139,8 @@ def create_chat_tips():
     ### 💡 Chat Tips
 
     **Model Format:**
-    - …
-    - …
+    - Model only: `openai/gpt-oss-20b`
+    - Select provider via the Provider dropdown (default: `auto`)
 
     **Popular Models:**
     - `openai/gpt-oss-20b` - Fast general purpose
@@ -146,16 +153,10 @@ def create_chat_tips():
     gr.Markdown("""
     ### 🌐 Popular Providers
 
-    - …
-    - **fireworks-ai** - Fast and reliable
-    - **cerebras** - High performance
-    - **groq** - Ultra-fast inference
-    - **together** - Wide model support
-    - **cohere** - Advanced language models
+    - Select from dropdown. Default is **auto**.
 
-    **Format:**
-    - `openai/gpt-oss-20b` (auto provider selection)
-    - `openai/gpt-oss-20b:fireworks-ai` (specific provider)
+    **Example:**
+    - Model: `openai/gpt-oss-20b`, Provider: `groq`
     """)
 
 
@@ -662,7 +663,7 @@ def create_footer():
 
     **Chat Tab:**
    - Enter your message and customize the AI's behavior with system messages
-    - …
+    - Enter model and select provider from the dropdown (default: `auto`)
     - Adjust temperature for creativity and top-p for response diversity
 
     **Image Tab:**
````
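The wiring pattern above, a `gr.Dropdown` passed as one more entry in each event's `inputs=` list, is standard Gradio. A self-contained toy version (component names echo the diff; the handler is a stub, not the app's `handle_chat_submit`):

```python
import gradio as gr

PROVIDERS = ["auto", "fireworks-ai", "groq"]  # abbreviated stand-in list

def respond(message, model_name, provider):
    # Stub handler: shows how the dropdown value arrives as a plain argument.
    return f"[{model_name} via {provider or 'auto'}] {message}"

with gr.Blocks() as demo:
    chat_model_name = gr.Textbox(value="openai/gpt-oss-20b", label="Model Name")
    chat_provider = gr.Dropdown(choices=PROVIDERS, value="auto", label="Provider")
    chat_input = gr.Textbox(label="Message")
    output = gr.Textbox(label="Response")
    send = gr.Button("Send")
    # The dropdown is just another input component, exactly as in the diff.
    send.click(fn=respond,
               inputs=[chat_input, chat_model_name, chat_provider],
               outputs=output)

if __name__ == "__main__":
    demo.launch()
```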
**utils.py**

````diff
@@ -35,9 +35,28 @@ IMAGE_CONFIG = {
     "negative_prompt": "blurry, low quality, distorted, deformed, ugly, bad anatomy"
 }
 
-# Supported providers
-CHAT_PROVIDERS = [...]
-IMAGE_PROVIDERS = [...]
+# Supported providers (unified across tasks)
+PROVIDERS_UNIFIED = [
+    "auto",
+    "cerebras",
+    "cohere",
+    "fal-ai",
+    "featherless-ai",
+    "fireworks-ai",
+    "groq",
+    "hf-inference",
+    "hyperbolic",
+    "nebius",
+    "novita",
+    "nscale",
+    "replicate",
+    "sambanova",
+    "together",
+]
+
+# Backwards compatibility exported lists
+CHAT_PROVIDERS = PROVIDERS_UNIFIED
+IMAGE_PROVIDERS = PROVIDERS_UNIFIED
 
 # Popular models for quick access
 POPULAR_CHAT_MODELS = [
@@ -196,16 +215,6 @@ def validate_proxy_url():
     return True, ""
 
 
-def parse_model_and_provider(model_name):
-    """
-    Parse model name and provider from a string like 'model:provider'.
-    Returns (model, provider) tuple. Provider is None if not specified.
-    """
-    if ":" in model_name:
-        model, provider = model_name.split(":", 1)
-        return model, provider
-    else:
-        return model_name, None
 
 
 def format_error_message(error_type, error_message):
````
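
One consequence of the consolidation worth flagging: `CHAT_PROVIDERS` and `IMAGE_PROVIDERS` are aliases of the same list object, not copies, so any later task-specific mutation would leak across both names (use `list(PROVIDERS_UNIFIED)` to diverge). A quick check, assuming `utils` is importable:

```python
from utils import PROVIDERS_UNIFIED, CHAT_PROVIDERS, IMAGE_PROVIDERS

# All three names point at one list object.
assert CHAT_PROVIDERS is PROVIDERS_UNIFIED
assert IMAGE_PROVIDERS is PROVIDERS_UNIFIED
assert CHAT_PROVIDERS[0] == "auto"  # the dropdown default
```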