refactor(ui): separate model and provider selection
- [docs] Update `README` for new model/provider format and examples (README.md)
- [refactor] Remove `parse_model_and_provider` import (chat_handler.py:17)
- [refactor] Add `provider_override` parameter to `chat_respond` (chat_handler.py:32)
- [refactor] Enforce explicit provider selection in `chat_respond` (chat_handler.py:57-58)
- [refactor] Update `handle_chat_submit` and `handle_chat_retry` signatures to include `provider` (chat_handler.py:179,214)
- [ui] Modify `chat_model_name` textbox placeholder and info (ui_components.py:47-48)
- [ui] Add `chat_provider` dropdown for explicit provider selection (ui_components.py:50-55)
- [ui] Update `chat_submit.click`, `chat_input.submit`, and `chatbot_display.retry` inputs with `chat_provider` (ui_components.py:85,98,122)
- [docs] Update `create_chat_tips` and `create_footer` markdown for provider selection (ui_components.py:137-159,663)
- [remove] Delete `parse_model_and_provider` function (utils.py:196-204)
- [refactor] Consolidate and expand provider lists into `PROVIDERS_UNIFIED` (utils.py:41-63)
- README.md +9 -10
- chat_handler.py +10 -7
- ui_components.py +17 -16
- utils.py +22 -13
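
The net effect: the provider moves out of the model string (`model:provider`) and into a dedicated dropdown that defaults to `auto`. A minimal sketch of the before/after resolution logic; `resolve_provider` is not a function in the codebase, it just isolates the one-line fallback that `chat_respond` now performs:

```python
# Before this commit: the provider was split out of the model string.
def parse_model_and_provider(model_name):
    """Removed in this commit (see the utils.py diff below)."""
    if ":" in model_name:
        model, provider = model_name.split(":", 1)
        return model, provider
    return model_name, None

# After: the textbox carries only the model id; the dropdown value is used
# as-is and falls back to "auto" when empty (hypothetical helper name).
def resolve_provider(provider_override):
    return provider_override or "auto"

assert parse_model_and_provider("openai/gpt-oss-20b:fireworks-ai") == ("openai/gpt-oss-20b", "fireworks-ai")
assert resolve_provider(None) == "auto"
assert resolve_provider("groq") == "groq"
```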
**README.md**

````diff
@@ -70,7 +70,7 @@ The app requires:
 4. **Automatic Rotation**: HF-Inferoxy handles token rotation and error management
 
 ### Chat Assistant
-1. **Model Selection**: Choose any HuggingFace model
+1. **Model Selection**: Choose any HuggingFace model and select a provider from the dropdown (default: Auto)
 2. **Conversation**: Engage in natural conversations with streaming responses
 3. **Customization**: Adjust the AI's personality with system messages and parameters
 
@@ -123,16 +123,11 @@ The application automatically works with all Hugging Face inference providers:
 
 ### 💡 How It Works
 
-1. **Model Format**: …
-2. **…
+1. **Model Format**: Enter the model name only (e.g., `openai/gpt-oss-20b`)
+2. **Provider**: Select the provider from the dropdown (default: Auto)
 3. **Fallback System**: If one provider fails, the system automatically tries alternatives
 4. **Token Management**: HF-Inferoxy handles token rotation and quota management automatically
 
-**Examples:**
-- `openai/gpt-oss-20b` (auto provider selection)
-- `openai/gpt-oss-20b:fireworks-ai` (specific provider)
-- `Qwen/Qwen-Image:fal-ai` (image model with specific provider)
-
 ## 🎨 Usage Examples
 
 ### Chat Assistant
@@ -147,9 +142,11 @@ The application automatically works with all Hugging Face inference providers:
 ```
 # Auto provider (default - let HF choose best)
 Model Name: openai/gpt-oss-20b
+Provider: auto
 
 # Specific provider
-Model Name: openai/gpt-oss-20b:fireworks-ai
+Model Name: openai/gpt-oss-20b
+Provider: fireworks-ai
 System Message: You are a helpful coding assistant specializing in Python.
 ```
 
@@ -223,10 +220,12 @@ System Message: You are a helpful coding assistant specializing in Python.
 ```
 # Using auto provider (default)
 Model: openai/gpt-oss-20b
+Provider: auto
 Prompt: "Explain quantum computing in simple terms"
 
 # Using specific provider
-Model: openai/gpt-oss-20b:fireworks-ai
+Model: openai/gpt-oss-20b
+Provider: fireworks-ai
 Prompt: "Help me debug this Python code: [paste code]"
 
 # Other example prompts:
````
**chat_handler.py**

````diff
@@ -14,7 +14,6 @@ from requests.exceptions import ConnectionError, Timeout, RequestException
 from hf_token_utils import get_proxy_token, report_token_status
 from utils import (
     validate_proxy_key,
-    parse_model_and_provider,
     format_error_message,
     check_org_access,
     format_access_denied_message,
@@ -30,6 +29,7 @@ def chat_respond(
     history: list[dict[str, str]],
     system_message,
     model_name,
+    provider_override,
     max_tokens,
     temperature,
     top_p,
@@ -52,8 +52,9 @@ def chat_respond(
         token, token_id = get_proxy_token(api_key=proxy_api_key)
         print(f"✅ Chat: Got token: {token_id}")
 
-        # Parse model and provider
-        model, provider = parse_model_and_provider(model_name)
+        # Enforce explicit provider selection via dropdown
+        model = model_name
+        provider = provider_override or "auto"
 
         print(f"🤖 Chat: Using model='{model}', provider='{provider if provider else 'auto'}'")
@@ -168,14 +169,14 @@ def chat_respond(
             yield format_error_message("Unexpected Error", f"An unexpected error occurred: {error_msg}")
 
 
-def handle_chat_submit(message, history, system_msg, model_name, max_tokens, temperature, top_p, hf_token: gr.OAuthToken = None):
+def handle_chat_submit(message, history, system_msg, model_name, provider, max_tokens, temperature, top_p, hf_token: gr.OAuthToken = None):
     """
     Handle chat submission and manage conversation history with streaming.
     """
     if not message.strip():
         yield history, ""
         return
-
+
     # Enforce org-based access control via HF OAuth token
     access_token = getattr(hf_token, "token", None) if hf_token is not None else None
     is_allowed, access_msg, _username, _matched = check_org_access(access_token)
@@ -194,7 +195,8 @@ def handle_chat_submit(message, history, system_msg, model_name, max_tokens, tem
             message,
             history[:-1],  # Don't include the current message in history for the function
             system_msg,
-            model_name,
+            model_name,
+            provider,
             max_tokens,
             temperature,
             top_p
@@ -209,7 +211,7 @@ def handle_chat_submit(message, history, system_msg, model_name, max_tokens, tem
         yield current_history, ""
 
 
-def handle_chat_retry(history, system_msg, model_name, max_tokens, temperature, top_p, hf_token: gr.OAuthToken = None, retry_data=None):
+def handle_chat_retry(history, system_msg, model_name, provider, max_tokens, temperature, top_p, hf_token: gr.OAuthToken = None, retry_data=None):
     """
     Retry the assistant response for the selected message.
     Works with gr.Chatbot.retry() which provides retry_data.index for the message.
@@ -268,6 +270,7 @@ def handle_chat_retry(history, system_msg, model_name, max_tokens, temperature,
             prior_history,
             system_msg,
             model_name,
+            provider,
             max_tokens,
             temperature,
             top_p
````
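The hunks above change how `model` and `provider` are derived but not the inference call itself, which sits outside the diff context. For orientation, a hedged sketch of how an explicit provider typically reaches `huggingface_hub.InferenceClient` (the actual call site in `chat_respond` may differ):

```python
from huggingface_hub import InferenceClient

def stream_chat(model, provider, token, system_message, user_message,
                max_tokens, temperature, top_p):
    """Illustrative only: mirrors chat_respond's inputs, not its exact code."""
    # Recent huggingface_hub releases accept provider=..., including "auto".
    client = InferenceClient(provider=provider, api_key=token)
    stream = client.chat_completion(
        model=model,  # e.g. "openai/gpt-oss-20b"
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        stream=True,
    )
    for chunk in stream:
        # Each streamed chunk carries an incremental delta of the reply.
        yield chunk.choices[0].delta.content or ""
```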
**ui_components.py**

````diff
@@ -44,7 +44,14 @@ def create_chat_tab(handle_chat_submit_fn, handle_chat_retry_fn=None):
         chat_model_name = gr.Textbox(
             value=DEFAULT_CHAT_MODEL,
             label="Model Name",
-            placeholder="e.g., openai/gpt-oss-20b or openai/gpt-oss-20b:fireworks-ai"
+            placeholder="e.g., openai/gpt-oss-20b (provider via dropdown)",
+            info="Do not include :provider in model name"
+        )
+        chat_provider = gr.Dropdown(
+            choices=IMAGE_PROVIDERS,
+            value="auto",
+            label="Provider",
+            interactive=True
         )
         chat_system_message = gr.Textbox(
             value=CHAT_CONFIG["system_message"],
@@ -82,7 +89,7 @@ def create_chat_tab(handle_chat_submit_fn, handle_chat_retry_fn=None):
     chat_send_event = chat_submit.click(
         fn=handle_chat_submit_fn,
         inputs=[chat_input, chatbot_display, chat_system_message, chat_model_name,
-                chat_max_tokens, chat_temperature, chat_top_p],
+                chat_provider, chat_max_tokens, chat_temperature, chat_top_p],
         outputs=[chatbot_display, chat_input]
     )
 
@@ -97,7 +104,7 @@ def create_chat_tab(handle_chat_submit_fn, handle_chat_retry_fn=None):
     chat_enter_event = chat_input.submit(
         fn=handle_chat_submit_fn,
         inputs=[chat_input, chatbot_display, chat_system_message, chat_model_name,
-                chat_max_tokens, chat_temperature, chat_top_p],
+                chat_provider, chat_max_tokens, chat_temperature, chat_top_p],
         outputs=[chatbot_display, chat_input]
     )
 
@@ -119,7 +126,7 @@ def create_chat_tab(handle_chat_submit_fn, handle_chat_retry_fn=None):
     chatbot_display.retry(
         fn=handle_chat_retry_fn,
         inputs=[chatbot_display, chat_system_message, chat_model_name,
-                chat_max_tokens, chat_temperature, chat_top_p],
+                chat_provider, chat_max_tokens, chat_temperature, chat_top_p],
         outputs=chatbot_display
     )
 
@@ -132,8 +139,8 @@ def create_chat_tips():
     ### 💡 Chat Tips
 
     **Model Format:**
-    - …
-    - …
+    - Model only: `openai/gpt-oss-20b`
+    - Select provider via the Provider dropdown (default: `auto`)
 
     **Popular Models:**
     - `openai/gpt-oss-20b` - Fast general purpose
@@ -146,16 +153,10 @@ def create_chat_tips():
     gr.Markdown("""
     ### 🌐 Popular Providers
 
-    - …
-    - **fireworks-ai** - Fast and reliable
-    - **cerebras** - High performance
-    - **groq** - Ultra-fast inference
-    - **together** - Wide model support
-    - **cohere** - Advanced language models
+    - Select from dropdown. Default is **auto**.
 
-    **Format:**
-    - `openai/gpt-oss-20b` (auto provider selection)
-    - `openai/gpt-oss-20b:fireworks-ai` (specific provider)
+    **Example:**
+    - Model: `openai/gpt-oss-20b`, Provider: `groq`
     """)
 
 
@@ -662,7 +663,7 @@ def create_footer():
 
     **Chat Tab:**
    - Enter your message and customize the AI's behavior with system messages
-    - …
+    - Enter model and select provider from the dropdown (default: `auto`)
     - Adjust temperature for creativity and top-p for response diversity
 
     **Image Tab:**
````
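The wiring pattern above, a `gr.Dropdown` passed as one more entry in each event's `inputs=` list, is standard Gradio. A self-contained toy version (component names echo the diff; the handler is a stub, not the app's `handle_chat_submit`):

```python
import gradio as gr

PROVIDERS = ["auto", "fireworks-ai", "groq"]  # abbreviated stand-in list

def respond(message, model_name, provider):
    # Stub handler: shows how the dropdown value arrives as a plain argument.
    return f"[{model_name} via {provider or 'auto'}] {message}"

with gr.Blocks() as demo:
    chat_model_name = gr.Textbox(value="openai/gpt-oss-20b", label="Model Name")
    chat_provider = gr.Dropdown(choices=PROVIDERS, value="auto", label="Provider")
    chat_input = gr.Textbox(label="Message")
    output = gr.Textbox(label="Response")
    send = gr.Button("Send")
    # The dropdown is just another input component, exactly as in the diff.
    send.click(fn=respond,
               inputs=[chat_input, chat_model_name, chat_provider],
               outputs=output)

if __name__ == "__main__":
    demo.launch()
```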
**utils.py**

````diff
@@ -35,9 +35,28 @@ IMAGE_CONFIG = {
     "negative_prompt": "blurry, low quality, distorted, deformed, ugly, bad anatomy"
 }
 
-# Supported providers
-CHAT_PROVIDERS = [...]
-IMAGE_PROVIDERS = [...]
+# Supported providers (unified across tasks)
+PROVIDERS_UNIFIED = [
+    "auto",
+    "cerebras",
+    "cohere",
+    "fal-ai",
+    "featherless-ai",
+    "fireworks-ai",
+    "groq",
+    "hf-inference",
+    "hyperbolic",
+    "nebius",
+    "novita",
+    "nscale",
+    "replicate",
+    "sambanova",
+    "together",
+]
+
+# Backwards compatibility exported lists
+CHAT_PROVIDERS = PROVIDERS_UNIFIED
+IMAGE_PROVIDERS = PROVIDERS_UNIFIED
 
 # Popular models for quick access
 POPULAR_CHAT_MODELS = [
@@ -196,16 +215,6 @@ def validate_proxy_url():
     return True, ""
 
 
-def parse_model_and_provider(model_name):
-    """
-    Parse model name and provider from a string like 'model:provider'.
-    Returns (model, provider) tuple. Provider is None if not specified.
-    """
-    if ":" in model_name:
-        model, provider = model_name.split(":", 1)
-        return model, provider
-    else:
-        return model_name, None
 
 
 def format_error_message(error_type, error_message):
````
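
One consequence of the consolidation worth flagging: `CHAT_PROVIDERS` and `IMAGE_PROVIDERS` are aliases of the same list object, not copies, so any later task-specific mutation would leak across both names (use `list(PROVIDERS_UNIFIED)` to diverge). A quick check, assuming `utils` is importable:

```python
from utils import PROVIDERS_UNIFIED, CHAT_PROVIDERS, IMAGE_PROVIDERS

# All three names point at one list object.
assert CHAT_PROVIDERS is PROVIDERS_UNIFIED
assert IMAGE_PROVIDERS is PROVIDERS_UNIFIED
assert CHAT_PROVIDERS[0] == "auto"  # the dropdown default
```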