Spaces: Agents-MCP-Hackathon / Galita

mokrane25 committed on Jun 11

Commit c9ed76f (verified) · 1 Parent(s): 1f848f3

Update README.md

README.md +322 -0


@@ -10,5 +10,327 @@ pinned: false
 license: mit
 short_description: GALITA is a self-evolving generalist AI agent
 ---
+Made by:
+Mohammed Dahbani
+Anas Ezzakri
+Adam Lagssaibi
+Mouhcine Zahdi
+Mahmoud Mokrane
+# Gradio-hackathon: Generalist self-evolving AI agent inspired by Alita
+This is our team project for the Gradio hackathon 2025.
+This project is inspired by the research paper `https://arxiv.org/abs/2505.20286`.
+# 📁 Project structure
+```bash
+alita_agent/
+│
+├── main.py                           # Main entry point: runs a TaskPrompt through the ManagerAgent
+├── manager_agent.py                  # Central coordination logic; orchestrates all components
+├── task_prompt.py                    # Defines the TaskPrompt class, which holds the initial user request
+│
+├── components/                       # Contains all the modular functional components
+│   ├── __init__.py                   # Makes the folder importable as a package
+│   ├── script_generator.py           # Dynamically generates Python code from an MCPToolSpec
+│   ├── code_runner.py                # Runs a script in an isolated environment and captures the result
+│   ├── mcp_registry.py               # Handles registration, lookup, and reuse of MCP tools
+│   ├── web_agent.py                  # Performs web or GitHub searches to support code generation
+│   └── mcp_brainstormer.py           # Generates MCPToolSpec objects by analyzing the user task
+│
+├── models/                           # Contains the data classes (dataclasses) used throughout the system
+│   ├── __init__.py                   # Makes the folder importable as a package
+│   ├── mcp_tool_spec.py              # Defines MCPToolSpec (dataclass): name, I/O schemas, description, pseudo-code, etc.
+│   └── mcp_execution_result.py       # Defines MCPExecutionResult (dataclass): success, output, logs, error
+│
+├── tests/                            # Contains the unit tests for each module
+│   ├── __init__.py                   # Makes the folder importable as a package
+│   ├── test_script_generator.py      # Tests that code and environments are generated correctly
+│   ├── test_code_runner.py           # Tests that scripts execute properly and errors are handled
+│   ├── test_mcp_registry.py          # Tests registration, lookup, and tool calls in the MCP registry
+│   └── test_manager_agent.py         # Integration tests for the overall behavior of the ManagerAgent
+│
+└── README.md                         # Project documentation: instructions, pipeline, inspirations, and link to the paper
+```
+# Project Pipeline
+#### 🔄 The complete flow, including the existence check
+1. The user sends a TaskPrompt.
+2. The Manager Agent asks the MCPBrainstormer: "Which tools would be needed to solve this task?"
+3. The Brainstormer proposes one or more specs (MCPToolSpec).
+4. The Manager Agent consults the MCPRegistry: "Do I already have a registered tool whose name and I/O match this spec?"
+   - Yes ➜ it reuses the existing tool.
+   - No ➜ it calls the web agent to search for open-source tools to build on. The Manager then passes the search results to the Brainstormer to start construction.
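The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToolSpec`, `Registry`, `plan_tools`, and the injected `brainstorm` and `search_web` callables are hypothetical stand-ins, not the project's actual `MCPToolSpec`, `MCPRegistry`, or `ManagerAgent` code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ToolSpec:
    """Hypothetical stand-in for MCPToolSpec (only the fields used here)."""
    name: str
    input_schema: dict
    output_schema: dict


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for MCPRegistry."""
    tools: Dict[str, ToolSpec] = field(default_factory=dict)

    def find(self, spec: ToolSpec) -> Optional[ToolSpec]:
        # Step 4: match on the name plus the input and output schemas.
        known = self.tools.get(spec.name)
        if known is not None and (known.input_schema, known.output_schema) == (
            spec.input_schema,
            spec.output_schema,
        ):
            return known
        return None


def plan_tools(
    task: str,
    brainstorm: Callable[[str], List[ToolSpec]],
    registry: Registry,
    search_web: Callable[[str], str],
) -> List[Tuple[str, str, Optional[str]]]:
    """Return one (action, tool_name, search_context) triple per proposed spec."""
    plan = []
    for spec in brainstorm(task):  # steps 2 and 3: ask for specs
        if registry.find(spec) is not None:  # step 4, "yes" branch
            plan.append(("reuse", spec.name, None))
        else:  # step 4, "no" branch: gather context before building
            plan.append(("build", spec.name, search_web(spec.name)))
    return plan
```

Passing the brainstormer and web search in as plain callables keeps the sketch testable without an LLM or network access.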
+#### 🔍 How to detect that the tool already exists
+By matching on the MCPToolSpec:
+- Exact name (or a unique identifier such as a hash)
+- Or, more intelligently:
+    - same input_schema structure
+    - same output_schema
+    - same roles or a similar description (with embeddings / vector search)
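+```python
+from typing import Optional
+
+# MCPToolSpec and MCPRegistry are the project's own classes (see models/ and components/).
+def check_existing_tool(spec: MCPToolSpec, registry: MCPRegistry) -> Optional[str]:
+    """Return the endpoint of a registered tool whose I/O schemas match `spec`, else None."""
+    for registered_spec in registry.list_tools():
+        # Two tools are treated as equivalent when both schemas are identical.
+        if (registered_spec.input_schema == spec.input_schema
+                and registered_spec.output_schema == spec.output_schema):
+            return registry.get_tool_endpoint(registered_spec.name)
+    return None
+```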
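The "similar description" criterion could be approximated as follows. This sketch swaps real embeddings for a plain bag-of-words cosine similarity so that it runs with the standard library only; the function names, the `registered` mapping, and the `0.6` threshold are assumptions made for illustration, not part of the project.

```python
import math
import re
from collections import Counter
from typing import Dict, Optional


def _bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def description_similarity(a: str, b: str) -> float:
    """Cosine similarity between the token-count vectors of two descriptions."""
    va, vb = _bag_of_words(a), _bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def find_similar_tool(
    description: str, registered: Dict[str, str], threshold: float = 0.6
) -> Optional[str]:
    """Return the best-matching tool name whose score reaches the threshold.

    `registered` maps tool names to their descriptions (hypothetical shape).
    """
    best_name, best_score = None, threshold
    for name, known_description in registered.items():
        score = description_similarity(description, known_description)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

With a real embedding model, only `description_similarity` would need to change; the threshold would then have to be re-tuned.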
+#### 💬 What does the agent do if it finds the tool?
+It regenerates nothing:
+- It adds the call to the existing MCP tool to its plan
+- It formats the JSON input
+- It calls POST /predict directly
+- It uses the response in the rest of its reasoning
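A minimal sketch of the "format the JSON input, then POST /predict" steps, using only the standard library. The helper names, the payload shape, and the assumption that the endpoint answers with a JSON body are illustrative; the README only states that the agent calls `POST /predict`.

```python
import json
import urllib.request


def build_predict_request(endpoint_url: str, payload: dict) -> urllib.request.Request:
    """Serialize the payload as JSON and prepare a POST to <endpoint>/predict."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=endpoint_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_tool(endpoint_url: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_predict_request(endpoint_url, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending makes the formatting step checkable without a running tool server.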
+#### 💡 Practical cases
+Different cases and the expected reaction of the agent:
+| Actual situation                              | Agent's reaction                                             |
+| --------------------------------------------- | ------------------------------------------------------------ |
+| The tool `"SubtitleExtractor"` already exists | The agent calls the endpoint directly                        |
+| The spec is close but not identical           | The agent can still reuse it (with adaptation)               |
+| The tool exists but failed                    | The agent can **fall back** to generating a new MCP tool     |
+| The tool exists but is obsolete               | The Registry can flag an update or trigger a regeneration    |
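The table can be read as a small decision function. The sketch below is one possible encoding; the `Action` enum, the flag names, and the precedence between the cases (a failed call is checked before obsolescence) are assumptions, since the README does not specify how conflicting conditions are ordered.

```python
from enum import Enum


class Action(Enum):
    CALL_EXISTING = "call the existing endpoint"
    ADAPT_AND_CALL = "reuse the tool with input/output adaptation"
    GENERATE_NEW = "generate a new MCP tool"
    REGENERATE = "regenerate the obsolete tool"


def decide_action(
    found: bool,
    exact_match: bool = True,
    last_call_failed: bool = False,
    obsolete: bool = False,
) -> Action:
    """Map the situations from the table onto an action (hypothetical precedence)."""
    if not found or last_call_failed:
        # No usable tool: fall back to generating a new one.
        return Action.GENERATE_NEW
    if obsolete:
        return Action.REGENERATE
    return Action.CALL_EXISTING if exact_match else Action.ADAPT_AND_CALL
```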
+#### Expected functions
+| Class                | Expected method                            | Present? | Comment |
+| -------------------- | ------------------------------------------ | -------- | ------- |
+| `ManagerAgent`       | `run_task(prompt)`                         | ✅        | OK      |
+| `MCPBrainstormer`    | `brainstorm(prompt)`                       | ✅        | OK      |
+| `WebAgent`           | `search_github`, `retrieve_readme`         | ✅        | OK      |
+| `ScriptGenerator`    | `generate_code`, `generate_env_script`     | ✅        | OK      |
+| `CodeRunner`         | `execute`, `setup_environment`             | ✅        | OK      |
+| `MCPRegistry`        | `register_tool`, `list_tools`, `call_tool` | ✅        | OK      |
+| `MCPExecutionResult` | attributes `success`, `output`, `logs`     | ✅        | OK      |
+| `MCPToolSpec`        | `name`, `input_schema`, etc.               | ✅        | OK      |
+Here the ManagerAgent coordinates everything. It delegates to:
+- MCPBrainstormer → to generate tool specs.
+- ScriptGenerator → to generate code.
+- CodeRunner → to test the code.
+- WebAgent → to retrieve external context.
+- MCPRegistry → to register and reuse tools.
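For orientation, the two data classes exchanged between these components might look like the sketch below. The field names follow the class diagram in this README; the default values are assumptions added so that partial instances can be created, and the actual definitions in `models/` may differ.

```python
from dataclasses import dataclass, field


@dataclass
class MCPToolSpec:
    """Specification of a tool, as produced by the MCPBrainstormer."""
    name: str
    input_schema: dict
    output_schema: dict
    description: str = ""   # default is an assumption
    pseudo_code: str = ""   # default is an assumption
    source_hint: str = ""   # default is an assumption


@dataclass
class MCPExecutionResult:
    """Outcome of running a generated script in the CodeRunner."""
    success: bool
    output: dict = field(default_factory=dict)  # default is an assumption
    logs: str = ""
    error_message: str = ""


# Example: a spec for the subtitle tool mentioned above, and a failed run.
spec = MCPToolSpec(
    name="SubtitleExtractor",
    input_schema={"video_url": "str"},
    output_schema={"subtitles": "str"},
)
failed = MCPExecutionResult(success=False, error_message="timeout")
```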
+![](alitaDiagram.svg)
+```sh
+plantuml -tsvg README.md
+```
+<div hidden>
+<details>
+<summary>View the PlantUML script</summary>
+```plantuml
+@startuml alitaDiagram
+skinparam classAttributeIconSize 0
+' === Data classes ===
+class TaskPrompt {
+    - text: str
+}
+class MCPToolSpec {
+    - name: str
+    - input_schema: dict
+    - output_schema: dict
+    - description: str
+    - pseudo_code: str
+    - source_hint: str
+}
+class MCPExecutionResult {
+    - success: bool
+    - output: dict
+    - logs: str
+    - error_message: str
+}
+class ToolCall {
+  - tool_name: str
+  - input_data: dict
+  - result: dict
+}
+' === Main agents ===
+class ManagerAgent {
+    - brainstormer: MCPBrainstormer
+    - web_agent: WebAgent
+    - generator: ScriptGenerator
+    - runner: CodeRunner
+    - registry: MCPRegistry
+    + run_task(prompt: TaskPrompt): dict
+    + check_existing_tool(spec: MCPToolSpec): Optional[str]
+}
+class MCPBrainstormer {
+    + brainstorm(prompt: TaskPrompt): List<MCPToolSpec>
+}
+class WebAgent {
+    + search_github(query: str): str
+    + retrieve_readme(repo_url: str): str
+}
+class ScriptGenerator {
+    + generate_code(spec: MCPToolSpec): str
+    + generate_env_script(spec: MCPToolSpec): str
+}
+class CodeRunner {
+    + execute(script: str): MCPExecutionResult
+    + setup_environment(env_script: str): bool
+}
+class MCPRegistry {
+    + register_tool(spec: MCPToolSpec, endpoint_url: str): void
+    + list_tools(): List<MCPToolSpec>
+    + call_tool(tool: str): object
+}
+' === Relations with types + cardinalities ===
+' The Manager receives a user task
+TaskPrompt --> "1" ManagerAgent : provides query
+' The Manager calls the Brainstormer
+ManagerAgent --> "1" MCPBrainstormer : calls
+' The Manager uses the WebAgent
+ManagerAgent "1" <--> "1" WebAgent : queries/answers
+' The Brainstormer calls ScriptGenerator and CodeRunner
+MCPBrainstormer --> "1" ScriptGenerator : plans
+MCPBrainstormer --> "1" CodeRunner : validates
+' The Manager checks or updates the Registry
+ManagerAgent --> "1" MCPRegistry : checks/updates
+' The Manager builds a tool-call plan
+ManagerAgent --> "0..*" ToolCall : creates
+' The Brainstormer returns MCPToolSpecs
+MCPBrainstormer --> "1..*" MCPToolSpec : returns
+' ScriptGenerator uses MCPToolSpec to generate code
+ScriptGenerator --> "1" MCPToolSpec : consumes
+' The Registry stores ToolSpecs
+MCPRegistry --> "0..*" MCPToolSpec : stores
+' CodeRunner returns an execution result
+CodeRunner --> "1" MCPExecutionResult : returns
+' CodeRunner can use registered tools
+CodeRunner --> "0..*" MCPRegistry : queries
+@enduml
+```
+</details>
+</div>
+# ALITA Research Functionality
+This README explains how to use the comprehensive research capabilities of the ALITA ManagerAgent.
+## Overview
+ALITA can now perform deep, autonomous web research using the WebAgent's research functionality. This allows ALITA to gather information from multiple sources, analyze it, and synthesize a comprehensive report on any topic.
+## Usage Methods
+There are two ways to use the research functionality:
+### 1. Direct Research Method
+Call the `research` method directly on the ManagerAgent instance:
+```python
+from manager_agent import ManagerAgent
+from llama_index.llms.anthropic import Anthropic
+# Initialize the LLM and ManagerAgent
+llm = Anthropic(model="claude-3-5-sonnet-20241022", api_key="your-api-key")
+manager = ManagerAgent(llm=llm)
+# Perform research directly
+report = manager.research(
+    query="What are the latest developments in quantum computing?",
+    max_iterations=50,  # Optional: limit the number of research steps
+    verbose=True        # Optional: show detailed progress
+)
+# The report variable now contains a comprehensive research report
+print(report)
+```
+### 2. Tool-Based Research through ReActAgent
+Let the ManagerAgent's internal ReActAgent decide when to use research:
+```python
+from manager_agent import ManagerAgent
+from models import TaskPrompt
+from llama_index.llms.anthropic import Anthropic
+# Initialize the LLM and ManagerAgent
+llm = Anthropic(model="claude-3-5-sonnet-20241022", api_key="your-api-key")
+manager = ManagerAgent(llm=llm)
+# Create a task prompt
+task_prompt = TaskPrompt(text="I need a comprehensive report on recent developments in quantum computing.")
+# Run the task through the agent
+response = manager.run_task(task_prompt)
+# The response will include the research report if the agent determined research was needed
+print(response)
+```
+The agent will automatically detect when deep research is required based on keywords like "comprehensive," "thorough," "research," etc.
+## Running the Test Script
+A test script is provided to demonstrate both usage methods:
+```bash
+python test_research.py
+```
+Make sure to set your Anthropic API key in the environment or in a `.env` file before running the script.
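One way to read the key from the environment before constructing the LLM is sketched below. The variable name `ANTHROPIC_API_KEY` and the helper `get_anthropic_api_key` are assumptions; the test script may look for a different name, and loading a `.env` file would need an extra step that is not shown here.

```python
import os
from typing import Mapping, Optional


def get_anthropic_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    # ANTHROPIC_API_KEY is an assumed variable name.
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it or add it to your .env file."
        )
    return key
```

The returned value can then be passed as `api_key=` when creating the `Anthropic` LLM shown in the usage examples.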
+## System Prompt Configuration
+The ManagerAgent's system prompt has been updated to include guidance on when to use the research tool:
+- For simple information needs: use 'web_search' for quick answers
+- For complex research topics: use 'perform_web_research' for comprehensive autonomous research
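The routing rule can be approximated with a simple keyword check. In the project the choice is made by the ReActAgent under the guidance of the system prompt, so the function below is only a hypothetical heuristic that mirrors the rule; `choose_tool` and `RESEARCH_KEYWORDS` are illustrative names, and only the two tool names come from this README.

```python
import re

# Keywords quoted in this README as signals that deep research is needed.
RESEARCH_KEYWORDS = frozenset({"comprehensive", "thorough", "research"})


def choose_tool(task_text: str) -> str:
    """Pick 'perform_web_research' for research-style tasks, else 'web_search'."""
    words = set(re.findall(r"[a-z]+", task_text.lower()))
    if words & RESEARCH_KEYWORDS:
        return "perform_web_research"
    return "web_search"
```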
+## How Research Works
+When ALITA performs research:
+1. It first analyzes the research query to understand what information is needed
+2. It uses web search to gather relevant sources
+3. It visits and reads the content of each source
+4. It downloads and analyzes relevant documents if needed
+5. It evaluates the credibility and relevance of each source
+6. It synthesizes the information into a comprehensive report
+7. It includes citations and references to the sources used
+This enables ALITA to provide high-quality, well-researched answers to complex questions.
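The steps above can be sketched as a small pipeline. Everything here is a simplified assumption: `run_research`, `Source`, and the injected `search`, `fetch`, and `score` callables are hypothetical, and the "synthesis" is plain concatenation, whereas the WebAgent relies on an LLM for analysis and report writing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Source:
    url: str
    text: str
    score: float


def run_research(
    query: str,
    search: Callable[[str], List[str]],
    fetch: Callable[[str], str],
    score: Callable[[str, str], float],
    min_score: float = 0.5,
    max_sources: int = 5,
) -> str:
    """Gather, filter, and cite sources for a query (steps 2 to 7, simplified)."""
    sources = []
    for url in search(query)[:max_sources]:  # step 2: find candidate sources
        text = fetch(url)                    # steps 3-4: read the content
        relevance = score(query, text)       # step 5: judge relevance/credibility
        if relevance >= min_score:
            sources.append(Source(url, text, relevance))
    sources.sort(key=lambda s: s.score, reverse=True)

    # Steps 6-7: assemble a report with numbered citations.
    lines = [f"Research report: {query}", ""]
    lines += [f"- {src.text} [{i}]" for i, src in enumerate(sources, 1)]
    lines += ["", "References:"]
    lines += [f"[{i}] {src.url}" for i, src in enumerate(sources, 1)]
    return "\n".join(lines)
```

Injecting the three callables keeps the control flow visible and lets the pipeline be exercised with fakes instead of live web requests.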
 An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).