# 🔄 ML MIGRATION PLAN: BACKEND → MODELS

> **Migration Planning Document**  
> **Status**: In Progress - January 2025  
> **Goal**: Separate ML responsibilities from the multi-agent system

---

## 📊 PRE-MIGRATION ANALYSIS

### **ML CODE IN THE CURRENT BACKEND**
- **Total**: 7,004 lines across 13 modules in `src/ml/`
- **Functionality**: Complete, working ML pipeline
- **Integration**: Imported directly by the 16 agents
- **Status**: Production-ready, but coupled to the backend

### **CIDADAO.AI-MODELS STATUS**
- **Repository**: Created, with complete MLOps documentation
- **Code**: Only a placeholder `main.py` (16 lines)  
- **Documentation**: 654 lines of technical specification
- **Readiness**: Ready to receive the ML migration

---

## 🎯 MIGRATION STRATEGY

### **APPROACH: PROGRESSIVE MIGRATION**
1. ✅ **Do not break** the backend's current behavior
2. ✅ **Migrate code gradually**, testing at each step  
3. ✅ **Maintain compatibility** during the transition
4. ✅ **Implement a local fallback** for when models is unavailable

---

## 📋 PHASE 1: STRUCTURE (TODAY)

### **1.1 Create the Base Structure**
```text
cidadao.ai-models/
├── src/
│   ├── __init__.py
│   ├── models/                  # Core ML models
│   │   ├── __init__.py
│   │   ├── anomaly_detection/   # Anomaly detection pipeline
│   │   ├── pattern_analysis/    # Pattern recognition
│   │   ├── spectral_analysis/   # Frequency domain analysis
│   │   └── core/               # Base classes and utilities
│   ├── training/               # Training infrastructure
│   │   ├── __init__.py
│   │   ├── pipelines/          # Training pipelines
│   │   ├── configs/            # Training configurations
│   │   └── utils/              # Training utilities
│   ├── inference/              # Model serving
│   │   ├── __init__.py
│   │   ├── api_server.py       # FastAPI inference server
│   │   ├── batch_processor.py  # Batch inference
│   │   └── streaming.py        # Real-time inference
│   └── deployment/             # Deployment tools
│       ├── __init__.py
│       ├── huggingface/        # HF Hub integration
│       ├── docker/             # Containerization
│       └── monitoring/         # ML monitoring
├── tests/
│   ├── __init__.py
│   ├── unit/                   # Unit tests
│   ├── integration/            # Integration tests
│   └── e2e/                    # End-to-end tests
├── configs/                    # Model configurations
├── notebooks/                  # Jupyter experiments
├── datasets/                   # Dataset management
├── requirements.txt            # Dependencies
├── setup.py                    # Package setup
└── README.md                   # Documentation
```
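
The layout above can be created with a small script instead of by hand. The following is a minimal sketch, not part of the original plan: the `scaffold` function and the three list names are hypothetical, and only the directories and files shown in the tree are created.

```python
from pathlib import Path

# Directories that the tree above marks as Python packages (they get __init__.py).
PACKAGE_DIRS = ["src", "src/models", "src/training", "src/inference",
                "src/deployment", "tests"]

# Remaining directories from the tree, created without an __init__.py.
PLAIN_DIRS = [
    "src/models/anomaly_detection", "src/models/pattern_analysis",
    "src/models/spectral_analysis", "src/models/core",
    "src/training/pipelines", "src/training/configs", "src/training/utils",
    "src/deployment/huggingface", "src/deployment/docker",
    "src/deployment/monitoring",
    "tests/unit", "tests/integration", "tests/e2e",
    "configs", "notebooks", "datasets",
]

# Files named in the tree; created empty only if they do not exist yet.
FILES = [
    "src/inference/api_server.py", "src/inference/batch_processor.py",
    "src/inference/streaming.py", "requirements.txt", "setup.py", "README.md",
]


def scaffold(root: Path) -> list[Path]:
    """Create the layout under `root` and return the files that were created."""
    for rel in PACKAGE_DIRS + PLAIN_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)

    created = []
    for rel in [f"{d}/__init__.py" for d in PACKAGE_DIRS] + FILES:
        path = root / rel
        if not path.exists():  # never overwrite existing work
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold(Path("."))` from the repository root is idempotent: a second run creates nothing. The model subdirectories (for example `anomaly_detection/`) would also need their own `__init__.py` before package imports resolve; the tree does not list them, so the sketch leaves them out.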

### **1.2 Configure Dependencies**
```text
# requirements.txt
torch>=2.0.0
transformers>=4.36.0
scikit-learn>=1.3.2
pandas>=2.1.4
numpy>=1.26.3
fastapi>=0.104.0
uvicorn>=0.24.0
huggingface-hub>=0.19.0
mlflow>=2.8.0
wandb>=0.16.0
```

---

## 📋 PHASE 2: MODULE MIGRATION (NEXT WEEK)

### **2.1 Migration Map**
```python
# File migration: backend → models
MIGRATION_MAP = {
    # Core ML modules
    "src/ml/anomaly_detector.py": "src/models/anomaly_detection/detector.py",
    "src/ml/pattern_analyzer.py": "src/models/pattern_analysis/analyzer.py", 
    "src/ml/spectral_analyzer.py": "src/models/spectral_analysis/analyzer.py",
    "src/ml/models.py": "src/models/core/base_models.py",
    
    # Training pipeline
    "src/ml/training_pipeline.py": "src/training/pipelines/training.py",
    "src/ml/advanced_pipeline.py": "src/training/pipelines/advanced.py",
    "src/ml/data_pipeline.py": "src/training/pipelines/data.py",
    
    # HuggingFace integration
    "src/ml/hf_cidadao_model.py": "src/models/core/hf_model.py",
    "src/ml/hf_integration.py": "src/deployment/huggingface/integration.py",
    "src/ml/cidadao_model.py": "src/models/core/cidadao_model.py",
    
    # API and serving
    "src/ml/model_api.py": "src/inference/api_server.py",
    "src/ml/transparency_benchmark.py": "src/models/evaluation/benchmark.py"
}
```
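
A mapping like this one can drive the file copy directly. The sketch below is an assumption about how that might look, not the plan's actual tooling: `copy_modules` is a hypothetical helper, and it copies rather than moves so the backend keeps its local modules for the fallback path.

```python
import shutil
from pathlib import Path


def copy_modules(
    mapping: dict[str, str],
    backend_root: Path,
    models_root: Path,
    dry_run: bool = True,
) -> list[tuple[Path, Path]]:
    """Copy each mapped file and return the (source, destination) pairs."""
    planned = []
    for src_rel, dst_rel in mapping.items():
        src = backend_root / src_rel
        if not src.is_file():
            # Fail before touching anything if the map is out of date.
            raise FileNotFoundError(f"missing backend module: {src}")
        planned.append((src, models_root / dst_rel))

    if not dry_run:
        for src, dst in planned:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy, not move: the backend stays intact
    return planned
```

Running it first with `dry_run=True` lists what would be copied and validates every source path; a second call with `dry_run=False` performs the copy.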

### **2.2 Import Refactoring**
```python
# Before (current backend)
from src.ml.anomaly_detector import AnomalyDetector
from src.ml.pattern_analyzer import PatternAnalyzer

# After (models repo)
from cidadao_models.models.anomaly_detection import AnomalyDetector
from cidadao_models.models.pattern_analysis import PatternAnalyzer
```
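
With 13 modules to migrate, the import rewrite is easier to apply mechanically. The following sketch handles only the `from ... import ...` form shown above; `IMPORT_MAP` and `rewrite_imports` are hypothetical names, and the map contains just the two modules from the example.

```python
import re

# Old module path -> new module path (only the two pairs shown above).
IMPORT_MAP = {
    "src.ml.anomaly_detector": "cidadao_models.models.anomaly_detection",
    "src.ml.pattern_analyzer": "cidadao_models.models.pattern_analysis",
}

# Matches "from <old module> import " at the start of a line.
_FROM_IMPORT = re.compile(
    r"^([ \t]*from[ \t]+)("
    + "|".join(re.escape(name) for name in IMPORT_MAP)
    + r")([ \t]+import[ \t]+)",
    flags=re.MULTILINE,
)


def rewrite_imports(source: str) -> str:
    """Return `source` with mapped `from X import Y` statements rewritten."""
    return _FROM_IMPORT.sub(
        lambda m: m.group(1) + IMPORT_MAP[m.group(2)] + m.group(3),
        source,
    )
```

Plain `import src.ml.anomaly_detector` statements and dynamic imports are not covered and would still need a manual pass.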

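### **2.3 Configure the Package**
```python
# setup.py
from setuptools import setup, find_packages

# Expose the contents of src/ as the top-level package `cidadao_models`,
# so that `from cidadao_models.models... import ...` resolves after install.
setup(
    name="cidadao-ai-models",
    version="1.0.0",
    description="ML models for Cidadão.AI transparency analysis",
    packages=["cidadao_models"]
    + [f"cidadao_models.{pkg}" for pkg in find_packages(where="src")],
    package_dir={"cidadao_models": "src"},
    install_requires=[
        "torch>=2.0.0",
        "transformers>=4.36.0",
        "scikit-learn>=1.3.2",
        # ... other dependencies
    ],
    python_requires=">=3.11",
)
```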

---

## 📋 PHASE 3: INFERENCE SERVER (WEEK 2)

### **3.1 Dedicated API Server**
```python
# src/inference/api_server.py
from typing import Any, Dict, List

from fastapi import Body, FastAPI, HTTPException

from cidadao_models.models.anomaly_detection import AnomalyDetector
from cidadao_models.models.pattern_analysis import PatternAnalyzer

app = FastAPI(title="Cidadão.AI Models API")

# Initialize models once at import time so every request reuses them
anomaly_detector = AnomalyDetector()
pattern_analyzer = PatternAnalyzer()

@app.post("/v1/detect-anomalies")
async def detect_anomalies(
    # embed=True makes the endpoint expect {"contracts": [...]}, the body the
    # backend client sends; a typed Contract schema can replace Dict once defined
    contracts: List[Dict[str, Any]] = Body(..., embed=True),
):
    """Detect anomalies in government contracts"""
    try:
        # Assumes AnomalyDetector.analyze is a coroutine
        results = await anomaly_detector.analyze(contracts)
        return {"anomalies": results, "model_version": "1.0.0"}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/v1/analyze-patterns")
async def analyze_patterns(data: Dict[str, Any]):
    """Analyze patterns in government data"""
    try:
        # Assumes PatternAnalyzer.analyze is a coroutine
        patterns = await pattern_analyzer.analyze(data)
        # Placeholder confidence until the analyzer reports its own value
        return {"patterns": patterns, "confidence": 0.87}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health_check():
    return {"status": "healthy", "models_loaded": True}
```

### **3.2 Client in the Backend**
```python
# backend/src/tools/models_client.py
import inspect
from typing import Any, Dict, List

import httpx

class ModelsClient:
    """Client for cidadao.ai-models API"""

    def __init__(self, base_url: str = "http://localhost:8001"):
        self.base_url = base_url
        self.client = httpx.AsyncClient(timeout=30.0)

    async def detect_anomalies(self, contracts: List[Dict]) -> Dict[str, Any]:
        """Call anomaly detection API"""
        try:
            response = await self.client.post(
                f"{self.base_url}/v1/detect-anomalies",
                json={"contracts": contracts}
            )
            response.raise_for_status()
            return response.json()
        except httpx.HTTPError:
            # Covers connection failures and the error statuses raised by
            # raise_for_status(); fall back to local processing in both cases
            return await self._local_anomaly_detection(contracts)

    async def _local_anomaly_detection(self, contracts: List[Dict]) -> Dict[str, Any]:
        """Fallback local processing"""
        # Import the local ML module lazily, only when the models API is unavailable
        from src.ml.anomaly_detector import AnomalyDetector
        detector = AnomalyDetector()
        result = detector.analyze(contracts)
        # analyze() may be sync or async; await only when it returns an awaitable
        return await result if inspect.isawaitable(result) else result

    async def aclose(self) -> None:
        """Release the underlying HTTP connections"""
        await self.client.aclose()
```

---

## 📋 PHASE 4: AGENT INTEGRATION (WEEK 3)

### **4.1 Update the Zumbi Agent**
```python
# backend/src/agents/zumbi.py - BEFORE
from src.ml.anomaly_detector import AnomalyDetector
from src.ml.spectral_analyzer import SpectralAnalyzer

class InvestigatorAgent(BaseAgent):
    def __init__(self):
        self.anomaly_detector = AnomalyDetector()
        self.spectral_analyzer = SpectralAnalyzer()

# backend/src/agents/zumbi.py - AFTER
import inspect

from src.tools.models_client import ModelsClient

class InvestigatorAgent(BaseAgent):
    def __init__(self):
        self.models_client = ModelsClient()
        # Local fallback, created lazily if needed
        self._local_detector = None

    async def investigate(self, contracts):
        # Try the models API first
        try:
            return await self.models_client.detect_anomalies(contracts)
        except Exception:
            # Fall back to local processing
            if self._local_detector is None:
                from src.ml.anomaly_detector import AnomalyDetector
                self._local_detector = AnomalyDetector()
            result = self._local_detector.analyze(contracts)
            # analyze() may be sync or async; await only when needed
            return await result if inspect.isawaitable(result) else result
```

### **4.2 Hybrid Configuration**
```python
# backend/src/core/config.py - Add
class Settings(BaseSettings):
    # ... existing settings ...
    
    # Models API configuration
    models_api_enabled: bool = Field(default=True, description="Enable models API")
    models_api_url: str = Field(default="http://localhost:8001", description="Models API URL")
    models_api_timeout: int = Field(default=30, description="API timeout seconds")
    models_fallback_local: bool = Field(default=True, description="Use local ML as fallback")
```
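
How the two boolean flags combine is not spelled out, so the sketch below shows one possible interpretation: try the API when `models_api_enabled` is set, and use local ML only when `models_fallback_local` allows it. `ModelsSettings` and `run_analysis` are hypothetical stand-ins; the real `Settings` class and the remote and local callables would come from the backend.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class ModelsSettings:
    """Stand-in for the two routing flags of the backend Settings class."""
    models_api_enabled: bool = True
    models_fallback_local: bool = True


Analyzer = Callable[[list], Awaitable[Any]]


async def run_analysis(
    settings: ModelsSettings,
    remote: Analyzer,
    local: Analyzer,
    contracts: list,
) -> Any:
    """Route one analysis call according to the hybrid configuration."""
    if settings.models_api_enabled:
        try:
            return await remote(contracts)
        except Exception:
            if not settings.models_fallback_local:
                raise  # no fallback allowed: surface the API failure
    elif not settings.models_fallback_local:
        raise RuntimeError("Models API and local fallback are both disabled")
    return await local(contracts)
```

Keeping the decision in one function means each agent does not have to repeat the try/except logic, and the behavior for each flag combination can be unit-tested with fake callables.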

---

## 📋 PHASE 5: DEPLOYMENT (WEEK 4)

### **5.1 Docker Models**
```dockerfile
# cidadao.ai-models/Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source code
COPY src/ ./src/
COPY setup.py .
RUN pip install -e .

# Expose port
EXPOSE 8001

# Run inference server
CMD ["uvicorn", "src.inference.api_server:app", "--host", "0.0.0.0", "--port", "8001"]
```

### **5.2 Docker Compose Integration**
```yaml
# docker-compose.yml (in the backend repo)
version: '3.8'

services:
  cidadao-backend:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - cidadao-models
    environment:
      - MODELS_API_URL=http://cidadao-models:8001
      
  cidadao-models:
    build: ../cidadao.ai-models
    ports:
      - "8001:8001"
    environment:
      - MODEL_CACHE_SIZE=1000
```
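
In this short form, `depends_on` only orders container startup; it does not wait for the models service to finish loading. One option is for the backend to poll the `/health` endpoint before sending traffic. The sketch below is an assumption about how that could be done with the standard library; both function names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def models_api_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health reports status "healthy"."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable service, timeout, or a body that is not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 10,
    delay: float = 1.0,
) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A backend startup hook could call `wait_until_healthy(lambda: models_api_is_healthy("http://cidadao-models:8001"))` and switch to the local fallback if it returns `False`.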

### **5.3 HuggingFace Spaces**
```python
# cidadao.ai-models/spaces_app.py
import gradio as gr
from src.models.anomaly_detection import AnomalyDetector
from src.models.pattern_analysis import PatternAnalyzer

detector = AnomalyDetector()
analyzer = PatternAnalyzer()

def analyze_contract(contract_text):
    """Analyze contract for anomalies"""
    result = detector.analyze_text(contract_text)
    return {
        "anomaly_score": result.score,
        "risk_level": result.risk_level,
        "explanation": result.explanation
    }

# Gradio interface
with gr.Blocks(title="Cidadão.AI Models Demo") as demo:
    gr.Markdown("# 🤖 Cidadão.AI - Modelos de Transparência")
    
    with gr.Row():
        input_text = gr.Textbox(
            label="Texto do Contrato",
            placeholder="Cole aqui o texto do contrato para análise..."
        )
        
    analyze_btn = gr.Button("Analisar Anomalias")
    
    with gr.Row():
        output = gr.JSON(label="Resultado da Análise")
    
    analyze_btn.click(analyze_contract, inputs=input_text, outputs=output)

if __name__ == "__main__":
    demo.launch()
```

---

## 🔄 INTEGRATION BETWEEN REPOSITORIES

### **API-BASED COMMUNICATION**
```text
# Flow: Backend → Models
1. A backend agent needs an ML analysis
2. It calls the Models API over HTTP
3. Models processes the request and returns the result
4. The backend integrates the result into its response
5. Local fallback if Models is unavailable
```

### **INDEPENDENT VERSIONING**
```text
# cidadao.ai-models releases
v1.0.0: "Initial anomaly detection model"
v1.1.0: "Pattern analysis improvements"
v1.2.0: "New corruption detection model"

# cidadao.ai-backend depends on models
requirements.txt:
  cidadao-ai-models>=1.0.0,<2.0.0
```
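
The pin `>=1.0.0,<2.0.0` accepts every 1.x release and rejects the next major version. A minimal sketch of that rule, assuming plain three-part version numbers (the function names are hypothetical):

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse "v1.2.0" or "1.2.0" into (1, 2, 0)."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return major, minor, patch


def satisfies_backend_pin(version: str) -> bool:
    """True when `version` falls inside >=1.0.0,<2.0.0."""
    return (1, 0, 0) <= parse_version(version) < (2, 0, 0)
```

Under this rule the three releases listed above all satisfy the backend pin, while a future `v2.0.0` would require the backend to update its requirement explicitly. Pre-release or local version suffixes are outside the scope of this sketch.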

---

## 📊 EXECUTION SCHEDULE

### **WEEK 1: Setup & Structure**
- [ ] Create the complete cidadao.ai-models structure
- [ ] Configure requirements and setup.py
- [ ] Migrate the first module (anomaly_detector.py)
- [ ] Test imports and basic functionality

### **WEEK 2: Core Migration**
- [ ] Migrate all 13 ML modules
- [ ] Refactor imports and dependencies
- [ ] Implement a basic API server
- [ ] Create the client in the backend

### **WEEK 3: Agent Integration**
- [ ] Update Zumbi to use the Models API
- [ ] Implement the local fallback
- [ ] Test the complete integration
- [ ] Update the documentation

### **WEEK 4: Deploy & Production**
- [ ] Docker containerization  
- [ ] Deploy to HuggingFace Spaces
- [ ] Monitoring and metrics
- [ ] Load and performance tests

---

## ✅ SUCCESS CRITERIA

### **FUNCTIONAL**
- [ ] Backend keeps running without interruption
- [ ] Models API responds in under 500 ms
- [ ] Local fallback works when the API is unavailable  
- [ ] All agents use the new architecture

### **NON-FUNCTIONAL**
- [ ] Performance equal to or better than the current system
- [ ] Independent deployment of the two repositories
- [ ] Documentation updated
- [ ] Tests covering more than 80% of the migrated code
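
The 500 ms response-time criterion can be checked with a small timing harness. The plan does not say which statistic the limit applies to; the sketch below assumes the 95th percentile, and all three function names are hypothetical.

```python
import statistics
import time
from typing import Callable


def measure_latencies_ms(call: Callable[[], object], runs: int = 50) -> list[float]:
    """Time `call` `runs` times and return each duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p95(samples: list[float]) -> float:
    """95th percentile: the last of the 19 cut points returned for n=20."""
    return statistics.quantiles(samples, n=20)[-1]


def meets_latency_target(samples: list[float], target_ms: float = 500.0) -> bool:
    """True when the p95 latency is below the target."""
    return p95(samples) < target_ms
```

Here `call` would wrap a single request to the Models API, for example a synchronous HTTP POST to `/v1/detect-anomalies` with a representative payload. Running the same harness against the current in-process ML code gives a baseline for the "equal to or better" performance criterion.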

---

## 🎯 IMMEDIATE NEXT STEP

**START PHASE 1 NOW**: Create the base structure in cidadao.ai-models and migrate the first module to validate the approach.

Shall we begin?