\section{Demo Overview} We demonstrate a complete web-based platform that enables non-experts to personalize, test, and deploy LLM agents to edge devices. Our live demo showcases three key capabilities: (1) \textit{Interactive Model Tuning}: users adjust parameters and prompts in real time using intuitive sliders and text editors; (2) \textit{Scenario-Based Agent Creation}: pre-built templates transform base models into specialized assistants (outdoor rescue, healthcare companion, AI tutor); and (3) \textit{Seamless Edge Deployment}: one-click transfer of configured agents to FPGA hardware for offline operation. The system architecture combines a React frontend with a FastAPI backend, featuring hybrid inference that routes requests between local FPGA acceleration and cloud APIs.

\textbf{Frontend Technology Stack:} Built with React 18 and TypeScript for type-safe development, using Vite as the build system for fast development iteration and optimized production builds. The UI follows the shadcn/ui design system, combining Radix UI primitives (alert-dialog, select, slider, tabs) with Tailwind CSS 3.3.0 for accessible, consistent components. The component architecture uses class-variance-authority for variant management, clsx and tailwind-merge for style composition, and Lucide React for iconography. State management relies on React hooks with localStorage persistence via a custom ChatStorageManager. The chat interface integrates the Vercel AI SDK for streaming responses, with react-markdown for rendering formatted content. Navigation is handled by React Router DOM 6.15.0 with a nested routing structure.

\textbf{Backend Technology Stack:} Implemented with FastAPI and Python 3.11+, providing async request handling and automatic OpenAPI documentation. Model inference combines HuggingFace Transformers for local models with OpenAI-compatible API clients for cloud models. The RAG system uses LangChain for document processing (PDF, DOCX, TXT, and MD via pypdf, python-docx, and unstructured), FAISS for vector storage, and sentence-transformers (all-MiniLM-L6-v2) for embeddings. Data validation uses Pydantic models with automatic serialization. The system supports dynamic model loading with different quantization settings, PyTorch precision levels, and device mappings.

We containerize the entire platform using Docker, enabling consistent deployment across diverse environments. The system is deployed on Hugging Face Spaces with automatic port detection (port 7860) and an automated frontend build step. Key technical innovations include a modular assistant framework defined through TypeScript interfaces, environment-agnostic deployment that automatically adapts to different hosting platforms, and seamless integration between React components and FastAPI endpoints through RESTful APIs.
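
As a concrete illustration of the hybrid inference path, the following Python sketch shows one way the backend could decide between the FPGA and cloud routes. The health-check URL, the helper names (\texttt{fpga\_available}, \texttt{run\_local}, \texttt{run\_cloud}), and the availability criterion are illustrative assumptions, not the demo's exact implementation.

\begin{verbatim}
import httpx

FPGA_HEALTH_URL = "http://fpga-node:8080/health"  # hypothetical address

async def fpga_available(timeout: float = 0.5) -> bool:
    """Probe the local accelerator; any failure means 'use the cloud'."""
    try:
        async with httpx.AsyncClient(timeout=timeout) as client:
            resp = await client.get(FPGA_HEALTH_URL)
        return resp.status_code == 200
    except httpx.HTTPError:
        return False

async def run_local(prompt: str) -> str:
    raise NotImplementedError  # placeholder: FPGA-accelerated path

async def run_cloud(prompt: str) -> str:
    raise NotImplementedError  # placeholder: OpenAI-compatible client

async def route_inference(prompt: str) -> str:
    """Prefer the local accelerator when reachable, else fall back."""
    if await fpga_available():
        return await run_local(prompt)
    return await run_cloud(prompt)
\end{verbatim}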
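
The pairing of FastAPI and Pydantic admits a short endpoint sketch: the request body is validated and deserialized before the handler runs, and the schema is reflected in the automatic OpenAPI documentation. Field names such as \texttt{scenario} and \texttt{temperature} are our illustrative guesses at the assistant configuration, not the demo's actual schema.

\begin{verbatim}
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()  # interactive OpenAPI docs served at /docs

class AssistantConfig(BaseModel):
    name: str
    scenario: str = Field(..., description="e.g. 'outdoor_rescue'")
    system_prompt: str
    temperature: float = Field(0.7, ge=0.0, le=2.0)
    max_tokens: int = Field(512, gt=0)

@app.post("/assistants")
async def create_assistant(cfg: AssistantConfig) -> dict:
    # Pydantic has already validated and parsed the request body here.
    return {"status": "created", "config": cfg.model_dump()}
\end{verbatim}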
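
The RAG indexing path under the stated stack (LangChain, FAISS, all-MiniLM-L6-v2) can be sketched compactly. Import paths follow recent \texttt{langchain-community} package layouts and may need adjustment for other versions; the file path and chunking parameters are illustrative.

\begin{verbatim}
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

def build_index(pdf_path: str) -> FAISS:
    docs = PyPDFLoader(pdf_path).load()   # uses pypdf under the hood
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=100).split_documents(docs)
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2")
    return FAISS.from_documents(chunks, embeddings)

index = build_index("manual.pdf")  # illustrative document
hits = index.similarity_search("how do I deploy to the FPGA?", k=4)
\end{verbatim}

At query time, the retrieved chunks are prepended to the prompt before inference, following the usual retrieval-augmented pattern.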
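
Dynamic model loading with configurable quantization, precision, and device mapping corresponds to standard HuggingFace Transformers calls. The 4-bit configuration and \texttt{float16} precision shown here are illustrative defaults rather than the demo's fixed choices.

\begin{verbatim}
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

def load_model(model_id: str, quantize: bool = True):
    kwargs = {
        "torch_dtype": torch.float16,  # precision setting
        "device_map": "auto",          # spread layers across devices
    }
    if quantize:
        # 4-bit weights via bitsandbytes; one of several options
        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
    return tokenizer, model
\end{verbatim}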
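
Environment-agnostic startup reduces, in the simplest case, to honoring a host-provided port variable while defaulting to 7860 as Hugging Face Spaces expects. The environment-variable name and the module path \texttt{app.main:app} are placeholders, not the demo's exact layout.

\begin{verbatim}
import os
import uvicorn

if __name__ == "__main__":
    # Use the host's PORT if set (e.g. on a PaaS); else the
    # 7860 default expected by Hugging Face Spaces.
    port = int(os.environ.get("PORT", 7860))
    uvicorn.run("app.main:app", host="0.0.0.0", port=port)
\end{verbatim}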