# llm-backend
This project provides a simple async interface to interact with an Ollama model
and demonstrates basic tool usage. Chat histories are stored in a local SQLite
database using Peewee. Histories are persisted per user and session so
conversations can be resumed with context. One example tool is included:
* **execute_terminal** – Executes a shell command inside a persistent Linux VM
with network access. Use it to read uploaded documents under ``/data``, fetch
web content via tools like ``curl``, or run any other command. The assistant
must invoke this tool to search online when it is unsure of an answer. Output
from ``stdout`` and ``stderr`` is captured when each command finishes.
The output string is capped at the last 10,000 characters so very long
results are truncated. A short notice is prepended whenever data is hidden.
Execution happens asynchronously so the assistant can continue responding
while the command runs.
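
As a small illustration of the capping behaviour described above, a sketch of
what the truncation could look like (the 10,000-character limit and the
prepended notice come from this README; the function name and message wording
are assumptions):

```python
MAX_OUTPUT_CHARS = 10_000  # cap documented above; the constant name is assumed

def cap_output(text: str) -> str:
    """Keep only the last 10,000 characters, noting how much was hidden."""
    if len(text) <= MAX_OUTPUT_CHARS:
        return text
    hidden = len(text) - MAX_OUTPUT_CHARS
    return f"[{hidden} characters truncated]\n" + text[-MAX_OUTPUT_CHARS:]
```
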
The VM is created when a chat session starts and reused for all subsequent
tool calls. When ``PERSIST_VMS`` is enabled (default), each user keeps the
same container across multiple chat sessions and across application restarts,
so any installed packages and filesystem changes remain available. The
environment includes Python and ``pip`` so complex tasks can be scripted using
Python directly inside the terminal.
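
Per-user container reuse could be implemented roughly as below with the Docker
SDK for Python; the naming scheme and helper are illustrative assumptions, not
the project's actual code:

```python
import docker

def get_or_create_vm(user: str, image: str = "python:3.11-slim"):
    """Return the user's persistent container, creating it on first use."""
    client = docker.from_env()
    name = f"llm-vm-{user}"  # assumed stable per-user naming scheme
    try:
        container = client.containers.get(name)
        if container.status != "running":
            container.start()  # reuse the container across restarts
    except docker.errors.NotFound:
        container = client.containers.run(
            image, "sleep infinity", name=name, detach=True
        )
    return container
```
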
Sessions share state through an in-memory registry so that only one generation
can run at a time. Messages sent while a response is being produced are
ignored, unless the assistant is waiting for a tool result; in that case the
pending response is cancelled and replaced with the new request.
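
Sketched with asyncio, the registry's gating rule might look like this (the
class and its fields are assumptions; the ignore/cancel behaviour is as
described above):

```python
import asyncio

class SessionRegistry:
    """Assumed shape of the in-memory registry; one task per session."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}
        self.waiting_on_tool: dict[str, bool] = {}

    def submit(self, session: str, coro):
        """Start a generation unless one is streaming (call from async code)."""
        task = self._tasks.get(session)
        if task is not None and not task.done():
            if not self.waiting_on_tool.get(session):
                return None  # a reply is streaming: ignore the new message
            task.cancel()  # waiting on a tool result: cancel and replace
        new_task = asyncio.create_task(coro)
        self._tasks[session] = new_task
        return new_task
```
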
The application injects a robust system prompt on each request. The prompt
guides the model to plan tool usage, execute commands sequentially and
verify results before replying. When the assistant is uncertain, it is directed
to search the internet with ``execute_terminal`` before giving a final answer.
The prompt is **not** stored in the chat history but is provided at runtime so
the assistant can orchestrate tool calls in sequence to fulfil the user's
request reliably. It also directs the assistant to avoid technical jargon so
responses are easy for anyone to understand. If a user message ends with
``/think`` it simply selects an internal reasoning mode and should be stripped
from the prompt before processing.
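
The ``/think`` handling described above amounts to a suffix check; a tiny
sketch (the helper name is an assumption):

```python
def strip_think_suffix(prompt: str) -> tuple[str, bool]:
    """Detect a trailing /think marker and remove it before processing."""
    cleaned = prompt.rstrip()
    if cleaned.endswith("/think"):
        return cleaned[: -len("/think")].rstrip(), True
    return prompt, False
```
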
## Usage
```bash
python run.py
```
The script will instruct the model to run a simple shell command and print the result. Conversations are automatically persisted to `chat.db` and associated with a user and session so they can be resumed later.
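
The storage schema itself is not shown in this README; a Peewee model keyed by
user and session could look roughly like this (all field names are guesses):

```python
import datetime

from peewee import CharField, DateTimeField, Model, SqliteDatabase, TextField

db = SqliteDatabase("chat.db")

class Message(Model):  # hypothetical model; the project's schema may differ
    user = CharField(index=True)
    session = CharField(index=True)
    role = CharField()  # e.g. "user" or "assistant"
    content = TextField()
    created = DateTimeField(default=datetime.datetime.now)

    class Meta:
        database = db
```
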
Uploaded files are stored under the `uploads` directory and mounted inside the VM at `/data`. Call ``upload_document`` on the chat session to make a file available to the model:
```python
async with ChatSession() as chat:
    path_in_vm = chat.upload_document("path/to/file.pdf")
    async for part in chat.chat_stream(f"Summarize {path_in_vm}"):
        print(part)
```
When using the Discord bot, attach one or more text files to a message to
upload them automatically. The bot responds with the location of each document
inside the VM so they can be referenced in subsequent prompts.
## Discord Bot
Create a `.env` file with your Discord token:
```bash
DISCORD_TOKEN="your-token"
```
Then start the bot:
```bash
python -m bot.discord_bot
```
Any attachments sent to the bot are uploaded to the VM and the bot replies with
their paths so they can be used in later messages.
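
For reference, a minimal discord.py sketch of that attachment flow. The
`uploads` directory and the ``/data`` mount point come from this README; the
rest is an assumption about the bot's internals:

```python
import os

import discord

intents = discord.Intents.default()
intents.message_content = True  # required to read messages and attachments
client = discord.Client(intents=intents)

@client.event
async def on_message(message: discord.Message) -> None:
    if message.author.bot:
        return
    os.makedirs("uploads", exist_ok=True)
    for attachment in message.attachments:
        await attachment.save(os.path.join("uploads", attachment.filename))
        # Files under `uploads` are mounted at /data inside the VM
        await message.channel.send(f"Uploaded to /data/{attachment.filename}")

client.run(os.environ["DISCORD_TOKEN"])
```
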
## VM Configuration
The Linux VM used for tool execution runs inside a Docker container. By default
it pulls the image defined by the ``VM_IMAGE`` environment variable, falling
back to ``python:3.11-slim``. This base image includes Python and ``pip`` so
packages can be installed immediately. The container has network access enabled
which allows fetching additional dependencies as needed.

When ``PERSIST_VMS`` is ``1`` (default), containers are kept and reused across
application restarts. Each user is assigned a stable container name, so
packages installed or files created inside the VM remain available the next
time the application starts. Set ``VM_STATE_DIR`` to specify the host directory
used for per-user persistent storage, mounted inside the VM at ``/state``.

Set ``PERSIST_VMS=0`` to disable persistence; containers are then stopped as
soon as no sessions are using them.
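
For example, a setup that pins the image and keeps per-user state under a host
directory might look like this (the state directory path is just an example):

```bash
export VM_IMAGE=python:3.11-slim       # image used for new VM containers
export PERSIST_VMS=1                   # keep containers across restarts
export VM_STATE_DIR=/srv/llm-vm-state  # example host dir, mounted at /state
python run.py
```
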
To use a fully featured Ubuntu environment, build a custom Docker image and set
``VM_IMAGE`` to that image. An example ``docker/Dockerfile.vm`` is provided:
```Dockerfile
FROM ubuntu:22.04
# Install core utilities and Python
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        python3 \
        python3-pip \
        sudo \
        curl \
        git \
        build-essential \
    && rm -rf /var/lib/apt/lists/*
CMD ["sleep", "infinity"]
```
Build and run with:
```bash
docker build -t llm-vm -f docker/Dockerfile.vm .
export VM_IMAGE=llm-vm
python run.py
```
The custom VM includes typical utilities like ``sudo`` and ``curl`` so it behaves
more like a standard Ubuntu installation.
## REST API
Start the API server using ``uvicorn``:
```bash
uvicorn src.api:app --host 0.0.0.0 --port 8000
```
### Endpoints
- ``POST /chat/stream`` – Stream the assistant's response as plain text.
- ``POST /upload`` – Upload a document so it can be referenced in chats.
- ``GET /sessions/{user}`` – List available session names for ``user``.

Example request:
```bash
curl -N -X POST http://localhost:8000/chat/stream \
  -H 'Content-Type: application/json' \
  -d '{"user":"demo","session":"default","prompt":"Hello"}'
```
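
The other endpoints can be exercised the same way; the multipart field names
for ``/upload`` below are assumptions, as the request schema is not documented
here:

```bash
# Upload a document (field names are assumed)
curl -X POST http://localhost:8000/upload \
  -F 'user=demo' -F 'file=@path/to/file.pdf'

# List session names for a user
curl http://localhost:8000/sessions/demo
```
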
## CLI
An interactive command line interface is provided for Windows and other
platforms. Install the dependencies and run:
```bash
python -m src.cli --user yourname
```
The tool lists your existing chat sessions and lets you select one or create a
new session. Type messages and the assistant's streamed replies will appear
immediately. Enter ``exit`` or press ``Ctrl+D`` to quit.