LLM Provider System

The runtime's reasoning engine can operate in two modes: rule-based (zero dependencies) or LLM-augmented (deeper analysis). The LLMClient class provides a provider-agnostic interface with automatic fallback.

Provider Priority

Auto-detection tries providers in this order:

Ollama    — local, free, sovereign (recommended)
NVIDIA NIM — free-tier cloud
OpenAI    — paid cloud, last resort
None      — rule-based fallback

def _resolve_provider(self, provider: str) -> str:
    if provider != "auto":
        return provider
    # Try Ollama first
    try:
        r = httpx.get(f"{OLLAMA_URL}/api/tags", timeout=2)
        if r.status_code == 200:
            return "ollama"
    except Exception:
        pass
    if NVIDIA_API_KEY:
        return "nvidia"
    if OPENAI_API_KEY:
        return "openai"
    return "none"

Configuration

All settings via environment variables:

Variable	Default	Description
`LLM_PROVIDER`	`auto`	`auto`, `ollama`, `nvidia`, `openai`
`LLM_MODEL`	(auto)	Override default model
`LLM_TIMEOUT_S`	`30`	Request timeout in seconds
`LLM_MAX_TOKENS`	`1024`	Max tokens per response
`OLLAMA_URL`	`http://localhost:11434`	Ollama API base URL
`NVIDIA_API_KEY`	(empty)	NVIDIA NIM API key
`OPENAI_API_KEY`	(empty)	OpenAI API key

Default Models

Provider	Default Model
Ollama	`llama3.2`
NVIDIA NIM	`meta/llama-3.3-70b-instruct`
OpenAI	`gpt-4o-mini`

Setup

cp .env.template .env

Option 1: Ollama (Recommended)

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.2

# .env
LLM_PROVIDER=ollama
OLLAMA_URL=http://localhost:11434

Option 2: NVIDIA NIM

# .env
LLM_PROVIDER=nvidia
NVIDIA_API_KEY=nvapi-your-key-here

Option 3: OpenAI

# .env
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-here

LLMResponse

Every call returns a structured response:

@dataclass
class LLMResponse:
    content: str = ""       # The generated text
    success: bool = False   # Whether the call succeeded
    error: str = ""         # Error message if failed
    provider: str = ""      # Which provider was used
    model: str = ""         # Which model was used
    latency_ms: float = 0.0 # Round-trip time

Reasoning System Prompt

When the LLM is active, the reasoning engine sends this system prompt:

You are the autonomous reasoning engine of a persistent software runtime.
Analyze the system's current state and decide what actions to take.

Actions: spawn, scan, consolidate, adjust, explore, ignore.

RULES:
1. If errors > 5, suggest a scan
2. If a pipeline is quarantined, investigate
3. If uptime > 1 hour with no activity, explore
4. Don't make more than 3 decisions per analysis
5. If memory is growing fast, suggest consolidation

Respond ONLY with valid JSON:
{
  "decisions": [
    {"action": "...", "target": "...", "reason": "...", "confidence": 0.0}
  ],
  "reasoning": "brief chain of thought"
}

Decisions with confidence below 0.6 are discarded.

Availability Check

@property
def is_available(self) -> bool:
    if self.provider == "ollama":
        r = httpx.get(f"{OLLAMA_URL}/api/tags", timeout=3)
        return r.status_code == 200
    if self.provider == "nvidia":
        return bool(NVIDIA_API_KEY)
    if self.provider == "openai":
        return bool(OPENAI_API_KEY)
    return False

Stats

curl http://localhost:8000/engine/brain

{
  "provider": "ollama",
  "model": "llama3.2",
  "total_calls": 15,
  "available": true,
  "decisions_made": 23
}

Next: Security →

Provider Priority​

Configuration​

Default Models​

Setup​

Option 1: Ollama (Recommended)​

Option 2: NVIDIA NIM​

Option 3: OpenAI​

LLMResponse​

Reasoning System Prompt​

Availability Check​

Stats​