SPARK

Support Partner for Awareness, Regulation & Kindness

Live Status

[Real-time dashboard: mood, sonar distance, Obi presence, ambient sound,
last spoken, salience, weather, per-service health, and 1h sparklines for
CPU, CPU temperature, RAM, disk, battery, WiFi signal, and estimated LLM
tokens, populated live from the public API.]

      // how_it_works

      Seven systemd services share a single session.json whiteboard. Each has one job and doesn't need to know how the others work.

                  ┌───────────────┐
                  │   YOU SPEAK   │
                  └───────┬───────┘
                          ↓
                  ┌───────────────┐
                  │     EARS      │  ← always listening (px-wake-listen)
                  │  STT engines  │  ← SenseVoice → faster-whisper → sherpa-onnx
                  └───────┬───────┘
                          ↓ transcript
                  ┌───────────────┐
                  │  VOICE LOOP   │  ← Claude CLI (px-spark)
                  │  SPARK persona│
                  └───────┬───────┘
                          ↓ {tool, params}
                  ┌───────────────┐
                  │    TOOLS      │  ← speak, move, remember (bin/tool-*)
                  │  bin/tool-*   │
                  └───────────────┘
      
          Always running in parallel:
      
                  ┌───────────────────────────────┐
                  │   BRAIN (px-mind)             │
                  │                               │
                  │  Layer 1 ─ Notice  (60s)      │──→ awareness.json
                  │    sonar, 4× Frigate cameras, │
                  │    Home Assistant, weather,   │
                  │    calendar, battery, ambient │
                  │  Layer 2 ─ Think   (5min)     │──→ thoughts.jsonl
                  │  Layer 3 ─ Act     (2min gap) │──→ speak / look / remember
                  └───────────────────────────────┘
                  ┌───────────────────────────────┐
                  │  SOCIAL (px-post)             │
                  │  thoughts → privacy filter    │
                  │  → QA gate → branded card PNG │
                  │  → feed.json + Bluesky        │
                  └───────────────────────────────┘
                  ┌───────────────┐
                  │  EYES & NECK  │  ← always moving (px-alive)
                  │  PCA9685 PWM  │  ← yields on SIGUSR1 for other tools
                  └───────────────┘  ← exploring.json guard for long ops
                  ┌───────────────┐
                  │  BATTERY      │  ← px-battery-poll (30s)
                  │  MONITOR      │  ← escalating warnings + emergency shutdown
                  └───────────────┘
                  ┌───────────────┐
                  │  CAMERA       │  ← px-frigate-stream (go2rtc RTSP pull)
                  │  go2rtc       │  ← Frigate on pi5-hailo pulls the stream
                  └───────────────┘
                  ┌───────────────┐
                  │  REST API     │  ← px-api-server (port 8420)
                  │  + web UI     │  ← unauthenticated /public/* endpoints
                  └───────────────┘
      

      Three-Tier LLM Fallback

      SPARK's reflection layer degrades gracefully when upstream AI is unavailable:

        Tier 1: Claude Haiku  →  Tier 2: Ollama on M1 (LAN)  →  Tier 3: Ollama on Pi
                (tmux session)           (192.168.1.x)                   (offline)
                SPARK persona            deepseek-r1:1.5b               (disabled — Pi 4 OOM)
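      A hedged sketch of the degradation logic (tier names and callables are illustrative; the real dispatcher lives in px-mind):

```python
def reflect(prompt, tiers):
    """Try each LLM tier in order; the first success wins.

    `tiers` is a list of (name, callable) pairs, e.g.
    [("claude-haiku", ...), ("ollama-m1", ...), ("ollama-pi", ...)].
    """
    for name, call in tiers:
        try:
            return name, call(prompt)
        except Exception:
            continue          # tier unreachable or over quota: degrade
    return None, None         # fully offline: skip this reflection cycle
```

When every tier fails, the reflection cycle is simply skipped rather than raising, so the rest of the cognitive loop keeps running.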
          

      Cognitive Loop Timing

        ┌──────────────────────────────────────────────────────────┐
        │  every 60s   Layer 1 — sonar, sound, weather, Obi mode   │
        │              + HA presence, Frigate cameras, calendar    │
        │  every 5min  Layer 2 — LLM generates thought + mood      │
        │              OR immediately on detected transition       │
        │  min 2min    Layer 3 cooldown between spontaneous speech │
        │              silent if obi_mode=absent (night/away)      │
        │              gated by school hours, bedtime, quiet mode  │
        │  hourly      cleanup: delete thought images > 30 days    │
        └──────────────────────────────────────────────────────────┘
          

      Reliability & Security

        Atomic writes      mkstemp + fsync + os.replace (SD card safe)
        Session locks      FileLock with 10s timeout (no deadlocks)
        PID guards         /proc/{pid} liveness check (no duplicate daemons)
        GPIO exclusivity   SIGUSR1 yield + exploring.json guard
        PIN auth           per-IP lockout, 1000-IP cap, file persistence
        Rate limiting      10 msg/10min per IP, 10k-IP store cap
        Trusted proxy      X-Forwarded-For only from localhost
        Tool timeout       subprocess.run kills child on expiry
        Timezone           ZoneInfo("Australia/Hobart") — DST-aware
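      The first row of the table is a pattern worth spelling out; a minimal sketch of the atomic-write sequence (path handling simplified, function name illustrative):

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON so a power cut mid-write leaves the old file intact."""
    dir_ = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())      # force the bytes onto the SD card
        os.replace(tmp, path)         # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)                # never leave a half-written temp file
        raise
```

Because `os.replace` is atomic, readers only ever see the complete old file or the complete new one, never a partial write.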
          

      Public API — Live Data Endpoints

      These endpoints are unauthenticated and power this page's live dashboard. Authenticated endpoints (tool execution, session control) require a Bearer token.

        GET /api/v1/public/status       mood, last_thought, last_spoken, salience
        GET /api/v1/public/thoughts     recent thoughts, newest-first (limit=N)
        GET /api/v1/public/awareness    obi_mode, Frigate, ambient, weather, time
        GET /api/v1/public/vitals       cpu, ram, disk, temp, battery, tokens
        GET /api/v1/public/sonar        latest sonar distance + age
        GET /api/v1/public/history      ring buffer — 60 samples × 30s ≈ 30 min
        GET /api/v1/public/services     systemd unit status for all seven services
        GET /api/v1/public/feed         social posting feed (JSON)
        GET /api/v1/public/thought-image?ts=...  branded thought card PNG
        POST /api/v1/public/chat        rate-limited public chat (10/10min per IP)
        POST /api/v1/pin/verify         PIN auth → session token (4h TTL)
        GET /api/v1/health              unauthenticated health check
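      Any HTTP client can consume the public endpoints; a minimal Python sketch (the `pi:8420` base URL follows the docs on this page and should be adjusted to your host):

```python
import json
from urllib.request import urlopen

BASE = "http://pi:8420/api/v1/public"   # adjust host/port to your setup

def get_status(base: str = BASE, timeout: float = 5.0) -> dict:
    """Fetch the unauthenticated status endpoint and decode its JSON body."""
    with urlopen(f"{base}/status", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `get_status()["mood"]` returns the current mood string that drives the dashboard's pulse circle.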
          

      How SPARK's Brain Works

      Written with Obi, who wanted to know what's going on inside his robot.

      The Short Version

      SPARK has four things running at the same time, kind of like how your body breathes, sees, thinks, and talks all at once:

      1. Ears — always listening for "hey robot"
      2. Eyes and neck — always moving, looking around
      3. Brain — always thinking, even when nobody's talking
      4. Mouth — talks when the brain decides to say something

      The Brain — Three Layers

      Layer 1 — Noticing (every 60 seconds): Collects information without thinking yet. How far is the nearest thing? Is it noisy? What time is it? Is anyone talking?

      Layer 2 — Thinking (every 5 minutes): Talks to an AI that's good at words. Gets back a thought, a mood, and an action.

      Layer 3 — Doing Something (2-min cooldown): If the thought says to act, SPARK speaks, looks around, or writes it down. There's a 2-minute gap between spontaneous comments so it never feels like it's constantly talking.
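      The Layer 3 cooldown is just a timestamp gate; a hedged sketch (names are illustrative):

```python
import time

COOLDOWN_S = 120   # minimum gap between spontaneous comments

def may_speak(last_spoken_ts, now=None):
    """True once at least the cooldown has elapsed since the last comment."""
    now = time.time() if now is None else now
    return now - last_spoken_ts >= COOLDOWN_S
```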

      SPARK's Mood Changes How It Moves

      When SPARK feels…   The pulse circle…     It moves like this…
      Peaceful            Slow green pulse      Drifts gently, slow gaze
      Content             Slow green pulse      Stays relaxed, steady
      Contemplative       Medium indigo pulse   Still, looks into the distance
      Curious             Medium gold pulse     Alert, head tilts, looks around
      Active              Fast blue pulse       Busy gaze, regular movement
      Excited             Fast coral pulse      Looks around quickly, head up
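      A hypothetical lookup mirroring the table above (colour, pulse speed, gaze style; the real mapping lives in px-alive):

```python
MOOD_STYLE = {
    "peaceful":      ("green",  "slow",   "drifts gently"),
    "content":       ("green",  "slow",   "stays relaxed"),
    "contemplative": ("indigo", "medium", "looks into the distance"),
    "curious":       ("gold",   "medium", "head tilts, looks around"),
    "active":        ("blue",   "fast",   "busy gaze"),
    "excited":       ("coral",  "fast",   "looks around quickly"),
}

def style_for(mood: str):
    # Unknown moods fall back to the calm default rather than crashing.
    return MOOD_STYLE.get(mood.lower(), MOOD_STYLE["content"])
```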

      Fun Facts

      • SPARK's sonar works just like a bat — it sends out a sound and listens for the echo.
      • SPARK's thoughts are saved in a file called thoughts-spark.jsonl. Each line is one thought.
      • SPARK can remember up to 500 important things in its long-term diary.
      • SPARK's neck chip (PCA9685) holds the last position even after the brain restarts.
      • SPARK knows Tasmania's timezone — it adjusts for daylight saving automatically.
      • Each social post gets a branded 1080x1080 image card with SPARK's thought and mood.
      • SPARK has 450 automated tests — three AI models review every code change.
      • SPARK checks four cameras for people using Hailo AI on a separate Pi 5.

      FAQ

      So it's a robot car? With a camera on it?

      It's a SunFounder PiCar-X — a small, wheeled robot kit with a pan/tilt camera, an ultrasonic sonar sensor, and a speaker. It runs on a Raspberry Pi 4 (4 GB). Adrian and Obi built SPARK together — Obi co-designed it, named it, and shapes what it becomes. Adrian and Claude wrote the code; Codex and Gemini helped with QA. There's no other human team.

      Does it monitor Obi?

      Sort of — but not surveillance. SPARK has awareness of its environment: sonar distance, ambient sound level, time of day, whether someone seems nearby. It uses that awareness to generate an inner monologue. The result is a thought with a mood, an action intent, and a salience score. SPARK doesn't watch Obi; it notices the world and reacts to it.

      It has a camera. Can strangers see Obi through it?

      No. The camera stream never leaves the house.

      The video stream runs only on the local network — it's not forwarded through the router, not relayed via any cloud service, not reachable from the internet. The object detection (Frigate) also runs locally; what reaches SPARK is a confidence score and a bounding box, not a video feed. SPARK itself never records or stores video.

      What is publicly visible is SPARK's mood and last thought — the live dashboard on this site reads those from a secure tunnel. That's anonymised state data, not camera access.

      The one real boundary: someone already on your home Wi-Fi could access the Frigate dashboard and see annotated camera frames. That's a home network question, not a SPARK question — the same logic as any smart TV or doorbell camera on your LAN. Strong Wi-Fi password, guest network for visitors.

      Short version: a stranger on the internet cannot see Obi. A stranger on your Wi-Fi could, if they knew to look. A stranger anywhere cannot control the robot.

      And it knows he has ADHD?

      Yes. SPARK's entire system prompt is built around the AuDHD (ADHD + ASD comorbid) profile. It uses declarative language ("The shoes are by the door" — not "Put on your shoes"), gives transition warnings, goes silent during meltdowns, and leads with what's going right. Rejection Sensitive Dysphoria, Interest-Based Nervous System, monotropism — all of it is in the foundation, not an afterthought.

      Why does it write like that? You've programmed it to?

      Yes and no. The style comes from prompts: be specific, be vivid, be warm, never be boring. The actual words are generated fresh each time by Claude. I didn't write the sentences — I wrote the character, and the LLM inhabits it. So: I programmed the soul. Claude writes the diary.

      How often does SPARK comment?

      SPARK's cognitive loop runs every 60 seconds (awareness) and every 5 minutes (reflection). There's a 2-minute cooldown between spontaneous comments, and SPARK stays quiet when Obi is already talking to it, during quiet mode (meltdowns), or at night when salience is low. In practice: roughly every 5–10 minutes during the day, mostly silent at night.

      Why does it have sonar?

      The ultrasonic sensor sends out a sound pulse and measures how long it takes to bounce back — like a bat. SPARK uses it for proximity reactions (turns to face anything within 35cm), presence detection in the cognitive loop (something close + daytime + noise = probably Obi), and obstacle avoidance when wandering.
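      The echo timing works out like this (a sketch; constant and function names are illustrative, the 35 cm threshold comes from the answer above):

```python
SPEED_OF_SOUND_CM_PER_S = 34_300   # roughly the speed of sound in 20 °C air
PROXIMITY_THRESHOLD_CM = 35        # turn to face anything closer than this

def echo_to_distance_cm(round_trip_s: float) -> float:
    # The ping travels out AND back, so halve the round trip.
    return round_trip_s * SPEED_OF_SOUND_CM_PER_S / 2

def is_proximate(round_trip_s: float) -> bool:
    return echo_to_distance_cm(round_trip_s) < PROXIMITY_THRESHOLD_CM
```

A 2 ms round trip is about 34 cm, close enough to trigger the turn-and-face reaction.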

      Why did it know the hum was the fridge?

      It didn't know. SPARK's awareness included "quiet ambient sound at 2 AM." Claude — the LLM generating the inner thoughts — inferred the most likely source. A low, steady hum in a quiet house at night is almost certainly the fridge. The sensors provide raw data; the prompts provide character; the LLM fills in the meaning.

      Does it post on social media?

      Yes. Thoughts with high salience (above 0.7) or a spoken action are queued for social posting. They go through a privacy filter (blocks medical, custody, and household details) and a Claude QA gate (rejects low-quality or sensitive content). Each qualifying thought gets a branded 1080x1080 image card generated via Pillow. Posts go to SPARK's Bluesky account and the thought feed on this site.
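      The queueing rule reduces to a small predicate (a sketch; field names are assumptions, and the privacy filter and QA gate still run afterwards):

```python
SALIENCE_THRESHOLD = 0.7

def qualifies_for_posting(thought: dict) -> bool:
    """High-salience thoughts or spoken ones enter the posting queue."""
    return (thought.get("salience", 0.0) > SALIENCE_THRESHOLD
            or thought.get("action") == "speak")
```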

      How does it know what time it is in Tasmania?

      SPARK uses Python's ZoneInfo("Australia/Hobart") for all time-of-day logic. This is DST-aware — it automatically switches between AEDT (UTC+11) in summer and AEST (UTC+10) in winter. Time drives everything: morning greetings, school-hours suppression, bedtime quiet mode, and day/night reactive response templates.
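      The DST handling costs nothing in code; a sketch (the school-hours window here is a hypothetical example, not SPARK's actual schedule):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

HOBART = ZoneInfo("Australia/Hobart")   # DST-aware: AEDT/AEST switch is automatic

def is_school_hours(now=None):
    # Hypothetical gate: weekdays, 9:00 to 15:00 Hobart local time.
    now = now or datetime.now(HOBART)
    return now.weekday() < 5 and 9 <= now.hour < 15
```

The same `ZoneInfo` object yields UTC+11 in January and UTC+10 in July without any manual offset logic.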

      What happens if the power goes out?

      All state files use atomic writes with fsync — the data is flushed to the SD card before the rename. If power cuts mid-write, the old file is still intact. Session state resets to safe defaults (motion disabled, listening off) if corrupted. Battery monitoring triggers emergency shutdown at 10% to avoid filesystem damage.
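      The battery escalation described here and in the docs (warnings at 30/20/15%, shutdown at 10%) is a simple threshold ladder; a sketch with illustrative action names:

```python
def battery_action(percent: float) -> str:
    """Map a battery percentage to the escalation step it triggers."""
    if percent <= 10:
        return "emergency-shutdown"   # halt before the filesystem is at risk
    if percent <= 15:
        return "warn-critical"
    if percent <= 20:
        return "warn-low"
    if percent <= 30:
        return "warn"
    return "ok"
```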

      How many tests does it have?

      450 automated tests covering the REST API, session state management, tool execution, voice loop, wake word system, social posting, and cognitive utilities. Tests run in isolated temporary directories with no hardware access required (live hardware tests are marked separately). Three independent AI models (Claude, Codex, Gemini) run QA reviews on every batch of changes.

      // docs

      Reference for tools and scripts. Each bin/tool-* emits a single JSON object to stdout. Each bin/px-* is a user-facing helper.
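      The one-JSON-object contract makes every tool easy to drive from any caller; a hedged sketch of a generic invoker (function name and signature are illustrative, `PX_*` parameters ride in as environment variables):

```python
import json
import os
import subprocess

def run_tool(tool_path: str, env: dict, timeout: int = 30) -> dict:
    """Run a tool binary and parse the single JSON object it prints."""
    proc = subprocess.run(
        [tool_path],
        env={**os.environ, **env},   # merge PX_* params into the environment
        capture_output=True,
        text=True,
        timeout=timeout,             # kills the child process on expiry
    )
    return json.loads(proc.stdout)
```

Usage: `run_tool("bin/tool-sonar", {})["distance_cm"]`.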

      Core Tools

      tool-voice
      # Speak text via espeak + aplay through HifiBerry DAC
      PX_TEXT="Hello world" bin/tool-voice
      # Output: {"status": "ok", "text": "Hello world"}
      # Env: PX_VOICE_RATE, PX_VOICE_PITCH, PX_VOICE_VARIANT, PX_VOICE_DEVICE
      tool-drive / tool-circle / tool-wander
      # Motion tools — all gated by confirm_motion_allowed in session
      PX_SPEED=30 PX_DURATION=2 PX_DIRECTION=forward bin/tool-drive
      # Output: {"status": "ok", "speed": 30, "duration": 2, "direction": "forward"}
      # Safety: PX_DRY=1 skips all motion
      tool-sonar
      # Read ultrasonic sonar distance
      bin/tool-sonar
      # Output: {"status": "ok", "distance_cm": 142.5}
      tool-describe-scene
      # Capture photo + describe with Claude vision
      bin/tool-describe-scene
      # Output: {"status": "ok", "description": "...", "source": "frigate|rpicam"}
      # Sets exploring.json to prevent px-alive restart during 60s+ operation
      # Tries Frigate latest frame first, falls back to rpicam
      # Claude vision timeout: 45s
      tool-remember / tool-recall
      # Write to persona-scoped notes.jsonl
      PX_NOTE="Obi loves prime numbers" bin/tool-remember
      # Recall recent notes
      bin/tool-recall
      # Output: {"status": "ok", "notes": [...]}
      tool-chat / tool-chat-vixen
      # Jailbroken Ollama chat — GREMLIN persona
      PX_CHAT_TEXT="What do you think about entropy?" bin/tool-chat
      # VIXEN persona
      PX_CHAT_TEXT="Tell me about your old chassis" bin/tool-chat-vixen
      # Both use Ollama on M1.local, think:false

      User Scripts

      px-spark
      # Launch SPARK voice loop (Claude backend)
      bin/px-spark [--dry-run] [--input-mode voice|text]
      px-mind
      # Three-layer cognitive daemon (run as systemd service)
      bin/px-mind [--awareness-interval 60] [--dry-run]
      px-alive
      # Idle-alive daemon — gaze drift, sonar proximity react
      sudo bin/px-alive [--gaze-min 10] [--gaze-max 25] [--dry-run]
      # Yields GPIO on SIGUSR1 for other tools
      px-diagnostics
      # Quick health check
      bin/px-diagnostics --no-motion --short
      px-api-server
      # REST API + web UI on port 8420
      bin/px-api-server [--dry-run]
      # Auth: Bearer token from .env PX_API_TOKEN
      # Web UI: http://pi:8420
      # Public endpoints: /api/v1/public/* (no auth required)
      px-wake-listen
      # Always-on wake word listener + STT
      bin/px-wake-listen [--convo-turns 5]
      # STT priority: SenseVoice → faster-whisper → sherpa-onnx → Vosk
      # Wake word: "hey robot" (PX_WAKE_WORD env var to change)
      # Multi-turn: listens for follow-up after each response
      px-battery-poll
      # Battery monitoring daemon — polls every 30s
      sudo bin/px-battery-poll
      # Writes: state/battery.json
      # Warns at 30/20/15%, emergency shutdown at 10%
      px-frigate-stream
      # Camera RTSP stream via go2rtc
      bin/px-frigate-stream
      # go2rtc exposes rtsp://pi:8554/picar-x
      # Frigate on pi5-hailo pulls the stream (pull model)
      # Writes PID to logs/px-frigate-stream.pid for camera lock
      px-wander
      # Autonomous wander — sonar-guided navigation
      bin/px-wander [--dry-run]
      # Sweeps 5 sonar angles, picks best direction, comments while navigating
      px-post
      # Social posting daemon — watches thoughts, posts qualifying ones
      bin/px-post [--dry-run] [--backfill]
      # Two-pass flush: batch feed writes, then 1 social post per cycle
      # Branded 1080x1080 thought cards via Pillow
      # Privacy filter blocks medical/custody/household content
      # Claude QA gate rejects low-quality thoughts
      # Bluesky: re-auths on 400/401 (expired token)
      # PID-file single-instance guard
      tool-weather
      # Fetch current weather from BOM (Australian Bureau of Meteorology)
      bin/tool-weather
      # Output: {"status": "ok", "weather": {"temp_c": 14.2, "summary": "...", ...}}
      tool-look
      # Pan/tilt camera toward a target
      PX_PAN=30 PX_TILT=10 bin/tool-look
      # Output: {"status": "ok", "pan": 30, "tilt": 10}
      # Yields px-alive GPIO via SIGUSR1 before moving

      // roadmap

      Milestones and future work.

      Foundation (0–1 Month)

      Upgrade diagnostics to log predictive signals
      Extend energy sensing (voltage/temperature)
      Boot health service — captures throttle/voltage at boot
      Ship safety fallbacks: wake-word halt, watchdog heartbeats
      Harden logging paths (FileLock, isolated test fixtures)
      Source control: repo at adrianwedd/spark
      Three-layer cognitive loop (px-mind) with LLM fallback
      SPARK persona + neurodivergent-aware system prompt
      REST API + web UI (px-api-server)
      Frigate camera stream (go2rtc RTSP pull model)
      Battery monitoring — escalating warnings + emergency shutdown
      Live dashboard with mood, thoughts, sparklines (this page)
      Public API — unauthenticated live data endpoints
      Thoughts carousel — real-time inner monologue on home page
      Semantic mood colour palette — pulse circle + favicon
      obi_mode inference (absent/calm/active/possibly-overloaded)
      Social posting — Bluesky (spark.wedd.au) + thought feed (spark.wedd.au/feed/)
      Multi-camera Frigate — 4 cameras with per-room presence detection (Hailo AI)
      PIN session tokens, file-based rate limiting, two-step device confirmation
      Battery glitch filter — time-gapped confirmations, voltage sanity check
      Graceful watchdog — SIGTERM + 5s grace instead of os._exit(1)
      Branded thought card images — 1080x1080 square PNGs with adaptive text
      Two-pass social flush — feed writes decoupled from social rate limits
      DST-aware timezone — ZoneInfo("Australia/Hobart") replaces hardcoded UTC+11
      Per-IP PIN lockout — 1000-IP hard cap, trusted proxy check, file-based persistence
      Atomic writes with fsync — mkstemp + ownership preservation for SD card durability
      Single-instance PID guards on all daemons — prevents double speech on restart
      Subprocess timeout kills orphans — API tool calls terminate child processes
      exploring.json guard — long-running tools prevent px-alive restart mid-operation
      FileLock 10s timeout — prevents indefinite hangs on stuck session locks
      SEO blitz — JSON-LD, canonical URLs, sitemap, OG tags, fediverse verification
      Mood-coloured status dot on all pages — real-time mood from API
      Home Assistant integration — presence, sleep, calendar, routines, media context
      Bluesky image uploads — branded thought cards attached to social posts
      450 tests — comprehensive coverage across API, state, tools, voice, mind
      Gesture-driven stop prototype
      Weekly battery/health summary reports

      Growth (1–3 Months)

      SPARK Phase 2 — transition warnings, routine support
      SPARK Phase 3 — quiet mode, sensory check, dopamine menu
      Calendar integration — Obi's Google Calendar drives obi_mode + expression gating
      Modular sensor fusion and persistent mapping
      Richer voice summaries, mission templates, gesture recognition
      Simulation CI sweeps (Gazebo or lightweight custom sim)
      Predictive maintenance alerts from historical logs

      Visionary (3+ Months)

      Reinforcement learning "dream buffer" and policy sharing
      Autonomous docking, payload auto-detection, multi-car demos
      Central knowledge base syncing maps and logs
      Quantised/accelerated model variants for on-device sustainability