AI architecture

ADMINISTRATOR

::: danger Restricted
Internal architecture documentation. Do not paste outside the admin section.
:::

RAPAX PMS uses five AI providers in a routed swarm. No single model is the source of truth; every model output is one piece of evidence against the deterministic Master List.

Provider matrix

| Provider | Model | Where it runs | Primary jobs | Budget guard |
|---|---|---|---|---|
| Workers AI (Cloudflare) | @cf/meta/llama-4-scout-17b-16e-instruct | Worker-local | Filename classifier · magic-byte sniff · short prompts · cheap fallback | None (Worker-attached) |
| Anthropic Claude | claude-sonnet-4-6 (chat), claude-opus-4-7 (long-context audit) | External API | Component-card AI populate · code-advisor · long-context Knowledge Query audit | withBudgetGuard: daily $ cap, 429 on overrun |
| Google Gemini | gemini-2.5-pro (or fallback) | External API | PDF-native extraction · multimodal source-document parsing | AbortSignal.timeout(120s) for PDFs |
| Perplexity | sonar-pro | External API | Web-grounded model lookup · maker/model verification | Per-call timeout |
| Kimi (Moonshot) | kimi-k2.6 (NOT -thinking) | External API | CL Remap orchestration · Tier 1 / Tier 2 batched JSON extraction | Tier 1: 120 batch / parallel 5; Tier 2: 60 batch / parallel 3 |

Kimi constraints (hard-won)

  • Model name is always kimi-k2.6; kimi-k2.6-thinking returns 404
  • Temperature must be exactly 0.6 (no thinking) or 1.0 (thinking) — anything else returns 400
  • Tier 1 max_tokens 32768; Tier 2 max_tokens 65536
  • D1 REST /query expects {batch: [...]} wrapper, not a raw array
  • Strict JSON schema is advisory — the parser must tolerate missing required arrays and max_tokens truncation
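The model, temperature, and max_tokens constraints above can be pinned in one place so callers cannot drift from them. This is a minimal sketch, not the orchestrator's actual code: the helper name buildKimiParams, the KIMI_TIERS table, and the thinking option are hypothetical, and only the numeric values come from this page.

```javascript
// Hypothetical helper: names and return shape are illustrative only.
const KIMI_MODEL = "kimi-k2.6"; // the "-thinking" suffix returns 404

// Per-tier limits as listed on this page.
const KIMI_TIERS = {
  1: { maxTokens: 32768, batchSize: 120, parallel: 5 },
  2: { maxTokens: 65536, batchSize: 60, parallel: 3 },
};

function buildKimiParams(tier, { thinking = false } = {}) {
  const limits = KIMI_TIERS[tier];
  if (!limits) throw new RangeError(`Unknown Kimi tier: ${tier}`);
  return {
    model: KIMI_MODEL,
    // Only 0.6 (no thinking) and 1.0 (thinking) are accepted; anything else returns 400.
    temperature: thinking ? 1.0 : 0.6,
    max_tokens: limits.maxTokens,
  };
}
```

Deriving the temperature from a boolean rather than accepting a number means a caller cannot pass a value the API rejects.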

Router

src/ai-router.js exposes routeAi(jobKind, payload) which:

  1. Picks the provider by jobKind and current health (/api/ai-status exposes per-provider running / idle / failed counts)
  2. Wraps the call in Promise.race against a setTimeout reject (Workers AI binding can't take an AbortSignal)
  3. Records ai_calls rows with provider, model, latency_ms, tokens_in, tokens_out, cost_usd, and any timeout: <Nms> marker so we can distinguish timeouts from external 5xx
  4. Emits ai-status messenger threads on:
    • heal completion (admin-triggered, no dedupe)
    • wizard supersede event (quiet-hours guard, 23:00 to 08:00 Sofia time)
    • RAG eval recall@5 ≥5pp regression (sentinel-deduped per UTC day)
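Step 2 above can be sketched as a small wrapper that races the provider call against a timer. The helper name withTimeout is hypothetical, and the error message format is only modelled on the timeout: <Nms> marker described in step 3; the real router may format it differently.

```javascript
// Hypothetical helper: races a promise against a setTimeout-based rejection.
// Useful where the callee cannot accept an AbortSignal.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timeout: ${ms}ms`)), ms);
  });
  // Clear the timer on either outcome so it does not outlive the call.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

Note that Promise.race only stops waiting; the underlying call keeps running until it settles, which is the trade-off when cancellation is not available.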

Chains

src/ai-chains.js defines the named multi-step chains. The most relevant ones for ops:

  • Chain B — extractor pipeline. Reads source-doc content → calls Claude / Gemini → emits structured fields → provenance-tracker.recordProvenance(...) writes (vessel_id, field_path, source_document_id, provenance_quality) rows
  • Chain C — deep extract for vessel-document chunks. Backed by Cloudflare Queue chain-c-steps (DLQ chain-c-steps-dlq) so long-running chunks don't block the request
  • Knowledge-query chain: retrieveHybrid → auditLongContext (Kimi K2.6, free tier) → validateCitations → corrective-retry. The audit step uses parseModelJson() with three strategies (fenced block · pure JSON · balanced-brace scan) so hallucinated source_document_ids inside fenced JSON cannot bypass validateCitations
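The three parsing strategies named for parseModelJson() might look roughly like the following. This is a sketch under assumptions, not the code in src/ai-chains.js: the function names parseModelJsonSketch, tryParse, and firstBalancedObject are hypothetical, and returning null on failure is an illustrative choice.

```javascript
// Sketch only: the real parseModelJson() may differ.
const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence in this example
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false };
  }
}

// Return the first balanced {...} span, ignoring braces inside JSON strings.
function firstBalancedObject(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // never balanced, e.g. output truncated at max_tokens
}

function parseModelJsonSketch(raw) {
  const fenced = raw.match(FENCED_BLOCK);
  // Strategy order: fenced block, pure JSON, balanced-brace scan.
  const candidates = [fenced && fenced[1], raw.trim(), firstBalancedObject(raw)];
  for (const candidate of candidates) {
    if (!candidate) continue;
    const parsed = tryParse(candidate);
    if (parsed.ok) return parsed.value;
  }
  return null;
}
```

Whichever strategy succeeds, the caller receives a parsed object, so a later citation check sees the same structure regardless of how the model wrapped its output.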

RAG retrieval

Two indexes, both consulted on every query:

| Index | Backing store | Used for |
|---|---|---|
| Dense | Cloudflare Vectorize (binding VEC, model EMBEDDING_MODEL) | Semantic similarity over rag_chunks |
| Sparse | D1 FTS5 on rag_chunks | Lexical match for codes, names, makers, model numbers |

retrieveHybrid({ vesselId, query, mandatoryOnly?, classFilter? }):

  • mandatoryOnly: true filters rag_chunks.mandatory_class IS NOT NULL — i.e. only chunks from one of the 6 mandatory blockers
  • classFilter: ['particulars', 'capacity_plan', ...] restricts to a named subset
  • Dense and sparse hits merged with reciprocal-rank fusion
  • winner_path trace tag attached to each result: master_fuzzy | legacy_inferUcs | kb-corrected | rag:mandatory | rag:helpful | keyword-rule | fewshot
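Reciprocal-rank fusion, as used to merge the dense and sparse hits, scores each chunk by summing 1 / (k + rank) over every list it appears in. The sketch below assumes k = 60, the constant commonly used for RRF; the value used in retrieveHybrid is not stated on this page, and the function name and chunk ids are hypothetical.

```javascript
// Sketch of reciprocal-rank fusion over any number of ranked id lists.
// k = 60 is an assumption, not a value taken from the RAPAX source.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Usage: a chunk found by both indexes outranks chunks found by only one.
const denseHits = ["c1", "c2", "c3"];
const sparseHits = ["c2", "c4"];
const fused = reciprocalRankFusion([denseHits, sparseHits]);
```

Because RRF uses only ranks, the dense similarity scores and the FTS5 scores never need to be put on a common scale.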

Provenance authority order

  1. Active Master List (deterministic, code-perfect)
  2. Source-document evidence (vessel_particulars_provenance, provenance_quality='extracted')
  3. Manual override (provenance_quality='manual_override')
  4. RAG retrieval (advisory only, never authoritative)
  5. KB correction (cl_knowledge_base, post-AI rewrite layer)

Knowledge Base (cl_knowledge_base)

The KB is corrective only — it adjusts what the LLM will output next time, never what is in the live PMS state. Rules:

  • Read-only from the LLM — the LLM never writes to it; supervisors and administrators do
  • Quarantine: quarantined=1 for orphan codes (target not in ucs_master_list). The KB matcher filters WHERE quarantined=0
  • Heal: kb-orphan-heal.js re-maps quarantined rows against the active Master at Jaccard ≥ 0.72. Two SQL fixes in v2.31.0.20: component_name LIKE (the column was renamed from name) and version_id IN (SELECT id FROM ucs_foundation_versions WHERE is_active=1)
  • Self-learning hook: src/self-learning.js:93-130 mirrors non-REJECTED corrections into cl_knowledge_base (seeded_by='auto', learning_weight=1.0) with an admin-priority guard (a NOT EXISTS clause prevents overwriting seeded_by='admin' rows)
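To illustrate the Jaccard ≥ 0.72 threshold used by the heal step, here is a minimal sketch of token-set Jaccard matching. The tokenisation (lowercase, split on non-alphanumerics) and the function names are assumptions; kb-orphan-heal.js may tokenise differently.

```javascript
// Assumed tokenisation: lowercase words and digits, everything else is a separator.
function tokenize(name) {
  return new Set(name.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
function jaccard(a, b) {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  for (const token of setA) if (setB.has(token)) shared++;
  return shared / (setA.size + setB.size - shared);
}

const HEAL_THRESHOLD = 0.72; // threshold stated on this page

// Return the best Master candidate at or above the threshold, or null.
function bestHealTarget(orphanName, masterNames) {
  let best = null;
  for (const candidate of masterNames) {
    const score = jaccard(orphanName, candidate);
    if (score >= HEAL_THRESHOLD && (!best || score > best.score)) {
      best = { name: candidate, score };
    }
  }
  return best;
}
```

Under this tokenisation, a five-token name that gains one extra token scores 5/6 and heals, while gaining two extra tokens scores 5/7 and stays quarantined.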

Budget guard

src/budget-guard.js enforces a daily $ cap per provider. On overrun:

  • Returns HTTP 429 with Retry-After: <seconds-until-UTC-midnight>
  • Emits an ai-status messenger thread (sentinel-deduped per UTC day so we don't spam at every overrun)
  • Per-component auto-image endpoint additionally uses a module-level autoImageInFlight Set to return 429 on overlapping calls
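The Retry-After value described above is the number of seconds until the next UTC midnight. A minimal sketch of that calculation and the resulting 429 response follows; the function names and the JSON body shape are hypothetical, and it assumes a runtime with a global Response (Workers, or Node 18+).

```javascript
// Seconds remaining until the next 00:00 UTC, rounded up.
function secondsUntilUtcMidnight(now = new Date()) {
  // Date.UTC normalises day overflow, so "date + 1" is safe at month end.
  const nextMidnight = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1
  );
  return Math.ceil((nextMidnight - now.getTime()) / 1000);
}

// Hypothetical response builder; the body shape is illustrative only.
function budgetExceededResponse(provider, now = new Date()) {
  return new Response(JSON.stringify({ error: "budget_exceeded", provider }), {
    status: 429,
    headers: {
      "Content-Type": "application/json",
      "Retry-After": String(secondsUntilUtcMidnight(now)),
    },
  });
}
```

Passing now as a parameter keeps the calculation testable without mocking the clock.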

Swarm overlay

client/src/components/swarm-overlay.tsx polls /api/ai-status and renders a draggable, collapsible overlay with one tile per provider showing running / idle / failed state. State persistence:

  • swarm_overlay_open in localStorage controls open/closed
  • collapsed is React-only by design (resets on reload) — collapsed state shrinks to a 220px rounded-full pill, expanded restores the 300px rounded-2xl panel

RAG eval cron

Runs at 02:00 UTC. Compares current recall@5 against rolling 7-day baseline. On ≥5pp regression:

  • Sends to EMAIL_DISPATCH_QUEUE (Postmark)
  • Posts to ai-status messenger thread (sentinel-deduped per UTC day; falls through to "notify anyway" if sentinel SELECT errors)
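The regression test itself is a simple comparison in percentage points. The sketch below assumes the rolling baseline is the mean of the previous seven daily recall@5 values, stored as fractions between 0 and 1; the page does not say how the baseline is aggregated, and the function name is hypothetical.

```javascript
// Assumption: baseline = mean of the last seven daily recall@5 values (fractions in [0, 1]).
function recallRegression(current, lastSevenDays, thresholdPp = 5) {
  if (lastSevenDays.length === 0) {
    return { regressed: false, dropPp: 0, baseline: null };
  }
  const baseline =
    lastSevenDays.reduce((sum, value) => sum + value, 0) / lastSevenDays.length;
  // Round to 4 decimal places of a percentage point so float noise
  // cannot flip a drop of exactly 5pp to just under the threshold.
  const dropPp = Math.round((baseline - current) * 1e6) / 1e4;
  return { regressed: dropPp >= thresholdPp, dropPp, baseline };
}
```

For example, a baseline of 0.80 and a current value of 0.74 is a 6pp drop and would trigger both notifications, while 0.78 (a 2pp drop) would not.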

Where to look when something is wrong

| Symptom | Where to look first |
|---|---|
| /api/health says version mismatch | client/src/App.tsx:216 sidebar span vs /api/health.version literal in src/index.js |
| KB orphan heal returns errorSamples[] | src/kb-orphan-heal.js; likely a Master schema drift (column rename), see v2.31.0.20 |
| Wizard upload returns 500 | Check [wizard-update] / [wizard-insert] log lines for raw D1 error |
| RAG retrieval returns empty | Check mandatory_class backfill state: POST /api/admin/backfill/rag-chunks-mandatory-class |
| AI swarm shows all-failed | /api/ai-status for per-provider state; check budget guard for cap-hit |
| Kimi remap fails on batch N | audit-notes/kimi-run-<vessel>.json snapshot; the orchestrator parser does not tolerate max_tokens truncation as of v2.31.0.35 (known bug) |

RAPAX PMS Help · v2.31.0.26 · released 2026-04-28