[ documentation ]

Docs

AI-powered server monitoring that learns your servers and explains what's wrong.

v2.1.1

[ 01 ]

Getting started

Spyd is a lightweight monitoring daemon for Linux servers. It learns what's normal for your box and, when something is actually wrong, sends a clear alert with the cause and the fix. AI and a redacted central brain are enabled by default, but they engage only after you accept a one-time disclosure; until then Spyd runs fully local.

beta

Spyd is in beta — install it on test or staging hosts for now, not production. It runs read-only and never changes your system, but we're still hardening the edges.

install
# install (runs local-only until you enroll or accept consent)
$ curl -fsSL https://spyd.sh/install.sh | sh

# install AND join a Spyd Cloud org (token from the cockpit)
$ curl -fsSL https://spyd.sh/install.sh | sh -s -- --enroll <token> --accept-terms

Connect to Spyd Cloud (enroll)

Enrollment registers the host with your org. It mints an Ed25519 identity, and only the public key leaves the machine. Each token is single-use and expires after 24h. With systemd present, enrolling auto-installs the supervised service; after a bare install, run spyd service install.

Quick start

$ spyd start          # start the daemon
$ spyd status         # live health, metrics, brain & consent
$ spyd doctor         # verify config, AI, channels, database
$ spyd test-alert     # once a channel is configured

upgrading to 2.1

v2.1 changes what data can leave the host (it preserves external threat indicators as evidence), so the consent disclosure was re-versioned. On upgrade, each host drops to local-only safe mode until you re-acknowledge the disclosure with spyd consent accept. For the privacy-maximal posture, set privacy.preserve_threat_indicators: false before you accept.
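
The safe-mode fallback described above can be pictured with a small sketch. It is illustrative only and not Spyd's code: the function name and the version strings are hypothetical, while the mode names come from the sync.mode comment in the Configuration section.

```python
def effective_mode(configured_mode, accepted_version, current_version):
    """Fall back to local-only until the current disclosure version is accepted.

    accepted_version is None when consent was never given (hypothetical model).
    """
    if accepted_version != current_version:
        return "local_only"
    return configured_mode
```

Under this model, a host that accepted an older disclosure behaves exactly like one that never accepted at all until spyd consent accept is run again.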

[ 02 ]

How it works

collect → detect → local brain → ai (optional) → alert gate → channels

What Spyd monitors

Detector | Watches for                                                          | Severity
CPU      | usage over cpu_percent threshold                                     | warning · critical
Memory   | usage over memory_percent; swap pressure                             | warning · critical
Disk     | per-partition usage over disk_percent                                | warning · critical
Inodes   | inode table over inode_percent                                       | warning · critical
Load     | 1-min load over load_average                                         | warning · critical
Logs     | matched error patterns in log files                                  | info · warning
TLS      | certificate expiry forecast                                          | warning · critical
DNS      | resolution failures for monitored names                              | warning
Security | SSH brute-force, miners, DDoS, suspicious login/outbound, crash-loop | possible → high

The Local Brain

An on-host store (brain.db) that gives Spyd memory: it fingerprints each incident and records the evidence, diagnosis, and outcome. It is the first decision layer before any AI call, and it never leaves the host.

  • Recognize a recurring incident and reuse the prior diagnosis (no AI call).
  • Suppress evidence-backed benign noise — e.g. a CPU spike during the nightly apt window.
  • Calibrate its confidence against the outcomes you report with spyd resolve.
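
The recall step in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not Spyd's implementation: the fingerprint fields (detector, program, normalized message), the helper names, and the in-memory dict standing in for brain.db are all hypothetical.

```python
import hashlib
import re
from typing import Dict, Optional

def fingerprint(detector: str, program: str, message: str) -> str:
    """Hash the stable parts of an incident (hypothetical field choice)."""
    # Replace digit runs so volatile values (PIDs, durations, ports) do not
    # change the fingerprint of an otherwise identical incident.
    normalized = re.sub(r"\d+", "N", message.lower())
    raw = "\x1f".join((detector, program, normalized))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]

# In-memory stand-in for brain.db: fingerprint -> prior diagnosis.
known: Dict[str, str] = {}

def recall(detector: str, program: str, message: str) -> Optional[str]:
    """Return the prior diagnosis for a recurring incident, else None."""
    return known.get(fingerprint(detector, program, message))

# Record one diagnosed incident so a later recurrence can reuse it.
known[fingerprint("logs", "nginx", "upstream timed out after 30s (pid 4121)")] = (
    "backend saturation"
)
```

A later event such as "upstream timed out after 45s (pid 977)" normalizes to the same fingerprint, so the stored diagnosis is returned without any AI call.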

Retention is by data class (evidence ~30d, diagnoses ~90d, fingerprints ~180d); spyd brain vacuum prunes old history while keeping learned classes. Inspect it with spyd brain, spyd digest, spyd why-quiet, and spyd changes.
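
As a rough sketch of retention by data class, the pruning rule could look like the code below. The record shape (a dict with data_class and created_at) and the treatment of unlisted classes as "keep forever" are assumptions made for illustration; only the approximate windows come from the text.

```python
from datetime import datetime, timedelta, timezone

# Approximate retention windows from the text above, in days.
RETENTION_DAYS = {"evidence": 30, "diagnosis": 90, "fingerprint": 180}

def vacuum(records, now):
    """Return the records that are still inside their class's window.

    Classes without a window (for example learned classes) are kept
    regardless of age, which is an assumption of this sketch.
    """
    kept = []
    for rec in records:
        days = RETENTION_DAYS.get(rec["data_class"])
        if days is None or now - rec["created_at"] <= timedelta(days=days):
            kept.append(rec)
    return kept
```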

The Central Brain

When you enroll, each host syncs a redacted brain_sync to Spyd Cloud (default every 24h, sync.interval_hours). The central brain is a per-org aggregate (Postgres, tenant-isolated by row-level security) that powers cross-fleet Insights and Recurring Issues in the cockpit. It is redacted only — IPs coarsened to /24, usernames pseudonymized, command lines reduced to the program name, secrets stripped — and is reaped after 90 days by default. Full fidelity stays on the host.
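
The redaction rules above can be illustrated with a short sketch. It is not Spyd's redactor: the function names are hypothetical, the keyed-hash pseudonym scheme is an assumption, and only IPv4 addresses are handled.

```python
import hashlib
import hmac
import ipaddress
import shlex

def coarsen_ip(ip: str) -> str:
    """Coarsen an IPv4 address to its /24 network."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def pseudonymize(username: str, key: bytes) -> str:
    """Map a username to a stable pseudonym using a keyed hash (assumed scheme)."""
    digest = hmac.new(key, username.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user-" + digest[:8]

def program_name(command_line: str) -> str:
    """Reduce a command line to the program name, dropping path and arguments."""
    parts = shlex.split(command_line)
    return parts[0].rsplit("/", 1)[-1] if parts else ""
```

Dropping every argument is also what removes secrets passed on a command line, since flags such as tokens never survive the reduction.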

How they work together

A host detects an incident and judges it locally against its own brain; if it's known, it's handled (or suppressed) on the box with no AI call. Redacted patterns roll up to the central brain, which aggregates them across your fleet so one host's lesson helps the rest. Outcomes you record with spyd resolve feed back into the local brain's calibration — so it gets more accurate, and quieter, over time.

Adaptive modes

Mode        | When                                                                | Ceiling
lightweight | steady state (default): core checks + Brain lookup                  | 50 MB / 5% CPU
standard    | auto-elevated on an open incident; de-escalates after a calm window | 80 MB / 8% CPU
advanced    | explicit opt-in only: deeper probes, never auto-enabled             | 120 MB / 12% CPU
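
One way to picture the transitions in the table is the small state function below. It is a sketch under assumptions: the 900-second calm window and the pinned parameter are hypothetical, and the real daemon may weigh other signals.

```python
# Ceilings from the table above: (memory in MB, CPU percent).
CEILINGS = {"lightweight": (50, 5), "standard": (80, 8), "advanced": (120, 12)}

def next_mode(current, open_incidents, calm_seconds, calm_window=900.0, pinned=None):
    """Pick the runtime mode for the next cycle."""
    if pinned is not None:
        return pinned  # explicit override; the only path to "advanced"
    if open_incidents > 0:
        return "standard"  # auto-elevate while an incident is open
    if current == "standard" and calm_seconds < calm_window:
        return "standard"  # stay elevated until the calm window has passed
    return "lightweight"
```

Note that the automatic branches never return "advanced", which mirrors the rule that it is opt-in only.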

The Alert-Worthiness Gate (2.1)

Before anything pages you, every candidate alert passes a single decision layer scoring impact × duration/self-recovery × novelty × actionability:

  • Page — Critical, confirmed-impact, or AI-escalated.
  • Digest — sustained-but-not-harmful; no 3 a.m. page.
  • Suppress — known-benign or already self-recovered.

Severity follows consequence: disk/inode/memory page only near saturation (≈98%); CPU and load are never critical on level alone. SSH brute-force is Critical only when a login actually succeeds. Revert with notifications.routing.worthiness_gate: false.
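
A minimal sketch of such a gate, assuming each factor is normalized to the range 0 to 1, is shown below. The Candidate fields, the digest_floor value, and the way the score is combined with the hard Page conditions are assumptions for illustration, not the shipped scoring.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    impact: float          # 0..1, consequence if left alone
    persistence: float     # 0..1, low when short-lived or self-recovered
    novelty: float         # 0..1, low for known-benign classes
    actionability: float   # 0..1, whether the operator can act on it
    critical: bool = False
    confirmed_impact: bool = False
    ai_escalated: bool = False

def route(c: Candidate, digest_floor: float = 0.05) -> str:
    """Return "page", "digest", or "suppress" for a candidate alert."""
    if c.critical or c.confirmed_impact or c.ai_escalated:
        return "page"
    # The product collapses to zero when any factor is zero, so a
    # self-recovered or known-benign event can never reach the digest.
    score = c.impact * c.persistence * c.novelty * c.actionability
    return "digest" if score >= digest_floor else "suppress"
```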

[ 03 ]

Command reference

Global flag: -c, --config <path> (default ~/.config/spyd/config.yaml). Many read commands support --json.

Daemon

spyd start [-f]        Start the daemon. --foreground runs attached.
spyd stop · restart    Graceful stop (SIGTERM); restart verifies the new process came up.

Status & diagnostics

spyd status                     Health, metrics vs thresholds, brain health, data-sharing posture.
spyd doctor [--test-alert]      Validate config, permissions, DB, brain, AI, consent, channels.
spyd logs · spyd alerts         Recent matched log events; recent alerts (--ack <id> to acknowledge).
spyd probes                     List the read-only diagnostic commands Spyd may run. It never remediates.
spyd diagnose --from-fixture    Replay a recorded incident through the full decision pipeline.

AI

spyd explain "<q>"    Ask the AI about an error or log line, with server context.
spyd analyze          Collect metrics, detect anomalies, produce an AI diagnosis of live state.

Local brain & learning

spyd brain [vacuum]           What the brain learned; vacuum prunes old raw history (keeps classes).
spyd digest --since 24h       What was handled silently vs alerted, plus recent changes.
spyd why-quiet                The classes Spyd is suppressing and why; quiet is never a black box.
spyd resolve <id> --status    Tell Spyd how an incident turned out; the ground truth it learns from.
spyd changes [record]         The change ledger (deploys, upgrades, edits), correlated against incidents.

Cloud & consent

spyd enroll <token>             Join an org with a one-time token; mints host identity (public key only).
spyd consent [accept|revoke]    AI + central-brain sync engage only after you accept the disclosure.

Adaptive mode & lifecycle

spyd mode [pin|unpin]                  Inspect or override the adaptive runtime mode.
spyd upgrade [--check]                 Upgrade to latest or a pinned version (atomic, verified, rollback-able).
spyd rollback · version · uninstall    Revert to the retained prior binary; show version; remove.

[ 04 ]

Scenarios → alerts

What an event looks like when it reaches you: the cause, the evidence, and a guided, copy-pasteable fix — and when Spyd decided not to alert.

[critical] Disk may fill in 42 minutes on api-prod-01
Confidence: 91% · AI usage: Local Brain, no external AI

Summary:
  /var is 94% full and growing fast. Traced to nginx access logs
  from repeated bot requests to /wp-login.php and /xmlrpc.php.

What to do:
  1. sudo du -h /var/log/nginx/* | sort -h | tail -10
  2. sudo logrotate -f /etc/logrotate.d/nginx
Confirm:  df -h /var

[suppressed] CPU spike from unattended package update on worker-03
Reason: matched apt-get inside the unattended-upgrades window;
        services healthy; CPU normalized after 7 minutes.
Shows under "handled silently" in spyd digest; never paged you.
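
The "may fill in 42 minutes" figure in the first sample alert is a forecast. The docs do not say how Spyd computes it; one simple way to arrive at such a number is linear extrapolation from two usage samples, sketched here with hypothetical names and inputs.

```python
def minutes_until_full(used_now, used_before, minutes_between, capacity):
    """Linear time-to-full estimate; None if usage is flat or shrinking."""
    rate = (used_now - used_before) / minutes_between  # units per minute
    if rate <= 0:
        return None
    return (capacity - used_now) / rate
```

For example, a partition that grew from 92% to 94% over 14 minutes has 6 points of headroom left at roughly 0.14 points per minute, which works out to about 42 minutes.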

[ 05 ]

Configuration

Config lives at ~/.config/spyd/config.yaml (mode 0600). Generate with spyd init; validate with spyd config validate.

# AI — on by default via the hosted proxy (no key on the host after consent).
ai:
  provider: spyd        # spyd (cloud proxy) | anthropic | openai | ollama

monitoring:
  interval: 60
  thresholds:
    cpu_percent: 80
    memory_percent: 85
    disk_percent: 90
    # critical pages only near saturation; cpu/load never critical on level alone
    disk_critical_percent: 98
    inode_critical_percent: 98
    memory_critical_percent: 98

# data-sharing: local_only (default until consent) | redacted | full_diagnostic
sync:
  mode: redacted
  endpoint: https://api.spyd.sh

notifications:
  min_severity: warning
  routing:
    worthiness_gate: true   # consequence-based Page/Digest/Suppress (2.1)

# auto-upgrade — an org policy set in the cockpit overrides these locally
upgrade:
  auto_upgrade: true
  window: "02:00-04:00"
  base_url: https://spyd.sh/releases  # internal mirror for air-gapped fleets
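
To show how the warning and critical thresholds in the sample config interact, here is a small sketch. The classify helper and the use of >= as the comparison are assumptions; the numbers are copied from the config above.

```python
# Values copied from the sample config above.
THRESHOLDS = {
    "cpu_percent": 80,
    "memory_percent": 85,
    "disk_percent": 90,
    "disk_critical_percent": 98,
    "inode_critical_percent": 98,
    "memory_critical_percent": 98,
}

def classify(metric, value, thresholds=THRESHOLDS):
    """Return "critical", "warning", or None for a percentage reading."""
    critical = thresholds.get(f"{metric}_critical_percent")
    if critical is not None and value >= critical:
        return "critical"
    warning = thresholds.get(f"{metric}_percent")
    if warning is not None and value >= warning:
        return "warning"
    return None
```

Because the sample config defines no cpu_critical_percent key, a CPU reading can reach at most "warning" in this sketch, which matches the rule that CPU is never critical on level alone.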

[ 06 ]

Alert channels

Spyd delivers full-fidelity alerts to the channels you configure. Test a channel with spyd test-alert -C <channel>. Set notifications.min_severity to choose the lowest severity that is delivered.

Telegram · Slack · Discord · Email · Webhook · Cockpit

[ 07 ]

Spyd Cloud & Cockpit

Spyd Cloud (api.spyd.sh) plus the cockpit (app.spyd.sh) turn a fleet into one screen.

  • Fleet — every host with real liveness (60s heartbeat; graceful stop flips offline instantly).
  • Incidents — a live SSE feed with the same evidence + guided fix; assign owners and roles.
  • Recurring Issues (2.1) — learned patterns across the fleet, populated automatically.
  • Auto-upgrade policy (2.1) — toggle org-wide; overrides each host's local config.
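
The liveness rule in the Fleet bullet can be sketched as below. Only the 60-second heartbeat and the instant offline flip on graceful stop come from the text; the missed_allowed grace factor and the function name are hypothetical.

```python
HEARTBEAT_SECONDS = 60  # heartbeat interval from the text

def is_online(last_heartbeat, now, stopped_gracefully, missed_allowed=2):
    """Decide liveness from the last heartbeat timestamp (in seconds)."""
    if stopped_gracefully:
        return False  # a graceful stop flips the host offline immediately
    return now - last_heartbeat <= HEARTBEAT_SECONDS * missed_allowed
```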

[ 08 ]

Privacy & redaction

Private by default. Nothing leaves the host until you accept a one-time disclosure, and even then your own data is redacted before it is sent to the fleet — IPs coarsened, usernames pseudonymized, command lines reduced to the program name, secrets stripped. External threat indicators are kept as security evidence so alerts stay actionable.

Full detail — data tiers, retention, sub-processors, AI usage, and terms — is on the Privacy & Terms page.