Getting started
Spyd is a lightweight monitoring daemon for Linux servers. It learns what's normal for your box and, when something is actually wrong, sends a clear alert with the cause and the fix. AI and a redacted central brain are enabled by default, but they engage only after you accept a one-time disclosure; until then Spyd runs fully local.
Spyd is in beta — install it on test or staging hosts for now, not production. It runs read-only and never changes your system, but we're still hardening the edges.
    # install (runs local-only until you enroll or accept consent)
    $ curl -fsSL https://spyd.sh/install.sh | sh

    # install AND join a Spyd Cloud org (token from the cockpit)
    $ curl -fsSL https://spyd.sh/install.sh | sh -s -- --enroll <token> --accept-terms
Connect to Spyd Cloud (enroll)
Enrollment registers the host with your org. It mints an Ed25519 identity, and only the public key
leaves the machine. Each token is single-use and expires after 24h. If systemd is present, enrolling
installs the supervised service automatically; after a bare install, run spyd service install.
Quick start
    $ spyd start        # start the daemon
    $ spyd status       # live health, metrics, brain & consent
    $ spyd doctor       # verify config, AI, channels, database
    $ spyd test-alert   # once a channel is configured
v2.1 changes what data can leave the host (it preserves external threat indicators as evidence),
so the consent disclosure was re-versioned. On upgrade, each host drops to local-only safe mode
until you re-acknowledge the disclosure by running spyd consent accept. For the privacy-maximal
posture, set privacy.preserve_threat_indicators: false before you accept.
How it works
collect → detect → local brain → ai (optional) → alert gate → channels
What Spyd monitors
| Detector | Watches for | Severity |
|---|---|---|
| CPU | usage over cpu_percent threshold | warning · critical |
| Memory | usage over memory_percent; swap pressure | warning · critical |
| Disk | per-partition usage over disk_percent | warning · critical |
| Inodes | inode table over inode_percent | warning · critical |
| Load | 1-min load over load_average | warning · critical |
| Logs | matched error patterns in log files | info · warning |
| TLS | certificate expiry forecast | warning · critical |
| DNS | resolution failures for monitored names | warning |
| Security | SSH brute-force, miners, DDoS, suspicious login/outbound, crash-loop | possible → high |
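As a rough illustration of how a threshold detector from the table could turn a reading into a severity, here is a minimal Python sketch. It is not Spyd's implementation: the `Thresholds` dataclass and the `classify` function are hypothetical names, and the numbers are the default thresholds listed in the Configuration section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container mirroring the documented default thresholds."""
    warning: float                    # e.g. disk_percent: 90
    critical: Optional[float] = None  # e.g. disk_critical_percent: 98; None = no level-based critical


def classify(value: float, t: Thresholds) -> Optional[str]:
    """Return 'critical', 'warning', or None for a percentage reading."""
    if t.critical is not None and value > t.critical:
        return "critical"
    if value > t.warning:
        return "warning"
    return None


DISK = Thresholds(warning=90, critical=98)
CPU = Thresholds(warning=80)  # CPU is never critical on level alone

print(classify(94.0, DISK))  # warning
print(classify(99.0, DISK))  # critical
print(classify(99.0, CPU))   # warning
```

In this sketch a CPU reading can only ever reach warning from its level; anything higher would have to come from other evidence, which the sketch does not model.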
The Local Brain
An on-host store (brain.db) that gives Spyd memory: it fingerprints each incident and
records the evidence, diagnosis, and outcome. It is the first decision layer before any AI call, and
it never leaves the host.
- Recognize a recurring incident and reuse the prior diagnosis (no AI call).
- Suppress evidence-backed benign noise, e.g. a CPU spike during the nightly apt window.
- Calibrate its confidence against the outcomes you report with spyd resolve.
Retention is by data class (evidence ~30d, diagnoses ~90d, fingerprints ~180d); spyd brain
vacuum prunes old history while keeping learned classes. Inspect it with spyd brain,
spyd digest, spyd why-quiet, and spyd changes.
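To make the fingerprint-then-recall idea concrete, here is a minimal Python sketch backed by an in-memory SQLite table. Everything in it is an assumption for illustration: the schema, the way `fingerprint` normalizes a message, and the function names are hypothetical and do not describe the real brain.db.

```python
import hashlib
import re
import sqlite3
from typing import Optional


def fingerprint(detector: str, message: str) -> str:
    """Hash a detector name plus a message with volatile numbers masked out."""
    normalized = re.sub(r"\d+", "N", message.lower())
    return hashlib.sha256(f"{detector}|{normalized}".encode()).hexdigest()[:16]


def open_brain() -> sqlite3.Connection:
    """Create a throwaway in-memory store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE incidents (fp TEXT PRIMARY KEY, diagnosis TEXT, seen INTEGER)")
    return db


def remember(db: sqlite3.Connection, fp: str, diagnosis: str) -> None:
    """Store a diagnosis on first sight, then count every occurrence."""
    db.execute(
        "INSERT OR IGNORE INTO incidents (fp, diagnosis, seen) VALUES (?, ?, 0)",
        (fp, diagnosis),
    )
    db.execute("UPDATE incidents SET seen = seen + 1 WHERE fp = ?", (fp,))


def recall(db: sqlite3.Connection, fp: str) -> Optional[str]:
    """Return the prior diagnosis for a known fingerprint, else None."""
    row = db.execute("SELECT diagnosis FROM incidents WHERE fp = ?", (fp,)).fetchone()
    return row[0] if row else None


db = open_brain()
first = fingerprint("disk", "/var is 94% full")
again = fingerprint("disk", "/var is 97% full")  # same incident class, different number
remember(db, first, "nginx access logs growing")
print(first == again)     # True
print(recall(db, again))  # nginx access logs growing
```

Masking digits before hashing is one simple way to make two occurrences of the same problem collide on one fingerprint; a real store would likely use richer evidence than a single message.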
The Central Brain
When you enroll, each host syncs a redacted brain_sync to Spyd Cloud (default every 24h,
sync.interval_hours). The central brain is a per-org aggregate (Postgres, tenant-isolated by
row-level security) that powers cross-fleet Insights and Recurring Issues in the cockpit. It is
redacted only (IPs coarsened to /24, usernames pseudonymized, command lines reduced to the program
name, secrets stripped) and is reaped after 90 days by default. Full fidelity stays on the host.
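The redaction steps named above can be sketched in a few lines of Python. This is an illustrative sketch, not Spyd's redaction code: the function names and the HMAC-based pseudonym scheme are assumptions, `coarsen_ip` handles IPv4 only, and secret stripping is left out.

```python
import hashlib
import hmac
import ipaddress
import os.path
import shlex


def coarsen_ip(ip: str) -> str:
    """Coarsen an IPv4 address to its /24 network."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def pseudonymize(username: str, key: bytes) -> str:
    """Replace a username with a stable keyed pseudonym (HMAC-SHA256, truncated)."""
    digest = hmac.new(key, username.encode(), hashlib.sha256).hexdigest()
    return f"user-{digest[:12]}"


def program_name(command_line: str) -> str:
    """Keep only the program name, dropping its path and every argument."""
    argv = shlex.split(command_line)
    return os.path.basename(argv[0]) if argv else ""


print(coarsen_ip("203.0.113.77"))  # 203.0.113.0/24
print(program_name("/usr/bin/curl -H 'Authorization: Bearer abc' https://example.com"))  # curl
```

A keyed hash keeps the same user recognizable across events without exposing the name; in this sketch the key would stay on the host.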
How they work together
A host detects an incident and judges it locally against its own brain; if it's known, it's handled
(or suppressed) on the box with no AI call. Redacted patterns roll up to the central brain, which
aggregates them across your fleet so one host's lesson helps the rest. Outcomes you record with
spyd resolve feed back into the local brain's calibration — so it gets more accurate, and
quieter, over time.
Adaptive modes
| Mode | When | Ceiling |
|---|---|---|
| lightweight | steady state (default) — core checks + Brain lookup | 50 MB / 5% CPU |
| standard | auto-elevated on an open incident; de-escalates after a calm window | 80 MB / 8% CPU |
| advanced | explicit opt-in only — deeper probes, never auto-enabled | 120 MB / 12% CPU |
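The transitions in the table can be read as a small state machine. The Python sketch below is one hypothetical way to express it; `next_mode` is an invented name, and the 900-second `calm_window` is an assumed value, since the length of the calm window is not stated here.

```python
from typing import Optional

# Ceilings from the table above: (memory in MB, CPU percent).
CEILINGS = {
    "lightweight": (50, 5),
    "standard": (80, 8),
    "advanced": (120, 12),
}


def next_mode(current: str, open_incidents: int, calm_seconds: float,
              pinned: Optional[str] = None, calm_window: float = 900.0) -> str:
    """Pick the runtime mode for the next cycle.

    calm_window (seconds) is an assumption made for this sketch.
    """
    if pinned is not None:
        return pinned          # explicit override; the only way to reach 'advanced'
    if open_incidents > 0:
        return "standard"      # auto-elevate while an incident is open
    if current == "standard" and calm_seconds < calm_window:
        return "standard"      # stay elevated until the calm window has passed
    return "lightweight"       # steady state


print(next_mode("lightweight", open_incidents=1, calm_seconds=0))    # standard
print(next_mode("standard", open_incidents=0, calm_seconds=1200))    # lightweight
```

Note that nothing in `next_mode` can return "advanced" unless a pin asks for it, which mirrors the "never auto-enabled" rule.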
The Alert-Worthiness Gate (2.1)
Before anything pages you, every candidate alert passes a single decision layer scoring impact × duration/self-recovery × novelty × actionability:
- Page: Critical, confirmed-impact, or AI-escalated.
- Digest: sustained-but-not-harmful; no 3 a.m. page.
- Suppress: known-benign or already self-recovered.
Severity follows consequence: disk, inode, and memory alerts page only near saturation (≈98%), and
CPU and load are never critical on level alone. SSH brute-force is Critical only when a login
actually succeeds. To revert to the previous behavior, set notifications.routing.worthiness_gate: false.
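The three outcomes of the gate can be sketched as a simple routing function. This Python sketch does not reproduce the scoring formula; the `Candidate` fields, the `route` function, and the precedence (suppress is checked before page) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of a candidate alert; field names are illustrative."""
    severity: str                   # "info", "warning", or "critical"
    confirmed_impact: bool = False  # a real consequence was observed
    ai_escalated: bool = False      # the AI layer asked for a page
    known_benign: bool = False      # matches an evidence-backed benign class
    self_recovered: bool = False    # the condition cleared on its own


def route(c: Candidate) -> str:
    """Return 'page', 'digest', or 'suppress' for a candidate alert."""
    if c.known_benign or c.self_recovered:
        return "suppress"
    if c.severity == "critical" or c.confirmed_impact or c.ai_escalated:
        return "page"
    return "digest"


# SSH brute-force with no successful login: sustained, but not yet harmful.
print(route(Candidate(severity="warning")))                         # digest
# The same attack once a login succeeds.
print(route(Candidate(severity="critical", confirmed_impact=True)))  # page
# A CPU spike that normalized by itself.
print(route(Candidate(severity="warning", self_recovered=True)))     # suppress
```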
Command reference
Global flag: -c, --config <path> (default ~/.config/spyd/config.yaml).
Many read commands support --json.
Daemon
| Command | Description |
|---|---|
| spyd start [-f] | Start the daemon. --foreground runs attached. |
| spyd stop · restart | Graceful stop (SIGTERM); restart verifies the new process came up. |
Status & diagnostics
| Command | Description |
|---|---|
| spyd status | Health, metrics vs thresholds, brain health, data-sharing posture. |
| spyd doctor [--test-alert] | Validate config, permissions, DB, brain, AI, consent, channels. |
| spyd logs · spyd alerts | Recent matched log events; recent alerts (--ack <id> to acknowledge). |
| spyd probes | List the read-only diagnostic commands Spyd may run. It never remediates. |
| spyd diagnose --from-fixture | Replay a recorded incident through the full decision pipeline. |
AI
| Command | Description |
|---|---|
| spyd explain "<q>" | Ask the AI about an error or log line, with server context. |
| spyd analyze | Collect metrics, detect anomalies, produce an AI diagnosis of live state. |
Local brain & learning
| Command | Description |
|---|---|
| spyd brain [vacuum] | What the brain learned; vacuum prunes old raw history (keeps classes). |
| spyd digest --since 24h | What was handled silently vs alerted, plus recent changes. |
| spyd why-quiet | The classes Spyd is suppressing and why; quiet is never a black box. |
| spyd resolve <id> --status | Tell Spyd how an incident turned out; the ground truth it learns from. |
| spyd changes [record] | The change ledger (deploys, upgrades, edits), correlated against incidents. |
Cloud & consent
| Command | Description |
|---|---|
| spyd enroll <token> | Join an org with a one-time token; mints host identity (public key only). |
| spyd consent [accept\|revoke] | AI + central-brain sync engage only after you accept the disclosure. |
Adaptive mode & lifecycle
| Command | Description |
|---|---|
| spyd mode [pin\|unpin] | Inspect or override the adaptive runtime mode. |
| spyd upgrade [--check] | Upgrade to latest or a pinned version (atomic, verified, rollback-able). |
| spyd rollback · version · uninstall | Revert to the retained prior binary; show version; remove. |
Scenarios → alerts
What an event looks like when it reaches you: the cause, the evidence, and a guided, copy-pasteable fix — and when Spyd decided not to alert.
[critical] Disk may fill in 42 minutes on api-prod-01
Confidence: 91% · AI usage: Local Brain, no external AI
Summary:
/var is 94% full and growing fast. Traced to nginx access logs
from repeated bot requests to /wp-login.php and /xmlrpc.php.
What to do:
1. sudo du -h /var/log/nginx/* | sort -h | tail -10
2. sudo logrotate -f /etc/logrotate.d/nginx
Confirm: df -h /var

[suppressed] CPU spike from unattended package update on worker-03
Reason: matched apt-get inside the unattended-upgrades window; services healthy;
CPU normalized after 7 minutes.
→ shows under "handled silently" in spyd digest; never paged you.
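A "may fill in N minutes" estimate like the one in the first alert can be produced by simple linear extrapolation. The Python sketch below shows that arithmetic only; how Spyd actually forecasts is not described here, and the function name and sample numbers are made up for illustration.

```python
from typing import Optional


def minutes_until_full(capacity_gb: float, used_then_gb: float,
                       used_now_gb: float, elapsed_minutes: float) -> Optional[float]:
    """Linearly extrapolate when a partition fills; None if usage is not growing."""
    if elapsed_minutes <= 0:
        raise ValueError("elapsed_minutes must be positive")
    growth_gb = used_now_gb - used_then_gb
    if growth_gb <= 0:
        return None
    remaining_gb = capacity_gb - used_now_gb
    return remaining_gb * elapsed_minutes / growth_gb


# A 100 GB partition that grew from 93.75 GB to 94.75 GB in the last 8 minutes.
print(minutes_until_full(100.0, 93.75, 94.75, 8.0))  # 42.0
```

Two samples are the bare minimum; a steadier estimate would fit a slope over a longer window so that one burst of writes does not dominate the forecast.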
Configuration
Config lives at ~/.config/spyd/config.yaml (mode 0600). Generate with
spyd init; validate with spyd config validate.
    # AI: on by default via the hosted proxy (no key on the host after consent).
    ai:
      provider: spyd   # spyd (cloud proxy) | anthropic | openai | ollama

    monitoring:
      interval: 60
      thresholds:
        cpu_percent: 80
        memory_percent: 85
        disk_percent: 90
        # critical pages only near saturation; cpu/load never critical on level alone
        disk_critical_percent: 98
        inode_critical_percent: 98
        memory_critical_percent: 98

    # data-sharing: local_only (default until consent) | redacted | full_diagnostic
    sync:
      mode: redacted
      endpoint: https://api.spyd.sh

    notifications:
      min_severity: warning
      routing:
        worthiness_gate: true   # consequence-based Page/Digest/Suppress (2.1)

    # auto-upgrade: an org policy set in the cockpit overrides these locally
    upgrade:
      auto_upgrade: true
      window: "02:00-04:00"
      base_url: https://spyd.sh/releases   # internal mirror for air-gapped fleets
Alert channels
Spyd delivers full-fidelity alerts to the channels you configure. Test a channel with
spyd test-alert -C <channel>. Set notifications.min_severity to the lowest severity that should be delivered.
Spyd Cloud & Cockpit
Spyd Cloud (api.spyd.sh) plus the cockpit (app.spyd.sh) turn a fleet into
one screen.
- Fleet: every host with real liveness (60s heartbeat; graceful stop flips offline instantly).
- Incidents: a live SSE feed with the same evidence + guided fix; assign owners and roles.
- Recurring Issues (2.1): learned patterns across the fleet, populated automatically.
- Auto-upgrade policy (2.1): toggle org-wide; overrides each host's local config.
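Heartbeat-based liveness, as described for the Fleet view, usually comes down to comparing the last heartbeat against a tolerance. The Python sketch below shows the idea; the tolerance of three missed beats and the function name are assumptions, and only the 60-second interval comes from the list above.

```python
HEARTBEAT_SECONDS = 60  # interval stated in the Fleet item above
MISSED_BEATS = 3        # assumption: tolerate a few missed heartbeats before going offline


def is_online(last_heartbeat: float, now: float, stopped_gracefully: bool = False) -> bool:
    """Decide liveness from the age of the last heartbeat (timestamps in seconds)."""
    if stopped_gracefully:
        return False  # a graceful stop flips the host offline immediately
    return (now - last_heartbeat) <= HEARTBEAT_SECONDS * MISSED_BEATS


print(is_online(last_heartbeat=100.0, now=250.0))  # True
print(is_online(last_heartbeat=100.0, now=300.0))  # False
```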
Privacy & redaction
Private by default. Nothing leaves the host until you accept a one-time disclosure, and even then your own data is redacted before it is sent to the fleet — IPs coarsened, usernames pseudonymized, command lines reduced to the program name, secrets stripped. External threat indicators are kept as security evidence so alerts stay actionable.
Full detail — data tiers, retention, sub-processors, AI usage, and terms — is on the Privacy & Terms page.