System Health Triage

Use system-health-check when the machine feels slow, hot, stuck on the network, or unstable, and you want one report to inspect before guessing at a cause.

The command samples system signals, scans recent logs for known failure words, prints a friendly verdict, and saves the full output to a timestamped report.

For the reference summary of utility commands, see System Utilities.

system-health-check

Reports are saved under:

${XDG_STATE_HOME:-~/.local/state}/system-health-check/
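To pick up the newest report from that directory, a small helper along these lines may be handy. This is a minimal sketch, not part of system-health-check: the function name latest_health_report is hypothetical, and it assumes each run writes one regular file directly into the report directory.

```shell
# Hypothetical helper: print the path of the most recently modified report.
# Assumption: each run writes one file directly into the report directory.
latest_health_report() {
  # An optional first argument overrides the default report directory.
  lhr_dir="${1:-${XDG_STATE_HOME:-$HOME/.local/state}/system-health-check}"
  [ -d "$lhr_dir" ] || return 1
  # ls -t sorts by modification time, newest first; keep the first entry.
  lhr_newest="$(ls -t "$lhr_dir" | head -n 1)"
  [ -n "$lhr_newest" ] || return 1
  printf '%s/%s\n' "$lhr_dir" "$lhr_newest"
}
```

Calling latest_health_report with no arguments prints the newest file under the default location, or returns a non-zero status if the directory is missing or empty.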

The default run collects five snapshots, fifteen seconds apart, and scans the last fifteen minutes of user and kernel journal logs.
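As a rough guide to how long a run spends sampling, the arithmetic can be sketched as below. The helper name sampling_window_seconds is hypothetical, and the formula assumes the script does not wait again after the last snapshot, which the source does not state.

```shell
# Hypothetical helper: estimate the time between the first and last snapshot.
# Assumption: there is no additional wait after the final snapshot.
sampling_window_seconds() {
  sws_samples="$1"   # number of snapshots
  sws_interval="$2"  # seconds between snapshots
  echo $(( (sws_samples - 1) * sws_interval ))
}
```

Under that assumption, the defaults (five snapshots, fifteen seconds apart) span 60 seconds between the first and last snapshot.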

Run only the sections you need:

system-health-check --only cpu,memory,psi --samples 3 --interval 5
system-health-check --only thermal --samples 4 --interval 10
system-health-check --only logs --known-logs-minutes 60

Valid sections are:

| Section | What it checks |
| --- | --- |
| cpu | Load average and CPU busy percentage. |
| memory | Available memory and swap growth. |
| psi | Linux pressure stall information for CPU, memory, and IO. |
| network | Primary interface traffic and TCP socket growth. |
| disk | Root filesystem usage. |
| thermal | Thermal zone readings and temperature rise. |
| logs | Recent user and kernel journal lines matching known failure patterns. |
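If you wrap system-health-check in your own scripts, it can help to check an --only list against the section names above before running. The following is a minimal sketch; validate_only_list is a hypothetical helper, and how the tool itself handles unknown section names is not documented here.

```shell
# Hypothetical helper: check a comma-separated --only list against the
# section names listed in the table above.
validate_only_list() {
  [ -n "$1" ] || { echo "empty section list" >&2; return 1; }
  vol_old_ifs="$IFS"
  IFS=','
  set -- $1          # split the list on commas into positional parameters
  IFS="$vol_old_ifs"
  for vol_name in "$@"; do
    case "$vol_name" in
      cpu|memory|psi|network|disk|thermal|logs) ;;
      *) echo "unknown section: $vol_name" >&2; return 1 ;;
    esac
  done
}
```

For example, validate_only_list "cpu,memory,psi" succeeds, while validate_only_list "cpu,gpu" reports the unknown name on standard error and returns a non-zero status.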

When you want immediate AI follow-up, run:

system-health-check --open-opencode

After saving the report, the script opens an interactive OpenCode session with a prompt that points to the report path.

Use this when you want diagnosis and next steps in the same flow. Without an interactive TTY, the script prints a warning instead of trying to open OpenCode.
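The TTY fallback described above can be illustrated with a common shell pattern. This is only a sketch of the general technique: the function name is hypothetical, the echo stands in for launching the session, and the real script may detect an interactive terminal differently.

```shell
# Hypothetical sketch of a TTY guard; not the script's actual code.
open_followup_if_interactive() {
  ofi_report="$1"
  # [ -t N ] is true when file descriptor N is attached to a terminal.
  if [ -t 0 ] && [ -t 1 ]; then
    # Placeholder for starting the interactive session.
    echo "would open an interactive session for: $ofi_report"
  else
    echo "warning: no interactive TTY; report saved at $ofi_report" >&2
  fi
}
```

When the function runs inside a pipeline or command substitution, standard output is not a terminal, so it takes the warning branch.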

  1. Run system-health-check while the problem is happening or soon after it happens.
  2. Read the Friendly Summary section first.
  3. Treat Watch as a signal to monitor and Stressed as a signal to investigate immediately.
  4. Use the report path from the final line for follow-up analysis.
  5. If the problem is narrow, rerun with --only for that subsystem.

Healthy means the sampled window looked stable.

Watch means one or more signals crossed a moderate threshold, such as elevated CPU pressure, noticeable memory drop, high disk usage, thermal rise, or noisy logs.

Stressed means at least one stronger threshold was crossed, such as very high CPU busy, sharp swap growth, high PSI, critical disk usage, high thermal readings, or many known log matches.

These are heuristics, not proof of root cause. Use them to decide where to inspect next.
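The three verdicts follow a simple precedence, which can be sketched as below. This illustrates the rule as described in the text, not the script's actual code: health_verdict is a hypothetical name, and the two counts stand in for however the script tracks moderate and stronger threshold crossings.

```shell
# Hypothetical sketch of the verdict precedence described above.
health_verdict() {
  hv_moderate="$1"  # count of moderate thresholds crossed
  hv_strong="$2"    # count of stronger thresholds crossed
  if [ "$hv_strong" -ge 1 ]; then
    echo "Stressed"   # any stronger threshold wins
  elif [ "$hv_moderate" -ge 1 ]; then
    echo "Watch"      # only moderate thresholds were crossed
  else
    echo "Healthy"    # nothing crossed in the sampled window
  fi
}
```

A run with two moderate crossings and no stronger ones maps to Watch; a single stronger crossing maps to Stressed regardless of the moderate count.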

The repo also ships a user timer for a delayed dot doctor check at login:

~/.config/systemd/user/dot-doctor-startup.timer
~/.config/systemd/user/dot-doctor-startup.service

The timer runs dot-doctor-notify sixty seconds after startup. Use it as a passive startup check; run dot doctor directly when you need an immediate health result.

Enable it if you want a startup warning for broken dotfiles state:

systemctl --user enable --now dot-doctor-startup.timer

For broader machine updates, use the topgrade wrapper documented in System Utilities. It logs the full session to:

${XDG_STATE_HOME:-~/.local/state}/topgrade.log

Keep health triage and update runs separate: collect a health report first when debugging instability, then update once you know whether the machine is already under stress.