Goal
Find root causes fast when CI/CD fails.
Debug checklist
- Which step failed?
- Is it deterministic or flaky?
- Did dependencies change?
- Is the environment different from local?
- Does it fail only on main?
Tactics
- rerun with debug logs
- print versions:
node -v
npm -v
uname -a
- upload artifacts (build logs, test reports)
- isolate by running only the failing step
Common failures
- missing env var in CI
- different Node version
- network timeouts during install
- tests depending on order/time
Next Step
Package with Docker so deploy uses the exact same artifact everywhere.