A curated list of Chaos Engineering resources.
updated at Dec. 1, 2024, 1:10 p.m.
A list of continuous integration services and tools
updated at Dec. 1, 2024, 12:06 p.m.
A curated list of Site Reliability and Production Engineering Tools
updated at Dec. 1, 2024, 1:17 a.m.
A collection of postmortems. Sorry for the delay in merging PRs!
updated at Nov. 30, 2024, 4:28 p.m.
📙 Amazon Web Services — a practical guide
updated at Nov. 30, 2024, 4:02 a.m.
Run Book / Operations Manual template for modern software systems
updated at Nov. 29, 2024, 7:13 a.m.
A collection of postmortem templates
updated at Nov. 27, 2024, 6:05 p.m.
Compilation of public failure/horror stories related to Kubernetes
updated at Nov. 19, 2024, 6:08 a.m.
Tips and tricks for getting through on-call
updated at Oct. 16, 2024, 8:06 p.m.
A lifecycle model for describing incident management
updated at Sept. 30, 2024, 7:35 p.m.