A curated list of Site Reliability and Production Engineering Tools
updated at Nov. 17, 2024, 5:36 a.m.
📙 Amazon Web Services — a practical guide
updated at Nov. 17, 2024, 3:10 a.m.
The list of continuous integration services and tools
updated at Nov. 16, 2024, 11:59 p.m.
A collection of postmortem templates
updated at Nov. 16, 2024, 5:03 a.m.
A curated list of Chaos Engineering resources
updated at Nov. 15, 2024, 12:46 p.m.
A collection of postmortems. Sorry for the delay in merging PRs!
updated at Nov. 14, 2024, 9:02 p.m.
Run Book / Operations Manual template for modern software systems
updated at Nov. 8, 2024, 7:41 a.m.
Compilation of public failure/horror stories related to Kubernetes
updated at Nov. 5, 2024, 3:18 p.m.
Tips and tricks for getting through on-call
updated at Oct. 16, 2024, 8:06 p.m.
A lifecycle model for describing incident management
updated at Sept. 30, 2024, 7:35 p.m.