og-aws by open-guides

📙 Amazon Web Services — a practical guide

created at July 13, 2016, 5:30 p.m.

Shell

1,217 +1

29,548 +20

3,075 +4

post-mortems by danluu

A collection of postmortems. Sorry for the delay in merging PRs!

created at Aug. 3, 2015, 12:20 a.m.

Unknown languages

510 +0

8,183 +15

316 +0

kubernetes-failure-stories by hjacobs

Compilation of public failure/horror stories related to Kubernetes

created at Jan. 19, 2019, 10:42 a.m.

HTML

494 +0

6,215 -1

291 +0

awesome-chaos-engineering by dastergon

A curated list of Chaos Engineering resources.

created at July 26, 2017, 5:05 p.m.

Unknown languages

297 +0

4,358 +18

474 +4

awesome-ci by ligurio

List of Continuous Integration services

created at Oct. 28, 2014, 7:07 a.m.

Unknown languages

114 +0

2,571 +8

213 +0

postmortem-templates by dastergon

A collection of postmortem templates

created at May 29, 2017, 10:41 a.m.

Unknown languages

29 +0

721 +4

256 +0

run-book-template by SkeltonThatcher

Run Book / Operations Manual template for modern software systems

created at July 9, 2016, 7:23 p.m.

Unknown languages

37 +0

547 +2

301 +0

oncall-handbook by alicegoldfuss

Tips and tricks for getting through on-call

created at Nov. 7, 2016, 3:46 a.m.

Unknown languages

13 +0

348 +0

43 +0

awesome-sre-tools by SquadcastHub

A curated list of Site Reliability and Production Engineering Tools

created at March 9, 2020, 7:36 a.m.

Unknown languages

13 +1

303 +34

50 +2

incident-lifecycle-model by preed

A lifecycle model for describing incident management

created at Nov. 4, 2016, 4:40 p.m.

Unknown languages

2 +0

22 +0

6 +0

sre-playground by fhivemind

🎯 A set of Site Reliability Engineering notes & challenges

created at Feb. 2, 2020, 12:24 p.m.

Python

1 +0

21 +0

3 +0
