sre-playground by fhivemind

A set of Site Reliability Engineering notes & challenges

updated at Jan. 12, 2024, 8:33 a.m.

Python

2 +0

33 +0

5 +0

incident-lifecycle-model by preed

A lifecycle model for describing incident management

updated at March 10, 2024, 12:03 p.m.

Unknown languages

2 +0

31 +0

7 +0

run-book-template by SkeltonThatcher

Run Book / Operations Manual template for modern software systems

updated at March 19, 2024, 3:31 p.m.

Unknown languages

38 +0

697 +0

345 +0

oncall-handbook by alicegoldfuss

Tips and tricks for getting through on-call

updated at March 23, 2024, 5:56 a.m.

Unknown languages

12 +0

396 +0

43 +0

SRE-cheat-sheet by shibumi

A vocabulary collection for SREs

updated at April 18, 2024, 8:42 p.m.

Unknown languages

11 +0

182 +0

29 +0

postmortem-templates by dastergon

A collection of postmortem templates

updated at April 26, 2024, 4:46 p.m.

Unknown languages

35 +1

1,228 +5

412 +3

kubernetes-failure-stories by hjacobs

Compilation of public failure/horror stories related to Kubernetes

updated at April 27, 2024, 9:16 a.m.

HTML

472 +0

6,235 +0

309 +0

awesome-sre-tools by SquadcastHub

A curated list of Site Reliability and Production Engineering Tools

updated at April 27, 2024, 2:11 p.m.

Unknown languages

37 -1

1,108 +6

154 +0

og-aws by open-guides

📙 Amazon Web Services — a practical guide

updated at April 27, 2024, 5:03 p.m.

Shell

1,215 +0

35,385 +20

3,809 -2

post-mortems by danluu

A collection of postmortems. Sorry for the delay in merging PRs!

updated at April 27, 2024, 8:13 p.m.

Unknown languages

557 +0

11,092 +11

431 +0

awesome-chaos-engineering by dastergon

A curated list of Chaos Engineering resources.

updated at April 28, 2024, 7:29 a.m.

Unknown languages

311 +0

5,793 +13

637 +2

awesome-ci by ligurio

List of Continuous Integration services

updated at April 28, 2024, 8:50 a.m.

Unknown languages

130 +0

3,493 +7

256 +0
