og-aws by open-guides

📙 Amazon Web Services — a practical guide

updated at April 21, 2024, 5:48 a.m.

Shell

1,215 -1

35,365 +15

3,811 +7

awesome-sre-tools by SquadcastHub

A curated list of Site Reliability and Production Engineering Tools

updated at April 20, 2024, 2:18 p.m.

Unknown languages

38 +1

1,102 +12

154 +0

awesome-chaos-engineering by dastergon

A curated list of Chaos Engineering resources.

updated at April 20, 2024, 8:24 a.m.

Unknown languages

311 +0

5,780 +5

635 +0

awesome-ci by ligurio

List of Continuous Integration services

updated at April 19, 2024, 10:35 p.m.

Unknown languages

130 +0

3,486 +5

256 +0

post-mortems by danluu

A collection of postmortems. Sorry for the delay in merging PRs!

updated at April 19, 2024, 3:30 p.m.

Unknown languages

557 +0

11,081 +16

431 +0

postmortem-templates by dastergon

A collection of postmortem templates

updated at April 19, 2024, 12:42 a.m.

Unknown languages

34 +0

1,223 +4

409 +0

SRE-cheat-sheet by shibumi

A vocabulary collection for SREs

updated at April 18, 2024, 8:42 p.m.

Unknown languages

11 +0

182 +1

29 +0

kubernetes-failure-stories by hjacobs

Compilation of public failure/horror stories related to Kubernetes

updated at April 16, 2024, 3:47 p.m.

HTML

472 +0

6,235 +1

309 +0

oncall-handbook by alicegoldfuss

Tips and tricks for getting through on-call

updated at March 23, 2024, 5:56 a.m.

Unknown languages

12 +0

396 +0

43 +0

run-book-template by SkeltonThatcher

Run Book / Operations Manual template for modern software systems

updated at March 19, 2024, 3:31 p.m.

Unknown languages

38 +0

697 +0

345 +0

incident-lifecycle-model by preed

A lifecycle model for describing incident management

updated at March 10, 2024, 12:03 p.m.

Unknown languages

2 +0

31 +0

7 +0

sre-playground by fhivemind

A set of Site Reliability Engineering notes & challenges

updated at Jan. 12, 2024, 8:33 a.m.

Python

2 +0

33 +0

5 +0
