Google - Site Reliability Engineering

by Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff

Published 2016 · 529 pages
Software Development · Engineering Management · Delivery and Execution
View on sre.google →

Why It Matters for Leaders

This book addresses the core challenges engineering leaders face in keeping services reliable and in building continuous improvement into their teams. An actionable takeaway: adopt reliability engineering practices, such as service level objectives and structured incident response, to improve both system performance and team collaboration.

Who Should Read This

Infrastructure and platform engineering leaders. Teams adopting SRE practices. Anyone responsible for system reliability at scale.

What's Inside

1. Core Principles of SRE: Emphasizes the importance of reliability in modern applications.

2. Service Level Objectives (SLOs): Defines measurable reliability goals to guide engineering efforts.

3. Monitoring and Incident Response: Offers strategies for effective monitoring and rapid incident response to maintain service health.

4. Capacity Planning and Management: Discusses techniques for forecasting demand and scaling systems accordingly.

5. Change Management: Outlines practices for safe deployment and managing changes to systems.

Tags

Reliability & SRE · Continuous Improvement · Books