
Google - Site Reliability Engineering
by Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff
Why It Matters for Leaders
This resource helps engineering leaders address key challenges in maintaining site reliability and driving continuous improvement within their teams. An actionable takeaway: adopt reliability engineering practices to improve system performance and team collaboration.
Who Should Read This
Infrastructure and platform engineering leaders. Teams adopting SRE practices. Anyone responsible for system reliability at scale.
What's Inside
1. Core Principles of SRE: Emphasizes the importance of reliability in modern applications.
2. Service Level Objectives (SLOs): Defines measurable reliability goals to guide engineering efforts.
3. Monitoring and Incident Response: Offers strategies for effective monitoring and rapid incident response to maintain service health.
4. Capacity Planning and Management: Discusses techniques for forecasting demand and scaling systems accordingly.
5. Change Management: Outlines practices for safe deployment and managing changes to systems.
Tags
📰 Featured in the Newsletter
More Related Content
The future of software engineering is SRE | Swizec Teller
Accelerate: The Science of Lean Software and DevOps
The Lean Startup: How Today's Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses by Eric Ries | Goodreads
The Phoenix Project
Leaders Eat Last by Simon Sinek | Goodreads
Rewarding Talent | Index Ventures