8 Site Reliability Engineering KPIs

Change Success Rate %

Measures the percentage of changes applied to the system that succeed without causing incidents or degradations, indicating how effective change management is.
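As a rough illustration, the percentage could be computed as in the sketch below. The function name `change_success_rate` and the rule that any change linked to an incident or degradation counts as failed are assumptions made for this example, not a standard definition.

```python
def change_success_rate(total_changes: int, failed_changes: int) -> float:
    """Return the percentage of changes that caused no incident or degradation.

    Hypothetical helper: 'failed_changes' is assumed to count every change
    that was linked to an incident or a degradation.
    """
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    if not 0 <= failed_changes <= total_changes:
        raise ValueError("failed_changes must be between 0 and total_changes")
    return 100.0 * (total_changes - failed_changes) / total_changes


# Example: 200 changes in the period, 6 of them caused an incident.
print(change_success_rate(200, 6))  # 97.0
```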

Error Budget Burn Rate ratio

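Measures the rate at which the error budget (the amount of unreliability the SLO permits) is consumed, expressed relative to the rate that would use up the budget exactly at the end of the SLO window. A ratio above 1 means the budget will run out early.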
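One common way to express this ratio is the observed error rate divided by the error rate the SLO allows. The sketch below assumes a request-based SLO; the function name `burn_rate` and its parameters are hypothetical.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Return the observed error rate divided by the allowed error rate.

    slo_target is a fraction, e.g. 0.999 for a 99.9% SLO (assumed format).
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_error_rate = 1.0 - slo_target       # the error budget as a fraction
    observed_error_rate = bad_events / total_events
    return observed_error_rate / allowed_error_rate


# Example: 20 failed requests out of 10,000 against a 99.9% SLO.
# The observed error rate (0.2%) is about twice the allowed rate (0.1%).
print(round(burn_rate(20, 10_000, 0.999), 2))
```

A value near 1 means the budget would last about as long as the SLO window; the example above burns it roughly twice as fast.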

Incident Reoccurrence Rate %

Measures the percentage of incidents that repeat an earlier incident, highlighting the effectiveness of measures taken to prevent similar future incidents.
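A minimal sketch of one way to compute this, assuming each incident has been tagged with a root-cause label and that an incident counts as a repeat when its label has already appeared earlier in the period. The function name and the labels are hypothetical.

```python
from typing import Sequence


def incident_reoccurrence_rate(root_causes: Sequence[str]) -> float:
    """Return the percentage of incidents whose root cause was seen before.

    root_causes is assumed to be ordered by incident time, one label each.
    """
    if not root_causes:
        raise ValueError("at least one incident is required")
    seen = set()
    repeats = 0
    for cause in root_causes:
        if cause in seen:
            repeats += 1      # same root cause as an earlier incident
        else:
            seen.add(cause)   # first occurrence of this root cause
    return 100.0 * repeats / len(root_causes)


# Example: five incidents, two of which repeat the earlier "dns" cause.
print(incident_reoccurrence_rate(["dns", "disk", "dns", "deploy", "dns"]))  # 40.0
```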

Infrastructure Cost Efficiency ratio

Assesses how cost-effectively the infrastructure is utilized, balancing performance and reliability against cost.

Service Level Indicators %

Service Level Indicators (SLIs) are specific, quantifiable measures of service reliability, such as uptime, error rates, or response times.
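For example, an availability SLI is often expressed as the share of valid requests that were served successfully. The sketch below assumes that form; the function name `availability_sli` and its parameters are illustrative only.

```python
def availability_sli(good_events: int, valid_events: int) -> float:
    """Return an availability SLI as a percentage of good over valid events."""
    if valid_events <= 0:
        raise ValueError("valid_events must be positive")
    if not 0 <= good_events <= valid_events:
        raise ValueError("good_events must be between 0 and valid_events")
    return 100.0 * good_events / valid_events


# Example: 99,950 successful responses out of 100,000 valid requests,
# which works out to an SLI of about 99.95%.
print(f"{availability_sli(99_950, 100_000):.2f}%")
```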

Service Level Objectives %

Service Level Objectives (SLOs) are targets for Service Level Indicators (SLIs), representing the desired level of service reliability.
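Checking an SLO then comes down to comparing a measured SLI against its target. The sketch below assumes a "higher is better" SLI such as availability; `meets_slo` is a hypothetical helper, and latency-style SLIs would need the comparison reversed.

```python
def meets_slo(sli_percent: float, slo_target_percent: float) -> bool:
    """Return True when the measured SLI is at or above the SLO target.

    Assumes a 'higher is better' SLI expressed as a percentage.
    """
    if not 0.0 <= slo_target_percent <= 100.0:
        raise ValueError("slo_target_percent must be between 0 and 100")
    return sli_percent >= slo_target_percent


# Example: a 99.9% availability target checked against two measurements.
print(meets_slo(99.95, 99.9))  # True
print(meets_slo(99.85, 99.9))  # False
```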

Toil Reduction time

Tracks the reduction over time in toil, the repetitive, manual work involved in system maintenance.
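One simple way to track this is to compare the hours spent on toil in a baseline period with the hours spent in the current period. The sketch below assumes both figures come from time tracking over periods of equal length; the function name `toil_reduction` is hypothetical.

```python
from typing import Tuple


def toil_reduction(baseline_hours: float, current_hours: float) -> Tuple[float, float]:
    """Return (hours saved, percentage reduction) versus the baseline period.

    Negative values mean toil has grown instead of shrinking.
    """
    if baseline_hours <= 0:
        raise ValueError("baseline_hours must be positive")
    if current_hours < 0:
        raise ValueError("current_hours must not be negative")
    saved = baseline_hours - current_hours
    return saved, 100.0 * saved / baseline_hours


# Example: toil dropped from 40 hours to 28 hours per sprint.
print(toil_reduction(40.0, 28.0))  # (12.0, 30.0)
```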