AI Compliance Score
Assesses the AI model's adherence to ethical guidelines, regulatory standards, and best practices in AI development.
Data Completeness evaluates the extent to which necessary data is available for model training.
Measures the diversity in the training dataset, ensuring that the model is exposed to a wide range of scenarios.
This KPI tracks the time taken for data to move through the entire pipeline, from collection and processing to being ready for use in model training.
Measures the amount of data processed per unit of time in the data pipeline, indicating the pipeline's efficiency and capacity.
Feature Importance Score evaluates the impact of different input features on the model’s predictions.
Label Accuracy quantifies the correctness of the labels in the training dataset.
This metric assesses the overall accuracy of an AI model: the percentage of all predictions, both positive and negative, that are correct.
The F1 Score is the harmonic mean of Precision and Recall, providing a balance between them.
The frequency at which the AI model fails to provide a valid output or encounters errors during operation.
This index assesses how understandable the model’s decisions or predictions are to humans.
Model Precision measures the proportion of an AI model's positive predictions that are actually correct.
Model Recall, or Sensitivity, calculates the proportion of actual positives correctly identified.
Model Robustness Score measures an AI model's ability to maintain performance when exposed to new, unseen data or adversarial conditions.
Evaluates how well an AI model maintains its performance as the amount of data increases.
Measures how often an AI model is updated or retrained.
The duration taken to train an AI/ML model.
The time taken from when a model is fully trained until it is deployed in a production environment.
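The classification metrics described above (accuracy, precision, recall, and F1) all derive from the same four counts: true positives, true negatives, false positives, and false negatives. The following is a minimal sketch, assuming binary labels where 1 marks the positive class; the function name and the sample labels are illustrative and do not come from any particular tool.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative data: 2 TP, 2 TN, 1 FP, 1 FN.
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```

In this sample, all four metrics come out to two thirds, because the errors are split evenly between false positives and false negatives.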
Indicates the number of bugs per fixed number of lines of code (for example, per 1,000 lines), providing insight into the overall quality of the code.
Calculates the frequency of build failures in the Continuous Integration (CI) process.
Measures the complexity of the code, which can impact maintainability and readability.
Represents the percentage of code that is exercised by automated tests; higher coverage means more of the codebase is checked for defects.
Quantifies the amount of duplicated code in a codebase.
'Code smells' are patterns that may not be outright bugs but indicate deeper design problems that can increase the risk of bugs or failures in the future.
The percentage of the CPU's capacity that the application uses during execution, impacting the application's performance and server load.
Measures the percentage of defects that escape into production, signifying the effectiveness of pre-release testing.
Measures the extent to which the codebase is documented.
Flaky tests are those that produce inconsistent results across runs of the same code.
Amount of memory used by the application during execution.
Counts the number of revisions a pull request goes through before merging, which can indicate the clarity of requirements and effectiveness of initial submissions.
Refers to the size of pull requests in terms of lines of code, where smaller pull requests are generally easier to review and less likely to introduce errors.
The average time taken for the system to respond to a request in a production environment.
Measures the percentage of tests that pass during the development process.
Tracks the amount of time spent addressing technical debt, which includes refactoring code, improving design, or updating documentation. This work is crucial for long-term project health.
The average duration from when a pull request is opened until it is merged.
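Two of the code-quality ratios described above, bugs per lines of code and the percentage of defects that escape into production, reduce to simple arithmetic. The sketch below assumes a normalization of 1,000 lines for defect density, a common convention rather than something the descriptions specify; the function names and sample figures are hypothetical.

```python
def defect_density(bug_count, lines_of_code, per_lines=1000):
    """Bugs per `per_lines` lines of code (per 1,000 lines by default)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return bug_count / lines_of_code * per_lines


def defect_escape_rate(production_defects, total_defects):
    """Percentage of all known defects that were first found in production."""
    if total_defects == 0:
        return 0.0
    return production_defects / total_defects * 100


# Illustrative figures: 30 bugs in 20,000 lines; 5 of 40 defects escaped.
density = defect_density(30, 20_000)
escape_rate = defect_escape_rate(5, 40)
print(f"{density:.2f} bugs per 1,000 lines, {escape_rate:.1f}% escaped")
```

Whatever normalization is chosen, it should stay fixed across reporting periods so that the trend remains comparable.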
Measures how effectively the development team utilizes their available capacity for deployments and handling changes.
Calculates the proportion of deployments that result in failure in production, necessitating immediate remedies like hotfixes or rollbacks.
Assesses the percentage of changes or deployments that are successfully implemented without causing failures or outages.
Measures the rate of software deployments over a specified period.
Tracks the total duration from the inception of an idea to its deployment in production.
Reflects the average time required to recover from a failure in the production environment.
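The deployment metrics above (deployment frequency, the proportion of deployments that fail in production, and average recovery time) can be computed from a simple deployment log. This is a sketch under stated assumptions: the `Deployment` record and its fields are hypothetical, and recovery time is measured from the moment a failure starts to the moment service is restored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    deployed_at: datetime
    failed_at: Optional[datetime] = None    # set only if the deployment failed
    restored_at: Optional[datetime] = None  # set once service was restored


def deployment_frequency(deployments, period_days):
    """Average number of deployments per day over the period."""
    return len(deployments) / period_days


def change_failure_rate(deployments):
    """Percentage of deployments that failed in production."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.failed_at is not None)
    return failures / len(deployments) * 100


def mean_time_to_recovery(deployments):
    """Average time from failure to restoration, or None if nothing recovered."""
    durations = [d.restored_at - d.failed_at for d in deployments
                 if d.failed_at is not None and d.restored_at is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Illustrative week: four deployments, one failure that was fixed in 90 minutes.
base = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(base),
    Deployment(base + timedelta(days=1),
               failed_at=base + timedelta(days=1, minutes=10),
               restored_at=base + timedelta(days=1, minutes=100)),
    Deployment(base + timedelta(days=3)),
    Deployment(base + timedelta(days=5)),
]
frequency = deployment_frequency(deployments, period_days=7)
failure_rate = change_failure_rate(deployments)
mttr = mean_time_to_recovery(deployments)
print(frequency, failure_rate, mttr)
```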
Measures the percentage of the CPU's capacity utilized by the application during execution, impacting performance and server load.
Assesses the extent to which the codebase is documented.
Indicates the amount of memory used by the application during execution.
Counts the number of revisions a pull request goes through before being merged, indicating the clarity of requirements and the effectiveness of initial submissions.
The time taken for the system to respond to a request in a production environment.
Measures the amount of work a team completes in a sprint or iteration, typically in story points or number of features.
Tracks the time dedicated to addressing technical debt, including code refactoring and design improvement, which is essential for long-term project health.
Reflects the average duration from when a pull request is opened until it is merged.
Calculates the total cost associated with incidents, including lost revenue, remediation efforts, and any compensation to customers.
Evaluates how incidents affect customers, considering factors like downtime, data loss, or reduced functionality.
The frequency at which incidents are escalated to higher-level teams or management, indicating the complexity of incidents and potential gaps in initial response capabilities.
The total number of incidents recorded in a given period.
Assesses the average time taken for a team to acknowledge an incident after detection.
Measures the average time taken to detect an incident after it has occurred, indicating the effectiveness of monitoring and alerting systems.
Evaluates the promptness of conducting a thorough investigation (post-mortem) after an incident to determine its root cause.
Tracks the percentage of action items identified in post-mortem analyses that are successfully completed, reflecting the team’s commitment to improving based on past incidents.
Categorizes incidents based on their severity levels, such as critical, high, medium, and low.
Measures how quickly teams analyze and derive learnings from incidents, which is crucial for improving systems and processes to prevent future occurrences.
Measures the percentage of changes applied to the system that are successful without causing incidents or degradations, indicating the effectiveness of change management.
Gauges the satisfaction level of employees with on-call responsibilities, reflecting the workload, stress level, and overall work-life balance.
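The detection and acknowledgement times described above are averages over per-incident timestamps. The following sketch assumes each incident record carries three timestamps; the `Incident` type and its field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    occurred_at: datetime      # when the problem actually began
    detected_at: datetime      # when monitoring or a person noticed it
    acknowledged_at: datetime  # when a responder took ownership


def mean_time_to_detect(incidents):
    """Average delay between occurrence and detection, or None if no incidents."""
    if not incidents:
        return None
    total = sum((i.detected_at - i.occurred_at for i in incidents), timedelta())
    return total / len(incidents)


def mean_time_to_acknowledge(incidents):
    """Average delay between detection and acknowledgement, or None if no incidents."""
    if not incidents:
        return None
    total = sum((i.acknowledged_at - i.detected_at for i in incidents), timedelta())
    return total / len(incidents)


# Illustrative data: detection took 10 and 20 minutes, acknowledgement 5 and 3.
start = datetime(2024, 3, 1, 12, 0)
incidents = [
    Incident(start, start + timedelta(minutes=10), start + timedelta(minutes=15)),
    Incident(start, start + timedelta(minutes=20), start + timedelta(minutes=23)),
]
mttd = mean_time_to_detect(incidents)
mtta = mean_time_to_acknowledge(incidents)
print(mttd, mtta)
```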
Measures the rate at which the error budget (the acceptable threshold of unreliability) is consumed.
Calculates the frequency of repeated incidents, highlighting the effectiveness of measures taken to prevent similar future incidents.
Assesses how cost-effectively the infrastructure is utilized, balancing performance and reliability against cost.
Service Level Indicators (SLIs) are specific, quantifiable measures of service reliability, such as uptime, error rates, or response times.
Service Level Objectives (SLOs) are targets for Service Level Indicators (SLIs), representing the desired level of service reliability.
Tracks the reduction over time in toil, the repetitive, manual work involved in system maintenance.
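The relationship between an SLI, an SLO, and the error budget burn rate described above can be made concrete with a small calculation. This sketch assumes a request-based availability SLI and uses one common convention for burn rate, the observed error rate divided by the error rate the SLO allows; the function names and figures are illustrative.

```python
def availability_sli(successful_requests, total_requests):
    """Fraction of requests that succeeded (a request-based availability SLI)."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing failed
    return successful_requests / total_requests


def error_budget_burn_rate(slo_target, successful_requests, total_requests):
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the budget would be used up exactly at the end of
    the SLO window; values above 1.0 mean it is being consumed faster.
    """
    allowed_error_rate = 1.0 - slo_target
    if allowed_error_rate <= 0:
        raise ValueError("slo_target must be below 1.0 to leave an error budget")
    observed_error_rate = 1.0 - availability_sli(successful_requests, total_requests)
    return observed_error_rate / allowed_error_rate


# Illustrative figures: a 99.9% SLO with 500 failures in 100,000 requests.
sli = availability_sli(99_500, 100_000)
burn_rate = error_budget_burn_rate(0.999, 99_500, 100_000)
print(f"SLI = {sli:.4f}, burn rate = {burn_rate:.1f}x")
```

In this example the service fails 0.5% of requests against an allowance of 0.1%, so the budget is being spent at roughly five times the sustainable pace.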