Job Failure Filtering by Type with Automated Credit Refunds
Jacob Adams
Summary
Implement a system that categorizes all job failures in CircleCI by type, provides robust filtering capabilities, and automates credit refunds for platform-related failures.
Problem Statement
When jobs fail in CircleCI, customers:
Cannot easily filter and categorize failures by their underlying cause or type
Struggle to distinguish between different failure categories (infrastructure, CircleCI errors, configuration, code, dependencies, etc.)
Lack the ability to analyze patterns in specific types of failures across their pipelines
Waste time manually investigating and categorizing failures that could be automatically classified
Have no streamlined way to identify and receive credit refunds for platform-related failures
Use Cases:
Engineers need to filter out CircleCI-related failures to focus on actual code issues
Team leads want to analyze trends in specific failure types over time
Organizations need to identify recurring failure patterns across multiple projects
Finance teams need to track and recover credits spent on platform-related failures
Administrators need to validate and request refunds for credits spent on CircleCI errors/issues
Proposed Solution:
Implement a comprehensive failure type classification and filtering system with integrated credit refund automation:
- Failure Classification Framework
Create a taxonomy of failure types covering all possible job failures
Include categories such as:
Infrastructure failures
Platform issues
Resource limitations (memory, CPU, disk)
Configuration errors (YAML, environment)
Dependency failures (build tools, libraries)
Test failures (unit, integration, UI)
Deployment failures
Timeout issues
Automatically classify failures using pattern matching and machine learning
Flag failures for which the platform is responsible, since these qualify for credit refunds
Allow users to manually classify or reclassify failures when needed
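As a rough illustration of the pattern-matching half of this framework, the sketch below maps a log excerpt to one of the categories listed above. Everything in it is an assumption made for illustration: the `FailureType` names, the regular expressions, the choice of which categories count as refund-eligible, and the `classify` and `is_refund_eligible` helpers are hypothetical and do not describe any existing CircleCI behavior. A machine-learning classifier is out of scope for this sketch.

```python
import re
from enum import Enum


class FailureType(Enum):
    INFRASTRUCTURE = "infrastructure"
    PLATFORM = "platform"
    RESOURCE = "resource"
    CONFIGURATION = "configuration"
    DEPENDENCY = "dependency"
    TEST = "test"
    DEPLOYMENT = "deployment"
    TIMEOUT = "timeout"
    UNKNOWN = "unknown"


# Assumption: only these categories are the platform's responsibility.
REFUND_ELIGIBLE = {FailureType.INFRASTRUCTURE, FailureType.PLATFORM}

# Illustrative patterns only. Rules are checked in order; the first match wins.
RULES = [
    (re.compile(r"out of memory|no space left on device", re.I), FailureType.RESOURCE),
    (re.compile(r"timed out|timeout", re.I), FailureType.TIMEOUT),
    (re.compile(r"invalid yaml|config(uration)? error", re.I), FailureType.CONFIGURATION),
    (re.compile(r"could not resolve dependenc|package not found", re.I), FailureType.DEPENDENCY),
    (re.compile(r"tests? failed|assertion", re.I), FailureType.TEST),
    (re.compile(r"failed to provision|executor unavailable", re.I), FailureType.INFRASTRUCTURE),
]


def classify(log_excerpt, manual_override=None):
    """Return a FailureType; a manual (re)classification always takes precedence."""
    if manual_override is not None:
        return manual_override
    for pattern, failure_type in RULES:
        if pattern.search(log_excerpt):
            return failure_type
    return FailureType.UNKNOWN


def is_refund_eligible(failure_type):
    """Flag failures that fall into a platform-responsibility category."""
    return failure_type in REFUND_ELIGIBLE
```

For example, `classify("Error: No space left on device")` falls into the resource category, while passing `manual_override=FailureType.PLATFORM` lets a user reclassify the same job by hand.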
- Enhanced Filtering Interface
Add failure type as a primary filter in the jobs dashboard
Enable multi-select filtering across multiple failure types
Provide nested category filtering (main category → subcategory)
Allow combining failure type filters with other existing filters (project, branch, user, etc.)
Include specific filters for refund-eligible failures
Support saving and sharing of filter configurations
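One way the filtering behavior above could be modeled is sketched below. It assumes a hypothetical "category/subcategory" string for each job's failure type, so that selecting "resource" matches every resource subcategory while "test/unit" matches only unit-test failures. The `Job` and `JobFilter` classes and all of their fields are illustrative assumptions, not an existing data model; JSON serialization stands in for saving and sharing a filter configuration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Job:
    job_id: str
    project: str
    branch: str
    failure_type: str  # hypothetical "category/subcategory", e.g. "resource/memory"
    refund_eligible: bool = False


@dataclass(frozen=True)
class JobFilter:
    failure_types: frozenset = frozenset()  # empty means "any failure type"
    project: Optional[str] = None
    branch: Optional[str] = None
    refund_eligible_only: bool = False

    def matches(self, job: Job) -> bool:
        # Multi-select: a job passes if it matches any selected type.
        # Nested: selecting a main category also selects its subcategories.
        if self.failure_types and not any(
            job.failure_type == t or job.failure_type.startswith(t + "/")
            for t in self.failure_types
        ):
            return False
        if self.project is not None and job.project != self.project:
            return False
        if self.branch is not None and job.branch != self.branch:
            return False
        if self.refund_eligible_only and not job.refund_eligible:
            return False
        return True

    def to_json(self) -> str:
        """Serialize the filter so it can be saved or shared."""
        return json.dumps({
            "failure_types": sorted(self.failure_types),
            "project": self.project,
            "branch": self.branch,
            "refund_eligible_only": self.refund_eligible_only,
        })

    @classmethod
    def from_json(cls, text: str) -> "JobFilter":
        data = json.loads(text)
        data["failure_types"] = frozenset(data["failure_types"])
        return cls(**data)
```

A filter built as `JobFilter(failure_types=frozenset({"resource", "test/unit"}), project="web")` would keep resource and unit-test failures from the `web` project and drop everything else.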
- Automated Credit Refund System
Automatically identify jobs that failed due to platform issues
Calculate credit usage for refund-eligible failures
Generate monthly credit refund reports with itemized failure details
Provide one-click refund request functionality for eligible failures
Implement automatic credit refunds for confirmed platform issues
Create an audit trail of refund requests and processing
Include refund status tracking (pending, approved, processed)
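The refund lifecycle above (pending, approved, processed) and its audit trail could be modeled roughly as follows. This is a minimal sketch under stated assumptions: the `RefundRequest` class, the `advance` method, the dictionary keys used for jobs, and the strictly linear status order are all hypothetical, and real credit calculations would come from the billing system rather than from a field on the job.

```python
from dataclasses import dataclass, field
from enum import Enum


class RefundStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    PROCESSED = "processed"


# Assumption: a refund moves strictly forward through these states.
NEXT_STATUS = {
    RefundStatus.PENDING: RefundStatus.APPROVED,
    RefundStatus.APPROVED: RefundStatus.PROCESSED,
}


@dataclass
class RefundRequest:
    job_id: str
    credits: int
    status: RefundStatus = RefundStatus.PENDING
    audit_trail: list = field(default_factory=list)

    def advance(self, actor):
        """Move to the next status and record who made the change."""
        next_status = NEXT_STATUS.get(self.status)
        if next_status is None:
            raise ValueError("refund has already been processed")
        self.audit_trail.append((actor, self.status.value, next_status.value))
        self.status = next_status


def refundable_credits(failed_jobs):
    """Sum the credits of jobs flagged as refund-eligible.

    Each job is a dict with hypothetical keys "credits" and "refund_eligible".
    """
    return sum(job["credits"] for job in failed_jobs if job["refund_eligible"])
```

Keeping the audit trail on the request itself is only one possible design; an append-only log in a separate store would serve the same purpose.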
- Credit Usage Analytics
Display historical credit usage broken down by failure types
Show potential savings from addressing specific failure types
Track refunded credits vs. total credits used
Provide forecasting based on failure patterns and credit usage
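The breakdown and the refunded-versus-total comparison described above amount to simple aggregations. The sketch below assumes each job record is a dict with hypothetical "failure_type" and "credits" keys; both function names are illustrative, and forecasting is not covered.

```python
from collections import defaultdict


def credits_by_failure_type(jobs):
    """Total credits spent on failed jobs, grouped by failure type."""
    totals = defaultdict(int)
    for job in jobs:
        totals[job["failure_type"]] += job["credits"]
    return dict(totals)


def refund_ratio(refunded_credits, total_credits):
    """Share of total credits that were refunded; 0.0 when nothing was spent."""
    return refunded_credits / total_credits if total_credits else 0.0
```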
- API Integration
Extend the CircleCI API to include failure type parameters
Enable programmatic filtering of jobs by failure type
Support credit refund status and requests via API
Allow third-party integrations to leverage failure type and credit data
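To make the proposed API extension concrete, the sketch below builds a request URL for programmatic filtering. Only the `https://circleci.com/api/v2` base is taken from the existing API; the `failed-jobs` path, the `failure-type` and `refund-eligible` query parameters, and the `build_failed_jobs_url` helper are hypothetical names invented for this proposal and do not exist today.

```python
from urllib.parse import urlencode

API_BASE = "https://circleci.com/api/v2"


def build_failed_jobs_url(project_slug, failure_types, refund_eligible=None):
    """Build a URL for a HYPOTHETICAL endpoint that filters failed jobs.

    The path and both query parameters are proposals, not existing API features.
    """
    # Repeating the parameter expresses multi-select filtering.
    params = [("failure-type", t) for t in failure_types]
    if refund_eligible is not None:
        params.append(("refund-eligible", "true" if refund_eligible else "false"))
    return f"{API_BASE}/project/{project_slug}/failed-jobs?{urlencode(params)}"
```

Calling `build_failed_jobs_url("gh/acme/web", ["platform", "timeout"], refund_eligible=True)` would yield a URL that repeats `failure-type` once per selected type and appends `refund-eligible=true`.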
Business Value
Improved Troubleshooting Efficiency: Reduces time spent analyzing failures
Better Resource Allocation: Helps teams focus on the most impactful failure types
Enhanced Visibility: Provides clear insights into failure patterns
Financial Fairness: Ensures customers don't pay for platform-related issues
Reduced Support Burden: Decreases manual credit refund requests and processing
Increased Customer Trust: Demonstrates transparency and accountability
Technical Considerations
Requires development of a comprehensive failure classification system
Must integrate with billing and credit management systems
Should handle both real-time classification and historical reclassification
Needs automated detection algorithms for platform-related failures
Must maintain high performance with complex filtering
Should include approval workflows for credit refunds
Success Metrics
Reduction in time spent troubleshooting failures
Decrease in support tickets related to credit refunds
Increased automation rate for refund processing
User satisfaction with failure classification accuracy
Adoption of failure type filtering across the user base
Improved transparency in credit usage reporting
Prioritization Factors:
This feature addresses both technical and financial pain points for CircleCI users. By enabling filtering by failure type and automating credit refunds, we can significantly improve the user experience, make troubleshooting more efficient, ensure fair billing practices, and reduce administrative overhead for both customers and CircleCI support.