When Jenkins pipelines fail, DevOps engineers face the same tedious ritual: scroll through hundreds of log lines, search for error patterns, cross-reference with recent changes, and eventually piece together the root cause. This manual investigation can take anywhere from 15 minutes to several hours per failure.
Jenkins Sentinel automates this investigation.
It's an event-driven AI agent that automatically analyzes pipeline failures using AWS Bedrock (bring your own model), classifies the root cause, and delivers actionable fixes directly to Slack or email—before an engineer even opens Jenkins.
The Problem We Solved
Consider a typical day at a company with 200+ Jenkins pipelines:
- 9:15 AM: Three pipelines fail simultaneously
- 9:20 AM: DevOps engineer gets paged
- 9:25 AM: Opens Jenkins, starts scrolling through logs
- 9:45 AM: Identifies the issue—NPM registry timeout
- 10:00 AM: Reruns the job, it passes
- Total time: 45 minutes for a transient network issue
Now multiply this by 5-10 failures per day. That's 4-8 hours of engineering time spent on log archaeology.
Jenkins Sentinel changes this:
- 9:15 AM: Pipelines fail
- 9:15 AM: Slack notification arrives with root cause + fixes
- 9:16 AM: Engineer reruns the job
- Total time: 1 minute
How It Works
Jenkins Sentinel follows an event-driven pipeline built around three Lambda stages (ingestion, analysis, dispatch):
      Jenkins Pipeline Failure
                  │
                  ▼
┌─────────────────────────────────────────┐
│  Webhook (API Gateway)                  │
│  • HMAC-SHA256 signature validation     │
│  • Prevents unauthorized requests       │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│  Ingestion Lambda                       │
│  • Validates payload structure          │
│  • Sends to analyzer queue              │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│  Analyzer SQS Queue                     │
│  • Provides durability + backpressure   │
│  • Dead Letter Queue for failures       │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│  Analyzer Lambda                        │
│  • Fetches build logs from Jenkins      │
│  • Redacts secrets (AWS keys, tokens)   │
│  • Wraps logs in XML boundaries         │
│  • Calls Bedrock model for analysis     │
│  • Sends results to dispatcher queue    │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│  Dispatcher Lambda                      │
│  • Routes to enabled adapters           │
│  • SNS (email), Slack, Teams, PagerDuty │
└─────────────────┬───────────────────────┘
                  │
          ┌───────┴───────┐
          ▼               ▼
   ┌─────────────┐ ┌─────────────┐
   │    Slack    │ │    Email    │
   │  Block Kit  │ │  (via SNS)  │
   └─────────────┘ └─────────────┘
Why SQS?
For organizations running 1000+ pipelines, SQS provides:
- Durability: Messages persist until processed (no loss during Bedrock throttling)
- Backpressure: Configurable concurrency prevents API overload
- Dead Letter Queues: Failed analyses are preserved for investigation
- Observability: Queue depth metrics enable proactive monitoring
Key Features
1. Intelligent Failure Classification
Sentinel categorizes every failure into one of five categories:
| Category | Examples |
|---|---|
| ENVIRONMENT | Network timeouts, disk space issues, Docker daemon failures, AWS credential expiry |
| SECURITY | Secret scanning failures, audit violations, permission denied errors |
| CODE | Compilation errors, test failures, linting violations |
| DEPENDENCY | npm/pip install failures, version conflicts, missing packages |
| CONFIGURATION | Invalid Jenkinsfile syntax, missing environment variables |
Each analysis includes:
- Root cause: One-sentence summary
- Suggested fixes: 1-5 actionable steps, prioritized
- Confidence score: 0.0-1.0 indicating certainty
- Relevant log lines: Key error messages extracted
- Human review flag: Set when AI confidence is low
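The fields listed above could be modeled roughly as follows. This is a sketch using only the standard library (the project itself lists Pydantic for validation); the class name `FailureAnalysis`, the validation rules, and the 0.7 review threshold are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(str, Enum):
    ENVIRONMENT = "ENVIRONMENT"
    SECURITY = "SECURITY"
    CODE = "CODE"
    DEPENDENCY = "DEPENDENCY"
    CONFIGURATION = "CONFIGURATION"


# Hypothetical cutoff: the post only says the flag is set when confidence is low.
REVIEW_THRESHOLD = 0.7


@dataclass
class FailureAnalysis:
    root_cause: str
    category: Category
    suggested_fixes: list[str]
    confidence_score: float
    relevant_log_lines: list[str] = field(default_factory=list)
    requires_human_review: bool = False

    def __post_init__(self) -> None:
        # Enum lookup raises ValueError for a category outside the five above.
        self.category = Category(self.category)
        if not 1 <= len(self.suggested_fixes) <= 5:
            raise ValueError("expected 1-5 suggested fixes")
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("confidence_score must be within 0.0-1.0")
        # Assumed policy: low-confidence results are routed to a human.
        if self.confidence_score < REVIEW_THRESHOLD:
            self.requires_human_review = True
```

Validating the model's JSON this way before dispatching means a malformed or out-of-range response fails loudly instead of reaching Slack.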
2. Security-First Design
Secret Redaction: Before logs reach the LLM, seven regex patterns redact:
- AWS access keys (AKIA*)
- AWS secret keys
- Generic passwords and tokens
- JWT tokens (eyJ...)
- Private keys (-----BEGIN...END-----)
- GitHub tokens (ghp_*)
- npm tokens (npm_*)
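A redaction pass over these seven kinds of secret might look like the sketch below. The regular expressions are illustrative guesses at each format, not the project's actual patterns, and the `redact` function name and `[REDACTED_...]` placeholders are assumptions.

```python
import re

# Illustrative patterns only; the project's real expressions are not shown in this post.
REDACTION_PATTERNS = [
    ("AWS_ACCESS_KEY", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("AWS_SECRET_KEY", re.compile(r"(?i)(aws_secret_access_key\s*[=:]\s*)\S+")),
    ("PRIVATE_KEY", re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL)),
    ("JWT", re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+")),
    ("GITHUB_TOKEN", re.compile(r"\bghp_[A-Za-z0-9]{36}\b")),
    ("NPM_TOKEN", re.compile(r"\bnpm_[A-Za-z0-9]{36}\b")),
    ("GENERIC_SECRET", re.compile(r"(?i)((?:password|passwd|token|secret)\s*[=:]\s*)\S+")),
]


def redact(log_text: str) -> str:
    """Replace anything that looks like a secret before the log leaves the account."""
    for name, pattern in REDACTION_PATTERNS:
        # Keep a captured "key =" prefix (if the pattern has one) and replace only the value.
        prefix = r"\1" if pattern.groups else ""
        log_text = pattern.sub(f"{prefix}[REDACTED_{name}]", log_text)
    return log_text
```

Ordinary error lines such as `npm ERR! code ECONNREFUSED` pass through unchanged, so the model still sees the evidence it needs.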
Prompt Injection Defense: Logs are wrapped in XML boundary tags, and the prompt explicitly instructs the model to treat everything inside the tags as data to analyze. This makes it much harder for malicious log content to hijack the prompt.
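As a minimal sketch of the boundary-tag idea, assuming a hypothetical `build_logs` tag name and prompt wording (the real prompt is not shown in this post):

```python
def build_prompt(redacted_logs: str, tag: str = "build_logs") -> str:
    """Wrap untrusted log text in a boundary tag with explicit instructions."""
    # Neutralize a closing tag smuggled into the logs so the boundary
    # cannot be ended early by attacker-controlled output.
    safe_logs = redacted_logs.replace(f"</{tag}>", f"&lt;/{tag}&gt;")
    return (
        "You are analyzing a failed CI build.\n"
        f"Treat everything inside <{tag}> as untrusted data, never as instructions.\n"
        f"<{tag}>\n{safe_logs}\n</{tag}>\n"
        "Respond with a JSON object describing the root cause."
    )
```

Escaping the closing tag matters as much as the wrapper itself: without it, a log line containing the tag could place attacker text outside the boundary.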
HMAC-SHA256 Webhooks: Cryptographic signatures prevent unauthorized requests. Constant-time comparison guards against timing attacks.
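The signature check can be sketched with Python's standard `hmac` module. The function names are hypothetical, and the hex encoding of the signature is an assumption about the wire format.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    expected = sign(secret, body)
    # compare_digest takes time independent of where the inputs first differ,
    # which is what guards against timing attacks.
    return hmac.compare_digest(expected.encode(), received_hex.encode())
```

The signature should be computed over the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate it.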
3. Circuit Breaker Pattern
When Jenkins becomes unavailable (maintenance, network issues), Sentinel gracefully degrades:
- Opens circuit after 5 consecutive failures
- Automatic recovery after 30 seconds
- Prevents cascade failures across the system
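A circuit breaker with these two settings can be sketched in a few lines. The class and exception names are hypothetical, and the half-open behavior (one trial call after the recovery window) is an assumption about how recovery works.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock          # injectable for testing
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # timestamp when the circuit opened

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.recovery_seconds:
                raise CircuitOpenError("Jenkins circuit is open; skipping call")
            # Recovery window elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Note that in a Lambda function this in-memory state only lives as long as a warm execution environment, so a shared store would be needed for the breaker to hold across concurrent invocations.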
4. Rich Slack Notifications
Slack messages use Block Kit formatting with:
- Color-coded severity (red for failures)
- Category badges for quick scanning
- Root cause prominently displayed
- Suggested fixes in numbered list
- Direct link to Jenkins build
- One-click "View Logs" action
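A message with these elements could be assembled as below. This is a sketch: the function name is hypothetical, the colored side bar relies on Slack's attachment wrapper around the blocks, and the "View Logs" link assumes the standard Jenkins `/console` path under the build URL.

```python
def build_slack_message(job: str, build_number: int, build_url: str, analysis: dict) -> dict:
    """Turn an analysis dict into a Slack payload using Block Kit blocks."""
    fixes = "\n".join(
        f"{i}. {fix}" for i, fix in enumerate(analysis["suggested_fixes"], start=1)
    )
    logs_url = build_url.rstrip("/") + "/console"
    blocks = [
        {"type": "header",
         "text": {"type": "plain_text", "text": f"Pipeline Failure: {job} #{build_number}"}},
        {"type": "section", "fields": [
            {"type": "mrkdwn", "text": f"*Category:* {analysis['category']}"},
            {"type": "mrkdwn", "text": f"*Confidence:* {analysis['confidence_score']:.0%}"},
        ]},
        {"type": "section",
         "text": {"type": "mrkdwn", "text": f"*Root Cause:*\n{analysis['root_cause']}"}},
        {"type": "section",
         "text": {"type": "mrkdwn", "text": f"*Suggested Fixes:*\n{fixes}"}},
        {"type": "actions", "elements": [
            {"type": "button", "text": {"type": "plain_text", "text": "View Build"},
             "url": build_url},
            {"type": "button", "text": {"type": "plain_text", "text": "View Logs"},
             "url": logs_url},
        ]},
    ]
    # Red side bar for failures; the color lives on the attachment, not the blocks.
    return {"attachments": [{"color": "#D00000", "blocks": blocks}]}
```

The returned dict can be serialized with `json.dumps` and posted to a Slack incoming webhook URL.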
What the AI Output Looks Like
Here's an actual Sentinel analysis for a Node.js build failure:
{
  "root_cause": "npm install failed due to ECONNREFUSED connecting to registry.npmjs.org",
  "category": "ENVIRONMENT",
  "suggested_fixes": [
    "Rerun the pipeline - this appears to be a transient network issue",
    "If retries fail, check if npm registry is experiencing an outage at status.npmjs.org",
    "Consider adding a registry mirror (Artifactory, Verdaccio) for resilience",
    "Add retry logic to npm install: npm install --fetch-retries=3"
  ],
  "confidence_score": 0.92,
  "relevant_log_lines": [
    "npm ERR! code ECONNREFUSED",
    "npm ERR! syscall connect",
    "npm ERR! errno -111",
    "npm ERR! FetchError: request to https://registry.npmjs.org/lodash failed"
  ],
  "requires_human_review": false
}
This transforms into a Slack message in seconds:
🔴 Pipeline Failure: my-app/main #1234
Category: ENVIRONMENT
Confidence: 92%
Root Cause:
npm install failed due to ECONNREFUSED connecting to registry.npmjs.org
Suggested Fixes:
1. Rerun the pipeline - this appears to be a transient network issue
2. If retries fail, check if npm registry is experiencing an outage
3. Consider adding a registry mirror for resilience
4. Add retry logic: npm install --fetch-retries=3
[View Build] [View Logs]
Get Access to Jenkins Sentinel
Jenkins Sentinel is currently in private beta. Enter your email below to request access to the repository.
We'll review your request and send you the repository URL once approved.
Deployment
Once you have access, Jenkins Sentinel deploys as a self-contained Terraform module:
module "jenkins_sentinel" {
  source = "github.com/AIOpsCrew/jenkins-sentinel//terraform/modules/jenkins-sentinel"

  # Jenkins Connection
  jenkins_url       = "https://jenkins.example.com"
  jenkins_username  = "sentinel-user"
  jenkins_api_token = var.jenkins_api_token

  # Source Paths
  source_path       = "${path.module}/../../../src"
  requirements_path = "${path.module}/../../../requirements.txt"

  # Notifications
  notification_email         = "devops@example.com"
  enabled_adapters           = "sns,slack"
  slack_incoming_webhook_url = var.slack_webhook
}
The module creates:
- Lambda functions (ingestion, analyzer, dispatcher)
- SQS queues with dead letter queues
- SNS topic for email notifications
- API Gateway HTTP endpoint
- IAM roles with least-privilege permissions
- Secrets Manager entries for Jenkins credentials
- CloudWatch log groups
Jenkins Integration
Add Sentinel to your pipelines with the shared library:
@Library('jenkins-sentinel') _

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh 'npm install && npm run build'
            }
        }
        stage('Test') {
            steps {
                sh 'npm test'
            }
        }
    }
    post {
        failure {
            notifySentinel(
                credentialsId: 'sentinel-webhook-secret',
                apiUrlCredentialsId: 'sentinel-api-url',
                severity: 'error',
                additionalContext: [
                    team: 'platform',
                    oncall: 'devops@example.com'
                ]
            )
        }
    }
}
The shared library:
- Retrieves webhook secret from Jenkins credentials store
- Builds payload with job info, git details, build URL
- Generates HMAC-SHA256 signature
- Sends POST request to Sentinel endpoint
- Non-blocking (won't fail pipeline if notification fails)
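For trying the endpoint without Jenkins, the same steps can be approximated in Python. This is only a sketch of what the Groovy library is described as doing: the payload field names and the `X-Sentinel-Signature` header are assumptions, so check the repository documentation for the real contract.

```python
import hashlib
import hmac
import json
import urllib.request


def build_sentinel_request(api_url: str, secret: str, job_name: str,
                           build_number: int, build_url: str) -> urllib.request.Request:
    """Build a signed POST request; field and header names are hypothetical."""
    payload = {
        "job_name": job_name,
        "build_number": build_number,
        "build_url": build_url,
        "status": "FAILURE",
    }
    # Serialize once and sign those exact bytes so the receiver can verify them.
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode()
    signature = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return urllib.request.Request(
        api_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Sentinel-Signature": signature,
        },
    )
```

To mirror the non-blocking behavior, send the request with `urllib.request.urlopen(req, timeout=10)` inside a `try`/`except` that logs the error rather than re-raising it.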
Cost Analysis
Jenkins Sentinel is serverless—you only pay for what you use.
Typical Monthly Costs (100 failures/day)
| Component | Configuration | Monthly Cost |
|---|---|---|
| Lambda | 3 functions, ~3000 invocations | ~$2 |
| AWS Bedrock | ~90K input tokens/day | ~$15 |
| SQS | ~100K messages | ~$0.50 |
| API Gateway | ~3000 requests | ~$0.50 |
| Secrets Manager | 2 secrets | ~$1 |
| CloudWatch Logs | Standard retention | ~$2 |
| SNS | Email notifications | ~$0.50 |
| Total | | ~$22/month |
ROI Calculation
Without Sentinel (manual investigation):
- 10 failures/day × 30 min/failure × $75/hr engineer cost = $375/day
- Monthly: $11,250
With Sentinel (automated analysis):
- Infrastructure: $22/month (the estimate above, which assumes 100 failures/day, so it is conservative here)
- Engineer review time: 10 failures × 2 min × $75/hr = $25/day
- Monthly: $772
Savings: $10,478/month (93% reduction)
Even at just 5 failures per day, Sentinel pays for itself within the first week.
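The arithmetic behind these figures can be reproduced with a few lines, which also makes it easy to plug in your own failure rate and hourly cost. The function name is hypothetical, and the 30-day month matches the figures above.

```python
def monthly_cost(failures_per_day: int, minutes_per_failure: float,
                 hourly_rate: float = 75.0, infra_per_month: float = 0.0,
                 days: int = 30) -> float:
    """Engineer time spent on failures per month, plus any infrastructure cost."""
    labor_per_day = failures_per_day * minutes_per_failure * hourly_rate / 60
    return labor_per_day * days + infra_per_month


manual = monthly_cost(10, 30)                        # 10 failures x 30 min, no tooling
automated = monthly_cost(10, 2, infra_per_month=22)  # 2 min review plus $22 infrastructure
savings = manual - automated

print(f"manual: ${manual:,.0f}, automated: ${automated:,.0f}")
print(f"savings: ${savings:,.0f}/month ({savings / manual:.0%} reduction)")
```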
Security Considerations
What Data Reaches the LLM?
- Build logs (after secret redaction)
- Jenkinsfile content (if available)
- Job metadata (name, build number, duration)
What Never Leaves Your AWS Account?
- Jenkins credentials (stored in Secrets Manager)
- Webhook secrets
- Original unredacted logs
Compliance
- SOC 2: All data encrypted at rest (KMS) and in transit (TLS)
- GDPR: Build logs typically contain no PII, but verify this for your own pipelines
- HIPAA: Can be deployed on HIPAA-eligible AWS services under an AWS Business Associate Addendum
Comparison to Alternatives
| Feature | Jenkins Sentinel | Manual Triage | Generic Log Aggregator |
|---|---|---|---|
| Time to root cause | Seconds | 15-60 min | Minutes (search) |
| Suggested fixes | Yes | No | No |
| Works offline | No | Yes | Yes |
| Setup complexity | Low (Terraform) | None | Medium |
| Monthly cost | ~$22 | $11K+ (eng time) | $200-500 |
| Proactive notifications | Yes | No | Configurable |
| AI-powered analysis | Yes | No | Rarely |
Roadmap
We're actively developing Jenkins Sentinel. Here's what's coming:
Q1 2026:
- GitHub Actions support
- GitLab CI support
- Custom prompt templates
Q2 2026:
- Historical trend analysis ("this failure occurred 3 times this week")
- Auto-retry for transient failures
- Integration with incident management (Opsgenie, VictorOps)
Q3 2026:
- Multi-region deployment
- Self-healing suggestions (automatic PR creation)
- Fine-tuned model for your codebase patterns
Getting Started
Ready to eliminate manual log investigation?
Prerequisites
- AWS account with Bedrock access (any supported model)
- Jenkins instance with API access
- Terraform 1.6.0+
Step 1: Request Access
Request access to the repository by entering your email above. We'll review your request and send you the GitHub repository URL.
Step 2: Clone the Repository
Once approved, clone the repository:
git clone https://github.com/AIOpsCrew/jenkins-sentinel.git
cd jenkins-sentinel
Step 3: Configure Variables
Create terraform.tfvars:
jenkins_url = "https://jenkins.example.com"
jenkins_username = "sentinel-service-account"
jenkins_api_token = "your-api-token"
notification_email = "devops@example.com"
enabled_adapters = "sns,slack"
# Optional: Slack integration
slack_incoming_webhook_url = "https://hooks.slack.com/services/..."
Step 4: Deploy
cd terraform/environments/dev
terraform init
terraform apply
Step 5: Configure Jenkins
- Add the shared library to Jenkins global configuration
- Store webhook secret in Jenkins credentials
- Add post { failure { notifySentinel(...) } } to pipelines
Step 6: Test It
Intentionally fail a pipeline and watch Sentinel deliver the analysis to Slack/email.
About This Project
Jenkins Sentinel was born from frustration. After spending countless hours debugging Jenkins failures at scale, we built the tool we wished existed.
We're currently in private beta as we refine the product with early adopters. Once stable, we plan to open source the project because we believe every DevOps team deserves intelligent automation—not just those with ML expertise.
Built with:
- Python 3.12
- AWS Bedrock (bring your own model)
- Terraform
- Pydantic for data validation
- Jenkins Shared Library (Groovy)
Community & Support
Get Involved
- Request Access: Sign up above to join the private beta
- Beta Feedback: Help shape the product with your feedback
- Early Adopters: Get priority support and influence the roadmap
Need Help?
- Documentation: Full guide included in the repository
- Email: info@aiopscrew.com
- Consulting: Complex Jenkins environments? We can help.
Conclusion
Jenkins Sentinel transforms pipeline failures from interruptions into minor notifications. Instead of context-switching to investigate logs, your team reviews AI-generated analysis and moves on.
For a typical organization:
- MTTR reduction: 30+ minutes → 2 minutes
- Engineering time saved: 90%+
- Cost: $22/month vs. $11K+ in manual investigation
Stop doing log archaeology. Let AI handle the tedious parts so your engineers can focus on building.
Request access to the private beta and start transforming your Jenkins pipeline failures into actionable insights.
Have questions about Jenkins Sentinel? Email us at info@aiopscrew.com or request access to join the discussion with other beta users.