We've been building AI agents for a while now — our CloudWatch alarm agent, Jenkins Sentinel, internal Slack bots. Every time, we hit the same friction: agent frameworks that add thousands of lines of dependency code, abstractions that fight against AWS-native services, and deployment pipelines that need Docker, Makefiles, and custom packaging scripts just to get a Lambda function running.
So we built a module that strips all of that away. Define your agent's behavior in markdown. Write tools as plain Python functions. Deploy with terraform apply. We're open-sourcing it.
The Problem with Agent Frameworks
If you want to build an AI agent today, you're probably looking at LangChain, AutoGen, CrewAI, or similar libraries. They're powerful — but they come with trade-offs:
- Dependency weight: LangChain alone pulls in 50+ transitive dependencies. That's a lot of surface area for a Lambda function.
- Abstraction overhead: Most frameworks wrap the LLM API in layers of classes, chains, and runnables. When something breaks, you're debugging the framework, not your logic.
- Deployment complexity: Getting a framework-heavy agent into Lambda means Docker builds, large layers, and cold start penalties.
- Vendor lock-in: Framework-specific concepts (chains, agents, runnables) don't transfer. Switch frameworks and you rewrite everything.
What if you could skip all of that and talk directly to Bedrock's Converse API — with just enough structure to make it production-ready?
How It Works
The module deploys a complete AI agent stack on AWS. The entire runtime engine is ~140 lines of Python that uses only boto3 — which is already in the Lambda runtime. Zero external dependencies for the core loop.
┌─────────────────┐ ┌──────────────┐ ┌─────────────────────┐
│ Slack / HTTP │────▸│ API Gateway │────▸│ Lambda Function │
└─────────────────┘ └──────────────┘ │ │
│ ┌───────────────┐ │
┌─────────────────┐ │ │ Handler │ │
│ EventBridge │──────────────────────────▸│ │ ↓ │ │
└─────────────────┘ │ │ Runtime Engine│ │
│ │ ↓ ↓ │ │
│ │ Skills Tools │ │
│ └───────────────┘ │
│ ↓ │
│ ┌───────────────┐ │
│ │ Bedrock │ │
│ │ Converse API │ │
│ └───────────────┘ │
└─────────────────────┘
↕
┌─────────────────────┐
│ DynamoDB (memory) │
└─────────────────────┘
The agent loop:
- User sends a message (Slack, HTTP, or EventBridge schedule)
- Handler loads conversation history from DynamoDB (if enabled)
- Runtime engine loads the skill markdown → becomes the system prompt
- Shared rules from
rules/are appended to every prompt - Engine calls
bedrock.converse()in a loop - If the model calls a tool → route to your Python function → feed result back
- When the model returns text, save to memory and respond
That's it. No chains, no runnables, no graph DSL. Just a loop.
Skills as Markdown
This is the core idea. Instead of defining agent behavior in code — classes, decorators, configuration objects — you write a markdown file:
---
name: my-coordinator
version: 1.0.0
description: Routes requests to the right tools
tags: [coordinator, routing]
---
# Agent Coordinator
## When to Use
This is the default entry skill for all interactions.
## Available Tools
- **get_time**: Get the current UTC time
- **get_weather**: Get weather for a city
- **search_logs**: Search CloudWatch logs for errors
## Process
1. Read the incoming message
2. Classify the request type
3. Use the appropriate tool
4. Summarize findings in a clear response
## Guardrails
- Keep responses concise and helpful
- Never fabricate data — use tools when available
- If a tool fails, explain what happened
## Standalone Mode
Without tools, respond conversationally. Explain what
tools would be needed for tool-dependent questions.
Drop this file in skills/ and your agent knows what to do. The markdown becomes the system prompt — readable by developers and non-developers alike.
Why markdown?
- Version-controllable: Skill behavior is tracked in git like any other code
- Reviewable: Product managers and security teams can read and audit agent behavior without knowing Python
- Composable: Multiple skills can be combined through delegation
- Portable: Markdown files work across projects — swap one into a different agent
Shared Rules
Files in rules/ are appended to every skill's system prompt. Use them for company-wide policies:
# Formatting Rules
- Use Slack mrkdwn formatting
- Keep responses under 3000 characters
- Use bullet points for lists
- Bold key findings with *asterisks*
One formatting rule file. Every skill follows it. No duplication.
Tools as Plain Python Functions
No decorators. No base classes. No framework. Just functions:
Define the spec (tools/specs/my_tools.py):
MY_TOOL_SPECS = [
{
"toolSpec": {
"name": "search_logs",
"description": "Search CloudWatch logs for a pattern",
"inputSchema": {
"json": {
"type": "object",
"properties": {
"log_group": {
"type": "string",
"description": "CloudWatch log group name"
},
"pattern": {
"type": "string",
"description": "Search pattern"
}
},
"required": ["log_group", "pattern"]
}
}
}
}
]
Implement the handler (tools/my_tools.py):
import boto3
def search_logs(log_group: str, pattern: str) -> str:
client = boto3.client("logs")
resp = client.filter_log_events(
logGroupName=log_group,
filterPattern=pattern,
limit=20,
)
events = [e["message"] for e in resp.get("events", [])]
return "\n".join(events) if events else "No matching log events found."
Register it (tools/registry.py):
from tools.specs.my_tools import MY_TOOL_SPECS
from tools.my_tools import search_logs
TOOL_HANDLERS = {
"search_logs": lambda name, tool_input: search_logs(**tool_input),
}
def get_all_specs():
return MY_TOOL_SPECS
def handle_tool(name, tool_input):
if name not in TOOL_HANDLERS:
raise ValueError(f"Unknown tool: {name}")
return TOOL_HANDLERS[name](name, tool_input)
That's three files. Your agent can now search CloudWatch logs. Add more tools by adding more functions and specs — no framework code to learn.
Multi-Agent Delegation
For complex workflows, a coordinator skill can delegate to specialized sub-skills. The coordinator doesn't need to know how the sub-skill works — it just hands off a task and gets a result.
User Message
│
▼
┌──────────────────────┐
│ coordinator skill │
│ "Route this request"│
└──────────┬───────────┘
│ delegate_to_skill("log-analyst", "Find errors in prod")
▼
┌───────────────────┐
│ log-analyst skill│
│ (own tools, │
│ own prompt) │
└───────────────────┘
│
▼
Result flows back
to coordinator
Each sub-skill gets its own system prompt and tool set. Delegation depth is limited (default: 3 levels) to prevent infinite recursion. This enables complex multi-agent workflows without service-to-service calls — everything runs in a single Lambda invocation.
Conversation Memory
The module optionally creates a DynamoDB table for multi-turn conversations. Messages are stored per thread with automatic TTL cleanup:
┌─────────────────────────────────────────────┐
│ DynamoDB: my-agent-conversations │
├───────────────────┬─────────────────────────┤
│ PK │ SK │
│ THREAD#slack-123 │ MSG#1710000001.000 │
│ THREAD#slack-123 │ MSG#1710000002.000 │
│ THREAD#slack-456 │ MSG#1710000003.000 │
└───────────────────┴─────────────────────────┘
- History is capped at 100 messages per thread to prevent exceeding Bedrock context windows
- Old messages auto-expire via DynamoDB TTL (configurable, default 30 days)
- Collision-safe sort keys prevent race conditions in concurrent conversations
Enable it with one variable:
enable_memory_table = true
What the Module Deploys
One terraform apply creates everything:
| Component | Description |
|---|---|
| Lambda Function | Your agent code + runtime engine, with create_before_destroy lifecycle |
| Lambda Layer | Optional — pip dependencies from requirements.txt |
| IAM Role | Least-privilege policies scoped to deployment region |
| API Gateway | HTTP API with throttling and access logs (optional) |
| DynamoDB Table | Conversation memory with encryption and TTL (optional) |
| EventBridge Rules | Scheduled tasks for cron-based agent invocations (optional) |
| CloudWatch Logs | Log groups with configurable retention |
Security Built In
- IAM policies are scoped to the deployment region and specific resource ARNs
- Bedrock access is limited to Anthropic models in the deployed region
- Slack signature verification (HMAC-SHA256) runs before processing events
- Secrets stored as
SecureStringin SSM Parameter Store — Lambda only reads explicitly listed prefixes - Tool errors don't leak internal details to end users
- Skill names are validated to prevent path traversal attacks
Cost Breakdown
Running an AI agent on this module is remarkably cheap:
| Component | Configuration | Monthly Cost |
|---|---|---|
| Lambda | 1024 MB, ~1000 invocations/day | ~$3 |
| Bedrock (Claude Sonnet) | ~1000 conversations/day, avg 3 turns | ~$30-80 |
| DynamoDB | On-demand, memory table | ~$2 |
| API Gateway | HTTP API, ~30k requests/month | ~$1 |
| CloudWatch Logs | 14-day retention | ~$2 |
| SSM Parameter Store | Standard parameters | ~$0 |
| Total | Moderate usage | ~$38-88/month |
Most of the cost is Bedrock inference — the infrastructure itself is negligible. Compare that to running a framework-heavy agent on ECS/Fargate ($50-100/month just for compute) or paying per-seat for a hosted agent platform.
Cost optimization tips:
- Use
lambda_reserved_concurrencyto cap concurrent invocations and control Bedrock spend - Set
memory_ttl_daysto auto-clean old conversations - Choose the right model — Claude Haiku for simple routing, Sonnet for complex reasoning
- API Gateway caching reduces duplicate Lambda invocations
Getting Started
Prerequisites
- AWS account with Bedrock access (Claude models enabled)
- Terraform >= 1.5
- Python 3.12
Step 1: Get the Module
Want to try the module? Enter your email to get the GitHub repository URL.
Get Free Access to the Markdown Agent Module
Enter your email to get instant access to the GitHub repository.
No spam. Unsubscribe anytime. We respect your privacy.
Step 2: Create Your Agent
Set up your project structure:
my-agent/
├── main.tf
├── requirements.txt # pip dependencies (e.g., requests)
└── src/
├── orchestrator/
│ ├── handler.py # Lambda entry point (copy from module examples)
│ └── agent.py # Calls runtime engine
├── runtime/
│ ├── engine.py # Copy from module runtime/
│ └── memory.py # Copy from module runtime/
├── skills/
│ └── my-skill.md # Your agent's behavior
├── rules/
│ └── formatting.md # Shared rules
└── tools/
├── registry.py # Tool registry
├── specs/
│ └── my_tools.py # Bedrock toolSpec definitions
└── my_tools.py # Tool implementations
Step 3: Configure Terraform
module "agent" {
source = "github.com/AIOpsCrew/terraform-module-markdown-agent"
name = "my-agent"
environment = "prod"
bedrock_model_id = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"
source_dir = "${path.module}/src"
layer_path = "${path.module}/dist/layer.zip"
ssm_parameter_prefixes = ["/my-agent/slack/*"]
lambda_environment_variables = {
MEMORY_TABLE = "my-agent-conversations"
MODEL_ID = "us.anthropic.claude-sonnet-4-5-20250929-v1:0"
}
enable_api_gateway = true
enable_memory_table = true
tags = {
Project = "my-agent"
}
}
Step 4: Deploy
# Build the Lambda layer (if you have pip dependencies)
bash scripts/build_layer.sh .
# Deploy
terraform init
terraform apply
# Copy the API Gateway URL to your Slack app's Event Subscriptions
terraform output api_gateway_url
Step 5: Add Scheduled Tasks (Optional)
scheduled_tasks = [
{
name = "daily-report"
description = "Generate daily summary"
schedule_expression = "cron(0 13 * * ? *)"
input = {
source = "scheduled"
task = "daily-report"
slack_channel = "C123ABC"
prompt = "Generate the daily operations report"
}
}
]
Real-World Use Cases
Here's what we've built with this module:
1. Slack DevOps Assistant
Skill: devops-coordinator.md with tools for CloudWatch, EC2, RDS
Result: Engineers ask questions in Slack, agent investigates AWS resources in real time
2. Scheduled Compliance Checker
Skill: compliance-auditor.md with tools for AWS Config and IAM
Trigger: EventBridge cron, daily at 9am
Result: Daily Slack report of security findings — no human has to remember to check
3. Customer Support Triage Bot
Skill: support-router.md with delegation to billing-agent.md and technical-agent.md
Result: Multi-agent system that classifies and handles support requests in Slack
4. Incident Response Automation
Skill: incident-responder.md with tools for PagerDuty, Jira, and CloudWatch
Trigger: HTTP webhook from monitoring system
Result: Automatically gathers context, creates Jira ticket, and posts summary to incident channel
Troubleshooting
Slack 3-second timeout: Slack retries if it doesn't get a response within 3 seconds. The handler acknowledges retries with HTTP 200 immediately to prevent duplicate processing.
Bedrock throttling: The engine retries with exponential backoff. For sustained throttling, request a quota increase or set lambda_reserved_concurrency to limit concurrent invocations.
Cold starts: Keep lambda_memory at 1024+ MB for faster initialization. The runtime caches the Bedrock client and SSM secrets across warm invocations.
Contributing
We welcome contributions! Here's how you can help:
Bug Reports: Open an issue on GitHub with your Terraform version, module version, error messages, and expected vs. actual behavior.
Feature Requests: Describe your use case and why the feature would help.
Pull Requests: Fork the repository, create a feature branch, and submit a PR with a clear description of your changes.
Share Your Skills: Built a useful skill markdown file? We'd love to feature community skills in the documentation.
About AI Ops Crew
We build production-ready Terraform modules for AWS operations. Our mission: make infrastructure automation accessible to every engineering team.
Our Modules:
- CloudWatch AI Agent (Premium - $5/mo): AI-powered alarm investigation with real-time AWS analysis
- n8n Fargate Cluster (Free): Workflow automation platform on AWS
- Jenkins Sentinel (Free): AI-powered pipeline failure analysis
- API Gateway Custom Auth (Free): Plugin-based Lambda authorizer for API Gateway
- Markdown Agent (Free): Deploy AI agents on AWS with markdown skills
- More coming soon...
Follow our journey as we open-source more infrastructure tools. Subscribe to our newsletter for updates.
Ready to build your first markdown-driven agent? Get the module and have it running in minutes.
Have questions about the module or want to share your skills? Email us at info@aiopscrew.com or open a discussion on GitHub.