If you've deployed Amazon Lex behind Amazon Connect, you've probably hit this: a caller is at a busy intersection, a coffee shop, or driving with the windows down, and your bot completely falls apart. It mis-transcribes utterances, matches the wrong intent, falls back repeatedly, and eventually transfers the caller to an agent who has to start from scratch.
The problem isn't Lex's NLU. It's the built-in speech-to-text. And there's a surprisingly simple pattern to fix it.
The Problem: Lex's STT Wasn't Built for Noise
When you send audio to Lex via RecognizeUtterance, here's what happens internally:
Caller Audio (8kHz PCM) --> Lex Built-in STT --> Text --> NLU --> Intent Match
Lex's internal STT engine works fine in quiet environments. But in real-world contact center scenarios, callers are often in noisy places. Background traffic, music, crowds, wind, car noise, and other environmental sounds degrade transcription accuracy significantly.
The result is a cascade of failures:
- Garbled transcription -- Lex hears "I need help with my bill" as "I need held mitt pile"
- Wrong intent match -- The garbled text matches `FallbackIntent` instead of `AccountLookup`
- Retry loop -- The bot asks the caller to repeat; they do, same noise, same failure
- Agent transfer -- After 2-3 fallbacks, the bot gives up and transfers to an agent
You can lower the NLU confidence threshold, but that doesn't fix the root cause. The text going into NLU is wrong.
The Pattern: Decouple STT from NLU
The fix is to stop using Lex's built-in speech-to-text and instead pre-process the audio through Amazon Transcribe before sending the cleaned text to Lex:
Standard Path (noisy environments fail):
Caller Audio --> Lex RecognizeUtterance (built-in STT + NLU)
Improved Path (noise-resilient):
Caller Audio --> Amazon Transcribe Streaming --> Clean Text --> Lex RecognizeText (NLU only)
Why does this work? Amazon Transcribe's telephony models are specifically trained on real-world call audio with background noise, and they have been battle-tested across thousands of Amazon Connect deployments. Given the same noisy audio that trips up Lex's internal STT, Transcribe produces dramatically better transcriptions.
The key insight is that RecognizeText gives you the same NLU pipeline as RecognizeUtterance, minus the STT step. You get the same intent matching, slot filling, and dialog management. You're just feeding it better input.
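As a sketch of what the text-only call looks like with boto3 (the bot ID, alias ID, and session ID below are placeholders for your own deployment):

```python
def lex_recognize_text(text, bot_id, bot_alias_id,
                       locale_id="en_US", session_id="demo-session",
                       client=None):
    """Send already-transcribed text straight to Lex NLU.

    Same intent matching, slot filling, and dialog management as
    RecognizeUtterance -- minus the built-in STT step.
    """
    if client is None:
        import boto3  # deferred so the function is easy to stub in tests
        client = boto3.client("lexv2-runtime")
    resp = client.recognize_text(
        botId=bot_id,
        botAliasId=bot_alias_id,
        localeId=locale_id,
        sessionId=session_id,
        text=text,
    )
    intent = resp["sessionState"]["intent"]["name"]
    interpretations = resp.get("interpretations") or [{}]
    score = interpretations[0].get("nluConfidence", {}).get("score")
    return intent, score
```

The response also carries the full session state, so multi-turn slot elicitation works exactly as it would over audio.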
Architecture
We built a complete Terraform module that deploys both paths side-by-side with an A/B comparison test harness:
+-------------------+
| Microphone / |
| Audio Source |
+--------+----------+
|
+-------------+-------------+
| |
v v
+----------------------------+ +----------------------------+
| Path A: Direct Audio | | Path B: Transcribe Pre- |
| | | processor |
| Resample to 8kHz PCM | | Resample to 16kHz PCM |
| | | | | |
| v | | v |
| Lex RecognizeUtterance | | Transcribe Streaming |
| (built-in STT + NLU) | | (telephony-optimized) |
| | | | | |
| v | | v |
| Intent + Confidence | | Clean text |
| | | | |
| | | v |
| | | Lex RecognizeText |
| | | (NLU only) |
| | | | |
| | | v |
| | | Intent + Confidence |
+----------------------------+ +----------------------------+
| |
+-------------+-------------+
|
v
+-------------------+
| Compare Results |
| Transcript, |
| Intent, Score |
+-------------------+
What Gets Deployed
The Terraform module creates:
- Amazon Lex V2 Bot with 5 intents (Greeting, AccountLookup, TransferToAgent, EndCall, Fallback)
- Lambda fulfillment handler with slot validation, retry tracking, and fallback escalation
- S3 bucket for audio/text conversation logging (30-day lifecycle to IA, 90-day expiry)
- CloudWatch dashboard tracking fallback count, agent transfer rate, and retry count
- IAM roles scoped for least privilege
The Lex Bot
The bot is designed to exercise real-world contact center patterns:
| Intent | Purpose | Slots |
|---|---|---|
| `GreetingIntent` | Opening the conversation | None |
| `AccountLookup` | Multi-turn slot filling | `AccountNumber` (6-10 digits), `IssueType` (billing/technical/general), `CallerName` |
| `TransferToAgent` | Explicit agent request | None |
| `EndCallIntent` | Closing the conversation | None |
| `FallbackIntent` | Catch-all with escalation | None (auto-transfers after 2 consecutive fallbacks) |
The AccountLookup intent is deliberately complex. It requires multi-turn dialog with slot validation, which is exactly where noisy audio causes the most failures. When the bot asks "What is your account number?" and the caller says "1234567" in a noisy environment, the standard path frequently mis-transcribes the digits.
The Lambda Fulfillment Handler
The fulfillment handler implements noise-aware retry logic:
```python
def handle_dialog_code_hook(event):
    """Validate slots during dialog -- retry-aware."""
    session_attrs = event.get('sessionState', {}).get('sessionAttributes', {}) or {}
    retry_count = int(session_attrs.get('retryCount', '0'))
    max_retries = int(session_attrs.get('maxRetries', '3'))
    slots = event.get('sessionState', {}).get('intent', {}).get('slots', {}) or {}

    # Validate AccountNumber: must be 6-10 digits
    account_number = slots.get('AccountNumber', {})
    if account_number and account_number.get('value', {}).get('interpretedValue'):
        value = account_number['value']['interpretedValue']
        if not value.isdigit() or not (6 <= len(value) <= 10):
            retry_count += 1
            if retry_count >= max_retries:
                # Too many retries -- transfer to agent
                return transfer_to_agent(session_attrs)
            return elicit_slot('AccountNumber', 'Please provide a valid account number.',
                               session_attrs, retry_count)
```
Key behaviors:
- Retry tracking via session attributes persisted across turns
- Progressive escalation -- after configurable max retries (default 3), transfers to agent
- Fallback counting -- 2 consecutive fallbacks triggers agent transfer
- Metric emission -- CloudWatch metric filters track fallback, retry, and transfer rates
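Since the metrics come from CloudWatch metric filters rather than direct API calls, the handler only needs to log structured events. A minimal sketch of that emission (the field names here are illustrative, not necessarily the module's exact schema):

```python
import json
from datetime import datetime, timezone

def emit_metric_event(event_type, session_id, retry_count=0):
    """Print one JSON line per event; CloudWatch metric filters on the
    Lambda's log group count FALLBACK / RETRY / TRANSFER occurrences."""
    print(json.dumps({
        "metric_event": event_type,   # "FALLBACK", "RETRY", or "TRANSFER"
        "session_id": session_id,
        "retry_count": retry_count,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }))
```

Logging instead of calling `PutMetricData` per event keeps the hot path cheap and avoids extra IAM permissions in the fulfillment Lambda.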
Running the A/B Comparison
The test harness records audio from your microphone and sends the same recording through both paths simultaneously:
# Clone and deploy
git clone https://github.com/AIOpsCrew/terraform-module-lexbot-noisy-caller-poc.git
cd ai-lex-bot/terraform
terraform init && terraform apply
# Set up the test harness
cd ..
python3 -m venv venv && source venv/bin/activate
pip install boto3 pyaudio numpy scipy amazon-transcribe
# Run the comparison
python3 record_and_send.py
The output shows both paths side by side:
[Path A -- Direct to Lex (0.8s)]
Heard: i need help mitt my account
Intent: FallbackIntent (0.42)
Bot says: I didn't understand that. Could you try again?
[Path B -- Transcribe (0.3s) + Lex text (0.2s)]
Transcribe heard: i need help with my account
Intent: AccountLookup (0.97)
Bot says: I'd be happy to help! What is your account number?
In noisy environments, the difference is stark. Path B consistently outperforms Path A on intent accuracy, especially for multi-turn dialogs where slot values contain numbers or proper nouns.
Latency Trade-off
Path B adds an extra hop. In our testing:
| Metric | Path A | Path B | Delta |
|---|---|---|---|
| STT Latency | ~0ms (built into Lex) | 200-500ms (Transcribe) | +200-500ms |
| NLU Latency | ~300ms | ~200ms | -100ms |
| Total | ~300ms | ~400-700ms | +100-400ms |
The additional latency is noticeable but acceptable for most contact center scenarios. The accuracy improvement far outweighs the speed cost, especially when you factor in the time wasted on retries and agent transfers in the standard path.
Deploying in Production
The test harness proves the pattern. For production deployment in Amazon Connect, the architecture looks like this:
Caller --> Connect --> Kinesis Video Stream --> Lambda --> Transcribe Streaming
|
v
Clean Transcription
|
v
Lex RecognizeText --> Contact Flow
The Lambda function sits between Connect's Kinesis Video Stream and Lex. It:
- Receives the audio stream from Connect
- Forwards it to Transcribe Streaming
- Takes the transcribed text
- Calls Lex `RecognizeText` with the clean transcription
- Returns the bot response back to the contact flow
This is a drop-in replacement for the standard Connect + Lex integration. The contact flow doesn't change. The caller experience doesn't change. The only difference is better accuracy.
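The bridge Lambda's per-utterance control flow can be sketched like this, with the Transcribe and Lex calls injected as plain functions (in the real handler these would wrap Transcribe Streaming and `lexv2-runtime` `recognize_text`; the names here are illustrative):

```python
def bridge_utterance(audio_chunks, transcribe_fn, lex_fn):
    """One caller utterance through the Connect bridge (sketch).

    transcribe_fn: audio chunks -> final transcript (Transcribe Streaming)
    lex_fn: transcript -> Lex response dict (RecognizeText, NLU only)
    """
    transcript = transcribe_fn(audio_chunks)
    if not transcript.strip():
        # Nothing intelligible heard -- let the contact flow reprompt
        return {"action": "reprompt"}
    lex_response = lex_fn(transcript)
    return {
        "action": "respond",
        "transcript": transcript,
        "messages": lex_response.get("messages", []),
        "intent": lex_response.get("sessionState", {})
                              .get("intent", {}).get("name"),
    }
```

Keeping the two calls behind injected functions also makes the bridge testable without live AWS services.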
Cost Analysis
For a contact center handling 10,000 calls/month with an average of 5 utterances per call:
| Service | Standard Path | Transcribe Pre-process Path |
|---|---|---|
| Lex | $3.75 (50k audio reqs) | $0.75 (50k text reqs) |
| Transcribe | $0 | $50 (833 minutes) |
| Lambda | $0.50 | $1.00 |
| Total | $4.25/mo | $51.75/mo |
The Transcribe pre-processing path costs more. But consider the cost of agent time:
- If the standard path causes 500 unnecessary agent transfers per month
- At an average handle time of 5 minutes and $25/hour agent cost
- That's $1,042/month in wasted agent time
The Transcribe path pays for itself many times over by keeping callers in the self-service flow.
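The arithmetic behind that figure, for anyone who wants to plug in their own numbers:

```python
transfers_per_month = 500      # unnecessary agent transfers avoided
handle_time_min = 5            # average handle time, minutes
agent_cost_per_hour = 25.0     # fully loaded agent cost

wasted_agent_cost = transfers_per_month * handle_time_min / 60 * agent_cost_per_hour
extra_pipeline_cost = 51.75 - 4.25   # monthly delta from the cost table above

print(round(wasted_agent_cost, 2))                        # 1041.67
print(round(wasted_agent_cost - extra_pipeline_cost, 2))  # 994.17 net savings
```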
Getting Started
The entire project is open source and deploys with a single terraform apply:
Prerequisites
- AWS account with Lex V2, Transcribe, Lambda, and S3 access
- Terraform 1.5+ with both the `aws` and `awscc` providers
- A microphone (for live A/B testing)
Quick Start
# Deploy the infrastructure
cd terraform
terraform init
terraform apply
# Run the A/B test
cd ..
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
python3 record_and_send.py
The test harness supports interactive commands:
- Enter -- Record and send through both paths
- a -- Path A only (direct Lex audio)
- b -- Path B only (Transcribe pre-process)
- n -- Start new sessions
- q -- Quit
When to Use This Pattern
This pattern is most valuable when:
- Your callers are frequently in noisy environments (field workers, drivers, public spaces)
- You're seeing high fallback rates or unnecessary agent transfers
- Your bot handles slot filling with numbers, names, or specific values
- You need measurable accuracy data to justify the investment
It's less necessary when:
- Callers are primarily in quiet office environments
- Your bot only handles simple yes/no intents
- Latency is more critical than accuracy (rare in contact centers)
Conclusion
The standard Amazon Lex audio pipeline works fine in ideal conditions. But contact centers don't operate in ideal conditions. By decoupling speech-to-text from natural language understanding, you can plug in a purpose-built STT engine that handles real-world noise.
The pattern is simple: Transcribe Streaming for STT, Lex RecognizeText for NLU. Two services, each doing what they do best.
The module is open source, deploys in minutes, and includes a test harness so you can measure the improvement in your own environment before committing to production changes.