AI Processing Platform

A 3-phase evolutionary architecture for scalable AI task processing

Architecture Evolution

From MVP to Scale — infrastructure grows with business demand

  • User
  • API Service (ECS Fargate, Go + Echo)
  • S3
  • SQS (single queue)
  • STT Worker (Fargate)
  • LLM Worker (Fargate)
  • AWS Transcribe
  • Amazon Bedrock
  • RDS PostgreSQL (Single-AZ)
  • ElastiCache Redis (single node)
Monthly Cost: $800 – $1,500
Throughput: ~50 tasks/min
Compute: ECS Fargate
AI Models: AWS Transcribe + Bedrock
Deployment: Rolling Update

Task Processing Flow

From audio upload to summarized result — fully async pipeline


Capacity Planner

Calculate GPU requirements and costs for your workload

Monthly task volume: 50,000 (slider range: 1K – 1M)
Average audio length: 3.0 min (slider range: 0.5 – 10)
Recommended Phase: Phase 2 (Growth)

STT GPUs Needed: 1

LLM GPUs Needed: 1

Estimated Monthly Cost (Self-hosted): $1,340

vs Managed Service Cost: $3,850

Savings: 65%

Cost Crossover Analysis

When does self-hosting GPU become cheaper than managed services?

Managed (AWS Transcribe + Bedrock)
Self-Hosted (GPU)
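The crossover point can be approximated from the per-unit prices quoted in this document (Transcribe $0.024/min, Bedrock ~$0.003/task) against a fixed self-hosted bill. The sketch below uses only those numbers; the planner above evidently includes additional costs, so its $3,850 figure is slightly higher than this raw calculation:

```go
package main

import "fmt"

// Per-unit prices quoted elsewhere in this document.
const (
	sttPerMin  = 0.024 // AWS Transcribe, per audio minute
	llmPerTask = 0.003 // Amazon Bedrock, per summarization task
)

// managedCost returns the monthly managed-service cost for a workload.
func managedCost(tasksPerMonth, avgMinutes float64) float64 {
	return tasksPerMonth*avgMinutes*sttPerMin + tasksPerMonth*llmPerTask
}

// crossoverVolume returns the task volume at which the managed cost
// equals a fixed monthly self-hosted GPU cost.
func crossoverVolume(selfHostedMonthly, avgMinutes float64) float64 {
	return selfHostedMonthly / (avgMinutes*sttPerMin + llmPerTask)
}

func main() {
	fmt.Printf("managed: $%.0f/mo\n", managedCost(50000, 3.0))
	fmt.Printf("crossover: ~%.0f tasks/mo\n", crossoverVolume(1340, 3.0))
}
```

With 3-minute audio, managed cost grows at ~$0.075 per task while the self-hosted cost is a step function of GPU count, so self-hosting wins once volume clears the crossover and stays there.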

Technology Selection

Technology choices evolve with each phase — here's what we chose and why

Compute Platform

ECS Fargate

No GPU needed — zero cluster management, per-second billing. Control plane cost ($73/mo for EKS) not justified at this scale.

Self-managed K8s

High ops burden, no advantage over EKS for GPU workloads

Docker Compose

No auto-scaling, suitable only for local dev

STT Model

AWS Transcribe

Zero GPU ops, pay-per-minute ($0.024/min). Acceptable cost below 50K tasks/month. Validate product first.

Google Speech-to-Text

Similar per-minute pricing, less integrated with AWS ecosystem

OpenAI Whisper API

Higher latency (cross-cloud), less cost-effective at scale

LLM Model

Amazon Bedrock (Claude/Titan)

Zero GPU infrastructure. Access to frontier models. ~$0.003/task. Ideal for POC validation.

OpenAI GPT-4o

Higher per-call cost, external dependency, no self-hosting option

Message Queue

SQS (Single Queue)

One queue with message attributes to distinguish STT vs LLM tasks. Simplest setup for low volume.
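The receiving side of that single-queue design reduces to dispatching on the attribute value. A minimal sketch, with the SQS SDK calls omitted and `TaskType` as the assumed attribute name:

```go
package main

import "fmt"

// TaskType distinguishes work items on the shared queue, mirroring the
// SQS message attribute (the name "TaskType" is an assumption here).
type TaskType string

const (
	TaskSTT TaskType = "stt"
	TaskLLM TaskType = "llm"
)

// route picks the worker for a message based on its attributes.
func route(attrs map[string]string) (string, error) {
	switch TaskType(attrs["TaskType"]) {
	case TaskSTT:
		return "stt-worker", nil
	case TaskLLM:
		return "llm-worker", nil
	default:
		return "", fmt.Errorf("unknown task type %q", attrs["TaskType"])
	}
}

func main() {
	h, _ := route(map[string]string{"TaskType": "stt"})
	fmt.Println(h) // prints "stt-worker"
}
```

Rejecting unknown types instead of silently dropping them keeps a poison message visible in the dead-letter queue rather than lost.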

Kafka

Event streaming semantics — overkill for task queue. MSK $200+/mo minimum

RabbitMQ

Requires self-hosting. Used in local dev only (docker-compose)

Database

RDS PostgreSQL (Single-AZ)

ACID for task state. Single-AZ keeps cost low (~$15/mo). Acceptable downtime risk for POC.

Aurora

3x cost, 6-way replication overkill for this workload

DynamoDB

Poor at relational queries needed for task state management

Cache

ElastiCache Redis (Single Node)

Result caching, idempotency locks, rate limiting. Single node sufficient for POC (~$12/mo).

Memcached

No persistence, no pub/sub, no SETNX for distributed locks

Observability

CloudWatch

Built-in with AWS. Zero setup. Sufficient for basic metrics and logs at POC stage.

Datadog

Per-host pricing scales expensively with GPU nodes

New Relic

Similar per-host cost concern

Deployment Strategy

ECS Rolling Update

Simple rolling update. No canary needed for POC. Fast iteration.

Blue-Green

Requires 2x resources during deployment, more costly

Language & Framework

Go + Echo

Goroutines (~2KB each) excel at I/O-bound HTTP calls. Single binary ~10MB Docker image. Echo provides mature middleware.
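The goroutine argument is easy to see in the fan-out pattern these workers use: each concurrent downstream call costs a small growable stack rather than an OS thread. A self-contained sketch (the sleep stands in for a Transcribe/Bedrock network call; all names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// callDownstream simulates an I/O-bound call to Transcribe or Bedrock.
func callDownstream(id int) string {
	time.Sleep(10 * time.Millisecond) // stand-in for network latency
	return fmt.Sprintf("response %d", id)
}

// fanOut issues n downstream calls concurrently and collects the results.
func fanOut(n int) []string {
	results := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = callDownstream(i) // each index written by one goroutine
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	start := time.Now()
	out := fanOut(100)
	fmt.Printf("%d calls in %v\n", len(out), time.Since(start))
}
```

One hundred concurrent calls complete in roughly one call's latency; the same pattern in a thread-per-call runtime would commit ~100 MB of stacks.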

Python

GIL limits concurrency; ML ecosystem irrelevant since workers only make HTTP calls

Java

Higher memory (~1MB/thread), slower cold start

Architecture Characteristics

Six dimensions of system quality

Scalability: 9/10

KEDA + Cluster Autoscaler for proactive, queue-depth-driven scaling

  • KEDA scales workers based on SQS queue depth (proactive, not reactive)
  • Cluster Autoscaler provisions GPU nodes when pods are unschedulable
  • Spot + On-Demand GPU mix for cost optimization (40-60% savings)
  • API pods scale via standard HPA on CPU utilization
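A queue-depth trigger like the one described above would look roughly like the following KEDA ScaledObject; the deployment name, queue URL, and threshold are illustrative placeholders, not values from this project:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: stt-worker-scaler          # illustrative name
spec:
  scaleTargetRef:
    name: stt-worker               # Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/tasks  # placeholder
        queueLength: "5"           # target messages per replica
        awsRegion: us-east-1
```

Scaling on queue depth means replicas are added while messages are still waiting, before worker CPU ever rises, which is why the bullet above calls it proactive rather than reactive.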

Deployment & Operations

From local development to production with zero-downtime canary releases

Dev
Infra: docker-compose · GPU: 0 · Services: Mock · Cost: ~$30/mo

Staging
Infra: EKS, 3 nodes · GPU: 1 · Services: Real models · Cost: ~$800/mo

Production
Infra: EKS, 6+ nodes · GPU: 4+ · Services: Full stack · Cost: Phase-dependent

CI/CD Pipeline

CI: Push → Lint → Test → Build → Scan → PR
CD: ECR → Staging → Approval → Canary

Canary Deployment

Phase 1: 10% canary / 90% stable
Phase 2: 30% canary / 70% stable
Phase 3: 100% canary