AI Processing Platform
A 3-phase evolutionary architecture for scalable AI task processing
Architecture Evolution
From MVP to Scale — infrastructure grows with business demand
Task Processing Flow
From audio upload to summarized result — fully async pipeline
Capacity Planner
Calculate GPU requirements and costs for your workload
- STT GPUs Needed: 1
- LLM GPUs Needed: 1
- Estimated Monthly Cost (Self-hosted): $1,340
- Managed Service Cost: $3,850
Cost Crossover Analysis
When do self-hosted GPUs become cheaper than managed services?
Technology Selection
Technology choices evolve with each phase — here's what we chose and why
ECS Fargate
No GPU needed — zero cluster management, per-second billing. Control plane cost ($73/mo for EKS) not justified at this scale.
- Rejected alternative: high ops burden with no advantage over EKS for GPU workloads
- Rejected alternative: no auto-scaling; suitable only for local dev
AWS Transcribe
Zero GPU ops, pay-per-minute ($0.024/min). Acceptable cost below 50K tasks/month. Validate product first.
- Rejected alternative: similar per-minute pricing, less integrated with the AWS ecosystem
- Rejected alternative: higher latency (cross-cloud), less cost-effective at scale
Amazon Bedrock (Claude/Titan)
Zero GPU infrastructure. Access to frontier models. ~$0.003/task. Ideal for POC validation.
- Rejected alternative: higher per-call cost, external dependency, no self-hosting option
SQS (Single Queue)
One queue with message attributes to distinguish STT vs LLM tasks. Simplest setup for low volume.
- Rejected alternative (Kafka on MSK): event streaming semantics are overkill for a task queue; MSK starts at $200+/mo
- Rejected alternative: requires self-hosting; used in local dev only (docker-compose)
RDS PostgreSQL (Single-AZ)
ACID for task state. Single-AZ keeps cost low (~$15/mo). Acceptable downtime risk for POC.
- Rejected alternative (Aurora): 3x the cost; 6-way replication is overkill for this workload
- Rejected alternative: poor at the relational queries needed for task state management
ElastiCache Redis (Single Node)
Result caching, idempotency locks, rate limiting. Single node sufficient for POC (~$12/mo).
- Rejected alternative: no persistence, no pub/sub, no SETNX for distributed locks
CloudWatch
Built-in with AWS. Zero setup. Sufficient for basic metrics and logs at POC stage.
- Rejected alternative: per-host pricing scales expensively with GPU nodes
- Rejected alternative: similar per-host cost concern
ECS Rolling Update
Simple rolling update. No canary needed for POC. Fast iteration.
- Rejected alternative (blue/green deployment): requires 2x resources during the deployment window, more costly
Go + Echo
Goroutines (~2KB each) excel at I/O-bound HTTP calls. Single binary ~10MB Docker image. Echo provides mature middleware.
- Rejected alternative (Python): the GIL limits concurrency, and its ML ecosystem is irrelevant since workers only make HTTP calls
- Rejected alternative: higher memory use (~1MB per thread), slower cold start
Architecture Characteristics
Six dimensions of system quality
KEDA + Cluster Autoscaler for proactive, queue-depth-driven scaling
- KEDA scales workers based on SQS queue depth (proactive, not reactive)
- Cluster Autoscaler provisions GPU nodes when pods are unschedulable
- Spot + On-Demand GPU mix for cost optimization (40-60% savings)
- API pods scale via standard HPA on CPU utilization
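The queue-depth trigger above maps onto a KEDA `ScaledObject` using its `aws-sqs-queue` scaler. A minimal sketch; the deployment name, queue URL, and replica bounds are placeholder assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: stt-worker-scaler
spec:
  scaleTargetRef:
    name: stt-worker              # Deployment running the STT workers (assumed name)
  minReplicaCount: 0              # scale to zero when the queue is empty
  maxReplicaCount: 20
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/tasks  # placeholder
        queueLength: "5"          # target messages per replica
        awsRegion: us-east-1
```

When pending replicas exceed node capacity, the pods go unschedulable and the Cluster Autoscaler provisions the GPU nodes, completing the chain described above.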
Deployment & Operations
From local development to production with zero-downtime canary releases