Learn System Design in 10 Days

Day 5: Message Queues & Async Processing

What You'll Learn Today

  • Synchronous vs asynchronous communication and when to use each
  • Message queue core concepts (Producer, Consumer, Topic, Partition)
  • Kafka vs RabbitMQ vs SQS comparison
  • Event-driven architecture patterns
  • Pub/Sub pattern and its applications
  • Delivery semantics: exactly-once, at-least-once, at-most-once

Synchronous vs Asynchronous Communication

In a synchronous system, the caller waits for a response before proceeding. In an asynchronous system, the caller sends a message and continues without waiting.

flowchart TB
    subgraph Sync["Synchronous"]
        SA["Service A"]
        SB["Service B"]
        SA -->|"1. Request"| SB
        SB -->|"2. Wait..."| SB
        SB -->|"3. Response"| SA
    end
    subgraph Async["Asynchronous"]
        AA["Service A"]
        Q["Message Queue"]
        AB["Service B"]
        AA -->|"1. Send message"| Q
        Q -->|"2. Process later"| AB
    end
    style Sync fill:#ef4444,color:#fff
    style Async fill:#22c55e,color:#fff
    style SA fill:#ef4444,color:#fff
    style SB fill:#ef4444,color:#fff
    style AA fill:#22c55e,color:#fff
    style Q fill:#f59e0b,color:#fff
    style AB fill:#22c55e,color:#fff

| Aspect | Synchronous | Asynchronous |
| --- | --- | --- |
| Coupling | Tight (caller depends on receiver) | Loose (decoupled by queue) |
| Latency | Caller waits for full processing | Caller returns immediately |
| Failure handling | Caller fails if receiver is down | Message is queued; retried later |
| Scalability | Limited by slowest service | Services scale independently |
| Complexity | Simple to implement | Requires message infrastructure |
| Debugging | Easy (linear flow) | Harder (distributed, async) |
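
To make the contrast concrete, here is a minimal Python sketch (the handler names, the in-process queue, and send_welcome_email are illustrative stand-ins, not a specific framework): the synchronous handler blocks on slow work, while the asynchronous handler enqueues it for a background worker.

```python
import queue
import threading
import time

task_queue = queue.Queue()

def send_welcome_email(email: str) -> None:
    time.sleep(2)  # stand-in for a slow SMTP/API call

# Synchronous: the caller blocks until the slow work finishes.
def handle_signup_sync(email: str) -> str:
    send_welcome_email(email)              # user waits the full two seconds
    return "201 Created"

# Asynchronous: the caller enqueues the work and returns immediately.
def handle_signup_async(email: str) -> str:
    task_queue.put({"type": "welcome_email", "email": email})
    return "202 Accepted"                  # email is sent later by the worker

def worker() -> None:
    while True:
        task = task_queue.get()
        send_welcome_email(task["email"])  # processed outside the request path
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()
```

A real system would replace the in-process queue.Queue with a durable broker so queued tasks survive process restarts.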

When to Use Async

  • Long-running tasks: Video transcoding, report generation, email sending
  • Spiky workloads: Flash sales, viral content
  • Decoupling services: Order placement triggers inventory, payment, notification independently
  • Reliability: Tasks that must not be lost even if a service is temporarily down

When Sync Is Fine

  • Low-latency user-facing requests: Login, search, page loads
  • Simple request-response: REST APIs where the client needs an immediate answer
  • Strong consistency required: Payment verification before confirming an order

Message Queue Concepts

A message queue is a buffer that sits between producers and consumers, enabling asynchronous communication.

flowchart LR
    P1["Producer 1"]
    P2["Producer 2"]
    Q["Message Queue"]
    C1["Consumer 1"]
    C2["Consumer 2"]
    C3["Consumer 3"]
    P1 --> Q
    P2 --> Q
    Q --> C1
    Q --> C2
    Q --> C3
    style Q fill:#f59e0b,color:#fff
    style P1 fill:#3b82f6,color:#fff
    style P2 fill:#3b82f6,color:#fff
    style C1 fill:#22c55e,color:#fff
    style C2 fill:#22c55e,color:#fff
    style C3 fill:#22c55e,color:#fff

Core Terminology

| Term | Definition |
| --- | --- |
| Producer | Sends messages to the queue |
| Consumer | Reads and processes messages from the queue |
| Queue/Topic | Named destination where messages are stored |
| Partition | Subdivision of a topic for parallel processing |
| Offset | Position of a message within a partition |
| Consumer Group | Set of consumers that share the workload of a topic |
| Broker | Server that stores and delivers messages |
| Dead Letter Queue | Queue for messages that repeatedly fail processing |

Topics and Partitions

A topic is a logical channel. Partitions allow a topic to be split across multiple brokers for parallel processing.

flowchart TB
    subgraph Topic["Topic: orders"]
        P0["Partition 0<br>msg1, msg4, msg7"]
        P1["Partition 1<br>msg2, msg5, msg8"]
        P2["Partition 2<br>msg3, msg6, msg9"]
    end
    subgraph CG["Consumer Group"]
        C0["Consumer 0 β†’ P0"]
        C1["Consumer 1 β†’ P1"]
        C2["Consumer 2 β†’ P2"]
    end
    P0 --> C0
    P1 --> C1
    P2 --> C2
    style Topic fill:#8b5cf6,color:#fff
    style P0 fill:#8b5cf6,color:#fff
    style P1 fill:#8b5cf6,color:#fff
    style P2 fill:#8b5cf6,color:#fff
    style CG fill:#22c55e,color:#fff
    style C0 fill:#22c55e,color:#fff
    style C1 fill:#22c55e,color:#fff
    style C2 fill:#22c55e,color:#fff

Each partition is consumed by exactly one consumer within a consumer group. To increase parallelism, add more partitions and consumers.
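
Partition assignment is usually driven by the message key, so all messages with the same key keep their relative order. A minimal sketch of the hash-mod idea (Kafka's default partitioner uses a different hash function, murmur2, but the principle is the same; the key and partition count are illustrative):

```python
import zlib

NUM_PARTITIONS = 3

def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic hash so the same key always maps to the same partition,
    # preserving per-key ordering (e.g. all events for one order).
    return zlib.crc32(key.encode("utf-8")) % num_partitions

print(choose_partition("order-123"))  # always the same partition for this key
```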


Kafka vs RabbitMQ vs SQS

Apache Kafka

A distributed event streaming platform designed for high-throughput, durable message processing. Messages are persisted to disk and retained for a configurable period.

Architecture: Producers write to topic partitions; consumers pull messages and track their own offsets.
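
A hedged sketch of this flow using the kafka-python client (the orders topic, the inventory-service group, and the broker address are illustrative assumptions):

```python
import json
from kafka import KafkaConsumer, KafkaProducer

# Producer: write JSON-encoded events to a topic partition.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", key=b"order-123", value={"order_id": "order-123", "total": 42.0})
producer.flush()

# Consumer: pull messages, track progress via offsets, commit after processing.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="inventory-service",
    enable_auto_commit=False,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for msg in consumer:
    print("processing", msg.value)  # stand-in for the real handler
    consumer.commit()               # advance the group's offset only after success
```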

RabbitMQ

A traditional message broker implementing AMQP. Messages are pushed to consumers and removed from the queue after acknowledgment.

Architecture: Producers send to exchanges; exchanges route to queues based on routing rules; consumers receive messages from queues.
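
A hedged sketch of the exchange → queue → consumer flow using the pika client (the orders exchange, payment-tasks queue, and routing key are illustrative assumptions):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# The exchange routes to bound queues based on the routing key; both are durable.
channel.exchange_declare(exchange="orders", exchange_type="direct", durable=True)
channel.queue_declare(queue="payment-tasks", durable=True)
channel.queue_bind(queue="payment-tasks", exchange="orders", routing_key="order.created")

# Producer: publish a persistent message through the exchange.
channel.basic_publish(
    exchange="orders",
    routing_key="order.created",
    body=b'{"order_id": "order-123"}',
    properties=pika.BasicProperties(delivery_mode=2),  # persist message to disk
)

# Consumer: messages are pushed to the callback; the ack removes them from the queue.
def on_message(ch, method, properties, body):
    print("processing", body)                      # stand-in for the real handler
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="payment-tasks", on_message_callback=on_message)
channel.start_consuming()
```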

Amazon SQS

A fully managed message queue service. No infrastructure to manage. Offers Standard (at-least-once, best-effort ordering) and FIFO (exactly-once, strict ordering) queues.
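
A hedged sketch of the send / receive / delete cycle with boto3 (the order-events queue name and the region are illustrative assumptions):

```python
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.get_queue_url(QueueName="order-events")["QueueUrl"]

# Producer: enqueue a message.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"order_id": "order-123"}')

# Consumer: long-poll, process, then delete. If a message is not deleted before
# its visibility timeout expires, SQS redelivers it (at-least-once semantics).
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    print("processing", msg["Body"])               # stand-in for the real handler
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```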

flowchart TB
    subgraph Kafka_Arch["Kafka"]
        KP["Producer"]
        KT["Topic<br>(Partitioned Log)"]
        KC["Consumer<br>(Pull-based)"]
        KP --> KT --> KC
    end
    subgraph Rabbit_Arch["RabbitMQ"]
        RP["Producer"]
        RE["Exchange"]
        RQ["Queue"]
        RC["Consumer<br>(Push-based)"]
        RP --> RE --> RQ --> RC
    end
    subgraph SQS_Arch["Amazon SQS"]
        SP["Producer"]
        SQ["Queue<br>(Managed)"]
        SC["Consumer<br>(Poll-based)"]
        SP --> SQ --> SC
    end
    style Kafka_Arch fill:#3b82f6,color:#fff
    style Rabbit_Arch fill:#8b5cf6,color:#fff
    style SQS_Arch fill:#f59e0b,color:#fff
    style KP fill:#3b82f6,color:#fff
    style KT fill:#3b82f6,color:#fff
    style KC fill:#3b82f6,color:#fff
    style RP fill:#8b5cf6,color:#fff
    style RE fill:#8b5cf6,color:#fff
    style RQ fill:#8b5cf6,color:#fff
    style RC fill:#8b5cf6,color:#fff
    style SP fill:#f59e0b,color:#fff
    style SQ fill:#f59e0b,color:#fff
    style SC fill:#f59e0b,color:#fff

| Feature | Kafka | RabbitMQ | SQS |
| --- | --- | --- | --- |
| Model | Distributed log | Message broker | Managed queue |
| Throughput | Very high (millions/sec) | Moderate (tens of thousands/sec) | High (managed) |
| Ordering | Per partition | Per queue | FIFO queues only |
| Retention | Configurable (days/weeks) | Until consumed | 14 days max |
| Delivery | Pull-based | Push-based | Poll-based |
| Replay | Yes (consumers re-read) | No (message deleted after ack) | No |
| Scaling | Add partitions | Add queues/consumers | Automatic |
| Operations | Self-managed or managed (Confluent) | Self-managed or managed | Fully managed (AWS) |
| Best for | Event streaming, logs, analytics | Task queues, RPC, routing | Simple async tasks, AWS-native |

When to Choose What

Kafka: Event streaming, log aggregation, real-time analytics, data pipelines. When you need message replay and high throughput.

RabbitMQ: Task distribution, RPC patterns, complex routing. When you need flexible routing and push-based delivery.

SQS: Simple async processing, AWS-native workloads. When you want zero infrastructure management.


Event-Driven Architecture

In an event-driven architecture, services communicate by producing and consuming events. This pattern promotes loose coupling and independent scalability.

flowchart TB
    OS["Order Service"]
    EB["Event Bus<br>(Kafka)"]
    IS["Inventory Service"]
    PS["Payment Service"]
    NS["Notification Service"]
    AS["Analytics Service"]

    OS -->|"OrderPlaced"| EB
    EB -->|"OrderPlaced"| IS
    EB -->|"OrderPlaced"| PS
    EB -->|"OrderPlaced"| NS
    EB -->|"OrderPlaced"| AS

    style EB fill:#f59e0b,color:#fff
    style OS fill:#3b82f6,color:#fff
    style IS fill:#22c55e,color:#fff
    style PS fill:#8b5cf6,color:#fff
    style NS fill:#ef4444,color:#fff
    style AS fill:#22c55e,color:#fff

Benefits

  • Loose coupling: Services do not call each other directly
  • Independent scaling: Each service scales based on its own load
  • Resilience: If one service is down, events are queued and processed later
  • Extensibility: Add new consumers without modifying the producer

Challenges

  • Eventual consistency: Events take time to propagate
  • Debugging: Tracing a request across async services is complex
  • Ordering: Events may arrive out of order
  • Idempotency: Consumers must handle duplicate events gracefully

Pub/Sub Pattern

Publish-Subscribe allows multiple subscribers to receive the same message. Unlike point-to-point queues (where each message is consumed once), pub/sub broadcasts to all subscribers.

flowchart TB
    Pub["Publisher"]
    Topic["Topic: user-signup"]
    Sub1["Subscriber 1<br>Send welcome email"]
    Sub2["Subscriber 2<br>Create default settings"]
    Sub3["Subscriber 3<br>Track analytics"]

    Pub -->|"Publish"| Topic
    Topic -->|"Deliver"| Sub1
    Topic -->|"Deliver"| Sub2
    Topic -->|"Deliver"| Sub3

    style Topic fill:#f59e0b,color:#fff
    style Pub fill:#3b82f6,color:#fff
    style Sub1 fill:#22c55e,color:#fff
    style Sub2 fill:#8b5cf6,color:#fff
    style Sub3 fill:#22c55e,color:#fff

| Pattern | Message Delivery | Use Case |
| --- | --- | --- |
| Point-to-Point (Queue) | Each message consumed by one consumer | Task distribution, work queues |
| Pub/Sub (Topic) | Each message delivered to all subscribers | Event broadcasting, notifications |
| Fan-out | One message triggers multiple independent actions | Order processing (payment + inventory + notification) |
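
In Kafka, the same topic can serve either pattern depending on how consumer groups are assigned. A hedged sketch using kafka-python (topic and group names are illustrative):

```python
from kafka import KafkaConsumer

BROKERS = "localhost:9092"

# Pub/sub: each service subscribes with its OWN consumer group, so every
# group receives its own copy of every "user-signup" event.
email_sub = KafkaConsumer("user-signup", group_id="email-service",
                          bootstrap_servers=BROKERS)
settings_sub = KafkaConsumer("user-signup", group_id="settings-service",
                             bootstrap_servers=BROKERS)
analytics_sub = KafkaConsumer("user-signup", group_id="analytics-service",
                              bootstrap_servers=BROKERS)

# Point-to-point: starting a second consumer with an EXISTING group id does not
# duplicate delivery -- the two instances split the topic's partitions instead.
email_sub_replica = KafkaConsumer("user-signup", group_id="email-service",
                                  bootstrap_servers=BROKERS)
```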

Delivery Semantics

Message delivery guarantees determine how many times a consumer processes each message.

| Semantic | Description | Duplicates? | Message Loss? |
| --- | --- | --- | --- |
| At-most-once | Message delivered 0 or 1 times | No | Possible |
| At-least-once | Message delivered 1 or more times | Possible | No |
| Exactly-once | Message delivered exactly 1 time | No | No |

At-Most-Once

The producer sends the message and does not retry. If delivery fails, the message is lost. Fast but unreliable.

Use case: Metrics collection where occasional data loss is acceptable.

At-Least-Once

The producer retries until the message is acknowledged. The consumer may receive duplicates if the acknowledgment is lost.

Use case: Order processing (duplicates handled by idempotency).
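
The first two guarantees largely come down to producer configuration: whether the producer waits for acknowledgments and whether it retries. A hedged sketch using kafka-python parameter names (broker address is an assumption):

```python
from kafka import KafkaProducer

# At-most-once flavor: fire and forget. No acks, no retries -- a failed send is lost.
fire_and_forget = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks=0,
    retries=0,
)

# At-least-once flavor: wait for all in-sync replicas and retry on transient errors.
# Retries can produce duplicates, so the consumer must be idempotent.
reliable = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",
    retries=5,
)
```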

Exactly-Once

The hardest guarantee to achieve. Requires idempotent consumers or transactional mechanisms. Kafka supports exactly-once semantics through transactions and idempotent producers.

Use case: Financial transactions, inventory updates.
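
A hedged sketch of the producer side using the confluent-kafka client (configuration keys follow librdkafka naming; the topic and transactional id are illustrative). For the guarantee to hold end to end, consumers would also need to read with isolation.level=read_committed:

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,           # broker de-duplicates producer retries
    "transactional.id": "order-service-1",
})

producer.init_transactions()
producer.begin_transaction()
producer.produce("payments", key=b"order-123", value=b'{"amount": 42.0}')
producer.commit_transaction()             # messages become visible atomically
```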

flowchart TB
    subgraph AtMost["At-Most-Once"]
        AM1["Send"]
        AM2["No retry"]
        AM1 --> AM2
    end
    subgraph AtLeast["At-Least-Once"]
        AL1["Send"]
        AL2["Retry until ACK"]
        AL3["Consumer must be<br>idempotent"]
        AL1 --> AL2 --> AL3
    end
    subgraph Exactly["Exactly-Once"]
        EO1["Send with<br>transaction ID"]
        EO2["Deduplicate<br>on consumer"]
        EO1 --> EO2
    end
    style AtMost fill:#22c55e,color:#fff
    style AtLeast fill:#f59e0b,color:#fff
    style Exactly fill:#ef4444,color:#fff
    style AM1 fill:#22c55e,color:#fff
    style AM2 fill:#22c55e,color:#fff
    style AL1 fill:#f59e0b,color:#fff
    style AL2 fill:#f59e0b,color:#fff
    style AL3 fill:#f59e0b,color:#fff
    style EO1 fill:#ef4444,color:#fff
    style EO2 fill:#ef4444,color:#fff

Idempotency

Since at-least-once is the most practical guarantee, your consumers should be idempotent: processing the same message twice produces the same result as processing it once.

Techniques for idempotency (see the sketch after this list):

  • Idempotency key: Store processed message IDs; skip duplicates
  • Database constraints: Use UPSERT or unique constraints
  • Conditional updates: Only update if current state matches expected state
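
A minimal sketch of the idempotency-key technique, using SQLite's primary-key constraint as the deduplication store (table and handler names are illustrative; in production the key insert and the business change would share one transaction):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")

def apply_business_logic(payload: dict) -> None:
    print("processing", payload)       # stand-in for the real side effect

def handle_once(message_id: str, payload: dict) -> None:
    # INSERT OR IGNORE inserts nothing (rowcount == 0) if the key already exists.
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed (message_id) VALUES (?)", (message_id,)
    )
    if cur.rowcount == 0:
        return                         # duplicate delivery: safely skip
    apply_business_logic(payload)
    conn.commit()

handle_once("order-123", {"total": 42.0})
handle_once("order-123", {"total": 42.0})   # retried delivery is a no-op
```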

Designing an Order Processing System

Let us apply these concepts to a real-world example.

flowchart TB
    Client["Client"]
    API["API Gateway"]
    OS["Order Service"]
    Queue["Message Queue<br>(Kafka)"]
    Pay["Payment Service"]
    Inv["Inventory Service"]
    Ship["Shipping Service"]
    Notif["Notification Service"]
    DLQ["Dead Letter Queue"]

    Client --> API --> OS
    OS -->|"OrderCreated"| Queue
    Queue --> Pay
    Queue --> Inv
    Pay -->|"PaymentCompleted"| Queue
    Queue --> Ship
    Queue --> Notif
    Pay -->|"PaymentFailed"| DLQ

    style Queue fill:#f59e0b,color:#fff
    style DLQ fill:#ef4444,color:#fff
    style OS fill:#3b82f6,color:#fff
    style Pay fill:#8b5cf6,color:#fff
    style Inv fill:#22c55e,color:#fff
    style Ship fill:#22c55e,color:#fff
    style Notif fill:#22c55e,color:#fff

Design decisions (a sketch of the payment consumer follows the list):

  1. Kafka for the message queue (high throughput, event replay for debugging)
  2. At-least-once delivery with idempotent consumers (order ID as idempotency key)
  3. Dead letter queue for failed payments (manual review and retry)
  4. Event-driven decoupling (each service processes events independently)
  5. Separate topics for different event types (OrderCreated, PaymentCompleted, etc.)
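
To tie decisions 2 and 3 together, here is a hedged sketch of the payment consumer: it retries a failed charge up to three times, routes persistent failures to a dead letter topic, and commits its offset only afterwards (kafka-python client; topic names, the group id, and charge_card are illustrative assumptions):

```python
import json
from kafka import KafkaConsumer, KafkaProducer

def charge_card(order: dict) -> None:
    """Hypothetical payment call, assumed idempotent on order['order_id']."""

consumer = KafkaConsumer(
    "order-created",
    bootstrap_servers="localhost:9092",
    group_id="payment-service",
    enable_auto_commit=False,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

MAX_ATTEMPTS = 3

for msg in consumer:
    order = msg.value
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            charge_card(order)
            producer.send("payment-completed", order)
            break
        except Exception:
            if attempt == MAX_ATTEMPTS:
                producer.send("payment-dlq", order)   # park for manual review
    consumer.commit()   # commit only after success or dead-lettering
```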

Practice Problems

Exercise 1: Basics

Explain the difference between these three scenarios and which delivery semantic is appropriate:

  1. Sending marketing push notifications to mobile users
  2. Processing credit card charges for online purchases
  3. Logging page view events for analytics

Exercise 2: Applied

Design an order processing system for an e-commerce platform that handles:

  • 50,000 orders per day
  • Each order triggers: payment processing, inventory update, email confirmation, analytics event
  • Payment failures must be retried up to 3 times
  • Orders must not be lost or double-processed

Specify: message queue choice (Kafka, RabbitMQ, or SQS), topic/queue design, delivery guarantee, idempotency strategy, and dead letter queue handling.

Challenge

Design a real-time notification system for a social media platform that handles:

  • 500 million users
  • 10 billion events per day (likes, comments, follows, mentions)
  • Notifications must be delivered within 1 second for online users
  • Offline users receive notifications when they come online
  • Users can configure notification preferences (mute, digest, real-time)

Include: event ingestion pipeline, message queue architecture, consumer design, delivery strategy for online vs offline users, and how to handle the thundering herd when a celebrity posts.


Summary

| Concept | Description |
| --- | --- |
| Async Communication | Producer sends message and continues; consumer processes later |
| Message Queue | Buffer between producers and consumers |
| Topic/Partition | Logical channel subdivided for parallel processing |
| Consumer Group | Set of consumers sharing workload |
| Kafka | High-throughput distributed log with replay |
| RabbitMQ | Traditional broker with flexible routing |
| SQS | Fully managed AWS queue service |
| Event-Driven | Services communicate through events, not direct calls |
| Pub/Sub | One message delivered to all subscribers |
| At-Least-Once | Retry until acknowledged; consumer must be idempotent |
| Dead Letter Queue | Collects messages that fail processing repeatedly |

Key Takeaways

  1. Use async processing for anything that does not need an immediate response - it improves resilience and scalability
  2. Kafka is the default choice for event streaming in system design interviews
  3. At-least-once with idempotent consumers is the most practical delivery guarantee
  4. Event-driven architecture promotes loose coupling but requires careful handling of eventual consistency
  5. Always include a dead letter queue for messages that cannot be processed
  6. Design consumers to be idempotent - this is the single most important principle for reliable message processing

Next up: On Day 6, we explore API Design and Rate Limiting - covering REST vs GraphQL, API gateway patterns, rate limiting algorithms, and authentication strategies for distributed systems.