Day 5: Message Queues & Async Processing
What You'll Learn Today
- Synchronous vs asynchronous communication and when to use each
- Message queue core concepts (Producer, Consumer, Topic, Partition)
- Kafka vs RabbitMQ vs SQS comparison
- Event-driven architecture patterns
- Pub/Sub pattern and its applications
- Delivery semantics: exactly-once, at-least-once, at-most-once
Synchronous vs Asynchronous Communication
In a synchronous system, the caller waits for a response before proceeding. In an asynchronous system, the caller sends a message and continues without waiting.
flowchart TB
subgraph Sync["Synchronous"]
SA["Service A"]
SB["Service B"]
SA -->|"1. Request"| SB
SB -->|"2. Process (A waits)"| SB
SB -->|"3. Response"| SA
end
subgraph Async["Asynchronous"]
AA["Service A"]
Q["Message Queue"]
AB["Service B"]
AA -->|"1. Send message"| Q
Q -->|"2. Process later"| AB
end
style Sync fill:#ef4444,color:#fff
style Async fill:#22c55e,color:#fff
style SA fill:#ef4444,color:#fff
style SB fill:#ef4444,color:#fff
style AA fill:#22c55e,color:#fff
style Q fill:#f59e0b,color:#fff
style AB fill:#22c55e,color:#fff
| Aspect | Synchronous | Asynchronous |
|---|---|---|
| Coupling | Tight (caller depends on receiver) | Loose (decoupled by queue) |
| Latency | Caller waits for full processing | Caller returns immediately |
| Failure handling | Caller fails if receiver is down | Message is queued; retried later |
| Scalability | Limited by slowest service | Services scale independently |
| Complexity | Simple to implement | Requires message infrastructure |
| Debugging | Easy (linear flow) | Harder (distributed, async) |
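The contrast in the table can be sketched with Python's standard library. This is a minimal illustration, not a production pattern: `slow_service_b`, `call_sync`, and `start_worker` are hypothetical names, and an in-process `queue.Queue` stands in for a real message broker.

```python
import queue
import threading
import time


def slow_service_b(payload: str) -> str:
    """Stand-in for Service B: simulates work that takes a while."""
    time.sleep(0.05)
    return f"processed:{payload}"


def call_sync(payload: str) -> str:
    # Synchronous: Service A blocks here until Service B returns.
    return slow_service_b(payload)


def start_worker(q: queue.Queue, results: list) -> threading.Thread:
    # Asynchronous: a background worker drains the queue at its own pace.
    def run() -> None:
        while True:
            item = q.get()
            if item is None:  # sentinel value: stop the worker
                break
            results.append(slow_service_b(item))

    worker = threading.Thread(target=run)
    worker.start()
    return worker


if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    results: list = []
    worker = start_worker(q, results)

    q.put("order-1")  # returns immediately; processing happens later
    print("Service A continues without waiting")

    q.put(None)       # ask the worker to stop after draining the queue
    worker.join()
    print(results)
```

In the asynchronous path, `q.put` returns as soon as the message is buffered, which is why the caller's latency no longer depends on how slow Service B is.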
When to Use Async
- Long-running tasks: Video transcoding, report generation, email sending
- Spiky workloads: Flash sales, viral content
- Decoupling services: Order placement triggers inventory, payment, notification independently
- Reliability: Tasks that must not be lost even if a service is temporarily down
When Sync Is Fine
- Low-latency user-facing requests: Login, search, page loads
- Simple request-response: REST APIs where the client needs an immediate answer
- Strong consistency required: Payment verification before confirming an order
Message Queue Concepts
A message queue is a buffer that sits between producers and consumers, enabling asynchronous communication.
flowchart LR
P1["Producer 1"]
P2["Producer 2"]
Q["Message Queue"]
C1["Consumer 1"]
C2["Consumer 2"]
C3["Consumer 3"]
P1 --> Q
P2 --> Q
Q --> C1
Q --> C2
Q --> C3
style Q fill:#f59e0b,color:#fff
style P1 fill:#3b82f6,color:#fff
style P2 fill:#3b82f6,color:#fff
style C1 fill:#22c55e,color:#fff
style C2 fill:#22c55e,color:#fff
style C3 fill:#22c55e,color:#fff
Core Terminology
| Term | Definition |
|---|---|
| Producer | Sends messages to the queue |
| Consumer | Reads and processes messages from the queue |
| Queue/Topic | Named destination where messages are stored |
| Partition | Subdivision of a topic for parallel processing |
| Offset | Position of a message within a partition |
| Consumer Group | Set of consumers that share the workload of a topic |
| Broker | Server that stores and delivers messages |
| Dead Letter Queue | Queue for messages that repeatedly fail processing |
Topics and Partitions
A topic is a logical channel. Partitions allow a topic to be split across multiple brokers for parallel processing.
flowchart TB
subgraph Topic["Topic: orders"]
P0["Partition 0<br>msg1, msg4, msg7"]
P1["Partition 1<br>msg2, msg5, msg8"]
P2["Partition 2<br>msg3, msg6, msg9"]
end
subgraph CG["Consumer Group"]
C0["Consumer 0 → P0"]
C1["Consumer 1 → P1"]
C2["Consumer 2 → P2"]
end
P0 --> C0
P1 --> C1
P2 --> C2
style Topic fill:#8b5cf6,color:#fff
style P0 fill:#8b5cf6,color:#fff
style P1 fill:#8b5cf6,color:#fff
style P2 fill:#8b5cf6,color:#fff
style CG fill:#22c55e,color:#fff
style C0 fill:#22c55e,color:#fff
style C1 fill:#22c55e,color:#fff
style C2 fill:#22c55e,color:#fff
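Within a consumer group, each partition is assigned to exactly one consumer, although one consumer may own several partitions. To increase parallelism, add partitions and consumers together: consumers beyond the partition count sit idle.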
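A rough sketch of how keys map to partitions and how partitions map to consumers in a group. The function names are hypothetical, `zlib.crc32` is only an illustrative stable hash (real brokers use their own partitioners), and round-robin is one of several possible assignment strategies.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # A stable hash sends the same key to the same partition every time,
    # which is what preserves per-key ordering.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    # Round-robin: every partition gets exactly one owner in the group.
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    print(assign_partitions(3, ["c0", "c1", "c2"]))
    # With four consumers and three partitions, the fourth consumer is idle.
    print(assign_partitions(3, ["c0", "c1", "c2", "c3"]))
```

The second call shows why adding consumers without adding partitions does not increase parallelism.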
Kafka vs RabbitMQ vs SQS
Apache Kafka
A distributed event streaming platform designed for high-throughput, durable message processing. Messages are persisted to disk and retained for a configurable period.
Architecture: Producers write to topic partitions; consumers pull messages and track their own offsets.
RabbitMQ
A traditional message broker implementing AMQP. Messages are pushed to consumers and removed from the queue after acknowledgment.
Architecture: Producers send to exchanges; exchanges route to queues based on routing rules; consumers receive messages from queues.
Amazon SQS
A fully managed message queue service. No infrastructure to manage. Offers Standard queues (at-least-once delivery, best-effort ordering) and FIFO queues (exactly-once processing through deduplication, strict ordering within a message group).
flowchart TB
subgraph Kafka_Arch["Kafka"]
KP["Producer"]
KT["Topic<br>(Partitioned Log)"]
KC["Consumer<br>(Pull-based)"]
KP --> KT --> KC
end
subgraph Rabbit_Arch["RabbitMQ"]
RP["Producer"]
RE["Exchange"]
RQ["Queue"]
RC["Consumer<br>(Push-based)"]
RP --> RE --> RQ --> RC
end
subgraph SQS_Arch["Amazon SQS"]
SP["Producer"]
SQ["Queue<br>(Managed)"]
SC["Consumer<br>(Poll-based)"]
SP --> SQ --> SC
end
style Kafka_Arch fill:#3b82f6,color:#fff
style Rabbit_Arch fill:#8b5cf6,color:#fff
style SQS_Arch fill:#f59e0b,color:#fff
style KP fill:#3b82f6,color:#fff
style KT fill:#3b82f6,color:#fff
style KC fill:#3b82f6,color:#fff
style RP fill:#8b5cf6,color:#fff
style RE fill:#8b5cf6,color:#fff
style RQ fill:#8b5cf6,color:#fff
style RC fill:#8b5cf6,color:#fff
style SP fill:#f59e0b,color:#fff
style SQ fill:#f59e0b,color:#fff
style SC fill:#f59e0b,color:#fff
| Feature | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Distributed log | Message broker | Managed queue |
| Throughput | Very high (millions/sec) | Moderate (tens of thousands/sec) | High (managed) |
| Ordering | Per partition | Per queue | FIFO queues only |
| Retention | Configurable (days/weeks) | Until consumed | 14 days max |
| Delivery | Pull-based | Push-based | Poll-based |
| Replay | Yes (consumers re-read by offset) | No for classic queues (message deleted after ack) | No |
| Scaling | Add partitions | Add queues/consumers | Automatic |
| Operations | Self-managed or managed (Confluent) | Self-managed or managed | Fully managed (AWS) |
| Best for | Event streaming, logs, analytics | Task queues, RPC, routing | Simple async tasks, AWS-native |
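The "Retention" and "Replay" rows follow from the underlying data model. The toy classes below (hypothetical names, in-memory only) contrast an append-only log, where reading leaves messages in place, with a queue that drops a message once it is acknowledged.

```python
from collections import deque
from typing import Optional


class LogTopic:
    """Append-only log: reads never remove records (Kafka-style model)."""

    def __init__(self) -> None:
        self._records: list[str] = []

    def append(self, msg: str) -> int:
        self._records.append(msg)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset: int) -> list[str]:
        # Consumers track their own offset and may rewind to replay.
        return self._records[offset:]


class AckQueue:
    """Queue where a message is gone once acknowledged (classic broker model)."""

    def __init__(self) -> None:
        self._pending: deque = deque()

    def send(self, msg: str) -> None:
        self._pending.append(msg)

    def receive(self) -> Optional[str]:
        # Peek at the head; it stays queued until ack() is called.
        return self._pending[0] if self._pending else None

    def ack(self) -> None:
        self._pending.popleft()


if __name__ == "__main__":
    log = LogTopic()
    log.append("a")
    log.append("b")
    print(log.read_from(0), log.read_from(0))  # same data twice: replay works

    q = AckQueue()
    q.send("a")
    print(q.receive())
    q.ack()
    print(q.receive())  # nothing left to re-read
```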
When to Choose What
Kafka: Event streaming, log aggregation, real-time analytics, data pipelines. When you need message replay and high throughput.
RabbitMQ: Task distribution, RPC patterns, complex routing. When you need flexible routing and push-based delivery.
SQS: Simple async processing, AWS-native workloads. When you want zero infrastructure management.
Event-Driven Architecture
In an event-driven architecture, services communicate by producing and consuming events. This pattern promotes loose coupling and independent scalability.
flowchart TB
OS["Order Service"]
EB["Event Bus<br>(Kafka)"]
IS["Inventory Service"]
PS["Payment Service"]
NS["Notification Service"]
AS["Analytics Service"]
OS -->|"OrderPlaced"| EB
EB -->|"OrderPlaced"| IS
EB -->|"OrderPlaced"| PS
EB -->|"OrderPlaced"| NS
EB -->|"OrderPlaced"| AS
style EB fill:#f59e0b,color:#fff
style OS fill:#3b82f6,color:#fff
style IS fill:#22c55e,color:#fff
style PS fill:#8b5cf6,color:#fff
style NS fill:#ef4444,color:#fff
style AS fill:#22c55e,color:#fff
Benefits
- Loose coupling: Services do not call each other directly
- Independent scaling: Each service scales based on its own load
- Resilience: If one service is down, events are queued and processed later
- Extensibility: Add new consumers without modifying the producer
Challenges
- Eventual consistency: Events take time to propagate
- Debugging: Tracing a request across async services is complex
- Ordering: Events may arrive out of order
- Idempotency: Consumers must handle duplicate events gracefully
Pub/Sub Pattern
Publish-Subscribe allows multiple subscribers to receive the same message. Unlike point-to-point queues (where each message is consumed once), pub/sub broadcasts to all subscribers.
flowchart TB
Pub["Publisher"]
Topic["Topic: user-signup"]
Sub1["Subscriber 1<br>Send welcome email"]
Sub2["Subscriber 2<br>Create default settings"]
Sub3["Subscriber 3<br>Track analytics"]
Pub -->|"Publish"| Topic
Topic -->|"Deliver"| Sub1
Topic -->|"Deliver"| Sub2
Topic -->|"Deliver"| Sub3
style Topic fill:#f59e0b,color:#fff
style Pub fill:#3b82f6,color:#fff
style Sub1 fill:#22c55e,color:#fff
style Sub2 fill:#8b5cf6,color:#fff
style Sub3 fill:#22c55e,color:#fff
| Pattern | Message Delivery | Use Case |
|---|---|---|
| Point-to-Point (Queue) | Each message consumed by one consumer | Task distribution, work queues |
| Pub/Sub (Topic) | Each message delivered to all subscribers | Event broadcasting, notifications |
| Fan-out | One message triggers multiple independent actions | Order processing (payment + inventory + notification) |
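The first two rows of the table can be compared with a small in-memory sketch. `PubSubTopic` and `WorkQueue` are hypothetical classes written for illustration; they deliver synchronously and ignore persistence, acknowledgments, and failures.

```python
from typing import Callable

Handler = Callable[[str], None]


class PubSubTopic:
    """Pub/sub: every subscriber receives every message."""

    def __init__(self) -> None:
        self._subscribers: list[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._subscribers.append(handler)

    def publish(self, msg: str) -> None:
        for handler in self._subscribers:
            handler(msg)


class WorkQueue:
    """Point-to-point: each message goes to exactly one worker (round-robin)."""

    def __init__(self) -> None:
        self._workers: list[Handler] = []
        self._next = 0

    def add_worker(self, handler: Handler) -> None:
        self._workers.append(handler)

    def send(self, msg: str) -> None:
        handler = self._workers[self._next % len(self._workers)]
        self._next += 1
        handler(msg)


if __name__ == "__main__":
    email, analytics = [], []
    topic = PubSubTopic()
    topic.subscribe(email.append)
    topic.subscribe(analytics.append)
    topic.publish("user-signup:alice")
    print(email, analytics)  # both subscribers saw the event

    w1, w2 = [], []
    work = WorkQueue()
    work.add_worker(w1.append)
    work.add_worker(w2.append)
    for job in ["job-1", "job-2", "job-3"]:
        work.send(job)
    print(w1, w2)  # jobs are split between the workers
```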
Delivery Semantics
Delivery guarantees describe how many times a message may reach a consumer when failures occur.
| Semantic | Description | Duplicates? | Message Loss? |
|---|---|---|---|
| At-most-once | Message delivered 0 or 1 times | No | Possible |
| At-least-once | Message delivered 1 or more times | Possible | No |
| Exactly-once | Message delivered exactly 1 time | No | No |
At-Most-Once
The producer sends the message and does not retry. If delivery fails, the message is lost. Fast but unreliable.
Use case: Metrics collection where occasional data loss is acceptable.
At-Least-Once
The producer retries until the message is acknowledged. The consumer may receive duplicates if the acknowledgment is lost.
Use case: Order processing (duplicates handled by idempotency).
Exactly-Once
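The hardest guarantee to achieve. In practice it is built from at-least-once delivery plus deduplication, using idempotent consumers or transactional mechanisms. Kafka supports exactly-once semantics through idempotent producers and transactions, but that guarantee covers reads and writes within Kafka; side effects in external systems still need idempotent handling.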
Use case: Financial transactions, inventory updates.
flowchart TB
subgraph AtMost["At-Most-Once"]
AM1["Send"]
AM2["No retry"]
AM1 --> AM2
end
subgraph AtLeast["At-Least-Once"]
AL1["Send"]
AL2["Retry until ACK"]
AL3["Consumer must be<br>idempotent"]
AL1 --> AL2 --> AL3
end
subgraph Exactly["Exactly-Once"]
EO1["Send with<br>transaction ID"]
EO2["Deduplicate<br>on consumer"]
EO1 --> EO2
end
style AtMost fill:#22c55e,color:#fff
style AtLeast fill:#f59e0b,color:#fff
style Exactly fill:#ef4444,color:#fff
style AM1 fill:#22c55e,color:#fff
style AM2 fill:#22c55e,color:#fff
style AL1 fill:#f59e0b,color:#fff
style AL2 fill:#f59e0b,color:#fff
style AL3 fill:#f59e0b,color:#fff
style EO1 fill:#ef4444,color:#fff
style EO2 fill:#ef4444,color:#fff
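The difference between the first two semantics can be simulated with a scripted failure model. `FlakyLink` is a hypothetical stand-in for the network between producer and consumer, and the retry loop is capped at `max_attempts` (an assumption made here so the sketch always terminates).

```python
class FlakyLink:
    """Hypothetical link whose behavior per send attempt is scripted.

    Each entry in `outcomes` is one of:
      "ok"       - message arrives and the ack reaches the producer
      "lost"     - message never arrives
      "ack_lost" - message arrives but the ack does not
    """

    def __init__(self, outcomes: list[str]) -> None:
        self._outcomes = iter(outcomes)
        self.delivered: list[str] = []  # what the consumer actually saw

    def send(self, msg: str) -> bool:
        outcome = next(self._outcomes)
        if outcome != "lost":
            self.delivered.append(msg)
        return outcome == "ok"  # True only when the producer sees an ack


def send_at_most_once(link: FlakyLink, msg: str) -> None:
    link.send(msg)  # fire and forget: never retried


def send_at_least_once(link: FlakyLink, msg: str, max_attempts: int = 5) -> bool:
    for _ in range(max_attempts):
        if link.send(msg):
            return True  # acknowledged, stop retrying
    return False


if __name__ == "__main__":
    lossy = FlakyLink(["lost"])
    send_at_most_once(lossy, "m1")
    print(lossy.delivered)  # the message is gone

    dup = FlakyLink(["ack_lost", "ok"])
    send_at_least_once(dup, "m1")
    print(dup.delivered)  # the consumer saw the message twice
```

The second case is the reason at-least-once delivery is paired with idempotent consumers: the retry that protects against loss is also what creates the duplicate.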
Idempotency
Since at-least-once is the most practical guarantee, your consumers should be idempotent: processing the same message twice must produce the same result as processing it once.
Techniques for idempotency:
- Idempotency key: Store processed message IDs; skip duplicates
- Database constraints: Use UPSERT or unique constraints
- Conditional updates: Only update if current state matches expected state
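The first two techniques can be combined in one sketch using Python's built-in `sqlite3` module. The table names, the `handle_order` function, and the stock-decrement side effect are assumptions chosen for illustration; the key idea is that the dedup record and the side effect commit in the same transaction.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The primary key on message_id is the unique constraint that blocks duplicates.
    conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE stock (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")
    return conn


def handle_order(conn: sqlite3.Connection, message_id: str, sku: str, qty: int) -> bool:
    """Decrement stock once per message_id; return False for a duplicate."""
    with conn:  # one transaction: dedup record and side effect commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed (message_id) VALUES (?)",
            (message_id,),
        )
        if cur.rowcount == 0:
            return False  # ID already recorded: skip the side effect
        conn.execute("UPDATE stock SET qty = qty - ? WHERE sku = ?", (qty, sku))
        return True


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO stock (sku, qty) VALUES ('widget', 10)")
    print(handle_order(conn, "msg-1", "widget", 3))  # first delivery is applied
    print(handle_order(conn, "msg-1", "widget", 3))  # redelivery is ignored
    print(conn.execute("SELECT qty FROM stock WHERE sku = 'widget'").fetchone()[0])
```

Recording the ID and applying the update atomically matters: if they were separate commits, a crash between them could either lose the update or apply it twice.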
Designing an Order Processing System
Let us apply these concepts to a real-world example.
flowchart TB
Client["Client"]
API["API Gateway"]
OS["Order Service"]
Queue["Message Queue<br>(Kafka)"]
Pay["Payment Service"]
Inv["Inventory Service"]
Ship["Shipping Service"]
Notif["Notification Service"]
DLQ["Dead Letter Queue"]
Client --> API --> OS
OS -->|"OrderCreated"| Queue
Queue --> Pay
Queue --> Inv
Pay -->|"PaymentCompleted"| Queue
Queue --> Ship
Queue --> Notif
Pay -->|"PaymentFailed"| DLQ
style Queue fill:#f59e0b,color:#fff
style DLQ fill:#ef4444,color:#fff
style OS fill:#3b82f6,color:#fff
style Pay fill:#8b5cf6,color:#fff
style Inv fill:#22c55e,color:#fff
style Ship fill:#22c55e,color:#fff
style Notif fill:#22c55e,color:#fff
Design decisions:
- Kafka for the message queue (high throughput, event replay for debugging)
- At-least-once delivery with idempotent consumers (order ID as idempotency key)
- Dead letter queue for payment messages that still fail after retries (manual review and replay)
- Event-driven decoupling (each service processes events independently)
- Separate topics for different event types (OrderCreated, PaymentCompleted, etc.)
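The retry and dead letter queue decisions above might look like the following on the consumer side. This is a simplified sketch: `process_with_retries` is a hypothetical helper, the dead letter queue is a plain list, and a real consumer would normally wait (with backoff) between attempts.

```python
from typing import Callable


def process_with_retries(
    messages: list[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,
) -> tuple[list[str], list[tuple[str, str]]]:
    """Return (succeeded, dead_letters); dead letters carry the last error."""
    succeeded: list[str] = []
    dead_letters: list[tuple[str, str]] = []
    for msg in messages:
        last_error = ""
        for _ in range(max_attempts):
            try:
                handler(msg)
                succeeded.append(msg)
                break
            except Exception as exc:  # broad catch is deliberate in this sketch
                last_error = repr(exc)
        else:
            # The loop ended without a break: every attempt failed.
            dead_letters.append((msg, last_error))
    return succeeded, dead_letters


if __name__ == "__main__":
    attempts: dict[str, int] = {}

    def charge(order_id: str) -> None:
        attempts[order_id] = attempts.get(order_id, 0) + 1
        if order_id == "order-bad":
            raise ValueError("card declined")
        if order_id == "order-flaky" and attempts[order_id] < 3:
            raise TimeoutError("gateway timeout")

    ok, dlq = process_with_retries(["order-1", "order-flaky", "order-bad"], charge)
    print(ok)
    print(dlq)
```

Because a handler may run more than once for the same message, it should also be idempotent, for example by keying on the order ID.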
Practice Problems
Exercise 1: Basics
Explain how these three scenarios differ and state which delivery semantic is appropriate for each:
- Sending marketing push notifications to mobile users
- Processing credit card charges for online purchases
- Logging page view events for analytics
Exercise 2: Applied
Design an order processing system for an e-commerce platform that handles:
- 50,000 orders per day
- Each order triggers: payment processing, inventory update, email confirmation, analytics event
- Payment failures must be retried up to 3 times
- Orders must not be lost or double-processed
Specify: message queue choice (Kafka, RabbitMQ, or SQS), topic/queue design, delivery guarantee, idempotency strategy, and dead letter queue handling.
Challenge
Design a real-time notification system for a social media platform that handles:
- 500 million users
- 10 billion events per day (likes, comments, follows, mentions)
- Notifications must be delivered within 1 second for online users
- Offline users receive notifications when they come online
- Users can configure notification preferences (mute, digest, real-time)
Include: event ingestion pipeline, message queue architecture, consumer design, delivery strategy for online vs offline users, and how to handle the thundering herd when a celebrity posts.
Summary
| Concept | Description |
|---|---|
| Async Communication | Producer sends message and continues; consumer processes later |
| Message Queue | Buffer between producers and consumers |
| Topic/Partition | Logical channel subdivided for parallel processing |
| Consumer Group | Set of consumers sharing workload |
| Kafka | High-throughput distributed log with replay |
| RabbitMQ | Traditional broker with flexible routing |
| SQS | Fully managed AWS queue service |
| Event-Driven | Services communicate through events, not direct calls |
| Pub/Sub | One message delivered to all subscribers |
| At-Least-Once | Retry until acknowledged; consumer must be idempotent |
| Dead Letter Queue | Collects messages that fail processing repeatedly |
Key Takeaways
- Use async processing for anything that does not need an immediate response; it improves resilience and scalability
- Kafka is a common default for event streaming in system design interviews, but be ready to justify it against RabbitMQ or SQS
- At-least-once with idempotent consumers is the most practical delivery guarantee
- Event-driven architecture promotes loose coupling but requires careful handling of eventual consistency
- Always include a dead letter queue for messages that cannot be processed
- Design consumers to be idempotent; with at-least-once delivery, this is the single most important principle for reliable message processing
Next up: On Day 6, we explore API Design and Rate Limiting - covering REST vs GraphQL, API gateway patterns, rate limiting algorithms, and authentication strategies for distributed systems.