Day 8: Design Social Media Feed & Chat
What You'll Learn Today
- News Feed: push model vs pull model vs hybrid approach
- Feed ranking and timeline generation
- Celebrity/hotkey problem and solutions
- Media storage for social platforms
- Chat system with WebSocket connections
- Message delivery guarantees
- Online presence tracking
- Group chat design considerations
Part 1: News Feed System
The Core Problem
When a user opens their feed, they need to see recent posts from people they follow, ranked by relevance, all in under 200 ms.
Fan-Out Strategies
flowchart TB
subgraph Push["Push Model (Fan-Out on Write)"]
direction TB
P1["User creates post"]
P2["Write to all followers' feeds"]
P3["Followers read pre-built feed"]
P1 --> P2 --> P3
end
subgraph Pull["Pull Model (Fan-Out on Read)"]
direction TB
R1["User opens feed"]
R2["Query all followed users' posts"]
R3["Merge and rank on the fly"]
R1 --> R2 --> R3
end
subgraph Hybrid["Hybrid Model"]
direction TB
H1["Regular users → Push"]
H2["Celebrities → Pull"]
H3["Merge at read time"]
H1 --> H3
H2 --> H3
end
style Push fill:#3b82f6,color:#fff
style Pull fill:#f59e0b,color:#fff
style Hybrid fill:#22c55e,color:#fff
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Push (Fan-out on write) | When user posts, write to every follower's feed cache | Fast reads, pre-computed | Slow writes for popular users, wasted work for inactive followers |
| Pull (Fan-out on read) | When user opens feed, fetch from all followed users | No wasted computation | Slow reads; heavy query load at read time |
| Hybrid | Push for normal users, pull for celebrities | Balances both | More complex |
The Celebrity/Hotkey Problem
A user with 10 million followers posting once triggers 10 million writes in the push model. This is the "hotkey" or "celebrity" problem.
Solution: Use hybrid fan-out.
- Users with fewer than N followers (e.g., 10,000): push model
- Users with more than N followers: pull model at read time
- Merge both sources when the reader loads their feed
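The three steps above can be sketched in memory as follows. This is a minimal illustration, not a production design: the `HybridFeed` class, its method names, and the dictionaries standing in for the post store and feed cache are all assumptions made for the example, and the threshold is the illustrative 10,000 from the text.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # the "N" from the text; illustrative value


class HybridFeed:
    """In-memory stand-in for the post store and per-reader feed cache."""

    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)         # author -> readers
        self.following = defaultdict(set)         # reader -> authors
        self.posts_by_author = defaultdict(list)  # author -> [(ts, post_id)]
        self.feed_cache = defaultdict(list)       # reader -> [(ts, post_id)]

    def follow(self, reader, author):
        self.followers[author].add(reader)
        self.following[reader].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) > self.threshold

    def post(self, author, post_id, ts):
        self.posts_by_author[author].append((ts, post_id))
        if not self.is_celebrity(author):
            # Push path: fan out on write to every follower's cached feed.
            for reader in self.followers[author]:
                self.feed_cache[reader].append((ts, post_id))

    def read_feed(self, reader, limit=10):
        # Start from the pre-built (pushed) entries.
        candidates = set(self.feed_cache[reader])
        # Pull path: fetch celebrity posts at read time.
        for author in self.following[reader]:
            if self.is_celebrity(author):
                candidates.update(self.posts_by_author[author])
        # Using a set drops duplicates for authors who crossed the
        # threshold after some of their posts had already been pushed.
        newest = heapq.nlargest(limit, candidates)
        return [post_id for _, post_id in newest]
```

The merge here is purely chronological; in the architecture that follows, a ranking step would reorder the merged candidates before they reach the reader.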
Feed Architecture
flowchart TB
U["User Posts"]
PS["Post Service"]
MQ["Message Queue"]
FW["Fan-Out Workers"]
FC[("Feed Cache (Redis)")]
FS["Feed Service"]
RS["Ranking Service"]
R["Reader"]
U --> PS --> MQ --> FW --> FC
R --> FS
FS --> FC
FS --> RS
RS --> R
style PS fill:#3b82f6,color:#fff
style MQ fill:#f59e0b,color:#fff
style FC fill:#ef4444,color:#fff
style RS fill:#8b5cf6,color:#fff
Feed Ranking
Modern feeds are not purely chronological. They use ranking signals:
| Signal | Weight | Description |
|---|---|---|
| Recency | High | Newer posts rank higher |
| Engagement | High | Posts with many likes/comments |
| Relationship | Medium | Posts from close friends |
| Content type | Medium | User's preferred content type |
| Creator quality | Low | Verified or high-quality creators |
A simple scoring formula:
score = w1 * recency + w2 * engagement + w3 * affinity + w4 * content_type
In production, this is typically a machine learning model trained on user interactions.
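As a rough sketch of the hand-tuned formula (not the learned model), the weighted sum might look like the code below. The weights, the six-hour half-life for recency, the engagement cap, and the `Post` fields are all assumptions chosen for illustration; each signal is normalized to the range 0 to 1 so the weights are comparable.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    age_hours: float
    likes: int
    comments: int
    affinity: float       # 0..1, closeness between reader and author
    content_match: float  # 0..1, fit with the reader's preferred content type


# Illustrative weights for w1..w4; they sum to 1 so the score stays in 0..1.
WEIGHTS = {"recency": 0.4, "engagement": 0.3, "affinity": 0.2, "content_type": 0.1}


def score(post, weights=WEIGHTS, half_life_hours=6.0, engagement_cap=10_000):
    # Recency halves every `half_life_hours`.
    recency = 0.5 ** (post.age_hours / half_life_hours)
    # Log-scale engagement so viral posts do not dominate; cap at 1.0.
    raw = post.likes + 2 * post.comments
    engagement = min(1.0, math.log1p(raw) / math.log1p(engagement_cap))
    return (
        weights["recency"] * recency
        + weights["engagement"] * engagement
        + weights["affinity"] * post.affinity
        + weights["content_type"] * post.content_match
    )


def rank(posts):
    """Return posts ordered from highest to lowest score."""
    return sorted(posts, key=score, reverse=True)
```

With these particular weights, a brand-new post with no engagement outranks a two-day-old viral post, which shows how strongly the recency weight shapes the feed.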
Media Storage
flowchart LR
U["Upload"]
IS["Image Service"]
VS["Video Service"]
S3["Object Storage (S3)"]
CDN["CDN"]
R["Reader"]
U --> IS & VS
IS -->|"Resize, compress"| S3
VS -->|"Transcode"| S3
S3 --> CDN --> R
style S3 fill:#8b5cf6,color:#fff
style CDN fill:#22c55e,color:#fff
- Images: Store multiple resolutions (thumbnail, medium, full)
- Videos: Transcode to multiple bitrates
- Delivery: Always serve through CDN
- Storage: Object storage (S3) is ideal for media blobs
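One way to picture the image path is as a small planning step that decides which renditions to generate and where each one lives in object storage. The rendition widths, the key layout, and the `cdn.example.com` host below are hypothetical values for illustration only.

```python
# Maximum width in pixels per rendition (illustrative values).
IMAGE_RENDITIONS = {"thumbnail": 150, "medium": 720, "full": 2048}


def plan_renditions(media_id, original_width):
    """Return the target width and object-storage key for each rendition."""
    plan = {}
    for name, max_width in IMAGE_RENDITIONS.items():
        plan[name] = {
            # Never upscale beyond the uploaded image's width.
            "width": min(max_width, original_width),
            "key": f"images/{media_id}/{name}.jpg",
        }
    return plan


def cdn_url(key, cdn_host="cdn.example.com"):
    """Readers fetch through the CDN rather than from object storage directly."""
    return f"https://{cdn_host}/{key}"
```

Keeping the rendition name in the key means a client can request the size it needs without a metadata lookup.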
Part 2: Chat / Messaging System
Requirements
- 1:1 messaging and group chats
- Online/offline status
- Read receipts
- Message history
- Real-time delivery
WebSocket Connections
sequenceDiagram
participant A as User A
participant WS as WebSocket Server
participant B as User B
A->>WS: Connect (WebSocket handshake)
B->>WS: Connect (WebSocket handshake)
Note over WS: Both connections maintained
A->>WS: Send message to B
WS->>B: Push message to B (real-time)
B->>WS: Ack (delivered)
WS->>A: Delivery receipt
Why WebSocket?
| Protocol | Direction | Latency | Use Case |
|---|---|---|---|
| HTTP Polling | Client → Server | High (interval) | Legacy fallback |
| Long Polling | Client → Server | Medium | Moderate real-time |
| Server-Sent Events | Server → Client | Low | One-way updates |
| WebSocket | Bidirectional | Very Low | Chat, gaming |
Chat Architecture
flowchart TB
UA["User A"] & UB["User B"]
WS1["WebSocket Server 1"]
WS2["WebSocket Server 2"]
MQ["Message Queue"]
MS["Message Service"]
DB[("Message Store")]
PS["Presence Service"]
CA["Redis (Sessions)"]
UA -->|"WebSocket"| WS1
UB -->|"WebSocket"| WS2
WS1 --> MQ
MQ --> WS2
MQ --> MS --> DB
WS1 & WS2 --> PS --> CA
style MQ fill:#f59e0b,color:#fff
style DB fill:#8b5cf6,color:#fff
style PS fill:#22c55e,color:#fff
style CA fill:#ef4444,color:#fff
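The routing idea in this diagram can be sketched as follows: because User A and User B may be connected to different WebSocket servers, something has to map each user to the server holding their connection. The `ChatRouter` class, its in-memory dictionaries (standing in for the Redis session data, the queue, and the message store), and the method names are assumptions for illustration.

```python
from collections import defaultdict


class ChatRouter:
    def __init__(self):
        self.sessions = {}                        # user_id -> server_id
        self.server_outboxes = defaultdict(list)  # server_id -> messages to push
        self.message_store = []                   # stand-in for the message store

    def connect(self, user_id, server_id):
        self.sessions[user_id] = server_id

    def disconnect(self, user_id):
        self.sessions.pop(user_id, None)

    def send(self, sender, recipient, text):
        message = {"from": sender, "to": recipient, "text": text}
        # Persist first so the message survives even if delivery fails.
        self.message_store.append(message)
        server_id = self.sessions.get(recipient)
        if server_id is None:
            return None  # recipient offline; delivered later from the store
        # Hand the message to the server that holds the recipient's socket.
        self.server_outboxes[server_id].append(message)
        return server_id
```

In a real deployment the outboxes would be queue topics or channels that each WebSocket server subscribes to, rather than lists in one process.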
Message Delivery Guarantees
Messages can be lost at several points. Here's how to handle each:
- Sender → Server: The client retries on timeout and attaches an idempotency key so the server can discard duplicates
- Server → Recipient (online): Push via WebSocket; the recipient sends an ACK
- Server → Recipient (offline): Store in the DB and deliver when the recipient reconnects (with a push notification as an alert)
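A minimal sketch of how these pieces fit together on the server side, assuming an in-memory store: retries are deduplicated by idempotency key, and a message stays pending until the recipient acknowledges it, which covers both the online case (resend if no ACK arrives) and the offline case (deliver on reconnect). The `DeliveryServer` class and its method names are hypothetical.

```python
class DeliveryServer:
    def __init__(self):
        self.seen = {}      # idempotency_key -> message_id
        self.store = {}     # message_id -> message
        self.unacked = {}   # recipient -> [message_id, ...] in arrival order
        self.next_id = 1

    def receive(self, idempotency_key, sender, recipient, text):
        """Accept a message; a retry with the same key is not stored twice."""
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]
        message_id = self.next_id
        self.next_id += 1
        self.store[message_id] = {
            "id": message_id, "from": sender, "to": recipient, "text": text,
        }
        self.seen[idempotency_key] = message_id
        self.unacked.setdefault(recipient, []).append(message_id)
        return message_id

    def pending_for(self, recipient):
        """Messages to push now, or to replay when the recipient reconnects."""
        return [self.store[m] for m in self.unacked.get(recipient, [])]

    def ack(self, recipient, message_id):
        """Recipient confirmed delivery; stop redelivering this message."""
        ids = self.unacked.get(recipient, [])
        if message_id in ids:
            ids.remove(message_id)
```

Because an unacknowledged message can be pushed more than once, the receiving client should also deduplicate by message ID.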
Message Storage
| Approach | Best For | Technology |
|---|---|---|
| Wide-column store | 1:1 messages, time-ordered | Cassandra, HBase |
| Document store | Group messages, flexible schema | MongoDB |
| Relational | Small scale, complex queries | PostgreSQL |
Partition key: conversation_id
Sort key: message_timestamp
This allows efficient range queries: "Get messages in conversation X between time A and B."
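The effect of this key design can be imitated in memory: one sorted list per conversation, with a binary search to find the time range. This is only a sketch of the access pattern, not of any particular database; the `MessageStore` class is hypothetical, and the numeric `message_id` used as a tiebreaker for equal timestamps is an added assumption.

```python
import bisect
import math
from collections import defaultdict


class MessageStore:
    def __init__(self):
        # conversation_id (partition key) -> rows sorted by
        # (timestamp, message_id), i.e. the sort key plus a tiebreaker.
        self.partitions = defaultdict(list)

    def insert(self, conversation_id, timestamp, message_id, body):
        # insort keeps the partition ordered even if messages arrive late.
        bisect.insort(
            self.partitions[conversation_id], (timestamp, message_id, body)
        )

    def range(self, conversation_id, start, end):
        """Messages in one conversation with start <= timestamp <= end."""
        rows = self.partitions.get(conversation_id, [])
        # (start,) sorts before every row whose timestamp equals start.
        lo = bisect.bisect_left(rows, (start,))
        # (end, inf) sorts after every row whose timestamp equals end,
        # which relies on message_id being numeric.
        hi = bisect.bisect_right(rows, (end, math.inf))
        return rows[lo:hi]
```

The query touches a single partition and reads a contiguous slice, which is the property the partition key and sort key are chosen to provide.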
Online Presence
flowchart TB
U["User A"]
WS["WebSocket Server"]
PS["Presence Service"]
RD["Redis"]
SUB["Subscribers (Friends)"]
U -->|"Heartbeat every 30s"| WS
WS --> PS
PS -->|"Update last_seen"| RD
PS -->|"Notify status change"| SUB
style PS fill:#22c55e,color:#fff
style RD fill:#ef4444,color:#fff
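- Online: Heartbeat received within the threshold (e.g., the last 30 seconds; in practice, allow some slack over the heartbeat interval so that one delayed heartbeat does not flip the status)
- Away: No heartbeat for 1-5 minutes
- Offline: No heartbeat for more than 5 minutes, or an explicit disconnect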
- Store in Redis:
{user_id: last_heartbeat_timestamp}
For users with many friends, don't push presence updates to all friends in real-time. Instead, query presence lazily when a user opens a chat or friend list.
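A small sketch of the lazy approach: only the last heartbeat timestamp is stored, and the status is derived at the moment someone asks for it. The `PresenceService` class here is a hypothetical stand-in with a dictionary in place of Redis, and the two window values are illustrative parameters rather than recommended settings.

```python
import time


class PresenceService:
    def __init__(self, online_window=30, away_window=300):
        self.online_window = online_window  # seconds; add slack in practice
        self.away_window = away_window      # seconds
        self.last_heartbeat = {}            # user_id -> timestamp (Redis stand-in)

    def heartbeat(self, user_id, now=None):
        self.last_heartbeat[user_id] = time.time() if now is None else now

    def disconnect(self, user_id):
        # An explicit disconnect makes the user offline immediately.
        self.last_heartbeat.pop(user_id, None)

    def status(self, user_id, now=None):
        now = time.time() if now is None else now
        last = self.last_heartbeat.get(user_id)
        if last is None:
            return "offline"
        idle = now - last
        if idle <= self.online_window:
            return "online"
        if idle <= self.away_window:
            return "away"
        return "offline"

    def statuses(self, user_ids, now=None):
        """Lazy batch lookup, e.g. when a user opens their friend list."""
        return {uid: self.status(uid, now) for uid in user_ids}
```

No work happens when a status changes; the cost is paid only by readers who actually look at a chat or friend list.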
Group Chat Design
| Aspect | Small Groups (<100) | Large Groups (100+) |
|---|---|---|
| Delivery | Push to all members | Push to online members only |
| Storage | Store per-group | Store per-group |
| Read status | Track per-member | Simplified (last read pointer) |
| Notifications | Notify all | Mention-based notifications |
Key considerations:
- Message ordering: Use server-assigned timestamps or logical clocks
- Fan-out: For small groups, push to all members; for large groups, push to online members only and let the rest pull on demand when they open the chat
- Admin controls: Roles, permissions, mute, remove members
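The ordering and fan-out points can be sketched together. The example below uses a per-group sequence counter as a simple server-assigned logical clock (one of several possible choices), pushes to everyone in small groups, and pushes only to online members in large ones, with a last-read pointer for catching up. The `GroupChat` class and the cutoff of 100 members (taken from the table) are illustrative.

```python
from collections import defaultdict
from itertools import count

LARGE_GROUP_SIZE = 100  # cutoff from the table above


class GroupChat:
    def __init__(self, members, online=()):
        self.members = set(members)
        self.online = set(online)
        self._seq = count(1)               # server-assigned order per group
        self.log = []                      # [(seq, sender, text)], stored per group
        self.last_read = defaultdict(int)  # member -> last sequence number read

    def send(self, sender, text):
        """Append to the group log and return (seq, members to push to)."""
        seq = next(self._seq)
        self.log.append((seq, sender, text))
        recipients = self.members - {sender}
        if len(self.members) >= LARGE_GROUP_SIZE:
            # Large group: push only to members who are currently online.
            recipients &= self.online
        return seq, recipients

    def pull(self, member):
        """Fetch messages after the member's last-read pointer and advance it."""
        unread = [m for m in self.log if m[0] > self.last_read[member]]
        if unread:
            self.last_read[member] = unread[-1][0]
        return unread
```

A single counter per group gives every member the same total order without depending on synchronized client clocks.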
Summary
| Concept | Description |
|---|---|
| Push model | Pre-compute feeds on write; fast reads |
| Pull model | Compute feeds on read; no wasted writes |
| Hybrid model | Push for normal users, pull for celebrities |
| Feed ranking | ML model with recency, engagement, affinity signals |
| WebSocket | Bidirectional real-time communication |
| Message delivery | Retry + ACK + offline storage |
| Presence | Heartbeat-based with Redis |
| Group chat | Fan-out strategy varies by group size |
Key Takeaways
- The hybrid fan-out approach is widely used by large social platforms, and it is the answer interviewers usually expect
- The celebrity/hotkey problem is a common follow-up question, so address it proactively
- Chat systems need WebSocket for real-time delivery and a message queue for reliability
- Design for offline users: store-and-forward is essential for messaging
Practice Problems
Problem 1: Basic
Design a simple notification system that supports multiple channels (push notification, email, SMS). Define the data model and API.
Problem 2: Intermediate
You need to show "typing..." indicators in a chat app. Design the mechanism: how do you propagate typing status in real-time without overwhelming the server? Consider both 1:1 and group chats.
Challenge
Design a social media platform's "Stories" feature (posts that disappear after 24 hours). Consider: storage, feed generation, view tracking, and efficient expiration. How does this differ from the main news feed design?
References
- Facebook News Feed Architecture
- WhatsApp Architecture (High Scalability)
- Discord Engineering Blog
- WebSocket Protocol - RFC 6455
Next up: In Day 9, we'll design Video Streaming (like YouTube) and Distributed File Storage (like Google Drive).