Day 8: Design Social Media Feed & Chat

What You'll Learn Today

News Feed: push model vs pull model vs hybrid approach
Feed ranking and timeline generation
Celebrity/hotkey problem and solutions
Media storage for social platforms
Chat system with WebSocket connections
Message delivery guarantees
Online presence tracking
Group chat design considerations

Part 1: News Feed System

The Core Problem

When a user opens their feed, they need to see recent posts from people they follow, ranked by relevance, all in under 200 ms.

Fan-Out Strategies

flowchart TB
    subgraph Push["Push Model (Fan-Out on Write)"]
        direction TB
        P1["User creates post"]
        P2["Write to all followers' feeds"]
        P3["Followers read pre-built feed"]
        P1 --> P2 --> P3
    end
    subgraph Pull["Pull Model (Fan-Out on Read)"]
        direction TB
        R1["User opens feed"]
        R2["Query all followed users' posts"]
        R3["Merge and rank on the fly"]
        R1 --> R2 --> R3
    end
    subgraph Hybrid["Hybrid Model"]
        direction TB
        H1["Regular users → Push"]
        H2["Celebrities → Pull"]
        H3["Merge at read time"]
        H1 --> H3
        H2 --> H3
    end
    style Push fill:#3b82f6,color:#fff
    style Pull fill:#f59e0b,color:#fff
    style Hybrid fill:#22c55e,color:#fff

Approach	How It Works	Pros	Cons
Push (Fan-out on write)	When user posts, write to every follower's feed cache	Fast reads, pre-computed	Slow writes for popular users, wasted work for inactive followers
Pull (Fan-out on read)	When user opens feed, fetch from all followed users	No wasted computation	Slow reads, heavy at read time
Hybrid	Push for normal users, pull for celebrities	Balances both	More complex

The Celebrity/Hotkey Problem

A user with 10 million followers posting once triggers 10 million writes in the push model. This is the "hotkey" or "celebrity" problem.

Solution: Use hybrid fan-out.

Users with fewer than N followers (e.g., N = 10,000): push model
Users with N or more followers: pull model at read time
Merge both sources when the reader loads their feed
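
The three steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the class name `HybridFeed`, its methods, and the plain dictionaries standing in for the feed cache and post store are all assumptions made for the example. Accounts at or above the threshold are treated as celebrities.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # the "N" from the text; an illustrative value


class HybridFeed:
    """In-memory sketch of hybrid fan-out (hypothetical names throughout)."""

    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)    # author -> readers who follow them
        self.following = defaultdict(set)    # reader -> authors they follow
        self.posts = defaultdict(list)       # author -> [(timestamp, post_id)]
        self.feed_cache = defaultdict(list)  # reader -> pushed [(timestamp, post_id)]

    def follow(self, reader, author):
        self.followers[author].add(reader)
        self.following[reader].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def publish(self, author, post_id, timestamp):
        """Store the post and return how many feed-cache writes it caused."""
        self.posts[author].append((timestamp, post_id))
        if self.is_celebrity(author):
            return 0  # no fan-out on write; readers pull these posts later
        for reader in self.followers[author]:
            self.feed_cache[reader].append((timestamp, post_id))
        return len(self.followers[author])

    def read_feed(self, reader, limit=20):
        """Merge pushed entries with posts pulled from followed celebrities."""
        entries = set(self.feed_cache[reader])  # set removes duplicates
        for author in self.following[reader]:
            if self.is_celebrity(author):
                entries.update(self.posts[author])
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]
```

The merge here is purely chronological. A real system would hand the merged candidates to a ranking step and would cap how many posts it pulls per celebrity.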

Feed Architecture

flowchart TB
    U["User Posts"]
    PS["Post Service"]
    MQ["Message Queue"]
    FW["Fan-Out Workers"]
    FC[("Feed Cache (Redis)")]
    FS["Feed Service"]
    RS["Ranking Service"]
    R["Reader"]

    U --> PS --> MQ --> FW --> FC
    R --> FS
    FS --> FC
    FS --> RS
    RS --> R

    style PS fill:#3b82f6,color:#fff
    style MQ fill:#f59e0b,color:#fff
    style FC fill:#ef4444,color:#fff
    style RS fill:#8b5cf6,color:#fff

Feed Ranking

Modern feeds are not purely chronological. They use ranking signals:

Signal	Weight	Description
Recency	High	Newer posts rank higher
Engagement	High	Posts with many likes/comments
Relationship	Medium	Posts from close friends
Content type	Medium	User's preferred content type
Creator quality	Low	Verified or high-quality creators

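A simple scoring formula (here affinity is the Relationship signal from the table, and creator quality is left out for brevity):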

score = w1 * recency + w2 * engagement + w3 * affinity + w4 * content_type

In production, this is typically a machine learning model trained on user interactions.
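
Before reaching for a learned model, the weighted sum can be written out directly. The sketch below is only an illustration: the weight values, the 6-hour recency half-life, the engagement normalization, and the `Candidate` fields are all assumptions chosen for the example, not values from any real platform.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    post_id: str
    age_hours: float
    likes: int
    comments: int
    affinity: float       # 0..1, closeness between reader and author
    content_match: float  # 0..1, fit with the reader's preferred content type


# Illustrative weights (w1..w4 in the formula); they sum to 1.
WEIGHTS = {"recency": 0.4, "engagement": 0.3, "affinity": 0.2, "content_type": 0.1}
HALF_LIFE_HOURS = 6.0  # assumed: recency halves every 6 hours


def recency(age_hours):
    """Exponential decay: 1.0 for a brand-new post, 0.5 after one half-life."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)


def engagement(likes, comments):
    """Log-damped engagement squashed into [0, 1); comments count double."""
    raw = math.log1p(likes + 2 * comments)
    return raw / (1.0 + raw)


def score(c):
    return (
        WEIGHTS["recency"] * recency(c.age_hours)
        + WEIGHTS["engagement"] * engagement(c.likes, c.comments)
        + WEIGHTS["affinity"] * c.affinity
        + WEIGHTS["content_type"] * c.content_match
    )


def rank(candidates):
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)
```

Keeping every signal in the 0..1 range makes the weights directly comparable, which is one reason to normalize before summing.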

Media Storage

flowchart LR
    U["Upload"]
    IS["Image Service"]
    VS["Video Service"]
    S3["Object Storage (S3)"]
    CDN["CDN"]
    R["Reader"]

    U --> IS & VS
    IS -->|"Resize, compress"| S3
    VS -->|"Transcode"| S3
    S3 --> CDN --> R

    style S3 fill:#8b5cf6,color:#fff
    style CDN fill:#22c55e,color:#fff

Images: Store multiple resolutions (thumbnail, medium, full)
Videos: Transcode to multiple bitrates
Delivery: Always serve through CDN
Storage: Object storage (S3) is ideal for media blobs

Part 2: Chat / Messaging System

Requirements

1:1 messaging and group chats
Online/offline status
Read receipts
Message history
Real-time delivery

WebSocket Connections

sequenceDiagram
    participant A as User A
    participant WS as WebSocket Server
    participant B as User B

    A->>WS: Connect (WebSocket handshake)
    B->>WS: Connect (WebSocket handshake)
    Note over WS: Both connections maintained

    A->>WS: Send message to B
    WS->>B: Push message to B (real-time)
    B->>WS: Ack (delivered)
    WS->>A: Delivery receipt

Why WebSocket?

Protocol	Direction	Latency	Use Case
HTTP Polling	Client → Server	High (interval)	Legacy fallback
Long Polling	Client → Server	Medium	Moderate real-time
Server-Sent Events	Server → Client	Low	One-way updates
WebSocket	Bidirectional	Very Low	Chat, gaming

Chat Architecture

flowchart TB
    UA["User A"] & UB["User B"]
    WS1["WebSocket Server 1"]
    WS2["WebSocket Server 2"]
    MQ["Message Queue"]
    MS["Message Service"]
    DB[("Message Store")]
    PS["Presence Service"]
    CA["Redis (Sessions)"]

    UA -->|"WebSocket"| WS1
    UB -->|"WebSocket"| WS2
    WS1 --> MQ
    MQ --> WS2
    MQ --> MS --> DB
    WS1 & WS2 --> PS --> CA

    style MQ fill:#f59e0b,color:#fff
    style DB fill:#8b5cf6,color:#fff
    style PS fill:#22c55e,color:#fff
    style CA fill:#ef4444,color:#fff
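
One way to read the diagram: a sender's WebSocket server looks up which server holds the recipient's connection and hands the message to that server through the queue, while a copy always goes to the message store. The sketch below models this in memory. `ChatRouter`, its dictionaries, and the per-server queues are hypothetical stand-ins for the Redis session map and the message queue.

```python
from collections import defaultdict, deque


class ChatRouter:
    """In-memory sketch of cross-server message routing (hypothetical names)."""

    def __init__(self):
        self.sessions = {}                # user_id -> server_id (stands in for Redis)
        self.queues = defaultdict(deque)  # server_id -> messages for that server
        self.message_store = []           # stands in for the durable message store

    def register(self, user_id, server_id):
        """Called when a user's WebSocket connects to a server."""
        self.sessions[user_id] = server_id

    def unregister(self, user_id):
        """Called when the user's WebSocket closes."""
        self.sessions.pop(user_id, None)

    def route(self, sender, recipient, body):
        """Persist the message, then enqueue it for the recipient's server.

        Returns the target server_id, or None if the recipient is offline.
        """
        message = {"from": sender, "to": recipient, "body": body}
        self.message_store.append(message)  # always persisted
        server_id = self.sessions.get(recipient)
        if server_id is not None:
            self.queues[server_id].append(message)
        return server_id
```

Persisting before routing means an offline recipient loses nothing: the message is already in the store when they reconnect.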

Message Delivery Guarantees

Messages can be lost at several points. Here's how to handle each:

Sender → Server: Client retries on timeout and attaches an idempotency key so the server can discard duplicates
Server → Recipient (online): Push via WebSocket; the recipient sends an ACK, and unacknowledged messages are redelivered
Server → Recipient (offline): Store in DB and deliver when the recipient reconnects (send a push notification as an alert)
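
The three hops can be combined into one small state machine. The sketch below is an in-memory illustration under stated assumptions: `ChatServer` and its method names are hypothetical, Python lists stand in for WebSocket connections, and dictionaries stand in for the database.

```python
from collections import defaultdict, deque


class ChatServer:
    """In-memory sketch of retry + ACK + store-and-forward (hypothetical names)."""

    def __init__(self):
        self.seen = {}                           # idempotency_key -> message_id
        self.next_id = 1
        self.online = {}                         # user_id -> list acting as the socket
        self.unacked = defaultdict(dict)         # user_id -> {message_id: message}
        self.offline_store = defaultdict(deque)  # user_id -> messages awaiting reconnect

    def send(self, idempotency_key, sender, recipient, body):
        """Accept a message; a retry with the same key creates no duplicate."""
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]
        message_id = self.next_id
        self.next_id += 1
        self.seen[idempotency_key] = message_id
        message = {"id": message_id, "from": sender, "to": recipient, "body": body}
        if recipient in self.online:
            self._push(recipient, message)
        else:
            self.offline_store[recipient].append(message)
        return message_id

    def _push(self, user_id, message):
        self.online[user_id].append(message)
        self.unacked[user_id][message["id"]] = message  # kept until ACKed

    def ack(self, user_id, message_id):
        """The recipient confirms delivery; stop tracking the message."""
        self.unacked[user_id].pop(message_id, None)

    def connect(self, user_id):
        """Open a connection and redeliver unACKed and stored messages."""
        self.online[user_id] = []
        pending = list(self.unacked[user_id].values()) + list(self.offline_store[user_id])
        self.offline_store[user_id].clear()
        for message in pending:
            self._push(user_id, message)
        return self.online[user_id]

    def disconnect(self, user_id):
        self.online.pop(user_id, None)
```

Redelivering unacknowledged messages gives at-least-once delivery, so the client should also deduplicate by message id.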

Message Storage

Approach	Best For	Technology
Wide-column store	1:1 messages, time-ordered	Cassandra, HBase
Document store	Group messages, flexible schema	MongoDB
Relational	Small scale, complex queries	PostgreSQL

Partition key: conversation_id
Sort key: message_timestamp

This allows efficient range queries: "Get messages in conversation X between time A and B."
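
To see why this key layout makes range queries cheap, the sketch below models one partition per conversation with rows kept sorted by timestamp, so a time range becomes two binary searches. `MessageStore` and `messages_between` are hypothetical names, and the in-memory lists only imitate how a wide-column store lays out rows; they are not a client for Cassandra or HBase.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """In-memory sketch: partition by conversation, sort by timestamp."""

    def __init__(self):
        # conversation_id -> rows sorted by (timestamp, message_id, body)
        self.partitions = defaultdict(list)

    def append(self, conversation_id, timestamp, message_id, body):
        """Insert a row while keeping the partition sorted."""
        bisect.insort(self.partitions[conversation_id], (timestamp, message_id, body))

    def messages_between(self, conversation_id, start_ts, end_ts):
        """Return rows with start_ts <= timestamp < end_ts in one partition."""
        rows = self.partitions.get(conversation_id, [])
        # A 1-tuple sorts before any longer tuple with the same first element,
        # so these searches land on the first row at or after each bound.
        lo = bisect.bisect_left(rows, (start_ts,))
        hi = bisect.bisect_left(rows, (end_ts,))
        return rows[lo:hi]
```

The query touches a single partition and reads only a contiguous slice, which is the property the partition key and sort key are chosen to provide.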

Online Presence

flowchart TB
    U["User A"]
    WS["WebSocket Server"]
    PS["Presence Service"]
    RD["Redis"]
    SUB["Subscribers (Friends)"]

    U -->|"Heartbeat every 30s"| WS
    WS --> PS
    PS -->|"Update last_seen"| RD
    PS -->|"Notify status change"| SUB

    style PS fill:#22c55e,color:#fff
    style RD fill:#ef4444,color:#fff

Online: Heartbeat received within the threshold (e.g., the last 60 seconds, two heartbeat intervals, so a single late heartbeat does not flip the status)
Away: No heartbeat for 1-5 minutes
Offline: No heartbeat for more than 5 minutes, or an explicit disconnect
Store in Redis: {user_id: last_heartbeat_timestamp}

For users with many friends, don't push presence updates to all friends in real-time. Instead, query presence lazily when a user opens a chat or friend list.
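
The status rules and the lazy lookup can be sketched as follows. `PresenceService` is a hypothetical in-memory stand-in for the Redis map. The windows are assumptions made for this example: 60 seconds (two heartbeat intervals) for online and 5 minutes for away. The injectable `clock` exists only to make the example testable.

```python
import time

ONLINE_WINDOW = 60    # seconds; assumed to be two 30-second heartbeat intervals
AWAY_WINDOW = 5 * 60  # seconds; beyond this the user counts as offline


class PresenceService:
    """In-memory sketch of heartbeat-based presence (hypothetical names)."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.last_heartbeat = {}  # user_id -> timestamp; stands in for Redis

    def heartbeat(self, user_id):
        """Record that the user's client is still connected."""
        self.last_heartbeat[user_id] = self.clock()

    def disconnect(self, user_id):
        """An explicit disconnect makes the user offline immediately."""
        self.last_heartbeat.pop(user_id, None)

    def status(self, user_id):
        """Compute status lazily, at the moment someone asks for it."""
        last = self.last_heartbeat.get(user_id)
        if last is None:
            return "offline"
        idle = self.clock() - last
        if idle <= ONLINE_WINDOW:
            return "online"
        if idle <= AWAY_WINDOW:
            return "away"
        return "offline"
```

Because `status` is derived from the stored timestamp at read time, no background job is needed to flip users to away or offline, and nothing is pushed to friends who are not looking.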

Group Chat Design

Aspect	Small Groups (<100)	Large Groups (100+)
Delivery	Push to all members	Push to online members only
Storage	Store per-group	Store per-group
Read status	Track per-member	Simplified (last read pointer)
Notifications	Notify all	Mention-based notifications

Key considerations:

Message ordering: Use server-assigned timestamps or logical clocks
Fan-out: For small groups, push to all members; for large groups, push to online members only and let the rest pull on demand
Admin controls: Roles, permissions, mute, remove members
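
The ordering and fan-out points can be illustrated together. In the sketch below, a per-group sequence number assigned by the server serves as a simple logical clock, and the size cutoff of 100 comes from the table above. `GroupChat`, `post`, and `catch_up` are hypothetical names, and the in-memory log stands in for per-group storage.

```python
LARGE_GROUP_SIZE = 100  # cutoff between small and large groups, as in the table


class GroupChat:
    """In-memory sketch of size-dependent group fan-out (hypothetical names)."""

    def __init__(self, members):
        self.members = set(members)
        self.log = []  # per-group storage: [(seq, sender, body)]
        self.last_read = {m: 0 for m in self.members}  # last-read pointer per member

    def post(self, sender, body, online):
        """Append a message and return (seq, members to push it to)."""
        seq = len(self.log) + 1  # server-assigned order, independent of client clocks
        self.log.append((seq, sender, body))
        recipients = self.members - {sender}
        if len(self.members) >= LARGE_GROUP_SIZE:
            recipients &= set(online)  # large group: push to online members only
        return seq, recipients

    def catch_up(self, member):
        """Pull everything after the member's last-read pointer, then advance it."""
        unread = [row for row in self.log if row[0] > self.last_read[member]]
        if unread:
            self.last_read[member] = unread[-1][0]
        return unread
```

The single last-read pointer is what keeps read tracking cheap in large groups: one integer per member instead of one receipt per member per message.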

Summary

Concept	Description
Push model	Pre-compute feeds on write; fast reads
Pull model	Compute feeds on read; no wasted writes
Hybrid model	Push for normal users, pull for celebrities
Feed ranking	ML model with recency, engagement, affinity signals
WebSocket	Bidirectional real-time communication
Message delivery	Retry + ACK + offline storage
Presence	Heartbeat-based with Redis
Group chat	Fan-out strategy varies by group size

Key Takeaways

The hybrid fan-out approach is a common choice for large social platforms, and it is usually the answer interviewers expect
The celebrity/hotkey problem is a common follow-up question, so address it proactively
Chat systems need WebSocket for real-time delivery and a message queue for reliability
Design for offline users: store-and-forward is essential for messaging

Practice Problems

Problem 1: Basic

Design a simple notification system that supports multiple channels (push notification, email, SMS). Define the data model and API.

Problem 2: Intermediate

You need to show "typing..." indicators in a chat app. Design the mechanism — how do you propagate typing status in real-time without overwhelming the server? Consider both 1:1 and group chats.

Challenge

Design a social media platform's "Stories" feature (posts that disappear after 24 hours). Consider: storage, feed generation, view tracking, and efficient expiration. How does this differ from the main news feed design?

Next up: In Day 9, we'll design Video Streaming (like YouTube) and Distributed File Storage (like Google Drive).