Day 6: Microservices & API Design

What You'll Learn Today

Monolith vs microservices tradeoffs
API design: REST vs gRPC vs GraphQL
API Gateway pattern
Service discovery mechanisms
Rate limiting and throttling strategies
Authentication with OAuth 2.0 and JWT
Idempotency in API design

Monolith vs Microservices

flowchart LR
    subgraph Monolith["Monolith Architecture"]
        direction TB
        UI1["UI Layer"]
        BL1["Business Logic"]
        DB1[("Single Database")]
        UI1 --> BL1 --> DB1
    end
    subgraph Micro["Microservices Architecture"]
        direction TB
        GW["API Gateway"]
        S1["User Service"]
        S2["Order Service"]
        S3["Payment Service"]
        DB2[("User DB")]
        DB3[("Order DB")]
        DB4[("Payment DB")]
        GW --> S1 & S2 & S3
        S1 --> DB2
        S2 --> DB3
        S3 --> DB4
    end
    style Monolith fill:#f59e0b,color:#fff
    style Micro fill:#3b82f6,color:#fff

Aspect	Monolith	Microservices
Deployment	Single unit, simple	Independent, complex
Scaling	Scale everything	Scale individual services
Development	Easy to start	Better for large teams
Data consistency	ACID transactions	Eventual consistency
Latency	In-process calls	Network calls (higher)
Debugging	Simpler stack traces	Distributed tracing needed
Technology	Single stack	Polyglot possible
Failure	Entire app fails	Partial failures

When to Choose Each

Start with a monolith when you have a small team, unclear domain boundaries, or are building an MVP.
Move to microservices when you need independent scaling, have distinct team boundaries, or need different tech stacks per service.

API Design: REST vs gRPC vs GraphQL

REST (Representational State Transfer)

GET    /api/v1/users/123          → Get user
POST   /api/v1/users              → Create user
PUT    /api/v1/users/123          → Replace user (full update)
DELETE /api/v1/users/123          → Delete user
PATCH  /api/v1/users/123          → Partial update

Key principles:

Resource-based URLs
HTTP methods for actions
Stateless
JSON payloads (typically)
HTTP status codes for responses
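
To make these principles concrete, here is a minimal sketch of a dispatcher for the user endpoints listed above. The `handle` function, the in-memory `USERS` dict, and the specific status codes chosen are assumptions for illustration. A real service would sit behind a web framework and a database.

```python
import json
import re

# Hypothetical in-memory store standing in for a database.
USERS = {}
_next_id = 1

# Matches /api/v1/users and /api/v1/users/<numeric id>.
USER_PATH = re.compile(r"^/api/v1/users(?:/(\d+))?$")


def handle(method, path, body=None):
    """Return (status_code, response_text) for one request.

    Stateless: everything needed to serve the request arrives in the
    arguments, nothing is remembered about the caller between requests.
    """
    global _next_id
    match = USER_PATH.match(path)
    if not match:
        return 404, json.dumps({"error": "not found"})
    user_id = match.group(1)

    # Collection resource: /api/v1/users
    if user_id is None:
        if method == "POST":
            user = dict(body or {}, id=str(_next_id))
            USERS[user["id"]] = user
            _next_id += 1
            return 201, json.dumps(user)  # 201 Created
        return 405, json.dumps({"error": "method not allowed"})

    # Item resource: /api/v1/users/<id>
    if user_id not in USERS:
        return 404, json.dumps({"error": "not found"})
    if method == "GET":
        return 200, json.dumps(USERS[user_id])
    if method == "PUT":  # full replacement of the resource
        USERS[user_id] = dict(body or {}, id=user_id)
        return 200, json.dumps(USERS[user_id])
    if method == "PATCH":  # merge only the supplied fields
        USERS[user_id] = {**USERS[user_id], **(body or {}), "id": user_id}
        return 200, json.dumps(USERS[user_id])
    if method == "DELETE":
        del USERS[user_id]
        return 204, ""  # 204 No Content
    return 405, json.dumps({"error": "method not allowed"})
```

Note how the URL names the resource, the HTTP method names the action, and the status code carries the outcome.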

gRPC (gRPC Remote Procedure Calls, originally developed at Google)

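syntax = "proto3";

// Location, RideResponse, LocationUpdate, and DriverLocation are
// assumed to be defined elsewhere in the same package.
service RideService {
  // Unary call: one request, one response.
  rpc RequestRide(RideRequest) returns (RideResponse);
  // Bidirectional streaming: both sides send a sequence of messages.
  rpc StreamLocation(stream LocationUpdate) returns (stream DriverLocation);
}

message RideRequest {
  string user_id = 1;
  Location pickup = 2;
  Location dropoff = 3;
}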

Key features:

Protocol Buffers (binary serialization)
HTTP/2 with multiplexing
Bidirectional streaming
Code generation for multiple languages

GraphQL

query {
  user(id: "123") {
    name
    email
    rides(last: 5) {
      id
      status
      driver {
        name
        rating
      }
    }
  }
}

Key features:

Client specifies exact data needed
Single endpoint
Reduces over-fetching and under-fetching
Strong type system with schema
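
The idea that the client specifies exactly the data it needs can be sketched without a GraphQL library. The `select` function below is a hypothetical stand-in for a resolver: it takes a nested selection (a plain dict here, not parsed GraphQL syntax) and projects the data down to those fields. It ignores arguments such as `last: 5`, schema validation, and everything else a real GraphQL server does.

```python
def select(data, selection):
    """Project `data` down to the fields named in `selection`.

    `selection` maps a field name to True (a leaf field) or to a nested
    selection dict (an object or list of objects).
    """
    if isinstance(data, list):
        return [select(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is True else select(value, sub)
    return result


# Hypothetical record with more fields than the client wants.
user = {
    "name": "Ada",
    "email": "ada@example.com",
    "phone": "555-0100",
    "rides": [
        {"id": "r1", "status": "completed", "fare": 12.5,
         "driver": {"name": "Sam", "rating": 4.9, "license": "X1"}},
    ],
}

# The client asks only for the name, ride ids, and driver ratings.
wanted = {"name": True, "rides": {"id": True, "driver": {"rating": True}}}
trimmed = select(user, wanted)
```

Fields like `phone` and `fare` never leave the server, which is the over-fetching that a fixed REST response shape would incur.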

Comparison

Feature	REST	gRPC	GraphQL
Protocol	HTTP/1.1+	HTTP/2	HTTP
Data format	JSON	Protobuf (binary)	JSON
Performance	Good	Excellent	Good
Streaming	Limited	Bidirectional	Subscriptions
Browser support	Native	Requires proxy (gRPC-Web)	Native
Best for	Public APIs	Service-to-service	Mobile/frontend
Learning curve	Low	Medium	Medium
Caching	HTTP caching	Custom	Complex

API Gateway Pattern

flowchart TB
    C1["Mobile App"] & C2["Web App"] & C3["Third Party"]
    GW["API Gateway"]
    C1 & C2 & C3 --> GW
    subgraph Services["Backend Services"]
        S1["User Service"]
        S2["Ride Service"]
        S3["Payment Service"]
        S4["Notification Service"]
    end
    GW --> S1 & S2 & S3 & S4
    subgraph GWFeatures["Gateway Responsibilities"]
        F1["Authentication"]
        F2["Rate Limiting"]
        F3["Load Balancing"]
        F4["Request Routing"]
        F5["Response Aggregation"]
        F6["SSL Termination"]
    end
    style GW fill:#8b5cf6,color:#fff
    style Services fill:#3b82f6,color:#fff
    style GWFeatures fill:#22c55e,color:#fff

The API Gateway acts as a single entry point for all clients. It handles cross-cutting concerns so individual services don't have to.

Popular implementations: Kong, AWS API Gateway, Netflix Zuul, Envoy
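
As a rough sketch of what a gateway does on each request, the code below checks for credentials, then routes by path prefix. The `ROUTES` table, the service names, and the `forward` callback are all hypothetical. The products listed above are configured declaratively rather than written by hand like this.

```python
# Hypothetical routing table: path prefix -> backend service name.
ROUTES = {
    "/users": "user-service",
    "/rides": "ride-service",
    "/payments": "payment-service",
}


def resolve(path):
    """Return the backend for the longest matching path prefix, or None."""
    matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    return ROUTES[max(matches, key=len)] if matches else None


def gateway(path, headers, forward):
    """Apply cross-cutting checks once, then hand off to the backend.

    `forward(backend, path)` is an assumed callback that performs the
    actual upstream call and returns (status, body).
    """
    # Authentication is handled here so individual services don't repeat it.
    if "Authorization" not in headers:
        return 401, "missing credentials"
    backend = resolve(path)
    if backend is None:
        return 404, "no route"
    return forward(backend, path)
```

Matching on `p + "/"` rather than a bare prefix keeps a path like `/ridesharing` from being routed to the ride service by accident.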

Service Discovery

In a microservices architecture, services need to find each other at runtime. Instances scale up and down and their IP addresses change, so callers can't rely on hardcoded addresses.

flowchart TB
    subgraph Client["Client-Side Discovery"]
        C1["Service A"] -->|"1. Query"| R1["Service Registry"]
        R1 -->|"2. Return addresses"| C1
        C1 -->|"3. Direct call"| S1["Service B (instance 1)"]
    end
    subgraph Server["Server-Side Discovery"]
        C2["Service A"] -->|"1. Request"| LB["Load Balancer"]
        LB -->|"2. Query"| R2["Service Registry"]
        LB -->|"3. Forward"| S2["Service B (instance 2)"]
    end
    style Client fill:#3b82f6,color:#fff
    style Server fill:#8b5cf6,color:#fff

Approach	How It Works	Example
Client-side discovery	Client queries registry, picks instance	Netflix Eureka
Server-side discovery	Load balancer queries registry	AWS ELB, Kubernetes
DNS-based	Services register DNS entries	Consul, CoreDNS
Service mesh	Sidecar proxy handles routing	Istio, Linkerd
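
Client-side discovery can be sketched in a few lines. The `ServiceRegistry` and `RoundRobinClient` classes below are hypothetical and kept in memory. They assume that instances announce themselves with periodic heartbeats and that an instance is dropped once its heartbeat is older than a TTL, which is one common design but not the only one.

```python
import itertools
import time


class ServiceRegistry:
    """In-memory registry: instances stay listed while they keep heartbeating."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._instances = {}  # service name -> {address: last heartbeat time}

    def heartbeat(self, service, address):
        """Register an instance, or refresh it if already registered."""
        self._instances.setdefault(service, {})[address] = self._clock()

    def lookup(self, service):
        """Return addresses whose last heartbeat is within the TTL."""
        now = self._clock()
        live = {
            address: seen
            for address, seen in self._instances.get(service, {}).items()
            if now - seen <= self._ttl
        }
        self._instances[service] = live  # forget expired instances
        return sorted(live)


class RoundRobinClient:
    """The caller queries the registry and picks an instance itself."""

    def __init__(self, registry, service):
        self._registry = registry
        self._service = service
        self._counter = itertools.count()

    def pick(self):
        addresses = self._registry.lookup(self._service)
        if not addresses:
            raise LookupError(f"no live instances of {self._service}")
        return addresses[next(self._counter) % len(addresses)]
```

In server-side discovery the same lookup-and-pick logic lives in the load balancer instead, so the calling service only needs to know one stable address.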

Rate Limiting & Throttling

Rate limiting protects services from being overwhelmed. It's critical for public APIs and shared resources.

Common Algorithms

flowchart LR
    subgraph TB["Token Bucket"]
        direction TB
        T1["Tokens added at fixed rate"]
        T2["Request consumes a token"]
        T3["No token → rejected"]
        T1 --> T2 --> T3
    end
    subgraph SW["Sliding Window"]
        direction TB
        W1["Track requests in time window"]
        W2["Count requests"]
        W3["Over limit → rejected"]
        W1 --> W2 --> W3
    end
    style TB fill:#3b82f6,color:#fff
    style SW fill:#22c55e,color:#fff
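
The token bucket in the diagram above can be sketched as a small class. This is a single-process version under stated assumptions: the `TokenBucket` name, the `rate` and `capacity` parameters, and the injectable `clock` are choices made for this example, and tokens are refilled lazily on each call rather than by a background timer.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost=1.0):
        """Return True and consume `cost` tokens if enough are available."""
        now = self._clock()
        # Lazy refill: credit the tokens earned since the previous call,
        # never exceeding the bucket's capacity.
        earned = (now - self._last) * self.rate
        self._tokens = min(self.capacity, self._tokens + earned)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # no token available, so the request is rejected
```

For example, `TokenBucket(rate=5, capacity=10)` lets a client send 10 requests at once after a quiet period, then sustains 5 per second. Across multiple API servers, the token count and timestamp would have to live in a shared store and be updated atomically.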

Algorithm	Pros	Cons
Token Bucket	Allows bursts, smooth	Burst size and refill rate both need tuning
Leaky Bucket	Smooth output rate	No burst handling
Fixed Window	Simple	Burst at window edges
Sliding Window Log	Precise	High memory usage
Sliding Window Counter	Good balance	Approximate
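
To show why the sliding window counter is listed as approximate, here is a minimal sketch. It keeps only two counters (the current fixed window and the previous one) and weights the previous count by how much of it still overlaps the sliding window. The class name, parameters, and injectable `clock` are assumptions for this example.

```python
import time


class SlidingWindowCounter:
    """Approximate a sliding window using two fixed-window counters."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._index = int(clock() // window_seconds)  # current window number
        self._current = 0   # requests counted in the current window
        self._previous = 0  # requests counted in the window before it

    def allow(self):
        now = self._clock()
        index = int(now // self.window)
        if index != self._index:
            # Rolled into a new window. The old count only matters if the
            # new window is directly adjacent to it.
            self._previous = self._current if index == self._index + 1 else 0
            self._current = 0
            self._index = index
        # Fraction of the current window that has already elapsed.
        elapsed = (now % self.window) / self.window
        # Assume the previous window's requests were evenly spread, so the
        # part still inside the sliding window is (1 - elapsed) of them.
        estimated = self._previous * (1.0 - elapsed) + self._current
        if estimated >= self.limit:
            return False
        self._current += 1
        return True
```

The even-spread assumption is where the approximation comes from: if all of the previous window's requests arrived in its final second, the estimate undercounts them. In exchange, memory stays constant per client instead of growing with the request count as in the sliding window log.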

Rate Limit Headers

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1625097600
Retry-After: 60
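
On the client side, these headers tell you how long to back off. The `retry_delay` helper below is a hypothetical sketch. It assumes `Retry-After` carries a number of seconds (the HTTP-date form is not handled) and that `X-RateLimit-Reset` is a Unix timestamp as in the example above. The `X-RateLimit-*` headers are a convention rather than a standard, so check each API's documentation.

```python
def retry_delay(status, headers, now):
    """Return how many seconds to wait before retrying (0.0 if not limited).

    `now` is the current Unix time, passed in so the function is easy to test.
    """
    if status != 429:
        return 0.0
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in lowered:
        try:
            return max(0.0, float(lowered["retry-after"]))
        except ValueError:
            pass  # HTTP-date form: fall through to the other hints
    if "x-ratelimit-reset" in lowered:
        return max(0.0, float(lowered["x-ratelimit-reset"]) - now)
    return 1.0  # no hint from the server; this default is an assumption
```

A well-behaved client sleeps for the returned delay, ideally with a little random jitter added so that many clients don't all retry at the same instant.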

Authentication: OAuth 2.0 & JWT

OAuth 2.0 Flow

sequenceDiagram
    participant U as User
    participant A as App (Client)
    participant AS as Auth Server
    participant RS as Resource Server
    U->>A: 1. Click "Login"
    A->>AS: 2. Redirect to auth page
    U->>AS: 3. Enter credentials
    AS->>A: 4. Authorization code
    A->>AS: 5. Exchange code for tokens
    AS->>A: 6. Access token + Refresh token
    A->>RS: 7. API call with access token
    RS->>A: 8. Protected resource
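
Steps 2, 4, and 5 of this flow come down to building and parsing a few URLs. The sketch below uses the standard parameter names from the OAuth 2.0 authorization code grant (`response_type`, `client_id`, `redirect_uri`, `scope`, `state`, `grant_type`, `code`). The auth server URL and the helper function names are hypothetical, and client authentication on the token request is left out for brevity.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical auth server endpoint; real providers publish their own.
AUTHORIZE_URL = "https://auth.example.com/authorize"


def new_state():
    """Random value that ties the callback to the request that started it."""
    return secrets.token_urlsafe(16)


def build_authorization_url(client_id, redirect_uri, scope, state):
    """Step 2: the URL the app redirects the user's browser to."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


def parse_callback(callback_url, expected_state):
    """Step 4: extract the authorization code from the redirect back."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch, possible CSRF")
    if "error" in query:
        raise ValueError(f"authorization failed: {query['error'][0]}")
    return query["code"][0]


def token_request_body(code, client_id, redirect_uri):
    """Step 5: form-encoded body the app POSTs to the token endpoint."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
    })
```

The user's credentials never pass through the app: it only ever sees the short-lived authorization code and the tokens it is exchanged for.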

JWT (JSON Web Token)

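A JWT has three Base64URL-encoded parts separated by dots: Header.Payload.Signature. The payload is signed, not encrypted, so anyone holding the token can read its claims.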

eyJhbGciOiJIUzI1NiJ9.           ← Header (algorithm)
eyJ1c2VyX2lkIjoiMTIzIn0.       ← Payload (claims)
SflKxwRJSMeKKF2QT4fwpM...      ← Signature (verification)
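
The three-part structure can be reproduced with Python's standard library. This is a minimal sketch of HS256 signing and verification, and the `sign` and `verify` names are chosen for this example. It deliberately skips expiry (`exp`) and other claim checks, so use a maintained JWT library in production.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64URL-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: dict, secret: bytes) -> str:
    """Build Header.Payload.Signature using HMAC-SHA256."""
    header = _b64url(json.dumps({"alg": "HS256"}, separators=(",", ":")).encode())
    payload = _b64url(json.dumps(claims, separators=(",", ":")).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def verify(token: str, secret: bytes) -> dict:
    """Return the claims if the signature is valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    # Pin the algorithm instead of trusting whatever the header claims.
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))
```

Calling `sign({"user_id": "123"}, secret)` yields the same header and payload segments shown above. Only the signature depends on the secret, which is why any service holding the secret can verify a token without calling the auth server.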

Aspect	Session-based	JWT
Storage	Server-side	Client-side
Scalability	Requires shared store	Stateless, scales easily
Revocation	Easy (delete session)	Hard (need blocklist)
Size	Small session ID	Larger token
Best for	Traditional web apps	Microservices, APIs

Idempotency

An idempotent operation leaves the system in the same state whether it's called once or many times. This is critical in distributed systems, where a client that times out can't tell whether its request was processed and has to retry.

HTTP Method	Idempotent?	Example
GET	Yes	Fetch user profile
PUT	Yes	Update entire resource
DELETE	Yes	Remove resource
POST	No	Create new resource
PATCH	Not guaranteed	Partial update (idempotent only if it sets absolute values, not increments)

Idempotency Key Pattern

POST /api/v1/payments
Idempotency-Key: "abc-123-unique-key"

{
  "amount": 50.00,
  "currency": "USD"
}

The server stores the result keyed by the idempotency key. If the same key is sent again, the server returns the stored result instead of processing again. This prevents duplicate payments, duplicate orders, etc.
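
The store-and-replay behavior described above can be sketched as follows. The `IdempotentProcessor` class is hypothetical and keeps results in memory. A real deployment would use a shared store with an expiry, and rejecting a reused key that arrives with a different payload (422 here) is an assumption of this sketch rather than something the pattern mandates.

```python
import threading


class IdempotentProcessor:
    """Run `process` at most once per idempotency key and replay the result."""

    def __init__(self, process):
        self._process = process  # function: request -> (status, body)
        self._results = {}       # key -> (original request, stored response)
        self._lock = threading.Lock()

    def handle(self, idempotency_key, request):
        # One lock around check-and-process keeps two concurrent retries
        # from both running the operation. It also serializes all requests,
        # which is acceptable for a sketch but not for production traffic.
        with self._lock:
            if idempotency_key in self._results:
                original, response = self._results[idempotency_key]
                if original != request:
                    return 422, {"error": "key reused with a different payload"}
                return response  # replay, do not process again
            response = self._process(request)
            self._results[idempotency_key] = (request, response)
            return response
```

With this in front of the payment handler, a client that retries `POST /api/v1/payments` after a timeout gets the original response back and the card is charged once.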

Practice Problem: Design APIs for a Ride-Sharing Service

Core Entities

User (riders and drivers)
Ride (a trip from pickup to dropoff)
Payment (transaction for a ride)
Location (real-time GPS coordinates)

API Design

# User Service
POST   /api/v1/users                    → Register
POST   /api/v1/auth/login               → Login (returns JWT)
GET    /api/v1/users/{id}/profile       → Get profile

# Ride Service
POST   /api/v1/rides                     → Request a ride
GET    /api/v1/rides/{id}                → Get ride details
PUT    /api/v1/rides/{id}/accept         → Driver accepts
PUT    /api/v1/rides/{id}/start          → Start ride
PUT    /api/v1/rides/{id}/complete       → Complete ride
PUT    /api/v1/rides/{id}/cancel         → Cancel ride
GET    /api/v1/rides/{id}/eta            → Get ETA

# Location Service (gRPC for real-time)
rpc UpdateDriverLocation(stream LocationUpdate) returns (Ack)
rpc SubscribeRiderLocation(RideId) returns (stream DriverLocation)

# Payment Service
POST   /api/v1/payments                  → Process payment
GET    /api/v1/payments/{id}             → Get payment status
POST   /api/v1/payments/{id}/refund      → Refund
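
Behind the ride endpoints sits a small state machine. The sketch below shows one way the accept, start, complete, and cancel actions might be validated. The status names, the transition table, and the choice of 409 for an invalid transition are assumptions made for this example, not part of the API above.

```python
# Hypothetical ride statuses and the transitions each action allows.
TRANSITIONS = {
    ("requested", "accept"): "accepted",
    ("accepted", "start"): "in_progress",
    ("in_progress", "complete"): "completed",
    ("requested", "cancel"): "cancelled",
    ("accepted", "cancel"): "cancelled",
}

# The status each action leads to, used to recognize repeated calls.
TARGETS = {
    "accept": "accepted",
    "start": "in_progress",
    "complete": "completed",
    "cancel": "cancelled",
}


def apply_action(status, action):
    """Return (http_status, new_ride_status) for PUT /rides/{id}/<action>."""
    if action not in TARGETS:
        return 404, status  # unknown action
    if status == TARGETS[action]:
        # Repeating an action that already took effect is a no-op,
        # which keeps the PUT endpoints idempotent under retries.
        return 200, status
    new_status = TRANSITIONS.get((status, action))
    if new_status is None:
        return 409, status  # conflict: not allowed from the current status
    return 200, new_status
```

For instance, a driver's app that retries the accept call after a dropped connection gets 200 both times, while an attempt to cancel a completed ride is rejected.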

Summary

Concept	Description
Monolith vs Microservices	Start simple, split when needed
REST	Resource-based, widely adopted
gRPC	High-performance service-to-service
GraphQL	Client-driven queries, reduces over-fetching
API Gateway	Single entry point, cross-cutting concerns
Service Discovery	Dynamically locate service instances
Rate Limiting	Protect services from overload
OAuth 2.0 / JWT	Secure authentication for distributed systems
Idempotency	Safe retries in unreliable networks

Key Takeaways

Choose your API style based on your use case: REST for public APIs, gRPC for internal services, GraphQL for flexible frontends
An API Gateway simplifies client interactions and centralizes cross-cutting concerns
Rate limiting is essential for any production API
Design every write API to be idempotent to handle retries safely

Practice Problems

Problem 1: Basic

Design a REST API for a simple blog platform with users, posts, and comments. Define the endpoints, HTTP methods, and response codes.

Problem 2: Intermediate

You're migrating a monolithic e-commerce app to microservices. Identify the service boundaries, define the APIs between services, and explain how you'd handle a transaction that spans multiple services (e.g., placing an order).

Challenge

Design a rate limiting system for a public API that supports: per-user limits, per-endpoint limits, and global limits. The system must work across multiple API server instances. Describe the algorithm, data store, and how you handle edge cases like clock skew.

Next up: In Day 7, we'll walk through a complete system design interview — designing a URL Shortener from scratch.