Design Instagram Feed

Designing an Instagram-like feed is a classic system design problem that tests your ability to balance scalability, latency, consistency, and user experience.

At first glance, showing posts in a feed seems simple—fetch posts from people a user follows and display them. However, when the system must serve hundreds of millions of users with near real-time updates, the problem becomes significantly more complex.

In this article, we will walk through a high-level design (HLD) of an Instagram feed, focusing on how to efficiently generate, store, and serve feeds while maintaining performance at scale.

Understanding Requirements

The core requirement of an Instagram feed is to show a list of posts created by users that a person follows. These posts should ideally be ordered by recency or relevance, depending on the product requirements. The system should support uploading posts, following/unfollowing users, and retrieving feeds quickly.

A naive approach where we query all followed users' posts at request time will not scale. If a user follows 1000 people, fetching posts dynamically becomes slow and expensive. Therefore, the system must optimize for read-heavy workloads, since feed retrieval happens far more frequently than post creation.

Functional Requirements

1. The system should allow users to create posts that may include images, videos, or captions.
2. Users must be able to follow and unfollow other users, forming a directed social graph that determines feed content.
3. A user should be able to retrieve their feed, which consists of posts from accounts they follow. The feed may be ordered either by recency or by relevance using ranking algorithms.
4. The system should support real-time or near real-time updates, meaning that newly created posts should appear in followers' feeds with minimal delay.
5. Users should also be able to interact with posts through likes, comments, and potentially shares.
6. Additionally, users may access the platform from multiple devices, so the feed experience should remain consistent across sessions.

Non-Functional Requirements

1. The system must be highly scalable, capable of supporting hundreds of millions of users and handling massive volumes of read and write operations. Since feed retrieval is the most frequent operation, the system must be optimized for low latency reads.
2. Availability is critical, as users expect the feed to load reliably at all times. The system should be designed with fault tolerance so that failures in individual components do not bring down the entire service.
3. The system should maintain eventual consistency, ensuring that new posts propagate to followers' feeds without requiring strict synchronization.
4. Additionally, performance under high fanout conditions—such as celebrity users with millions of followers—must be handled efficiently.
5. Finally, the system should support personalization and ranking, enabling machine learning models to reorder feed items based on user behavior, engagement patterns, and relevance signals.

Capacity Estimation

Assume the platform has 100 million daily active users. If each user creates an average of 2 posts per day, the system processes:

Total Posts per Day = 100 million × 2 = 200 million posts/day

To estimate throughput: Posts per Second = 200,000,000 / 86,400 ≈ 2,315 posts/second

However, the read workload is significantly higher. If each user opens the app 10 times per day and fetches their feed each time:

Feed Requests per Day = 100 million × 10 = 1 billion feed requests/day
Feed Requests per Second = 1,000,000,000 / 86,400 ≈ 11,574 requests/second

During peak hours such as evenings, traffic can spike by 3x to 5x:
Peak Read Throughput ≈ 35K to 60K feed requests/second

A major complexity arises from fanout. If an average user has 300 followers, then each post potentially generates:

Fanout Writes per Post = 300 feed updates
Total Fanout Operations per Day = 200 million × 300 = 60 billion feed insertions/day

This highlights why naive designs fail and why systems must rely on fanout optimization strategies, caching, and hybrid push-pull models.

If even 5% of users are active at the same time: Concurrent Active Users = 100 million × 5% = 5 million users
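These estimates are easy to verify with a few lines of Python using the assumptions above:
DAU = 100_000_000          # daily active users
POSTS_PER_USER = 2         # posts per user per day
OPENS_PER_USER = 10        # feed fetches per user per day
AVG_FOLLOWERS = 300        # average fanout per post
SECONDS_PER_DAY = 86_400

posts_per_day = DAU * POSTS_PER_USER                 # 200 million
write_qps = posts_per_day / SECONDS_PER_DAY          # ~2,315 posts/sec

feed_requests_per_day = DAU * OPENS_PER_USER         # 1 billion
read_qps = feed_requests_per_day / SECONDS_PER_DAY   # ~11,574 req/sec
peak_read_qps = (read_qps * 3, read_qps * 5)         # ~35K to ~58K req/sec

fanout_per_day = posts_per_day * AVG_FOLLOWERS       # 60 billion feed inserts
print(write_qps, read_qps, peak_read_qps, fanout_per_day)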

These numbers justify the need for distributed systems, horizontal scaling, load balancing, and high-performance caching layers.

Clearly stating such assumptions in an interview strengthens your design by making every architectural decision intentional and justified rather than arbitrary.

Request Flow

When a user opens the Instagram application (web or mobile), the system follows a structured sequence of steps to deliver a fast and personalized feed experience. Below is a typical request flow assuming a modern cloud-based deployment (for example, AWS-like architecture).

DNS Resolution: The client (browser or mobile app) needs to resolve the backend endpoint. It first checks the local DNS cache; if unavailable, it queries a DNS resolver. A managed DNS service returns the optimal endpoint based on latency and health.
Client → DNS Resolver → DNS Service → CDN / API Endpoint

HTTPS Connection: The client establishes a secure TLS/HTTPS connection to the resolved endpoint. Typically, static assets are served via a CDN to reduce latency globally.

Static App Load: The frontend application (HTML, CSS, JavaScript) is loaded from the CDN, often backed by object storage. This ensures fast initial load times regardless of user location.
Client ← CDN (Static Storage Origin)

Application Bootstrap: The JavaScript bundle initializes the app, validates session state, and prepares API clients. If the user is not authenticated, login flow is triggered; otherwise, the app proceeds to fetch the feed.

Authentication & Session Validation: The client sends authentication tokens (JWT/session cookies) to backend APIs to validate identity and fetch user context such as profile and follow graph.
Client → API Gateway → Load Balancer → Auth/User Service

Feed Request: Once authenticated, the client requests the user's feed. This is one of the most performance-critical APIs in the system.
Client → API Gateway → Load Balancer → Feed Service

Feed Retrieval (Cache First): The Feed Service first checks a cache layer (e.g., Redis) for precomputed feed data. If available, it returns immediately. Otherwise, it falls back to the feed store or recomputes using a hybrid push-pull strategy.
Feed Service → Cache → (Miss) → Feed Store / Post Service

Ranking & Personalization: Before returning results, the system may pass candidate posts through a ranking layer that orders them based on engagement signals, user preferences, and machine learning models.

Response to Client: The ranked feed (list of post IDs + metadata) is returned to the client. Media content (images/videos) is typically served separately via CDN for efficiency.
Client ← Feed Service (via API Gateway)

Media Fetching: The client loads images/videos directly from the CDN using URLs embedded in the feed response. This avoids overloading backend services.
Client → CDN (Media Storage)

Real-Time Updates: For near real-time updates (likes, comments, new posts), the client may maintain a lightweight persistent connection (WebSocket/HTTP streaming) or rely on periodic polling.
Client ↔ Gateway / Notification Service ↔ Feed/Engagement Services

User Interactions: Actions such as likes, comments, or follows are sent asynchronously to backend services and may trigger feed updates or ranking changes.
Client → API Gateway → Engagement Service → Queue → Feed Update Pipeline

This flow highlights how the system is optimized for low-latency reads, high fanout writes, and efficient content delivery. By separating static content, feed metadata, and media delivery, the architecture ensures scalability while maintaining a smooth user experience.

Major Components

Now that we understand the request flow, the next step is to break down each major component of the Instagram Feed system and understand its role, responsibilities, and design considerations.

Client

From the client perspective (mobile app or web), the system interacts through multiple flows depending on the action being performed. When the user opens the app, the client loads the UI and immediately prepares to fetch the feed and user context.

For actions like login, profile fetch, follow/unfollow, and feed retrieval, the client makes standard HTTPS API calls. These are stateless request-response interactions.

For interactive features such as likes, comments, and notifications, the client may maintain lightweight persistent connections or polling mechanisms to receive near real-time updates. The client also handles pagination, lazy loading, and media rendering, ensuring a smooth scrolling experience.

CDN / Edge Layer

The CDN layer is responsible for serving static and media content efficiently. This includes HTML, CSS, JavaScript, images, and videos. Since Instagram is heavily media-driven, this layer is critical for performance.

Content is cached across globally distributed edge nodes, ensuring that users fetch data from the nearest location. This significantly reduces latency and offloads backend systems. Media URLs embedded in feed responses typically point directly to CDN endpoints.

This separation ensures that backend services focus only on metadata and business logic, while the CDN handles heavy content delivery.

API Gateway

The API Gateway acts as the single entry point for all client requests. It routes incoming traffic to appropriate backend services such as User Service, Feed Service, and Post Service.

It is responsible for authentication validation, rate limiting, request routing, and load balancing. By centralizing these concerns, the system ensures consistency and simplifies backend services.

In large-scale deployments, the gateway sits behind a load balancer and scales horizontally to handle millions of requests per second.
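As a concrete illustration of one gateway concern, here is a minimal token-bucket rate limiter sketch (in-memory and per-instance only; a real deployment would typically back this with a shared store such as Redis):
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate=5, capacity=10)  # ~5 requests/sec with bursts of 10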

User Service

The User Service manages user identity, profiles, authentication state, and the follow graph. It uses a polyglot persistence approach, choosing different storage systems based on access patterns and consistency needs.

1. Registration & Profile

User accounts and profiles require strong consistency, constraints, and transactional guarantees, so they are stored in an RDBMS.
-- Table: users

CREATE TABLE users (
    user_id BIGSERIAL PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    email TEXT UNIQUE NOT NULL,
    password_hash TEXT NOT NULL,
    created_at TIMESTAMP
);

-- Table: user_profiles

CREATE TABLE user_profiles (
    user_id BIGINT PRIMARY KEY,
    bio TEXT,
    avatar_url TEXT,
    privacy_setting TEXT,
    updated_at TIMESTAMP
);

2. Authentication

Authentication is handled using tokens (e.g., JWT or session tokens), with Redis used as a fast session store. Flow:
def login(username, password):
    user = db.find_user(username)

    if user and verify(password, user.password_hash):
        token = generate_token(user.user_id)
        redis.set(token, user.user_id, ttl=86400)  # 1 day expiry
        return token

    return None  # unknown user or wrong password
On each request:
def authenticate(token):
    user_id = redis.get(token)  # None if the token is missing or expired
    return user_id
JWT can be stateless, but many systems still use Redis for revocation/control.
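For comparison, a minimal stateless validation sketch using the PyJWT library (the secret value is an assumption):
import jwt  # PyJWT library

SECRET = "replace-with-real-secret"  # assumed shared signing key

def authenticate_stateless(token):
    try:
        claims = jwt.decode(token, SECRET, algorithms=["HS256"])
        return claims["user_id"]      # identity travels inside the token
    except jwt.InvalidTokenError:     # expired, tampered, or malformed
        return None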

3. Follow Graph

The follow graph is highly read-heavy (fanout, feed generation) and can grow extremely large, making it a good fit for Cassandra. Since Cassandra is query-driven, we maintain two tables:
-- Who follows a user (used in fanout)

CREATE TABLE followers_by_user (
    user_id text,
    follower_id text,
    followed_at timestamp,
    PRIMARY KEY ((user_id), follower_id)
);

-- Whom a user follows (used in feed generation)

CREATE TABLE following_by_user (
    user_id text,
    following_id text,
    followed_at timestamp,
    PRIMARY KEY ((user_id), following_id)
);

def follow(user_id, target_user_id):
    cassandra.insert_follow(user_id, target_user_id)
    publish_event("FOLLOW_CREATED", user_id, target_user_id)


def unfollow(user_id, target_user_id):
    cassandra.delete_follow(user_id, target_user_id)
    publish_event("FOLLOW_REMOVED", user_id, target_user_id)
These events are consumed by the Fanout Service, which adjusts feed relationships by updating follower feeds and maintaining correct content distribution.

4. Caching Layer (Follower Optimization)

Follower lists are frequently accessed during fanout, so caching is critical:
followers = redis.get(user_id)

if not followers:
    followers = cassandra.get_followers(user_id)
    redis.set(user_id, followers, ttl=300)
This reduces load on Cassandra during high fanout operations.

Post Service

The Post Service is responsible for handling post creation, storage, retrieval, and lifecycle management. It sits at the core of content generation in the system and must be designed for high write throughput, durability, and efficient read access.

When a user uploads a post, the service does not directly store heavy media in the database. Instead, it follows a decoupled upload approach: the media is uploaded directly to object storage (such as S3-like systems), and the database stores only metadata and references (URLs/paths) to that media.

This separation ensures scalability and prevents the database from being bloated with large binary data. The typical flow is:

1. Client first calls /create_post_init on the Post Service with basic intent (caption, media type). The service creates a post placeholder (status = PROCESSING) and returns a pre-signed upload URL along with a storage path (e.g., user123/post456/original.jpg).

2. Post Service generates the pre-signed URL for object storage (S3-like), embedding permissions, expiry, and the exact storage key so the client can upload securely without exposing backend credentials.

3. Client uploads the file directly to S3 using this URL. This avoids routing large media through backend servers and enables high-throughput, scalable uploads.

4. Once the upload completes, S3 emits an ObjectCreated event to a configured destination (typically SQS/Kafka). This event acts as the source of truth that the file is fully available in storage.

5. A Media Processing Service consumes this event, generates multiple variants (thumbnail, medium, high-res, compressed formats), stores them back in object storage, and updates the Post Service (or database) with final media URLs.

6. Post Service then updates the post record from PROCESSING → READY, attaching all media references. Only after this step does the post become visible in feeds/profile.

This flow ensures non-blocking uploads, reliable processing (event-driven), and optimized media delivery across devices and networks.

A pre-signed URL is a temporary, secure URL generated by the backend that allows a client to directly upload or download a file from object storage (like S3) without exposing credentials.

It contains permissions (e.g., upload/download), an expiry time, and a signed token that authorizes the operation.

This lets clients transfer large files directly to storage, reducing load on backend servers while keeping access secure and controlled.
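As an illustration, here is how a backend might mint a pre-signed upload URL with boto3; the bucket name and key layout are assumptions for this sketch:
import boto3

s3 = boto3.client("s3")

def create_upload_url(user_id, post_id, filename):
    # Hypothetical key layout mirroring user123/post456/original.jpg above
    key = f"{user_id}/{post_id}/{filename}"
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "media-uploads", "Key": key},  # bucket name is assumed
        ExpiresIn=900,  # the URL expires after 15 minutes
    )
    return url, key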

Data Model and Schema Design

Images and videos are stored in object storage systems, while databases only keep references to those files. In large-scale systems, storing binary files (images/videos) directly in databases is inefficient and does not scale. Instead, the system uses:

1. Relational / NoSQL Database → stores structured metadata
2. Object Storage (Blob Storage) → stores actual media files
3. CDN → serves media efficiently to users

So when you see media_urls in the schema, those are links pointing to files stored in object storage—not the files themselves. The post metadata is typically stored in a distributed database. The choice depends on scale and access patterns:

1. Relational DB (e.g., PostgreSQL/MySQL) → strong consistency, structured queries
2. NoSQL DB (e.g., Cassandra/DynamoDB) → high write throughput, horizontal scaling

At Instagram scale, systems often prefer NoSQL databases like Cassandra for write-heavy workloads such as post creation. Unlike relational systems, Cassandra requires a query-driven schema design, where we create multiple tables per access pattern instead of relying on joins or secondary indexes. This means data duplication is intentional to achieve low latency and high scalability.
-- Table: posts_by_user
-- Optimized for: "Fetch posts of a user ordered by latest"

CREATE TABLE posts_by_user (
    user_id text,
    created_at timestamp,
    post_id text,
    caption text,
    media_urls list<text>,   -- final processed CDN URLs
    media_path text,         -- original S3 path (source of truth)
    media_type text,
    visibility text,
    status text,             -- PROCESSING / READY / FAILED / DELETED
    updated_at timestamp,
    PRIMARY KEY ((user_id), created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- PRIMARY KEY breakdown:
-- (user_id)     → Partition Key
-- created_at    → Clustering Column (sorted DESC)
-- post_id       → Uniqueness within same timestamp
In Cassandra, a row is uniquely identified by the full PRIMARY KEY, which combines the partition key and all clustering columns.

If two rows share the same values for this combination, Cassandra treats them as the same row, and the later write will overwrite the earlier one.

So to store multiple distinct posts safely, the key must be unique—for example:
PRIMARY KEY ((user_id), created_at, post_id), where post_id ensures uniqueness even if timestamps collide.
This table ensures that all posts of a user are stored together and are already sorted by recency, making profile and user-post queries extremely efficient without runtime sorting.

However, this table alone is not sufficient. Cassandra requires separate tables for different query patterns:
CREATE TABLE posts_by_id (
    post_id text PRIMARY KEY,
    user_id text,
    caption text,
    media_urls list<text>,   -- processed variants
    media_path text,         -- original upload location
    media_type text,
    visibility text,
    status text,
    created_at timestamp,
    updated_at timestamp
);
Post IDs are typically generated as Snowflake IDs. A Snowflake ID is a globally unique 64-bit identifier generated without central coordination. It embeds a timestamp, machine ID, and sequence number, making IDs both unique and time-ordered. This aligns perfectly with Cassandra's time-based clustering, enabling efficient retrieval of recent posts.
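A minimal sketch of a Snowflake-style generator, assuming the common 41-bit timestamp / 10-bit machine ID / 12-bit sequence layout:
import time

EPOCH_MS = 1288834974657  # custom epoch (Twitter's original, for illustration)

class SnowflakeGenerator:
    def __init__(self, machine_id):
        self.machine_id = machine_id & 0x3FF  # 10 bits
        self.sequence = 0
        self.last_ts = -1

    def next_id(self):
        ts = int(time.time() * 1000)
        if ts == self.last_ts:
            # Same millisecond: bump the 12-bit sequence
            # (a real implementation waits for the next ms on overflow)
            self.sequence = (self.sequence + 1) & 0xFFF
        else:
            self.sequence = 0
            self.last_ts = ts
        # 41-bit timestamp | 10-bit machine ID | 12-bit sequence
        return ((ts - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence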

Overall, this design embraces Cassandra principles:

1. Multiple tables per query instead of joins
2. Denormalization and duplication for performance
3. Fast sequential reads within partitions
4. Horizontal scalability with predictable latency

This is exactly why Cassandra is widely used in large-scale feed systems where read latency and write throughput are critical.

Post Creation Flow

When a user creates a post, the system follows a structured pipeline to ensure reliability and scalability.
def create_post(user_id, caption, media_files):
    # Step 1: Upload media to object storage
    media_urls = []

    for file in media_files:
        url = upload_to_object_store(file)
        media_urls.append(url)

    # Step 2: Create post metadata
    post_id = generate_post_id()
    save_post_metadata(post_id, user_id, caption, media_urls)

    # Step 3: Publish event for downstream systems
    publish_event("POST_CREATED", post_id, user_id)

    return post_id
The critical design choice here is the use of asynchronous event publishing. Once a post is created, an event is pushed to a message queue (Kafka/SQS-like systems). This event triggers downstream services such as:

1. Media Processing Service → generate resized images, thumbnails, and compressed variants
2. Feed Service → perform fanout and update followers’ feeds
3. Notification Service → send alerts to followers
4. Analytics Pipeline → track engagement and usage metrics

This decoupling ensures low latency post creation while allowing heavy processing to happen asynchronously and scale independently.

Media Processing Service

While the Post Service handles upload and metadata storage (as described above), it does not directly optimize media for different devices and network conditions. For that, we introduce a dedicated Media Processing Service, which works asynchronously to transform raw uploads into multiple optimized variants.

This service ensures that users receive the right version of media based on device type, screen size, and network quality, significantly improving performance and user experience. The Media Processing Service is responsible for:

1. Image Resizing → Generate multiple resolutions (thumbnail, medium, high)
2. Compression → Reduce file size for faster delivery
3. Format Conversion → Convert to efficient formats (e.g., WebP, AVIF)
4. Video Transcoding → Create multiple bitrates/resolutions (240p, 480p, 720p, etc.)

The processing is asynchronous to keep post creation fast and responsive.
def create_post(user_id, media_file):
    # Step 1: Upload original file
    original_path = upload_to_object_store(media_file)

    # Step 2: Save metadata (initial state)
    post_id = save_post_metadata(user_id, original_path)

    # Step 3: Trigger async media processing
    publish_event("MEDIA_PROCESS", post_id, original_path)

    return post_id
The Media Processing Service consumes the event and generates variants:
def process_media(post_id, original_path):

    # Generate multiple variants
    thumbnail = resize(original_path, "150x150")
    medium = resize(original_path, "720p")
    high = resize(original_path, "1080p")

    # Upload processed files
    urls = upload_all([thumbnail, medium, high])

    # Update metadata
    update_post_media(post_id, urls)
The processed media is stored back in object storage, and the database stores only references to these variants.
-- Conceptual media variants (stored in DB)

media_urls = {
    "thumbnail": ".../image_150.jpg",
    "medium": ".../image_720.jpg",
    "high": ".../image_1080.jpg"
}
Alternatively, this can be normalized via a post_media table for better flexibility.

Feed Service

The Feed Service is responsible for serving the home feed under strict low-latency constraints. Instead of computing feeds on every request, it relies on a precomputed feed store (populated by Fanout Service) and performs lightweight enrichment before returning results.

This makes the system read-optimized, shifting heavy computation to write-time and keeping read paths fast and predictable. When a user opens the app, the Feed Service follows a step-by-step pipeline:

Step 1: Fetch Feed IDs (Fast DB Read)
Fetch a page of post IDs using cursor-based pagination.
post_ids = feed_store.fetch(user_id, cursor, limit=20) 
Cursor-based pagination uses a pointer (cursor)—typically a value like created_at or post_id—to fetch the next set of results instead of using OFFSET.
Here's a simple, practical example of cursor-based pagination using created_at as the cursor.
First Request (Initial Load)
SELECT post_id, created_at FROM feed_by_user WHERE user_id = '123' LIMIT 3;
Response:
posts = [
    {post_id: "p1", created_at: 105},
    {post_id: "p2", created_at: 100},
    {post_id: "p3", created_at: 95}
]
next_cursor = 95
Here, 95 (last item's timestamp) becomes the cursor.
Next Request (Using Cursor)
SELECT post_id, created_at FROM feed_by_user WHERE user_id = '123' AND created_at < 95 LIMIT 3;
Response:
posts = [
    {post_id: "p4", created_at: 90},
    {post_id: "p5", created_at: 85},
    {post_id: "p6", created_at: 80}
]
next_cursor = 80
Step 2: Hydrate Metadata (Cache First)
Retrieve full post objects using cache-first strategy.
posts = post_cache.get_bulk(post_ids)

missing = find_missing(posts)

if missing:
    db_posts = fetch_from_post_service(missing)
    post_cache.set_bulk(db_posts)
    posts.update(db_posts)
Cache hit requests are served in a few milliseconds, ensuring fast response times, while a cache miss triggers a fallback to the Post Service to fetch the required data.
Step 3: Apply Ranking / Personalization
Reorder posts based on engagement signals.
ranked = rank(posts, user_id) 
Signals may include likes, comments, and shares, along with user interaction history and recency decay, which together help determine content relevance and ranking.

Note: This is usually a lightweight ranking layer in real-time; heavy ML runs offline.
Step 4: Return Response
Return final feed along with next cursor.
return { "posts": ranked, "next_cursor": ranked[-1].created_at } 

Key Optimizations

The Feed Service achieves performance using:

1. Precomputed feeds: Feed is already materialized via fanout → no joins or aggregation
2. Cursor-based pagination: Uses created_at instead of OFFSET → scalable and efficient
3. Cache-first reads: Most metadata served from Redis → reduces DB load
4. Partial hydration: Only fetch required posts → avoids over-fetching

Fallback Path (Pull-Based)

In rare cases (cache miss, cold start, celebrity posts), the system may construct feed dynamically:
def fallback_feed(user_id):
    following = get_following(user_id)

    posts = []

    for user in following:
        posts.extend(fetch_recent_posts(user))

    return merge_and_sort(posts)
This is expensive and therefore used only as a fallback when faster paths (like cache) are not available.

Fanout Service

The Fanout Service is responsible for writing posts into followers' feeds. It is triggered asynchronously when a post is created via a message queue (Kafka/SQS). This ensures that post creation remains fast while heavy fanout work happens in the background.

Instead of computing feeds at read time, the system performs a write-time expansion: it takes a single post and distributes it to all followers by inserting entries into the feed store.
def handle_post_created(event):
    post_id = event.post_id
    author_id = event.author_id

    followers = user_service.get_followers(author_id)

    for follower in followers:
        feed_store.insert(
            user_id=follower,
            post_id=post_id,
            created_at=event.created_at,
            author_id=author_id
        )
Here's the clean, production-ready feed_by_user schema for the Feed Service:
-- Table: feed_by_user
-- Optimized for: "Fetch home feed (infinite scroll)"

CREATE TABLE feed_by_user (
    user_id text,
    created_at timestamp,
    post_id text,
    author_id text,
    PRIMARY KEY ((user_id), created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- PRIMARY KEY breakdown:
-- (user_id)     → Partition Key (all feed items of a user stored together)
-- created_at    → Clustering Column (sorted by recency DESC)
-- post_id       → Ensures uniqueness within same timestamp
Each insert writes into the follower's partition in feed_by_user:
INSERT INTO feed_by_user (user_id, created_at, post_id, author_id) VALUES (?, ?, ?, ?);
This ensures that when the follower opens the app, their feed is already precomputed and ready. This design trades expensive writes for extremely fast reads.

Feed Store Partition Hotspot Risk

The issue arises from using user_id as the sole partition key in feed_by_user. For highly active users, this leads to very large partitions over time, as all their feed data accumulates in a single location.

Additionally, frequent fanout writes targeting the same user can create hot partitions, where a single node becomes overloaded with read/write traffic. This can degrade performance, increase latency, and even cause instability under heavy load.

The solution is to introduce partition bucketing, where data is split into smaller chunks using a time-based bucket (e.g., day or week). Instead of a single partition per user, the key becomes a combination like:
PRIMARY KEY ((user_id, bucket), created_at, post_id)
Here, bucket (such as 2026-05-03) distributes data across multiple partitions, ensuring no single partition grows too large or becomes a hotspot. During reads, the Feed Service fetches data from recent buckets (e.g., last few days) and merges results.

This approach slightly increases read complexity but significantly improves write scalability, load distribution, and system stability at scale.
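A sketch of the bucketed read path, assuming a bucket-aware variant of feed_store.fetch that takes (user_id, bucket) as the partition key:
from datetime import datetime, timedelta

def bucket_for(ts):
    return ts.strftime("%Y-%m-%d")  # daily buckets, e.g. "2026-05-03"

def fetch_bucketed_feed(user_id, days=3, limit=20):
    items = []
    now = datetime.utcnow()

    # Walk backwards through recent daily buckets until the page is full
    for i in range(days):
        bucket = bucket_for(now - timedelta(days=i))
        items.extend(feed_store.fetch(user_id, bucket, limit=limit))
        if len(items) >= limit:
            break

    return sorted(items, key=lambda p: p.created_at, reverse=True)[:limit]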
For users with millions of followers (celebrities), pushing to every follower is too expensive. So the system uses a hybrid model:

For normal users, use fanout-on-write (push) to precompute and store feeds, while for high-fanout users, use fanout-on-read (pull) to avoid excessive write amplification and compute feeds dynamically.

In the pull model, their posts are not pre-inserted into every feed. Instead, the Feed Service fetches them dynamically and merges them at read time.
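A minimal sketch of the hybrid decision, assuming a FANOUT_THRESHOLD cutoff and a hypothetical get_followed_celebrities helper; the remaining calls reuse the services introduced earlier:
FANOUT_THRESHOLD = 10_000  # assumed cutoff for "high-fanout" accounts

def handle_post_created(event):
    followers = user_service.get_followers(event.author_id)

    if len(followers) < FANOUT_THRESHOLD:
        # Push model: materialize the post into every follower's feed
        for follower in followers:
            feed_store.insert(user_id=follower, post_id=event.post_id,
                              created_at=event.created_at, author_id=event.author_id)
    # else: skip fanout entirely; readers pull this author's posts at read time

def read_feed(user_id, cursor, limit=20):
    posts = feed_store.fetch(user_id, cursor, limit)

    # Merge in recent posts from followed high-fanout accounts
    for author in get_followed_celebrities(user_id):
        posts.extend(fetch_recent_posts(author))

    return merge_and_sort(posts)[:limit]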

Fanout operations must be reliable under retries and failures. A common technique is to rely on PRIMARY KEY uniqueness (user_id, created_at, post_id), so duplicate inserts are idempotent and naturally overwrite instead of creating duplicate rows.

Engagement Service

The Engagement Service is responsible for handling all user interactions on posts, including likes, comments, and optionally shares. This service is designed to handle high-frequency, write-heavy operations independently of the Post Service.

Separating this service ensures that spikes in engagement traffic (e.g., viral posts) do not impact post creation or feed retrieval.

Responsibilities

1. Record likes and unlikes
2. Store and retrieve comments
3. Maintain engagement counters (like_count, comment_count)
4. Emit events for notifications and ranking

Data Model (Cassandra)

-- Likes per post

CREATE TABLE likes_by_post (
    post_id text,
    user_id text,
    created_at timestamp,
    PRIMARY KEY ((post_id), user_id)
);

-- Comments per post (time-ordered)

CREATE TABLE comments_by_post (
    post_id text,
    created_at timestamp,
    comment_id text,
    user_id text,
    text text,
    PRIMARY KEY ((post_id), created_at, comment_id)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- Engagement counters (high write throughput)

CREATE TABLE post_engagement_counters (
    post_id text PRIMARY KEY,
    like_count counter,
    comment_count counter,
    share_count counter
);

Interaction Flow

def like_post(user_id, post_id):
    cassandra.insert_like(post_id, user_id)
    cassandra.increment_counter(post_id, "like_count")

    publish_event("POST_LIKED", user_id, post_id)
def add_comment(user_id, post_id, text):
    comment_id = generate_id()

    cassandra.insert_comment(post_id, comment_id, user_id, text)
    cassandra.increment_counter(post_id, "comment_count")

    publish_event("COMMENT_ADDED", user_id, post_id)
These events are consumed by:

1. Notification Service → notify post owner
2. Ranking system → update relevance signals

Multi-Device Consistency

Users often access the platform from multiple devices (mobile, web, tablet), so the system must ensure a consistent feed and interaction state across sessions.

The system relies on centralized backend state, not client-side storage.

- Feed data → stored in feed store (Cassandra)
- Post data → stored in Post Service DB
- Engagement → stored in Engagement Service

Since all devices read from the same backend, they naturally converge to the same state, ensuring consistency across devices.

Feed Ranking Service

The Feed Ranking Service is responsible for ordering posts in a user's feed based on relevance rather than just recency. While the Feed Service retrieves a pool of candidate posts (typically a few hundred), the ranking layer selects and reorders the most important ones using personalization signals.

These signals include user engagement history, likes/comments, author affinity, and time decay. In practice, ranking is implemented as a two-stage pipeline: first, candidate generation (fast and broad), and second, scoring (compute-heavy but limited to top items).

The scoring logic is often powered by machine learning models trained offline and served online with strict low-latency constraints. To support this efficiently, systems use a feature store that provides precomputed user and post features in milliseconds.

The Feed Service integrates with this layer before returning results, ensuring users see the most engaging content first. If the ranking system is unavailable, the system gracefully falls back to time-based ordering to maintain availability. This design ensures a balance between performance, personalization, and reliability.
def rank_feed(user_id, candidates):
    features = feature_store.get(user_id, candidates)
    scores = model.predict(features)
    ranked = sort_by_score(candidates, scores)
    return ranked[:20]  # return top posts

Cache Layer

The Cache Layer is critical for achieving sub-100ms feed latency. It sits in front of the Feed Service and Post Service and absorbs the majority of read traffic so that databases are not hit on every request.

The system typically uses an in-memory store (e.g., Redis) to cache:

1. Feed results: user_id → [post_ids]
2. Post metadata: post_id → post object
3. Follower lists: user_id → [followers]

1. Cache Read Flow (Feed Fetch)

When a user opens the app, the system follows a cache-first strategy:
def get_feed(user_id, cursor):
    feed = redis.get(user_id)

    if not feed:
        feed = feed_store.fetch(user_id, cursor)
        redis.set(user_id, feed, ttl=300)   # cache for 5 mins

    return feed

2. Cache Write / Update Flow (Fanout)

When a new post is created, the Fanout Service updates both the database and optionally the cache. Two common strategies:
Option A: Write-through cache
feed_store.insert(user_id, post_id)

redis.prepend(user_id, post_id)   # update cache immediately
The cache stays fresh, ensuring up-to-date data, but it comes with a slightly higher write cost due to additional update operations.
Option B: Lazy update (most common)
feed_store.insert(user_id, post_id)

redis.delete(user_id)   # invalidate cache
This approach is simpler and safer, as the cache is rebuilt on the next read instead of being updated during writes.

3. Cache Invalidation Strategy

Keeping cache in sync is the hardest part. Common approaches:

1. TTL (Time-to-Live): Each cache entry expires automatically after a fixed duration (e.g., 5 minutes), making it simple and widely used for maintaining cache freshness.

2. Event-based invalidation: On a new post, invalidate follower caches, and on like/comment, optionally invalidate the post cache to keep data consistent.

3. Partial invalidation: Only the first N items (the hot part of the feed) are updated, while older items remain unchanged to reduce overhead.

In practice, systems combine TTL + partial invalidation.
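A sketch of partial invalidation on a Redis list cache, assuming the key layout and hot-window size shown here:
FEED_KEY = "feed:{user_id}"  # assumed cache key layout
HOT_SIZE = 50                # only the "hot" head of the feed stays fresh

def on_new_post(follower_id, post_id):
    key = FEED_KEY.format(user_id=follower_id)
    if redis.exists(key):
        redis.lpush(key, post_id)           # prepend the new item
        redis.ltrim(key, 0, HOT_SIZE - 1)   # drop items beyond the hot window
        redis.expire(key, 300)              # refresh the 5-minute TTL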

4. Post Metadata Cache (Important Layer)

Feed only stores post IDs, so metadata caching is crucial:
posts = redis.get_bulk(post_ids)
missing = [pid for pid in post_ids if pid not in posts]  # IDs not found in cache

if missing:
    posts.update(fetch_from_post_service(missing))
    redis.set_bulk(posts)
This avoids repeated DB calls for the same posts across users.

5. Follower Cache (Fanout Optimization)

Fetching followers repeatedly is expensive, so it is cached:
followers = redis.get(author_id)

if not followers:
    followers = user_service.get_followers(author_id)
    redis.set(author_id, followers)
This significantly speeds up fanout operations.

Putting it all together, the cache-first feed read path looks like this:
User opens app
   ↓
Feed Service → Redis (feed cache)
   ↓ (miss)
Cassandra (feed_by_user)
   ↓
Redis updated
   ↓
Post metadata cache (Redis)
   ↓
Final response

Notification Service

The Notification Service is responsible for delivering real-time and near real-time updates to users about activities such as likes, comments, follows, and new posts. Its goal is to ensure users remain engaged and informed, even when they are not actively using the application.

This service operates in an event-driven manner, consuming events generated by other services (Post Service, Feed Service, User Service, etc.) and transforming them into user-facing notifications.

When an action occurs (e.g., someone likes a post), the system publishes an event to a message queue.
def handle_like_event(event):
    actor = event.user_id
    post_owner = event.post_owner_id

    notification = {
        "type": "LIKE",
        "actor": actor,
        "target_user": post_owner,
        "post_id": event.post_id,
        "created_at": now()
    }

    notification_queue.publish(notification)
The Notification Service consumes this event:
def process_notification(notification):
    save_notification(notification)
    send_push(notification)
Notifications are stored so users can view them later (in-app notification tab).
-- Table: notifications_by_user

CREATE TABLE notifications_by_user (
    user_id text,
    created_at timestamp,
    notification_id text,
    type text,
    actor_id text,
    post_id text,
    is_read boolean,
    PRIMARY KEY ((user_id), created_at, notification_id)
) WITH CLUSTERING ORDER BY (created_at DESC);
For offline users, the service integrates with external push systems:

1. FCM (Firebase Cloud Messaging) → Android
2. APNs (Apple Push Notification Service) → iOS
def send_push(notification):
    device_tokens = get_user_devices(notification["target_user"])

    for token in device_tokens:
        push_provider.send(token, format_message(notification))
These systems deliver notifications even when the app is not running.

In-App (Real-Time) Notifications

If the user is online, notifications can be delivered instantly via WebSocket connections.
gateway.send(user_id, notification)
This enables instant UI updates (e.g., "Someone liked your post").

Notification Aggregation

The problem is that without aggregation, the system may generate a large number of noisy and redundant notifications (e.g., "User1 liked", "User2 liked", …), which can overwhelm users and degrade experience. It also increases write load and unnecessary push traffic.

The solution is notification aggregation, where similar events within a time window are grouped into a single notification (e.g., "10 people liked your post"). This is typically implemented by maintaining an aggregation window and updating an existing notification record instead of creating new ones. This reduces noise, improves user experience, and lowers system load while keeping notifications meaningful and concise.
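A sketch of window-based aggregation using a Redis counter as the window marker (the notification helpers here are hypothetical):
WINDOW_SECONDS = 600  # assumed 10-minute aggregation window

def handle_like_for_aggregation(event):
    # One counter per (target user, post, event type) within the window
    key = f"agg:LIKE:{event.post_owner_id}:{event.post_id}"

    count = redis.incr(key)
    if count == 1:
        redis.expire(key, WINDOW_SECONDS)  # first like opens the window
        create_notification(event.post_owner_id, event.post_id, actors=[event.user_id])
    else:
        # Update the existing record in place: "N people liked your post"
        update_notification_count(event.post_owner_id, event.post_id, count)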

Scalability Considerations

At Instagram scale, the system must be designed for horizontal scalability from day one. Each service (Feed, Post, Fanout, Notification) should be stateless, meaning no request-specific data is stored in memory between calls. This allows instances to be replicated behind load balancers, enabling the system to handle sudden traffic spikes simply by adding more servers.

On the data layer, sharding is essential. For example, in Cassandra, data is naturally distributed using partition keys (like user_id), ensuring load is spread evenly across nodes. This avoids bottlenecks and enables near-linear scaling as more nodes are added.

The system also relies heavily on a multi-layer caching strategy. Redis caches frequently accessed data such as feeds and post metadata, drastically reducing database reads. Without caching, even a well-sharded database would struggle under read-heavy workloads.

Finally, message queues (Kafka/SQS) play a critical role in scaling writes. They decouple services and act as buffers, allowing systems like Fanout and Notification to process workloads asynchronously and absorb traffic spikes without failure.

Consistency vs Performance

The system intentionally prioritizes eventual consistency over strong consistency to achieve high performance and availability.

In practice, this means when a user creates a post, it may not appear instantly in all followers' feeds. Instead, it propagates asynchronously through the fanout pipeline. Some users may see the post slightly later, but the system guarantees it will appear eventually.

This trade-off avoids expensive coordination across distributed systems, which would otherwise increase latency and reduce throughput. For a feed system, slight staleness is acceptable, but slow performance is not.

Failure Handling

Failures are inevitable in any distributed system, so the architecture must be built for resilience.

The system uses retry mechanisms to handle transient failures. For example, if a fanout write fails for a subset of users, the message remains in the queue and is retried automatically.

Dead-letter queues (DLQs) are used to capture permanently failing events. These can be inspected and reprocessed without impacting the main system flow.
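A sketch of a consumer loop under these assumptions (the queue client API and TransientError type are hypothetical):
MAX_ATTEMPTS = 5  # assumed retry budget before dead-lettering

def consume(message):
    try:
        process_fanout(message)   # idempotent, so retries are safe
        queue.ack(message)
    except TransientError:
        if message.attempts < MAX_ATTEMPTS:
            # Exponential backoff between attempts
            queue.retry(message, delay=2 ** message.attempts)
        else:
            dead_letter_queue.publish(message)  # park for inspection/reprocessing
            queue.ack(message)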

Operations are designed to be idempotent, meaning repeated execution does not cause incorrect results. This is crucial because retries may cause the same operation to run multiple times. For example:

- Duplicate fanout writes overwrite the same row (safe due to primary key)
- Notification retries do not create duplicate entries

This ensures the system remains stable even under partial failures.

Conclusion

Designing an Instagram feed is fundamentally about optimizing for fast reads while managing massive write fanout. The system shifts complexity to the write path (fanout) so that reads remain simple, fast, and scalable.

A hybrid approach—combining fanout-on-write for most users and fanout-on-read for high-fanout accounts—provides the best balance between performance and cost.

By leveraging distributed databases, caching layers, and asynchronous processing, the system can serve millions of users with low latency. The real challenge is not just retrieving posts, but doing so efficiently, reliably, and at massive scale while maintaining personalization and a seamless user experience.