Tautik Agrahari

bit.ly system design — building a url shortener

every system design conversation eventually circles back to bit.ly. it's the canonical "looks dead simple but isn't" service — take a long url, hand back a short one, redirect on click. the surface is straightforward. then you start asking how it scales to a billion urls and 100M daily active users, and the design gets way more interesting.

this is a full walkthrough — requirements, api, the dumb-first design, then the deep dives where things actually get fun. i'm gonna walk through it as if i'm building it myself, talking through each decision as it comes up.

what you'll take away

quick pointers so you know what to look for as you read:

  - the 1000:1 read/write asymmetry drives every decision in this design
  - a simple counter + base62 encoding beats random or hashed short codes
  - an index, a redis cache, and optionally a CDN make redirects fast
  - splitting read and write services lets each side scale on its own terms

what we're building

a url shortener. user sends in https://www.example.com/some/very/long/url, we hand back something like short.ly/abc123. anyone who hits that short url gets bounced to the original. plus a couple of optional flavors:

  - custom aliases: the user picks their own short code instead of taking a generated one
  - expiration dates: the short url stops resolving after a set time

scope-wise we're skipping user accounts, click analytics, and spam detection. they add complexity without changing the core architecture.

what the system has to do

requirements come in two flavors. functional — the features. non-functional — the qualities.

functional:

  - shorten a long url into a unique short url
  - redirect a short url to its original destination
  - support optional custom aliases
  - support optional expiration dates

non-functional:

  - short codes are unique: no two long urls ever share one
  - redirects are fast: low latency is the whole product
  - the system is highly available: a dead redirect is a dead product
  - it scales to 1B urls and 100M daily active users

now the most important fact about this whole system, the one that drives every architectural decision later: read/write traffic is wildly asymmetric. one user creates a short url, then potentially millions click it. typical ratio is 1000 reads per 1 write. caching strategy, service topology, database choice — all of it falls out of that single number.

core entities

before drawing boxes, name the things our system moves around.

  - original url: the long url the user wants shortened
  - short url / short code: the compact version we hand back
  - user: whoever created the mapping

really two entities, since short and long urls live in the same row of the same table. user is auxiliary.

the api

two endpoints. one to shorten, one to redirect.

POST /urls
{
  "long_url": "https://www.example.com/some/very/long/url",
  "custom_alias": "optional",
  "expiration_date": "optional"
}
→ { "short_url": "http://short.ly/abc123" }
GET /{short_code}
→ HTTP 302 redirect to the original long url

dead simple. no fancy verbs, no nested resources. one post, one get.

the dumb-first design

start as small as possible — client, server, database. that's it.

basic bit.ly architecture — client posts to a primary server, server reads/writes a single database for both shortening and redirecting

on a write:

  1. client POSTs to the primary server with the long url
  2. server validates the url (something like an is-url check)
  3. server generates a short code (we'll get to how in a sec — this is the fun part)
  4. if the user gave us a custom alias, we use that — but we first check the db to make sure it's not already taken. nightmare scenario is a custom alias colliding with a generated code in the future. easy fix: prefix all generated codes with a character that aliases can't use, or keep them in different namespaces.
  5. server writes (short_code, long_url, created_at, expires_at, created_by) to the db
  6. server returns the short url to the client
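
here's the write path as a python sketch. db.exists, db.insert, and generate_short_code are stand-ins, not a real api; the point is the shape of the flow:

from datetime import datetime, timezone

def shorten(db, long_url, custom_alias=None, expires_at=None, created_by=None):
    # step 2: validate. a real service would use a proper is-url check
    if not long_url.startswith(("http://", "https://")):
        raise ValueError("not a url")

    if custom_alias:
        # step 4: check the alias isn't taken. generated codes live in a
        # separate namespace so the two can never collide
        if db.exists(custom_alias):
            raise ValueError("alias already taken")
        short_code = custom_alias
    else:
        short_code = generate_short_code()  # step 3, see deep dive 1

    # step 5: persist the mapping
    db.insert(short_code=short_code, long_url=long_url,
              created_at=datetime.now(timezone.utc),
              expires_at=expires_at, created_by=created_by)

    # step 6: hand back the short url
    return f"http://short.ly/{short_code}"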

on a read:

  1. user's browser hits short.ly/abc123
  2. server looks up abc123 in the db
  3. if it's there and not expired, server returns a 302 Found with the long url in the Location header
  4. browser follows the redirect, user lands on the original site
  5. if expired, return 410 Gone. if missing entirely, 404.
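
sketched the same way (db.lookup is a stand-in; a real server would put the long url in the Location header):

from datetime import datetime, timezone

def redirect(db, short_code):
    row = db.lookup(short_code)
    if row is None:
        return 404, None              # never existed
    if row.expires_at and row.expires_at < datetime.now(timezone.utc):
        return 410, None              # expired: 410 Gone
    return 302, row.long_url          # 302 Found, url goes in Location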

quick aside on 301 vs 302. 301 is "permanent" — browsers and intermediate caches will cache the redirect, future requests might never hit our server. 302 is temporary — every request comes through us.

we want 302. why?

  - control: we can expire, delete, or repoint a short url at any time. a cached 301 redirect is out of our hands forever
  - visibility: every click hits our servers, which keeps click counting possible even though we scoped analytics out for now
  - the one upside of 301 (less load on us) is exactly what the caching layer later is for

right, working system. the requirements aren't fully met yet though — we hand-waved short code generation, redirects aren't fast, we can't scale. let's actually get to the interesting parts.

deep dive 1 — generating unique short codes

three properties we want:

  - unique: two different urls must never end up with the same code
  - short: around 6 characters, or the product isn't doing its job
  - cheap to generate: ideally no extra db round-trips on the write path

let's walk a few options.

option 1: random + check

generate a random number, base62-encode it, slice the first 6 characters.

base62 encoding is a numbering system that uses 0-9, a-z, A-Z for 62 total symbols. 6 base62 characters gives 62^6 ≈ 56 billion combinations. lots of room.
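
a quick sketch of the encoding. the symbol order here (digits, then uppercase, then lowercase) is just one convention; any fixed ordering of the 62 symbols works:

import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols

def base62_encode(n):
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))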

problem: random isn't unique. how often will two random short codes collide? more often than you'd think. this is the birthday paradox in action — in a room of just 23 people there's already a 50% chance two share a birthday despite 365 possible birthdays. apply the same math to 1B short codes randomly chosen from 56B options — expected collisions ≈ n²/(2N) = (10⁹)² / (2 × 5.68×10¹⁰) — and you get roughly 8.8 million collisions. not catastrophic but not zero.

so we'd need a db check before saving — generate a candidate, look it up, if it exists try again, otherwise save. that adds a read on every write. not great.
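
the generate-check-retry loop looks like this, reusing the encoder above (db.exists is a stand-in):

import secrets

def random_code(db):
    while True:
        n = secrets.randbelow(62**6)                # 0 .. ~56B
        candidate = base62_encode(n).rjust(6, "0")  # pad to 6 chars
        if not db.exists(candidate):                # the extra read on every write
            return candidate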

option 2: hash the long url

hash(canonicalize(long_url)), take the first 6 base62 chars. md5, murmur, sha-256, whatever.

a good hash function has the avalanche property — change one bit of the input and the output looks completely different. so the collision behavior is the same as random: 56B possible outputs, the same ~8.8 million collisions at 1B urls. same db check needed.

what's nice about hashing — same long url always maps to the same short code, so we get deduplication for free. what's not nice — most url shorteners actually want multiple short codes per long url (different users want different aliases, different expirations, separate analytics). dedup is often a feature you don't want.
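
for completeness, the hashing version, with canonicalize standing in for whatever url normalization you'd do:

import hashlib

def hash_code(long_url):
    digest = hashlib.md5(canonicalize(long_url).encode()).digest()
    n = int.from_bytes(digest[:8], "big")
    return base62_encode(n).rjust(6, "0")[:6]  # still needs the db collision check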

option 3: counter + base62 — the winner

just keep a counter. first url gets short code 1, second gets 2, third gets 3. base62-encode the counter to keep things compact.

the counter guarantees uniqueness by construction. no collision checks, no extra reads. the encoding keeps it short — at 1B urls, our short code is a 6-character string. quick math: 1,000,000,000 in base62 is 15ftgG. and 62^6 ≈ 56 billion, so we don't need to bump up to 7 characters until we cross that threshold (which would take a lifetime at any reasonable url shortener's growth rate).

three short-code strategies — random + check, hash long url, counter + base62 — counter wins because it skips the db read on every write

there's one fair concern with the counter: it's predictable. a competitor or scraper can iterate 1, 2, 3, ... and discover every short url we've generated. two ways to deal with this:

  - accept it: short urls aren't secrets, and anything sensitive behind them needs its own auth anyway
  - scramble it: run the counter through a cheap bijection (a fixed permutation of the code space) before encoding, so codes look random but stay unique

we're going with the counter. cleanest, fastest, no read amplification on writes.

deep dive 2 — making redirects fast

the read path is short_code → long_url. without optimization the server walks every row of the urls table on each lookup. at 1B rows that's a non-starter — full table scan for every redirect. dead.

step 1: index the short code

stick a primary key (or unique index) on short_code. now the database keeps a b-tree of short codes pointing to row locations on disk. lookups become O(log n) — typically a handful of memory reads followed by one disk seek to fetch the row.

postgres does this automatically for primary keys. for a WHERE short_code = ? lookup this is plenty fast — well under 10ms even at 1B rows.
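
concretely, the schema side is just the primary key. a sketch assuming psycopg2 and postgres (table and column names match the write path from earlier; the dsn is illustrative):

import psycopg2

conn = psycopg2.connect("dbname=shortly")
cur = conn.cursor()

# primary key on short_code means postgres builds the b-tree for us
cur.execute("""
    CREATE TABLE IF NOT EXISTS urls (
        short_code TEXT PRIMARY KEY,
        long_url   TEXT NOT NULL,
        created_at TIMESTAMPTZ NOT NULL,
        expires_at TIMESTAMPTZ,
        created_by TEXT
    )
""")

# O(log n) index lookup, not a table scan
cur.execute("SELECT long_url FROM urls WHERE short_code = %s", ("abc123",))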

step 2: cache the hot path

even with an index, every lookup eventually touches disk. and a huge fraction of our traffic is going to be a small number of viral short codes. that's a caching layup.

stick a redis (or memcached) instance in front of the db. on every read:

  1. check redis for short_code → long_url
  2. hit? return immediately — sub-millisecond
  3. miss? read from db, populate redis, return

eviction policy is least recently used — if the cache fills, kick whatever's been quiet longest. natural fit for url shorteners because old links go cold fast.

key:   "abc123"
value: "https://www.example.com/long/url"

dead simple key-value lookup. redis hits sub-ms, db hits maybe 10ms. for hot urls we basically never touch the database. tie the cache TTL to the url's expiration time so expired entries fall out automatically.
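
in code, the read-through looks something like this. a sketch assuming the redis-py client; db.lookup is a stand-in for the indexed query from step 1:

from datetime import datetime, timezone
import redis  # redis-py client

r = redis.Redis()

def resolve(db, short_code):
    cached = r.get(short_code)
    if cached is not None:
        return cached.decode()            # hit: sub-millisecond
    row = db.lookup(short_code)           # miss: ~10ms db read
    if row is None:
        return None
    if row.expires_at:
        # ttl tied to expiration so stale entries fall out on their own
        ttl = int((row.expires_at - datetime.now(timezone.utc)).total_seconds())
        r.set(short_code, row.long_url, ex=max(ttl, 1))
    else:
        r.set(short_code, row.long_url)
    return row.long_url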

step 3 (optional): cache at the edge

for global users, even a redis hit means a round-trip to whatever region our service runs in. a user in tokyo hitting a virginia data center is eating 200ms just on the network.

a CDN (cloudflare, fastly, akamai) caches responses at edge servers worldwide. for popular short codes the redirect can be served from the tokyo edge in 10–20ms without ever reaching our origin.

trade-offs:

  - a CDN-cached redirect never reaches our origin, so we lose the per-click visibility that made us pick 302 in the first place
  - expiring or deleting a url now requires purging the edge cache too
  - it's one more system to configure and pay for

worth it for the most-clicked links and globally distributed audiences. for everything else, a single redis layer in your primary region is plenty.

deep dive 3 — scaling to 1B urls and 100M DAU

let's do the back-of-envelope math.

writes: at the 1000:1 ratio against ~100M redirects a day (see reads below), that's ~100k new urls a day, around 1-2 writes per second. trivial.

reads: 100M DAU, call it one click each per day. that's ~100M redirects a day, roughly 1,200 per second on average, several times that at peak.

storage: 1B urls × ~500 bytes per row (code, long url, timestamps, metadata) ≈ 500 GB. fits comfortably on one machine.

so the dataset isn't the problem. the read throughput is. and we already solved most of that with caching. let's wire up the rest.

split read and write services

reads and writes have completely different traffic profiles:

  - reads: high volume (~1,200/s and spiky, since one viral link can multiply traffic), latency-critical, cache-friendly
  - writes: tiny volume (1-2/s), needs the counter and a db write, tolerant of a few extra milliseconds

scaling them together is wasteful. we'll split them into two services behind an api gateway. the gateway routes POST /urls to the write service and GET /{short_code} to the read service.

each service horizontally scales independently. read service runs hot — many instances, all behind a load balancer, hitting redis cache and falling back to db. write service stays small — one or two instances handle the entire write load.

is splitting really worth it? honestly, for a service this small, sometimes no. running two services means two deployments, two dashboards, two on-call rotations. but the read/write asymmetry here is so extreme that the split is genuinely useful — you can autoscale the read service on cpu/memory thresholds without ever touching the write service.

the global counter problem

now that we've split into multiple write service instances, the counter has to live somewhere shared. you can't have each instance keeping its own count — they'd all hand out short_code 1 simultaneously.

the answer is a central redis instance holding the counter. redis is single-threaded, so its INCR command is atomic — two simultaneous calls always get different values. one gets 1000, the next gets 1001, never the same one twice.

every time the write service needs a new short code:

  1. INCR counter on redis → returns next value
  2. base62-encode it
  3. write (short_code, long_url, ...) to the db
  4. return short url to client
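
in redis-py terms, step 1 is literally one call (the key name is illustrative):

import redis

value = redis.Redis().incr("counter")  # atomic: no two callers ever get the same value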

counter batching — kill the per-write network hop

doing a redis call on every single write is fine but wasteful. we can do better.

the fix: lease counter ranges to each write service instance in batches. when a write service starts, it asks redis for a chunk of 1000 counter values:

INCRBY counter 1000  → returns E, the new counter value (the end of the batch)

the instance now owns counters E-999 through E locally. it serves urls from this range with zero cross-network calls. when it runs out, it asks for the next 1000.

if a write service crashes mid-batch, those unused counters are lost forever. who cares — 56 billion total slots, losing a few hundred is invisible. we just need uniqueness, not continuity.
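
a minimal sketch of the batching logic, again assuming the redis-py client. the key name and batch size are illustrative, and a real multi-threaded service would guard the refill with a lock:

import redis

BATCH_SIZE = 1000

class CounterBatch:
    # hands out unique ids locally, refilling from redis in batches
    def __init__(self, client):
        self.client = client
        self.next_id = 0
        self.batch_end = -1  # exhausted until the first refill

    def next(self):
        if self.next_id > self.batch_end:
            # INCRBY returns the value after incrementing, i.e. the
            # end of our freshly leased range of 1000
            self.batch_end = self.client.incrby("counter", BATCH_SIZE)
            self.next_id = self.batch_end - BATCH_SIZE + 1
        value = self.next_id
        self.next_id += 1
        return value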

multi-region — split the counter space

if the service runs in multiple regions (US, EU, APAC), having every region hit a single global counter is a latency disaster. instead, partition the counter space so each region owns a slice, something like:

  - US: 0 – 18B
  - EU: 18B – 37B
  - APAC: 37B – 56B

each region runs its own redis with its own slice of the namespace. no cross-region coordination on the hot path. a UNIQUE constraint on short_code in the database is the ultimate safety net if anything ever drifts.

the database

we said 500 GB on a single instance. that's fine for postgres or mysql today — a single box scales vertically to multi-TB SSDs and hundreds of GB of ram. so we don't actually need to shard.

if we ever do need to shard (say, data grows past 5 TB), we'd shard by short_code: hash(short_code) % N picks the shard. but for the next decade of growth, one well-tuned postgres handles this whole system.

high availability matters though. one box dying takes down the entire product. so:

  - run a postgres primary with at least one replica and automatic failover
  - let read replicas absorb cache-miss read traffic while they're at it
  - take regular backups so a bad failover never means data loss

for redis (counter and cache), redis sentinel or cluster mode handles failover. if redis loses the latest counter values during a failover, we lose a few short codes — fine, the database's unique constraint catches anything that would actually collide.

the final design

zooming out, here's everything wired together.

full bit.ly architecture — api gateway routing to read and write services, redis cache and global counter, postgres database with replicas

every component scales independently. read service autoscales on traffic spikes without touching writes. cache absorbs the hot path. database serves only cache misses. counter is durable, atomic, and never a bottleneck thanks to batching.

what i'd take away from all this

a few thoughts after walking through it.

  - the read/write ratio is the whole design. pin that number down first and half the decisions make themselves
  - boring wins: a counter, an index, a cache. no clever hashing, no premature sharding
  - scale components independently, and only the ones that need it

nothing is best. everything depends on the actual traffic, the actual constraints, the actual money. but for a url shortener at this scale, this design holds up. you're paid to solve a problem, not to ship the fanciest architecture.