A System Designer’s Guide to CDNs
A CDN is a globally distributed cache and routing layer that sits between users and your origin. It moves copies of your content into hundreds of points of presence (POPs) close to users, terminates TLS there, and uses anycast or DNS routing to get each user to the nearest one. In a system design interview, “how do you handle global users” and “how do you keep p99 down for static assets” almost always cash out to a CDN. The credible answer names which of three concrete problems you’re solving before saying the word.
1. The three problems a CDN solves
A CDN solves three problems at once: round-trip latency to faraway users, egress bandwidth at your origin, and request load on your origin servers. All three trace back to the same physical fact: light is slow, and your origin lives in one place.
Latency is the loudest of the three. Light in fiber travels at roughly two-thirds of its vacuum speed, around 200,000 km/s. A round trip from Sydney to a Virginia origin has a physical floor of about 160ms on a straight-line fiber run, and real routes typically measure 200 to 250ms. Add the TCP three-way handshake and a TLS 1.3 handshake on top, and a cold connection spends close to 500ms on setup before the first byte of HTML moves. 5ms of application work is negligible against a 500ms first-byte budget. Putting a copy of the response closer to the user is the only lever physics leaves you.
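The arithmetic behind those numbers fits in a few lines. This is a rough sketch, not a measurement: the 15,700 km great-circle distance between Sydney and Virginia and the 20ms edge round trip are assumed values for illustration, and real fiber paths are longer than the great circle, so measured round trips sit above the computed floor.

```python
FIBER_KM_PER_MS = 200.0  # ~200,000 km/s, about two-thirds of c


def fiber_rtt_floor_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a straight fiber run."""
    return 2 * distance_km / FIBER_KM_PER_MS


def handshake_ms(rtt_ms: float, tcp_rtts: int = 1, tls_rtts: int = 1) -> float:
    """Time spent on TCP + TLS 1.3 setup before the first request is sent."""
    return rtt_ms * (tcp_rtts + tls_rtts)


print(fiber_rtt_floor_ms(15_700))  # 157.0 ms: the physical floor (assumed distance)
print(handshake_ms(250))           # 500 ms of setup against a far-away origin
print(handshake_ms(20))            # 40 ms of setup against a nearby POP (assumed RTT)
```

The only input that changes between the last two lines is distance, which is the point: the handshake count is fixed by the protocols, so the round-trip time is the only term you can shrink.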
Bandwidth is the quiet one that shows up on the invoice. Serving a 2MB hero image to 100,000 users is 200GB of egress, regardless of how fast your origin streams it. Cloud egress is priced at roughly $0.12 per GB on the big providers, and the per-GB price falls sharply at CDN volume. The CDN is doing arbitrage on its own peering and transit deals, then reselling the bandwidth at a lower marginal cost than your origin pays. Bandwidth offload is a real line item.
Origin load is the third. A CDN turns N requests to origin into one request per POP per TTL window, two-to-three orders of magnitude fewer at any meaningful scale. Every request that hits origin is a request your application has to serve, even if the response is byte-identical to one it served a millisecond ago. For uncacheable content that’s unavoidable, but most web traffic is cacheable in principle: images, scripts, fonts, HTML for pages that don’t depend on session, even JSON for catalog endpoints.
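Both the bandwidth and the origin-load claims are simple enough to check by hand. The sketch below does that under stated assumptions: the 200 POPs, the one-hour TTL, and the ten million daily requests are hypothetical inputs, and "one fetch per POP per TTL window" is treated as an upper bound for a single asset.

```python
import math


def egress_gb(asset_mb: float, requests: int) -> float:
    """Total bytes served, in decimal GB."""
    return asset_mb * requests / 1000


def origin_fetches(window_s: int, ttl_s: int, pops: int) -> int:
    """Upper bound on origin fetches for one asset: one per POP per TTL window."""
    return pops * math.ceil(window_s / ttl_s)


hero_gb = egress_gb(asset_mb=2, requests=100_000)
print(hero_gb)                    # 200.0 GB for the hero image example
print(round(hero_gb * 0.12, 2))   # 24.0 at the per-GB price quoted above

daily_requests = 10_000_000       # hypothetical traffic for one hot asset
fetches = origin_fetches(window_s=86_400, ttl_s=3_600, pops=200)
print(fetches)                    # 4800 origin fetches per day at most
print(daily_requests // fetches)  # 2083, roughly three orders of magnitude
```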
Know which of the three you’re naming before you reach for a CDN. The rest of this post is what’s behind it.
2. What a CDN actually is
A CDN is a geographically distributed cluster of cache servers sitting between users and your origin, plus a routing layer that gets each user to the nearest cluster. Two parts: the cache fleet, and the routing mechanism that points users at it. They fail in different ways and they trade off differently, and they’re worth keeping separate when you reason about the system.
The cache fleet runs tens to several hundred POPs (Fastly around 90; Cloudflare and Akamai several hundred each, across many more cities than a single-vendor count suggests). A POP, Point of Presence, is a cluster of servers inside a datacenter or carrier-neutral colo, often co-located with an IXP (Internet Exchange Point, where many networks peer cheaply). Each POP runs reverse proxies, often a customized nginx or a vendor-built proxy, backed by RAM and SSD caches. The same content is held in many POPs; the design assumption is that replicating content many times is cheaper than transferring it long distances on every request.
The routing layer is what turns “the CDN has a POP in Mumbai” into “the Mumbai user actually reaches the Mumbai POP.” The internet doesn’t ship with a “send this user to their nearest server” primitive, so CDNs solve it using either anycast routing or DNS-based routing, which is the subject of the next section.
The name “CDN” is older than what the product does today. It comes from the late 1990s when Akamai and its peers built “content delivery networks” to distribute static files closer to users. Modern CDNs still do that, but they also terminate TLS, run web application firewalls, absorb DDoS attacks, accelerate uncacheable API traffic, run serverless code, and deliver video. The name stuck while the surface area grew. When someone says “we put it behind a CDN” in 2026, the cache is often the smallest of the things they mean.
3. How a client finds the nearest edge
Two strategies dominate routing: anycast and DNS-based routing. Which one a CDN picked years ago shapes how it behaves under failure and how much control you have over routing decisions today.
3.1. Anycast
Anycast advertises the same IP address from every POP via BGP, the protocol that runs internet routing. Cloudflare and Fastly are anycast-first networks. When a user’s packet enters the internet, transit providers’ routers see multiple paths to that IP and pick whichever they consider closest. “Closest” in BGP terms means fewest autonomous-system hops, weighted by per-AS preferences and the local routing policy of whichever ISP the packet is currently in. It approximates network distance, which usually correlates with latency, but not always.
The strength of anycast is automatic failover. When a POP goes down, its BGP route is withdrawn from peering announcements, and packets that would have gone there flow to the next-best POP within seconds. No client cache to bust, no DNS TTL to wait out. The weakness is loss of control: transit providers make the routing decision, optimizing for their own cost rather than your latency. A user in São Paulo can end up at a Miami POP because their ISP’s transit is cheaper that way, even when a São Paulo POP exists.
3.2. DNS-based routing
DNS-based routing has the CDN run an authoritative DNS server that returns a different IP depending on where the resolver is located. The user resolves cdn.example.com, the CDN’s DNS sees the resolver IP, picks a POP, and returns its IP. The user then connects to that specific POP. CloudFront and the classical Akamai network use this approach.
The strength of DNS routing is fine-grained control. The CDN can route by user geography (or rather, by resolver geography), by POP load, by real-time health signals, by paid customer tier. The weakness is the indirection. DNS lookups add a step before the connection starts. TTLs on the CDN’s DNS records have to be short to enable fast failover, which fights against client caches.
Resolver geography is not user geography. A user in Brazil using Google’s 8.8.8.8 may be routed based on where Google’s nearest resolver lives, not where they live. EDNS Client Subnet (ECS) partially fixes this by passing the user’s prefix to the authoritative server, but it leaks user IP prefixes downstream and isn’t universally honored.
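The resolver-versus-user gap is easy to see in a toy model of the authoritative server’s decision. This is a minimal sketch, assuming the CDN picks the geographically nearest POP; the POP list, the coordinates, and the `pick_pop` helper are all hypothetical, and real CDNs also weigh load and health.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical POPs with (latitude, longitude).
POPS = {"GRU": (-23.55, -46.63), "MIA": (25.76, -80.19), "FRA": (50.11, 8.68)}


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))


def pick_pop(resolver_loc, ecs_loc=None):
    """Route on the ECS-derived location when present, else on the resolver's."""
    loc = ecs_loc if ecs_loc is not None else resolver_loc
    return min(POPS, key=lambda name: haversine_km(POPS[name], loc))


resolver_in_miami = (25.8, -80.2)  # where the public resolver happens to sit
user_in_rio = (-22.9, -43.2)       # where the user actually is

print(pick_pop(resolver_in_miami))                       # MIA
print(pick_pop(resolver_in_miami, ecs_loc=user_in_rio))  # GRU
```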
Naming both, and the asymmetric weakness of each, is the credible answer when CDN routing comes up. Rule of thumb: anycast for simplicity and resilience, DNS-based for control, hybrid in practice. Most large CDNs do both depending on the product line.
4. Pull CDNs: lazy fetch from origin
A pull CDN lazily fetches assets from your origin the first time someone asks for them, caches the response at the edge POP, and serves subsequent requests directly from cache. You don’t upload anything ahead of time. You point the CDN at your origin and tell it what to cache, typically via response headers. This is the default model for almost every CDN: Cloudflare, CloudFront’s default behavior, Fastly, KeyCDN.
The request flow on a cold miss:
- The user requests /banner.jpg from the edge POP.
- The POP doesn’t have it cached.
- The POP fetches from origin and stores the response.
- The POP returns the response to the user.
- The next user routed to the same POP gets a cache hit.
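The flow above can be sketched as a read-through cache. This is an illustration only: `EdgePop` and `fetch_origin` are hypothetical names, and the sketch deliberately ignores TTLs, eviction, and concurrency.

```python
class EdgePop:
    """One POP acting as a pull-through cache in front of an origin."""

    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin  # callable: path -> response body
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self.cache:
            self.hits += 1
            return self.cache[path]
        # Cold miss: pay the POP-to-origin round trip once, then remember it.
        self.misses += 1
        body = self.fetch_origin(path)
        self.cache[path] = body
        return body


origin_calls = []


def origin(path):
    origin_calls.append(path)
    return f"bytes of {path}"


pop = EdgePop(origin)
pop.get("/banner.jpg")   # first user: miss, fetched from origin
pop.get("/banner.jpg")   # next user at the same POP: hit
print(len(origin_calls), pop.hits, pop.misses)  # 1 1 1
```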
Two consequences fall out of this model. First, somebody pays the first-request penalty. The cold user waits for the full POP-to-origin round trip on top of their own POP latency, which can be hundreds of milliseconds. Second, the contract between origin and edge is the response headers, mostly Cache-Control. The edge needs to know how long to keep the entry and under what conditions, and Cache-Control is how origin says so. Vendor-specific overrides exist (Surrogate-Control, CDN-Cache-Control), but Cache-Control is the lingua franca.
The basic shape is Cache-Control: public, max-age=86400. Anyone can cache it (public), for one day (max-age=86400). For shared caches like a CDN, s-maxage overrides max-age: Cache-Control: public, max-age=60, s-maxage=86400 tells browsers to cache for a minute but tells the CDN to cache for a day. That asymmetry is useful when you want browsers to recheck often but the CDN to absorb the load.
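A small sketch of how a cache might read that header makes the browser/CDN split concrete. It is a simplification, not a full implementation of HTTP caching rules: it handles only max-age, s-maxage, private, and no-store, ignores no-cache and heuristic freshness, and the function names are made up for the example.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def ttl_seconds(value, shared):
    """Freshness lifetime for a shared cache (CDN) or a private one (browser)."""
    d = parse_cache_control(value)
    if "no-store" in d or (shared and "private" in d):
        return 0
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])  # s-maxage wins for shared caches
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return 0  # simplification: no explicit lifetime means do not cache


header = "public, max-age=60, s-maxage=86400"
print(ttl_seconds(header, shared=False))  # 60: the browser rechecks after a minute
print(ttl_seconds(header, shared=True))   # 86400: the CDN keeps it for a day
```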
stale-while-revalidate is the directive that does most of the heavy lifting in production. Setting Cache-Control: max-age=60, stale-while-revalidate=600 lets the edge serve the stale entry for up to ten minutes past expiry while it asynchronously refreshes from origin. The first user past expiry doesn’t pay the refresh latency, and the next user gets the fresh entry. This single directive smooths the cold-miss spikes that otherwise show up in p99 latency graphs at every TTL boundary. Its companion stale-if-error extends the same idea to origin failure: serve stale for up to N seconds if origin returns 5xx. It’s a cheap availability win for content that’s tolerable when slightly stale.
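The decision an edge makes for a cached entry can be written as a small function of the entry’s age. This is a sketch of the logic as described here, not any vendor’s implementation; `edge_decision` and its return labels are hypothetical.

```python
def edge_decision(age_s, max_age, swr=0, sie=0, origin_healthy=True):
    """What to do with a cached entry that is age_s seconds old."""
    if age_s < max_age:
        return "serve-fresh"
    staleness = age_s - max_age
    if staleness < swr:
        # Answer immediately from cache, refresh from origin in the background.
        return "serve-stale-and-revalidate"
    if not origin_healthy and staleness < sie:
        # Origin is failing: a stale answer beats an error.
        return "serve-stale-on-error"
    return "fetch-from-origin"


# Cache-Control: max-age=60, stale-while-revalidate=600
print(edge_decision(30, 60, swr=600))    # serve-fresh
print(edge_decision(120, 60, swr=600))   # serve-stale-and-revalidate
print(edge_decision(700, 60, swr=600))   # fetch-from-origin
print(edge_decision(700, 60, swr=600, sie=3600, origin_healthy=False))  # serve-stale-on-error
```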
When pull is the right default: anything where the asset universe is larger or less predictable than what you’d want to fan out everywhere. Most web traffic fits this. You don’t know which images on a 50,000-product catalog will be hot today, so you let the cache learn.
5. Push CDNs: pre-positioned at the edge
A push CDN inverts the model. You upload assets to the CDN ahead of time, and the CDN propagates them to its POPs before any user requests them. No first-request penalty. The CDN never talks to your origin during a user request, because the assets already live at the edge.
In practice, the pure-push product is less common than it used to be. Most “push” deployments today are pull CDNs in front of object storage (S3, R2, Google Cloud Storage), where the object store acts as a globally available origin and the CDN is still lazily caching. Strict push, where you explicitly upload to each POP or to a CDN-managed origin replica, shows up in narrower workloads. KeyCDN’s “push zones” and some Akamai NetStorage configurations are the textbook examples.
Three workloads where push pays for itself:
- Software releases. Game patches, OS updates, mobile app binaries. The release time is known, the asset set is known, the access pattern is “every user hits this within the same hour.” Pre-positioning at every POP turns the launch from “DDoS your own origin” into “saturate the CDN’s fan-out capacity,” which is what the CDN is built for.
- Video on demand. HLS and DASH segments are written once and served many times over months. Pushing them to the edge as part of the encoding pipeline matches the access pattern and avoids cold misses on long-tail content.
- Static sites with a small fixed asset set. A marketing site or a documentation portal with a few hundred files that change on deploy. Push at deploy time, never see a cache miss.
Where push loses: dynamic catalogs, user-generated content, any system where the asset universe is much larger than the working set. You’d waste storage at every POP holding assets nobody requests, and you’d be doing the cache’s job badly because you have to predict which content matters.
Push and pull are a tradeoff between first-request latency and asset universe size. Push optimizes for the first cold user at the cost of storing everything everywhere; pull optimizes for the asset universe at the cost of paying the cold-miss latency. Most systems use pull and lean on stale-while-revalidate to soften the cold-miss tax. Push is the answer for narrow, predictable, fan-out-heavy workloads.
6. Cache keys, TTLs, and the query-string trap
The cache key is whatever the CDN hashes to decide “is this the same request as one I’ve seen before.” The default key is the URL plus the Host header. The Vary response header lets origin extend the key with request-header values: Vary: Accept-Encoding produces separate cache entries for gzip and brotli clients, so each client gets the encoding it supports without mixing them up.
The pitfall that bites most teams is query strings. Most CDNs include the full URL including the query string in the cache key by default. That means /banner.jpg?utm_source=twitter and /banner.jpg?utm_source=facebook are different cache entries serving the same response. A campaign that sprays UTM params across a hot asset can shred your cache hit ratio. The fix is to configure the CDN to either strip query strings, sort them, or only key on a specific allowlist. Every major CDN exposes this, but you have to turn it on; the include-everything default is what catches teams.
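The allowlist variant of that fix looks roughly like this. It is a sketch of the idea, not a vendor configuration: `cache_key` is a hypothetical helper, and the parameter names in the allowlist are examples.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(host, url, allowlist=frozenset()):
    """Key on host + path + only the allowlisted query params, in sorted order."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in allowlist
    )
    query = urlencode(kept)
    return f"{host.lower()}{parts.path}" + (f"?{query}" if query else "")


# UTM parameters no longer split the cache entry.
a = cache_key("cdn.example.com", "/banner.jpg?utm_source=twitter")
b = cache_key("cdn.example.com", "/banner.jpg?utm_source=facebook")
print(a == b, a)  # True cdn.example.com/banner.jpg

# Parameters that do change the response stay in the key, order-independent.
sizes = frozenset({"w", "h"})
print(cache_key("cdn.example.com", "/img.jpg?w=200&h=100&utm_source=x", sizes))
# cdn.example.com/img.jpg?h=100&w=200
```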
TTLs are the other half of the model. Three knobs interact, and “which one wins” depends on the vendor:
| Knob | Set by | Scope |
|---|---|---|
| Cache-Control: max-age=N | Origin response header | Browsers and shared caches |
| Cache-Control: s-maxage=N | Origin response header | Shared caches only (overrides max-age for the CDN) |
| Vendor TTL override (cache rule) | CDN config | Last word, vendor-defined precedence |
The intent behind s-maxage is exactly the browser/CDN split from §4: short browser TTL so users get fresh-looking content, long CDN TTL so origin is offloaded. The vendor override exists because you don’t always control origin headers (legacy app, third-party service), and it gives operators a way to impose policy without touching application code.
stale-while-revalidate=N and stale-if-error=N deserve repeating here: they’re the only widely-supported way to get “soft” cache behavior, where the edge can hide origin slowness or transient failures behind a slightly stale response. Production deployments that don’t set them are leaving p99 on the table.
Cache key is “what counts as the same request,” TTL is “how long do I keep it.” Get the key right and your hit ratio reflects your traffic. Get it wrong and you can have a 0% hit ratio on what looks like the most cacheable workload in the world.
7. Invalidation: TTL, purge, and surrogate keys
Invalidation is where caching costs you sleep, and CDNs make the cost visible because every purge has to cross the network to hundreds of POPs. Three strategies, roughly in order of how often each shows up in production:
- TTL-based. Set a short TTL, let entries expire naturally. No invalidation API needed. Works when staleness is acceptable within the TTL window. This is what most teams default to for content that changes regularly.
- Purge by URL. Tell the CDN “evict /banner.jpg.” Simple in concept, expensive at scale if you don’t know exactly which URLs are affected by a change. A single product update on an e-commerce site can affect dozens of URLs (the product page, the category page, the search index, the homepage, the sitemap).
- Surrogate keys, also called tag-based purge. Tag entries with one or more labels when serving them (Surrogate-Key: product-42 category-7 sale), then purge by tag. One PURGE on product-42 invalidates every cached entry, anywhere in the CDN, that touched that product. Fastly introduced this in mainstream CDNs; most vendors now have an equivalent (Cloudflare’s Cache Tags, Akamai’s Fast Purge with tags).
Purge latency varies sharply by vendor, and it matters more than people expect:
- Fastly’s instant purge propagates a purge across their global fleet in around 150ms. The architecture that enables this is a globally replicated “purge log” separate from the cache itself, so every POP learns about purges immediately even if the cached object hasn’t physically moved. Tag-based purge is built on top.
- CloudFront invalidation was historically a multi-minute operation, in line with its S3-backed architecture. Recent improvements have brought it down significantly, but the design wasn’t built around sub-second global purge the way Fastly’s was.
The pattern that shows up in interviews and in real systems: if your write path is “publish content, then immediately read it,” purge latency is in your critical path. Most teams avoid that path. They use short TTLs (60s, 300s) for content that changes often, accept the staleness window, and reserve explicit purge for catastrophic cases: a security incident, a legal takedown, a published error that has to be wiped now.
Tagged purge changes the mental model: instead of tracking which URLs to evict, you tag cached entries by logical entity and purge by entity. “Product 42 changed, purge tag product-42”, and the CDN figures out which URLs were tagged with that entity. It collapses an N-URL invalidation into a single API call.
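The bookkeeping behind tag-based purge is a reverse index from tag to URLs. The sketch below models a single cache node; `TaggedCache` is a hypothetical name, and how a real CDN propagates the purge across POPs is out of scope here.

```python
from collections import defaultdict


class TaggedCache:
    """Cache that remembers which URLs were stored under each surrogate key."""

    def __init__(self):
        self.entries = {}                    # url -> body
        self.urls_by_tag = defaultdict(set)  # tag -> urls

    def store(self, url, body, surrogate_key_header):
        self.entries[url] = body
        for tag in surrogate_key_header.split():
            self.urls_by_tag[tag].add(url)

    def purge_tag(self, tag):
        """Evict every entry stored under tag; return how many were removed."""
        removed = 0
        for url in self.urls_by_tag.pop(tag, set()):
            if url in self.entries:
                del self.entries[url]
                removed += 1
        return removed


cache = TaggedCache()
cache.store("/products/42", "<html>42</html>", "product-42 category-7")
cache.store("/products/43", "<html>43</html>", "product-43 category-7")
cache.store("/categories/7", "<html>cat</html>", "category-7")

print(cache.purge_tag("product-42"))  # 1: only the product page
print(cache.purge_tag("category-7"))  # 2: the other product page and the category page
```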
8. Origin shielding and tiered caching
Tiered caching inserts a shield POP between the edge and origin, so a cold-asset stampede across N edge POPs becomes one origin fetch instead of N. Without it, every POP that misses the cache hits origin independently, which is the cache stampede problem at CDN scale: origin sees N concurrent identical requests for the same uncached file, often during a traffic spike when origin can least afford it.
In detail: every edge POP that misses the cache goes to the shield first instead of going to origin directly. The shield deduplicates. If it has the asset cached, it serves the edge POP from cache. If it doesn’t, it fetches once from origin, stores the result, and serves it back to every edge POP that asks. Origin sees one request per shield miss, not N.
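A toy model shows the collapse. It is deliberately simplified: the edges miss one after another, not concurrently, so it demonstrates the shield’s caching effect and leaves in-flight request collapsing aside. All class and function names are hypothetical.

```python
class Origin:
    def __init__(self):
        self.fetches = 0

    def get(self, path):
        self.fetches += 1
        return f"body of {path}"


class Cache:
    """A cache tier (edge or shield) that reads through to its upstream."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}

    def get(self, path):
        if path not in self.store:
            self.store[path] = self.upstream.get(path)
        return self.store[path]


def cold_fetches(num_edges, use_shield):
    """Origin fetches when every edge POP misses on the same asset once."""
    origin = Origin()
    upstream = Cache(origin) if use_shield else origin
    edges = [Cache(upstream) for _ in range(num_edges)]
    for edge in edges:
        edge.get("/release.bin")
    return origin.fetches


print(cold_fetches(100, use_shield=False))  # 100: every POP hits origin
print(cold_fetches(100, use_shield=True))   # 1: the shield absorbs the rest
```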
Vendors expose this as a configuration option:
- Cloudflare Tiered Cache with Argo or with the standard tiered cache topology.
- CloudFront Origin Shield, a single designated regional cache layer in front of origin.
- Fastly shielding, where you pick a POP to act as the shield.
The reason this isn’t on by default is that the shield adds a hop. On a cache hit at the edge you don’t want it, because the edge is already serving the user. On a miss you do, because the shield collapses concurrent misses. Vendors default it to off because at well-tuned hit ratios the extra hop hurts more than the deduplication helps.
When to turn it on: high-fanout traffic to a finite asset set. Software downloads, image catalogs, video segments. With shielding, origin sees something like a 10x to 100x reduction in miss traffic, because the shield does the deduplication work the CDN couldn’t do across POPs.
When it doesn’t help: highly personalized content where every URL is unique per user. The shield has nothing to collapse, because the requests aren’t for the same asset.
The deeper version is multi-tier hierarchies. CloudFront’s “Regional Edge Caches” sit between the edge POPs and origin shield as a third tier. The general pattern: each tier collapses misses for the tier below, trading an extra hop on cold paths for fewer origin requests overall. It’s the same idea as a CPU cache hierarchy, applied to HTTP.
9. TLS termination at the edge
The CDN holds your TLS certificate and terminates HTTPS at the edge POP, which means the CDN sees your traffic in plaintext. The TLS handshake the user performs is with the CDN, not with your origin. The CDN-to-origin connection is a separate TLS session, often reused across many user requests. From the user’s browser, the connection looks like ordinary HTTPS to your domain, but it is encrypted hop by hop, not end to end: the edge decrypts and re-encrypts. From your origin’s perspective, the CDN is the client.
The performance win is large. A full TLS 1.3 handshake costs one round trip on top of TCP’s one, so a cold connection pays two round trips before the first request can be sent; TLS 1.2 adds a third. Terminating at a POP 20ms from the user instead of an origin 250ms away moves those round trips onto the short path for every cold connection. With 0-RTT resumption on returning clients, the win compounds.
The operations win is also large. Certificate management gets centralized at the CDN. You upload your certificate once (or use the CDN’s auto-issued cert via ACME or a vendor-managed CA), and every POP serves it. No per-server cert rotation, no manual reload across a fleet. For most teams this is the bigger win in practice, because TLS rotation done badly is a recurring source of outages.
The plaintext-at-the-edge property has to be evaluated against your trust model. For most workloads this is fine, because you were already trusting the CDN with your traffic. For workloads under contractual or regulatory constraints (PCI DSS scope reduction, certain HIPAA configurations, some financial-services arrangements), it matters. The mitigation is a pass-through mode, sometimes called “SNI passthrough” or “TCP-mode” routing, where the CDN forwards encrypted bytes to origin without decrypting them. The cost of that mode is giving up everything that depends on decryption: caching, WAF, edge compute, content-aware routing, even basic HTTP-level logs. In practice this leaves the CDN doing little more than DDoS absorption and anycast routing, which is why teams that try pass-through usually revert it once they see the feature gap.
10. Dynamic content acceleration: what CDNs do for uncacheable traffic
What a CDN does for an API is dynamic acceleration, not caching. Even for uncacheable content (per-user pages, API responses, session-keyed data) the CDN adds value through pooled origin connections, route optimization over its backbone, and TLS termination near the user. This is “dynamic acceleration,” sometimes called “whole-site acceleration” or “DSA” (Dynamic Site Acceleration). It’s where CDNs differentiate beyond raw caching and where their value to API-heavy products lives.
Three primitives carry most of the win:
Persistent and pooled origin connections. Each POP maintains long-lived, multiplexed HTTP/2 or HTTP/3 connections to origin. A user’s request doesn’t pay TCP setup, TLS handshake, or connection-pool warm-up cost to origin; it borrows an existing pooled connection that’s already established and already warm. On a cold user, this saves 100ms to 300ms depending on origin distance. Across thousands of concurrent users it also reduces origin’s connection-table pressure to a handful of long-lived flows per POP.
Route optimization over the CDN’s backbone. The public internet routes packets between ISPs based on each ISP’s commercial peering decisions, not on latency. The CDN’s own network, by contrast, has direct routes between most major regions and known performance characteristics. The set of dynamic requests that hit Cloudflare’s Argo, Akamai’s SureRoute, or Fastly’s equivalent are shipped over the CDN’s private backbone for as much of the path as possible, hopping out to the public internet only at the last leg. The win varies by route: sometimes 10%, sometimes 40%, occasionally much more on paths the public internet routes poorly.
Request coalescing. When N users request the same uncacheable resource simultaneously and the response is slow, the CDN can collapse them into one origin request and fan the response back out. Not every vendor exposes this for dynamic content (it’s universal for cacheable content, where the cache itself does it). Where it does exist, it smooths traffic spikes on dynamic endpoints.
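Request coalescing is the pattern often called single-flight: the first caller starts the origin fetch, and everyone who arrives while it is in flight waits on the same result. The sketch below shows it with asyncio; `Coalescer`, `slow_origin`, and the URL are hypothetical, and a production version would also need to handle cancellation and errors.

```python
import asyncio


class Coalescer:
    """Collapse concurrent requests for the same key into one upstream fetch."""

    def __init__(self, fetch):
        self.fetch = fetch    # async callable: key -> response
        self.in_flight = {}   # key -> asyncio.Task

    async def get(self, key):
        task = self.in_flight.get(key)
        if task is None:
            task = asyncio.create_task(self.fetch(key))
            self.in_flight[key] = task
            # Forget the task once it finishes so later requests fetch again.
            task.add_done_callback(lambda _: self.in_flight.pop(key, None))
        return await task


async def demo(n):
    calls = 0

    async def slow_origin(key):
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stand-in for a slow origin response
        return f"response for {key}"

    edge = Coalescer(slow_origin)
    results = await asyncio.gather(*(edge.get("/api/scores") for _ in range(n)))
    return calls, results


calls, results = asyncio.run(demo(50))
print(calls, len(results))  # 1 50
```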
Even a global API with no cacheable surface gets meaningful latency wins this way: the per-region POP terminates TLS close to the user, reuses connections to origin, and uses better routes for the long haul. You’re paying for the POP topology, not for the cache.
11. Security at the edge
Security is the second product line at every major CDN, and at some it’s now larger than the delivery business. Four capabilities cover the bulk of what an interviewer expects you to mention.
DDoS absorption. Volumetric attacks (gigabits to terabits of garbage traffic) are absorbed by the CDN’s total network capacity, which is larger than any single origin’s connection to the internet. Anycast helps here too: a single attack source can only reach the POPs its packets happen to route to, so the volume is split across the network rather than concentrated. Cloudflare regularly reports defending attacks in the hundreds of millions of requests per second; an origin sized to handle anything close to that would be wildly overprovisioned.
This is also the pitch behind “free CDN tiers”: even teams that don’t need caching benefit from putting any origin behind the CDN’s DDoS shield.
WAF (Web Application Firewall). Pattern-matching at the edge against known attack signatures: SQL injection, XSS, path traversal, command injection, deserialization attacks. The OWASP Core Rule Set is the standard open baseline; every vendor layers proprietary signatures on top. WAF runs before requests reach origin, which means malicious traffic is blocked at the chokepoint instead of consuming origin CPU.
Bot management. Distinguishing humans, good bots (Googlebot, Bingbot, monitoring services), and bad bots (scrapers, credential-stuffers, ticket scalpers, ad-fraud crawlers). The signals used are behavioral and protocol-level: TLS fingerprints (JA3, an MD5 hash of selected ClientHello fields, and JA4, the newer scheme that emits a structured multi-part fingerprint covering TLS, HTTP/2, and more), HTTP header order and presence, mouse-movement and typing-cadence signals from in-page beacons, IP reputation databases. Cloudflare’s Bot Management and Akamai’s Bot Manager are the visible products.
Rate limiting. Per-IP, per-cookie, per-API-key, or per-session counters at the edge. The natural place for rate limiting, because every request passes through the CDN and most abuse is observable by volume. Rate-limit rules at origin work too, but they require requests to reach origin to be counted, which doesn’t help if origin is what you’re trying to protect.
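One common way to implement such counters is a token bucket per client key. This is a sketch of that technique under assumptions of my own choosing: the source does not say which algorithm any vendor uses, the class names are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    def __init__(self, rate_per_s, burst, now):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now):
        # Refill for the time elapsed, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class EdgeRateLimiter:
    """One bucket per client key (IP, cookie, API key, or session)."""

    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s
        self.burst = burst
        self.buckets = {}

    def allow(self, client_key, now):
        bucket = self.buckets.get(client_key)
        if bucket is None:
            bucket = self.buckets[client_key] = TokenBucket(self.rate, self.burst, now)
        return bucket.allow(now)


limiter = EdgeRateLimiter(rate_per_s=1.0, burst=3)
print([limiter.allow("198.51.100.9", now=0.0) for _ in range(4)])
# [True, True, True, False]
print([limiter.allow("198.51.100.9", now=2.0) for _ in range(3)])
# [True, True, False]
```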
The architectural reason all of this lives at the edge is that the edge is a chokepoint your origin can’t be. Origin sits behind one ASN, with finite uplink. The CDN sits in front of users worldwide, with hundreds of times the capacity, and it sees every request. Filtering at the chokepoint is cheaper, faster, and more effective than filtering at origin.
The credible structure for a security answer on a CDN-fronted system: the CDN absorbs the volumetric layer, a WAF plus rate limits sit on top, and origin is protected from direct internet access by firewall rules that only accept traffic from the CDN’s published IP ranges.
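The last piece of that structure, accepting traffic only from the CDN, comes down to a membership check against published ranges. The sketch below uses Python’s ipaddress module; the ranges are documentation-reserved placeholders, not any vendor’s real list, and in practice the rule usually lives in a firewall or security group instead of application code.

```python
import ipaddress

# Placeholder ranges (reserved for documentation). A real deployment would
# load the CDN's published list and refresh it periodically.
CDN_RANGES = [ipaddress.ip_network(cidr) for cidr in ("203.0.113.0/24", "2001:db8::/32")]


def from_cdn(remote_addr):
    """True if the connecting address falls inside a known CDN range."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in CDN_RANGES)


print(from_cdn("203.0.113.7"))   # True: a CDN edge, accept
print(from_cdn("198.51.100.9"))  # False: direct-to-origin traffic, drop
print(from_cdn("2001:db8::1"))   # True
```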
12. Streaming and video
Video is the largest single category of internet traffic, and the streaming protocols that dominate (HLS and DASH) were designed to be friendly to HTTP caches. Much of what modern CDNs do well at fan-out (tiered caching, long-lived connections, low-latency segment delivery) was driven by streaming workloads ahead of other use cases.
HLS (HTTP Live Streaming, originally Apple’s spec) and DASH (Dynamic Adaptive Streaming over HTTP, the ISO/IEC equivalent) work the same way. The encoder chops the source into short segments (6 seconds is the modern default for standard HLS; low-latency variants go down to fractions of a second) at multiple quality tiers, exposes each segment as a regular HTTP URL, and writes a manifest file (.m3u8 for HLS, .mpd for DASH) that lists them. The player downloads the manifest, then fetches segments one by one, choosing the quality tier per segment based on measured throughput. Every segment is a cacheable static file at a stable URL. Every manifest is a small file that updates on a known cadence for live streams or is immutable for VOD.
Four properties of this design make it a perfect fit for CDNs:
- Segments are immutable once written. A 6-second segment encoded at 1080p is the same bytes forever. Long TTL is safe.
- Manifests are tiny and predictable. A few kilobytes, updated every segment for live streams. Short TTL is cheap.
- Adaptive bitrate fans content out. A single stream becomes five to seven quality variants, each independently cached.
- Origin shielding wins big. A live event with a million concurrent viewers across a thousand POPs would otherwise mean up to a thousand origin fetches per segment, one per POP, and a million with no CDN at all. With shielding, it’s one origin fetch per segment.
Low-latency variants (LL-HLS, LL-DASH, and CMAF chunked encoding) push the live edge closer to wall-clock real time, from the traditional 15-to-30-second glass-to-glass latency down to 2-to-5 seconds. They do it by emitting partial segments and delivering them as they’re encoded: HTTP/1.1 chunked transfer encoding for CMAF and LL-DASH, and blocking playlist reloads with preload hints for LL-HLS (early LL-HLS drafts relied on HTTP/2 server push, which was later dropped). The CDN has to support holding open a chunked-transfer connection and forwarding partial bytes to viewers as the segment is being written. Not every general-purpose CDN does this well, which is why specialized media-delivery offerings exist:
- Akamai’s Media Delivery.
- Cloudflare Stream.
- AWS Elemental MediaPackage paired with CloudFront.
- Mux, a specialized vendor running on top of multi-CDN.
For a system design question about a live streaming service, the credible architecture is: encoder writes segments and manifests to object storage, CDN in front of object storage with long TTL on segments and short TTL on manifests, origin shielding on to collapse fan-out. If sub-five-second latency matters, add a media-specific CDN tier that supports chunked transfer.
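The long-TTL-on-segments, short-TTL-on-manifests split can be expressed as a header policy keyed on file extension. This is an illustrative sketch: the TTL values, the extension list, and the `cache_control_for` name are assumptions, and the right manifest TTL depends on your segment duration.

```python
import posixpath

SEGMENT_EXTS = {".ts", ".m4s", ".mp4"}
MANIFEST_EXTS = {".m3u8", ".mpd"}


def cache_control_for(path, live):
    """Pick a Cache-Control value for a streaming asset (illustrative values)."""
    ext = posixpath.splitext(path)[1].lower()
    if ext in SEGMENT_EXTS:
        # Segments never change once written, so cache them for a year.
        return "public, max-age=31536000, immutable"
    if ext in MANIFEST_EXTS:
        # Live manifests change every segment; VOD manifests are stable.
        return "public, max-age=2" if live else "public, max-age=3600"
    return "no-store"  # conservative default for anything unrecognized


print(cache_control_for("/live/1080p/seg_00042.m4s", live=True))
# public, max-age=31536000, immutable
print(cache_control_for("/live/1080p/index.m3u8", live=True))
# public, max-age=2
print(cache_control_for("/vod/movie/index.m3u8", live=False))
# public, max-age=3600
```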
13. Edge compute: code at the POP
Edge compute moves your code from origin to the POP. Instead of “cache the HTML origin produces,” it’s “run code at the POP that produces the HTML.” Three offerings define the space today: Cloudflare Workers, AWS Lambda@Edge and CloudFront Functions, and Fastly Compute@Edge.
The three offerings differ on three axes that matter when picking one: cold-start latency, runtime model (V8 isolates vs. full Lambda vs. WebAssembly), and language support.
- Cloudflare Workers. V8 isolates, sub-millisecond cold start, JavaScript and TypeScript natively, plus anything that compiles to WebAssembly. The tradeoff is tight per-request limits: 10ms CPU on the free tier, more on paid plans, capped memory. Designed for the hot path, not for heavy work.
- Lambda@Edge and CloudFront Functions. Two tiers. CloudFront Functions are lightweight JavaScript with very low latency for header rewriting and routing. Lambda@Edge is full Lambda at a subset of CloudFront’s POPs, with longer cold starts (hundreds of milliseconds) and broader language support. Designed for heavier work, less suited to per-request hot path.
- Compute@Edge (Fastly). WebAssembly via Lucet (now Wasmtime), cold start in microseconds, more languages via the WASM toolchain (Rust, Go via TinyGo, AssemblyScript, more).
What edge compute is good for:
- A/B test variant selection at the edge, with no origin round trip to decide which variant to serve.
- Request and response rewriting: header injection, URL normalization, query parameter sanitization, image-on-the-fly transformation.
- Authentication and authorization, validating JWTs or session tokens at the edge so only authenticated requests reach origin.
- Personalization layered on a cacheable template, where the template is cached and the edge composes user-specific bits.
- Image and video transformation on demand (resize, format conversion and watermarking).
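The first item on that list, variant selection without an origin round trip, works because the assignment can be a pure function of the request. The sketch below shows the logic in Python for readability; an actual deployment would write it in whatever language the edge runtime supports, and the function and experiment names are hypothetical.

```python
import hashlib


def pick_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant with no stored state."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# The same user always lands in the same variant, at any POP.
first = pick_variant("user-123", "new-checkout")
second = pick_variant("user-123", "new-checkout")
print(first == second)  # True

# Across many users the split is roughly even.
counts = {"control": 0, "treatment": 0}
for i in range(10_000):
    counts[pick_variant(f"user-{i}", "new-checkout")] += 1
print(counts)
```

Salting the hash with the experiment name keeps assignments independent across experiments, so a user in the treatment group of one test is not automatically in the treatment group of the next.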
What it’s not good for: anything that needs your primary database on every request. Edge POPs don’t have low-latency access to a Postgres instance in us-east-1. If your code calls origin’s database on every request, you’re back to a full origin round trip and you’ve spent the cold-start budget for nothing.
Cloudflare’s Workers KV, D1, R2, and Durable Objects, and Fastly’s KV store, exist to give edge code something locally fast. They’re useful for edge-shaped workloads (low-cardinality lookups, session state, edge-side feature flags) but they’re not general-purpose databases.
Edge compute belongs between user and cache, not between cache and origin. If your logic depends on origin data every time, you’ve picked the wrong layer.
14. Multi-CDN strategies
Multi-CDN means running your site or service through two or more CDN vendors simultaneously, with traffic either split between them or failed over between them based on health.
Three reasons to do it:
[[3-Topics/Reliability Engineering/Break Things on Purpose — An Introduction to Chaos Engineering/Index|Failover]] for availability. Major CDNs do go down. A few well-known cases:
- Cloudflare’s global outage in July 2019.
- Fastly’s June 2021 outage, which briefly took down a large fraction of the public web.
- Akamai’s July 2021 outages.
Customers on a single CDN went dark when their vendor did. Multi-CDN with health-based failover, typically managed at the DNS layer by a service like NS1, AWS Route 53, or DNS Made Easy, lets you steer traffic to the surviving vendor within roughly one DNS TTL, which means seconds to a few minutes depending on the TTL you set and how faithfully resolvers honor it.
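The health-based failover decision itself is simple. Here is a minimal sketch, assuming a hypothetical checker that probes each CDN and counts consecutive failures; `CdnHealth`, `pick_cdn`, and the threshold of three are illustrative and are not any DNS provider's API. Managed DNS services implement their own version of this logic.

```python
from dataclasses import dataclass


@dataclass
class CdnHealth:
    name: str
    consecutive_failures: int = 0

    def record(self, ok: bool) -> None:
        # A single success resets the counter; failures accumulate.
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1


def pick_cdn(primary: CdnHealth, secondary: CdnHealth,
             failure_threshold: int = 3) -> str:
    """Return the CDN name the DNS answer should point at."""
    if primary.consecutive_failures < failure_threshold:
        return primary.name
    if secondary.consecutive_failures < failure_threshold:
        return secondary.name
    # Both look unhealthy: keep the primary rather than returning nothing.
    return primary.name


if __name__ == "__main__":
    a, b = CdnHealth("cdn-a"), CdnHealth("cdn-b")
    for _ in range(3):
        a.record(False)
    print(pick_cdn(a, b))
```

Requiring several consecutive failures before switching trades a slower reaction for fewer flaps on a single dropped probe.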
Regional optimization. No CDN is the fastest everywhere. Akamai is often strongest in some Asian markets where its long-running peering pays off. Cloudflare is often strongest in markets where it’s invested heavily in POP density. CloudFront is often strongest in the US. Splitting traffic by region, sending Asian users to one vendor and European users to another, lets you use whichever vendor performs best in each region.
Vendor leverage on price. A multi-CDN deployment makes you a credible defection risk to either vendor. Discount structures usually require traffic commitments, and a vendor that knows you can shift 30% of your traffic to a competitor by changing a DNS record tends to keep prices honest.
Four things it costs:
- Configuration drift. Each CDN has its own config syntax, WAF rule format, cache rule semantics, log format, and feature names. Multi-CDN means maintaining all of them and keeping behavior consistent. The diff between Cloudflare’s WAF and Fastly’s WAF is not zero.
- Certificate management. Each CDN needs your TLS certificate, or you accept vendor-issued certs from each. The latter complicates certificate transparency monitoring and any certificate or public-key pinning your client apps do, because clients now see certificates from more than one issuer.
- Cache fragmentation. Each CDN has its own cache. A cache miss on CDN A is not a hit on CDN B. Effective cache size is one CDN’s worth, not the sum. Splitting traffic 50/50 means each CDN’s cache is warmed by half the traffic, which can lower hit rates more than expected.
- Cost structure. You pay both vendors. Volume discount tiers usually require committed traffic, and splitting traffic in half means each vendor sees half the volume, which can drop you into a more expensive tier at both.
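The cache-fragmentation cost can be made concrete with back-of-the-envelope arithmetic. This sketch assumes a deliberately simplified model: one cache per CDN, every object requested through every cache, and exactly one compulsory miss per object per cache per TTL window. The function name and the traffic figures are illustrative, not measurements.

```python
def best_case_hit_rate(requests: int, distinct_objects: int,
                       caches: int = 1) -> float:
    """Upper bound on hit rate when each cache misses once per object per TTL."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    # Each cache has to fetch each object from origin once before it can hit.
    misses = min(requests, distinct_objects * caches)
    return 1 - misses / requests


if __name__ == "__main__":
    single = best_case_hit_rate(1_000_000, 50_000, caches=1)
    split = best_case_hit_rate(1_000_000, 50_000, caches=2)
    print(f"one CDN: {single:.2%}, two CDNs: {split:.2%}")
```

Under these assumptions, one million requests over 50,000 objects give a ceiling of about 95% on a single CDN and about 90% when the same traffic is split across two, which doubles the requests reaching origin. Real caches have many POPs and skewed popularity, so treat this as the direction of the effect, not its size.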
Active-active with real-time steering across two or more CDNs is what the largest streaming and e-commerce companies typically run, because at that scale the operational overhead pays back. Below that scale, most companies settle on a primary CDN handling 95% or more of traffic with a secondary CDN warmed and ready as failover, switched in by managed DNS based on health checks. The simpler model is usually enough.
15. When not to use a CDN
CDNs aren’t free, and the conventional answer of “always put it behind a CDN” is wrong in a few specific shapes of workload. Worth knowing for the interview question that goes “and when would you not?”
Strong personalization with no cacheable surface. A SaaS dashboard where every page is per-user, every JSON response is per-account, and there are no shared static assets large enough to matter. You still get dynamic acceleration (connection pooling, route optimization) and you still get TLS termination at the edge, but the gains are often smaller than the operational cost of adding the layer. The break-even depends on user geography. If users are clustered near origin, the dynamic-acceleration win is small.
Internal services. A backend behind a VPN or service mesh, accessed only by employees, machines, or partners inside your network. Adding a public-facing CDN expands the attack surface without serving more users. Internal API gateways and service meshes are the right answer for this shape, not a public CDN.
Strict data locality requirements. Some regulatory regimes require traffic to stay within specific jurisdictions: China’s data-sovereignty rules, the EU’s evolving data-residency guidance, some healthcare arrangements under HIPAA, some financial regulations under MiFID II or local equivalents. A globally distributed CDN by default routes traffic to whichever POP is closest, which can cross jurisdictions in ways that violate the constraint. Vendor-specific regional offerings exist (Cloudflare’s Regional Services, Akamai’s regional compliance products) that pin traffic to specific countries or regions, but they need to be evaluated case by case and they usually cost more.
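Conceptually, region pinning replaces "nearest POP" with "nearest POP inside the allowed jurisdictions". Here is a minimal sketch of that rule, with hypothetical POP records and latencies; it does not reflect how any vendor's regional product is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pop:
    code: str      # hypothetical POP identifier
    country: str   # ISO country code where the POP is located
    rtt_ms: float  # measured round-trip time from the user


def pick_pop(pops: list, allowed_countries: set) -> Pop:
    """Choose the lowest-latency POP that satisfies the locality constraint."""
    eligible = [p for p in pops if p.country in allowed_countries]
    if not eligible:
        # Failing closed is the point: do not fall back to a disallowed POP.
        raise LookupError("no POP satisfies the locality constraint")
    return min(eligible, key=lambda p: p.rtt_ms)


if __name__ == "__main__":
    pops = [Pop("LHR", "GB", 8.0), Pop("FRA", "DE", 14.0), Pop("CDG", "FR", 19.0)]
    print(pick_pop(pops, {"DE", "FR"}).code)
```

In this example the nearest POP overall is skipped because it is outside the allowed set, which is the latency cost the constraint imposes.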
Very low traffic. A side project with a hundred visits a day exercises almost none of the CDN’s value propositions. Origin can serve it. Free-tier DDoS protection is still a reason to use one even at low traffic (a single hostile actor can cheaply ruin a small site’s day), but it’s about the only reason at that scale.
A CDN solves three concrete problems (latency, bandwidth, origin load) and adds capability beyond pure caching: TLS, security, edge compute, dynamic acceleration. The question isn’t whether to use one. It’s whether your workload exercises enough of those capabilities to justify the operational layer. For most user-facing systems with global users, the answer is yes. For backends, internal services, and tightly regulated workloads, it’s often no, and naming why is more interesting than reciting the default.