Short Polling, Long Polling and WebSocket Protocol
Short polling, long polling, and WebSockets are three ways to get server-side news to a client over HTTP. Short polling has the client ask on a fixed interval; long polling has the client ask once and the server hold the response open until something happens; WebSockets upgrade the connection to a separate protocol and keep it open for bidirectional messaging. They are three points on the same tradeoff surface, not a hierarchy.
1. Server-initiated updates over HTTP
HTTP is shaped for the client to ask and the server to answer. The moment you want the reverse, where the server has news the client did not request, you are pushing against the protocol’s request-response shape. Short polling, long polling, and WebSockets are three different accommodations for that fact, each trading a different mix of latency, connection cost and operational complexity.
Two costs are worth holding in your head as you read on:
- Latency, how long the client waits between an event happening on the server and the client noticing.
- Connection cost, everything that touches a socket: the TCP handshake, the TLS handshake on top of it, the bytes on the wire for each request, the file descriptor and memory the server holds while the connection is open, and the load balancer slot it occupies.
Every technique below pushes latency down by spending more on connections, or pushes connection cost down by tolerating more latency. No third option wins on both.
A third axis matters less often but is worth naming: directionality. Short polling and long polling are both client-initiated even when they carry server-pushed payloads. WebSockets are genuinely bidirectional, which matters mainly when the client also needs to push events at arbitrary moments with low latency, rather than sending the occasional request. Server-Sent Events sit on this axis too, one-way server-to-client over plain HTTP, and the last section comes back to them.
2. Short polling
Short polling is the client asking the server “anything new?” on a fixed interval, regardless of whether anything has happened. The simplest possible accommodation: nothing about HTTP changes, no connection is held open, and the only design decision is the poll interval.
2.1. The polling loop
On each tick of a client-side timer, the client fires a normal HTTP request and processes whatever comes back. If nothing has changed, the server returns an empty payload or a 304 and the client waits for the next tick. The poll interval is the only knob and it is the entire design.
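The loop is small enough to sketch in full. This is a minimal illustration, not a production client: `short_poll`, `fetch`, and `handle` are hypothetical names, and `fetch` stands in for whatever HTTP call the client makes, returning `None` when the server sends an empty payload or a 304.

```python
import time
from typing import Callable, Optional


def short_poll(
    fetch: Callable[[], Optional[str]],
    handle: Callable[[str], None],
    interval_s: float,
    max_ticks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() once per tick and pass non-empty results to handle().

    Returns the number of ticks that came back empty (the wasted requests).
    """
    empty = 0
    for _ in range(max_ticks):
        update = fetch()       # one ordinary, stateless HTTP request (assumed)
        if update is None:     # empty payload or 304: nothing changed
            empty += 1
        else:
            handle(update)
        sleep(interval_s)      # the poll interval is the only knob
    return empty
```

Injecting `sleep` keeps the sketch testable without waiting; a real client would more likely hang this off a timer in its UI or event loop.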
There is nothing stateful about this on the server side. Each request is independent, can land on any instance behind a load balancer, and looks identical to any other request in your access log. That property is the whole reason short polling refuses to die.
2.2. Wasted requests and average lag
A short interval gives you low latency but high request volume, most of which carries no payload. A long interval cuts request volume but raises the average staleness of what the client sees. The two costs pull in opposite directions. The expected delay between an event and the client noticing is roughly half the poll interval, so a 10-second poll means a 5-second average lag.
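The half-interval figure can be checked with a few lines of arithmetic. The sketch below assumes events are spread evenly across one poll interval; the function names are illustrative only.

```python
def average_lag(interval_s: float, samples: int = 1000) -> float:
    """Average wait until the next poll, for events spread evenly over one interval."""
    # An event at offset t into the interval is noticed at the next tick,
    # which is (interval_s - t) seconds later.
    offsets = [(i + 0.5) * interval_s / samples for i in range(samples)]
    return sum(interval_s - t for t in offsets) / samples


def requests_per_second(clients: int, interval_s: float) -> float:
    """Steady-state request rate produced by a fleet of pollers."""
    return clients / interval_s
```

With a 10-second interval, `average_lag(10.0)` comes out at about 5 seconds, and `requests_per_second(300, 1.0)` gives 300 requests per second for three hundred clients polling once a second. The worst case is always a full interval.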
Each empty request is not free either. You pay for a TCP handshake (unless the connection is kept alive), a TLS handshake (unless the session is resumed), request and response headers that often dwarf the empty body, a load balancer hop, an authentication check, and a log line. At a few hundred clients polling once a second, you are doing a meaningful amount of work to learn nothing.
The opposite failure is also worth naming. If the event the client is polling for happens in bursts, a fixed interval will batch several events into one response and the client will see them all at once with the staleness of the oldest. That is sometimes acceptable and sometimes the bug you are trying to fix.
2.3. Where short polling wins
Short polling is the right call when the freshness budget is loose compared to the poll interval, and when the operational simplicity is worth more than the wasted requests. Status pages that refresh every thirty seconds, dashboards that update once a minute, background sync that reconciles every five minutes. Anywhere the user does not perceive the lag, and anywhere the infrastructure already terminates HTTP cheaply, short polling wins on cost.
The understated win is that short polling is the only one of the three that survives every kind of intermediary without thought. Corporate proxies, ancient load balancers, serverless platforms with hard request timeouts, CDNs that buffer responses: short polling works through all of them because it’s just HTTP. When you are deploying into an environment you do not control, this matters more than the request volume.
3. Long polling
Long polling inverts short polling’s question. Instead of the client asking repeatedly, the client asks once and the server holds the response open until it actually has something to say. When the server replies, the client immediately opens another request and waits again. The cost of empty responses drops to nearly zero, since only a hold that times out comes back empty, and the cost of held connections goes up to compensate.
3.1. How the hanging request works
The client sends a normal HTTP request. The server, instead of responding immediately, parks the request: it registers a handler, holds the socket, and waits for an event from whatever internal source produces them. When an event arrives, the server writes the response, the client receives it, and the client opens a new request before processing. The cycle repeats.
Two timing edges shape every long-polling implementation. The server needs a maximum hold time, after which it returns an empty response rather than holding the connection forever. Intermediaries on the path (load balancers, reverse proxies, CDNs) will close idle connections at their own thresholds, and a server-initiated timeout is cleaner than a 504 Gateway Timeout or a reset connection from the load balancer. The client needs a reconnect policy for the case where the connection closes unexpectedly, with backoff to avoid hammering a server that just restarted.
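The parking step and the maximum hold time can be sketched with asyncio. This is an illustration under assumptions, not a full server: `hold_for_event` and `long_poll_client` are hypothetical names, the `asyncio.Queue` stands in for whatever internal source produces events, and `request` stands in for the client's HTTP call.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def hold_for_event(queue: asyncio.Queue, max_hold_s: float) -> Optional[str]:
    """Park until an event arrives or max_hold_s elapses.

    Returning None models the empty response sent when the hold times out.
    """
    try:
        return await asyncio.wait_for(queue.get(), timeout=max_hold_s)
    except asyncio.TimeoutError:
        return None


async def long_poll_client(
    request: Callable[[], Awaitable[Optional[str]]],
    handle: Callable[[str], None],
    rounds: int,
) -> None:
    """Re-issue the hanging request as soon as the previous one completes."""
    for _ in range(rounds):
        event = await request()   # blocks while the server parks the request
        if event is not None:
            handle(event)
        # No sleep here: the next request goes out immediately.
```

The hold time should sit comfortably below the shortest idle timeout of any intermediary on the path, so that the server is always the one to end a quiet request.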
3.2. What it costs
The cost moves from request volume to held state. Every connected client occupies a socket, a file descriptor, a chunk of memory for the request context, and a slot in whatever event loop is doing the parking. A server that comfortably handled a thousand short-polling requests per second may struggle to hold ten thousand simultaneously parked long-polling connections, not because the work is harder but because each client now consumes resources continuously rather than briefly.
The reconnect storm is the failure mode that bites people who haven’t seen it before. If a server restarts with ten thousand parked clients, all ten thousand reconnect at roughly the same instant, and the second wave of parking lands on a server that is also handling the first wave’s processing. Without client-side jitter on the reconnect delay, the server can be knocked over repeatedly by its own recovery. Jittered exponential backoff is mandatory.
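One common way to implement jittered exponential backoff is the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. The sketch below assumes that variant; the function name and the default base and cap values are illustrative choices, not recommendations from this article.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base_s: float = 0.5,
    cap_s: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based)."""
    source = rng if rng is not None else random
    # The ceiling doubles with each failed attempt, up to cap_s.
    ceiling = min(cap_s, base_s * (2 ** attempt))
    # Drawing from the whole range spreads clients out instead of
    # letting them all retry at the same instant.
    return source.uniform(0.0, ceiling)
```

A client would reset `attempt` to zero after a successful reconnect. The randomness is the part that matters for the storm: without it, ten thousand clients with the same backoff schedule still arrive in synchronized waves.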
Long polling is also harder to reason about in logs. A single logical “session” is now a chain of requests with no in-band identifier connecting them, and an event-to-response latency that depends on how long the request was parked before the event fired. Distributed tracing helps if you have it; plain access logs do not.
3.3. When it is the right call
Long polling is the right call when events are infrequent enough that holding connections is cheaper than polling, but the latency budget is tight enough that short polling’s average lag is not acceptable. Notifications, chat presence updates, alerting feeds where a five-second poll would feel sluggish but a held connection per user is affordable.
It is also the right call in environments where WebSockets are not available or not trusted. Some corporate proxies strip the Upgrade header. Some serverless platforms do not support long-lived bidirectional connections at all but do support requests up to a minute or two, which is plenty of hold time for many use cases. Older mobile webviews and restricted runtimes handle HTTP fine and WebSockets poorly. Long polling is the compromise that runs almost everywhere WebSockets do not.
4. WebSockets
WebSockets are a separate protocol that piggybacks on HTTP to get itself established. Once the handshake completes, the TCP connection is repurposed: HTTP framing is abandoned, both sides can send messages at any time, and the connection stays open until someone explicitly closes it. The connection cost is paid once, and the latency cost in each direction is the one-way network transit time and nothing else.
4.1. The upgrade handshake
The client opens what looks like a normal HTTP GET request to the server, but with a few extra headers. The important ones are Upgrade: websocket, Connection: Upgrade, Sec-WebSocket-Version: 13, and Sec-WebSocket-Key, a base64-encoded 16-byte random value. The server, if it accepts the upgrade, responds with HTTP status 101 (Switching Protocols) and returns Sec-WebSocket-Accept: the SHA-1 hash of the client’s key concatenated with a fixed GUID defined in RFC 6455, base64-encoded. From the next byte onward the same TCP connection speaks the WebSocket framing protocol instead of HTTP.
The handshake-over-HTTP design is pragmatic rather than elegant. By looking like an HTTP request, the connection traverses firewalls, proxies, and load balancers that already know how to forward HTTP. By switching protocols after status 101, it escapes HTTP’s request-response shape entirely. The Sec-WebSocket-Accept ritual is there to confirm to the client that the server actually understood the upgrade rather than some HTTP middleware blindly echoing a 101.
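The Sec-WebSocket-Accept computation is short enough to write out. The GUID below is the fixed string from RFC 6455; the function name is just an illustrative choice.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key."""
    # The key is used as the literal header text, not base64-decoded first.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

For the sample key used in RFC 6455, `dGhlIHNhbXBsZSBub25jZQ==`, this yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. A client that receives a 101 without a matching value should treat the handshake as failed.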
After the handshake, the wire format is a binary frame: a few header bytes that describe the payload’s length, type (text, binary, ping, pong, close), and whether the frame is the final one in a logical message. Messages can be fragmented across frames, which matters for streaming but is usually invisible to application code because client libraries reassemble for you.
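To make the frame layout concrete, here is a minimal parser for a single complete frame, following the RFC 6455 header layout. It is a sketch for reading the format, not a usable implementation: it assumes the whole frame is already in memory and ignores the reserved bits and extensions. It also handles masking, which the RFC requires on client-to-server frames. `Frame` and `parse_frame` are illustrative names.

```python
import struct
from typing import NamedTuple

OPCODES = {
    0x0: "continuation", 0x1: "text", 0x2: "binary",
    0x8: "close", 0x9: "ping", 0xA: "pong",
}


class Frame(NamedTuple):
    fin: bool        # True if this is the final frame of a message
    opcode: str
    payload: bytes


def parse_frame(data: bytes) -> Frame:
    """Parse one complete WebSocket frame from the start of `data`."""
    fin = bool(data[0] & 0x80)
    opcode = OPCODES[data[0] & 0x0F]
    masked = bool(data[1] & 0x80)
    length = data[1] & 0x7F
    pos = 2
    if length == 126:                      # 16-bit extended length follows
        (length,) = struct.unpack("!H", data[pos:pos + 2])
        pos += 2
    elif length == 127:                    # 64-bit extended length follows
        (length,) = struct.unpack("!Q", data[pos:pos + 8])
        pos += 8
    key = b""
    if masked:                             # 4-byte masking key precedes the payload
        key = data[pos:pos + 4]
        pos += 4
    payload = data[pos:pos + length]
    if masked:
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, opcode, payload)
```

A fragmented message shows up as a first frame with `fin` false and a data opcode, followed by one or more `continuation` frames, the last of which has `fin` true.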
4.2. What full-duplex actually buys you
The headline feature is that the server can send a message to the client without the client having asked for one in this round-trip. That is what “push” actually means at the protocol level: not a single technique but the absence of the request-response constraint. The same connection also lets the client send messages whenever it wants without paying for a fresh HTTP request and the headers, auth checks, and load balancer hops that come with one.
Two practical wins follow. Latency drops to the one-way network transit time in each direction, because there is no waiting for the next poll cycle or for a parked request to be completed and replaced. Per-message overhead drops to a few bytes of framing rather than a full HTTP request, which makes high-frequency message streams (live cursors, collaborative edits, gameplay state) viable in a way they are not over polling.
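The "few bytes of framing" can be counted exactly from the RFC 6455 header layout. The sketch below does that and, for scale, compares it against a deliberately minimal HTTP request. The request text is a made-up example; real requests usually carry many more headers.

```python
def frame_overhead_bytes(payload_len: int, masked: bool) -> int:
    """Header bytes a WebSocket frame adds on top of its payload."""
    if payload_len < 0:
        raise ValueError("payload_len must be non-negative")
    overhead = 2                   # FIN/opcode byte plus mask-bit/length byte
    if payload_len > 65535:
        overhead += 8              # 64-bit extended length
    elif payload_len > 125:
        overhead += 2              # 16-bit extended length
    if masked:
        overhead += 4              # masking key on client-to-server frames
    return overhead


# Hypothetical bare-minimum request: no cookies, auth, or user agent.
MINIMAL_REQUEST = b"GET /updates HTTP/1.1\r\nHost: example.com\r\n\r\n"
```

A small client-to-server message costs 6 bytes of framing and a small server-to-client message costs 2, while even the stripped-down request above is several times larger before any body is attached.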
The bidirectional part matters less often than people think. Many real-time apps are server-to-client dominant with the client only occasionally sending control messages, and for those long polling plus a separate POST endpoint for client-to-server messages works fine. The cases where bidirectionality is decisive are the ones where the client needs to push events at arbitrary moments with low latency: collaborative editors, multiplayer games, financial trading clients.
4.3. What it costs
The cost is statefulness, and statefulness has a long tail. Every connected client occupies a persistent TCP connection on the server, with the same per-connection costs long polling has but without the natural breakpoints that requests provide. There’s no obvious moment to swap an instance out for a new one, no obvious moment to apply an updated config, no obvious moment to migrate the connection to a less-loaded server. You build those mechanisms or you live without them.
Load balancers that work fine for HTTP often need configuration changes for WebSockets: longer idle timeouts, sticky session affinity if your application state is per-connection, and protocol-aware health checks. Some platforms (older serverless, certain CDN tiers) do not support WebSockets at all. Some corporate networks strip the Upgrade header and leave you debugging why the handshake works from your laptop but not from the office.
Debugging is harder than HTTP for two structural reasons. The request-response logging that comes free with HTTP does not exist: you have a long-lived connection with messages flowing in both directions, and capturing those requires either application-level logging or a protocol-aware tool. The failure modes are also continuous rather than discrete. Instead of a request that succeeded or failed, you have a connection that may have silently stalled, dropped a message, or fallen into a state where the client and server disagree about whether they are still connected. Heartbeats (ping/pong frames at the protocol level, application-level ping messages above that) are the standard fix and are effectively required in production.
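The heartbeat logic itself is a small state machine, independent of any particular WebSocket library. The sketch below is one possible shape for it, with hypothetical names and caller-supplied timestamps; wiring `tick` to a timer and `on_pong` to the library's pong callback is left out.

```python
class HeartbeatMonitor:
    """Decides when to send a ping and when to give up on a silent peer."""

    def __init__(self, ping_interval_s: float, pong_timeout_s: float, now: float) -> None:
        self.ping_interval_s = ping_interval_s
        self.pong_timeout_s = pong_timeout_s
        self.last_ping = now
        self.awaiting_pong = False

    def on_pong(self, now: float) -> None:
        """Record that the peer answered the outstanding ping."""
        self.awaiting_pong = False

    def tick(self, now: float) -> str:
        """Return 'dead', 'ping', or 'ok' for the current time."""
        if self.awaiting_pong:
            if now - self.last_ping >= self.pong_timeout_s:
                return "dead"      # peer stayed silent: close and reconnect
            return "ok"
        if now - self.last_ping >= self.ping_interval_s:
            self.last_ping = now
            self.awaiting_pong = True
            return "ping"          # caller should send a ping frame now
        return "ok"
```

The point of the `dead` state is to turn a silently stalled connection into a discrete, loggable failure, which is the thing plain HTTP gave you for free.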
4.4. When it is the right call
WebSockets are the right call when bidirectional, low-latency messaging is the central feature of the product rather than a nice-to-have on a feature. Collaborative editors, live multiplayer, real-time dashboards with user interaction, trading clients, chat. Anywhere the user perceives the lag and the lag budget is sub-second, anywhere the client needs to push as much as it receives, and anywhere the message rate is high enough that per-message HTTP overhead would dominate.
They are also the right call when you have outgrown long polling and the simplicity of HTTP is no longer compensating for the request-per-event cost. The reverse transition (WebSockets to long polling) is rare because by the time you have built the operational story for one stateful protocol, the marginal cost of a second is small. Pick WebSockets when the design pressure is real rather than anticipated.
5. Choosing between short polling, long polling, and WebSockets
The decision rubric, in the order I actually apply it, is three questions.
First, what is the latency budget? If “noticed within a minute” is fine, short polling wins by default and you should not even read the rest of this list. The simplicity is worth more than any of the cleverness below, and the failure modes are all familiar HTTP failure modes.
Second, does the client need to push at arbitrary times? If yes, WebSockets, and skip the third question. The other two techniques can carry client-initiated messages, but only by paying for a fresh HTTP request each time, which defeats the latency budget that made you consider real-time in the first place.
Third, what does the infrastructure allow? If the answer is “anything”, and the events are infrequent enough that holding connections is cheaper than polling them, long polling is the well-trodden middle ground. If the answer is “anything”, and the events are frequent or the messaging is high-volume, WebSockets. If the answer is “we cannot hold connections” (some serverless platforms, some constrained client runtimes, some hostile proxy environments), you are back to short polling regardless of what you would have preferred.
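The three questions can be written down as a function, mostly to make the ordering explicit. This is a sketch of the rubric as stated above, not a substitute for judgment: the 60-second threshold is an assumption standing in for "noticed within a minute is fine", and the parameter names are illustrative.

```python
def choose_transport(
    latency_budget_s: float,
    client_pushes_at_arbitrary_times: bool,
    can_hold_connections: bool,
    events_frequent: bool,
) -> str:
    """Apply the three questions in order and return the suggested technique."""
    # 1. Loose latency budget: short polling wins by default.
    if latency_budget_s >= 60:
        return "short polling"
    # 2. Client must push at arbitrary times: WebSockets, skip question 3.
    if client_pushes_at_arbitrary_times:
        return "websockets"
    # 3. What the infrastructure allows.
    if not can_hold_connections:
        return "short polling"
    return "websockets" if events_frequent else "long polling"
```

For example, a notification feed with a one-second budget, no client push, infrequent events, and infrastructure that can hold connections comes out as long polling.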
One more argument for defaulting to the simpler thing. Every step up this ladder costs operational surface area: held connections need timeouts and reconnect policies, WebSockets need heartbeats and sticky sessions and protocol-aware load balancing, and debugging gets progressively less log-friendly. The fact that WebSockets are technically superior at low-latency bidirectional messaging does not mean they are the right call for a notification feed that fires twice an hour. Push back on real-time as a default: many features that look like they need real-time work fine with a five-second poll, and the support burden is dramatically lower.
One technique that is not in the title but deserves a mention: Server-Sent Events (SSE) is a streaming-response protocol where the server keeps an HTTP response open and writes events into it as they happen. It is server-to-client only, which is its limitation and also why it is simpler than WebSockets. For one-way push (notifications, live logs, dashboard updates) where the client never needs to send anything beyond the initial request, SSE is often the right answer and is easier to operate than either long polling or WebSockets. If your problem is one-directional and you reflexively reached for WebSockets, look at SSE first.
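Part of what makes SSE simple is the wire format: plain text lines in a response served as text/event-stream, with a blank line ending each event. A minimal formatter, with an illustrative function name, looks like this:

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event for writing into the open response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads become one "data:" line per line of text.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

The server writes each formatted event into the held response and flushes. In browsers, the built-in EventSource API handles parsing and reconnection, so the client side needs no framing code at all.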
The three techniques in the title are not a hierarchy with WebSockets at the top. They are points on a tradeoff surface, and the best choice depends on the latency budget, the directionality, and what your infrastructure will tolerate. Pick the simplest one that meets the budget. Upgrade only when the budget actually changes.