DNS Resolution, End to End
1. What DNS solves
Every network connection ends at an IP address, but nobody wants to type one. DNS, the Domain Name System, is the layer that bridges that gap: it turns a name a person can remember, like example.com, into an address a machine can route to, like 93.184.216.34. Your browser, your mail client and curl all make that translation before a single byte of real traffic moves.
The obvious fix for naming would be one big table. The early ARPANET did exactly that. A single file, HOSTS.TXT, was maintained by hand at the Stanford Research Institute and downloaded by every machine on the network. Fine when the network had a few hundred hosts. It stopped working for the reasons any single shared file eventually does: one team had to approve every change, the download grew without bound, and every copy was stale the moment it landed. A naming system for a network that doubles in size every year cannot be a file.
DNS replaced that file with a system built on three ideas, and the rest of this post is really just those three ideas in detail:
- It is distributed. No single server holds the whole namespace. The data is spread across millions of servers run by different organizations, and no one of them needs to know about the others’ contents.
- It is delegated. Whoever owns a name controls the names beneath it, and can hand any branch to someone else. The operator of com never has to know what example.com does with its subdomains.
- It is cached. Answers carry an expiry, so once a name is looked up, the result can be reused for a while without asking again. This is what keeps a system spread across millions of servers fast enough to sit in front of every connection you make.
2. The namespace is a tree
The domain namespace is a tree, and every domain name is a path through that tree read right to left. Take www.example.com. and notice the trailing dot, which is almost always invisible in practice: that dot is the root of the tree, the empty label at the very top. Everything else hangs underneath.
Directly under the root sit the top-level domains, or TLDs: com, org, net, country codes like uk and de, and the newer generic and brand TLDs like dev or google. Under each TLD sit the names people actually register, the second-level domains: example under com. Under a registered domain, the owner can create whatever they want: www, mail, api, and deeper names like eu.api.example.com, nesting as far down as they care to go. Each dot-separated piece is a label, and the full path from a label up to the root is a fully qualified domain name.
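Reading a name right to left as a path from the root is mechanical enough to sketch. The Python below is a minimal illustration. It assumes plain ASCII names and ignores case. The helpers labels and path_from_root are hypothetical names, not part of any library.

```python
def labels(fqdn: str) -> list[str]:
    """Split a domain name into its labels, dropping the empty root label."""
    # A trailing dot marks the root; stripping it makes "www.example.com."
    # and "www.example.com" produce the same labels.
    return [label for label in fqdn.rstrip(".").split(".") if label]


def path_from_root(fqdn: str) -> list[str]:
    """Return every node from the root down to fqdn, as fully qualified names."""
    parts = labels(fqdn)
    path = ["."]  # the root, written as a lone dot
    # Walk the labels right to left, descending one level per label.
    for i in range(len(parts) - 1, -1, -1):
        path.append(".".join(parts[i:]) + ".")
    return path


print(path_from_root("eu.api.example.com"))
# ['.', 'com.', 'example.com.', 'api.example.com.', 'eu.api.example.com.']
```

Each entry in the result is one node of the tree, so the list length is the depth of the name plus one for the root.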
2.1. Zones versus domains
A domain and a zone are not the same thing, and the difference is the whole point of DNS. The two line up until delegation splits them apart.
A domain is everything at and below a given name in the tree. The example.com domain includes example.com itself and every name under it, no matter who runs those names.
A zone is the slice of a domain that one party actually administers as a single unit, in one place. The two differ because of delegation. Suppose example.com runs its own records but hands the entire eng.example.com branch to a separate engineering team with its own servers. Now there are two zones: the example.com zone, which stops at the eng boundary, and the eng.example.com zone, which the other team controls. The example.com domain still contains all of it; the example.com zone does not.
That boundary between a parent zone and a child zone is a delegation, and it is recorded explicitly. The parent zone holds NS records that say, in effect, “I do not answer for this branch; these other servers do.” Delegation is how the tree gets split into independently operated pieces, and it is what makes the whole namespace a federation rather than one enormous database.
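The domain/zone split becomes concrete with a small lookup: given the apexes of the zones that exist, which zone actually holds a given name? This is a sketch under simplifying assumptions. The zone list is hypothetical and hard-coded, whereas a real resolver discovers zone cuts from the NS referrals it receives. The function names are illustrative only.

```python
from typing import Optional


def labels(name: str) -> list[str]:
    """Lower-cased labels of a name, without the empty root label."""
    return [label for label in name.lower().rstrip(".").split(".") if label]


def in_domain(name: str, domain: str) -> bool:
    """True if name sits at or below domain in the tree.

    The comparison is label by label, so notexample.com. is NOT inside
    example.com. even though the strings share a suffix.
    """
    n, d = labels(name), labels(domain)
    return len(d) <= len(n) and n[len(n) - len(d):] == d


def owning_zone(name: str, zone_apexes: set[str]) -> Optional[str]:
    """Return the apex of the deepest zone containing name, or None."""
    candidates = [z for z in zone_apexes if in_domain(name, z)]
    # The most specific apex wins: a delegated child zone removes names from
    # its parent *zone*, even though the parent *domain* still contains them.
    return max(candidates, key=lambda z: len(labels(z)), default=None)


zones = {"example.com.", "eng.example.com."}
print(owning_zone("www.example.com.", zones))               # example.com.
print(owning_zone("build.eng.example.com.", zones))         # eng.example.com.
print(in_domain("build.eng.example.com.", "example.com."))  # True
```

The last line is the distinction in miniature. build.eng.example.com is inside the example.com domain, but it is served by the eng.example.com zone.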
3. The servers in a lookup
Resolving a name involves three kinds of participant, and keeping their jobs separate is the key to following what happens next.
The stub resolver is the small piece of code inside your operating system that applications call, usually through getaddrinfo. It does almost nothing itself: it knows the address of one resolver to ask, sends a single question there, and trusts whatever comes back. Every program on your machine that opens a network connection leans on this stub.
The recursive resolver does the actual legwork. This is the server your stub talks to: your ISP runs one, or you point at a public one like Google’s 8.8.8.8 or Cloudflare’s 1.1.1.1. Its job is to take a name, walk the tree until it finds the answer, and hand the final result back. It also caches everything it learns along the way, which makes it the single most important caching layer in the system.
The authoritative servers hold the real records, the actual source of truth for some zone. They come in three tiers, each answering for one level of the tree:
- Root servers sit at the top. There are 13 root server identities, named a through m, each actually a large fleet of machines sharing one address via anycast routing, where many machines advertise the same IP and the network delivers each query to the closest one. A root server knows one thing: which servers are authoritative for each TLD.
- TLD servers answer for a single top-level domain. The com TLD servers do not know what example.com resolves to, but they do know which authoritative servers were delegated the example.com zone.
- Domain-authoritative servers hold the final records for a registered domain. These are run by a registrar, a managed DNS provider like Route 53 or Cloudflare, or by the domain owner directly.
“Recursive” and “authoritative” describe roles, not products. A recursive resolver answers questions about names it does not own by going and finding out. An authoritative server answers only for the zones it was given, and never goes looking elsewhere.
4. Walking one resolution end to end
Here is a single cold lookup of www.example.com, traced hop by hop. “Cold” means nothing is cached anywhere, which is the worst case; in practice almost every lookup is warm and skips most of these steps. Section 6 is about why.
- An application calls getaddrinfo("www.example.com"). The stub resolver builds one query and sends it to the configured recursive resolver, typically over UDP on port 53. The stub now waits for a single, final answer.
- The recursive resolver checks its cache. On a cold cache, nothing is there. It has to start from the only thing it always knows: the addresses of the root servers. It asks a root server for www.example.com.
- The root server does not know www.example.com, and does not pretend to. What it knows is com. It returns a referral: the NS records for com and their addresses. The message is “I cannot answer that, but ask one of these.”
- The resolver asks a com TLD server for www.example.com. The TLD server does not know www either, but it knows that the example.com zone was delegated. It returns another referral, this time pointing at example.com’s authoritative servers.
- The resolver asks one of example.com’s authoritative servers. That server is authoritative for the zone, so this time the response is not a referral but the answer: the A record for www.example.com.
- The resolver caches every record it picked up along the way, then returns the final answer to the stub resolver, which hands it to the application.
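The hop-by-hop walk above is a loop: ask a server, and if the reply is a referral, ask the server it names. The sketch below simulates that loop in Python against hypothetical, in-memory server tables instead of the network. The server keys ("root", "com-tld", "example-auth") and the reply tuples are invented for illustration. The sketch also skips caching and the choice among several servers per zone.

```python
# Hypothetical, hard-coded replies standing in for real servers. A real
# referral carries NS records plus glue addresses; here it is simply the
# key of the next server to ask.
SERVERS = {
    "root": {"com.": ("referral", "com-tld")},
    "com-tld": {"example.com.": ("referral", "example-auth")},
    "example-auth": {"www.example.com.": ("answer", "93.184.216.34")},
}


def ask(server: str, qname: str) -> tuple:
    """One iterative query: the server answers or refers, but never chases."""
    table = SERVERS[server]
    # Prefer the most specific name this server knows about.
    for known in sorted(table, key=len, reverse=True):
        if qname == known or qname.endswith("." + known):
            return table[known]
    return ("nxdomain", None)


def resolve(qname: str, max_hops: int = 10) -> tuple:
    """The recursive resolver's loop: follow referrals until a final reply."""
    server, trail = "root", []
    for _ in range(max_hops):
        trail.append(server)
        kind, value = ask(server, qname)
        if kind != "referral":
            return value, trail
        server = value
    raise RuntimeError("referral chain too long")


print(resolve("www.example.com."))
# ('93.184.216.34', ['root', 'com-tld', 'example-auth'])
```

The trail records the three servers a cold lookup touches. Only resolve ever moves between servers; each call to ask answers one question and stops.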
This is the distinction the word “recursive” actually names. The stub sent a recursive query: do all the work and give me one final answer. The resolver satisfied it with a chain of iterative queries. Each server it contacts replies with either the answer or a referral, and never does the chasing itself. The recursive resolver is the only participant that walks the tree. Root, TLD, and authoritative servers each answer exactly one question and point elsewhere for the rest, which is why they can stay fast under enormous load.
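On the wire, the recursive/iterative distinction comes down to one header bit, RD (recursion desired), defined in RFC 1035. The Python sketch below builds two queries: the one a stub might send, and the one a resolver typically sends while walking the tree. It is simplified. It omits EDNS, asks only for an A record, and uses a fixed, hypothetical query ID of 0x1234, where real clients randomize the ID.

```python
import struct

RD_FLAG = 0x0100   # "recursion desired" bit in the header's flags word
QTYPE_A = 1        # record type A
QCLASS_IN = 1      # class IN


def encode_name(name: str) -> bytes:
    """Encode a name as length-prefixed labels ending in the empty root label."""
    out = bytearray()
    for label in (part for part in name.rstrip(".").split(".") if part):
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # zero-length label: the root
    return bytes(out)


def build_query(name: str, query_id: int, recursion_desired: bool) -> bytes:
    """Build a one-question A query: 12-byte header, then the question."""
    flags = RD_FLAG if recursion_desired else 0
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0, all big-endian.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPE_A, QCLASS_IN)
    return header + question


# What a stub sends: "do all the work for me".
stub_query = build_query("www.example.com", 0x1234, recursion_desired=True)
# What a resolver typically sends while iterating: "just tell me what you know".
iterative_query = build_query("www.example.com", 0x1234, recursion_desired=False)
print(stub_query.hex())
```

The two messages differ in exactly one bit of the flags field. That bit is the whole protocol-level difference between asking for the legwork and asking for a referral.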
5. What lives in a zone: record types
A zone is a set of resource records, and every record has the same shape: a name, a TTL, a class, a type, and a value. The type is what makes a record mean something. A zone file makes the structure concrete:
```
example.com. 3600 IN SOA ns1.example.com. admin.example.com. (...)
example.com. 3600 IN NS ns1.example.com.
example.com. 3600 IN MX 10 mail.example.com.
example.com. 300 IN A 93.184.216.34
www.example.com. 300 IN A 93.184.216.34
blog.example.com. 300 IN CNAME www.example.com.
example.com. 3600 IN TXT "v=spf1 include:_spf.example.com -all"
```

The columns are name, TTL, class (IN for internet, effectively always), type, and value. The records you will meet most often:
| Type | Holds | Notes |
|---|---|---|
| A | An IPv4 address | The most common record. |
| AAAA | An IPv6 address | Same role as A, for IPv6. |
| CNAME | Another name | An alias. The resolver restarts the lookup at the target name. |
| NS | Authoritative server names | Marks delegation, both at a zone’s apex and at every child boundary. |
| MX | A mail server, with a priority number | Lower priority number is tried first. |
| TXT | Arbitrary text | Email authentication (SPF, DKIM) and domain-ownership verification. |
| SOA | Zone administrative parameters | One per zone, at the apex; carries serial number and timing values. |
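Because every line has the same five fields, a fully spelled-out zone line splits mechanically. The Python sketch below handles only the simple layout shown above, with every field present and one record per line. Real zone files also allow $ORIGIN and $TTL directives, relative names, comments, and multi-line parenthesized records, and the sketch handles none of these. Record and parse_record are hypothetical names.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    value: str


def parse_record(line: str) -> Record:
    """Parse one fully spelled-out zone line: name, TTL, class, type, value."""
    # Split off the first four fields only; the value keeps its inner spaces
    # (the MX priority, a quoted TXT string, and so on).
    name, ttl, rclass, rtype, value = line.split(None, 4)
    return Record(name, int(ttl), rclass, rtype, value.strip())


zone = """\
example.com. 3600 IN MX 10 mail.example.com.
www.example.com. 300 IN A 93.184.216.34
example.com. 3600 IN TXT "v=spf1 include:_spf.example.com -all"
"""
records = [parse_record(line) for line in zone.splitlines() if line.strip()]
for r in records:
    print(r.rtype, r.name, r.ttl, r.value)
```

Keeping the value as one opaque string mirrors how the table treats it: the type decides how the value should be read.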
CNAME has two rules that trip people up. A name with a CNAME cannot also carry other record types, and a CNAME cannot sit at a zone apex (the bare example.com with no subdomain). The second rule follows from the first: every apex must carry SOA and NS records, so there is no room for a CNAME beside them. That is a real operational pain, because plenty of people want example.com itself to point at a load balancer’s hostname rather than a fixed IP. The newer records below were designed, in part, to solve exactly that.
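Both CNAME rules are easy to check mechanically. This sketch takes (name, type) pairs for one zone plus the zone's apex. The function name is hypothetical. The check also ignores DNSSEC, which allows RRSIG and NSEC records to sit beside a CNAME.

```python
from collections import defaultdict


def cname_violations(records, apex: str) -> list[str]:
    """Report breaches of the two CNAME rules.

    records: iterable of (name, rtype) pairs from one zone.
    apex: the zone's apex name, e.g. "example.com.".
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower()].add(rtype.upper())

    problems = []
    for name, types in sorted(types_by_name.items()):
        if "CNAME" not in types:
            continue
        if name == apex.lower():
            problems.append(f"{name}: CNAME at the zone apex")
        others = types - {"CNAME"}
        if others:
            problems.append(f"{name}: CNAME alongside {', '.join(sorted(others))}")
    return problems


records = [
    ("example.com.", "SOA"), ("example.com.", "NS"),
    ("example.com.", "CNAME"),          # the tempting but illegal apex alias
    ("blog.example.com.", "CNAME"),     # fine: an alias and nothing else
    ("www.example.com.", "A"),
]
for problem in cname_violations(records, "example.com."):
    print(problem)
```

The apex alias trips both rules at once, which is why it can never be made legal by tidying up the zone.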
5.1. Service and policy records: SRV and CAA
Two more record types show up once you move past web and mail.
SRV records locate a service, not just a host. The name follows a _service._protocol.domain convention, and the value carries a hostname plus a port number, along with priority and weight for load distribution. Protocols like SIP, XMPP, and Microsoft Active Directory use SRV so a client can discover both where a service lives and which port it listens on, in one lookup.
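The priority and weight fields define a selection order, spelled out in RFC 2782. A client tries the lowest priority first. Within one priority, it picks targets at random in proportion to their weight. The sketch below follows that procedure. The SIP hostnames, ports, and numbers are hypothetical, standing in for what a lookup of _sip._udp.example.com might return.

```python
import random
from typing import NamedTuple, Optional


class SrvTarget(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_order(records, rng: Optional[random.Random] = None) -> list:
    """Order SRV targets for connection attempts (RFC 2782 selection)."""
    rng = rng or random.Random()
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight targets go first so they keep a small chance of being picked.
        remaining = sorted((r for r in records if r.priority == prio),
                           key=lambda r: r.weight != 0)
        while remaining:
            # Pick a point in [0, total weight] and take the first target
            # whose running weight sum reaches it.
            pick = rng.randint(0, sum(r.weight for r in remaining))
            running = 0
            for i, r in enumerate(remaining):
                running += r.weight
                if running >= pick:
                    ordered.append(remaining.pop(i))
                    break
    return ordered


records = [
    SrvTarget(10, 60, 5060, "sip1.example.com."),
    SrvTarget(10, 40, 5060, "sip2.example.com."),
    SrvTarget(20, 0, 5060, "backup.example.com."),
]
for t in srv_order(records, random.Random(7)):
    print(t.priority, t.target, t.port)
```

Across many runs, sip1 comes first roughly 60% of the time. The priority-20 backup is always tried last, because no weight can move a target across priority levels.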
CAA records say which certificate authorities are allowed to issue TLS certificates for a domain. A CAA record on example.com naming only one CA means every other compliant CA will refuse to issue a certificate for that domain. It does not change how the name resolves. It is a policy record that CAs are required to check before issuing, including during the automated issuance covered in this post.
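The CA side of that check can be sketched too. This is a deliberately simplified reading of RFC 8659. It climbs from the requested name toward the root to find the closest CAA set, then looks only at issue properties. It ignores issuewild, the critical flag, parameters, and CNAME following. The CA identifiers and the function names are hypothetical.

```python
def relevant_caa(name: str, caa_by_name: dict) -> list:
    """Return the CAA set closest to name, climbing toward the root."""
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:]) + "."
        if caa_by_name.get(candidate):
            return caa_by_name[candidate]
    return []


def may_issue(name: str, ca_domain: str, caa_by_name: dict) -> bool:
    """Simplified 'issue' check: may the CA identified by ca_domain issue for name?"""
    issue_values = [value for tag, value in relevant_caa(name, caa_by_name)
                    if tag == "issue"]
    if not issue_values:
        return True  # no issue restriction published: any CA may issue
    # Each value is "<ca-domain>[; parameters]"; a bare ";" authorizes no CA.
    allowed = {v.split(";", 1)[0].strip().lower() for v in issue_values}
    return ca_domain.lower() in allowed


# Hypothetical CAA data: example.com allows exactly one CA.
caa = {"example.com.": [("issue", "ca-one.example.net")]}
print(may_issue("www.example.com.", "ca-one.example.net", caa))  # True
print(may_issue("www.example.com.", "ca-two.example.org", caa))  # False
```

Because the climb stops at the first name that has CAA records, a record on example.com also governs www.example.com unless www publishes its own.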
5.2. The HTTPS and SVCB records
The newest records, SVCB and HTTPS, are service binding records, and they exist to fix two long-standing gaps at once.
SVCB (Service Binding) is the general form; HTTPS is the same record specialized for web traffic. The problem they solve: before these records existed, a browser connecting to a site resolved A/AAAA records to get an address, then opened a connection and only then discovered, through protocol negotiation, whether the server spoke HTTP/2 or HTTP/3. An HTTPS record carries that connection metadata in the DNS answer itself: which protocol versions the server supports, ALPN identifiers (the Application-Layer Protocol Negotiation tokens that name those protocols), IP address hints, and keys for Encrypted Client Hello. The client learns how to connect in the same lookup that tells it where to connect, which removes a round trip and makes privacy features like ECH possible.
These records also quietly solve the CNAME-at-apex problem from earlier. An HTTPS record in AliasMode (priority 0) at a zone apex can point example.com at a provider’s hostname for any client that looks up HTTPS records. That is the thing a bare CNAME was never allowed to do. It is the same aliasing behavior people wanted, finally legal at the apex, and standardized rather than left to each DNS provider’s proprietary ALIAS or ANAME extension.
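The sketch below parses the presentation form of HTTPS RDATA to show what a client actually receives. Per RFC 9460, that form is SvcPriority, then TargetName, then key=value SvcParams. The parser is simplified. It assumes no quoted or escaped values and splits every parameter value on commas. The provider hostname in the AliasMode example is hypothetical.

```python
def parse_https_rdata(rdata: str) -> dict:
    """Parse simplified HTTPS/SVCB RDATA: priority, target, then key=value params."""
    fields = rdata.split()
    priority, target = int(fields[0]), fields[1]
    params = {}
    for item in fields[2:]:
        key, _, value = item.partition("=")
        # Keys such as no-default-alpn carry no value at all.
        params[key] = value.split(",") if value else []
    return {
        # Priority 0 is AliasMode (point elsewhere); anything else is ServiceMode.
        "mode": "alias" if priority == 0 else "service",
        "priority": priority,
        # In ServiceMode, a target of "." means the record's own owner name.
        "target": target,
        "params": params,
    }


service = parse_https_rdata("1 . alpn=h3,h2 ipv4hint=93.184.216.34")
alias = parse_https_rdata("0 cdn.provider.example.")
print(service["mode"], service["params"]["alpn"])  # service ['h3', 'h2']
print(alias["mode"], alias["target"])              # alias cdn.provider.example.
```

The ServiceMode record carries the connection metadata: the client learns it can try HTTP/3 before opening any connection. The AliasMode record carries only a target name, which is exactly the apex alias a CNAME could not provide.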
6. Caching and TTL: why the second lookup is free
The cold walk in section 4, root to TLD to authoritative, is the worst case, and almost no real lookup pays it. Caching is the reason, and the dial that controls caching is the TTL.
Every record carries a TTL, a time-to-live measured in seconds. It is a cache lifetime: when a recursive resolver learns that www.example.com has an A record with a TTL of 300, it may answer that question from memory for the next five minutes without contacting any authoritative server. The first lookup of a name pays the full walk; every lookup within the TTL window is a memory read.
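At its core, a resolver's cache is a map from question to answer plus an expiry time. The sketch below assumes a single process and no size limit. It also uses an injectable clock, a hypothetical design choice so the example can fake the passage of time. Real resolvers also hand out the decremented remaining TTL, which this sketch omits.

```python
import time


class TtlCache:
    """A minimal resolver-style cache: each entry expires after its TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable, so examples can fake elapsed time
        self._entries = {}    # (name, rtype) -> (value, expires_at)

    def put(self, name: str, rtype: str, value: str, ttl: int) -> None:
        self._entries[(name, rtype)] = (value, self._clock() + ttl)

    def get(self, name: str, rtype: str):
        """Return the cached value, or None if absent or expired."""
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop it, so the next lookup goes back to the tree.
            del self._entries[(name, rtype)]
            return None
        return value


now = [0.0]  # fake clock, in seconds
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com.", "A", "93.184.216.34", ttl=300)
now[0] = 299
print(cache.get("www.example.com.", "A"))  # 93.184.216.34 (still cached)
now[0] = 300
print(cache.get("www.example.com.", "A"))  # None (must look it up again)
```

Every hit inside the 300-second window is a dictionary read. Only the miss at second 300 would cost a trip back to the authoritative servers.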
Caching happens at every layer of the chain. Your browser keeps a small in-process cache. The OS stub resolver often keeps one too. The recursive resolver keeps the largest and most valuable cache, because it is shared across every user pointed at it. A popular domain’s records sit warm in a resolver like 8.8.8.8 essentially all the time, refetched by the next lookup almost as soon as the TTL runs out. Resolvers also do negative caching: an NXDOMAIN response, the answer for a name that does not exist, is cached too, with its lifetime governed by the zone’s SOA record. That way a mistyped hostname does not trigger a fresh tree walk on every retry.
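The negative-caching lifetime comes from a specific rule in RFC 2308: it is the lesser of the SOA record's own TTL and the SOA's MINIMUM field. The sketch below computes that value from one-line SOA RDATA. The SOA values are hypothetical, since the zone file above elides them. Real resolvers may also clamp the result to their own limits.

```python
def soa_minimum(soa_rdata: str) -> int:
    """Pull the MINIMUM field out of single-line SOA RDATA.

    Field order: MNAME RNAME SERIAL REFRESH RETRY EXPIRE MINIMUM.
    """
    fields = soa_rdata.split()
    if len(fields) != 7:
        raise ValueError("expected 7 SOA fields on one line")
    return int(fields[6])


def negative_cache_ttl(soa_record_ttl: int, soa_rdata: str) -> int:
    """RFC 2308: cache NXDOMAIN for the lesser of the SOA's own TTL and MINIMUM."""
    return min(soa_record_ttl, soa_minimum(soa_rdata))


# Hypothetical SOA values; the timing fields here are illustrative only.
soa = "ns1.example.com. admin.example.com. 2024010101 7200 3600 1209600 300"
print(negative_cache_ttl(3600, soa))  # 300
```

With these numbers, a typo like wwww.example.com is answered from cache for five minutes before the resolver asks again.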
The cost of all this caching is propagation delay. Change an A record while its TTL is 3600, and resolvers around the world may keep handing out the old address for up to an hour after the change. The familiar phrase “waiting for DNS to propagate” is a misnomer: nothing propagates anywhere. The new value is live on your authoritative servers as soon as they load it. What you are actually waiting for is the old value to expire out of caches that already grabbed it. The practical move when planning a change is to lower the record’s TTL well in advance, to something like 60 seconds, and wait for the old TTL to drain. Then make the change while caches are turning over quickly, confirm it, and raise the TTL back up. The same caching that makes DNS fast is what you have to plan around when DNS data changes.
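The arithmetic behind the lower-the-TTL-first routine is simple enough to write down. This sketch assumes that caches honour TTLs exactly and that the authoritative servers pick up each edit immediately. The function name and its dictionary keys are hypothetical.

```python
def ttl_change_plan(old_ttl: int, low_ttl: int) -> dict:
    """Worst-case timeline, in seconds after the TTL is lowered.

    Assumes every cache honours TTLs exactly (real resolvers may clamp them).
    """
    # A cache that fetched the record just before the TTL was lowered can keep
    # it, with the old TTL, for up to old_ttl seconds; after that, every cached
    # copy carries low_ttl.
    make_change_at = old_ttl
    # Once the new value is published, stale copies survive at most low_ttl.
    return {
        "make_change_at": make_change_at,
        "stale_until": make_change_at + low_ttl,
        "stale_window_with_plan": low_ttl,
        "stale_window_without_plan": old_ttl,
    }


print(ttl_change_plan(old_ttl=3600, low_ttl=60))
# {'make_change_at': 3600, 'stale_until': 3660, 'stale_window_with_plan': 60, 'stale_window_without_plan': 3600}
```

With the 3600-second TTL from the example and a 60-second low TTL, the worst-case stale window after the real change shrinks from an hour to a minute. The cost is waiting an hour before making the change.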