
Building a Key-Value Store From Scratch, Vol. 3: Extending Features

author: emre bener
read time: 23 min
about: key-value store, cache
repository: https://github.com/Emrebener/Mini-Key-Value-Store
mentions: memcached, redis, go, transport layer security, compare-and-swap, authentication

MiniKV is a small Go cache server I’ve been building (see also vol 1 and vol 2). In this volume, I’m filling some gaps and extending its general functionality. By the end, MiniKV is configured by file rather than by command-line flags, can terminate TLS at the listener and gate connections behind a shared-secret token when the operator configures either, answers mget and cas, has a stats command and two HTTP probes, and shuts down cleanly under SIGTERM with bounded in-flight work. The seventh gap, persistence, is deferred and remains the obvious next volume.

Note that MiniKV doesn’t ride on HTTP at all. The “TLS” I’m referring to is TLS over TCP for the wire protocol. MiniKV speaks the Memcached text protocol over TCP. Apps interact with MiniKV by importing a Memcached client (libmemcached, gomemcache, pymemcache, etc.) and pointing it at the address.

The post also adds some operational hygiene: per-connection idle timeouts and a hard cap on concurrent connections. Both close real issues that a process running long enough would eventually hit.

The sections are ordered by how the constraints chained. Configuration goes first because everything afterwards adds keys to it. TLS and auth go second because the connection edge should be settled before the protocol layer is extended. Multi-key reads and CAS come third because they’re pure protocol work once the edge is settled. Visibility and lifecycle come last because they need everything else in place to demonstrate against.

1. Configuring without a framework

The first move of vol 3 is to read configuration from a file instead of command-line flags, with a parser small enough to keep in the project alongside the wire-protocol parser the series already has. The motivation is the awkwardness and inconvenience of the seven-flag command line:

go run ./cmd/minikv \
  -addr 127.0.0.1:11211 \
  -shards 16 \
  -max-value-bytes 1048576 \
  -max-memory-bytes 67108864 \
  -item-overhead-bytes 64 \
  -cleanup-interval 1m \
  -pprof-addr ""

Forgettable, awkward to copy across hosts, and only going to grow. I had to move to a file-driven config.

1.1. The dependency question

The natural reflex for “Go program needs config” is viper, koanf, or a dedicated TOML / YAML / JSON library. All three would have worked, but they didn’t feel right. So far, MiniKV deliberately added zero non-stdlib dependencies, on the grounds that this is a learning project and every dependency makes some part of it less legible. Pulling in ~200kB of viper and its transitive imports to read seven primitive keys is not what I wanted.

The answer was a custom key=value parser, scoped to the seven primitives MiniKV currently has plus the few it’s about to add. With bufio.Scanner, strings.Cut, and a switch on key name, the whole parser is short enough to read in one sitting.

1.2. The parser

About 30 lines, standard library only, in internal/config/config.go:

func Load(path string) (Config, error) {
	f, err := os.Open(path)
	if err != nil {
		return Config{}, err
	}
	defer f.Close()

	cfg := Default()
	scanner := bufio.NewScanner(f)
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			return Config{}, fmt.Errorf("%s:%d: expected key = value", path, lineNo)
		}
		key = strings.TrimSpace(key)
		value = strings.TrimSpace(value)
		if err := setField(&cfg, key, value); err != nil {
			return Config{}, fmt.Errorf("%s:%d: %w", path, lineNo, err)
		}
	}
	if err := scanner.Err(); err != nil {
		return Config{}, err
	}
	if err := cfg.validate(); err != nil {
		return Config{}, fmt.Errorf("%s: %w", path, err)
	}
	return cfg, nil
}

Really primitive work, but there’s a stack of sayings to make me feel good about it.

setField is a switch on key name that writes into the Config struct, parses numerics with strconv.Atoi, and parses durations with time.ParseDuration. It returns an error naming the offending key when a value doesn’t parse, and an unknown key %q error for any key that isn’t in the recognized set.

The config file looks something like:

# minikv config
addr                = 0.0.0.0:11211
pprof-addr          =
shards              = 16
max-value-bytes     = 1048576
max-memory-bytes    = 67108864
item-overhead-bytes = 64
cleanup-interval    = 1m

Whitespace around the key, the =, and the value is trimmed. Lines starting with # are comments. A line with no = is a parse error with the line number. There’s no escaping, no nesting, and no quoting because every value MiniKV currently needs is a primitive: a TCP address, an integer, or a Go duration.

1.3. Defaults and validation

Load starts from Default() and only overwrites the keys the file actually sets. Anything missing keeps its default. The shipped minikv.conf spells out every key so an operator skimming it sees the whole surface at once, but a one-liner that sets only pprof-addr works fine too — the rest falls through.

Validation runs at the end of Load, not in main. Rules like “shards must be positive” or “max-value-bytes can’t exceed max-memory-bytes” are config-package business; main shouldn’t have to know about them. I could have split this into a Parse and a Validate, but that’s two error paths nobody would ever take separately. One function, one error.
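As a rough illustration, the two rules quoted above could be expressed like this. The struct is trimmed to the fields the rules touch, and the field names and error strings are assumptions rather than MiniKV's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is cut down to the fields the two example rules need.
// Field names are hypothetical.
type Config struct {
	Shards         int
	MaxValueBytes  int
	MaxMemoryBytes int
}

// validate checks cross-field rules once the whole file has been read,
// so the caller sees a single error path.
func (c Config) validate() error {
	if c.Shards <= 0 {
		return errors.New("shards must be positive")
	}
	if c.MaxValueBytes > c.MaxMemoryBytes {
		return errors.New("max-value-bytes can't exceed max-memory-bytes")
	}
	return nil
}

func main() {
	fmt.Println(Config{Shards: 0, MaxValueBytes: 1, MaxMemoryBytes: 2}.validate())
	fmt.Println(Config{Shards: 16, MaxValueBytes: 1 << 20, MaxMemoryBytes: 64 << 20}.validate())
}
```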

The same instinct shows up in unknown keys. The parser rejects them rather than silently ignoring:

$ go run ./cmd/minikv -config typo.conf
ERROR failed to load config error="typo.conf:5: unknown key \"max-memmory-bytes\""

A typo in max-memmory-bytes silently degrading to the default would be the worst class of config bug, the kind that ships to production and surfaces months later as an OOM. The line number lets an operator find it in a one-character edit.

1.4. The Docker integration

The vol 2 compose stack passed flags inline:

minikv:
  command:
    - -addr
    - 0.0.0.0:11211
    - -pprof-addr
    - 0.0.0.0:6060
    - -max-memory-bytes
    - "67108864"

That had to adapt to the new shape. The Dockerfile copies the repo’s minikv.conf into the image at /minikv.conf, so docker run mini-kv-store works out of the box with the same defaults the binary uses on the host. It just assumes a default location. The compose stack mounts a benchmark-flavoured config in to override it:

minikv:
  build: .
  image: mini-kv-store:latest
  volumes:
    - ./docker/minikv.bench.conf:/minikv.conf:ro
  ports:
    - "11211:11211"
    - "16060:6060"

The benchmark config only lists keys that differ from the in-image default (pprof on, larger memory budget). Everything else falls through. The brief alternative was to bake pprof-addr = 0.0.0.0:6060 into the published image’s default config so the compose stack didn’t need a mount, but pprof-on by default is information disclosure — heap profiles, goroutine stacks, the whole /debug/pprof/ tree on a port. Off in the image, on in the bench config, is the right side of that tradeoff.

With the configuration surface settled, the next section can extend it. TLS adds two file paths, auth adds a token, and the file format already accepts them.

2. TLS and auth

Currently, anyone reaching the listening port can read or write any key. Two missing pieces: an optional TLS at the listener, and an optional shared-secret token gating each new connection. I paired them because they both live at the connection edge.

2.1. The TLS gate

We need two new config keys:

tls-cert = /etc/minikv/server.crt
tls-key  = /etc/minikv/server.key

If both are set, MiniKV wraps the listener in TLS. If both are empty, the listener stays plain TCP. Setting only one is rejected at startup:

if (c.TLSCert == "") != (c.TLSKey == "") {
    return errors.New("tls-cert and tls-key must be set together")
}

I considered adding a separate tls = on boolean but decided it was unnecessary. The presence of two file paths is itself the switch; a third state for “TLS configured but switched off” is a knob nobody reaches for and a fourth combination to think about.

The listener itself is a short branch in cmd/minikv/main.go:

func listen(cfg config.Config) (net.Listener, string, error) {
    if !cfg.TLSEnabled() {
        l, err := net.Listen("tcp", cfg.Addr)
        return l, "tcp", err
    }
    cert, err := tls.LoadX509KeyPair(cfg.TLSCert, cfg.TLSKey)
    if err != nil {
        return nil, "", fmt.Errorf("load tls keypair: %w", err)
    }
    tlsCfg := &tls.Config{
        Certificates: []tls.Certificate{cert},
        MinVersion:   tls.VersionTLS12,
    }
    l, err := tls.Listen("tcp", cfg.Addr, tlsCfg)
    return l, "tls", err
}

The accept loop on the other side of listen doesn’t change. net.Listener is the interface the rest of the server already wanted, and tls.Listen returns one. Setting MinVersion: tls.VersionTLS12 is the one piece of “TLS hygiene” worth the line. Go 1.18 made TLS 1.2 the default minimum for clients and Go 1.22 extended that to servers, but pinning the floor explicitly keeps the behavior identical on older toolchains, guards against a stdlib regression, and documents the requirement at the call site.

2.2. AUTH on the wire

The auth gate is a new first-class command in the protocol parser:

AUTH <token>\r\n

The parser produces a Command{Op: OpAuth, Token: ...}; the server side decides what to do with it. The per-connection state is one bool, initialized from the configured token:

authenticated := s.authToken == ""

The five cases:

Server config    First command    Response                           Connection
no auth-token    any              normal response                    stays open
no auth-token    auth ...         CLIENT_ERROR auth not configured   closed
auth-token set   auth <correct>   AUTHENTICATED                      stays open
auth-token set   auth <wrong>     CLIENT_ERROR auth failed           closed
auth-token set   anything else    CLIENT_ERROR auth required         closed

Token comparison goes through subtle.ConstantTimeCompare:

func constantTimeEqual(got, want string) bool {
    return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

The argument for skipping constant-time compare on a short token is that timing leaks won’t survive network jitter, and on most deployments that’s probably true. The argument for using it anyway is that subtle.ConstantTimeCompare costs a handful of nanoseconds per failed auth, removes a class of footgun that nobody should have to re-litigate per-deployment, and means the auth path has one less thing to be careful about when somebody adds the next feature.

Returning the verb AUTHENTICATED instead of the generic OK matches MiniKV’s existing response shape: STORED, DELETED, PONG, END are all specific to their command. A generic OK would be the only response in the protocol that doesn’t say what just happened.
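The five cases from the table can be sketched as one pure function over the first command line. This is an illustration only: authStep is a hypothetical name, it works on a raw line rather than the parsed Command the real server uses, and matching the verb case-insensitively is an assumption.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"strings"
)

// authStep decides the reply to the first command on a connection.
// configured is the server's auth-token ("" means auth is off) and line is
// the command line without its CRLF. An empty reply means "dispatch the
// command normally"; closeConn reports whether the connection is closed.
func authStep(configured, line string) (reply string, closeConn bool) {
	verb, token, _ := strings.Cut(line, " ")
	isAuth := strings.EqualFold(verb, "auth") // case-insensitivity is an assumption
	switch {
	case configured == "" && !isAuth:
		return "", false
	case configured == "" && isAuth:
		return "CLIENT_ERROR auth not configured", true
	case !isAuth:
		return "CLIENT_ERROR auth required", true
	case subtle.ConstantTimeCompare([]byte(token), []byte(configured)) == 1:
		return "AUTHENTICATED", false
	default:
		return "CLIENT_ERROR auth failed", true
	}
}

func main() {
	for _, c := range [][2]string{
		{"", "get k"},
		{"", "auth s3cret"},
		{"s3cret", "auth s3cret"},
		{"s3cret", "auth nope"},
		{"s3cret", "get k"},
	} {
		reply, closed := authStep(c[0], c[1])
		fmt.Printf("token=%q first=%q -> %q closed=%v\n", c[0], c[1], reply, closed)
	}
}
```

subtle.ConstantTimeCompare returns 0 immediately when the two lengths differ, so the length of the token is not protected, only its content.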

2.3. The Server struct

Vols 1 and 2 had a free function server.ServeConn(conn, kv) and a free function server.Serve(input, output, kv). Both were enough when “the server” was just “the kv plus a parser.” Adding auth needs per-process state (the configured token) and per-connection state (authenticated or not), and the next two sections each want to add more: counters for stats, a semaphore for max connections. A struct is the obvious shape, and refactoring to it now is cheaper than threading config through more arguments later.

type Server struct {
    store     *store.Store
    authToken string
}

func New(kv *store.Store) *Server { return &Server{store: kv} }

func (s *Server) WithAuthToken(token string) *Server {
    s.authToken = token
    return s
}

func (s *Server) ServeConn(conn net.Conn) error { ... }

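WithAuthToken borrows its shape from standard-library methods such as http.Request.WithContext and *tls.Config.Clone: a method that returns the receiver’s type so calls chain naturally. The difference is that those two return a copy, while WithAuthToken sets the field on the receiver and returns the same pointer. In main: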

srv := server.New(kv).WithAuthToken(cfg.AuthToken)

When cfg.AuthToken is empty, the chain is a no-op and the server behaves identically to the vol 2 free function. Existing callers don’t change behavior; new behavior is opt-in.

The migration in cmd/minikv/main.go was a single line. The migration in the unit tests was the same shape: Serve(input, output, kv) becomes New(kv).Serve(input, output). The struct doesn’t pay for itself yet, but the next two sections do most of their work on it without further restructuring.

3. Two protocol extensions: mget and cas

With the connection edge closed, the protocol layer is next. mget is small. cas looks small at the wire, then turns out to need real bookkeeping at the store.

3.1. mget: a server-side loop

Multi-key GET is a single new command that wraps the existing single-key Get:

case protocol.OpMget:
    for _, key := range command.Keys {
        item, ok := kv.Get(key)
        if !ok {
            continue
        }
        if err := writeLine(writer, fmt.Sprintf("VALUE %s %d", key, len(item.Value))); err != nil {
            return err
        }
        if _, err := writer.Write(item.Value); err != nil {
            return err
        }
        if _, err := writer.WriteString("\r\n"); err != nil {
            return err
        }
    }
    return writeLine(writer, "END")

No store-level changes. The keys can be on any subset of shards; each Get call grabs that shard’s lock for the duration of one lookup, releases it, and the loop moves on. Missing or expired keys are silently skipped; the response is one terminator at the end:

mget a b c\r\n
VALUE a 1\r\n
A\r\n
VALUE c 1\r\n
C\r\n
END\r\n

b is absent from the response. The server skips it and the response goes from a straight to c, with one END closing the whole batch.

Two design choices worth naming.

The first is the single-END terminator, no per-miss markers. Memcached treats absence as silence (Redis’s MGET, by contrast, returns a nil in the missing key’s position), and MiniKV follows Memcached. Every client knows which keys it asked for, and the response carries the present ones in request order, so the misses can be inferred. Adding a MISS <key> line per missing key would inflate the response on every miss without the client doing anything different with it.

The second is the absence of a store-level batch API. The natural alternative would have been a Store.GetMulti(keys []string) method that locks all relevant shards once and returns everything atomically. I didn’t go that route, for two reasons. First, atomic-across-keys isn’t a guarantee Memcached’s get makes either, so MiniKV has no obligation to invent it. Second, acquiring N locks in a defined order to avoid deadlock is the kind of feature that pays back only in workloads MiniKV doesn’t target. The loop is the right call.

3.2. CAS at the wire: gets and cas

CAS (compare-and-swap, or in Memcached’s exact phrasing, check-and-set) gives clients optimistic concurrency: read a value with its version token, do a computation, write back only if the value hasn’t changed in the meantime.

Two new commands. gets is get plus a CAS token in the response; it accepts the same multi-key shape as mget:

gets k\r\n
VALUE k 1 7\r\n          ← bytes is 1, CAS version is 7
v\r\n
END\r\n

cas is set plus a CAS version in the request; it succeeds only when the supplied version equals the value’s current version:

cas k 7 5\r\n            ← write to k expecting version 7, 5 bytes follow
hello\r\n
STORED\r\n               ← match → write succeeded, CAS bumps to 8

Three responses. STORED (matched, written), EXISTS (key present, version differs), NOT_FOUND (key absent or expired). The TTL token is optional in the same place set’s TTL is optional; with it, the parser sees five fields, without it, four.

The wire shape of cas deliberately puts the CAS version where it would be most readable for a human watching the protocol over nc:

cas <key> [<ttl-seconds>] <cas-version> <bytes>\r\n

The TTL is in the same slot it occupies in set, the version is right before the byte count, and the byte count is in the position the parser is already reading.

Figure: the CAS round trip between client and server.

  1. Client → Server: gets k
  2. Server → Client: VALUE k 1 7 (version = 7)
  3. Client computes a new value using observed version 7
  4. Client → Server: cas k 7 5 + payload
  5. Server checks the version against the stored CAS
  6. Server → Client: STORED (CAS bumps to 8)

Other outcomes: EXISTS (version differs), NOT_FOUND (key gone).

3.3. The CAS counter at the store

CAS at the store level is one atomic counter and a stamp on every successful mutation. The interesting question is what kind of counter, and the obvious-looking choice has a sharp edge.

The token is a uint64 stamped on every successful mutation, monotonically increasing for the lifetime of the process. Set, Cas, and Incr all bump it; Delete doesn’t (the item is gone, no token to keep). The counter lives on the Store as an atomic.Uint64:

type Store struct {
    shards []*shard
    cas    atomic.Uint64
}

Each shard carries a *atomic.Uint64 pointing at the same counter:

expiry := expiresAt(now, ttl)
newCAS := sh.casCounter.Add(1)
if existing, ok := sh.items[key]; ok {
    existing.value = value
    existing.expiresAt = expiry
    existing.size = size
    existing.cas = newCAS
    ...
}

The counter is process-global, not per-key, and that choice is the load-bearing one. The natural-looking alternative is a per-key version: each item carries its own version number that starts at 1 on creation and bumps on each write. No global atomic, no cross-goroutine contention, looks elegant on paper.

It has a nasty failure mode. A client reads a key and observes version 1. The key gets deleted. A different client writes the same key fresh; the new item is also at version 1. The first client’s CAS with the old version 1 token now matches a value it has never seen. CAS no longer means what its name says.

A process-global counter sidesteps this entirely. The token uniquely identifies one specific value at one specific moment; once a token is minted, it’s never reused, even if the key is deleted and recreated. Memcached’s CAS-token uniqueness story is built around exactly this property. A per-shard counter packed into the token (one atomic per shard, shard ID in the high bits) is a viable middle path that preserves the never-reused property while spreading contention across N atomics, but the contention savings over a single atomic.Uint64.Add are too small at MiniKV’s scale to justify the extra packing logic.
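The read, delete, recreate sequence can be replayed against both designs in a few lines. This is a single-goroutine sketch with no locking, and the two store types are hypothetical stand-ins that exist only to contrast the token schemes.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type item struct {
	value string
	cas   uint64
}

// perKeyStore restarts every key's version at 1 (the tempting design).
type perKeyStore struct{ items map[string]*item }

func (s *perKeyStore) set(key, value string) {
	if it, ok := s.items[key]; ok {
		it.value, it.cas = value, it.cas+1
		return
	}
	s.items[key] = &item{value: value, cas: 1}
}

// globalStore stamps every write from one process-wide counter.
type globalStore struct {
	items   map[string]*item
	counter atomic.Uint64
}

func (s *globalStore) set(key, value string) {
	s.items[key] = &item{value: value, cas: s.counter.Add(1)}
}

// staleTokenMatches replays the failure sequence and reports whether the
// token observed before the delete still matches the recreated item.
func staleTokenMatches(set func(k, v string), items map[string]*item) bool {
	set("k", "old")
	observed := items["k"].cas // client A reads and remembers the token
	delete(items, "k")         // the key is deleted
	set("k", "new")            // client B recreates it
	return items["k"].cas == observed
}

func main() {
	pk := &perKeyStore{items: map[string]*item{}}
	gl := &globalStore{items: map[string]*item{}}
	fmt.Println("per-key token reused:", staleTokenMatches(pk.set, pk.items))
	fmt.Println("global token reused: ", staleTokenMatches(gl.set, gl.items))
}
```

With per-key versions the stale token matches a value client A never saw; with the global counter it cannot, because the recreated item was stamped with a number minted after the read.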

The performance cost of the process-global counter is one atomic increment per successful mutation. Vol 2’s benchmarks topped out around 215k writes/s at concurrency 8. An uncontended LOCK XADD on x86 takes roughly 10ns; under contention from eight goroutines, the rough estimate climbs to 30-50ns. That’s an order-of-magnitude figure, not a measurement. Against the few hundred nanoseconds a write through a sharded mutex actually takes, the atomic is a single-digit-percent overhead. If a future bench shows it dominating, the per-shard counter is the next move, not the per-key one. The per-key shape stays wrong regardless.

3.4. When CAS hits eviction (memory pressure)

One subtle case: what does cas do when the version matches but the rewrite would push the shard over its memory budget? The shard’s existing eviction logic kicks in. The supplied version matched, so the value at this version is what’s about to be replaced; the eviction routine first removes expired items, then evicts least-recently-used items other than the key being written. If even after evicting everything else the new value won’t fit, cas returns SERVER_ERROR memory limit exceeded and the original value stays intact at its original version.

projected := sh.memoryBytes - existing.size + size
if projected > sh.maxMemoryBytes {
    if sh.ttlCount > 0 {
        sh.removeExpiredLocked(now)
        existing, ok = sh.items[key]
        if !ok {
            return ErrNotFound
        }
        projected = sh.memoryBytes - existing.size + size
    }
    if projected > sh.maxMemoryBytes {
        if !sh.evictUntilFitsLocked(key, projected) {
            return ErrMemoryLimitExceeded
        }
        projected = sh.memoryBytes - existing.size + size
    }
}

The lookup-after-cleanup is necessary because removeExpiredLocked could have removed the very key being CAS’d, in which case the response is NOT_FOUND rather than EXISTS or STORED. That cascade matches set’s, which keeps both the bookkeeping and the prose explainable.

With multi-key reads and optimistic-concurrency writes in place, the protocol is the largest it will be in this volume. The next section stops adding commands and starts reporting on the ones that already exist.

4. Observability (stats and probes)

Vol 2 §6 put it bluntly: “MiniKV has slog and pprof. That is the entire surface.” This section adds three more pieces. STATS over the wire is the in-protocol view. /healthz is the does-it-respond probe, and /doctor is the deeper diagnostic. The split exists because each answers a different question, and conflating them produces a status board that’s loud about the wrong things.

4.1. The counters

The instinct when adding metrics to a server is to add too many — request body size histograms, p99 latencies per command, allocations per second. MiniKV starts with the smallest set that lets an operator answer the four questions a cache operator actually asks:

  1. Is the server up and processing commands?
  2. Is the working set fitting in memory?
  3. Is anything getting evicted faster than expected?
  4. What is this client actually doing?

The counters that follow from those questions, plus one row for error counts:

Question                Counters
up and processing?      uptime_seconds, connections_opened, connections_active
memory fit?             items, memory_bytes, max_memory_bytes, expirations
eviction surprise?      evictions
client traffic shape?   cmd_get, cmd_set, cmd_delete, cmd_incr, cmd_mget, cmd_gets, cmd_cas, cmd_stats, cmd_auth, cmd_ping
problems?               client_errors, server_errors, auth_failures
All event counters are monotonic for the lifetime of the process, and none of them reset when STATS is called. (connections_active, items, and memory_bytes are gauges: they report the current level and can go down.) Memcached works the same way: a dashboard that cares about a window subtracts two samples to get the rate of change, which is the pattern standard observability tooling expects from any service it scrapes.

The counters live in two places. Server-level counters (commands, connections, errors) are atomic.Uint64 fields on the Server struct, bumped from the connection goroutines:

case protocol.OpGet:
    s.cmdGet.Add(1)

Store-level counters (items, memory, evictions, expirations) live on the shards and get aggregated on demand by Store.Stats(). The eviction and expiration counters are atomic.Uint64 per shard, bumped under the shard mutex when items get removed but read without locking. That’s a deliberate best-effort: a Stats() call during heavy traffic can see slightly inconsistent numbers across shards, and that’s fine because nobody is making correctness decisions from STATS output.

What’s deliberately missing: per-byte counters (bytes_read, bytes_written). Adding them would mean wrapping every bufio.Reader.ReadByte and every bufio.Writer.Write to thread through a counter, or interposing a counted Conn at the listener. The cost is real and the value is small. The byte counters that operators actually act on come from the load balancer or service mesh in front of the cache, not from inside the cache itself.

4.2. STATS over TCP

The wire format mirrors Memcached’s: one STAT <name> <value>\r\n line per counter, terminated by END\r\n.

stats\r\n
STAT uptime_seconds 142\r\n
STAT connections_opened 17\r\n
STAT connections_active 3\r\n
STAT cmd_get 9876\r\n
STAT cmd_set 2104\r\n
... (counters elided)
STAT items 1024\r\n
STAT memory_bytes 2097152\r\n
STAT max_memory_bytes 67108864\r\n
STAT evictions 0\r\n
STAT expirations 12\r\n
STAT shards 16\r\n
END\r\n

The shape matches Memcached’s stats so that clients with Memcached parsers (libmemcached, the Prometheus memcached_exporter) can read MiniKV’s STATS lines without modification. The names overlap where they can: cmd_get and cmd_set match Memcached’s exactly, and MiniKV adds cmd_mget, cmd_gets, and cmd_cas for its protocol extensions, plus cmd_stats to count requests against itself. A client that doesn’t recognize a counter name skips its line, so the wire format tolerates new STAT names without breaking older parsers.

STATS is gated by the auth gate from §2 but not by a special role. Anyone authenticated can read counters. A real operations surface often wants a separate “operator” role with read-only access to STATS and write access to nothing else; that’s authorization, which is out of scope for this volume.

4.3. /healthz and /doctor

The two HTTP probes split a question that often gets conflated. /healthz is “the process is responsive”. If the listener accepted the request and the handler ran, the answer is 200 OK. It runs no internal checks.

func healthzHandler(w http.ResponseWriter, _ *http.Request) {
    w.Header().Set("Content-Type", "text/plain; charset=utf-8")
    w.WriteHeader(http.StatusOK)
    _, _ = w.Write([]byte("ok\n"))
}

That’s the entire handler. Use it for liveness probes: Kubernetes’ livenessProbe, an HAProxy option httpchk line, the load balancer’s “is this instance up” check. If /healthz returns 200, the process is responding to HTTP, full stop. If it doesn’t, the process is gone or wedged and should be restarted.

/doctor is the deeper diagnostic, and it does work. It snapshots the server’s stats, runs a list of named checks, and returns 200 with one line per check if everything is in green range, 503 with the failures listed first if anything tripped. The current checks:

  • memory_pressure: green when memory_bytes / max_memory_bytes < 0.95. Above that, the cache is one big write away from triggering eviction on every Set, which is a quiet way to lose latency.
  • shard_balance: green when max_items / min_items < 4. The FNV-1a hash MiniKV uses is uniform enough that anything past 4× at a meaningful scale is a real signal — usually a workload that hashes badly against the shard count, occasionally a hot single-key hammer. Suppressed when total items < 100 because the ratio is dominated by hash noise at small counts.
  • uptime: informational, never trips.

A /doctor body during normal operation:

OK   memory_pressure: ok (12% of budget)
OK   shard_balance: ok (max/min ratio 1.3x)
OK   uptime: 4h17m

A /doctor body when the workload is hot-keyed:

FAIL shard_balance: imbalanced (max/min ratio 7.2x)
OK   memory_pressure: ok (43% of budget)
OK   uptime: 17m
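The two tripping checks and the failures-first report can be sketched as pure functions over a stats snapshot. The thresholds (0.95, 4x, the 100-item floor) come from the list above; the type and function names, the wording of the failing memory_pressure line, and the handling of a zero budget or an empty shard are assumptions.

```go
package main

import "fmt"

type check struct {
	name   string
	ok     bool
	detail string
}

// memoryPressure is green while usage stays under 95% of the budget.
// Treating a zero budget as "nothing to check" is an assumption.
func memoryPressure(memoryBytes, maxMemoryBytes uint64) check {
	c := check{name: "memory_pressure", ok: true, detail: "ok (no budget configured)"}
	if maxMemoryBytes == 0 {
		return c
	}
	ratio := float64(memoryBytes) / float64(maxMemoryBytes)
	if ratio < 0.95 {
		c.detail = fmt.Sprintf("ok (%.0f%% of budget)", ratio*100)
	} else {
		c.ok, c.detail = false, fmt.Sprintf("high (%.0f%% of budget)", ratio*100)
	}
	return c
}

// shardBalance is green while max/min item counts stay under 4x.
// It is suppressed below 100 total items, where hash noise dominates.
func shardBalance(perShard []uint64) check {
	c := check{name: "shard_balance", ok: true}
	var total, lo, hi uint64
	for i, n := range perShard {
		total += n
		if i == 0 || n < lo {
			lo = n
		}
		if n > hi {
			hi = n
		}
	}
	if total < 100 {
		c.detail = "ok (too few items to judge)"
		return c
	}
	if lo == 0 { // assumption: an empty shard at scale counts as imbalance
		c.ok, c.detail = false, "imbalanced (at least one empty shard)"
		return c
	}
	ratio := float64(hi) / float64(lo)
	if ratio < 4 {
		c.detail = fmt.Sprintf("ok (max/min ratio %.1fx)", ratio)
	} else {
		c.ok, c.detail = false, fmt.Sprintf("imbalanced (max/min ratio %.1fx)", ratio)
	}
	return c
}

// report lists failures first and picks 200 or 503.
func report(checks []check) (status int, body string) {
	status = 200
	var fails, oks string
	for _, c := range checks {
		if c.ok {
			oks += fmt.Sprintf("OK   %s: %s\n", c.name, c.detail)
		} else {
			status = 503
			fails += fmt.Sprintf("FAIL %s: %s\n", c.name, c.detail)
		}
	}
	return status, fails + oks
}

func main() {
	status, body := report([]check{
		memoryPressure(43, 100),
		shardBalance([]uint64{720, 100, 100, 100}),
	})
	fmt.Println(status)
	fmt.Print(body)
}
```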

/doctor is meant to surface configuration mismatches early. It’s a tool for the operator who has just set up MiniKV in a new environment and wants to confirm the workload looks sane, not a continuous health board. A monitoring system that polls /doctor every 30 seconds will mostly see green; the value is that the one time it goes red, the failure has a name.

4.4. The operational listener

The HTTP probes share a listener with /debug/pprof/. Through vols 1 and 2, that listener relied on net/http/pprof’s side-effect import, which registers its handlers on http.DefaultServeMux. That works for one set of routes. Adding more without contaminating the global mux calls for the canonical Go pattern: build a private mux and register pprof’s handlers on it explicitly:

mux := http.NewServeMux()
mux.HandleFunc("/healthz", healthzHandler)
mux.HandleFunc("/doctor", doctorHandler(srv))
mux.HandleFunc("/debug/pprof/", pprof.Index)
mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

httpSrv := &http.Server{Addr: addr, Handler: mux}

Two changes from before. The blank import becomes a named one (the routes need to be referenced as values, not just registered as side effects), and http.Server gets an explicit Handler instead of falling through to http.DefaultServeMux. One caveat: importing net/http/pprof by name still runs its init, so the pprof handlers stay registered on http.DefaultServeMux. What changes is that nothing in this process serves the default mux any more, which is the correct behavior for any program with a public HTTP surface.

The config key for this listener stayed pprof-addr. A more accurate name today would be ops-addr or http-addr, but renaming would break vol 1 and vol 2’s published references to pprof-addr in their config tables, and the cost of a slightly imprecise name is smaller than the cost of changing a flag and apologizing for it. The behavior of the key didn’t change either: empty disables the listener entirely; non-empty binds it.

What this section deliberately doesn’t add: per-key stats (Memcached’s stats items and stats slabs), a slowlog (Redis’ SLOWLOG), or a per-shard breakdown over the TCP STATS surface. The first is a different exercise in observability cost (per-key counters are a memory tax that scales with the working set). The other two are useful but go past the frame here, which is “the smallest surface that lets an operator confirm the basics.”

What MiniKV has now: an answer to the cache operator’s four questions, on three different surfaces appropriate to who’s asking. The next section changes nothing about the surface and a great deal about how the process behaves at the edges of its life.

5. Process vs. service

A process accepts work, runs until it crashes, and at some point stops. A service does the same things, but it also has answers for the boundary conditions that processes ignore: clients that hang up halfway through a command, clients that arrive faster than the server can serve them, and the moment the operator sends SIGTERM. Each is a different shape of saturation, and each gets its own defense. One timeout knob across all three would be a category error.

The three defenses, with the saturation class each closes:

Saturation class                                   Defense                                New config key
One client holds a socket open and stops talking   per-connection idle timeout            idle-timeout
Many clients arrive at once                        counted-semaphore max-connections      max-connections
The process is asked to terminate                  graceful shutdown with bounded drain   shutdown-timeout

5.1. Idle timeout: deadlines on every read

Idle timeout sets a per-connection read deadline before each command. The simplest of the three defenses to implement, and it closes two saturation classes for the price of one because the TLS handshake is also a read. Before each ReadCommand:

for {
    if s.idleTimeout > 0 {
        _ = conn.SetReadDeadline(time.Now().Add(s.idleTimeout))
    }
    command, err := parser.ReadCommand()
    ...
}

If the read returns a net.Error whose Timeout() is true, the connection has been idle past the configured window. The handler closes it quietly:

var netErr net.Error
if errors.As(err, &netErr) && netErr.Timeout() {
    return nil
}

Closing on idle reaps connections that legitimate clients forgot about (a process exited mid-request, a network partition stranded a TCP socket, a load balancer’s keep-alive pool shrank) and it bounds the resource footprint of the slow-loris case where a hostile client opens a connection and never sends a command. Memcached has no idle timeout by default and many deployments suffer for it. MiniKV’s default is five minutes, generous enough not to surprise legitimate clients, tight enough to reap the dead ones.

The implementation is one SetReadDeadline per command line. There’s no separate slow-loris timer, no half-open connection sweeper. The deadline mechanism the standard library already exposes does the work.

The TLS handshake is bounded by the same mechanism. A tls.Conn’s handshake happens inside the first Read against it, so SetReadDeadline set before ReadCommand bounds the handshake too. A hostile client that opens a TLS connection and dribbles handshake bytes gets evicted at the same window as one that completes the handshake and stops talking. One mechanism, two problems closed.

The corresponding write deadline is deliberately absent. Writes from MiniKV are bounded in size (responses are short) and sit in the kernel’s TCP send buffer; if a write does stall on a dead client, the kernel’s TCP retransmission timeout (or keepalive, on an otherwise quiet socket) will eventually surface it, and the cost of getting that wrong is small compared to the cost of getting read deadlines wrong on slow-loris reads. The threat model is on reads.

5.2. Max connections: reject, don’t queue

The second defense is a counted semaphore. The Server holds a buffered channel of N slots, and ServeConn does a non-blocking acquire before anything else:

func (s *Server) acquireSlot() bool {
    if s.semaphore == nil {
        return true
    }
    select {
    case s.semaphore <- struct{}{}:
        return true
    default:
        return false
    }
}

On failure, the connection gets a one-line response and a close:

if !s.acquireSlot() {
    s.connectionsRejected.Add(1)
    _, _ = conn.Write([]byte("SERVER_ERROR max connections reached\r\n"))
    _ = conn.Close()
    return nil
}
defer s.releaseSlot()

Reject, don’t queue. The kernel’s TCP accept backlog (SOMAXCONN; on Linux this is net.core.somaxconn, 4096 by default since kernel 5.4) already buffers pending connections. A userland queue on top is a second queue between the same two endpoints, with its own timeout policy and its own way to silently drop work. One queue, at the right layer, beats two queues stacked.

The rejection writes a structured error before closing, rather than dropping the TCP connection on the floor like Memcached does on -c saturation. The cost is one extra Write per refused connection (and a finished TLS handshake when TLS is on), but the alternative is legitimate clients seeing an inscrutable EOF and guessing whether they were refused, throttled, or partitioned.

The default cap is 1024, matching Memcached’s -c; 0 disables it. A small cap is a real operational tool: max-connections = 64 says, in advance, that the orchestrator should scale horizontally rather than vertically.
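The acquire, reject, and release steps can be exercised in isolation. The sketch below is a stripped-down illustration, not the Server type itself: limiter, newLimiter, and admit are hypothetical names, and atomic.Int64 (Go 1.19+) is an assumption about how the rejection counter is stored.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// limiter is a hypothetical stand-in for the Server fields involved:
// the slot channel and the rejection counter.
type limiter struct {
	semaphore chan struct{} // nil when the cap is disabled
	rejected  atomic.Int64
}

// newLimiter builds a limiter with maxConns slots; 0 disables the cap.
func newLimiter(maxConns int) *limiter {
	l := &limiter{}
	if maxConns > 0 {
		l.semaphore = make(chan struct{}, maxConns)
	}
	return l
}

// acquireSlot never blocks: it takes a slot if one is free, else reports false.
func (l *limiter) acquireSlot() bool {
	if l.semaphore == nil {
		return true
	}
	select {
	case l.semaphore <- struct{}{}:
		return true
	default:
		return false
	}
}

// releaseSlot frees one slot; a handler would defer it after a successful acquire.
func (l *limiter) releaseSlot() {
	if l.semaphore != nil {
		<-l.semaphore
	}
}

// admit reports whether a new connection may proceed and counts refusals.
func (l *limiter) admit() bool {
	if l.acquireSlot() {
		return true
	}
	l.rejected.Add(1)
	return false
}

func main() {
	l := newLimiter(2)
	fmt.Println(l.admit(), l.admit(), l.admit()) // the third caller finds no free slot
	l.releaseSlot()                              // one connection finishes
	fmt.Println(l.admit(), l.rejected.Load())
}
```

Because the acquire is a select with a default branch, a full channel costs the caller nothing but the rejection itself; no goroutine ever parks waiting for a slot.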

5.3. Graceful shutdown: deadlines, then force

The third defense lives at the other end of the connection’s life. When the operator sends SIGTERM, MiniKV needs to:

  1. Stop accepting new connections.
  2. Let in-flight commands finish.
  3. Disconnect connections that are merely idle.
  4. Force-close anything left after a deadline.

(3) is the tricky one: a connection between commands has no in-flight work and can close cleanly, but a connection mid-command needs to finish so the response reaches the client. An immediate read deadline on every active connection splits the difference:

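func (s *Server) Shutdown(ctx context.Context) error {
    // Expire the read deadline on every registered connection. Handlers
    // blocked in ReadCommand wake with a timeout and exit; handlers that are
    // mid-command are unaffected until their next read.
    // (Closing the listener, step 1, is not part of this excerpt.)
    s.connsMu.Lock()
    for c := range s.conns {
        _ = c.SetReadDeadline(time.Now())
    }
    s.connsMu.Unlock()

    // Turn wg.Wait() into a channel so it can race the context.
    done := make(chan struct{})
    go func() {
        s.wg.Wait()
        close(done)
    }()

    select {
    case <-done:
        return nil // every handler drained in time
    case <-ctx.Done():
        // The drain deadline passed: force-close whatever is left.
        s.connsMu.Lock()
        for c := range s.conns {
            _ = c.Close()
        }
        s.connsMu.Unlock()
        <-done // wait for the handlers to observe the close and exit
        return ctx.Err()
    }
}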

Connections waiting on ReadCommand see the deadline fire and exit through the idle-timeout path. Connections mid-execute finish writing the response, loop back to the next ReadCommand, see the expired deadline, and exit. The semantic is “in-flight commands complete; everything else stops.”

If ctx expires before the drain finishes, every still-active connection gets a hard Close; the forced closes surface as I/O errors and the handlers exit through the same path. The trailing <-done ensures Shutdown only returns once every handler goroutine has actually exited.
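One interaction is worth spelling out. The loop excerpt in 5.1 re-arms the idle deadline before every ReadCommand, so taken literally it would overwrite the already-expired deadline that Shutdown sets on a connection that is mid-command. The elided part of the real handler presumably guards against this. One way to do it, sketched here under assumptions, is a shutdown flag that the handler re-checks after arming the idle deadline. The drainer type, the shuttingDown flag, and armDeadline are hypothetical names, not taken from the MiniKV source.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net"
	"sync/atomic"
	"time"
)

// drainer is a hypothetical stand-in for the Server fields involved.
type drainer struct {
	idleTimeout  time.Duration
	shuttingDown atomic.Bool // assumed flag, set before Shutdown expires deadlines
}

// armDeadline arms the idle deadline first and checks the flag second. In
// that order, a Shutdown that stores the flag and then expires the deadline
// can never be overwritten by a later idle deadline.
func (d *drainer) armDeadline(conn net.Conn) {
	if d.idleTimeout > 0 {
		_ = conn.SetReadDeadline(time.Now().Add(d.idleTimeout))
	}
	if d.shuttingDown.Load() {
		_ = conn.SetReadDeadline(time.Now())
	}
}

// serve handles newline-terminated commands until a read fails. It returns
// how many commands completed and whether the exit was a deadline timeout.
func (d *drainer) serve(conn net.Conn, execute func()) (handled int, timedOut bool) {
	defer conn.Close()
	r := bufio.NewReader(conn)
	for {
		d.armDeadline(conn)
		if _, err := r.ReadString('\n'); err != nil {
			var netErr net.Error
			return handled, errors.As(err, &netErr) && netErr.Timeout()
		}
		execute() // the in-flight work that shutdown must not interrupt
		handled++
	}
}

// demoMidCommand simulates a shutdown that arrives while a command executes.
func demoMidCommand() (int, bool) {
	d := &drainer{idleTimeout: time.Hour}
	client, server := net.Pipe()
	defer client.Close()
	go func() { _, _ = client.Write([]byte("get k\r\n")) }()
	return d.serve(server, func() {
		d.shuttingDown.Store(true)             // what Shutdown would do first
		_ = server.SetReadDeadline(time.Now()) // then expire the deadline
	})
}

func main() {
	fmt.Println(demoMidCommand())
}
```

With the re-check in place the command completes, the next read fails immediately, and the handler exits through the timeout path instead of sitting out the one-hour idle window.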

The connection registry behind this is a map[net.Conn]struct{} on the Server, guarded by a mutex; ServeConn registers on entry and deregisters on exit. Iteration is rare enough that a plain mutex beats sync.Map on simplicity.
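A registry of that shape fits in a few lines. This is a minimal sketch under assumptions: the registry type and its track, untrack, each, and size methods are hypothetical names for illustration, and net.Pipe stands in for accepted TCP connections.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

// registry is a hypothetical version of the mutex-guarded connection set.
type registry struct {
	mu    sync.Mutex
	conns map[net.Conn]struct{}
}

func newRegistry() *registry {
	return &registry{conns: make(map[net.Conn]struct{})}
}

// track registers a connection; a handler would call it on entry.
func (r *registry) track(c net.Conn) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.conns[c] = struct{}{}
}

// untrack removes a connection; a handler would defer it on exit.
func (r *registry) untrack(c net.Conn) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.conns, c)
}

// each calls fn on every registered connection while holding the lock, so fn
// must not call track or untrack itself.
func (r *registry) each(fn func(net.Conn)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for c := range r.conns {
		fn(c)
	}
}

// size reports how many connections are currently registered.
func (r *registry) size() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.conns)
}

// demoRegistry registers both ends of a pipe, closes them via each, and
// deregisters them, returning the sizes before and after plus the close count.
func demoRegistry() (before, closed, after int) {
	r := newRegistry()
	a, b := net.Pipe()
	r.track(a)
	r.track(b)
	before = r.size()
	r.each(func(c net.Conn) {
		_ = c.Close()
		closed++
	})
	r.untrack(a)
	r.untrack(b)
	return before, closed, r.size()
}

func main() {
	fmt.Println(demoRegistry())
}
```

Holding the lock during iteration is safe here because SetReadDeadline and Close return promptly; the handlers they wake only need the lock afterwards, when they deregister.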

[Figure: shutdown flow. SIGTERM -> stop accepting new connections -> SetReadDeadline(now) on every active conn -> wg.Wait() racing ctx.Done() -> drain done before ctx? yes: return nil (clean exit); no (ctx expired): conn.Close() on remaining conns, wait for handlers, return ctx.Err()]

5.4. Operator-facing changes

A SIGTERM with the new code path now produces a quiet log:

time=... level=INFO msg="shutdown initiated" timeout=10s
time=... level=INFO msg="shutdown clean"

instead of “the process exited and we hope the connections recovered.” Connections refused at the cap show up in stats as STAT connections_rejected <n>, so an operator dashboarding the server can see saturation as a counter rather than as TCP errors at the load balancer. Idle connections drop on a defined window rather than at the kernel’s whim.
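For orientation, the signal-to-log path could be wired roughly as follows. This is a sketch under assumptions rather than MiniKV's main: drain and fakeShutdown are hypothetical helpers, fakeShutdown stands in for Server.Shutdown, the "shutdown forced" message is invented for the timeout branch, and the program delivers SIGTERM to itself (Unix only) purely so the demo terminates.

```go
package main

import (
	"context"
	"log/slog"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// drain runs shutdown under a timeout, logs the outcome, and reports whether
// the drain finished cleanly.
func drain(shutdown func(context.Context) error, timeout time.Duration) bool {
	slog.Info("shutdown initiated", "timeout", timeout)
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	if err := shutdown(ctx); err != nil {
		slog.Warn("shutdown forced", "err", err) // hypothetical message
		return false
	}
	slog.Info("shutdown clean")
	return true
}

// fakeShutdown stands in for Server.Shutdown: it needs `work` to drain and
// gives up with ctx.Err() if the context expires first.
func fakeShutdown(work time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(work):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

func main() {
	// Text handler, so the lines look like time=... level=INFO msg=...
	slog.SetDefault(slog.New(slog.NewTextHandler(os.Stderr, nil)))

	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM)

	// Demo only: send SIGTERM to this process so the sketch exits on its own.
	go func() {
		time.Sleep(100 * time.Millisecond)
		_ = syscall.Kill(os.Getpid(), syscall.SIGTERM)
	}()

	<-ctx.Done()
	stop() // restore default handling: a second SIGTERM kills the process

	if !drain(fakeShutdown(50*time.Millisecond), 10*time.Second) {
		os.Exit(1)
	}
}
```

Calling stop before the drain is a design choice: an operator who sends SIGTERM twice gets an immediate exit rather than waiting out the full shutdown timeout.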

This volume closed the gap between “a process running a cache” and “a service that runs a cache and gets stopped, started, observed, and authenticated.”

What still isn’t there: replication, clustering, pipelining, pub/sub, transactions, and the whole shape of multi-process orchestration. None of those would change the lessons MiniKV exists to demonstrate; all of them would multiply the volume count past what any one series can sensibly produce. The point of the series was to make the hard parts visible. After three volumes, the hard parts are visible. The easy parts turn out to be surprisingly numerous too, even in a project this small.