1. Where typical agent memory breaks

Most agent runtimes give you two memory regions and both are bad. Short-term memory is “stuff the entire transcript into the context window and pray it fits.” Long-term memory is “chunk past conversations, embed them, drop the embeddings into a vector store, top-k at recall time.” Each fails in a different way, and the failures compound on long-running tasks.

Short-term blows up on tool output. A single search result, a file read, a stack trace from a failed build — any of these can run to tens of thousands of tokens. After a few dozen tool calls the window is full, the older content is gone, and the agent loses access to the very thing it just learned. The usual fix is lossy summarization: an LLM rewrites the transcript into a shorter paragraph and replaces the originals. Cheap, but un-auditable. If recall later goes wrong, there’s no path back to what actually happened.

Long-term has the opposite failure. Chunked vector recall returns disconnected fragments ranked by cosine similarity, with no structural map of what’s being remembered. Two facts that belong to the same project, the same user, the same workflow scenario end up in unrelated slots in the index. At recall time the agent gets a bag of nearest-neighbor sentences and is supposed to reason over them — but it doesn’t know which neighborhood it’s in.
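A minimal sketch of that failure, in plain Python. The chunk texts and three-dimensional "embeddings" are invented for illustration; a real store would use learned vectors, but the ranking logic has the same shape.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical chunks: (text, toy embedding).
# Note there is no project, user, or scene field anywhere in the index.
CHUNKS = [
    ("use Postgres for the billing service", [0.9, 0.1, 0.0]),
    ("billing service deploys on Fridays", [0.1, 0.9, 0.0]),
    ("Postgres upgrade broke the blog", [0.8, 0.0, 0.2]),
]

def top_k(query_vec, k=2):
    """Return the k chunk texts closest to the query by cosine similarity."""
    ranked = sorted(CHUNKS, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query that leans toward "Postgres" in this toy space.
results = top_k([1.0, 0.0, 0.0])
```

Here `results` holds one billing-service fact and one fact about an unrelated blog, while the second billing-service fact is dropped. Nothing in the ranking knows the two billing facts belong together.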

TDAI (TencentDB Agent Memory) addresses both regions with a layered long-term store and a pointer-based short-term offload. I’ve been running it under OpenClaw and Hermes for a while, and the design choices it makes are worth pulling out separately from whether the library itself is the one you should pick up.

2. Layered long-term memory, L0 to L3

TDAI’s long-term memory is a four-layer pyramid (L0 to L3) where each layer is derived from the one below and stored in a medium that fits its role.

L0 is the raw conversation log: every turn, every tool call, every result, persisted verbatim. L1 is atomic memory: an extractor scans recent L0 turns and pulls out individual facts (“the user prefers Postgres for new services”, “the deployment target is Cloudflare Pages”). L2 clusters those atoms into scene blocks grouped by project, topic, or workflow. L3 is the persona: the stable profile of preferences and habits that holds across scenes. Higher layers carry orientation; lower layers carry evidence.
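One way to picture the four layers is as record types that each point at the layer below. This is only a sketch of the shapes described above; the class and field names are my own, not TDAI's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:                      # L0: raw log entry, stored verbatim
    turn_id: str
    role: str                    # "user", "assistant", or "tool"
    content: str

@dataclass
class Atom:                      # L1: one extracted fact
    atom_id: str
    fact: str
    source_turn_id: str          # hypothetical field: the L0 turn it came from

@dataclass
class Scene:                     # L2: atoms grouped by project, topic, or workflow
    scene_id: str
    title: str
    atom_ids: list[str] = field(default_factory=list)

@dataclass
class Persona:                   # L3: stable preferences that hold across scenes
    claims: list[str] = field(default_factory=list)

# Build one example of each layer, bottom up.
turn = Turn("t-17", "user", "Let's use Postgres for anything new.")
atom = Atom("a-3", "the user prefers Postgres for new services", turn.turn_id)
scene = Scene("s-1", "backend services", [atom.atom_id])
persona = Persona(["prefers Postgres for new services"])
```

Each layer is smaller and more general than the one below it, which is the "orientation above, evidence below" split in data-structure form.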

Two design choices matter more than the layer count. First, the storage is heterogeneous. The bottom layers live in SQLite, with a vector index via sqlite-vec, where keyword and embedding recall both work well. The top layers are plain Markdown files on disk: persona.md, scene blocks per cluster, all human-readable. Second, the path between layers is preserved. Every atomic memory carries a result_ref back to the L0 turn that produced it, every scene block lists the atoms it summarized, and every persona claim traces back to scenes. Recall never has to choose between “summarized form” and “evidence.” You get the summary by default, and can drill down if it surprises you.
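The drill-down from an atom to its evidence can be sketched with the standard-library sqlite3 module. The table and column names here are assumptions of mine (only result_ref comes from the description above), and the sqlite-vec index is left out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE l0_turns (
    turn_id TEXT PRIMARY KEY,
    content TEXT NOT NULL
);
CREATE TABLE l1_atoms (
    atom_id    TEXT PRIMARY KEY,
    fact       TEXT NOT NULL,
    result_ref TEXT NOT NULL REFERENCES l0_turns(turn_id)
);
""")

# Hypothetical rows: one raw turn and the atom extracted from it.
conn.execute(
    "INSERT INTO l0_turns VALUES (?, ?)",
    ("t-17", "user: let's use Postgres for anything new"),
)
conn.execute(
    "INSERT INTO l1_atoms VALUES (?, ?, ?)",
    ("a-3", "the user prefers Postgres for new services", "t-17"),
)

def evidence_for(atom_id):
    """Follow result_ref from an atom back to the verbatim L0 turn."""
    row = conn.execute(
        "SELECT t.content FROM l1_atoms AS a "
        "JOIN l0_turns AS t ON t.turn_id = a.result_ref "
        "WHERE a.atom_id = ?",
        (atom_id,),
    ).fetchone()
    return row[0] if row else None
```

Calling `evidence_for("a-3")` returns the original user turn, which is the "drill down if it surprises you" path in one join.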

The visible payoff in daily use isn’t the pyramid shape on its own. It’s that the high layers get auto-injected into context at the start of each turn. The agent doesn’t decide when to call a memory lookup; the relevant persona and scene material is already there, so the agent acts like it remembers. Compare that to the default long-term memory in either OpenClaw or Hermes, which expects the agent to invoke a search tool. The agent frequently won’t, because it doesn’t know what it doesn’t know.
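Auto-injection amounts to building the prompt from the Markdown files before the model sees the user message. A rough sketch follows, assuming a `scenes/` subdirectory and a caller that already knows which scenes are relevant; both are my assumptions, not TDAI's documented layout or selection logic.

```python
from pathlib import Path
import tempfile

def build_context(memory_dir: Path, scene_names: list[str], user_message: str) -> str:
    """Prepend persona and scene Markdown to the user message."""
    parts = []
    persona = memory_dir / "persona.md"
    if persona.exists():
        parts.append("## Persona\n" + persona.read_text(encoding="utf-8").strip())
    for name in scene_names:
        scene = memory_dir / "scenes" / f"{name}.md"   # hypothetical layout
        if scene.exists():
            parts.append(f"## Scene: {name}\n" + scene.read_text(encoding="utf-8").strip())
    parts.append("## User\n" + user_message)
    return "\n\n".join(parts)

# Demo with throwaway files in a temp directory.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "scenes").mkdir()
(demo_dir / "persona.md").write_text(
    "- prefers Postgres for new services\n", encoding="utf-8")
(demo_dir / "scenes" / "deploy.md").write_text(
    "- deployment target is Cloudflare Pages\n", encoding="utf-8")

prompt = build_context(demo_dir, ["deploy"], "Set up the new service.")
```

The point of the design is that no tool call appears anywhere in this path: the memory is in `prompt` whether or not the agent would have thought to ask for it.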

3. Short-term offload with a Mermaid map

The short-term half makes one specific bet: most of what’s eating your context window is verbose tool output the agent has already digested, and you’d rather replace it with a pointer than a summary.

When offload is enabled, TDAI watches the context-window fill ratio. At the mild trigger (default 50% of the window) it starts moving the verbose middles of past tool calls out to refs/*.md on disk. At the aggressive trigger (default 85%) it compresses harder. What replaces the moved content is a node in a Mermaid graph that names what happened and carries a node_id — not a paragraph of prose. The agent reads the graph as its working map of the task; if it needs the actual bytes from any step, it greps the node_id and gets the original file back.
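The trigger logic and the pointer swap can be sketched as follows. The two thresholds are the defaults quoted above; the function names, the one-file-per-node naming, and the node label format are hypothetical stand-ins for whatever TDAI does internally.

```python
from pathlib import Path
import tempfile

MILD, AGGRESSIVE = 0.50, 0.85   # default fill ratios quoted in the text

def offload_level(used_tokens: int, window_tokens: int) -> str:
    """Classify the current context fill ratio."""
    ratio = used_tokens / window_tokens
    if ratio >= AGGRESSIVE:
        return "aggressive"
    if ratio >= MILD:
        return "mild"
    return "none"

def offload(node_id: str, label: str, tool_output: str, refs_dir: Path) -> str:
    """Write the full tool output to refs/<node_id>.md and return a Mermaid node."""
    refs_dir.mkdir(parents=True, exist_ok=True)
    (refs_dir / f"{node_id}.md").write_text(tool_output, encoding="utf-8")
    safe_label = label.replace('"', "'")   # keep the Mermaid label well-formed
    return f'{node_id}["{safe_label}"]'

refs = Path(tempfile.mkdtemp()) / "refs"
level = offload_level(used_tokens=70_000, window_tokens=128_000)
node = offload("n12", "pytest run: 3 failures", "E   AssertionError ...\n" * 500, refs)
```

In this run the window is about 55% full, so `level` is the mild trigger, and 500 lines of test output collapse into a single short node line that stays in context.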

The Mermaid representation looks like this, lifted from the project README:

graph LR
    Log["Verbose Logs<br/>(hundreds of thousands of tokens)"] -->|"1. Offload full text"| FS[("External FS<br/>(refs/*.md)")]
    Log -->|"2. Extract relations"| MMD["Mermaid Canvas<br/>(with node_id)"]
    MMD -->|"3. Light injection"| Agent(("Agent Context<br/>(a few hundred tokens)"))
    Agent -. "4. Recall via node_id" .-> FS

The reason this works better than summarization on long-horizon tasks is that a pointer is lossless in a way a paragraph isn’t. A summary commits to an interpretation of the original; a node_id defers it. If the agent later needs to re-read the original error message, the original error message is still there, character-for-character. The compression is structural rather than semantic, so nothing’s been thrown away, only moved.
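The lossless claim is easy to check in a toy setting. This sketch assumes one file per node, named after the node_id, which is my simplification of the "grep the node_id" step; the error text and the summary sentence are invented.

```python
from pathlib import Path
import tempfile

def recall(node_id: str, refs_dir: Path) -> str:
    """Return the offloaded text for node_id exactly as it was stored."""
    return (refs_dir / f"{node_id}.md").read_bytes().decode("utf-8")

refs_dir = Path(tempfile.mkdtemp())
original = "error[E0308]: mismatched types\n  --> src/main.rs:42:17\n"

# Offload: the bytes move to disk, untouched.
(refs_dir / "n7.md").write_bytes(original.encode("utf-8"))

# What a lossy summarizer might have kept instead (hypothetical).
summary = "The build failed with a type error."

restored = recall("n7", refs_dir)
```

`restored` matches `original` character for character, including the file and line number, while `summary` has already decided that the location was not worth keeping.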

Of the two pillars, this is the one TDAI gets most right. On long-running tasks the context stays manageable, and earlier tool output stays reachable.

4. The curation gap

TDAI solves a compression problem well; it does not solve a curation problem, and curation is the harder half.

Compression asks: given that the agent has more context than fits, how do we keep what matters without losing access to the rest? Curation asks the earlier question: of everything the agent has seen, what was worth remembering at all? TDAI is up-front about being a memory layer rather than a memory curator, but the gap is worth naming because it shows up in practice.

The first symptom is that there’s no memory invalidation. A failed attempt at a task, an assumption that was true at the start of a session and is no longer true, a one-off mistake the agent later corrected: any of these can get extracted into L1, clustered into L2, and eventually promoted into L3. Once it’s up the pyramid, nothing retracts it. The agent will recall its own past wrong answer as if it were an established fact and act on it next time. Mature memory designs need some notion of “this is now stale” or “this turned out to be wrong,” and TDAI doesn’t have one yet.
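For illustration only, here is what a minimal "this turned out to be wrong" mechanism could look like if the layer links were used in reverse. To be clear, this is not something TDAI provides; every name below is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    atom_id: str
    fact: str
    retracted: bool = False

@dataclass
class Scene:
    scene_id: str
    atom_ids: list[str] = field(default_factory=list)
    needs_rebuild: bool = False

def retract(atom_id: str, atoms: dict[str, Atom], scenes: dict[str, Scene]) -> list[str]:
    """Mark an atom as retracted and flag every scene that was built from it."""
    atoms[atom_id].retracted = True
    dirty = []
    for scene in scenes.values():
        if atom_id in scene.atom_ids:
            scene.needs_rebuild = True
            dirty.append(scene.scene_id)
    return dirty

atoms = {"a-9": Atom("a-9", "the deployment target is Heroku")}
scenes = {
    "s-1": Scene("s-1", ["a-9", "a-3"]),
    "s-2": Scene("s-2", ["a-4"]),
}
dirty = retract("a-9", atoms, scenes)
```

The provenance links that make drill-down possible are the same links a retraction would need to walk upward, so the structure for invalidation is already there even though the operation is not.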

The second symptom is plain noise. Scene blocks accumulate things that aren’t load-bearing: incidental phrasings, transient state, side-channel comments. The agent doesn’t get confused so much as distracted. Recall surfaces something true and unhelpful, and the response shape shifts to accommodate it.

The white-box design partially mitigates both problems. Because L2 and L3 are plain Markdown files on disk, you can open persona.md, find the bad line, and delete it. That isn’t memory invalidation in any automated sense, but it’s the difference between an agent that’s quietly confused for opaque reasons and an agent that’s confused because of a specific sentence you can read and remove. Treating the memory store as a file you edit rather than a database you query is the part of TDAI’s design I’d take with me even into a different library.
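The manual fix is small enough to script. This is a sketch of the "find the bad line and delete it" step, with invented persona contents and a helper name of my own.

```python
from pathlib import Path
import tempfile

def remove_lines_containing(path: Path, needle: str) -> int:
    """Delete every line of a Markdown file that contains needle; return the count."""
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    kept = [ln for ln in lines if needle not in ln]
    path.write_text("".join(kept), encoding="utf-8")
    return len(lines) - len(kept)

# Demo persona file with one stale claim (hypothetical contents).
persona = Path(tempfile.mkdtemp()) / "persona.md"
persona.write_text(
    "- prefers Postgres for new services\n- deploys to Heroku\n",
    encoding="utf-8",
)
removed = remove_lines_containing(persona, "Heroku")
```

In practice a text editor does the same job. The point is that the operation is a one-line file edit rather than a migration against an opaque index.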

I still run TDAI over the default long-term memory in either OpenClaw or Hermes. The auto-injection of persona and scene context is a big enough win that the curation gap is a livable tax, and the readable-files design at least makes the gap correctable by hand instead of opaque. Half of agent memory, TDAI answers well. The other half is open.