I maintained a file called skippy-context-may.md for months. Four hundred lines of project state, architectural decisions, tool versions, things that broke, things I fixed. Every new AI session started the same way: open the file, select all, paste, then fill in whatever happened since my last edit. It was fifteen minutes of overhead every single day and I told myself it was just part of the workflow.
Then I automated it away and realized how much time I'd been wasting. Six hours a month, minimum, on a ritual that felt productive but was really just me being my AI's secretary.
The system runs four hooks during normal Claude Code usage. One fires at session start, one on prompt submission, one on stop, one on compaction. The stop hook is the one that does the heavy lifting. After a session ends, it grabs the full conversation transcript and runs an extraction pipeline against it.
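As a rough illustration, the stop-hook step can be sketched in a few lines of Python. This is a sketch under assumptions, not the actual pipeline: it assumes the hook receives a JSON payload with a `transcript_path` field pointing at a JSONL transcript, and `extract_candidates` is a hypothetical placeholder for the extraction logic.

```python
import json
from pathlib import Path


def load_transcript(path: Path) -> list[dict]:
    """Read a JSONL transcript: one JSON object per non-empty line."""
    entries = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


def extract_candidates(entries: list[dict]) -> list[dict]:
    """Hypothetical stand-in for the extraction pipeline.

    It only keeps entries that carry a "message" field; a real pipeline
    would classify each one into a memory category.
    """
    return [entry for entry in entries if "message" in entry]


def handle_stop(payload: dict) -> list[dict]:
    """Handle one stop event, given the hook's JSON payload."""
    transcript = Path(payload["transcript_path"])  # assumed field name
    return extract_candidates(load_transcript(transcript))
```

Wired up as a hook command, the script would read the payload from standard input, for example `handle_stop(json.load(sys.stdin))`, and hand the result to whatever does the scoring and storage.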
The extraction pulls seven categories of information out of the raw text, among them preferences, technical context, decisions, and corrections.
Each category matters because it serves a different retrieval pattern later. When the system needs to know what tools you use, it pulls preferences and technical context. When you're about to make an architecture decision, it surfaces prior decisions. When you corrected yourself three weeks ago, the correction overrides the old answer instead of competing with it.
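The "correction overrides the old answer" behavior is easier to see in code. The sketch below is a guess at the shape, not the real data model: the `topic` key, the `superseded_by` field, and the `MemoryStore` class are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Memory:
    id: int
    category: str  # e.g. "preference", "decision", "correction"
    topic: str     # hypothetical key used to match related memories
    text: str
    superseded_by: Optional[int] = None


class MemoryStore:
    def __init__(self) -> None:
        self._items: list[Memory] = []

    def add(self, category: str, topic: str, text: str) -> Memory:
        mem = Memory(id=len(self._items), category=category, topic=topic, text=text)
        if category == "correction":
            # A correction retires older memories on the same topic
            # instead of sitting next to them as a rival answer.
            for old in self._items:
                if old.topic == topic and old.superseded_by is None:
                    old.superseded_by = mem.id
        self._items.append(mem)
        return mem

    def active(self, topic: str) -> list[Memory]:
        """Return only memories on this topic that have not been superseded."""
        return [m for m in self._items
                if m.topic == topic and m.superseded_by is None]
```

The old memory is kept but marked as superseded, so retrieval returns only the current answer while the history stays available.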
Raw extraction produces too many candidates. Fifteen sessions a day across three projects means hundreds of potential memories, most of which are noise. "Fix the indentation on line 47" is not worth storing. "The indentation convention for this project is tabs" probably is.
The encoding gate handles the filtering. Three signals score each candidate memory:
The scores combine into a threshold decision. A candidate that clears the gate is stored. One that doesn't is dropped. Not archived, dropped, because storing everything makes retrieval worse, not better: every stored memory competes with every other memory during search.
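A gate like this can be sketched as a weighted sum compared against a cutoff. Everything specific here is an assumption: the signal names are borrowed from terms used elsewhere in this post (novelty, salience, prediction error), and the weights and threshold are made-up numbers, not the values the system uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    # Each signal is assumed to be normalized to the range [0, 1].
    novelty: float
    salience: float
    prediction_error: float


# Hypothetical weights and cutoff, chosen only for illustration.
WEIGHTS = {"novelty": 0.4, "salience": 0.4, "prediction_error": 0.2}
THRESHOLD = 0.5


def gate_score(s: Signals) -> float:
    """Combine the three signals into one score with a weighted sum."""
    return (WEIGHTS["novelty"] * s.novelty
            + WEIGHTS["salience"] * s.salience
            + WEIGHTS["prediction_error"] * s.prediction_error)


def should_store(s: Signals, threshold: float = THRESHOLD) -> bool:
    """Store the candidate only if its score clears the gate."""
    return gate_score(s) >= threshold
```

With these numbers, a one-off instruction scoring low on novelty and salience is dropped, while a statement about a project-wide convention scoring high on both is kept.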
The prediction error signal does something counterintuitive with contradictions. When a new memory contradicts an existing one, the prediction error spikes and actually lowers the storage threshold. Contradictions get stored more easily because they mean the user changed their mind. I said I preferred npm in March. By May I was using Bun and never explicitly said "I switched." The system caught the behavioral shift and encoded the new preference without me having to announce it.
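One simple way to model that effect is to scale the threshold down as prediction error rises. This is a sketch of the idea only: the linear discount and the `max_discount` parameter are assumptions, not the formula from the paper.

```python
def effective_threshold(base: float, prediction_error: float,
                        max_discount: float = 0.3) -> float:
    """Lower the storage threshold as prediction error rises.

    prediction_error is clamped to [0, 1]. At 1.0 the threshold is
    reduced by max_discount (a hypothetical 30% here).
    """
    pe = min(max(prediction_error, 0.0), 1.0)
    return base * (1.0 - max_discount * pe)


def passes_gate(score: float, prediction_error: float,
                base: float = 0.5) -> bool:
    """Compare a candidate's score against the adjusted threshold."""
    return score >= effective_threshold(base, prediction_error)
```

Under these assumptions a candidate scoring 0.4 is dropped when it agrees with everything already stored, but the same score gets through when it strongly contradicts an existing memory.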
By day three I had over two hundred extracted memories from about a dozen sessions. Zero saved manually. The skippy-context-may.md file was already stale and I kept opening it out of habit before realizing the system already knew everything in there plus a hundred things I never wrote down.
The real proof came around week three. I was debugging a Pi node that kept dropping its NAS connection every two hours. Couldn't figure it out. Before I'd even finished describing the problem, the system surfaced a memory from eleven days earlier: during an unrelated router configuration session, I'd mentioned I changed the DHCP lease time from 24 hours to 2 hours. One throwaway sentence. The system stored it as technical context, and eleven days later it turned out to be the root cause.
By week four the system was flagging contradictions in my own decisions. I'd said I wanted SQLite for everything early on, then started quietly exploring Postgres for one specific use case. When I was making a related architecture decision, the system surfaced both positions and asked if I wanted to update the earlier one. A context file can't do that. It doesn't know the difference between you changing your mind and you forgetting what you decided.
Cross-project connections are the other thing you can't replicate manually. A debugging insight from one project showed up as relevant context in a different project weeks later because the underlying pattern was the same. That only works because TrueMemory stores memories without scoping them to a single project.
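The design choice is small but it matters: the project is kept as metadata, not used as a filter. A toy sketch, assuming a hypothetical `Memory` record, with plain token overlap standing in for real vector search:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    project: str  # kept as metadata only, never used to filter
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(memories: list[Memory], query: str, top_k: int = 3) -> list[Memory]:
    """Rank every memory by token overlap with the query, across all projects."""
    q = tokenize(query)
    scored = [(len(q & tokenize(m.text)), m) for m in memories]
    hits = [(score, m) for score, m in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in hits[:top_k]]
```

Because `search` never looks at `project`, a note recorded during a router session can come back for a question asked while working on something else entirely.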
Why do most companies build search instead of ingestion? Honestly, it's because search is easier. Vector embeddings, reranking, RAG pipelines: these are well-understood problems with a dozen open-source implementations. You can get 86% on LoCoMo with off-the-shelf tools.
Ingestion is a judgment problem. You're deciding what's worth keeping before you know what future query will need it. That's architecturally harder than matching queries to documents. It requires real-time evaluation of novelty, salience, and contradiction state, and it has to keep working as the memory store grows.
Most companies skip the hard part and build better search on top of whatever the user manually saves. It works, but it's not memory. It's a notebook with good search. The full architecture, including the encoding gate design and benchmark results, is in the arXiv paper.
Your brain doesn't have a paste button. It doesn't need one. And after a month of running a system that actually handles ingestion automatically, I can't go back to the old way.
Josh Adler is a researcher at TrueMemory, a Sauron company. Research: arXiv:2605.04897. More at joshadler.com.