The bottleneck in AI is not intelligence. It hasn't been for a while.
Claude can reason through complex codebases. GPT can generate working applications from a paragraph of description. The models are smart, genuinely smart, and they keep getting smarter. So why does every personal AI setup still feel like a demo? Why does every "AI assistant" still need you to re-explain your entire project every time you open a new chat?
Because the models have no memory. Every conversation starts from zero. Every session is a clean slate. The reasoning works; the persistence doesn't exist.
I spent three months building a product called Skippy before I understood this. And I spent another three months trying every existing memory solution before I accepted that none of them worked.
Skippy was an AI assistant that captured your screen via OCR, synced your email and calendar, and tried to build a behavioral model of your daily workflow. The idea was that an AI that could observe everything would eventually understand everything.
We had real traction before launch. Investors were interested, 20k people on Reddit engaged with the concept, and beta requests were flooding in. And I walked away from it because after three months of building, every single feature felt like a worse version of the standalone tool it was replacing. I tried automations, reservations, food orders on DoorDash, and none of it felt like an improvement over just... doing those things normally. An everything-app that does everything at 60% is worse than six tools that each do one thing at 100%.
That's not a new insight. Google has proved it repeatedly by killing Inbox, Wave, Allo, Hangouts, Stadia, Google+. Billions in engineering behind each one, all dead. But nobody in the AI space wants to learn that lesson because the Jarvis fantasy is too compelling.
The feature sprawl was a problem, but it wasn't the real problem. The real problem was that I couldn't make Skippy remember anything that mattered.
I tried every approach I could find. Here's what happened with each one.
SQLite's full-text search (FTS5) over conversation logs retrieves based on keyword overlap. Ask it about a "database migration" and it pulls every conversation that ever mentioned those words, whether you were making a decision, asking a question, or complaining about a coworker's migration script. No sense of what matters. It ranks by how well the keywords match (BM25), not by importance. Twenty results, maybe one useful.
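To make that failure mode concrete, here is a minimal sketch using Python's built-in `sqlite3` module, assuming your SQLite build has FTS5 compiled in (most builds do, but not all). The log lines are invented for illustration. One keyword query returns the decision, the question, and the complaint alike, ordered by FTS5's default `rank` (BM25), which scores term matches rather than importance.

```python
import sqlite3

# Invented conversation snippets: a decision, a question, a complaint, and one unrelated line.
LOGS = [
    "We decided to run the database migration on Friday behind a feature flag.",
    "Should the database migration run before or after the deploy?",
    "Dave's database migration script broke staging again.",
    "Where should we get lunch today?",
]


def build_index(rows):
    """Load rows into an in-memory FTS5 table (requires FTS5 support in SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE logs USING fts5(body)")
    conn.executemany("INSERT INTO logs(body) VALUES (?)", [(r,) for r in rows])
    return conn


def keyword_search(conn, query):
    """Return matching rows ordered by FTS5's built-in BM25 rank, best match first."""
    cur = conn.execute(
        "SELECT body FROM logs WHERE logs MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = build_index(LOGS)
    for hit in keyword_search(conn, "database migration"):
        print(hit)
```

Nothing in the table or the query can express that the first row is a decision and the third is noise. That distinction would have to be made somewhere else.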
Semantic search with vector databases is better at finding related content but has a deeper problem: it treats everything with equal weight. A critical architectural decision you made three weeks ago gets the same embedding treatment as a throwaway debugging comment from the same session. When retrieval can't distinguish importance, the system degrades as you add more data. The more you store, the worse it gets.
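A toy illustration of the "the more you store, the worse it gets" effect, using hand-picked 2-D vectors in place of real embeddings (an assumption for readability; real embeddings have hundreds of dimensions). Retrieval here is plain cosine similarity with a fixed top-k, which is the core of a basic vector-store query. The labels and vectors are hypothetical. Because the score has no importance term, a few near-duplicate debugging notes can push the one architectural decision out of the results.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, memories, k):
    """Rank (label, vector) pairs by similarity alone; importance plays no role."""
    ranked = sorted(memories, key=lambda m: cosine(query, m[1]), reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical 2-D stand-ins for embeddings.
QUERY = [1.0, 0.0]
DECISION = ("decision: shard orders table by tenant", [0.8, 0.6])  # cosine 0.80
NOISE = [
    (f"debug note {i}", [1.0, y]) for i, y in enumerate([0.1, 0.2, 0.3], start=1)
]  # cosines all above 0.95

if __name__ == "__main__":
    print(top_k(QUERY, [DECISION], k=3))          # the decision is retrieved
    print(top_k(QUERY, [DECISION] + NOISE, k=3))  # the decision is crowded out
```

The decision didn't become less relevant; it simply lost a similarity contest to material that happened to sit closer to the query.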
RAG pipelines combine retrieval with generation, but they inherit every limitation of the retrieval layer underneath. If the retriever pulls irrelevant context, the generator hallucinates on top of it. RAG doesn't solve the memory problem; it just makes the failure mode less obvious.
Manual context management is what most people actually end up doing. Markdown files full of project notes, copy-pasted conversation history, summary documents you maintain by hand. I had files called "skippy-context-may.md" that were just walls of text I'd feed into every new session. At some point the context curation itself became more work than the work it was supposed to support.
None of these are memory. They're all variations of search, and search is not memory. Memory requires deciding what matters before you store it, not after you try to retrieve it.
And the infrastructure problems go deeper than software. There's a growing community convinced you can run competitive AI on consumer hardware.
I built a home lab. 42U rack, RTX 5090, RTX PRO 6000, 256GB DDR5, 128TB NAS. I run local models constantly for fine-tuning and experimentation, so I'm not coming at this from the outside. But the r/LocalLLaMA crowd running 70B models on MacBook Pros and claiming parity with frontier providers is not being honest about what the benchmarks actually say. The gap between open-source and frontier is structural: different training data, different RLHF pipelines, different scale entirely.
Your M5 Max is a beautiful machine, but it is not a datacenter. If consumer hardware could match frontier quality, Anthropic and OpenAI wouldn't exist and NVIDIA wouldn't have the market cap it does. Local inference is phenomenal for prototyping and fine-tuning. It is not a replacement for frontier intelligence when the output has to be right.
And the delusion extends beyond hardware into what people think AI is doing for their engineering skills.
Most people using AI to write code are becoming worse developers, not better. They prompt, accept, ship, and can't explain what they built. When it breaks, and it always breaks, they paste the error back in and hope the next response fixes it. They don't understand the auth flow, can't explain the state management pattern, couldn't debug a race condition if their job depended on it. And increasingly, their job does depend on it.
The developers who will actually matter going forward are the ones who use AI harder than anyone else but understand every line that ships. They push back on bad suggestions, refactor aggressively, and treat AI as a tool that amplifies judgment rather than substituting for it. Prompt engineering is a skill, orchestration is a skill on top of that, and the gap between the people who understand this and the people who don't is widening fast.
AI is meant to speed up your development. Not to replace the need to understand what you built.
There's a book series called Expeditionary Force where an alien AI named Skippy (yes, I named my project after it) can literally manipulate wormholes and crack alien military encryption. Omniscient-level intelligence. But it has a fatal design flaw: it only ever answers the exact question you ask. "Is there danger ahead?" No. Because the danger is to the left and you didn't ask about that.
That's every AI tool right now. Brilliant within the bounds of the prompt, completely blind to everything outside it. It can't flag that you're repeating a mistake from last month because it doesn't know last month happened. It can't connect today's debugging session to the architectural decision you made three weeks ago because every session starts from nothing.
Solving this requires more than better search. It requires an encoding gate, a mechanism that evaluates incoming information for novelty, salience, and prediction error before deciding whether to store it at all. Your brain does this through neurochemical signals: dopamine flags novelty, cortisol flags salience, oxytocin flags social relevance. Only what passes the gate gets encoded into long-term memory. Everything else gets dropped, not archived somewhere for later, dropped. Because a filing cabinet where every folder is labeled "miscellaneous" doesn't help you find anything. It just makes the search slower every time you add a file.
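Here is one way an encoding gate could be sketched, assuming three scores in the 0 to 1 range: novelty (one minus the highest word overlap, measured with Jaccard similarity, against what is already stored), a caller-supplied salience score, and prediction error (the gap between an expected and an observed outcome). The weights, the threshold, the Jaccard stand-in for novelty, and the `EncodingGate` and `Candidate` names are all hypothetical choices for illustration; this is not TrueMemory's implementation. The property it shows is that scoring happens before storage, and anything under the threshold is discarded rather than archived.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    text: str
    salience: float  # hypothetical 0..1 importance signal supplied by the caller
    expected: float  # hypothetical predicted outcome
    observed: float  # hypothetical actual outcome


def tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Word-set overlap in [0, 1]; a crude stand-in for semantic similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class EncodingGate:
    threshold: float = 0.5
    weights: tuple = (0.4, 0.4, 0.2)  # novelty, salience, prediction error
    store: list = field(default_factory=list)

    def novelty(self, text):
        """1.0 for something unlike anything stored, 0.0 for an exact repeat."""
        if not self.store:
            return 1.0
        t = tokens(text)
        return 1.0 - max(jaccard(t, tokens(m)) for m in self.store)

    def score(self, c):
        prediction_error = min(1.0, abs(c.observed - c.expected))
        wn, ws, wp = self.weights
        return wn * self.novelty(c.text) + ws * c.salience + wp * prediction_error

    def offer(self, c):
        """Store the candidate only if it clears the gate; otherwise drop it."""
        if self.score(c) >= self.threshold:
            self.store.append(c.text)
            return True
        return False  # dropped, not archived


if __name__ == "__main__":
    gate = EncodingGate()
    print(gate.offer(Candidate("decided to shard the orders table by tenant id", 0.9, 0.0, 0.0)))
    print(gate.offer(Candidate("ok thanks", 0.05, 0.0, 0.0)))
    print(gate.offer(Candidate("migration failed despite passing tests", 0.6, 0.0, 1.0)))
    print(gate.store)
```

Because the threshold is applied on write, the store only grows with items that cleared it, and exact repeats score zero novelty. A real system would need something better than word overlap for novelty and a principled source for salience; both are open design choices in this sketch.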
That's what I built TrueMemory to solve. Persistent cross-session memory with a biologically-inspired encoding gate. Incoming information gets evaluated computationally for the same signals the brain uses: novelty, salience, prediction error. High-signal memories get stored. Low-signal noise gets dropped. The system improves as it grows instead of drowning in its own data.
The full architecture and benchmarks are in my arXiv paper.
Three months of building the wrong thing taught me what the right thing was. Everyone's building Jarvis, the voice interface, the integrations, the features. Nobody's building the brain. The part that actually remembers who you are and what you've been working on. Until someone builds that, nobody's even close.
Josh Adler is a researcher and founder of TrueMemory. Research on cognitive memory architectures for AI: arXiv:2605.04897. More at joshadler.com.