There Are Cameras in Every Room of My House

## The hardware

- 5x Raspberry Pi Zero 2W ($15 each)
- 5x ArduCam IMX708 12MP 120-degree wide-angle cameras
- 5x WM8960 audio HATs
- 1x 13TB Ugreen NAS
- Custom Python daemon: motion/audio detection, triggered MJPEG/H.264 + WAV recording, idle sleep

Total cost: under $500. Runs 24/7 across five rooms: office, kitchen, living room, bedroom, hallway.
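For a sense of what "motion detection, triggered recording, idle sleep" can look like in code, here is a minimal sketch. It is not the daemon itself. It assumes frames arrive as flattened grayscale pixel lists, and the names `motion_score` and `RecordingTrigger`, along with every threshold value, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

Frame = Sequence[int]  # assumption: flattened grayscale pixels, 0-255


def motion_score(prev: Frame, curr: Frame, pixel_delta: int = 25) -> float:
    """Return the fraction of pixels whose brightness changed by more than pixel_delta."""
    if len(prev) != len(curr) or not curr:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed / len(curr)


@dataclass
class RecordingTrigger:
    """Start recording on motion, go back to idle after a run of quiet frames."""
    start_threshold: float = 0.02  # hypothetical: 2% of pixels changed
    idle_frames_to_stop: int = 3   # hypothetical: quiet frames before sleeping
    recording: bool = False
    _quiet: int = 0

    def update(self, score: float) -> bool:
        """Feed one motion score; return whether recording should be active."""
        if score >= self.start_threshold:
            self.recording = True
            self._quiet = 0
        elif self.recording:
            self._quiet += 1
            if self._quiet >= self.idle_frames_to_stop:
                self.recording = False
                self._quiet = 0
        return self.recording
```

Counting quiet frames before stopping is one way to avoid chopping a single event into many tiny clips when someone pauses briefly. An audio trigger could reuse the same state machine with a loudness score in place of the pixel-difference score.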

Josh Adler — Paradox Pi nodes on the workbench during assembly

## Why

Your AI knows you through text. Only text. It has never seen your face, never watched you work, never noticed you pacing the room before a stressful call. That behavioral data is more valuable than anything you'll ever type into a prompt.

I spent weeks debugging device tree overlays, swapped camera modules three times (started with the OV64A40 64MP, settled on the IMX708 12MP after thermal issues killed the first setup), and burned through two Pi Zeros that couldn't handle the heat. This was infrastructure work, not a weekend hack.

Josh Adler — Paradox node wall-mounted in the apartment

## The stack

There are three layers to AI that actually understands you:

1. Observation: cameras, mics, sensors. Physical-world capture.
2. Memory: persistent, intelligent, cross-session. Not a vector dump.
3. Reasoning: the LLM. Already good enough.

Everyone builds layer 3. I built TrueMemory for layer 2 (arXiv paper). Now I'm building layer 1.

Josh Adler — Her movie reference, AI that observes the physical world

## What I learned

The capture hardware is the easy part. Cheap sensors on cheap boards, running cheap compute. The hard part is the pipeline between raw sensor data and LLM-ingestible context: motion detection, then recording, then vision model inference, then structured text, then memory, then retrieval. Every step introduces noise. Solving that pipeline is where the real work lives.
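To make the later stages concrete, here is a rough sketch of the hand-off from a recorded clip to structured text, memory, and retrieval. Everything in it is an assumption for illustration: `Clip`, `Observation`, `observe`, and `MemoryLog` are hypothetical names, the vision model is just a callable you pass in, and the store is a toy list, not TrueMemory.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Clip:
    room: str
    started_at: float  # Unix timestamp of the trigger
    path: str          # where the recording was written


@dataclass
class Observation:
    room: str
    started_at: float
    summary: str       # structured text produced from the clip


def observe(clip: Clip, describe: Callable[[str], str]) -> Observation:
    """Run a caller-supplied (hypothetical) vision model over a clip path."""
    return Observation(clip.room, clip.started_at, describe(clip.path).strip())


@dataclass
class MemoryLog:
    """Toy in-memory store; a stand-in for a real memory layer."""
    items: List[Observation] = field(default_factory=list)

    def add(self, obs: Observation) -> None:
        # Drop empty descriptions instead of storing noise.
        if obs.summary:
            self.items.append(obs)

    def retrieve(self, keyword: str, room: Optional[str] = None) -> List[Observation]:
        """Naive keyword lookup, optionally filtered by room."""
        needle = keyword.lower()
        return [
            o for o in self.items
            if needle in o.summary.lower() and (room is None or o.room == room)
        ]
```

Even in this toy version, noise has several places to enter: the model can return nothing useful, the summary can be vague, and keyword matching can miss a relevant memory. A real pipeline would need a filter or a confidence check at each hand-off.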

Josh Adler builds persistent memory and physical-world awareness for AI. joshadler.com | arXiv paper