Turning Vibecoding Debt into a Harness (00): Building a Local Harness in Public, on a Real Repo
If you also write real product code with AI, this feeling will be familiar: a feature ships fast, but a few days later you change something else and the earlier thing quietly breaks.
Every release needs a human in the loop, deciding by hand which tests to run, which logs to read, which failures to ignore, and which ones must be fixed. There’s no real confidence underneath it.
It’s not that there are no tests. It’s the nagging sense that this change might have broken something again.
I’m building a harness now, not out of engineering perfectionism, but because these pain points have shown up too many times. So I decided to do this in public: a Build in Public series with one theme, standing up a harness inside a real monorepo.
This first post is a baseline audit: measuring reality, completely. Not talking concepts, and not demoing on a clean toy repo.
I’ll keep updating it:
- From day one, lay every failure out in the open
- Then, step by step, turn it into a one-command local check that produces a report, can back a release, and lets an AI Agent enter a fixed self-check loop after coding
Important context
This repo has no GitHub Actions at all. Every build, test, and release runs purely locally on a single Mac mini.
So the goal here is not to put CI in the cloud first. It’s to build a fully local harness that:
- Reproduces reliably on this machine
- Gives a clear result before a release
- Surfaces the parts that can be made public
- Keeps the logs that can’t be public local
- Leaves a verification record an AI Agent can read later
This is not a CI best-practices guide, nor a testing-framework tutorial. It’s closer to a live experiment: after AI writes the code, how does local dev verification stop relying entirely on a human backstop?
After finishing a survey on Agent Harnesses recently, I’m more convinced of one thing: a lot of the time the model isn’t the bottleneck — it’s the environment, tools, context, logs, verification, and permissions around the model that were never set up. That outer layer is exactly what this series tries to fill in.
This series covers two things
Thing one: the local dev verification loop.
Every time an AI Agent finishes a round of changes, it can’t just stop at “I’m done.” It has to run the feature itself, look at the result, leave a record, and hand the uncertain parts back to a human.
The hard part is that the verification targets aren’t a single platform:
- Expo / React Native mobile
- Electron desktop
- Rspress sites
- An Obsidian plugin
- A Chrome extension
- Server
- Daemon
- CLI
These all start up differently, have different fixtures, and surface results differently. So a harness can’t just be “run the test command.” It has to gradually settle into a set of local self-check actions:
- How to prepare the environment
- How to launch the target
- How to operate the feature
- How to observe logs, screenshots, stdout, files, or network responses
- How to judge whether this round is actually correct
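The five questions above could be captured as one interface that every target implements. This is only a minimal sketch, not the repo's actual code: `SelfCheckTarget`, `Observation`, `SelfCheckRecord`, and `runSelfCheck` are hypothetical names, and the steps are synchronous here purely to keep the example short (launching a real Electron or Expo target would be asynchronous).

```typescript
// Hypothetical shape of what one self-check round collects.
interface Observation {
  logs: string[];      // captured log lines or stdout
  artifacts: string[]; // paths to screenshots, files, saved responses
}

// Hypothetical contract for one target (mobile, desktop, plugin, CLI, ...).
interface SelfCheckTarget {
  name: string;
  prepare(): void;                  // set up env, fixtures, temp dirs
  launch(): void;                   // start the app, server, or CLI
  operate(): void;                  // drive the feature under test
  observe(): Observation;           // collect logs, screenshots, output
  judge(obs: Observation): boolean; // decide whether this round is correct
}

interface SelfCheckRecord {
  target: string;
  passed: boolean;
  failedStep?: string;
  error?: string;
  observation?: Observation;
}

// Runs the five steps in order and always returns a record,
// so a step that throws is reported instead of aborting the loop.
function runSelfCheck(target: SelfCheckTarget): SelfCheckRecord {
  let step = "prepare";
  try {
    target.prepare();
    step = "launch";
    target.launch();
    step = "operate";
    target.operate();
    step = "observe";
    const observation = target.observe();
    step = "judge";
    const passed = target.judge(observation);
    return { target: target.name, passed, observation };
  } catch (err) {
    return {
      target: target.name,
      passed: false,
      failedStep: step,
      error: err instanceof Error ? err.message : String(err),
    };
  }
}
```

The point of returning a record even on failure is that the agent (or a human) gets told which step broke, which is different information from "the check did not pass."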
Thing two: the local pre-release gate.
When it’s really time to ship, no more “feels like nothing broke” — there’s a clear local verification record instead.
What I ultimately want to solve is concrete:
- How to reduce repeated rollbacks
- How to spend less time babysitting tests by hand
- How to make AI enter a fixed self-check loop after writing code
- How to turn “feels fine” before a release into “I know what I just checked”
Why this repo?
I didn’t pick a clean demo, and I didn’t pick a well-maintained project. That would be easy to write up and screenshot nicely, but not very convincing.
This time I picked one of my own private repos. It’s real enough, and messy enough.
This repo grew up alongside Vibecoding. Early on, for speed, lots of features just got something running. To validate ideas, lots of boundaries never got named carefully. To let AI keep building, scripts, tests, mocks, fixtures, and ad-hoc conventions piled up layer by layer.
More importantly: I no longer maintain it by reading all of the code. In many places I no longer remember why it was written that way; for some modules I only remember that they should work, not how they actually work.
That actually makes it a great harness target — because this is exactly where a harness earns its keep:
Not relying on the author’s memory to explain the repo, but letting the repo expose its own reality.

It’s not a single app either
Going by the product list on lifeos.vip/about, excluding Midscene, this repo maps to Aino, Vibelet, LifeOS, DeepAsk, and Calendar Pro.
All of these product lines live in this one monorepo:
- Aino is a native-notes and AI workspace.
- Vibelet is a Claude / Codex remote control on your phone.
- LifeOS is an Obsidian PKM system.
- DeepAsk spans Obsidian and Chrome.
- Calendar Pro is the calendar and task-planning layer inside Obsidian.

I recounted from the package.json files under apps:
- 18 top-level directories under apps
- 16 of them are app packages with a package.json
- plus 31 shared packages
These 16 app packages aren’t the same kind of thing:
- 3 mobile apps: Aino Mobile, Vibelet App, Remosidian Mobile
- 1 desktop app: Aino Desktop
- 4 web / docs sites: Aino Site, LifeOS Site, Remosidian Site, Vibelet Site
- 1 Obsidian plugin
- 1 Chrome extension
- 2 server entrypoints
- 2 CLIs
- 1 daemon
- 1 video / content build app

This isn’t a “weekend toy” personal project either. Some of these products are shipped mobile apps, an Obsidian plugin, a Chrome extension, and paid products.
Take the LifeOS line as an example — the about page publicly states: 1k open-source stars, 46k downloads, 1000+ paying users.
So behind this repo there are real users, real revenue, and real maintenance pressure. This harness isn’t facing one page, one service, one package — it’s facing a set of things that have already grown into products.
Repo scale: a day-0 audit
- pnpm + Nx monorepo
- 16 app packages
- 31 shared packages
- 874 test files on the source side
- 8066 test cases identified statically
That 8066 isn’t a precise runtime count. After excluding node_modules, dist, build, out, and .tmp, I counted direct test(...) / it(...) calls in the test sources. It misses dynamically generated and parameterized cases, but it’s already enough to show the scale.
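A static count like this can be approximated with a path filter and a regular expression. The sketch below is an assumption about how such a count could work, not the script that produced the 8066 figure; `isExcludedPath` and `countStaticTestCases` are hypothetical names. It also counts matches inside comments and strings, which is one more reason to treat the number as an estimate.

```typescript
// Directories skipped by the count, as listed above.
const EXCLUDED_DIRS = new Set(["node_modules", "dist", "build", "out", ".tmp"]);

// True if any path segment is one of the excluded directories.
function isExcludedPath(filePath: string): boolean {
  return filePath.split(/[\\/]/).some((segment) => EXCLUDED_DIRS.has(segment));
}

// Counts direct test(...) / it(...) calls in one source file.
// The leading group rejects identifiers such as `submit(` and member calls
// such as `obj.test(`; `it.each(` never matches because `it` is not
// followed by `(`, so parameterized cases are not counted.
function countStaticTestCases(source: string): number {
  const directCall = /(^|[^\w.$])(?:test|it)\s*\(/g;
  return (source.match(directCall) ?? []).length;
}
```

Summing `countStaticTestCases` over every test file that `isExcludedPath` lets through gives a repo-wide figure of the same kind as the one above.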
Distribution highlights:
- apps/aino-desktop: 300 test files, 1737 cases
- apps/vibelet-app: 130 test files, 1397 cases
- apps/vibelet-daemon: 75 test files, 1152 cases
- apps/obsidian-plugin: 83 test files, 869 cases

There’s a very real contradiction here
You can’t blindly run the full test suite on every release. Not because the full suite doesn’t matter, but because the cadence of development and release has changed in the AI era.
It used to take a person days to accumulate a batch of changes. Now an AI Agent can finish multiple features and multiple packages in a single day, sometimes touching mobile, desktop, plugin, extension, and server all at once.
Once releases get this frequent, running everything in full every single time (all 8066 static cases, every build, e2e, and fixture check, plus capturing live run logs) quickly makes the local flow so slow that nobody wants to use it.
But you can’t skip it just because it’s slow, either. This repo has real users and paying users. If a release rests only on “feels fine this time,” sooner or later you re-break something that was already fixed.
So the problem the harness solves is not “run fewer tests.”
What it really does is layer the pre-release verification:
- Which checks are the smoke that runs every time
- Which are selected by the change impact between two releases
- Which situations must escalate to a full regression
That way a local release is neither superstition nor hostage to the full suite.
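The three layers can be expressed as a small planning function. This is a sketch under stated assumptions: `SuiteSpec`, `ReleasePlan`, `planReleaseChecks`, and the `fullTriggers` list are hypothetical, and matching changed files by path prefix is a simplification (in an Nx monorepo, impact would more likely come from the project graph).

```typescript
// Hypothetical description of one suite in the local gate.
interface SuiteSpec {
  name: string;
  smoke: boolean;    // true: runs on every release
  watches: string[]; // path prefixes whose changes select this suite
}

interface ReleasePlan {
  mode: "layered" | "full";
  suites: string[];
  reason: string;
}

// Picks suites for a release from the files changed since the previous one.
function planReleaseChecks(
  changedFiles: string[],
  suites: SuiteSpec[],
  fullTriggers: string[], // path prefixes that force a full regression
): ReleasePlan {
  // Layer 3: any change under a trigger prefix escalates to everything.
  const trigger = changedFiles.find((file) =>
    fullTriggers.some((prefix) => file.startsWith(prefix)),
  );
  if (trigger !== undefined) {
    return {
      mode: "full",
      suites: suites.map((s) => s.name),
      reason: `full regression: ${trigger} changed`,
    };
  }
  // Layers 1 and 2: smoke always, plus suites watching a changed path.
  const selected = suites.filter(
    (s) =>
      s.smoke ||
      changedFiles.some((file) => s.watches.some((p) => file.startsWith(p))),
  );
  return {
    mode: "layered",
    suites: selected.map((s) => s.name),
    reason: "smoke plus suites touched by the change",
  };
}
```

With a smoke suite plus one suite watching `apps/aino-desktop/`, a change under that directory selects both, while a change to a trigger such as the lockfile selects every suite. Which paths deserve to be triggers is exactly the kind of decision the harness has to make explicit.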
What day 0 actually produced
The repo has Vitest, Node test, tsx --test, Jest, and Playwright Electron all at once.
The root looks like it has a unified entrypoint, pnpm run test:unit — but when I actually ran it, it didn’t pass.
Running the existing entrypoint, I found a few representative failures:
- caldav-core: misaligned mock boundaries
- vibelet-cli: unstable fixtures
- Aino Desktop Electron e2e: a React alias path assumption that doesn’t hold
But plenty of checks did pass:
- scripts-unit passes
- workspace exports passes
- dependency version passes
- Aino Desktop already has Playwright fixtures, a temp vault, and a temp userDataDir
Day-0 conclusion
This repo isn’t short on tests. What’s missing is a harness layer that organizes those test assets.
Next I’m not going to chase coverage first. I’m going to make the initial failures:
- Reproducible
- Classifiable
- Reportable
- Continuously feedable into the local check flow
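One way to make those four properties concrete is to give every failure a fixed record shape. The sketch below is illustrative only: `FailureRecord`, `FailureCategory`, and `renderFailureReport` are hypothetical names, and the categories are loosely modeled on the three day-0 failures instead of coming from an established taxonomy.

```typescript
// Hypothetical categories, loosely based on the three day-0 failures.
type FailureCategory =
  | "mock-boundary"
  | "unstable-fixture"
  | "path-assumption"
  | "unknown";

interface FailureRecord {
  suite: string;             // which suite failed
  category: FailureCategory; // classifiable
  reproduce: string;         // reproducible: the exact local command
  summary: string;           // reportable: one line that is safe to publish
}

// Groups failures by category and renders a plain-text report
// that can be stored next to the other local check artifacts.
function renderFailureReport(failures: FailureRecord[]): string {
  const groups: Record<string, FailureRecord[]> = {};
  for (const failure of failures) {
    (groups[failure.category] ??= []).push(failure);
  }
  const lines: string[] = [];
  for (const category of Object.keys(groups)) {
    const group = groups[category] ?? [];
    lines.push(`${category} (${group.length})`);
    for (const f of group) {
      lines.push(`  ${f.suite}: ${f.summary} [repro: ${f.reproduce}]`);
    }
  }
  return lines.join("\n");
}
```

Because the report is plain text derived from structured records, the same data can feed both a human-readable summary and whatever an AI Agent reads on its next round.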
I started with a minimal Smoke Harness. Right now 6 suites all pass, taking 27.3 seconds on the Mac mini, with a summary plus artifacts. And this is only the starting point.
One last thing
The question really worth asking is: why didn’t these problems surface reliably before?
The answer: my past workflow was too fragmented. There was no fixed place where all the checks showed up together.
The point of the first smoke version isn’t how much it covers — it’s that it starts to change the shape of the problem: from “I don’t know what to run or which failures matter” to “these 6 suites are the current first gate.”
The essence of this series is a test: can a real repo, pushed to grow by Vibecoding, become controllable again through a local harness?
The debt Vibecoding brings doesn’t necessarily have to be paid off by a human re-reading all the code. Maybe you can start with a harness, and let the repo learn to expose its own problems.
If you also maintain real products with AI, follow along — let’s compare notes.
Next post I’ll write about how those three failures were handled. But the point isn’t how AI fixes bugs — it’s how to turn a one-off fix into a local check that stays visible every time afterward.
http://quanru.github.io/2026/06/03/Turning-Vibecoding-Debt-into-a-Harness-00-Baseline-Audit

