Posted 2026-06-03 | Updated 2026-06-03 | Engineering | 12 minutes read (About 1775 words)

Turning Vibecoding Debt into a Harness (00): Building a Local Harness in Public, on a Real Repo

If you also write real product code with AI, this feeling will be familiar: a feature ships fast, but a few days later you change something else and the earlier thing quietly breaks.

Every release needs a human in the loop — deciding by hand which tests to run, which logs to read, which failures to ignore, and which ones must be fixed. There’s no real confidence underneath it.

I’m building a harness now, not out of engineering perfectionism, but because these pain points have shown up too many times. So I decided to do this in public — a Build in Public series with one theme: standing up a harness inside a real monorepo.


It’s not that there are no tests. It’s the nagging sense that this change might have broken something again.

This first post is a baseline audit: a complete measurement of the repo as it really is. It doesn’t talk concepts, and it doesn’t demo on a clean toy repo.

I’ll keep updating it:

From day one, lay every failure out in the open
Then step by step turn it into a one-command local check
that produces a report
that can back a release
and that lets an AI Agent enter a fixed self-check loop after coding

Important context

This repo has no GitHub Actions at all. Every build, test, and release runs purely locally on a single Mac mini.

So the goal here is not to put CI in the cloud first. It’s to build a fully local harness that:

Reproduces reliably on this machine
Gives a clear result before a release
Surfaces the parts that can be made public
Keeps the logs that can’t be public local
Leaves a verification record an AI Agent can read later
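
The last three goals can be pictured as one record with a public view and a local-only part. This is a minimal sketch under assumptions, not the repo's actual format: every name here (`VerificationRecord`, `logPath`, `publishable`, `toPublicView`) is hypothetical.

```typescript
// Hypothetical record shape; the real harness format is not shown in this post.
interface CheckEntry {
  name: string;
  passed: boolean;
  logPath: string; // local file path, never published
  publishable: boolean; // whether the result may appear in a public write-up
}

interface VerificationRecord {
  commit: string;
  checkedAt: string; // ISO 8601 timestamp
  checks: CheckEntry[];
}

interface PublicView {
  commit: string;
  checkedAt: string;
  checks: { name: string; passed: boolean }[];
}

// Keep only publishable checks and drop every local log path.
function toPublicView(record: VerificationRecord): PublicView {
  return {
    commit: record.commit,
    checkedAt: record.checkedAt,
    checks: record.checks
      .filter((check) => check.publishable)
      .map(({ name, passed }) => ({ name, passed })),
  };
}
```

The full record stays on the machine as JSON for an AI Agent to read later, while only the reduced view would go into a public post.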

This is not a CI best-practices guide, nor a testing-framework tutorial. It’s closer to a live experiment: after AI writes the code, how does local dev verification stop relying entirely on a human backstop?

After finishing a survey on Agent Harnesses recently, I’m more convinced of one thing: a lot of the time the model isn’t the bottleneck — it’s the environment, tools, context, logs, verification, and permissions around the model that were never set up. That outer layer is exactly what this series tries to fill in.

This series covers two things

Thing one: the local dev verification loop.

Every time an AI Agent finishes a round of changes, it can’t just stop at “I’m done.” It has to run the feature itself, look at the result, leave a record, and hand the uncertain parts back to a human.

The hard part is that the verification targets aren’t a single platform:

Expo / React Native mobile
Electron desktop
Rspress sites
An Obsidian plugin
A Chrome extension
Servers
A daemon
CLIs

These all start up differently, have different fixtures, and surface results differently. So a harness can’t just be “run the test command.” It has to gradually settle into a set of local self-check actions:

How to prepare the environment
How to launch the target
How to operate the feature
How to observe logs, screenshots, stdout, files, or network responses
How to judge whether this round is actually correct
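
The five actions above can be sketched as one interface that each platform implements in its own way. This is an illustration under assumptions: `SelfCheck`, `Observation`, `Verdict`, and `runSelfCheck` are hypothetical names, not code from the repo.

```typescript
// Hypothetical shape of one local self-check; names are illustrative only.
type ObservationKind = "log" | "screenshot" | "stdout" | "file" | "network";

interface Observation {
  kind: ObservationKind;
  detail: string;
}

interface Verdict {
  ok: boolean;
  reason: string;
}

interface SelfCheck {
  target: string;
  prepare(): Promise<void>; // environment and fixtures
  launch(): Promise<void>; // start the app, server, or CLI
  operate(): Promise<void>; // drive the feature under test
  observe(): Promise<Observation[]>; // collect evidence
  judge(observations: Observation[]): Verdict; // decide if the round is correct
}

interface SelfCheckResult extends Verdict {
  target: string;
  observations: Observation[];
}

async function runSelfCheck(check: SelfCheck): Promise<SelfCheckResult> {
  try {
    await check.prepare();
    await check.launch();
    await check.operate();
    const observations = await check.observe();
    return { target: check.target, observations, ...check.judge(observations) };
  } catch (error) {
    // A step that throws is recorded as a failed round instead of crashing the loop.
    const reason = error instanceof Error ? error.message : String(error);
    return { target: check.target, observations: [], ok: false, reason };
  }
}
```

Keeping `judge` separate from `observe` means the evidence is recorded even when the verdict is uncertain, so the uncertain parts can be handed back to a human.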

Thing two: the local pre-release gate.

When it’s really time to ship, no more “feels like nothing broke” — there’s a clear local verification record instead.

What I ultimately want to solve is concrete:

How to reduce repeated rollbacks
How to spend less time babysitting tests by hand
How to make AI enter a fixed self-check loop after writing code
How to turn “feels fine” before a release into “I know what I just checked”

Why this repo?

I didn’t pick a clean demo, and I didn’t pick a well-maintained project. That would be easy to write up and screenshot nicely, but not very convincing.

This time I picked one of my own private repos. It’s real enough, and messy enough.

This repo grew up alongside Vibecoding. Early on, for speed, lots of features just got something running. To validate ideas, lots of boundaries never got named carefully. To let AI keep building, scripts, tests, mocks, fixtures, and ad-hoc conventions piled up layer by layer.

More importantly: I no longer maintain it by reading all of the code. In many places I no longer remember why it was written that way; for some modules I only remember that they should work, not how they actually work.

That actually makes it a great harness target — because this is exactly where a harness earns its keep:

Not relying on the author’s memory to explain the repo, but letting the repo expose its own reality.


It’s not a single app either

Going by the product list on lifeos.vip/about, excluding Midscene, this repo maps to Aino, Vibelet, LifeOS, DeepAsk, and Calendar Pro.

All of these product lines live in this one monorepo:

Aino is a native-notes and AI workspace.
Vibelet is a Claude / Codex remote control on your phone.
LifeOS is an Obsidian PKM system.
DeepAsk spans Obsidian and Chrome.
Calendar Pro is the calendar and task-planning layer inside Obsidian.

[Figure: products section of lifeos.vip/about]

I recounted from the package.json files under apps:

18 top-level directories under apps
16 of them are app packages with a package.json
plus 31 shared packages

These 16 app packages aren’t the same kind of thing:

3 mobile apps: Aino Mobile, Vibelet App, Remosidian Mobile
1 desktop app: Aino Desktop
4 web / docs sites: Aino Site, LifeOS Site, Remosidian Site, Vibelet Site
1 Obsidian plugin
1 Chrome extension
2 server entrypoints
2 CLIs
1 daemon
1 video / content build app

[Figure: stack inventory]

This isn’t a “weekend toy” personal project either. Some of these products are shipped mobile apps, an Obsidian plugin, a Chrome extension, and paid products.

Take the LifeOS line as an example — the about page publicly states: 1k open-source stars, 46k downloads, 1000+ paying users.

So behind this repo there are real users, real revenue, and real maintenance pressure. This harness isn’t facing one page, one service, one package — it’s facing a set of things that have already grown into products.

Repo scale: a day-0 audit

pnpm + Nx monorepo
16 app packages
31 shared packages
874 test files on the source side
8066 test cases identified statically

That 8066 isn’t a precise runtime count. After excluding node_modules, dist, build, out, and .tmp, I counted direct test(...) / it(...) calls in the test sources. It misses dynamically generated and parameterized cases, but it’s already enough to show the scale.
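
The counting method described above can be approximated with a regular expression. This is a sketch of the idea, not the script that produced 8066: the function names are hypothetical, and it also counts calls that appear inside comments or strings, which a real counter might want to skip.

```typescript
// Directories excluded from the count, as listed in the text.
const EXCLUDED_DIRS = ["node_modules", "dist", "build", "out", ".tmp"];

function isExcluded(path: string): boolean {
  return path.split("/").some((segment) => EXCLUDED_DIRS.includes(segment));
}

// Matches direct test(...) / it(...) calls. The leading group rejects names
// such as submit(...) and member calls such as foo.it(...). Forms like
// it.each(...) are not matched, so parameterized cases are missed.
const DIRECT_CALL = /(^|[^\w.$])(?:test|it)\s*\(/g;

function countStaticCases(source: string): number {
  return (source.match(DIRECT_CALL) ?? []).length;
}

interface SourceFile {
  path: string;
  source: string;
}

// Sum direct calls over all files that are not under an excluded directory.
function countCasesInFiles(files: SourceFile[]): number {
  return files
    .filter((file) => !isExcluded(file.path))
    .reduce((sum, file) => sum + countStaticCases(file.source), 0);
}
```

Because it only sees literal calls, a count like this is a lower bound on what the runners execute when cases are generated in loops or tables.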

Distribution highlights:

apps/aino-desktop: 300 test files, 1737 cases
apps/vibelet-app: 130 test files, 1397 cases
apps/vibelet-daemon: 75 test files, 1152 cases
apps/obsidian-plugin: 83 test files, 869 cases

[Figure: baseline audit data]

There’s a very real contradiction here

You can’t blindly run the full test suite on every release. Not because the full suite doesn’t matter, but because the cadence of development and release has changed in the AI era.

It used to take a person days to accumulate a batch of changes. Now an AI Agent can finish multiple features and multiple packages in a single day, sometimes touching mobile, desktop, plugin, extension, and server all at once.

Once releases get this frequent, running everything in full every single time (all 8066 static cases, every build, e2e, and fixture check, plus live run log capture) quickly makes the local flow so slow that nobody wants to use it.

But you can’t skip it just because it’s slow, either. This repo has real users and paying users. If a release rests only on “feels fine this time,” sooner or later you re-break something that was already fixed.

So the problem the harness solves is not “run fewer tests.”

What it really does is layer the pre-release verification:

Which checks are the smoke that runs every time
Which are selected by the change impact between two releases
Which situations must escalate to a full regression

That way a local release is neither superstition nor hostage to the full suite.
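
The three layers can be sketched as a small planning function over the files changed between two releases. Everything here is an assumption for illustration: the `GateConfig` shape, the path prefixes, and the idea of a list of files that force a full regression are not taken from the repo. Nx also ships an `affected` command that computes impact from the project graph; the post does not say whether the harness will use it.

```typescript
type Tier = "smoke" | "impact" | "full";

interface GatePlan {
  tier: Tier;
  suites: string[];
}

// Hypothetical configuration; suite names and prefixes are placeholders.
interface GateConfig {
  smokeSuites: string[]; // run on every release
  suitesByPrefix: Record<string, string[]>; // path prefix -> suites it impacts
  fullRegressionFiles: string[]; // changes here escalate to everything
}

function planGate(changedFiles: string[], config: GateConfig): GatePlan {
  const all = new Set<string>(config.smokeSuites);
  for (const suites of Object.values(config.suitesByPrefix)) {
    suites.forEach((suite) => all.add(suite));
  }

  // Layer 3: certain files invalidate any narrower selection.
  if (changedFiles.some((file) => config.fullRegressionFiles.includes(file))) {
    return { tier: "full", suites: Array.from(all).sort() };
  }

  // Layer 1 always runs; layer 2 adds suites whose prefix matches a change.
  const selected = new Set<string>(config.smokeSuites);
  let impacted = false;
  for (const [prefix, suites] of Object.entries(config.suitesByPrefix)) {
    if (changedFiles.some((file) => file.startsWith(prefix))) {
      impacted = true;
      suites.forEach((suite) => selected.add(suite));
    }
  }
  return { tier: impacted ? "impact" : "smoke", suites: Array.from(selected).sort() };
}
```

The smoke suites stay in every plan, so a change that matches no prefix still passes through the first gate.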

What day 0 actually produced

The repo uses Vitest, the Node.js built-in test runner, tsx --test, Jest, and Playwright (for Electron) all at once.

The root looks like it has a unified entrypoint, pnpm run test:unit — but when I actually ran it, it didn’t pass.

Running the existing entrypoint, I found a few representative failures:

caldav-core: misaligned mock boundaries
vibelet-cli: unstable fixtures
Aino Desktop Electron e2e: a React alias path assumption that doesn’t hold

But plenty of checks did pass:

The scripts-unit suite passes
The workspace exports check passes
The dependency version check passes
Aino Desktop already has Playwright fixtures, a temp vault, and a temp userDataDir

Day-0 conclusion

This repo isn’t short on tests. What’s missing is a harness layer that organizes those test assets.

Next I’m not going to chase coverage first. I’m going to make the initial failures:

Reproducible
Classifiable
Reportable
Fed continuously back into the local check flow
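
One possible way to hold these four properties together is a small failure record that carries a category and a reproduction command, grouped into a plain-text report. The category labels are hypothetical, loosely modeled on the three day-0 failures, and the record shape is an assumption rather than the harness's real format.

```typescript
// Hypothetical categories; a real harness would likely grow this list.
type FailureCategory =
  | "mock-boundary"
  | "unstable-fixture"
  | "path-assumption"
  | "unclassified";

interface FailureRecord {
  suite: string;
  category: FailureCategory;
  reproCommand: string; // the exact command that reproduces the failure
}

// Group failures by category, keeping first-seen order, and render one
// header line per category followed by an indented line per failure.
function renderFailureReport(records: FailureRecord[]): string {
  const byCategory = new Map<FailureCategory, FailureRecord[]>();
  for (const record of records) {
    const bucket = byCategory.get(record.category) ?? [];
    bucket.push(record);
    byCategory.set(record.category, bucket);
  }

  const lines: string[] = [];
  byCategory.forEach((bucket, category) => {
    lines.push(`${category} (${bucket.length})`);
    for (const record of bucket) {
      lines.push(`  ${record.suite}: ${record.reproCommand}`);
    }
  });
  return lines.join("\n");
}
```

Storing the reproduction command next to the category is what lets the same record serve a human reading the report and an AI Agent rerunning the check.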

I started with a minimal Smoke Harness. Right now 6 suites all pass, taking 27.3 seconds on the Mac mini, with a summary plus artifacts. And this is only the starting point.
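
A summary like that can be derived from per-suite results. The sketch below is illustrative only: `SuiteResult`, `SmokeSummary`, and `summarizeSmoke` are hypothetical names, and it assumes suites run one after another so that durations add up.

```typescript
interface SuiteResult {
  name: string;
  passed: boolean;
  durationMs: number;
  artifacts: string[]; // paths to logs, screenshots, or reports
}

interface SmokeSummary {
  total: number;
  failed: string[];
  durationSeconds: number; // rounded to one decimal place
  artifactCount: number;
  gateOpen: boolean; // true only when at least one suite ran and none failed
}

function summarizeSmoke(results: SuiteResult[]): SmokeSummary {
  const failed = results.filter((r) => !r.passed).map((r) => r.name);
  const totalMs = results.reduce((sum, r) => sum + r.durationMs, 0);
  return {
    total: results.length,
    failed,
    durationSeconds: Math.round(totalMs / 100) / 10,
    artifactCount: results.reduce((sum, r) => sum + r.artifacts.length, 0),
    gateOpen: results.length > 0 && failed.length === 0,
  };
}
```

Treating an empty run as a closed gate is a deliberate choice in this sketch: a release should not pass because nothing was checked.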

One last thing

The question really worth asking is: why didn’t these problems surface reliably before?

The answer: my past workflow was too fragmented. There was no fixed place where all the checks showed up together.

The point of the first smoke version isn’t how much it covers — it’s that it starts to change the shape of the problem: from “I don’t know what to run or which failures matter” to “these 6 suites are the current first gate.”

The essence of this series is a test: can a real repo, pushed to grow by Vibecoding, become controllable again through a local harness?

The debt Vibecoding brings doesn’t necessarily have to be paid off by a human re-reading all the code. Maybe you can start with a harness, and let the repo learn to expose its own problems.

If you also maintain real products with AI, follow along — let’s compare notes.

In the next post I’ll cover how those three failures were handled. The point isn’t how AI fixes bugs, though; it’s how to turn a one-off fix into a local check that stays visible every time afterward.

http://quanru.github.io/2026/06/03/Turning-Vibecoding-Debt-into-a-Harness-00-Baseline-Audit

Author

LinYiBing
