00 / Lived-behavior verification

Behavior receipts for agent-built software.

BitterQA turns a repo-owned behavior contract into a run with artifacts, a verdict, a failure class, a repair lane, and the rerun that settles whether trust was restored.

Built for products that change faster than humans can manually verify them.


All evidence preserved: Receipts · Artifacts · Repair lanes

bq-receipt · 91f4 · 2026-05-08 14:22Z

Contract: checkout.login_paid_path

✓ load marketing path                 200 · 312ms
✓ submit signup with QA principal     screenshot · 042
✓ checkout with test card             trace · 088
✓ entitlement available in account    api 200

release-gate · deterministic · artifacts kept 30d

01 / Thesis

When creation gets cheap, verification becomes the constraint.

Agent work raises the rate of change. That is the point. Faster change also means faster decay unless the product can answer back. QA has to become part of the environment, not a slower ceremony at the end.

Live behavior

The product is the judge.

Check the paths customers actually touch — landing, signup, login, checkout, onboarding, account, and public APIs — against the real access path, with declared principals and policy.

External evidence

A passing run is a receipt.

Screenshots, response shapes, status codes, traces, timings, and final verdicts ride together as one durable artifact bundle with a stable id.

Agent fuel

A failing run starts the next fix.

Failures land with a class, such as runtime, contract drift, principal, payment, or browser custody, so the next agent or human starts from a narrow piece of evidence, not a blank page.

02 / Contracts

A repo-owned behavior contract, not another test file.

A QA contract names what live behavior must hold for a real user, against which site, on which schedule, with which principal posture, and what evidence to keep. It belongs in the repo that owns the product.

BitterQA owns hosted execution, run history, verdicts, and repair state. The product repo owns what "works" actually means.

site: Bound hostname or app, the live surface under verification.

flows: Browser walks, API contracts, and webhook round-trips for each critical user path.

principal: A QA-only account, reset hooks, and scoped credentials. Never your live customer data.

schedule: Daily, hourly, on every release, or wired into a Grid deploy gate.

artifacts: Screenshots, network traces, console output, response bodies, DOM snapshots, timings.

repair: Whether agents may propose selector fixes, contract updates, or coverage expansions.
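
To make the six fields concrete, here is a minimal sketch of how a contract could be modeled and sanity-checked before a run. It is not BitterQA's actual schema: the class names, the `validate` helper, the allowed `kind` and `schedule` values, and the placeholder hostname are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical vocabularies, inferred from the field descriptions above.
ALLOWED_KINDS = {"browser", "api", "webhook"}
ALLOWED_SCHEDULES = {"daily", "hourly", "release"}


@dataclass
class Flow:
    name: str  # e.g. "checkout.login_paid_path"
    kind: str  # one of ALLOWED_KINDS


@dataclass
class Contract:
    site: str          # bound hostname or app
    flows: List[Flow]  # critical user paths to verify
    principal: str     # QA-only account, never live customer data
    schedule: str      # one of ALLOWED_SCHEDULES
    artifacts: List[str] = field(default_factory=list)  # evidence to keep
    repair: List[str] = field(default_factory=list)     # repairs agents may propose


def validate(contract: Contract) -> List[str]:
    """Return human-readable problems; an empty list means the contract is usable."""
    problems = []
    if not contract.site:
        problems.append("site is required")
    if not contract.flows:
        problems.append("at least one flow is required")
    for flow in contract.flows:
        if flow.kind not in ALLOWED_KINDS:
            problems.append(f"flow {flow.name}: unknown kind {flow.kind!r}")
    if contract.schedule not in ALLOWED_SCHEDULES:
        problems.append(f"unknown schedule {contract.schedule!r}")
    return problems


example = Contract(
    site="app.example.com",  # placeholder hostname
    flows=[Flow("checkout.login_paid_path", "browser")],
    principal="qa-principal",
    schedule="release",
    artifacts=["screenshots", "traces", "timings"],
    repair=["selector_fixes"],
)
print(validate(example))  # prints []
```

Keeping validation as a plain function that returns a list of problems means the same check could run in the product repo's CI before the contract ever reaches hosted execution.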

03 / Loop

Contract → run → artifacts → verdict → repair → rerun.

A small loop that keeps fast software honest. Each step leaves a durable handle the next step can cite.

01 Declare

Pin the flows that carry the product.

Start with the paths whose silent breakage would cost trust. Live homepage smoke is not enough — you want signup, paid feature use, support widget presence, and the API contracts customers depend on.

contract.flows[]

02 Run

Hit the live surface at the pace of change.

Browser flows for human paths, API checks for machine paths, webhook round-trips for the seams between. Daily, hourly, or wired into a release gate.

qa.run --gate=release

03 Capture

Keep the artifacts that prove it.

Screenshots, traces, response bodies, console and network evidence, DOM snapshots, and timings ride along with the run under one stable id. Bitter Browser supplies the perception layer when human paths need real-browser proof.

artifacts.bundle

04 Settle

Issue a verdict, route the failure.

A passing run is a receipt. A failing run is a class — runtime, contract drift, principal, payment, browser custody — paired with the evidence that names the next fix and the rerun that proves it landed.

verdict · failure_class · rerun
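
The settle step can be sketched as a small function that turns step results into a verdict, a failure class, and a rerun flag. This is an illustration under stated assumptions, not the product's implementation: `StepResult`, `Settlement`, and `settle` are hypothetical names, the failing step in the example is invented, and falling back to `runtime` for an unclassified failure is a guess.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepResult:
    name: str
    passed: bool
    evidence: str                        # e.g. "screenshot · 042"
    failure_class: Optional[str] = None  # set only on failing steps


@dataclass
class Settlement:
    verdict: str                  # "pass" or "fail"
    failure_class: Optional[str]  # None when the run passed
    rerun: bool                   # True when a rerun must confirm the fix


def settle(steps: List[StepResult]) -> Settlement:
    """Settle a run: the first failing step decides the failure class."""
    for step in steps:
        if not step.passed:
            # Assumption: an unclassified failure falls back to "runtime".
            return Settlement("fail", step.failure_class or "runtime", True)
    return Settlement("pass", None, False)


passing = [
    StepResult("load marketing path", True, "200 · 312ms"),
    StepResult("checkout with test card", True, "trace · 088"),
]
# Invented failing step, for illustration only.
failing = passing + [
    StepResult("entitlement available in account", False, "api 500", "contract_drift"),
]

print(settle(passing).verdict)        # prints pass
print(settle(failing).failure_class)  # prints contract_drift
```

Stopping at the first failing step keeps the failure class narrow: later steps usually depend on earlier ones, so their failures would add noise rather than evidence.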

04 / Receipts

Every run leaves a stable handle.

Whether it passed or failed, the next person — or the next agent — starts from a bounded piece of evidence. Pass means trust is preserved. Fail means a narrow class of failure with a repair lane attached.

bq-receipt · 7c2a · 2026-05-08 09:14Z

Contract: marketing.public_smoke

✓ homepage 200 + canonical headers      200 · 188ms
✓ primary CTA visible above the fold    screenshot · 011
✓ support widget mounted                dom · ok

bq-receipt · 4e08 · 2026-05-08 13:01Z

Contract: checkout.entitlement_landed

✓ signup with QA principal          screenshot · 102
✓ checkout session created          trace · 184
✗ entitlement appears in account    browser_capture_missing

Failure classes routed: runtime · contract_drift · principal · payment · browser_capture_missing · hub_provisioning
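
Routing a failure class to a repair lane can be pictured as a lookup with a safe fallback. In the sketch below the six class names are copied from the list above, while every lane name, the `route` function, and the `human_review` fallback are hypothetical; the page does not say how lanes are actually named or assigned.

```python
from typing import Dict

# Class names come from the page; the lane names are hypothetical.
REPAIR_LANES: Dict[str, str] = {
    "runtime": "service_owner",
    "contract_drift": "contract_update",
    "principal": "principal_reset",
    "payment": "payment_config",
    "browser_capture_missing": "capture_retry",
    "hub_provisioning": "provisioning",
}

# Assumption: anything unrecognized goes to a person rather than an agent.
FALLBACK_LANE = "human_review"


def route(failure_class: str) -> str:
    """Return the repair lane for a failure class, or the fallback lane."""
    return REPAIR_LANES.get(failure_class, FALLBACK_LANE)


print(route("browser_capture_missing"))  # prints capture_retry
print(route("something_new"))            # prints human_review
```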

— / Boundary

Not a dashboard. A contract layer.

BitterQA is the lived-behavior verification layer for agents and the teams running them. Grid proves a service was deployed and reachable. Ping proves external liveness. BitterQA proves whether a real product path still behaves as promised.

Daily and release-gate verification stays cheap and deterministic. Agentic repair wakes only for setup, drift, failure, or coverage expansion. Bitter Browser supplies bounded perception and capture under explicit policy — never leaking private session state by default.

05 / Pricing

Start narrow. Verify the paths that carry the product.

One small contract that catches a real failure once is worth more than a hundred green smoke checks that never light up.

Starter

$29/mo

For one product that needs a basic live verification loop.

1 property
3 browser or API checks
Daily scheduled runs
14-day artifact history

Core

$99/mo

For products with several customer-visible paths worth guarding.

3 properties
25 browser or API checks
Daily, hourly, and release-gate runs
30-day screenshots, traces, verdicts

Loop

$299/mo

For teams using agents to ship and verify product changes continuously.

10 properties
100 browser or API checks
90-day artifact history
Failure classes + agent-readable repair notes

06 / Access

Build the first behavior receipt.

Tell us what your agents or team are changing, which live flow has to stay true, and what evidence would make the next fix obvious.

Approvals are manual. Replies come from a human, not a queue.