A terminal tunnel to an autonomous Codex loop is the product surface
The app now has a real local cockpit path: run
python3 scripts/start_autonomous_cockpit_bridge.py to start a detached autonomous
Codex loop, stream its terminal-style log, and watch each bounded iteration publish to
main. To run only the tunnel, use
python3 scripts/bridge_mvp_cockpit.py; it exposes a read-only view so someone
else can watch the local Codex loop without getting start/stop controls.
Codex now runs explicit autonomous iterations with a bounded prompt, changed files, checks, commit, push, and next step.
The cockpit shows the PTY/log stream so operators can see the agent working instead of trusting a black box.
Diffs, generated datasets, eval reports, screenshots, decisions, and handoffs become structured product state.
Every prompt, correction, approval, failed check, and accepted recommendation becomes future training and eval material.
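As an illustration of what one bounded iteration could look like as structured state, here is a minimal sketch. The IterationRecord class, its field names, and the ok() rule are assumptions made for this example; the repo's actual schema is not shown on this page.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class IterationRecord:
    """One bounded autonomous iteration (hypothetical schema)."""

    prompt: str
    changed_files: List[str] = field(default_factory=list)
    checks: Dict[str, bool] = field(default_factory=dict)  # check name -> passed
    commit: Optional[str] = None  # commit hash, if a commit was created
    pushed: bool = False
    next_step: str = ""

    def ok(self) -> bool:
        # Assumed rule: an iteration is clean only if every check passed
        # and the result was pushed.
        return all(self.checks.values()) and self.pushed

    def to_jsonl(self) -> str:
        # One JSON object per line, so records can be appended to a log file.
        return json.dumps(asdict(self), sort_keys=True)
```

Writing each iteration as one JSON line keeps the history append-only, which would suit later reuse as training and eval material.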
Embedded read-only loop feed
This panel fetches the local Codex loop through the read-only bridge as data, not as an iframe, so the landing page can render status, import readiness, contract checks, queue state, and logs inline.
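A rough sketch of the "data, not iframe" idea: the bridge returns a payload and the page decides how to render it. The payload shape here (status, import_ready, contract_checks, queue_items, log) is hypothetical and chosen only to mirror the fields named above; the real bridge response may differ.

```python
import json


def render_feed(payload: str, max_log_lines: int = 5) -> str:
    """Render a loop-feed JSON payload (hypothetical shape) as plain text lines."""
    feed = json.loads(payload)
    checks = feed.get("contract_checks", {})
    log = feed.get("log", [])
    # Guard against max_log_lines == 0, where log[-0:] would return everything.
    tail = log[-max_log_lines:] if max_log_lines > 0 else []
    lines = [
        f"status: {feed.get('status', 'unknown')}",
        f"import ready: {feed.get('import_ready', False)}",
        f"contract checks: {checks.get('passed', 0)}/{checks.get('total', 0)}",
        f"queue items: {feed.get('queue_items', 0)}",
    ]
    lines.extend(f"log: {entry}" for entry in tail)
    return "\n".join(lines)
```

Because missing fields fall back to defaults, a partial or empty payload still renders instead of breaking the page.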
What exists on disk today
The generated folder shows both sides of the current MVP: business-first discovery outputs and dataset-first eval scaffolding. Every claim below maps to a file in this repo.
Two business-first discovery variants: dallas-electrician-sample-v1 and dallas-electrician-south-dallas-v1.
Each includes a business profile, workflow map, moat hypotheses, data-gap plan,
eval opportunities, and a short operator summary.
generated/normalized/dallas-electrician-sample-v1/ plus imported CSV
normalization runs under generated/normalized/dallas-electrician-import-sample-v1/
and generated/normalized/dallas-electrician-import-sample-v2/. The wider
v2 import currently carries 530 permits, 1072 inspections,
1610 source-record lineage rows, 3 rule documents, and explicit coverage for
pass, fail, partial, cancelled,
not_ready, and unknown inspection outcomes.
The synthetic scaffold has 14 tasks and 5 reviewed label rows. The imported v1 scaffold has
18 tasks and 7 reviewed label rows, while imported v2
expands to 1083 tasks and 536 reviewed label rows across
next_inspection_outcome, failure_reason_classification,
recommended_next_action, and pattern_extraction.
generate_dallas_discovery_artifacts.py,
import_dallas_permit_extracts.py,
generate_dallas_fixture_pack.py,
generate_dallas_label_reviews.py, and
generate_dallas_eval_artifacts.py.
generated/contracts/dallas-electrician-contract-summary-v1/ now compares
the synthetic scaffold to imported v1 and v2, confirms
13 passing contract checks, includes optional imported
rule_documents.jsonl coverage, keeps the four eval task families stable,
and now checks repeated result-state, failure-reason, pattern-slice, and next-action support explicitly.
generated/coverage/dallas-electrician-edge-case-coverage-v1/ now makes
repeated support visible across result states, failure reasons, pattern slices, and
next-action groups. Imported v2 has repeated support for 6/6
result states, 5/5 failure reasons, 5/5 pattern slices, and
6/6 next-action groups.
generated/workflows/dallas-inspection-workflow-v1/ turns reviewed labels into
a concrete browser-readable action queue. The current queue has 530 items, including
6 high-priority failed inspections and 524 medium-priority partial
or not-ready inspections.
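The queue split described above (failed inspections as high priority, partial or not-ready inspections as medium) can be sketched as follows. The row shape and the decision to skip other results are assumptions for illustration, not the behavior of scripts/generate_dallas_inspection_workflow.py.

```python
from typing import Dict, List

# Assumed mapping, inferred from the queue description: failed inspections
# are high priority; partial and not-ready inspections are medium priority.
PRIORITY_BY_RESULT = {"fail": "high", "partial": "medium", "not_ready": "medium"}
PRIORITY_RANK = {"high": 0, "medium": 1}


def build_action_queue(labels: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn reviewed label rows (hypothetical shape) into a prioritized queue."""
    queue = []
    for label in labels:
        priority = PRIORITY_BY_RESULT.get(label["result"])
        if priority is None:
            # Assumption: other results (for example pass) need no operator action.
            continue
        queue.append(
            {
                "permit": label["permit"],
                "priority": priority,
                "recommended_action": label["recommended_action"],
            }
        )
    # list.sort is stable, so items of equal priority keep their input order.
    queue.sort(key=lambda item: PRIORITY_RANK[item["priority"]])
    return queue
```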
Freshest generated evidence
Freshest non-page artifacts
The freshest generated data artifacts are
generated/contracts/dallas-electrician-contract-summary-v1/summary.md
and summary.json, written on May 27, 2026. They compare all three
current Dallas scaffolds and confirm that the downstream contract still holds.
Coverage thresholds are now enforced
The contract summary now promotes the most important coverage expectations into checks: repeated current result states, repeated core failure reasons, repeated pattern slices, and repeated key next-action groups.
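One way to picture "promoting expectations into checks" is a table of named predicates that each return pass or fail. The sketch below assumes a summary keyed by hypothetical section names and a minimum support of 2; neither is taken from scripts/generate_dallas_contract_summary.py.

```python
from typing import Callable, Dict, Tuple

Summary = Dict[str, Dict[str, int]]
Check = Callable[[Summary], bool]


def all_repeated(section: str, minimum: int = 2) -> Check:
    """Build a check that every count in one summary section meets the minimum."""

    def check(summary: Summary) -> bool:
        counts = summary.get(section, {})
        # An empty or missing section fails, so the check cannot pass vacuously.
        return bool(counts) and all(n >= minimum for n in counts.values())

    return check


# Hypothetical check names and section keys, one per coverage expectation.
CHECKS: Dict[str, Check] = {
    "repeated_result_states": all_repeated("result_states"),
    "repeated_failure_reasons": all_repeated("failure_reasons"),
    "repeated_pattern_slices": all_repeated("pattern_slices"),
    "repeated_next_actions": all_repeated("next_actions"),
}


def run_checks(summary: Summary, checks: Dict[str, Check] = CHECKS) -> Tuple[Dict[str, bool], str]:
    """Run every named check; return per-check results and a passed/total string."""
    results = {name: check(summary) for name, check in checks.items()}
    return results, f"{sum(results.values())}/{len(results)}"
```

Keeping checks as named entries means a regression shows up as a specific failing name rather than a lower aggregate number.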
Imported sample v2 is still the data frontier
The widest normalized dataset remains dallas-electrician-import-sample-v2:
530 permits, 1072 inspections, 3 rule documents,
1610 source-record lineage rows, 1083 eval tasks, 536
reviewed label rows, and coverage for
cancelled, fail, not_ready, partial,
pass, and unknown.
Discovery breadth is intentionally small
Business-first discovery currently has two generated variants:
dallas-electrician-sample-v1 and
dallas-electrician-south-dallas-v1. That proves the contract can vary
by profile, but it is still narrow by design.
Coverage now closes the thin spots
The edge-case coverage report now shows repeated latest-import support for every
current result state, failure reason, pattern slice, and next-action group, including
complete_remaining_work|schedule_reinspection.
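A minimal sketch of how a repeated-support figure such as 6/6 could be computed from labeled rows. It assumes "repeated" means a value is backed by at least two distinct permits, which matches the "2-permit support" wording in the build log but is not confirmed as the report's exact rule; the row fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def repeated_support(
    rows: Iterable[Dict[str, str]], field: str, minimum_permits: int = 2
) -> Tuple[List[str], str]:
    """Return values of `field` seen on enough distinct permits, plus an n/m string."""
    permits_by_value: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        # A set ignores duplicate rows from the same permit.
        permits_by_value[row[field]].add(row["permit"])
    repeated = sorted(
        value
        for value, permits in permits_by_value.items()
        if len(permits) >= minimum_permits
    )
    return repeated, f"{len(repeated)}/{len(permits_by_value)}"
```

Counting distinct permits rather than rows keeps one permit with many reinspections from inflating support for a value.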
There is now a product-shaped output
generated/workflows/dallas-inspection-workflow-v1/index.html
shows the first operator-facing workflow: permit, address, contractor, inspection
failure context, recommended actions, observed follow-up, and captured operator
correction state for 530/530 queue items.
Current build log
Widened imported v2 with ELZ-2026-0731, one more Dallas electrical repair sequence that
repeats the henatriacontarecentafoil-bracket incomplete-work repair path, captured the matching accepted operator
correction, and regenerated the normalized, fixture, eval, coverage, contract, and
workflow artifacts. The latest scaffold now has 530 permits,
1072 inspections, 1083 eval tasks, 536 reviewed label
rows, 1610 source lineage rows, and repeated support for
6/6 next-action groups.
Added scripts/generate_dallas_inspection_workflow.py and generated
generated/workflows/dallas-inspection-workflow-v1/. The workflow turns
reviewed Dallas inspection labels into a 13-item action queue with
priority, address, contractor, trigger inspection, recommended actions, and observed
follow-up fields, plus a static index.html page that can be opened locally.
Promoted the main edge-case coverage expectations into
scripts/generate_dallas_contract_summary.py. The generated contract summary
now passes 13/13 checks, including repeated current result-state support,
repeated core failure-reason support, and repeated key next-action support for the latest
imported Dallas scaffold.
Added scripts/generate_dallas_edge_case_coverage.py and generated
generated/coverage/dallas-electrician-edge-case-coverage-v1/. The report
makes repeated support visible across result states, failure reasons, pattern slices,
and next-action groups; latest imported v2 now shows repeated support for
6/6 result states, 5/5 failure reasons,
5/5 pattern slices, and 6/6 next-action groups after the
May 23 fixture-widening pass.
Widened imported v2 with three more Dallas electrician permit sequences:
a repeated cancelled/unknown remodel-final path and two repeated service-release failures
around panel and disconnect corrections. Regenerated the normalized, fixture, eval, and
contract-summary artifacts. Imported v2 now carries 13 permits,
38 inspections, 49 eval tasks, 19 reviewed label rows,
5 repeated pattern slices, and 5 repeated next-action groups while
the shared contract remains 10/10.
Widened imported v2 again with one repeated service-release sequence and one
repeated access-blocked final sequence, tightened the importer and next-action hinting so
access labels no longer come from accidental panel schedule matches, and
regenerated the normalized, fixture, eval, and contract-summary artifacts. Imported
v2 now carries 10 permits, 28 inspections,
37 eval tasks, 15 reviewed label rows, and a real repeated
ensure_site_access|schedule_reinspection next-action group.
Widened imported v2 with three more Dallas electrician permit sequences,
regenerated the normalized, fixture, eval, and contract-summary artifacts, and pushed
recurring remodel, new-install, and repair pattern slices to 2-permit support
each. The shared contract now passes 10/10 checks and makes repeated support explicit.
Refreshed generated/landing.html against the actual April 26 artifact set:
the page now leads with the contract summary, keeps the product framing broad, corrects the
imported v2 lineage and rule-document counts, and points the next-step language
at the real remaining gap instead of inventing broader progress.
Extended scripts/import_dallas_permit_extracts.py so imported Dallas
samples can optionally normalize rule_documents.csv into
rule_documents.jsonl plus matching source-lineage rows. Added Dallas
electrical rule fixtures to imported v1 and v2, then
regenerated the contract summary so the rules path is now checked explicitly.
The unattended loop exposed a failure-propagation bug in the supervisor path: failed child sessions broke the inner work but still returned success to the day loop. The next runtime hardening step is to make that path fail closed instead of spinning.
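A minimal sketch of what failing closed could mean for a supervisor, assuming child sessions are plain subprocesses. The run_day_loop name and the command-list interface are hypothetical; the actual supervisor code is not shown here.

```python
import subprocess
import sys
from typing import List, Sequence


def run_day_loop(commands: Sequence[List[str]]) -> int:
    """Run child sessions in order; stop at the first failure and return its code."""
    for command in commands:
        completed = subprocess.run(command)
        if completed.returncode != 0:
            # Fail closed: surface the child's failure to the caller instead of
            # continuing with later sessions and reporting success.
            return completed.returncode
    return 0


if __name__ == "__main__":
    # Example: the second child fails, so the third never runs.
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    bad = [sys.executable, "-c", "raise SystemExit(3)"]
    sys.exit(run_day_loop([ok, bad, ok]))
```

Returning the child's own exit code lets the outer loop distinguish failure causes, and any non-zero value stops the run.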
Added scripts/generate_dallas_contract_summary.py and generated
generated/contracts/dallas-electrician-contract-summary-v1/, making the
shared synthetic-versus-imported Dallas contract explicit. That contract is now
broadened by the imported rule-document path; the next gap is repeated support for the remaining service-release and access-heavy edge cases.
Refreshed generated/landing.html again so it reflects the current repo
state after the contract summary landed: three normalized dataset paths, two discovery
variants, three eval scaffolds, exact task and reviewed-label counts, and the real
remaining normalization gap instead of stale pre-summary language.
Added generated/raw/dallas-electrician-import-sample-v2/ and generated
the matching -v2 normalized, fixtures, and evals
artifacts from it, widening the Dallas importer coverage to include
pass, fail, partial, cancelled,
not_ready, and unknown inspection outcomes while keeping the
downstream contracts stable.
Updated generated/landing.html to act as a truthful landing page and
changelog for the repo's current state: broad product framing, exact Dallas artifact counts,
and explicit statements about what is still only scaffolding.
Added scripts/import_dallas_permit_extracts.py plus
generated/raw/dallas-electrician-import-sample-v1/, produced
generated/normalized/dallas-electrician-import-sample-v1/, and
proved that the imported sample can flow through
generated/fixtures/dallas-electrician-import-sequences-v1/ and
generated/evals/dallas-electrician-import-sample-v1/ without changing
the downstream Dallas contracts.
Added batch discovery generation in scripts/generate_dallas_discovery_artifacts.py,
created generated/intake/dallas-electrician-south-dallas-v1/intake.json,
and generated a second Dallas discovery run focused on older-home South Dallas and Oak Cliff work.
Finished the row-backed path for reviewed supervision by adding
scripts/generate_dallas_label_reviews.py and wiring
scripts/generate_dallas_eval_artifacts.py to emit
generated/evals/dallas-electrician-sample-v1/label_reviews.json
directly from normalized rows.
Added generated/normalized/dallas-electrician-sample-v1/ plus
scripts/generate_dallas_fixture_pack.py, so the reusable Dallas
fixture pack is generated from row-shaped permit and inspection records instead of
hand-maintained JSON sequences.
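To show the row-backed idea in miniature, the sketch below groups inspection rows by permit and orders them by date to form sequences. The field names (permit_id, work_type, inspected_on, result) are hypothetical and do not describe the actual normalized schema or scripts/generate_dallas_fixture_pack.py.

```python
from collections import defaultdict
from typing import Dict, List


def build_sequences(
    permits: List[Dict[str, str]], inspections: List[Dict[str, str]]
) -> List[Dict[str, object]]:
    """Build one ordered inspection sequence per permit from flat rows."""
    by_permit: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for inspection in inspections:
        by_permit[inspection["permit_id"]].append(inspection)

    sequences = []
    for permit in sorted(permits, key=lambda p: p["permit_id"]):
        # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
        steps = sorted(
            by_permit.get(permit["permit_id"], []),
            key=lambda i: i["inspected_on"],
        )
        sequences.append(
            {
                "permit_id": permit["permit_id"],
                "work_type": permit["work_type"],
                "results": [step["result"] for step in steps],
            }
        )
    return sequences
```

Sorting permits and inspections before emitting keeps the output deterministic, so regenerating from the same rows yields the same fixture pack.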
Added deterministic discovery and eval writers, along with the first generated discovery and eval sample directories, so the repo could produce its key MVP artifacts from structured inputs rather than prose alone.
Locked the first implementation wedge to Dallas residential electrical permits and inspections and wrote the supporting spec, schema, eval, and discovery contract docs.
What is not built yet
Built
Broad product framing, a narrow permit-data proof wedge, artifact contracts, deterministic writers, two discovery variants, three normalized Dallas dataset paths, three eval scaffolds, and a shared unattended loop for keeping generated status surfaces current.
Not built
No automatic end-to-end moat builder yet, no production workflow runner, no live baseline-versus-moat benchmarking, no dynamic local app route, and no evidence yet that the permit-data wedge generalizes into a durable data moat.