Software Factory

Status note

This engineering note captures repo-specific agent, review, and verification patterns shaping VRDex delivery. It is not public product documentation or a universal contributor mandate.

Promote only proven reusable patterns into basics-agentic-dogfooding or global agent context. Keep speculative workflow ideas here until they have repeated value.

Locked decisions

  • VRDex is an opinionated repository, not a neutral sandbox.
  • Safe routine progress should continue without repeatedly asking for permission.
  • Global repo conventions belong in AGENTS.md.
  • Personal/operator preferences belong in AGENTS.local.md, which should remain gitignored.
  • Infrequent onboarding/setup material should live in docs and a repo onboarding skill, not in every-session AGENTS.md context.
  • Durable markdown should live under docs/ rather than accumulating at the repo root.

Current recommendation

  • treat product design and software-factory design as parallel workstreams
  • when an agent takes a poor path or asks a low-value question, finish the immediate task and then capture the process fix before moving on
  • bias toward stronger human and agent onboarding so new sessions converge quickly on repo norms
  • prefer discoverable, cross-linked artifacts over chat-only decisions
  • avoid turning local process experiments into public product promises

Rigorous, not prescriptive

Locked decision

  • VRDex should be rigorous about review, verification, and documentation expectations.
  • VRDex should not force every contributor to use one specific agent, model, or editor workflow.

Current recommendation

  • optimize for compatibility with multiple agents so long as they can operate inside the repo's review and verification system
  • use the repo's review/recycle loops to catch slop and improve quality without over-policing how people work locally

Global vs local context model

AGENTS.md

Use for:

  • repo-wide behavior defaults
  • autonomy and commit/push posture
  • durable safety rules
  • durable workflow expectations every agent should always know

Do not use for:

  • long onboarding playbooks
  • personal operator preferences
  • fast-changing implementation details

AGENTS.local.md

Use for:

  • personal communication preferences
  • local model preferences
  • operator-specific autonomy bias and scratchpad habits
  • anything that should not silently become repo policy

Skills

Use for:

  • onboarding flows
  • repeatable multi-step setup
  • model/tool/MCP orientation
  • control-loop playbooks that are useful on demand but too large for every session

Repo onboarding skill direction

The VRDex onboarding skill should cover:

  • local agent setup expectations
  • how this repo separates global policy from local operator preference
  • supported agent roles and model-routing expectations
  • how to think about agent development in this repo
  • software-factory conventions, issue filing conventions, and documentation conventions
  • typical development workflow from task intake to verification to merge

Review-recycle loop

Current recommendation

  • treat review-recycle as a first-class normal development loop, not an exception
  • use a fresh-context reviewer and a recycler that resumes the original implementer context when possible
  • trigger recycler work on PR creation, draft->ready transitions, new review comments, failing checks, and mergeability regressions
  • triage every outstanding review comment before pushing a follow-up commit on an open PR
  • reply or react with disposition before resolving review threads; do not silently resolve rejected or partially applied feedback
  • use GitHub Copilot automatic follow-up reviews and CI for ordinary iteration; reserve paid/manual reviewer reruns for substantial change sets

Roles

  • implementer: writes the change, runs relevant verification, and keeps enough context to make minimal follow-up patches.
  • reviewer: inspects the change from a fresh context and returns source-linked findings, confidence, uncertainty, and test gaps without editing files.
  • recycler: triages review findings and failing checks, decides apply/reject/split/ask, patches confirmed issues, reruns verification, and records dispositions.

Candidate reviewer sources

  • GitHub Copilot: automatic or low-friction PR review and follow-up comments.
  • Greptile: paid/manual review for coherent change sets where cost is justified.
  • Codex, Claude, or other fresh-context agents: parallel source-linked review lanes outside GitHub when a cold read helps.
  • Custom GitHub Action reviewers: deterministic or model-backed checks that produce PR comments or artifacts.
  • Custom OpenCode reviewers: repo-local reviewer sessions that can inspect working trees, artifacts, and docs before feedback is reflected into GitHub.

Candidate direction

  • run first-pass reviewer/recycler loops outside GitHub when practical, then reflect the result back into GitHub once the branch is in better shape
  • allow agents to request reviewer and recycler jobs from the common task pool defined by #50
  • encode reviewer source, confidence, false-positive disposition, and recycler outcome as structured metadata when #50 moves from direction to implementation

Trigger model

  • PR opened ready for review
  • draft PR marked ready
  • substantial new commit pushed to a PR branch
  • baseline check, deploy check, CodeQL, or hosted E2E failure
  • new blocking reviewer comment from a human or AI reviewer
  • mergeability regression after base branch movement
  • stale branch that blocks otherwise-ready merge

Recycle gate

Before the next recycle push:

  • gather all outstanding review comments and failing checks
  • decide for each item: apply, reject with reason, split follow-up, or ask one human question
  • make the smallest correct patch set
  • rerun the relevant verification
  • record dispositions in the PR or issue when review context would otherwise be lost
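
The gate above can be sketched as a small check over triaged items. This is a minimal illustration only: `ReviewItem`, `Disposition`, `untriagedItems`, and `recycleGateOpen` are hypothetical names, not existing VRDex code, and requiring a non-empty reason on rejections is an assumption drawn from "reject with reason".

```typescript
// Hypothetical types for illustration; nothing here is an existing VRDex API.
type Disposition =
  | { kind: "apply" }
  | { kind: "reject"; reason: string }
  | { kind: "split"; followUp: string }
  | { kind: "ask"; question: string };

interface ReviewItem {
  id: string;
  source: "review-comment" | "failing-check";
  disposition?: Disposition; // undefined means the item has not been triaged yet
}

// Ids of items that still need a decision before the next recycle push.
function untriagedItems(items: ReviewItem[]): string[] {
  return items
    .filter((item) => item.disposition === undefined)
    .map((item) => item.id);
}

// The gate opens only when every item has a disposition and every
// rejection carries a non-empty reason (assumed rule, see lead-in).
function recycleGateOpen(items: ReviewItem[]): boolean {
  return items.every((item) => {
    const d = item.disposition;
    if (d === undefined) return false;
    return d.kind !== "reject" || d.reason.trim().length > 0;
  });
}
```

A discriminated union keeps each outcome's required detail (reason, follow-up, question) attached to the decision, so a disposition cannot be recorded without it.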

Orchestrator / supervisor loop

Current recommendation

  • add an orchestrator or executive-assistant layer that sits above implementer sessions
  • the orchestrator should decide one next action when an implementer stops: continue, ask one human question, dispatch another agent, or mark done
  • prefer checkpointed incremental deltas over replaying giant transcripts
  • conserve human attention by asking for one concrete decision at a time, with the recommended option first when there is a clear default
  • keep supervisor messages bounded to task state, blocker, evidence, and next-action choices instead of forwarding full transcripts by default

Candidate direction

  • keep implementer sessions persistent and resumable
  • treat recycler work as resuming the original implementer session rather than spinning up a brand-new deep-context worker each time
  • treat .opencode/plugins/supervisor-loop.ts and .opencode/commands/supervisor.md as local experiment files until restart/tool-discovery behavior is validated in a follow-up issue under #43

Resume policy

  • resume the same session when the task needs preserved implementation context, review history, or partial local state
  • start a fresh session when the job is independent, benefits from cold review, or needs a leaner context
  • pass a compact delta package upward: goal, files changed, verification run, blockers, open decisions, and proposed next action
  • do not use resumability to hide stale assumptions; reread changed files before editing after a long pause

Delta package template:

  • goal and linked issue/PR
  • branch, files changed, and verification already run
  • blocker or reason the implementer stopped
  • open decisions, with recommended option first when possible
  • proposed next action: continue, ask, dispatch, recycle, or mark done
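
As a sketch, the template above could be carried as a typed record and rendered into a bounded supervisor message. The `DeltaPackage` shape, its field names, and `summarizeDelta` are assumptions made for illustration; the note does not define a concrete schema.

```typescript
// Hypothetical shape mirroring the template bullets; field names are assumptions.
type NextAction = "continue" | "ask" | "dispatch" | "recycle" | "mark-done";

interface OpenDecision {
  question: string;
  options: string[]; // recommended option first when there is a clear default
}

interface DeltaPackage {
  goal: string;
  linkedRef: string; // issue or PR reference
  branch: string;
  filesChanged: string[];
  verificationRun: string[];
  blocker?: string; // why the implementer stopped, if anything blocked it
  openDecisions: OpenDecision[];
  proposedNextAction: NextAction;
}

// Render a compact message instead of forwarding a full transcript.
function summarizeDelta(pkg: DeltaPackage): string {
  const lines = [
    `goal: ${pkg.goal} (${pkg.linkedRef})`,
    `branch: ${pkg.branch}; files changed: ${pkg.filesChanged.length}`,
    `verification: ${pkg.verificationRun.join(", ") || "none"}`,
    `blocker: ${pkg.blocker ?? "none"}`,
    ...pkg.openDecisions.map(
      (d) => `decision: ${d.question} [recommended: ${d.options[0] ?? "n/a"}]`,
    ),
    `next: ${pkg.proposedNextAction}`,
  ];
  return lines.join("\n");
}
```

The message grows only with the number of open decisions, which keeps it bounded to task state, blocker, evidence, and next-action choices.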

OpenCode server / task-pool direction

Current recommendation

  • move toward a common hosted OpenCode server that acts as a shared task pool
  • prefer atomic jobs/tasks over thread-subscribed chats when that improves dispatch, accounting, and re-entry
  • track active, idle, completed, and resumable agent sessions as discoverable system state
  • distinguish atomic jobs from resumable sessions explicitly
  • keep dispatch through an orchestrator path instead of uncontrolled recursive agent spawning

Concepts

  • atomic job: a bounded task with a clear input package, expected output, and completion state. Example: review one issue closure, recycle one confirmed finding, or recover one stale PR.
  • resumable session: an agent session with useful retained context that can continue implementation, recycle review feedback, or recover mergeability without replaying the whole history.
  • task pool/server: the roster and queue layer that tracks jobs, sessions, states, assignments, and re-entry metadata.
  • orchestrator request: a controlled request for new work or a resumed session; agents should not recursively spawn uncontrolled work.

Lifecycle states worth preserving:

  • requested
  • queued
  • dispatched
  • active
  • checkpointed/resumable
  • completed, failed, or cancelled
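
One way to make these states explicit is a transition table. The state names come from the list above, but the allowed transitions below are an assumption for illustration; the note does not specify them, and `JobState`, `allowedTransitions`, and `canTransition` are hypothetical names.

```typescript
// State names follow the list above; the transition rules are assumed, not specified.
type JobState =
  | "requested"
  | "queued"
  | "dispatched"
  | "active"
  | "checkpointed"
  | "completed"
  | "failed"
  | "cancelled";

const allowedTransitions: Record<JobState, readonly JobState[]> = {
  requested: ["queued", "cancelled"],
  queued: ["dispatched", "cancelled"],
  dispatched: ["active", "failed", "cancelled"],
  active: ["checkpointed", "completed", "failed", "cancelled"],
  checkpointed: ["active", "cancelled"], // a resumable session re-enters as active
  completed: [],
  failed: [],
  cancelled: [],
};

// True when the task pool may move a job or session from one state to another.
function canTransition(from: JobState, to: JobState): boolean {
  return allowedTransitions[from].includes(to);
}
```

Treating completed, failed, and cancelled as terminal is one possible choice; a real task pool might instead allow a failed job to be re-queued.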

Candidate direction

  • let agents request new agents through an orchestrator-facing interface instead of directly spawning uncontrolled work
  • expose parallelism ceilings, roster visibility, and resume-vs-new-session policy as explicit system controls
  • expose task type, required tools, repo path, branch, risk level, and expected verification as dispatch metadata once a follow-up implementation issue under #50 exists

Mergeability recovery

Current recommendation

  • treat mergeability regression as a first-class recycler trigger
  • default to resuming the original implementer session when practical
  • let automation update from base or resolve straightforward conflicts only when the intended behavior remains clear
  • ask for human input before risky conflict resolution that changes product, security, billing, trust, or migration behavior

Recovery loop

  1. Detect unmergeable or stale PR state.
  2. Gather base branch, changed files, failing checks, and outstanding reviews.
  3. Resume the original implementer when context is useful and available.
  4. Apply the smallest conflict or stale-branch fix.
  5. Rerun affected checks.
  6. Leave a concise PR comment if the recovery changed behavior or deferred work.

Detection sources:

  • GitHub PR mergeability state or branch protection state
  • base/head SHA mismatch indicating a stale branch
  • required check failures after base branch movement
  • failed update-from-base or merge attempts
  • scheduled or webhook-based stale-PR scans once automation exists

Dispatch package:

  • PR number, base/head refs, and base/head SHAs
  • changed files and conflict files when known
  • failing checks and outstanding review threads
  • preferred session ID or original implementer identity when available
  • risk flags for product, security, billing, trust, migration, or data behavior

Automation boundary examples:

  • straightforward: stale base update with no conflicts, formatting-only conflict, or test snapshot conflict with unchanged product behavior
  • human required: conflicts that alter product behavior, auth, billing, trust labels, migrations, data retention, or public API contracts
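
The dispatch package and automation boundary can be sketched together as a routing check. `RecoveryDispatch`, `RiskFlag`, and `routeRecovery` are hypothetical names, and the rule that any risk flag forces a human decision is a simplifying assumption based on the examples above.

```typescript
// Hypothetical dispatch shape; field names mirror the bullets above but are assumptions.
type RiskFlag = "product" | "security" | "billing" | "trust" | "migration" | "data";

interface RecoveryDispatch {
  prNumber: number;
  baseRef: string;
  headRef: string;
  baseSha: string;
  headSha: string;
  changedFiles: string[];
  conflictFiles: string[]; // empty when no conflicts are known
  failingChecks: string[];
  preferredSessionId?: string; // original implementer session, when available
  riskFlags: RiskFlag[];
}

interface RecoveryRoute {
  handler: "automation" | "human";
  session: "resume" | "fresh";
}

// Assumed rule: any risk flag routes to a human; otherwise automation may proceed.
// Resume the original implementer session only when one is known.
function routeRecovery(pkg: RecoveryDispatch): RecoveryRoute {
  return {
    handler: pkg.riskFlags.length > 0 ? "human" : "automation",
    session: pkg.preferredSessionId !== undefined ? "resume" : "fresh",
  };
}
```

In this sketch the risk flags would be set by whoever assembles the package, so the function only applies the boundary; it does not try to detect risky conflicts itself.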

Verification loops

VRDex should plan verification as a layered system, not a single test command.

Required layers to design for

  • lint and formatting validation
  • typecheck/build validation
  • unit and integration testing
  • end-to-end testing
  • screenshot and visual regression review
  • VLM review of meaningful UI changes
  • validation of scripts and ancillary automation code
  • AST/policy checks where structural rules matter
  • infrastructure verification for IaC and deployment automation

Candidate direction

  • feature-ready agents should present a video or screenshot-backed validation package to the human reviewer
  • the human checkpoint should happen when the feature is already mergeable, not as a substitute for engineering verification

Definition of ready

Current recommendation

  • every non-trivial feature should define how it will be reviewed, verified, rolled out, and measured before implementation begins
  • definition-of-ready belongs in engineering/docs discipline, not just in a PM tool
  • docs/agentic/definition-of-ready.md is the canonical checklist and issue-snippet reference for this repo

Definition of done

Current recommendation

  • every non-trivial feature should close with an explicit done check, not just a claim that implementation landed
  • definition-of-done should cover verification completion, documentation updates, rollout posture, and review closure
  • docs/agentic/definition-of-done.md is the canonical closeout checklist and handoff-snippet reference for this repo

Feature flags and analytics

Current recommendation

  • treat feature flags, experimentation, and product analytics as first-class design concerns for feature work
  • default to asking whether a feature should be gated, progressively rolled out, or instrumented
  • avoid stacking overlapping platforms too early; prefer one primary system per concern until a real gap appears
  • docs/agentic/product-analytics-and-feature-flags.md is the canonical policy for tool roles, rollout posture, and product-signal expectations

LLM and agent observability

Current recommendation

  • keep LLM/agent observability separate from product analytics and feature flags
  • do not add a dedicated LLM observability platform until traces, evals, prompt quality, cost, or loop diagnostics are hard to manage with current artifacts
  • treat Langfuse as the first candidate to evaluate if dedicated traces/evals become necessary
  • consider signed action receipts or similar provenance/accountability artifacts as a separate concern from tracing
  • do not capture prompt text until a redaction, privacy, and retention policy exists

First signals worth capturing

  • task goal and issue/PR linkage
  • model/agent role at a coarse level
  • tool categories used, without secrets
  • review findings, false-positive dispositions, and recycler outcomes
  • checks run and pass/fail state
  • human decisions requested and answered
  • cost/latency only when the platform exposes it safely and usefully
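
A minimal sketch of such a record, assuming a hypothetical `AgentRunSignal` shape and a `toSignal` helper that copies only the allowed fields, so prompt text present on a raw run never reaches the stored signal. None of these names exist in the repo; they only illustrate the list above.

```typescript
// Hypothetical record shape; field names are illustrative only.
interface AgentRunSignal {
  goal: string;
  linkedRef: string; // issue or PR reference
  role: "implementer" | "reviewer" | "recycler";
  toolCategories: string[]; // categories only, never arguments or secrets
  checks: { name: string; passed: boolean }[];
  humanDecisions: { question: string; answer?: string }[];
  costUsd?: number; // only when the platform exposes it safely
  latencyMs?: number;
}

// A raw run may carry prompt text; the stored signal must not.
interface RawRun extends AgentRunSignal {
  promptText?: string;
}

// Copy an explicit allow-list of fields instead of spreading the raw object,
// so new sensitive fields are excluded unless deliberately added here.
function toSignal(raw: RawRun): AgentRunSignal {
  const signal: AgentRunSignal = {
    goal: raw.goal,
    linkedRef: raw.linkedRef,
    role: raw.role,
    toolCategories: [...raw.toolCategories],
    checks: raw.checks.map((c) => ({ name: c.name, passed: c.passed })),
    humanDecisions: raw.humanDecisions.map((d) => ({ ...d })),
  };
  if (raw.costUsd !== undefined) signal.costUsd = raw.costUsd;
  if (raw.latencyMs !== undefined) signal.latencyMs = raw.latencyMs;
  return signal;
}
```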

Implementation ownership: #45 only chooses direction. Any dedicated tracing platform, prompt capture, eval harness, or signed action receipt implementation needs a follow-up issue under #43.

Boundary

Product analytics answers whether VRDex users are succeeding in the product. LLM/agent observability answers whether repo/product agents are producing reliable work. Do not force both jobs into one telemetry system by default.

Cross-repo promotion model

Current recommendation

  • solve repo-specific versions first inside VRDex
  • once a pattern proves useful here, promote the generalized version into basics-agentic-dogfooding
  • avoid over-generalizing before the repo-specific version has shown real value

Contributor posture

Current recommendation

  • newer contributors should be helped by the system rather than forced to infer expectations from tribal knowledge
  • reviewer agents and recycle loops should help raise quality without requiring maintainers to hand-police every sloppy draft
  • protected branches, contributor roles, and org-level controls should arrive when collaboration volume justifies them
  • docs/agentic/contributor-workflow.md is the canonical contributor contract and onboarding pointer for this repo

Backlog direction

Software-factory implementation ideas should be tracked under #43 or linked child issues, separate from product features.