Title for the plan¶
Prompt¶
Before responding to questions or discussion points in this document, explore the occystrap codebase thoroughly. Read relevant source files, understand existing patterns (project structure, command-line argument handling, input source abstractions, output formatting, error handling), and ground your answers in what the code actually does today. Do not speculate about the codebase when you could read it instead. Where a question touches on external concepts (OCI image specs, Docker/Podman compatibility, registry APIs), research as needed to give a confident answer. Flag any uncertainty explicitly rather than guessing.
Situation¶
My Shaken Fist project consistency audits started out as a thought bubble, but they've grown into something I think is really useful. On the other hand, I think I have also outgrown the current process.
Mission and problem statement¶
In terms of what I've learned from this process so far, I think there are three main points:
- Listing the various things I audit for in the `PROJECT-CONSISTENCY-AUDITS.md` single markdown file is becoming unwieldy, as well as reducing the parallelism that I can apply -- if for example I had a directory of audit items then I could spawn an agent per item and perform the audits in parallel.
- Secondly, I think work item tracking in `PLAN-consistency.md` is similarly awkward and I'd be better off tracking outstanding items in GitHub issues, although I am unsure if they should be issues on the shared `development` repository or on the target project itself. If they were tracked in each project I think we'd need a label like "consistency" to make them easier to surface.
- Finally, I think those issues should more strongly link to a consistent implementation approach -- the templates are a good idea, but often the model isn't aware they exist until it's too late. If the issue linked to the specific template to use that would be helpful.
This is especially true because it occurs to me that my current automated
review / fix / retest flow is a bit weird. Specifically, it's weird that
the reviewer isn't also the thing which proposes fixes for what it
finds and instead needs to somehow convey the problems to a new instance
of the model. Given we also only do one automated review per PR without
human intervention, we should also be safe to automatically kick off a
retesting run after the automated fixes have been proposed. That is,
I'd like to squash those three workflows into one, and I'd like to
improve both how we track the rollout for that and how we actually roll
it out. This should also include ensuring the absolute maximum amount
of the repeated implementation is in the actions repository, and not
duplicated across the various projects.
I think it is also weird that the audits happen on copies of the repositories that are my current working clones, which might not be representative of the actual state of the committed code. Perhaps the audit jobs themselves should be running on a CI worker with a fresh clone?
Analysis and recommendations¶
Architecture: modular audit items¶
The single PROJECT-CONSISTENCY-AUDITS.md file currently defines 12
audit criteria in ~24KB of prose. The proposed move to a directory of
individual audit items is sound and directly addresses the parallelism
bottleneck. I'd suggest the following structure:
    development/
      audits/
        README.md                 # overview, how to add a new audit
        llm-tooling.md            # one file per audit criterion
        release-process.md
        ci-review-automation.md
        renovate.md
        ...
Each audit file should have a consistent structure:
    # Audit: <name>

    ## What we check
    <concise description of the audit criterion>

    ## Template
    Template: `templates/<name>/`
    See: `templates/<name>/README.md`

    ## Projects
    | Project    | Status        | Issue |
    |------------|---------------|-------|
    | shakenfist | compliant     | -     |
    | imago      | compliant     | -     |
    | occystrap  | non-compliant | #42   |
This gives us three concrete benefits:
- Parallelism -- an agent per audit file, each checking all projects against one criterion. This is the natural grain for parallel work because each criterion has its own template and its own set of files to check.
- Discoverability -- each audit file directly links its template directory, solving the problem of models not finding templates until too late.
- Incremental addition -- adding a new audit criterion is just adding a new file. No merge conflicts with other in-progress audits.
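As an illustration of how an agent or script might consume one audit file, here is a minimal sketch that reads the Projects table from the structure above. The `ProjectStatus` type and `parse_projects_table` function are hypothetical names, and the sketch assumes the table always has exactly the three columns shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProjectStatus:
    project: str
    status: str
    issue: Optional[str]  # e.g. "#42", or None when the cell is "-"


def parse_projects_table(markdown: str) -> List[ProjectStatus]:
    """Extract the rows of the '## Projects' table from one audit file."""
    rows = []
    in_projects = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Track whether we are inside the Projects section.
            in_projects = stripped == "## Projects"
            continue
        if not in_projects or not stripped.startswith("|"):
            continue
        cells = [c.strip() for c in stripped.strip("|").split("|")]
        if len(cells) != 3:
            continue
        # Skip the header row and the |---|---| separator row.
        if cells[0] == "Project" or set(cells[0]) <= {"-", ":"}:
            continue
        issue = None if cells[2] == "-" else cells[2]
        rows.append(ProjectStatus(cells[0], cells[1], issue))
    return rows
```

Because each file carries its own table, one worker per file can parse and update it without touching any other audit's state.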
Work tracking: GitHub issues on target projects¶
Issues should live on the target project, not on development. The
reasoning:
- The person fixing the issue needs to work in that repo. Having the issue in the same repo means it shows up in their `gh issue list`, their project board, and their PR cross-references.
- A `consistency` label on each project is lightweight and lets us aggregate across repos with a GitHub search like `org:shakenfist label:consistency is:open`.
- The `development` repo remains the authority on what to audit and how, but the tracking of where we are for each project lives where the work happens.
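The cross-repo aggregation could also be scripted. The sketch below only builds a URL for GitHub's REST issue search endpoint from the query quoted above; the function name is hypothetical, no request is made, and depending on the API version the query may need an extra qualifier such as `is:issue`.

```python
from urllib.parse import urlencode


def consistency_search_url(org: str = "shakenfist",
                           label: str = "consistency") -> str:
    """Build a search URL for open issues carrying the label across an org."""
    query = f"org:{org} label:{label} is:open"
    # urlencode percent-encodes the colons and turns spaces into '+'.
    return "https://api.github.com/search/issues?" + urlencode({"q": query})


print(consistency_search_url())
```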
Each issue should follow a template:
    Title: Consistency: <audit name>
    Labels: consistency
    Body:
      This project is not yet compliant with the <audit name>
      consistency audit.

      Audit spec: development/audits/<name>.md
      Template: development/templates/<name>/README.md

      Steps to implement:
      <copied from the template README>
This directly links the issue to both the spec and the template, so any agent or human picking up the issue has everything they need.
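A minimal sketch of rendering that issue template follows. The `render_issue` function and its parameters are hypothetical, and the split between a human-readable audit name and a file-name slug is an assumption; the plan itself only uses `<audit name>` and `<name>` placeholders.

```python
def render_issue(audit_name: str, slug: str, steps: str) -> dict:
    """Return the title, labels and body for one consistency issue."""
    body = (
        f"This project is not yet compliant with the {audit_name}\n"
        "consistency audit.\n\n"
        f"Audit spec: development/audits/{slug}.md\n"
        f"Template: development/templates/{slug}/README.md\n\n"
        "Steps to implement:\n"
        f"{steps}\n"
    )
    return {
        "title": f"Consistency: {audit_name}",
        "labels": ["consistency"],
        "body": body,
    }


issue = render_issue("Renovate", "renovate", "1. Copy the template config.")
print(issue["title"])
```

Generating the title from a single function keeps it stable, which matters if later automation matches existing issues by title.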
Consolidating review / fix / retest¶
The current three-workflow dance (pr-re-review, pr-address-comments,
pr-retest) requires human intervention between steps. The observation
that "the reviewer should also propose fixes" is correct -- when a
review finds issues, the same context that identified the problem is
best positioned to propose a fix.
This should remain a separate workflow rather than being folded into the existing review or test workflows, but it should have two trigger modes:
- Automatic -- triggered when a PR's functional tests pass in CI, provided the PR has not already received an automated review. This is the primary mode: the review happens at the natural point where we know the code works and is ready for feedback, without any human having to remember to invoke it.
- Manual -- triggered via a bot command (`@shakenfist-bot please review and fix`) for cases where a human wants to re-run the review after pushing changes, or where the automatic trigger didn't fire for some reason.
The "has this PR already been reviewed" gate is important to avoid noise. The simplest implementation is to check for the presence of a review comment from the bot -- if one exists, skip the automatic trigger and require the manual command instead.
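The gate and the two trigger modes could be expressed roughly as below. This is a sketch under assumptions: the bot's account login is taken to be `shakenfist-bot` (the plan only shows the mention used in the command), comments are assumed to be dicts shaped like GitHub API comment objects with a `user.login` field, and any comment from the bot is treated as a prior review.

```python
from typing import Dict, List

BOT_LOGIN = "shakenfist-bot"  # assumption: the bot's GitHub login
MANUAL_COMMAND = "@shakenfist-bot please review and fix"


def already_reviewed(comments: List[Dict], bot_login: str = BOT_LOGIN) -> bool:
    """True if the bot has already commented on this PR."""
    return any(
        c.get("user", {}).get("login") == bot_login for c in comments
    )


def should_run(trigger: str, comments: List[Dict],
               comment_body: str = "") -> bool:
    """Decide whether the combined review+fix workflow should run."""
    if trigger == "manual":
        # Manual runs need the explicit bot command in the new comment.
        return MANUAL_COMMAND in comment_body
    if trigger == "automatic":
        # Automatic runs fire at most once per PR.
        return not already_reviewed(comments)
    return False
```

A real implementation would probably want to distinguish review comments from other bot output, for example with a marker string in the review body.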
The combined workflow would then:
- Run the Claude review (existing `review-pr-with-claude` action).
- If the review produces actionable findings with `action: fix`, immediately apply fixes in a follow-up commit on the same PR.
- Re-run tests after the fix commit to confirm the fixes don't break anything.
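The control flow of those three steps can be sketched as follows. The three callables are hypothetical stand-ins for the real review, fix and test steps, and findings are assumed to be dicts carrying an `action` key, based only on the `action: fix` marker mentioned above.

```python
from typing import Callable, Dict, List


def run_review_and_fix(
    review: Callable[[], List[Dict]],
    apply_fixes: Callable[[List[Dict]], None],
    run_tests: Callable[[], bool],
) -> str:
    """Run review, then fix and retest only when there is something to fix."""
    findings = review()
    to_fix = [f for f in findings if f.get("action") == "fix"]
    if not to_fix:
        # Nothing actionable: no fix commit, so no retest is needed.
        return "reviewed"
    apply_fixes(to_fix)
    return "fixed" if run_tests() else "fix-failed-tests"
```

Keeping the steps as injected callables makes the ordering testable without touching a real PR.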
The key constraint is security -- the fix step needs write access to
the PR branch, which means the two-checkout security model from
pr-address-comments.yml must be preserved. The combined workflow
should still use the untrusted checkout for reading PR code and the
trusted checkout for the tools that write back.
Since the automatic trigger only fires once (gated by "no prior review") and manual re-runs require explicit human action, the risk profile is the same as today -- we just remove the manual step between review and fix for the first pass.
This combined action should live in shakenfist/actions as a
reusable workflow so that each project only needs a thin trigger
workflow.
Running audits on CI rather than local clones¶
Running audits on local working copies is problematic because:
- Uncommitted changes may mask or create false audit findings.
- The audit results aren't reproducible by others.
- There's no audit trail of when audits ran and what they found.
A scheduled GitHub Actions workflow in development that clones each
target repo fresh and runs the per-criterion checks would solve this.
The workflow could:
- Clone each project repo at HEAD of its default branch.
- Run each audit criterion check (one job per criterion per project, maximising parallelism).
- For failures: create or update a GitHub issue on the target project using the template above.
- For passes: close the corresponding issue if one exists.
This gives us automated drift detection -- if a project regresses on a criterion it previously passed, an issue gets reopened automatically.
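The create / update / close / reopen decision can be isolated in a small pure function. This is a sketch only: `plan_issue_action` is a hypothetical name, issues are assumed to be dicts with `title` and `state` fields as in GitHub API issue objects, and matching is by exact title.

```python
from typing import Dict, List


def plan_issue_action(passed: bool, title: str, issues: List[Dict]) -> str:
    """Decide what to do with the tracking issue for one audit result."""
    match = next((i for i in issues if i["title"] == title), None)
    if passed:
        # Close an open issue on a pass; otherwise nothing to do.
        return "close" if match and match["state"] == "open" else "none"
    if match is None:
        return "create"
    # A failure with a closed issue means the project has regressed.
    return "reopen" if match["state"] == "closed" else "update"
```

Separating the decision from the API calls means the drift-detection behaviour can be unit tested without a token.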
Maximising reuse in actions/¶
The current shared actions (pr-bot-trigger, review-pr-with-claude,
export-repo-config) are the right pattern. The combined review+fix
workflow should follow the same approach. Additionally, audit check
scripts themselves could live in actions/ as composite actions, with
each project calling them through a thin reusable-workflow reference:
    # In each project's .github/workflows/consistency-audit.yml
    jobs:
      audit:
        uses: shakenfist/actions/.github/workflows/consistency-audit.yml@main
        with:
          project: ${{ github.repository }}
This means adding a new audit criterion requires:
- Adding the check logic to `actions/`.
- Adding the audit spec file to `development/audits/`.
- No changes to individual project repos.
Migration path¶
Rather than a big-bang migration, I'd suggest:
- Create the `audits/` directory -- extract each criterion from `PROJECT-CONSISTENCY-AUDITS.md` into its own file with the structure above. Keep the original file as a read-only reference until migration is complete.
- Add the `consistency` label to all project repos and create issues for known non-compliant items using the issue template.
- Build the CI audit workflow in `development` that checks one criterion (start with the simplest, like "has AGENTS.md") across all projects. Iterate until the pattern is solid.
- Build the combined review+fix action in `shakenfist/actions` and pilot it on one project before rolling out.
- Retire `PLAN-consistency.md` once all tracking has moved to GitHub issues.
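For the "has AGENTS.md" starting point, a check against a fresh clone might look like the sketch below. The function names, the `CHECKS` registry and the criterion key are all hypothetical; this is not the content of the real audit scripts.

```python
from pathlib import Path
from typing import Callable, Dict


def check_has_agents_md(repo_root: Path) -> bool:
    """Pass if the clone has an AGENTS.md file at its top level."""
    return (repo_root / "AGENTS.md").is_file()


# One entry per criterion; new checks are added by registering a function.
CHECKS: Dict[str, Callable[[Path], bool]] = {
    "has-agents-md": check_has_agents_md,
}


def run_checks(repo_root: Path) -> Dict[str, bool]:
    """Run every registered check against one cloned repository."""
    return {name: check(repo_root) for name, check in CHECKS.items()}
```

Starting with a single file-existence check keeps the first iteration focused on the surrounding plumbing (cloning, matrix, issue management) rather than on check logic.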
Open questions¶
- Should the CI audit workflow run daily, weekly, or on-push to `development`? Daily seems right for drift detection without being noisy.
- Should audit issues be auto-assigned to anyone, or left unassigned for triage?
- For the combined review+fix workflow, should fixes be committed directly to the PR branch or proposed as review suggestions? Direct commits are simpler but suggestions give the author more control.
- Do we want a dashboard (e.g. a generated README table or GitHub project board) that shows compliance status across all repos at a glance?
Execution¶
Phase 1: Modular audit specs -- DONE¶
- ~~Create `audits/` directory with README explaining the structure.~~
- ~~Extract each of the 12 criteria from `PROJECT-CONSISTENCY-AUDITS.md` into individual files.~~ 13 audit files created.
- ~~Ensure each file links to its template and lists per-project status.~~
- ~~Add `consistency` label to all project repos.~~ Added to 10 repos (ryll has no GitHub repo).
- ~~Create GitHub issues for all known non-compliant items.~~ 35 issues created across 10 repos.
Phase 2: CI-based audit runner -- DONE¶
- ~~Write audit check scripts (shell or Python) for each criterion.~~ `scripts/audit-check.py` checks 11 criteria (4 subjective criteria skipped: security-sanitization, console-logging, python-version, test-coverage).
- ~~Create a scheduled workflow in `development` that runs all checks across all projects.~~ `.github/workflows/consistency-audit.yml` runs daily at 06:00 UTC with a matrix of 10 repos.
- ~~Add issue creation/closure automation for audit results.~~ `scripts/audit-manage-issues.py` creates issues for failures and closes them when checks pass, using exact title matching against existing manually-created issues.
- Verify drift detection works after first CI run. Requires `AUDIT_TOKEN` secret to be configured on the development repo with cross-repo issue permissions.
Phase 3: Combined review+fix workflow¶
- Design the combined workflow preserving the two-checkout security model.
- Implement the "already reviewed" gate (check for existing bot review comment).
- Add the automatic trigger on successful functional test completion, gated by the above check.
- Add the manual trigger via `@shakenfist-bot please review and fix`.
- Implement as a reusable workflow in `shakenfist/actions`.
- Pilot on `shakenfist` (the reference project).
- Roll out to remaining projects.
Phase 4: Cleanup¶
- Retire `PLAN-consistency.md` once issue tracking is live.
- Archive `PROJECT-CONSISTENCY-AUDITS.md` with a pointer to the new `audits/` directory.
- Update all documentation in `docs/`.
Administration and logistics¶
Success criteria¶
We will know that this plan has been successfully implemented when the following statements are true:
- The project consistency audit is implemented in a scalable way with work items tracked in a more concrete manner.
- Code and configuration reuse is maximised across the repositories.
- It is easy to add new audit items and have them rolled out without a complete re-audit of all items.
- It is easy to semi-regularly re-audit all items looking for implementation drift.
- Documentation in `docs/` has been updated to describe these new features and how we use them.
Future work¶
We should list here any obvious extensions, known issues, unrelated bugs we encountered, and anything else we should one day do but have chosen to defer, so that we don't forget them.
...
Bugs fixed during this work¶
This section should list any bugs we encountered during development and fixed.
Back brief¶
Before executing any step of this plan, please back brief the operator as to your understanding of the plan and how the work you intend to do aligns with that plan.