Development Plans
This section contains forward-looking roadmaps for Shaken Fist development. These documents describe planned features and architectural directions.
Forward-Looking Statements
Plans describe intended future work and may change based on implementation experience, community feedback, or shifting priorities. Check the status table below to see what has been implemented.
Plan sequencing
The set of incomplete plans has grown to the point where the order they land in matters. The intended sequencing is:
- Network operations facade: complete. Landed via the network-facade branch.
- Retire etcd (absorbed into BYO MariaDB and sf-database tier phase 0; the standalone PLAN-remove-etcd.md has been removed). Inspection confirmed the etcd data drain is complete (the DATA_MIGRATIONS dict is empty), but the supporting machinery (shakenfist/etcd.py, etcd3gw, the etcd proto stubs, the drain test files, is_etcd_master, the migration-era sf-ctl aliases) is still in tree. Phase 0 of the BYO-MariaDB plan deletes it as a single sweep. The etcd_master ansible group name rename stays with PLAN-remove-primary phase 7 as a deploy-scope concern.
- Health checks, readiness, and graceful drain: complete. A precondition for the BYO load-balancer story in remove-primary being operationally honest. Delivered sf-api /livez, /readyz, and /healthz with SIGTERM drain, dependency-aware grpc.health.v1 on sf-database, systemd WATCHDOG liveness on the worker/elected daemons (which also closes the cluster-lock proof-of-life gap), and operator LB/upgrade docs. Landed on the health-checks branch.
- Remove the primary node: the BYO-infrastructure scope reduction. Phase 7 finishes the deployer-level etcd_master → database_node rename, by which point the drain code itself is long gone. Naturally followed by a wipe-and-redeploy of Mikal's production cluster against the new shape.
- BYO MariaDB and sf-database as a tier: lifted out of remove-primary because it grew into its own master plan. Removes MariaDB-server install from the deployer entirely, reshapes sf-database into a deployer-chosen tier of equal stateless instances reached via client-side gRPC load balancing (not leader election), and carves schema/migration execution out of daemon startup into an operator-run sf-ctl ensure-mariadb-schema command. Can land in parallel with remove-primary's remaining phases; its phase 1 also performs the scope-shift edit to remove-primary itself.
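The health-check item above pairs probe endpoints with a SIGTERM-driven drain. As a rough illustration of that pairing only, the sketch below keeps liveness passing while readiness starts failing once a drain begins, so a load balancer stops routing new requests to the process. The class name, method names, and response bodies are hypothetical and are not taken from the sf-api implementation.

```python
import signal
import threading


class DrainState:
    """Hypothetical tracker for whether this process should take new traffic."""

    def __init__(self):
        self._draining = threading.Event()

    def begin_drain(self, signum=None, frame=None):
        # The (signum, frame) signature matches what signal.signal()
        # passes to a handler, so this method can be registered directly.
        self._draining.set()

    def livez(self):
        # Liveness: the process is still running, even while draining.
        return 200, 'alive'

    def readyz(self):
        # Readiness: fail once draining so the load balancer backs off.
        if self._draining.is_set():
            return 503, 'draining'
        return 200, 'ready'


def install_sigterm_drain(state):
    # Assumed wiring: flip to draining on SIGTERM. Python only allows
    # signal handlers to be installed from the main thread.
    signal.signal(signal.SIGTERM, state.begin_drain)
```

A real daemon would also wait out a grace period and finish in-flight requests before exiting; that part is deliberately left out of this sketch.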
The remaining incomplete plans — Embrace TLS, Sticky blob transfers, Replace exec'd network commands with netlink, Atomic scheduling via reservations, the connected Generic allocator / Network service ports / Network carrier model triple, and the not-yet-drafted OpenTelemetry instrumentation thread — are intentionally not ordered relative to each other here. They each have specific dependencies on either remove-primary having established the BYO shape (the operator-provides-PKI surface for TLS, the streaming-proxy baseline for sticky transfers, the sf-database election pattern for the others) or network-facade having landed (the netlink plan, whose privilege-separation phases need network-facade's single-mutator property), but among themselves the order is a triage decision best made when remove-primary is close to landing rather than now. The scheduler-reservations plan is independent of the BYO shape but benefits from the OpenTelemetry thread landing first so that phase 0's design choices can be informed by real load and contention numbers.
The generic-allocator / network-service-ports / network-carrier-model triple is internally ordered. Generic-allocator is the foundational refactor (replaces five ad-hoc allocators with one primitive and is independently shippable). Network-service-ports builds on the allocator to expose per-network DNAT'd ports for managed services (web consoles, transfer agents, managed VPN endpoints). Network-carrier-model layers a smeared lease-based per-network carrier role with VIP advertisement on top, removing the network-node singleton; it depends on both prior plans and is the largest of the three. The triple supersedes the "network node failover" thread that was previously a not-yet-drafted line item.
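To make the "one primitive" idea concrete, here is a minimal sketch of a pool allocator whose correctness rests on a database uniqueness constraint rather than on a lock or a local race check. It is an illustration under stated assumptions, not the planned design: it uses the standard-library sqlite3 module instead of MariaDB, the column layout is invented, and it claims a value by attempting an INSERT and moving on when the key already exists, which may differ from the conditional-INSERT form the plan settles on in phase 0. Only the table name resource_pool_allocations comes from the status table.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical schema. The primary key on
    # (pool, value) is what makes a double allocation impossible.
    db = sqlite3.connect(':memory:')
    db.execute(
        'CREATE TABLE resource_pool_allocations ('
        ' pool TEXT NOT NULL,'
        ' value INTEGER NOT NULL,'
        ' owner TEXT NOT NULL,'
        ' PRIMARY KEY (pool, value))'
    )
    return db


def allocate(db, pool, owner, low, high):
    """Claim the lowest free value in [low, high], or return None."""
    for candidate in range(low, high + 1):
        try:
            with db:  # commits on success, rolls back on error
                db.execute(
                    'INSERT INTO resource_pool_allocations '
                    '(pool, value, owner) VALUES (?, ?, ?)',
                    (pool, candidate, owner))
            return candidate
        except sqlite3.IntegrityError:
            # Someone already holds this value; try the next one.
            continue
    return None


def release(db, pool, value):
    with db:
        db.execute(
            'DELETE FROM resource_pool_allocations '
            'WHERE pool = ? AND value = ?', (pool, value))
```

Because pools are just a column value, the same two functions could in principle serve VXLAN IDs, ports, vsock CIDs, and similar resources; per-pool policy and leasing are out of scope for this sketch.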
The blob-storage and SQL-pushdown roadmaps and the network-facade plan run on their own cadence and are not part of this sequencing.
Plan Status
| Plan | Phase | Status | Description |
|---|---|---|---|
| Blob Storage Roadmap | Phase 1: Hash Tracking | Complete | Move hash storage to MariaDB |
| Blob Storage Roadmap | Phase 2: Lazy Dedup | Future | Composite blobs and deduplication |
| Blob Storage Roadmap | Phase 3: Chunking | Future | Content-defined chunking |
| API Query Batching | Phase 1: Batch Infrastructure | Planning | Add batch query functions |
| API Query Batching | Phase 2: Prefetch Pattern | Future | Modify API to prefetch related data |
| API Query Batching | Phase 3: Generic Framework | Future | Declarative prefetch requirements |
| SQL-pushdown Filtering | Phase 1: Query Infrastructure | Complete | Typed criteria + generic find_objects primitive |
| SQL-pushdown Filtering | Phase 2: Artifact Pushdown | Complete | Push state/namespace/name for Artifact lookups to SQL |
| SQL-pushdown Filtering | Phase 3: Instance and Network Pushdown | Complete | Mirror Artifact pushdown for Instance and Network |
| SQL-pushdown Filtering | Phase 4: Iterator Rework | Complete | Port iterators to single pushed-down query |
| SQL-pushdown Filtering | Phase 5: Ad-hoc Bulk Scan Cleanup | Complete | Eliminate remaining full-table scans on filter paths |
| SQL-pushdown Filtering | Phase 6: Tests and Documentation | Complete | Coverage and docs updates |
| SQL-pushdown Filtering | Phase 7: Denormalised Child-UUID List Removal | Complete | Replace cached UUID lists on attributes tables with SQL queries |
| Replace last_cluster_operation | Phase 1: has_pending_cluster_operation query | Complete | New query API and tests |
| Replace last_cluster_operation | Phase 2: Switch gating callers | Complete | Move is_okay() and siblings off the single-pointer read |
| Replace last_cluster_operation | Phase 3: Auto-target tracking | Complete | *_create_and_enqueue writes target rows automatically |
| Replace last_cluster_operation | Phase 4: Remove explicit setters | Complete | Drop redundant set_last_cluster_operation callers |
| Replace last_cluster_operation | Phase 5: Documentation and final audit | Complete | Update docs, verify CI |
| Fix cluster_operation_targets UNIQUE constraint | Schema fix | Complete | Replace column-level UNIQUE(operation_uuid) with composite UNIQUE(operation_uuid, target_object_type, target_uuid) so multi-target ops record all their target rows |
| Network operations facade | Master plan | Complete | Split Network into a queue-enqueuing facade and a single-mutator worker so local daemons can no longer bypass net-worker's serialisation |
| Queue performance and coalescing | Steps 1-6 | In Progress | Unified batched dequeue, coalescible-task metadata, worker- and enqueue-side dedup of redundant cluster operations. Step 7 (measure with CI data, decide on fairness) outstanding. |
| Recurring cluster operations | Master plan | Stub | Cron-like framework for recurring cluster operations; absorbs scheduled_tasks.py and daemons/network/maintain.py; adds user-facing recurring tasks (e.g. snapshot every 24 hours) |
| Health checks | Phase 0: Research and decisions | Complete | Routing principle (LB probes only sf-api) collapsed OQ1/2/4/9; decided readiness cache, drain-grace/timeout reconciliation, WATCHDOG liveness + lock proof-of-life, auth, daemon classification |
| Health checks | Phase 1: sf-api endpoints and drain | Complete | /livez, /readyz, /healthz on sf-api; per-worker readiness checker; SIGTERM-driven drain (API_DRAIN_GRACE, reconciled timeouts) |
| Health checks | Phase 2: gRPC health on sf-database | Complete | grpc.health.v1.Health Check now tracks live MariaDB reachability via the daemon's ~10s loop; schema currency stays a startup refuse-to-start precondition; no Watch, no client healthCheckConfig |
| Health checks | Phase 3: WATCHDOG liveness wiring | Complete | Wired systemd WATCHDOG into the eight non-trivial daemons (WatchdogSec=60s, pet in idle() + cluster/cleaner heavy iterators); closes lock proof-of-life via watchdog-kill → lease-expiry failover |
| Health checks | Phase 4: Operator documentation | Complete | load_balancing.md HAProxy/nginx-FOSS/ALB probe configs; rolling-upgrade-with-drain in upgrades.md; live ci_drain_check.sh (in the actions repo) wired into functional-tests |
| Remove the primary node | Phase 1: Remove monitoring | Not started | Drop rsyslog aggregation from the deployer (Grafana and the primary-node Prometheus server already removed as warmup) |
| Remove the primary node | Phase 2: Bootstrap CLI | Not started | Idempotent sf-ctl bootstrap-cluster + bootstrap_operations table |
| Remove the primary node | Phase 3: Remove LB | Complete (pending CI) | Realised by Remove the Apache load balancer |
| Remove the primary node | Phases 4-5: (moved to BYO MariaDB) | (moved) | MariaDB BYO and sf-database-as-tier lifted into PLAN-byo-mariadb.md |
| Remove the primary node | Phase 6: Galaxy role | Not started | Repackage deployer as a per-node ansible-galaxy-style role |
| Remove the primary node | Phase 7: Rename and cleanup | Not started | etcd_master → database_node; final dead-code sweep |
| Remove the Apache load balancer | Phase 1: Document operator-provided LB | Complete | Example apache2 + nginx configs and the localhost:13000 single-node escape hatch (realises remove-primary phase 3) |
| Remove the Apache load balancer | Phase 2: Remove Apache from the deployer | Complete (pending CI) | Delete apache2.yml + apache-site-primary.conf; repoint single-node api_url to :13000 |
| BYO MariaDB and sf-database tier | Phase 0: Retire etcd machinery | Complete | Supersedes PLAN-remove-etcd; deleted etcd.py, etcd3gw, etcd protos, DATA_MIGRATIONS framework, drain tests, dead sf-ctl helpers and show/set-etcd-config aliases, stale migrate-* comments, .claude/skills/migrate-etcd-to-mariadb.md (is_etcd_master deferred to PLAN-remove-primary phase 7) |
| BYO MariaDB and sf-database tier | Phase 1: Statelessness and scope shift | Complete | Schema-versions lock; stop sf-database calling ensure_schema() at startup; daemon startup-version check; MariaDB compat check (version/engine/charset) on ensure-mariadb-schema and sf-database; lift scope out of remove-primary |
| BYO MariaDB and sf-database tier | Phase 2: Config untangle | Complete | Untangle MARIADB_HOST from "I am the database node"; rename DATABASE_NODE_IP → MARIADB_GATEWAY_HOSTS (plural list); bind sf-database on NODE_MESH_IP instead of DATABASE_NODE_IP |
| BYO MariaDB and sf-database tier | Phase 3: gRPC tier | Complete | Multi-endpoint client-side gRPC LB; minimal grpc.health.v1.Health on sf-database; channel factory at shakenfist/util/grpc_channel.py |
| BYO MariaDB and sf-database tier | Phase 4: Deploy BYO | Complete | getsf prompts for connection details; roles/mariadb/ deleted; tools/bootstrap-mariadb.sql ships; deploy.py stops generating a password; tuning .cnf moves to examples/; SHAKENFIST_MARIADB_HOST=localhost escape hatch dissolved |
| BYO MariaDB and sf-database tier | Phase 5: CI workflow step | Complete | New tools/ci-install-mariadb.sh helper + five workflow sites in functional-tests.yml/scheduled-tests.yml install MariaDB and pass GETSF_MARIADB_* env vars to getsf-wrapper |
| BYO MariaDB and sf-database tier | Phase 6: CI tier coverage | Complete | Multi-instance sf-database startup; MARIADB_GATEWAY_HOSTS rendered as a list; bind-all drop-in; ci-topology-slim-tier in shakenfist/actions; merge-queue matrix entry; functional LB-fanout test asserting each etcd_master saw at least 5% of traffic |
| BYO MariaDB and sf-database tier | Phase 7: Documentation | Complete | docs/operator_guide/database.md restructured to lead with BYO and stripped of historical etcd content; tier-model note added to ARCHITECTURE.md; deleted-skill bullet dropped from README.md; "Bring your own MariaDB" section appended to docs/release_notes/v07-v08.md |
| Embrace TLS | Phase 0: Research and decisions | Not started | Resolve open TLS questions into a decisions document |
| Embrace TLS | Phase 1: Cert reload | Not started | Graceful TLS material reload across daemons |
| Embrace TLS | Phase 2: sf-database mTLS | Not started | Canary mTLS path for the highest-traffic gRPC channel |
| Embrace TLS | Phase 3: Other gRPC mTLS | Not started | Extend mTLS to the remaining inter-daemon channels |
| Embrace TLS | Phase 4: MariaDB TLS | Not started | TLS on the SF-to-MariaDB connection |
| Embrace TLS | Phase 5: sf-api TLS | Not started | Optional native TLS on sf-api; document operator-LB story |
| Embrace TLS | Phase 6: Expiry monitoring | Not started | Cert expiry warnings as events + prometheus metrics |
| Embrace TLS | Phase 7: Dev CA | Not started | Repurpose pki_internal_ca as dev/test convenience only |
| Sticky blob transfers | Phase 0: Research and decisions | Not started | Resolve cookie format, LB coverage, and placement-interaction questions |
| Sticky blob transfers | Phase 1: Server-side cookies | Not started | sf-api emits and honours server-set sticky cookies |
| Sticky blob transfers | Phase 2: LB documentation | Not started | Document HAProxy / Envoy / cloud-LB / nginx configurations |
| Sticky blob transfers | Phase 3: Client verification | Not started | Verify SF Python client cookie handling end-to-end |
| Sticky blob transfers | Phase 4: Failover behaviour | Not started | Define recovery path when the sticky backend dies mid-session |
| Replace exec'd network commands with netlink | Phase 0: Research and decisions | Not started | Pick pyroute2.nftables vs python-nftables, handle sysctl / arping corners, scope the privexec split, pick auth model |
| Replace exec'd network commands with netlink | Phase 1: rtnetlink for link / addr / route / neigh | Not started | Port ip exec sites to pyroute2.IPRoute |
| Replace exec'd network commands with netlink | Phase 2: Bridge attributes via IFLA_BR_* | Not started | Replace brctl with rtnetlink bridge link attributes |
| Replace exec'd network commands with netlink | Phase 3: nftables rules via netlink | Not started | Port iptables rules to nftables in atomic transactions |
| Replace exec'd network commands with netlink | Phase 4: Stand up sf-net-privexec | Not started | New typed-API daemon holding CAP_NET_ADMIN, with net-worker as its only client |
| Replace exec'd network commands with netlink | Phase 5: Shrink sf-privexec | Not started | Drop CAP_NET_ADMIN and network RPCs from the existing privexec daemon |
| Replace exec'd network commands with netlink | Phase 6: Cleanup | Not started | Close out remaining sf-net direct-exec sites and the sysctl / arping corners |
| Atomic scheduling via reservations | Phase 0: Research and decisions | Not started | Resolve conditional-INSERT vs SELECT-FOR-UPDATE, reservation row schema, lifecycle states, affinity model rework, batch-create semantics, generic-vs-specific |
| Atomic scheduling via reservations | Phase 1: node_reservations schema | Not started | Schema and migration for the reservation table |
| Atomic scheduling via reservations | Phase 2: Conditional-INSERT primitive | Not started | The scheduling primitive that filters and claims atomically |
| Atomic scheduling via reservations | Phase 3: Reservation lifecycle | Not started | Consume on building, explicit release on failure, leased TTL reaper |
| Atomic scheduling via reservations | Phase 4: Migrate callers | Not started | Port the three in-process Scheduler() call sites |
| Atomic scheduling via reservations | Phase 5: Batch-create API | Not started | All-or-nothing multi-instance create primitive |
| Atomic scheduling via reservations | Phase 6: Affinity model rework | Not started | Implement the affinity decision from phase 0 |
| Atomic scheduling via reservations | Phase 7: Diagnostic-mode rejection logging | Not started | Restore per-rejection detail on failed schedules without paying the cost on every success |
| Atomic scheduling via reservations | Phase 8: Documentation | Not started | Operator guide for the new model and migration notes |
| Remove the eventlog service | Phase 1: Schema, accessors, RPC, row-count gauge | Complete | events / event_objects tables, RecordEventBatch on sf-database, database_events_rows gauge |
| Remove the eventlog service | Phase 2: Write cut-over and metrics | Complete | Swap drainer's gRPC target to sf-database; promote event_uuid / request_id; wire spool-depth, drop, insert metrics |
| Remove the eventlog service | Phase 3: Prune in cluster daemon | Complete | Move per-event-type prune sweep into the cluster maintainer with multi-object semantics |
| Remove the eventlog service | Phase 4: REST API direct-read | Complete | Event-list endpoints call GetObjectEvents on sf-database; no sqlite locality |
| Remove the eventlog service | Phase 5: Delete the daemon | Complete | Remove sf-eventlog, gRPC protos, systemd unit, config, event_dlq, and on-disk sqlite chunks |
| Remove the eventlog service | Phase 6: Documentation | Complete | Operator guide for new eventlog, history-loss called out in release notes, ARCHITECTURE/README/AGENTS |
| Generic allocator | Phase 0: Research and decisions | Not started | Pick allocation strategy, per-pool policy shape, leased-vs-permanent semantics, migration plan |
| Generic allocator | Phase 1: Schema and primitive | Not started | resource_pool_allocations table and conditional-INSERT allocator |
| Generic allocator | Phase 2: Port VXLAN allocator | Not started | First migration; sets the template |
| Generic allocator | Phase 3: Port console / VDI ports | Not started | Drop the local socket.bind() race-check |
| Generic allocator | Phase 4: Port vsock CID allocator | Not started | Drop the global cluster lock |
| Generic allocator | Phase 5: Port MAC allocator | Not started | Fix today's probabilistic-only correctness |
| Generic allocator | Phase 6: Documentation | Not started | Audit-log surface and developer docs |
| Network service ports | Phase 0: Research and decisions | Not started | Pick port range, TLS-on-shared-IP, token model, two-stage cleanup ordering, carrier-coupling contract |
| Network service ports | Phase 1: Pool registration | Not started | Register service-port pool with the generic allocator |
| Network service ports | Phase 2: API and token issuance | Not started | allocate_service_port / release_service_port |
| Network service ports | Phase 3: Carrier-side DNAT | Not started | Network daemon programs DNAT rules per allocation |
| Network service ports | Phase 4: Reaper and reconciler | Not started | Drift detection and repair across DB and iptables state |
| Network service ports | Phase 5: Validation surface | Not started | Smoke-test caller or first real caller |
| Network service ports | Phase 6: Documentation | Not started | Operator and developer docs including threat-surface change |
| Network carrier model | Phase 0: Research and decisions | Not started | Resolve pool config, lease TTL, advertisement modes, DHCP state, VLAN-trunk forward compatibility |
| Network carrier model | Phase 1: Carrier pool config | Not started | Eligible-carrier set and node-capability declaration |
| Network carrier model | Phase 2: Per-network carrier lease | Not started | Lease primitive backed by cluster_locks and the generic allocator |
| Network carrier model | Phase 3: Renderer process | Not started | Carrier-side process that materialises leased state to kernel |
| Network carrier model | Phase 4: SNAT and floating IP | Not started | Render SNAT and egress IP via the carrier |
| Network carrier model | Phase 5: Service ports via renderer | Not started | Carrier-side hookup for PLAN-network-service-ports |
| Network carrier model | Phase 6: DHCP persistence | Not started | DHCP leases survive carrier change |
| Network carrier model | Phase 7: DNS via renderer | Not started | DNS records rendered from data |
| Network carrier model | Phase 8: BGP advertisement | Not started | Embedded or operator-external BGP speaker |
| Network carrier model | Phase 9: L2 / GARP advertisement | Not started | Alternative for non-routed deployments |
| Network carrier model | Phase 10: Migration | Not started | Cutover from singleton network node to smeared carriers |
| Network carrier model | Phase 11: Operator documentation | Not started | VIP failover modes and pool sizing |
| OIDC authentication | Master plan | Stub | OIDC as an authentication option for human users; existing namespace keys re-framed as service-account tokens for automation |
| Artifact UX rework | Master plan | Stub | Rework the upload/blob/artifact/label/snapshot surface to remove usability sharp edges (ambiguous name resolution, blob-UUID juggling, underpowered labels, instance-costumed snapshots); adopts #3271, #1634, #1167, #592, #877, #1386, #833, #422 |
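The "Fix cluster_operation_targets UNIQUE constraint" row in the table above is easier to follow with a small demonstration. The sketch below uses the standard-library sqlite3 module as a stand-in for MariaDB and a cut-down, assumed version of the table containing only the three columns named in that row. It shows why a column-level UNIQUE on operation_uuid drops every target after the first for a multi-target operation, while the composite constraint records them all and still rejects exact duplicates.

```python
import sqlite3

# Assumed, simplified schemas; the real table may have more columns.
COLUMN_LEVEL = (
    'CREATE TABLE cluster_operation_targets ('
    ' operation_uuid TEXT NOT NULL UNIQUE,'
    ' target_object_type TEXT NOT NULL,'
    ' target_uuid TEXT NOT NULL)'
)

COMPOSITE = (
    'CREATE TABLE cluster_operation_targets ('
    ' operation_uuid TEXT NOT NULL,'
    ' target_object_type TEXT NOT NULL,'
    ' target_uuid TEXT NOT NULL,'
    ' UNIQUE (operation_uuid, target_object_type, target_uuid))'
)


def rows_recorded(schema, targets):
    """Insert one row per target for a single operation; count successes."""
    db = sqlite3.connect(':memory:')
    db.execute(schema)
    recorded = 0
    for object_type, uuid in targets:
        try:
            db.execute(
                'INSERT INTO cluster_operation_targets VALUES (?, ?, ?)',
                ('op-1', object_type, uuid))
            recorded += 1
        except sqlite3.IntegrityError:
            # The constraint rejected this row.
            pass
    db.close()
    return recorded
```

With two distinct targets for the same operation, the column-level schema keeps one row and the composite schema keeps both.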
Status Definitions
- Stub: Framing recorded for future detailed planning; not yet ready to execute
- Not started: Plan exists, work not yet begun
- Planning: Design complete, implementation not yet started
- In Progress: Currently being implemented
- Complete: Implemented and released
- Future: Planned but not yet designed in detail
- Blocked on preconditions: Plan exists but explicitly waits on another plan or external event before work can begin