Database Architecture¶

Shaken Fist uses MariaDB as its sole data store. This page describes the bring-your-own MariaDB setup workflow, the compatibility requirements, the configuration keys that control how cluster nodes reach the database tier, the administrative commands, the table inventory, and the schema system.

Bring-your-own MariaDB setup¶

Shaken Fist does not bundle or install MariaDB. The operator provisions the database server before deploying the cluster.

Provisioning checklist¶

Provision a MariaDB server. Any host reachable from every SF node works — it need not be an SF node itself. The server must meet the compatibility requirements below (MariaDB 10.6.0+, InnoDB, utf8mb4).
Apply the bootstrap snippet. The repository ships tools/bootstrap-mariadb.sql, which creates the shakenfist database, the shakenfist user, and the required grants. Replace __REPLACE_ME__ with the password you want:
```
sed 's/__REPLACE_ME__/your-password/' tools/bootstrap-mariadb.sql | mysql -u root
```
The snippet is idempotent and safe to re-run.
(Optional) install the recommended tuning. examples/mariadb-tuning.cnf ships a set of starting-point InnoDB and connection-pool settings tuned for a small-to-medium SF cluster. Copy it and restart MariaDB:
```
sudo cp examples/mariadb-tuning.cnf /etc/mysql/mariadb.conf.d/
sudo systemctl restart mariadb
```
The values are starting points, not prescriptions — adjust them to match your hardware and workload.
Deploy Shaken Fist. Set the mariadb_host, mariadb_port, mariadb_user, mariadb_password and mariadb_database variables in your deployment's group_vars/all.yml to match what you provisioned, then run the deploy playbook (see Installation). Only database-tier nodes render these values into /etc/sf/config.
Schema initialisation. The example playbooks run sf-ctl ensure-mariadb-schema themselves (once, delegated to a database-tier node) before any node registers, so no manual step is needed on first deploy. You can also run it manually at any time from a node that has MARIADB_HOST configured (see Administrative Commands).

Single-box example¶

For a single-machine deployment, the complete workflow is:

sudo apt install mariadb-server
sed 's/__REPLACE_ME__/mypassword/' tools/bootstrap-mariadb.sql | sudo mysql -u root
sudo cp examples/mariadb-tuning.cnf /etc/mysql/mariadb.conf.d/   # optional
sudo systemctl restart mariadb
ansible-playbook -i examples/single-node/inventory.yaml examples/single-node/site.yml

(having first set the mariadb_* variables in examples/single-node/group_vars/all.yml to match — see Installation).

MariaDB compatibility requirements¶

Before sf-database starts, and before sf-ctl ensure-mariadb-schema applies any schema work, the server is checked against these requirements:

MariaDB, not MySQL. The VERSION() string must contain MariaDB. Shaken Fist uses MariaDB-specific column types (such as INET4) that are not available in MySQL.
Version 10.6.0 or later. This matches the version shipped with Ubuntu 22.04 LTS and is well above the 10.2 JSON-features floor.
Default storage engine: InnoDB. Shaken Fist relies on row-level locking and transactional semantics provided by InnoDB.
Default character set: utf8mb4. Required for full Unicode support, including supplementary characters.
Default collation: any utf8mb4_* collation. The exact collation within the utf8mb4 family is not mandated.

sf-ctl ensure-mariadb-schema runs these checks before touching any schema objects and refuses to proceed if the server does not meet them, printing a multi-line error that lists every failing check. The same checks run at sf-database startup; the daemon refuses to start on an incompatible server.

After an SF version bump that includes schema changes, you must run sf-ctl ensure-mariadb-schema before starting sf-database. If you skip this step, the daemon will refuse to start with a schema-version mismatch error that names the command to run.

Why MariaDB and what it stores¶

MariaDB is the sole data store for Shaken Fist. All object state, IPAM reservations, cluster operations, work queues, locks, metrics, and cluster configuration live there. The single-store shape gives the system efficient indexed queries by object type and state value, atomic IP-address reservation via database uniqueness constraints, and transactional operation enqueue (the cluster-operation header, the state row, and the queue row are written atomically in a single transaction).

Only the database service daemon (sf-database) has direct access to MariaDB. All other daemons reach MariaDB through sf-database's gRPC interface. This keeps connection management in one place, gives consistent Prometheus metrics for every database operation, and makes the tier independently scalable. The shakenfist.mariadb module dispatches automatically: if MARIADB_HOST is set (the sf-database daemon and the schema tool) it uses direct SQLAlchemy access; otherwise it goes over gRPC to the sf-database tier listed in MARIADB_GATEWAY_HOSTS.

The driver layer uses the mariadb:// SQLAlchemy dialect so MariaDB-specific column types such as INET4 (4-byte IPv4 storage with native comparison and indexing) are available. The underlying client library (mysqlclient) remains the same because MariaDB maintains MySQL protocol compatibility.

SQL Filter Pushdown¶

Object iteration uses a single indexed SQL query per call rather than materialising all rows and filtering in Python.

The filter criteria shape is ObjectFilterCriteria in shakenfist/schema/object_filter.py:

from shakenfist.schema.object_filter import ObjectFilterCriteria

criteria = ObjectFilterCriteria(
    states=['created'],          # None means no state filter; [] is a no-op at the SQL layer
    namespace='tenant-a',        # None means no namespace filter
    name=None,                   # None means no name filter
    network_uuid=None,           # FK filter — see NetworkInterface special case below
    instance_uuid=None,          # FK filter — see NetworkInterface special case below
)

None on any field is "do not filter on this field". An empty list on states behaves the same as None at the MariaDB layer, but callers may pass [] to express "no matching states" explicitly for future use.

The find_* primitives. Four public functions in shakenfist.mariadb follow the naming convention find_<type>: find_artifacts, find_instances, find_networks, and find_network_interfaces. Each one JOINs the per-type static-values table to object_states on uuid and object_type, then applies whichever of the three optional WHERE clauses the criteria specifies. The JOIN is always covered by the composite index idx_object_states_type_state on (object_type, state_value). The per-type name and namespace columns each have their own single-column index on the type table.

When to use which entry point.

Name lookups from REST handlers should call the per-type from_db_by_ref(name, namespace=ns) class method (e.g. Artifact.from_db_by_ref(ref, namespace=ns)). This override was added in phases 2 and 3 and pushes the name equality predicate to SQL.

Bulk iteration scoped by state and/or namespace should use the iterator constructor directly: Artifacts(namespace=ns, prefilter='active'), Instances(namespace=ns), Networks(namespace=ns). The iterator's _find override builds an ObjectFilterCriteria from the constructor arguments and delegates to the appropriate find_* primitive, so both state and namespace reach SQL without a second round-trip.

Arbitrary-predicate filtering — logic that has no simple SQL equivalent, such as namespace_or_shared_filter which must JOIN the artifact_attributes table to check the shared flag — should pass a callable to the filters= argument of the iterator, or call .filter([predicate]) on the class. These predicates execute in Python after the indexed SQL scan returns its rows.

NetworkInterface special case. The network_interfaces table has no namespace or name column. find_network_interfaces therefore strips both fields from the criteria before building the query; they are silently ignored. State pushdown still works. The two FK filter fields network_uuid and instance_uuid are honoured — they map to indexed columns on the network_interfaces table, and they are how Network.networkinterfaces and Instance.interfaces resolve their per-parent NI list (phase 7 of the SQL-pushdown plan: those properties return hydrated NetworkInterface objects rather than the cached UUID list that used to live on the attribute table). The other find_* helpers leave the FK fields at their default of None because the underlying tables have no matching column.

See the Future-work entry in docs/plans/PLAN-sql-pushdown-filtering.md ("NetworkInterface namespace column") for the deferred discussion of whether to add the column or use a JOIN-based approach once a concrete caller exists.

Example.

from shakenfist import mariadb
from shakenfist.schema.object_filter import ObjectFilterCriteria

criteria = ObjectFilterCriteria(states=['created'], namespace='tenant-a')
for a in mariadb.find_artifacts(criteria):
    ...

MARIADB_HOST vs MARIADB_GATEWAY_HOSTS¶

These two config keys are orthogonal and serve different purposes. Understanding the distinction helps when troubleshooting or planning a deployment.

MARIADB_HOST is set only on nodes that have direct access to the MariaDB server. In practice this means nodes running sf-database, and any node where an operator runs sf-ctl ensure-mariadb-schema. The presence of MARIADB_HOST tells the shakenfist.mariadb module to bypass the gRPC layer and talk to MariaDB directly using SQLAlchemy. Ordinary cluster nodes (running sf-api, sf-queues, etc.) do not have MARIADB_HOST set and should never need it.

MARIADB_GATEWAY_HOSTS is set on every cluster node. It is the list of sf-database gRPC endpoints that non-database daemons connect to. For a single-instance deployment this list has one entry; for higher availability, list multiple sf-database endpoints and the gRPC client library round-robins requests across them.

A node running sf-database has both keys set: MARIADB_HOST for its own direct MariaDB access, and MARIADB_GATEWAY_HOSTS so that any client library running on the same node can still reach the database tier over gRPC (for example, when sf-api and sf-database are co-located).

In summary:

Who uses it	Config key	What it does
`sf-database`, schema tool	`MARIADB_HOST`	Direct SQLAlchemy → MariaDB
All other daemons	`MARIADB_GATEWAY_HOSTS`	gRPC → `sf-database` tier
`sf-database` itself (gRPC listener)	`MARIADB_GATEWAY_PORT`	Port each `sf-database` binds on (default 13005)
Prometheus scraper	`MARIADB_GATEWAY_METRICS_PORT`	Metrics port on each `sf-database` instance (default 13006)

Multi-instance deployments: More than one sf-database instance can run against the same MariaDB server. List every instance's mesh IP in MARIADB_GATEWAY_HOSTS, comma-separated — for example, MARIADB_GATEWAY_HOSTS="10.0.0.20,10.0.0.21,10.0.0.22". Every sf-database instance must be able to reach the MariaDB server; in BYO deployments this typically means the operator's MariaDB is bound to a routable interface rather than 127.0.0.1. This multi-instance shape is exercised by CI on every merge-queue run, so operators can rely on it as a supported production configuration.

Load balancing: When MARIADB_GATEWAY_HOSTS is a multi-element list, every SF daemon connects to the tier with a gRPC channel that round-robins requests across the listed endpoints. Dead endpoints are skipped automatically: the round-robin policy avoids subchannels whose TCP connection is down, and aggressive client keepalives (a ping every 10 seconds with a 5 second timeout) detect a hung instance within about 15 seconds. There is no external load balancer to configure -- the round-robin behaviour and failure detection are inside the gRPC client library. sf-database also publishes the standard grpc.health.v1.Health protocol against the empty-string service name for external monitoring via unary Check calls. Watch-based client-side health checking (healthCheckConfig) is deliberately not enabled: the synchronous health servicer can deadlock the gRPC server's event thread when Watch streams open and close concurrently.

Static object value cache¶

Each SF process caches the immutable static values of objects (their get_<type>() results — uuid, name, version and other create-time columns) in memory, so repeatedly loading the same node, blob or instance does not hit sf-database every time. Mutable data (object state, metadata, attributes, IPAM, daemon states) is never cached. The cache invalidates itself when this process updates or deletes an object; a change made by another process is picked up when the entry's TTL expires.

Two settings control it, both in seconds:

Setting	Default	Applies to
`OBJECT_CACHE_TTL_IMMUTABLE`	300	instance, network, networkinterface, agentoperation (no post-creation writer)
`OBJECT_CACHE_TTL_MUTABLE`	30	node, blob, artifact, upload, dnsmasq, namespace (rewritten only by an online version upgrade)

Setting a value to 0 disables caching for that tier — a fast rollback to pure read-through that needs only a config change and a restart, no code change. The database_object_cache_hits_total, database_object_cache_misses_total and database_object_cache_evictions_total Prometheus counters (labelled by object_type) report cache behaviour, and a working cache shows up as reduced database_get_<type>_total rates on the sf-database tier.

Attributing database load to callers¶

Alongside the per-operation database_<op>_total counters, sf-database publishes database_requests_total{operation, caller_daemon} — the same request stream, but labelled with the daemon that issued each call. The calling daemon is carried as gRPC metadata (caller-daemon) stamped by every SF client and counted by a server-side interceptor, so the attribution is complete even for daemons (sf-api, sf-net, sf-queues, …) that expose no metrics endpoint of their own. A caller that never identified itself, or a one-shot such as the config bootstrap, shows up as caller_daemon="unknown".

Use it to answer "which daemon drives operation X", for example the hottest callers overall:

topk(15, sum by (caller_daemon, operation) (rate(database_requests_total[5m])))

The existing database_<op>_total counters are unchanged — this metric is additive — so summing database_requests_total by operation should track the matching database_<op>_total rate (the operation label is the PascalCase RPC name, e.g. GetNode, so it reads as the CamelCase form of the counter suffix). The caller-node metadata is also sent but not yet a label; it is reserved for the mTLS peer-identity cross-check.

Monitoring sf-database with grpc-health-probe¶

sf-database reports live MariaDB reachability through the standard grpc.health.v1/Check RPC on the empty-string ("") service name. Operators can probe this with grpc-health-probe:

grpc-health-probe -addr=<sf-database-host>:13005

The status reflects the outcome of the most recent ~10 s background poll: SERVING means sf-database can reach MariaDB; NOT_SERVING means the last poll failed. Schema currency (whether the schema is up to date) is only checked at startup and is not a runtime signal — a running sf-database instance always has an up-to-date schema.

sf-api consumes this signal through its per-worker readiness checker (shakenfist/external_api/health.py). When sf-database reports NOT_SERVING, the checker flips the cached flag after three consecutive failures and sf-api's /readyz endpoint begins returning 503. A load balancer probing /readyz will then drain the sf-api worker before MariaDB connectivity is restored. Recovery is asymmetric: once sf-database recovers and the checker sees a single SERVING response, /readyz returns 200 and the worker is restored to rotation automatically.

Client behaviour during an outage¶

When a daemon cannot reach any sf-database instance, its gRPC calls retry up to three times on UNAVAILABLE / DEADLINE_EXCEEDED and then raise shakenfist.exceptions.DatabaseUnavailable. The gRPC channel is only rebuilt between attempts on DEADLINE_EXCEEDED (the wedged-subchannel signature); on UNAVAILABLE the warm round_robin channel is kept so a surviving gateway can serve the retry while the failed one reconnects in the background. This is deliberately distinct from the "object not found" return values the client library uses for genuinely missing objects, so code never mistakes an outage for a missing object and, for example, cleans up something that still exists.

What an operator sees during an outage:

Daemon work loops log the DatabaseUnavailable errors and retry; they do not exit. The queues daemon explicitly pauses queue processing and waits for its health checks to pass again.
Held cluster locks survive short outages: the lock refresher retries every ~2s and the lease only lapses if the outage outlives it (see the locks documentation). Lock acquisition keeps retrying inside the caller's timeout.
sf-api returns 500s for requests that need the database, and its /readyz endpoint goes 503 (via the sf-database health signal above) so load balancers drain it until the database returns.

Recovery is automatic once an sf-database instance is reachable again; no daemon restarts are required.

Rolling restarts of the database tier¶

The deployer restarts database-tier nodes one at a time (serial: 1), and three mechanisms cooperate so cluster clients ride through the roll without a visible outage:

Graceful drain on stop. When sf-database receives a stop it flips its health status to NOT_SERVING, stops accepting new RPCs, and then lets in-flight RPCs finish for up to DATABASE_DRAIN_GRACE seconds (default 10) before forcing the server down. The grace is a cap, not a fixed delay — shutdown proceeds as soon as the last in-flight call ends. If you raise it, keep it comfortably below the systemd unit's TimeoutStopSec (30s) or systemd will SIGKILL the daemon mid-drain.
A health gate before the next node restarts. After (re)starting sf-database the deploy runs sf-ctl gateway-health (see gateway-health below) and retries until the gateway reports SERVING, so the next node in the roll is never taken down while this one is only half-up (listening on its port but unable to reach MariaDB yet).
A settle pause so clients reconnect. Once the gateway is healthy, the deploy pauses for sf_database_roll_settle_seconds (an Ansible variable, default 10) before moving to the next node. Cluster clients re-establish their round_robin subchannel to the recovered gateway on their own reconnect backoff; without the pause, the next restart could briefly leave a client with no READY backend at all. The pause is skipped on idempotent no-op deploys that did not restart anything.

Administrative Commands¶

The sf-ctl command provides several database-related administrative functions. These commands are typically used during cluster bootstrap and maintenance.

ensure-mariadb-schema¶

Ensures the MariaDB schema exists and is up to date. This command must be run on a node with direct database access (i.e. MARIADB_HOST configured):

sf-ctl ensure-mariadb-schema

The command first performs a compatibility check against the requirements listed in MariaDB compatibility requirements above, then creates any missing tables and applies pending schema migrations. Operators must run this command (or ensure their deployment automation runs it) before starting sf-database whenever an SF upgrade includes schema changes.

initialise-node¶

Creates a node record in the database. By default, it uses the local node's configuration:

sf-ctl initialise-node

From a database-tier node (one with MARIADB_HOST already in /etc/sf/config), the command can initialise any node in the cluster without any env-var prefix:

sf-ctl initialise-node --node-name sf-2 --node-mesh-ip 10.0.0.2

gateway-health¶

Checks that an sf-database gateway reports SERVING via the standard grpc.health.v1/Check RPC, exiting zero on SERVING and non-zero otherwise. On sf-database, SERVING means MariaDB is reachable and the schema was current at startup — a stronger signal than the port merely accepting connections. By default it probes this node's own gateway (NODE_MESH_IP):

sf-ctl gateway-health
sf-ctl gateway-health --host 10.0.0.2 --timeout 5

This is the health gate the deploy's serial rolling restart waits on before moving to the next database-tier node (see Rolling restarts of the database tier above). It is equivalent to a grpc-health-probe against port 13005, but needs no extra binary installed on the node.

register-daemon¶

Registers one or more daemons on a node. By default, it registers on the local node:

sf-ctl register-daemon sentinel-first privexec nodelock

From a database-tier node (one with MARIADB_HOST already in /etc/sf/config), daemons can be registered against any node in the cluster:

sf-ctl register-daemon database --node-name sf-1

MariaDB Table Inventory¶

The MariaDB schema uses different table patterns depending on the data characteristics. This section is a developer- and operator-facing reference for the per-table layout.

Table Architecture¶

Shared Tables (DatabaseBackedObject level)¶

Data that has the same schema across all object types is stored in shared tables with (object_type, object_uuid) keys:

Table	Purpose
`object_states`	State value, update time, message for all objects
`object_metadata`	User-defined metadata for all objects

These tables are efficient for cross-type queries (e.g., "find all objects in error state").

High-Churn Dedicated Tables¶

Some data has high write frequency or requires atomic operations with database constraints. These get dedicated tables optimized for their access patterns:

Table	Purpose
`ipam_reservations`	IP address allocations with uniqueness constraints
`cluster_operations`	Full cluster operation metadata with indexed `node_uuid`, `instance_uuid`, `network_uuid` and `priority` columns extracted from JSON for dispatch-time filtering
`work_queue`	Per-job queue row with `queue_name`, `scheduled_at`, `claimed_at`, `claimed_by`, `attempts` and `payload`. Dequeue uses `SELECT ... FOR UPDATE SKIP LOCKED`
`cluster_operation_targets`	Operation-to-object targeting with AUTO_INCREMENT ordering
`cluster_operation_errors`	One row per failed cluster operation, keyed by `op_uuid`. Stores the structured `ErrorReport` (code, message, details, origin_class, traceback) JSON. Cleaned up alongside the `cluster_operations` row by `BaseClusterOperation.hard_delete()` when the cluster cleaner reaps a terminal-state op
`node_metrics`	Ephemeral per-node resource metrics with semi-schemaless JSON payload
`node_daemon_states`	Per-`(node, daemon)` state rows; atomic upsert per daemon, no Python-side coarse lock
`cluster_locks`	Leased distributed locks. `expires_at` lets candidates steal a dead holder's lock without external GC; holders refresh every ~20 s while alive

IPAM reservations are stored separately because:

Atomic allocation: Database uniqueness constraints prevent race conditions
High churn: Addresses are frequently reserved and released
Cross-object queries: Need to find all addresses for an IPAM, not just one object

Cluster operation headers (cluster_operations) and work queue rows (work_queue) live in MariaDB so the create-and-enqueue step can run in a single transaction (header row + state row + queue row). The work_queue table uses MariaDB row locking with SELECT ... FOR UPDATE SKIP LOCKED for race-safe dequeue.

The cluster daemon runs reap_stuck_cluster_operation_jobs() from shakenfist/daemons/cluster/scheduled_tasks.py on a one-minute schedule. It re-queues or rejects rows whose claim has gone stale:

CLUSTER_OP_STUCK_THRESHOLD — seconds before a claimed row is considered stuck (default 1800). Lower values detect crashed workers faster at the cost of possibly re-queuing merely slow jobs.
CLUSTER_OP_MAX_ATTEMPTS — maximum claim attempts before the reaper stops re-queuing and transitions the underlying cluster operation to STATE_ERROR (default 5). Protects the queue from a "job of death" that crashes every worker.
CLUSTER_METRICS_PORT — Prometheus scrape port exposed by the cluster daemon (default 13007). Metrics cluster_op_reaper_requeued_total and cluster_op_reaper_rejected_total record reaper activity.

Cluster operation targets are stored separately because:

Append-only history: Every operation enqueued against an object creates a row, giving full operation history per target
Automatic ordering: AUTO_INCREMENT sequence_number replaces the implicit dependency chain traversal
Indexed queries: Efficient lookups for "latest operation on this instance" and "all operations on this object in order"

Because the table is append-only, it is bounded by a periodic prune in the cluster daemon (alongside the existing delete_stale_transfers cleanup). The cluster daemon runs cluster-wide cleanup under ClusterLock election, so the prune naturally runs from a single node at a time. The prune removes rows whose created_at is older than CLUSTER_OPERATION_TARGET_RETENTION seconds and whose operation is not currently in an active state (queued, preflight, or executing) in object_states. Operations still in flight are never pruned regardless of age. Set CLUSTER_OPERATION_TARGET_RETENTION to 0 to disable pruning entirely (the default is 7 days).

Cluster Operation Target Tracking¶

The cluster_operation_targets table holds one row per (operation, target object) pair. Each row carries the target's object type and UUID, plus the operation_uuid and an AUTO_INCREMENT sequence_number that gives total ordering per target.

Two query shapes are exposed to the rest of the system:

get_latest_cluster_operation_target: returns the highest-sequence row for a given (object_type, uuid) pair, regardless of state. Used by the last_cluster_operation property and external_view() projections to provide the familiar "which op ran last?" answer.
has_pending_cluster_operation_target: returns True if any row for the object references an operation whose state is queued, preflight, or executing. Used by Network.is_okay() and any other gate that must defer while work is in flight. Because it checks all rows rather than only the latest one, a later terminal operation cannot mask an earlier in-flight one.

Rows are written automatically by enqueue_cluster_operation; operators do not need to manage them. Pruning is performed by the cluster daemon under ClusterLock election via _direct_delete_stale_cluster_operation_targets: rows older than CLUSTER_OPERATION_TARGET_RETENTION whose operation has reached a terminal state are removed; in-flight operations are never pruned.

Per-Type Static Value Tables¶

Each concrete object type that is migrated gets its own table for static values (immutable data set at creation time):

Table	Object Type	Fields
`uploads`	Upload	uuid, node, created_at, version
`dnsmasq`	DnsMasq	uuid, namespace, owner_type, owner_uuid, provide_dhcp, provide_dns, version
`blobs`	Blob	uuid, modified, fetched_at, version
`nodes`	Node	uuid, fqdn (unique index), ip, version
`namespaces`	Namespace	name (VARCHAR PK), version
`artifacts`	Artifact	uuid, artifact_type, source_url, name, namespace, version
`network_interfaces`	NetworkInterface	uuid, network_uuid, instance_uuid, macaddr, ipv4, order, model, version
`ipams`	IPAM	uuid, namespace, network_uuid, ipblock, version
`networks`	Network	uuid, name, namespace, netblock, provide_dhcp, provide_nat, provide_dns, vxid (unique), egress_nic, mesh_nic, version
`agent_operations`	AgentOperation	uuid, namespace, instance_uuid (indexed), commands (JSON list), version
`instances`	Instance	uuid, cpus, disk_spec (JSON), memory, name, namespace (indexed), requested_placement (JSON), ssh_key, user_data, video (JSON), uefi, configdrive, nvram_template, secure_boot, machine_type, side_channels (JSON), version

These tables use the object's UUID as the primary key, except for namespaces which uses the namespace name (a string) as its primary key.

Per-Type Attribute Tables¶

Mutable attributes that are specific to an object type are stored in dedicated attribute tables:

Table	Object Type	Key Fields
`blob_attributes`	Blob	uuid, size, info, last_used, retention
`node_attributes`	Node	uuid, last_seen, installed_version, roles, daemons, versions, metrics. Per-daemon state lives in `node_daemon_states` since v19; the legacy `daemon_states` JSON column on this table is no longer read or written. Instance placement lives in `object_references` as `instance_location` rows since `object_references` schema v3; for one transition release the legacy `instances` JSON column is dual-written and unioned into reads so rolling upgrade and rollback both see fresh placements
`namespace_attributes`	Namespace	name, keys (JSON), trust (JSON)
`artifact_attributes`	Artifact	uuid, max_versions, shared, highest_index
`artifact_indexes`	Artifact	artifact_uuid + index_number (composite PK), blob_uuid
`network_interface_attributes`	NetworkInterface	uuid, floating_address
`network_attributes`	Network	uuid, floating_gateway, hosteddns (JSON dict)
`agent_operation_attributes`	AgentOperation	uuid, results (JSON dict)
`instance_attributes`	Instance	uuid, placement (JSON), power_state (JSON), ports (JSON), enforced_deletes (JSON), block_devices (JSON), agent_state (JSON), agent_attributes (JSON), agent_operations (JSON), kvm_pid, error_message, vsock_cids (JSON dict)

Node attributes consolidate observed state, roles, daemons and versions into a single row.

Namespace attributes consolidate keys (authentication) and trust (namespace trust relationships) into a single row.

Because these tables pack several logically independent attributes into one row, every updater passes a field mask naming exactly the columns it changed (see update_*_attributes in shakenfist/mariadb.py). Full-row read-modify-write cycles are reserved for row creation and schema upgrades: with concurrent writers on different nodes, an unmasked write pushes a stale snapshot of the other columns over any update committed since the writer read the row (a cross-attribute lost update).

Node Identity and UUID Persistence¶

Each node in the cluster is assigned a real UUID (UUID version 4) when it first registers with the cluster. Previously, nodes used their FQDN as a fake UUID, but all nodes now have proper UUIDs stored in the nodes MariaDB table with the FQDN as a separate uniquely-indexed column.

To avoid an FQDN-to-UUID database lookup on every daemon startup, the node UUID is persisted locally to {STORAGE_PATH}/node_uuid (typically /srv/shakenfist/node_uuid). On subsequent daemon starts, the UUID is read from this local file for a direct database lookup by primary key.

The node UUID can also be set explicitly via the SHAKENFIST_NODE_UUID environment variable or the NODE_UUID configuration field, which takes precedence over the local file. This is useful for disaster recovery scenarios where local storage has been lost but the node's UUID is known.

The lookup precedence order is:

NODE_UUID configuration field / SHAKENFIST_NODE_UUID environment variable
Local file at {STORAGE_PATH}/node_uuid
FQDN-based lookup in the nodes table (fallback)

If the persisted UUID does not match the current node's FQDN, it is ignored and the FQDN-based fallback is used. This guards against stale UUID files left over from a previous node installation.

Each attribute table follows the same pattern — typed scalar columns for hot-path fields, JSON columns for complex structures, and one indexed FK column per parent — for example:

CREATE TABLE node_attributes (
    uuid UUID PRIMARY KEY,
    last_seen DOUBLE,
    installed_version VARCHAR(64),
    -- Complex structures as JSON
    roles JSON,
    daemons JSON,
    metrics JSON
);

Cached lists of child object UUIDs are deliberately not stored on the parent attribute table — querying the child table by an indexed FK column is the source of truth. Phase 7 of the SQL-pushdown plan removed the last two such caches (network_attributes.networkinterfaces and instance_attributes.interfaces); see PLAN-sql-pushdown-filtering-phase-07-denorm-lists.md.

This approach:

Avoids wide generic tables: Each type has exactly the columns it needs
Enables proper typing: Native SQL types instead of JSON everywhere
Supports efficient indexes: Can index frequently-queried columns
Keeps queries simple: No joins needed for common operations

Abstract Base Classes¶

Abstract base classes like DatabaseBackedObject and ManagedExecutable do not get their own tables. Only concrete classes that are actually instantiated have tables. For example:

ManagedExecutable (abstract) - no table
DnsMasq (concrete, inherits ManagedExecutable) - gets dnsmasq table

Pydantic Models as Schema Source¶

Each table is defined by a Pydantic model that serves as the single source of truth:

from typing import Annotated
from pydantic import BaseModel, ConfigDict, UUID4
from shakenfist.schema.sqlalchemy import SQLIndex, SQLNativeUUID

class DnsMasqData(BaseModel):
    """Schema for DnsMasq static values in MariaDB."""
    model_config = ConfigDict(frozen=True)

    uuid: Annotated[UUID4, SQLNativeUUID()]
    namespace: Annotated[str, SQLIndex()]
    owner_type: Annotated[str, SQLIndex()]
    owner_uuid: Annotated[str, SQLIndex()]
    version: int
    provide_dhcp: bool
    provide_dns: bool

The table is then generated from this model:

from shakenfist.schema.sqlalchemy import pydantic_to_sqlalchemy_table

table = pydantic_to_sqlalchemy_table(
    DnsMasqData, 'dnsmasq', metadata,
    primary_key_field='uuid', include_id_column=False
)

Adding New Attributes¶

When adding a new attribute to an object type:

For shared attributes (DatabaseBackedObject level):

Consider if it belongs in an existing shared table (like object_states)
If it's a new shared concept, create a new shared table

For type-specific attributes:

Add the field to the Pydantic model
ALTER TABLE to add the column (with default if needed)
Bump the object's version number
Add an upgrade step (can be no-op if column has a DB default)

Object Version Upgrades¶

Objects have version numbers that track schema changes. When an object is read from the database with an older version:

Lazy upgrade: The upgrade_pydantic_data() method applies upgrade steps
Persistence: If the cluster minimum version equals current version, the upgraded data is written back to MariaDB
Background migration: A future background worker will upgrade objects that are never read

This allows rolling upgrades without requiring all objects to be migrated immediately.

Schema System¶

Shaken Fist uses Pydantic models for schema definition. These models serve multiple purposes:

Validation: Ensuring data conforms to expected types and constraints
Serialization: Converting between Python objects and JSON payloads
SQL Generation: Automatically generating SQLAlchemy tables for MariaDB

Pydantic Models¶

Schema definitions live in shakenfist/schema/. For example, cluster operations have their schemas defined in shakenfist/schema/operations/.

A typical schema looks like:

from enum import Enum
from typing import List, Optional
from pydantic import BaseModel, Field, UUID4

class model_tasks(Enum):
    verify_size_and_checksum = 1
    ensure_local = 2

class model(BaseModel):
    uuid: UUID4
    node_uuid: str
    blob_uuid: UUID4
    priority: PRIORITY
    request_id: Optional[str]
    tasks: List[model_tasks]
    version: int = Field(ge=1, le=1)

SQLAlchemy Table Generation¶

The shakenfist.schema.sqlalchemy module provides utilities to automatically convert Pydantic models to SQLAlchemy tables. This keeps the schema definition in one place and avoids hand-writing SQL.

Basic Usage¶

from shakenfist.schema.sqlalchemy import pydantic_to_sqlalchemy_table
import sqlalchemy as sa

metadata = sa.MetaData()
table = pydantic_to_sqlalchemy_table(
    MyModel,
    'my_table',
    metadata,
    primary_key_field='uuid'
)

Type Mapping¶

Python types are mapped to SQL column types:

Python Type	SQL Type
`str`	`VARCHAR(255)`
`int`	`BIGINT`
`float`	`DOUBLE`
`bool`	`BOOLEAN`
`bytes`	`LARGEBINARY`
`UUID`	`CHAR(36)`
`Enum`	`VARCHAR(64)`
`IPv4Address`	`INET4` (MariaDB-specific)
`list`, `dict`, nested models	`LONGTEXT` (JSON)
`Optional[X]`	Nullable column of type X

Index Annotations¶

Indexes can be defined directly in the Pydantic model using Python's Annotated types. This keeps index definitions co-located with the schema.

Single-Column Indexes¶

Use SQLIndex() or SQLUniqueIndex() markers:

from typing import Annotated
from pydantic import BaseModel
from shakenfist.schema.sqlalchemy import SQLIndex, SQLUniqueIndex

class User(BaseModel):
    uuid: Annotated[str, SQLIndex()]           # Creates idx_users_uuid
    email: Annotated[str, SQLUniqueIndex()]    # Creates uidx_users_email
    name: str                                   # No index

Compound Indexes¶

For indexes spanning multiple columns, use the model's configuration:

from pydantic import BaseModel, ConfigDict

class Event(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={
            'sql_indexes': [
                ('object_type', 'object_uuid'),  # Compound index
                ('timestamp',),                   # Single column via config
            ]
        }
    )

    object_type: str
    object_uuid: str
    timestamp: float
    message: str

Generated Index Names¶

Index names follow a predictable pattern:

Single-column: idx_{table}_{column} or uidx_{table}_{column} (unique)
Compound: idx_{table}_{col1}_{col2}_{...}

Table Lifecycle¶

The ensure_table_exists() function handles idempotent table creation:

from shakenfist.schema.sqlalchemy import (
    pydantic_to_sqlalchemy_table,
    ensure_table_exists
)

# Create table definition
table = pydantic_to_sqlalchemy_table(MyModel, 'my_table', metadata)

# Create table and indexes in database (idempotent)
ensure_table_exists(engine, table)

Schema Comparison¶

To detect schema drift between the Pydantic model and the database:

from shakenfist.schema.sqlalchemy import compare_schemas

differences = compare_schemas(engine, table)
# Returns: {
#     'missing_columns': [...],  # In model but not in DB
#     'extra_columns': [...],    # In DB but not in model
#     'type_mismatches': [...]   # Different types
# }

Object State Storage¶

Object state (e.g., "created", "deleted", "error") is stored in a dedicated MariaDB table for improved query performance. Access is routed through the database service's gRPC interface for all daemons except the database daemon itself.

The object_states Table¶

The object_states table stores state for all object types:

from typing import Annotated, Optional
from pydantic import BaseModel, ConfigDict, Field
from shakenfist.schema.sqlalchemy import SQLIndex, SQLUniqueIndex

class ObjectState(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={
            'sql_indexes': [
                ['object_type', 'state_value'],  # Efficient queries by type+state
            ]
        }
    )

    object_uuid: Annotated[str, SQLUniqueIndex(), Field(max_length=36)]
    object_type: Annotated[str, SQLIndex(), Field(max_length=32)]
    state_value: Annotated[str, SQLIndex(), Field(max_length=32)]
    update_time: float
    message: Optional[str] = None

State Class¶

The State class is a Pydantic model that replaces the original baseobject.State class. It provides the same interface for backwards compatibility:

from shakenfist.schema.object_state import State

state = State(value='created', update_time=time.time(), message='optional msg')
print(state.value)        # 'created'
print(state.update_time)  # 1234567890.123
print(state.obj_dict())   # {'value': 'created', 'update_time': 1234567890.123}

IPAM Reservation Storage¶

IPAM (IP Address Manager) reservations are stored in MariaDB for atomic address allocation. This provides:

Atomic reservation: Uses database uniqueness constraints to prevent race conditions when multiple nodes try to allocate the same address
Efficient queries: Indexes on ipam_uuid and address for fast lookups
Deletion halo: Supports the deletion-halo pattern where recently released addresses are temporarily unavailable to prevent reuse conflicts

The ipam_reservations Table¶

The ipam_reservations table uses a composite primary key on (ipam_uuid, address):

from ipaddress import IPv4Address

class IPAMReservation(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={
            'sql_indexes': [
                ['ipam_uuid', 'address'],      # Composite unique key
                ['user_type', 'user_uuid'],    # Query by user
            ]
        }
    )

    ipam_uuid: Annotated[str, SQLIndex(), Field(max_length=36)]
    address: Annotated[IPv4Address, SQLIndex()]  # Maps to INET4 column
    reservation_type: ReservationType            # Enum stored as VARCHAR
    user_type: Optional[str] = Field(default=None, max_length=32)
    user_uuid: Optional[str] = Field(default=None, max_length=36)
    reserved_at: float
    comment: Optional[str] = None

The address field uses Python's ipaddress.IPv4Address type, which maps to MariaDB's INET4 column type. This provides efficient 4-byte storage and native IP address comparison operations.

Reservation Types¶

IPAM supports several reservation types:

Type	Description
`network`	The network address (e.g., 10.0.0.0)
`broadcast`	The broadcast address (e.g., 10.0.0.255)
`gateway`	The gateway address for the network
`floating`	A floating IP that can be moved between instances
`routed`	A routed IP address for external connectivity
`instance`	An IP assigned to an instance interface
`deletion-halo`	A recently-released address in the deletion halo

Upload Object Storage¶

Upload objects (temporary objects that receive streamed data during artifact creation) are stored in MariaDB. This provides:

Efficient iteration: Fast queries for cleanup of stale uploads
Node-based lookups: Indexed queries to find uploads by node for routing

The uploads Table¶

The uploads table stores static values for upload objects:

Column	Type	Description
uuid	UUID	Primary key - the upload's unique identifier
node	VARCHAR(255)	The node where the upload data is stored
created_at	DOUBLE	Unix timestamp when the upload was created
version	INTEGER	Object version number

Indexes: - Primary key on uuid - Index on node for efficient routing of upload requests - Index on created_at for finding old uploads during cleanup

Best Practices¶

Schema Evolution¶

When adding new fields:

Add the field to the Pydantic model with a default value
Use Optional[X] for fields that may not exist in old data
Include a version field to track schema versions
Handle missing fields gracefully in code

Rolling Deployments¶

During rolling upgrades where nodes may run different versions:

New fields should be optional until all nodes are upgraded
Old code should ignore unknown fields
Use version fields to detect and handle schema differences

Performance Considerations¶

Use indexes for fields that are frequently queried
Prefer compound indexes for queries that filter on multiple columns
Keep JSON/LONGTEXT fields for data that doesn't need indexing
Use MariaDB for data requiring complex queries

📝 Report an issue with this page