Logging¶
Shaken Fist daemons emit structured JSON logs and can ship them to an operator-provided Loki log store. Shaken Fist does not run a log store, and it no longer aggregates logs onto a primary node: each node either ships its own logs to your Loki, or — if you have not configured one — writes them to the local systemd journal.
What changed¶
Earlier releases forwarded every node's syslog to a primary node
over rsyslog, where logs landed in /var/log/syslog. That
central aggregator has been removed. In its place:
- Daemon logs are structured JSON (one JSON object per line), not plain text. This is the only daemon log format.
- If you tell Shaken Fist where your Loki is, each node ships its own logs there directly, buffered through a local on-disk spool.
- If you do not, each node logs to its local journal and ships nothing.
Local journald still captures each node's logs regardless, so
on-box debugging with journalctl always works.
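To make "one JSON object per line" concrete, the following is a minimal sketch using Python's standard `logging` module. It is not Shaken Fist's formatter: the class name `JsonLineFormatter`, the helper `format_example`, and the field names (`timestamp`, `level`, `logger`, `message`) are assumptions chosen for illustration. The real field names are the ones documented in the shakenfist_utilities library.

```python
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Illustrative field names only, not the documented contract.
            'timestamp': record.created,
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        }
        # json.dumps escapes embedded newlines, so the result is one line.
        return json.dumps(payload, sort_keys=True)


def format_example() -> str:
    """Format one hand-built record, to show the shape of a line."""
    record = logging.LogRecord(
        name='sf-net', level=logging.INFO, pathname='example.py', lineno=1,
        msg='network %s created', args=('example',), exc_info=None)
    return JsonLineFormatter().format(record)
```

Because every line is a complete JSON object, a downstream consumer can parse each line independently without any multi-line reassembly.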
The two modes¶
Shaken Fist's logging has exactly two modes, keyed on whether
LOKI_BASE_URL is set. The log format (structured JSON) is
identical in both; only the destination differs.
Loki configured (the preferred path)¶
With a Loki endpoint set, each daemon buffers its INFO-and-above
JSON log lines in a local on-disk spool and a background drainer
ships them to Loki, with retry and backpressure. The spool is the
local-durability buffer that covers transient Loki outages. All
levels (including DEBUG) continue to be written to the node's
local journal as well, so on-box journalctl debugging is
unaffected.
Log levels: what ships to Loki¶
Only INFO and above is shipped to Loki. DEBUG stays local
(journald only). This matches the previous rsyslog deployment,
whose forwarder shipped *.*;*.!=debug — DEBUG was never
centrally aggregated, only kept on each node. It also keeps the
highest-volume log level (DEBUG; e.g. every privileged command is
logged at DEBUG) off the spool/push path, which matters for
performance on busy nodes. When you need DEBUG to diagnose a
problem, read it on the node with journalctl.
A future iteration may revisit shipping deeper detail centrally once Shaken Fist has OpenTelemetry-based tracing (see the development plans).
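The level split described in this section can be sketched with two handlers on one logger: a local handler that accepts every level, and a shipping handler that only accepts INFO and above and only exists when a Loki URL is set. This is an illustration under assumptions, not the actual implementation: `ListHandler` and `build_logger` are hypothetical names, and the in-memory lists stand in for journald and the on-disk spool.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted records in memory; a stand-in for a real sink."""

    def __init__(self, level: int) -> None:
        super().__init__(level)
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def build_logger(name: str, loki_base_url: str):
    """Return (logger, local_handler, shipping_handler_or_None)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()

    # Stand-in for the local journal: receives every level, DEBUG included.
    local = ListHandler(logging.DEBUG)
    logger.addHandler(local)

    # Stand-in for the spool: only present when Loki is configured, and
    # only INFO and above reach it.
    shipped = None
    if loki_base_url:
        shipped = ListHandler(logging.INFO)
        logger.addHandler(shipped)
    return logger, local, shipped
```

With a non-empty URL, a DEBUG call lands only in the local handler while an INFO call lands in both. With an empty URL there is no shipping handler at all.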
Loki not configured (the fallback)¶
Leaving LOKI_BASE_URL empty means each daemon emits the same
JSON to the node's local journal (journald) and ships nothing. A
node-local agent such as promtail,
Grafana Alloy, or
vector can then scrape that journal if you want central logs
collected on your own terms.
In both modes, systemd additionally captures each service's stdout/stderr into journald (uncaught tracebacks and any output emitted before logging is configured). That is free, local-only, and not a deliberate shipping pipeline.
Configuration¶
Set these via the deployer (getsf prompts / GETSF_LOKI_*
environment variables) or directly as SHAKENFIST_* config:
| Option | Default | Meaning |
|---|---|---|
| LOKI_BASE_URL | '' | Base URL of your Loki, e.g. http://loki:3100. Empty disables the shipper. |
| LOKI_TENANT | '' | Value sent as the X-Scope-OrgID header for multi-tenant Loki. Empty omits the header. |
| LOKI_AUTH_HEADER | '' | Opaque Authorization header value (e.g. Bearer <token>). Empty sends none. Treat as a secret. |
| LOG_EVENTS_TO_LOKI | True | Whether the per-event Added event log line is emitted to the log stream. Never affects the authoritative MariaDB event write (see Events vs logs). |
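As a rough illustration of how these options could translate into an HTTP request, the sketch below derives the push URL and request headers from the three Loki settings. The function names `push_url` and `loki_headers` are hypothetical, and the `Content-Type` value assumes a JSON push body. The header names and the empty-means-omitted behaviour follow the table above.

```python
def push_url(base_url: str) -> str:
    """Join the configured base URL with Loki's push path."""
    return base_url.rstrip('/') + '/loki/api/v1/push'


def loki_headers(tenant: str, auth_header: str) -> dict:
    """Build request headers; empty settings omit their header entirely."""
    headers = {'Content-Type': 'application/json'}
    if tenant:
        headers['X-Scope-OrgID'] = tenant
    if auth_header:
        # Passed through opaquely; treat the value as a secret.
        headers['Authorization'] = auth_header
    return headers
```

Keeping the auth value opaque means the same setting can carry a bearer token or any other scheme your Loki front end expects, without the sketch needing to understand it.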
Tenant¶
LOKI_TENANT is worth setting deliberately when you run Shaken
Fist against a shared Loki. Shaken Fist's logs can be high volume,
and a dedicated tenant keeps them out of your other tenants'
streams — both for query hygiene and so per-tenant volume limits
and retention can be set independently.
TLS and mutual auth¶
This release supports a plain-HTTP or operator-terminated-HTTPS Loki endpoint plus the optional opaque auth header above. mTLS to the Loki endpoint (operator-provided CA, client certificates) is tracked separately and is not yet available.
Labels and the field contract¶
Shaken Fist tags every Loki stream with a small, bounded set of labels:
Everything else — including high-cardinality identifiers such as
instance_uuid, network_uuid, and request-id — lives in the
JSON log line body, never in a label. This is deliberate:
promoting high-cardinality values to Loki labels causes a
label-cardinality explosion. Query them with LogQL's JSON parser
instead:
{job="shakenfist"} | json | instance_uuid="<uuid>"
{job="shakenfist", daemon="sf-net"} | json | level="ERROR"
The full set of JSON field names (the base fields plus the
with_fields conventions) is documented as a stable contract in
the shakenfist_utilities library, in
docs/log-record-fields.md.
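The label-versus-body split can be illustrated with the JSON shape that Loki's push API accepts: a small `stream` label map, plus `values` entries whose second element is the log line itself. The sketch below is an assumption about how such a payload could be assembled, not Shaken Fist's code. `build_push_payload` is a hypothetical name, and the `job` and `daemon` labels are taken from the example queries above rather than from the full label set.

```python
import json


def build_push_payload(labels: dict, records: list) -> dict:
    """Build a Loki push body from labels and (timestamp_ns, fields) pairs.

    High-cardinality identifiers stay inside the JSON line, not in labels.
    """
    values = []
    for timestamp_ns, fields in records:
        # Loki expects the timestamp as a string of nanoseconds since epoch.
        values.append([str(timestamp_ns), json.dumps(fields, sort_keys=True)])
    return {'streams': [{'stream': dict(labels), 'values': values}]}


payload = build_push_payload(
    {'job': 'shakenfist', 'daemon': 'sf-net'},
    [(1700000000000000000,
      {'level': 'ERROR', 'instance_uuid': '<uuid>', 'message': 'example'})])
```

Here `instance_uuid` travels in the line body, which is why the queries above need `| json` before filtering on it.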
Buffering, backpressure and durability¶
The shipper is modelled on Shaken Fist's eventlog spool/drainer (see Events). Each daemon process holds its own disk-backed sqlite spool at:
- Durability boundary. A log call enqueues one cheap sqlite insert and returns; the line is on disk and survives a process crash. On startup the spool recovers orphan files left by previously-dead PIDs.
- Backpressure. A background drainer thread batches lines and POSTs them to Loki's /loki/api/v1/push. On failure the batch stays in the spool and is retried with exponential backoff; a brief Loki outage loses nothing.
- Drop, don't block. If the spool exceeds its high-water mark (a sustained outage), new lines are dropped, with a counter increment and a warning, rather than blocking the daemon. Only the Loki copy is ever lost; the node still has journald.
- Clean shutdown. On process exit the drainer flushes the spool synchronously, within a bounded timeout.
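The behaviours in the list above can be sketched as a small sqlite-backed queue. This is a simplified, single-threaded illustration under assumptions, not the real shipper: the class `LogSpool`, the table layout, the default high-water mark, and the `next_backoff` helper are all hypothetical, and the sketch omits the drainer thread, orphan recovery, and the shutdown flush.

```python
import sqlite3
from typing import Callable


class LogSpool:
    """A toy disk-backed spool: enqueue is one insert, drain is batched."""

    def __init__(self, path: str, high_water: int = 10000) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            'CREATE TABLE IF NOT EXISTS spool '
            '(id INTEGER PRIMARY KEY AUTOINCREMENT, line TEXT NOT NULL)')
        self.db.commit()
        self.high_water = high_water
        self.dropped = 0

    def depth(self) -> int:
        return self.db.execute('SELECT COUNT(*) FROM spool').fetchone()[0]

    def enqueue(self, line: str) -> bool:
        """Store one line, or drop it (never block) at the high-water mark."""
        if self.depth() >= self.high_water:
            self.dropped += 1
            return False
        self.db.execute('INSERT INTO spool (line) VALUES (?)', (line,))
        self.db.commit()
        return True

    def drain_once(self, push: Callable[[list], bool],
                   batch_size: int = 100) -> bool:
        """Push the oldest batch; delete it only if the push succeeded."""
        rows = self.db.execute(
            'SELECT id, line FROM spool ORDER BY id LIMIT ?',
            (batch_size,)).fetchall()
        if not rows:
            return True
        if not push([line for _, line in rows]):
            return False  # Batch stays in the spool for a later retry.
        self.db.execute('DELETE FROM spool WHERE id <= ?', (rows[-1][0],))
        self.db.commit()
        return True


def next_backoff(current: float, ceiling: float = 60.0) -> float:
    """Double the retry delay after a failure, capped at a ceiling."""
    return min(current * 2, ceiling)
```

A caller would invoke `drain_once` in a loop, sleeping for the current delay and calling `next_backoff` after each failed push, then resetting the delay after a success. Deleting rows only after a successful push is what lets a failed batch survive for retry.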
Each daemon exposes Prometheus metrics for this on its existing metrics endpoint:
| Metric | Meaning |
|---|---|
| logship_spool_depth | Lines currently pending in the local spool. |
| logship_spool_dropped_total | Lines dropped at the high-water mark. |
| logship_push_total{result=...} | Loki push attempts by outcome. |
| logship_push_seconds | Loki push request latency. |
Events vs logs¶
Shaken Fist has two structured-record streams, and they are not the same thing:
- Events are the authoritative, per-object record (instance, network, blob, …), stored in MariaDB and read back through the REST API. They back billing, owner/admin audit, and per-object progress. See Events.
- Logs are the operational, interleaved-with-everything view for debugging "what was this node doing at 14:03", shipped to Loki (or journald).
By convention every event also emits an Added event log line,
so events show up in your log stream too — giving a single
time-ordered pane of events and logs together. That echo is
controlled by LOG_EVENTS_TO_LOKI (default on); turning it off
keeps events authoritative in MariaDB while reducing Loki volume.
Event storage always stays in MariaDB — it is never moved to
Loki.
If you run with LOG_EVENTS_TO_LOKI off but still want a single
pane, put a Loki logs panel and a MariaDB events panel on one
time-aligned Grafana dashboard.
Using your own log agent instead¶
If you already run a node-local log agent (promtail, Grafana
Alloy, vector, …), you can leave LOKI_BASE_URL unset and have
your agent scrape each node's journal. Shaken Fist's logs are
structured JSON there too, so your agent can parse and label them
however your pipeline prefers.