Database Architecture

Shaken Fist uses a combination of databases for different purposes. This page describes the database architecture, how data is organized, and how the schema system works.

Overview

Shaken Fist currently uses two database backends:

etcd: A distributed key-value store used for cluster coordination, configuration, locks, and object storage.
MariaDB: A relational database being introduced for structured data that benefits from SQL queries and indexing.

etcd

etcd is the primary database for Shaken Fist and is used for:

Object storage: All Shaken Fist objects (instances, networks, blobs, artifacts, etc.) are stored in etcd.
Cluster coordination: Node discovery, leader election, and distributed state.
Distributed locking: See the Locks documentation.
Configuration: Cluster-wide configuration stored at /sf/config.
Event logs: Audit trails and operational events for objects.
Queues: Work queues for cluster operations.

Key Structure

etcd keys follow a hierarchical structure:

/sf/                          # Root prefix for all Shaken Fist data
/sf/object/{type}/{uuid}      # Object definitions
/sf/attribute/{type}/{uuid}/  # Object attributes (state, placement, etc.)
/sf/event/{type}/{uuid}/      # Event logs for objects
/sf/queue/                    # Work queues
/sflocks/                     # Distributed locks
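
To make the layout concrete, here is a minimal sketch of helpers that build keys following the structure above. The function names are hypothetical and not part of Shaken Fist. Placing the attribute name as the final path component is an assumption based on the trailing slash in the attribute prefix.

```python
# Hypothetical helpers; the prefixes mirror the key layout listed above.
ROOT = '/sf'


def object_key(object_type: str, object_uuid: str) -> str:
    """Key holding the definition of a single object."""
    return f'{ROOT}/object/{object_type}/{object_uuid}'


def attribute_key(object_type: str, object_uuid: str, attribute: str) -> str:
    """Key holding one attribute of an object (assumed layout)."""
    return f'{ROOT}/attribute/{object_type}/{object_uuid}/{attribute}'


def event_prefix(object_type: str, object_uuid: str) -> str:
    """Prefix under which an object's event log entries live."""
    return f'{ROOT}/event/{object_type}/{object_uuid}/'


example_uuid = '0e1b1e2c-6f0a-4f6e-9d0b-1c2d3e4f5a6b'
print(object_key('instance', example_uuid))
print(attribute_key('instance', example_uuid, 'state'))
```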

Object Types

Each object type has a dedicated key prefix:

Object Type	Key Prefix
Instance	`/sf/object/instance/`
Network	`/sf/object/network/`
Network Interface	`/sf/object/networkinterface/`
Blob	`/sf/object/blob/`
Artifact	`/sf/object/artifact/`
Node	`/sf/object/node/`
Namespace	`/sf/object/namespace/`

MariaDB

MariaDB is being introduced for data that benefits from:

Complex queries with filtering and sorting
Indexed lookups on multiple columns
Structured data with well-defined schemas
Transaction support

MariaDB is deployed on etcd master nodes and uses Galera for multi-master replication across the cluster.

Connection

Shaken Fist connects to MariaDB using SQLAlchemy. The connection details are configured during cluster deployment.

Schema System

Shaken Fist uses Pydantic models for schema definition. These models serve multiple purposes:

Validation: Ensuring data conforms to expected types and constraints
Serialization: Converting between Python objects and JSON for etcd
SQL Generation: Automatically generating SQLAlchemy tables for MariaDB

Pydantic Models

Schema definitions live in shakenfist/schema/. For example, cluster operations have their schemas defined in shakenfist/schema/operations/.

A typical schema looks like:

from enum import Enum
from typing import List, Optional
from pydantic import BaseModel, Field, UUID4

class model_tasks(Enum):
    verify_size_and_checksum = 1
    ensure_local = 2

class model(BaseModel):
    uuid: UUID4
    node_uuid: str
    blob_uuid: UUID4
    priority: PRIORITY  # PRIORITY is not defined in this excerpt
    request_id: Optional[str]
    tasks: List[model_tasks]
    version: int = Field(ge=1, le=1)

SQLAlchemy Table Generation

The shakenfist.schema.sqlalchemy module provides utilities to automatically convert Pydantic models to SQLAlchemy tables. This keeps the schema definition in one place and avoids hand-writing SQL.

Basic Usage

from shakenfist.schema.sqlalchemy import pydantic_to_sqlalchemy_table
import sqlalchemy as sa

metadata = sa.MetaData()
table = pydantic_to_sqlalchemy_table(
    MyModel,
    'my_table',
    metadata,
    primary_key_field='uuid'
)

Type Mapping

Python types are mapped to SQL column types:

Python Type	SQL Type
`str`	`VARCHAR(255)`
`int`	`BIGINT`
`float`	`DOUBLE`
`bool`	`BOOLEAN`
`bytes`	`LARGEBINARY`
`UUID`	`CHAR(36)`
`Enum`	`VARCHAR(64)`
`list`, `dict`, nested models	`LONGTEXT` (JSON)
`Optional[X]`	Nullable column of type X
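
The mapping in the table can be sketched with the standard library alone. This is an illustration of the rules, not the code in shakenfist.schema.sqlalchemy. The function name sql_type_for is hypothetical, and nested models are simply treated like any other unrecognised type.

```python
import enum
import uuid
from typing import Optional, Union, get_args, get_origin

# Scalar mappings copied from the table above.
SCALAR_TYPES = {
    str: 'VARCHAR(255)',
    int: 'BIGINT',
    float: 'DOUBLE',
    bool: 'BOOLEAN',
    bytes: 'LARGEBINARY',
    uuid.UUID: 'CHAR(36)',
}


def sql_type_for(annotation):
    """Return (sql_type, nullable) for a Python type annotation."""
    # Optional[X] is Union[X, None]: unwrap it and mark the column nullable.
    if get_origin(annotation) is Union:
        inner = [a for a in get_args(annotation) if a is not type(None)]
        if len(inner) == 1:
            return sql_type_for(inner[0])[0], True
    if annotation in SCALAR_TYPES:
        return SCALAR_TYPES[annotation], False
    if isinstance(annotation, type) and issubclass(annotation, enum.Enum):
        return 'VARCHAR(64)', False
    # Lists, dicts, and anything else are stored as JSON in LONGTEXT.
    return 'LONGTEXT', False


class Colour(enum.Enum):
    red = 1


print(sql_type_for(Optional[int]))
print(sql_type_for(Colour))
```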

Index Annotations

Indexes can be defined directly in the Pydantic model by attaching marker objects to a field's type with typing.Annotated. This keeps index definitions co-located with the schema.

Single-Column Indexes

Use SQLIndex() or SQLUniqueIndex() markers:

from typing import Annotated
from pydantic import BaseModel
from shakenfist.schema.sqlalchemy import SQLIndex, SQLUniqueIndex

class User(BaseModel):
    uuid: Annotated[str, SQLIndex()]           # Creates idx_users_uuid (for a table named 'users')
    email: Annotated[str, SQLUniqueIndex()]    # Creates uidx_users_email (for a table named 'users')
    name: str                                   # No index
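
For readers unfamiliar with Annotated metadata, the following sketch shows one way such markers can be discovered at runtime using only the standard library. The marker classes here are stand-ins defined locally, and find_indexed_fields is a hypothetical name. How the real module reads the markers is not described in this page.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class SQLIndex:
    """Stand-in marker; not the real shakenfist class."""


class SQLUniqueIndex:
    """Stand-in marker; not the real shakenfist class."""


class User:
    uuid: Annotated[str, SQLIndex()]
    email: Annotated[str, SQLUniqueIndex()]
    name: str


def find_indexed_fields(cls):
    """Map field name to 'index' or 'unique' for fields carrying a marker."""
    found = {}
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is not Annotated:
            continue
        # The first argument is the underlying type; the rest are metadata.
        for marker in get_args(hint)[1:]:
            if isinstance(marker, SQLUniqueIndex):
                found[name] = 'unique'
            elif isinstance(marker, SQLIndex):
                found[name] = 'index'
    return found


print(find_indexed_fields(User))  # {'uuid': 'index', 'email': 'unique'}
```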

Compound Indexes

For indexes spanning multiple columns, list the column names under the sql_indexes key of json_schema_extra in the model's configuration:

from pydantic import BaseModel, ConfigDict

class Event(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={
            'sql_indexes': [
                ('object_type', 'object_uuid'),  # Compound index
                ('timestamp',),                   # Single column via config
            ]
        }
    )

    object_type: str
    object_uuid: str
    timestamp: float
    message: str

Generated Index Names

Index names follow a predictable pattern:

Single-column: idx_{table}_{column} or uidx_{table}_{column} (unique)
Compound: idx_{table}_{col1}_{col2}_{...}
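
The naming pattern is simple enough to express as a small function. This is a sketch of the pattern as documented above, with a hypothetical function name, rather than the module's own code.

```python
def index_name(table, columns, unique=False):
    """Build an index name following the documented pattern."""
    prefix = 'uidx' if unique else 'idx'
    return '_'.join([prefix, table, *columns])


print(index_name('users', ['uuid']))                  # idx_users_uuid
print(index_name('users', ['email'], unique=True))    # uidx_users_email
print(index_name('events', ['object_type', 'object_uuid']))
```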

Table Lifecycle

The ensure_table_exists() function handles idempotent table creation:

from shakenfist.schema.sqlalchemy import (
    pydantic_to_sqlalchemy_table,
    ensure_table_exists
)

# Create table definition (MyModel and metadata as in Basic Usage above)
table = pydantic_to_sqlalchemy_table(MyModel, 'my_table', metadata)

# Create table and indexes in database (idempotent)
ensure_table_exists(engine, table)
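
To illustrate what idempotent creation means in practice, the sketch below uses the standard library's sqlite3 module as a stand-in for MariaDB. The ensure_table helper is hypothetical and hand-writes the SQL, which the real ensure_table_exists() avoids. The point is only that calling it twice is safe and leaves existing rows alone.

```python
import sqlite3


def ensure_table(conn):
    """Create the table and its index only if they are missing."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS my_table ('
        ' uuid CHAR(36) PRIMARY KEY,'
        ' name VARCHAR(255))'
    )
    conn.execute(
        'CREATE INDEX IF NOT EXISTS idx_my_table_name ON my_table (name)'
    )


conn = sqlite3.connect(':memory:')
ensure_table(conn)
conn.execute("INSERT INTO my_table VALUES ('abc', 'first')")

# The second call finds the table and index already present and does nothing.
ensure_table(conn)

rows = conn.execute('SELECT uuid, name FROM my_table').fetchall()
print(rows)  # [('abc', 'first')]
```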

Schema Comparison

To detect schema drift between the Pydantic model and the database:

from shakenfist.schema.sqlalchemy import compare_schemas

differences = compare_schemas(engine, table)
# Returns: {
#     'missing_columns': [...],  # In model but not in DB
#     'extra_columns': [...],    # In DB but not in model
#     'type_mismatches': [...]   # Different types
# }
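
The three result keys correspond to simple set operations over column names. The sketch below shows that logic on plain dictionaries. The compare_columns name and the use of bare column names as list entries are assumptions, since the exact shape of the entries returned by compare_schemas() is not shown here.

```python
def compare_columns(model_columns, db_columns):
    """Compare two {column_name: type_name} mappings."""
    model_names = set(model_columns)
    db_names = set(db_columns)
    return {
        # Defined in the model but absent from the database.
        'missing_columns': sorted(model_names - db_names),
        # Present in the database but no longer in the model.
        'extra_columns': sorted(db_names - model_names),
        # Present in both, but with different types.
        'type_mismatches': sorted(
            name for name in model_names & db_names
            if model_columns[name] != db_columns[name]
        ),
    }


differences = compare_columns(
    {'uuid': 'CHAR(36)', 'name': 'VARCHAR(255)', 'size': 'BIGINT'},
    {'uuid': 'CHAR(36)', 'size': 'INTEGER', 'legacy': 'LONGTEXT'},
)
print(differences)
```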

Object State Storage

Object state (e.g., "created", "deleted", "error") is being migrated from etcd attributes to a dedicated MariaDB table for improved query performance.

The object_states Table

The object_states table stores state for all object types:

from typing import Annotated, Optional
from pydantic import BaseModel, ConfigDict, Field
from shakenfist.schema.sqlalchemy import SQLIndex, SQLUniqueIndex

class ObjectState(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={
            'sql_indexes': [
                ('object_type', 'state_value'),  # Efficient queries by type+state
            ]
        }
    )

    object_uuid: Annotated[str, SQLUniqueIndex(), Field(max_length=36)]
    object_type: Annotated[str, SQLIndex(), Field(max_length=32)]
    state_value: Annotated[str, SQLIndex(), Field(max_length=32)]
    update_time: float
    message: Optional[str] = None

State Class

The State class is a Pydantic model that replaces the original baseobject.State class. It provides the same interface for backwards compatibility:

import time

from shakenfist.schema.object_state import State

state = State(value='created', update_time=time.time(), message='optional msg')
print(state.value)        # 'created'
print(state.update_time)  # e.g. 1234567890.123
print(state.obj_dict())   # e.g. {'value': 'created', 'update_time': 1234567890.123}

Migration Strategy

State data is migrated incrementally per object type:

Dual-write: New state changes are written to both etcd and MariaDB
Read priority: State reads prefer MariaDB, falling back to etcd
Upgrade step: When an object's version is bumped, its existing state is migrated from etcd to MariaDB
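
The three behaviours above can be sketched with two in-memory dictionaries standing in for etcd and MariaDB. All names here are hypothetical. The real implementation talks to the actual databases and is not reproduced on this page.

```python
etcd_states = {}     # stand-in for state attributes stored in etcd
mariadb_states = {}  # stand-in for the object_states table


def set_state(object_type, object_uuid, state):
    """Dual-write: record the new state in both stores."""
    key = (object_type, object_uuid)
    etcd_states[key] = state
    mariadb_states[key] = state


def get_state(object_type, object_uuid):
    """Read priority: prefer MariaDB, fall back to etcd."""
    key = (object_type, object_uuid)
    if key in mariadb_states:
        return mariadb_states[key]
    return etcd_states.get(key)


def migrate_state(object_type, object_uuid):
    """Upgrade step: copy a pre-existing etcd state into MariaDB."""
    key = (object_type, object_uuid)
    if key in etcd_states and key not in mariadb_states:
        mariadb_states[key] = etcd_states[key]


# An object whose state was written before MariaDB was enabled.
etcd_states[('myobject', 'abc')] = 'created'
print(get_state('myobject', 'abc'))  # 'created', via the etcd fallback
migrate_state('myobject', 'abc')
print(('myobject', 'abc') in mariadb_states)  # True
```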

Enabling MariaDB State for an Object Type

To enable MariaDB state storage for an object type:

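# DatabaseBackedObject, etcd, and mariadb come from the Shaken Fist codebase;
# their imports are omitted here.
from shakenfist.schema.object_state import State

class MyObject(DatabaseBackedObject):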
    object_type = 'myobject'
    current_version = 2  # Bump version for migration

    # Enable MariaDB state storage
    use_mariadb_state = True

    @classmethod
    def _upgrade_step_1_to_2(cls, static_values):
        # Migrate existing state to MariaDB
        if not mariadb.is_configured():
            return

        state_data = etcd.get('attribute/myobject', static_values['uuid'], 'state')
        if state_data:
            state = State(**state_data)
            mariadb.set_state('myobject', static_values['uuid'], state)

Best Practices

Schema Evolution

When adding new fields:

Add the field to the Pydantic model with a default value
Use Optional[X] for fields that may not exist in old data
Include a version field to track schema versions
Handle missing fields gracefully in code
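
As an illustration of these guidelines, the sketch below loads records written before a field existed. It uses a standard library dataclass rather than a Pydantic model to stay dependency-free. The Record type and its description field are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    uuid: str
    version: int = 1
    # Added later; optional with a default so that older data still loads.
    description: Optional[str] = None


def load_record(raw):
    """Build a Record from stored data, tolerating absent newer fields."""
    return Record(
        uuid=raw['uuid'],
        version=raw.get('version', 1),
        description=raw.get('description'),
    )


old = load_record({'uuid': 'abc'})
new = load_record({'uuid': 'def', 'version': 2, 'description': 'hello'})
print(old)
print(new)
```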

Rolling Deployments

During rolling upgrades where nodes may run different versions:

New fields should be optional until all nodes are upgraded
Old code should ignore unknown fields
Use version fields to detect and handle schema differences
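
A minimal sketch of the second and third points: a node running older code receives a record written by a newer node, drops the fields it does not know, and uses the version field to notice the difference. The Task type, the retries field, and SUPPORTED_VERSION are all hypothetical.

```python
from dataclasses import dataclass, fields

SUPPORTED_VERSION = 1  # newest schema version this (older) code understands


@dataclass
class Task:
    uuid: str
    version: int = 1


def parse_task(raw):
    """Return (task, unknown_field_names, written_by_newer_schema)."""
    known = {f.name for f in fields(Task)}
    unknown = sorted(set(raw) - known)
    # Unknown fields are ignored rather than treated as an error.
    task = Task(**{k: v for k, v in raw.items() if k in known})
    return task, unknown, task.version > SUPPORTED_VERSION


task, unknown, newer = parse_task({'uuid': 'abc', 'version': 2, 'retries': 3})
print(task, unknown, newer)
```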

Performance Considerations

Use indexes for fields that are frequently queried
Prefer compound indexes for queries that filter on multiple columns
Keep JSON/LONGTEXT fields for data that doesn't need indexing
Use MariaDB for data requiring complex queries; etcd for simple key-value lookups
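
To see a compound index being chosen for a multi-column filter, the sketch below again uses sqlite3 from the standard library as a stand-in. MariaDB's planner and its EXPLAIN output differ, so treat this only as a demonstration of the principle. The table mirrors a subset of the object_states columns.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE object_states ('
    ' object_uuid CHAR(36), object_type VARCHAR(32),'
    ' state_value VARCHAR(32), update_time DOUBLE)'
)
conn.execute(
    'CREATE INDEX idx_object_states_object_type_state_value'
    ' ON object_states (object_type, state_value)'
)

# Ask SQLite how it would run a query that filters on both indexed columns.
plan = conn.execute(
    'EXPLAIN QUERY PLAN SELECT object_uuid FROM object_states'
    ' WHERE object_type = ? AND state_value = ?',
    ('instance', 'created'),
).fetchall()

# The last column of each plan row is a human-readable description.
details = [row[-1] for row in plan]
print(details)
```

The printed plan should mention the compound index by name, showing that both filter columns are served by a single index lookup.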