Feature matrix
What features does Shaken Fist have now? What about in the future? This page attempts to document the currently implemented features, but it is a bit of a moving target. If you're left wondering whether something works, please reach out to us and ask.
High level functionality
Our high level functionality is why you'd consider using Shaken Fist. Specifically, we support:
- instances: which are virtual machines deployed and managed by Shaken Fist.
- virtual networks: which are VXLAN meshes between hypervisors managed by Shaken Fist. These virtual networks do automatic IP address management, optionally provide DHCP and NAT, and support floating IPs for external accessibility.
- resource efficiency: we try hard to use few resources when orchestration is idle (that is, when your workload isn't changing). We also deploy and configure Kernel Shared Memory (KSM) and make heavy use of qcow2 Copy On Write (COW) layers to reduce the resources used by a single instance. This means you can pack more instances onto a Shaken Fist cluster than onto deployments of the same size from other projects.
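As a rough illustration of the automatic IP address management that virtual networks perform, here is a minimal sketch. The VirtualNetwork class and its allocate method are hypothetical names invented for this example, and the sketch ignores reserved addresses such as a gateway. It only shows that each virtual network hands out addresses from its own range, so two networks can use the same range without conflict.

```python
import ipaddress


class VirtualNetwork:
    """Hypothetical model of a virtual network with its own address pool."""

    def __init__(self, cidr):
        self.network = ipaddress.ip_network(cidr)
        # Each network iterates over its own host addresses independently.
        self._free = iter(self.network.hosts())
        self.allocations = {}

    def allocate(self, interface_id):
        """Hand out the next unused address to an interface."""
        address = next(self._free)
        self.allocations[interface_id] = address
        return str(address)


# Two networks with the same range do not interfere with each other.
net_a = VirtualNetwork("192.168.0.0/24")
net_b = VirtualNetwork("192.168.0.0/24")
print(net_a.allocate("iface-1"))  # 192.168.0.1
print(net_a.allocate("iface-2"))  # 192.168.0.2
print(net_b.allocate("iface-3"))  # 192.168.0.1
```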
Object types
We also have a lot of implementation functionality that is quite useful, but not the sort of thing you'd put on a billboard. Let's work through that by object type.
Artifacts
Artifacts are Shaken Fist's object type for disk images -- the sort of thing that you would store in Glance in OpenStack. Artifacts can store downloaded disk images from the internet (the "image" type), snapshots of previous instances (the "snapshot" type), and arbitrary uploads (also stored with the "image" type). There is also a special "label" artifact type, which is an overlay on top of the other types. It's easiest to explain its behavior by explaining the lifecycle of an artifact.
The normal way to get your first artifact is to download something from the internet. So for example, you might start an instance with a standard Ubuntu cloud image from https://cloud-images.ubuntu.com/. This would be done by specifying the URL of the image in the disk specification of the instance. Shaken Fist will then download the image, store it as an artifact, and then start your instance. A second instance using the same image will check that the image at the URL hasn't changed and, if it hasn't, use the same artifact as the first instance, skipping the repeated download.
However, if the image has changed, a second version would be downloaded. Depending on the settings for the artifact, both versions are retained. By default Shaken Fist keeps the last three versions of each artifact, although this is configurable.
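The reuse-or-download decision and the version cap described above can be sketched roughly as follows. This is an illustrative model, not Shaken Fist's code: the Artifact class, its fetch method, and the use of a checksum to detect a changed upstream image are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical model of an artifact holding a capped list of versions."""
    source_url: str
    max_versions: int = 3  # the default retention described above
    versions: list = field(default_factory=list)  # oldest first

    def fetch(self, upstream_checksum):
        """Record a new version if upstream changed.

        Returns True if a download was needed, False if the newest
        existing version could be reused.
        """
        if self.versions and self.versions[-1] == upstream_checksum:
            return False  # unchanged: reuse the existing version
        self.versions.append(upstream_checksum)
        # Drop the oldest versions beyond the cap.
        del self.versions[:-self.max_versions]
        return True


artifact = Artifact("https://cloud-images.ubuntu.com/")
for checksum in ["a", "a", "b", "c", "d"]:
    artifact.fetch(checksum)
print(artifact.versions)  # ['b', 'c', 'd']
```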
Now let's assume that you have a nightly CI job which starts an instance from the latest Ubuntu cloud image, and performs some tests to ensure that it works for your software stack. You want to somehow mark for your other workloads what versions of the Ubuntu image are trusted, and you do this with a label. So, your CI job would specify the upstream URL for the cloud image, perform its tests, and then label the image if it passed those tests. Other Ubuntu users in your cloud could then specify that they wanted the most recent version which passed testing by specifying the label for their disk specification, instead of the upstream URL.
Shaken Fist's CI does exactly this. Each night we download a set of cloud images, customize them to make the CI runs a bit faster (pre-installing packages and so forth), and then test that they work. At the end of that run we take a snapshot of the instance we customized, and label it with a label along the lines of "sfci-ubuntu-2004". CI jobs then use that label for their base disk. You can see the ansible we use to do this at https://github.com/shakenfist/shakenfist/blob/develop/deploy/ansible/ci-image.yml if you're interested.
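To make the label workflow concrete, here is a minimal sketch of a label acting as a pointer to the most recent version that passed testing. The Labels class, the nightly_ci function, and the blob identifiers are hypothetical and exist only for this example; they are not the Shaken Fist API.

```python
class Labels:
    """Hypothetical label registry: each label tracks versions of other artifacts."""

    def __init__(self):
        self._labels = {}

    def apply(self, label, blob_id):
        """Mark blob_id as the newest version carrying this label."""
        self._labels.setdefault(label, []).append(blob_id)

    def resolve(self, label):
        """Return the most recently labelled blob, as a user of the label sees it."""
        return self._labels[label][-1]


def nightly_ci(labels, label, blob_id, tests_passed):
    """Label the candidate image only when the CI tests passed."""
    if tests_passed:
        labels.apply(label, blob_id)


labels = Labels()
nightly_ci(labels, "sfci-ubuntu-2004", "blob-monday", tests_passed=True)
nightly_ci(labels, "sfci-ubuntu-2004", "blob-tuesday", tests_passed=False)
# Tuesday's image failed testing, so consumers still get Monday's.
print(labels.resolve("sfci-ubuntu-2004"))  # blob-monday
```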
The following operations are exposed on artifacts by the REST API:
Operation | Command line client | API client |
---|---|---|
list artifacts | artifact list | get_artifacts() |
show an artifact | artifact show | get_artifact() |
fetch an artifact from a URL without starting an instance | artifact cache | cache_artifact() |
upload | artifact upload | create_upload() followed by calls to send_upload() and then upload_artifact() |
download | artifact download | look up the desired version's blob with get_artifact(), then download with get_blob_data() |
show detailed information about versions | artifact versions | get_artifact_versions() |
delete | artifact delete | delete_artifact() |
delete a version | artifact delete-version | delete_artifact_version() |
set the maximum number of versions | artifact max-versions | set_artifact_max_versions() |
Note that artifacts exist in namespaces (since v0.6). This means that your artifacts are private to your namespace, and can't be seen or used by other namespaces. There are two exceptions. First, the "system" administrative namespace can see all artifacts. Second, the "system" namespace can create artifacts visible to all other namespaces. This is done with the shared flag on the relevant command line or API calls, and uses a "sharedwithall" namespace in the database.
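The visibility rules for artifacts can be summarized in a few lines. The function names below are hypothetical and the logic is a sketch of the rules as described on this page, not the actual implementation; only the "system" and "sharedwithall" namespace names come from the text.

```python
SYSTEM_NAMESPACE = "system"
SHARED_NAMESPACE = "sharedwithall"


def artifact_visible(artifact_namespace, requester_namespace):
    """Decide whether a requester can see an artifact."""
    if requester_namespace == SYSTEM_NAMESPACE:
        return True  # the administrative namespace sees everything
    if artifact_namespace == SHARED_NAMESPACE:
        return True  # shared artifacts are visible to every namespace
    return artifact_namespace == requester_namespace


def namespace_for_new_artifact(creator_namespace, shared):
    """Pick the namespace a new artifact is stored under."""
    if shared:
        if creator_namespace != SYSTEM_NAMESPACE:
            raise PermissionError("only the system namespace can share artifacts")
        return SHARED_NAMESPACE
    return creator_namespace


print(artifact_visible("alice", "bob"))          # False
print(artifact_visible("alice", "system"))       # True
print(artifact_visible("sharedwithall", "bob"))  # True
```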
Blobs
Each version of an artifact is an object called a blob. Blobs are stored on Shaken Fist nodes, and are automatically replicated around the cluster as required. By default we store at least two copies of each blob, although this is configurable. It's possible we'll store a lot more copies than that, because we only reap excess copies when we start to run low on disk. This is because these blobs are often used during the startup of instances, so having a local cache of popular blobs can significantly improve instance start up times.
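The reaping policy above (keep a minimum number of copies, and only remove extras from nodes that are short of disk) might look something like this sketch. The function name, its arguments, and the simple first-come selection order are assumptions made for illustration.

```python
def replicas_to_reap(locations, minimum_copies, low_disk_nodes):
    """Pick nodes that may drop their copy of a blob.

    locations: nodes currently holding a copy of the blob.
    minimum_copies: how many copies must always remain.
    low_disk_nodes: nodes that are running short of disk.
    """
    excess = len(locations) - minimum_copies
    if excess <= 0:
        return []  # already at or below the minimum, keep everything
    # Only nodes under disk pressure give up their copies.
    candidates = [node for node in locations if node in low_disk_nodes]
    return candidates[:excess]


# Four copies exist, two must remain, and three nodes are low on disk.
print(replicas_to_reap(["n1", "n2", "n3", "n4"], 2, {"n1", "n3", "n4"}))
# ['n1', 'n3']
```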
All hypervisor nodes store blobs, but it is also possible to have "storage only" nodes which don't run VMs and just store blobs. In previous deployments we have used these storage nodes to handle having more blobs than we need for currently running instances -- for example historical snapshots we are fond of, but are unlikely to require frequent access to. The storage nodes were therefore a cheaper machine type with slower CPU and disk, but a lot more disk than our hypervisor nodes.
So for example if you had an edge deployment where you are resource constrained, but also want to take nightly instance snapshots as a backup, you might have a more centrally located storage node and Shaken Fist would migrate unused blobs there to free up space on the edge nodes as required. If a blob only present on a storage only node is required for an instance start, a hypervisor node will fetch it at that time.
Finally, blobs are reference counted. They can be used by more than one artifact (for example an image which is then labelled), and we also count how many instances are using a specific blob. We only delete a blob from disk when there are no remaining references to it.
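Reference counting of this kind can be sketched as below. The BlobStore class and its method names are hypothetical; the point is only that an artifact, a label, and an instance each hold a reference, and the blob is removed when the last one is dropped.

```python
class BlobStore:
    """Hypothetical reference-counted blob tracker."""

    def __init__(self):
        self._refs = {}  # blob id -> reference count

    def add_reference(self, blob_id):
        self._refs[blob_id] = self._refs.get(blob_id, 0) + 1

    def drop_reference(self, blob_id):
        """Drop one reference; return True if the blob was deleted."""
        self._refs[blob_id] -= 1
        if self._refs[blob_id] == 0:
            del self._refs[blob_id]  # stands in for removing it from disk
            return True
        return False

    def exists(self, blob_id):
        return blob_id in self._refs


store = BlobStore()
store.add_reference("blob-1")  # the image artifact
store.add_reference("blob-1")  # a label pointing at the same blob
store.add_reference("blob-1")  # an instance using the blob
print(store.drop_reference("blob-1"))  # False
print(store.drop_reference("blob-1"))  # False
print(store.drop_reference("blob-1"))  # True
```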
The following operations are exposed on blobs by the REST API:
Operation | Command line client | API client |
---|---|---|
list blobs | blob list | get_blobs() |
Events
Shaken Fist has an event logging system for the main object types. So for example, instead of reading through log files to find all the state changes that an instance went through, you can simply ask for a list of the events for that instance. This also means that the instance owner can see those logs without having to be given access to your log files.
The following object types currently record events: artifacts, blobs, instances, networks, networkinterfaces, nodes, and uploads. In general, events are exposed in the API as operations on the object they relate to. So for example there is an instance events command, which calls the get_instance_events() API client call. Those various calls are documented by their object type.
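Conceptually, per-object event logging looks something like the following sketch. The EventLog class, its methods, and the event messages are hypothetical and only illustrate why asking for one object's events is simpler than searching log files.

```python
import time


class EventLog:
    """Hypothetical per-object event store, keyed by object type and UUID."""

    def __init__(self):
        self._events = {}

    def add(self, object_type, object_uuid, message):
        entry = {"timestamp": time.time(), "message": message}
        self._events.setdefault((object_type, object_uuid), []).append(entry)

    def get(self, object_type, object_uuid):
        """Return the events for one object, oldest first."""
        return list(self._events.get((object_type, object_uuid), []))


log = EventLog()
log.add("instance", "uuid-1", "state changed to created")
log.add("network", "uuid-2", "state changed to created")
log.add("instance", "uuid-1", "state changed to deleted")

# Only the events for the requested instance are returned.
for event in log.get("instance", "uuid-1"):
    print(event["message"])
```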
Networks
Note that networks exist in namespaces. This means that your networks are private to your namespace, and can't be seen or used by other namespaces. There is one exception -- the "system" administrative namespace can see all networks.
Instances
Instances are the primary reason that you'd run Shaken Fist, so there's a lot to cover in their implementation. Obviously instances can be created, deleted, listed, and shown. Additionally, you can list the network interfaces on an instance, track and change metadata on a given instance (a simple key value store similar to OpenStack tags), request the current serial console output, and see events related to the instance. Instances can also have their power state managed: they can be soft (ACPI) rebooted, hard (power cycle) rebooted, powered off, powered on, and paused.
When creating an instance you can configure:
- the name of the instance
- how many vCPUs the instance has
- how much memory the instance has
- what network connections the instance has, including floating IP attachments and the network interface model to use
- what disks the instance has, their size, type, and bus
- what ssh key cloud-init should set up, if your instance includes cloud-init
- other arbitrary user data which will be passed to cloud-init, if installed
- the namespace of the instance
- what video card the instance has, including the model and amount of video memory
- whether BIOS boot or UEFI boot is used
- whether secure boot is enabled, including a NVRAM template if required
- what configuration drive type is used, with a default of OpenStack style
- key and value metadata
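The options in the list above could be gathered into a single structure along these lines. This is purely an illustrative sketch: the InstanceSpec class, every field name, the default values, the dictionary keys in the usage example, and the validation rule are assumptions for the example and are not the real Shaken Fist API or its defaults.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InstanceSpec:
    """Hypothetical container for the instance creation options listed above."""
    name: str
    cpus: int
    memory_mb: int
    namespace: str
    networks: list = field(default_factory=list)  # incl. floating IP and NIC model
    disks: list = field(default_factory=list)     # each with size, type, and bus
    ssh_key: Optional[str] = None                 # passed to cloud-init if present
    user_data: Optional[str] = None               # passed to cloud-init if present
    video: dict = field(default_factory=dict)     # model and video memory
    uefi: bool = False                            # illustrative default only
    secure_boot: bool = False
    nvram_template: Optional[str] = None
    configdrive: str = "openstack"                # illustrative value for "OpenStack style"
    metadata: dict = field(default_factory=dict)

    def validate(self):
        """Reject combinations that cannot work (assumed rule for this sketch)."""
        if self.secure_boot and not self.uefi:
            raise ValueError("secure boot requires UEFI boot")


spec = InstanceSpec(
    name="web-1",
    cpus=2,
    memory_mb=2048,
    namespace="alice",
    disks=[{"size_gb": 20, "base": "https://cloud-images.ubuntu.com/", "bus": "virtio"}],
    metadata={"role": "webserver"},
)
spec.validate()
print(spec.name, spec.cpus, spec.configdrive)  # web-1 2 openstack
```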
Note that instances exist in namespaces. This means that your instances are private to your namespace, and can't be seen or used by other namespaces. There is one exception -- the "system" administrative namespace can see all instances.
Other features
Shaken Fist supports the following other features that are not directly related to an object type:
- JWT based API authentication
- graceful shutdown of hypervisors where current work is finished before the processes are stopped
- online upgrade of object versions as required
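The idea behind graceful shutdown, finishing current work before stopping, can be shown with a small generic sketch. This is a general pattern using Python's standard library, not Shaken Fist's daemon code; run_worker and the other names are invented for the example.

```python
import queue
import threading


def run_worker(jobs, stop_requested, results):
    """Process jobs until shutdown is requested, never abandoning a job midway."""
    while not stop_requested.is_set():
        try:
            job = jobs.get(timeout=0.05)
        except queue.Empty:
            continue  # nothing to do; re-check the shutdown flag
        # The flag is only checked between jobs, so a job in progress
        # always runs to completion before the worker exits.
        results.append(job())


jobs, results = queue.Queue(), []
stop_requested = threading.Event()
started, release = threading.Event(), threading.Event()


def slow_job():
    started.set()   # tell the main thread the job is running
    release.wait()  # simulate work that is still in progress
    return "finished"


jobs.put(slow_job)
worker = threading.Thread(target=run_worker, args=(jobs, stop_requested, results))
worker.start()
started.wait()
stop_requested.set()  # shutdown requested while the job is mid-flight
release.set()
worker.join()
print(results)  # ['finished']
```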
Comparison to OpenStack
The development team's background is OpenStack, so we find it useful to provide a comparison between what OpenStack supports and what Shaken Fist supports. However, Shaken Fist does not intend to be a direct replacement for OpenStack, and implements many features not present in OpenStack (for example in guest agents).
Here's a simple feature matrix listing when a feature was introduced:
Feature | Implemented | Planned | Not Planned |
---|---|---|---|
Servers / instances | v0.1 | | |
Networks | v0.1 | | |
Multiple NICs for a given server | v0.1 | | |
Pre-cache a server image | v0.1 | | |
Floating IPs | v0.1 | | |
Pause | v0.1 | | |
Reboot (hard and soft) | v0.1 | | |
Security groups | | Yes | |
Text console | v0.1 | | |
VDI | v0.1 | | |
User data | v0.1 | | |
Keypairs | v0.1 | | |
Virtual networks allow overlapping IP allocations | v0.1 | | |
REST API authentication and object ownership | v0.2 | | |
Snapshots (of all disks) | v0.1 | | |
Central API service | v0.1 | | |
Scheduling | v0.1 | | |
Volumes | | | No plans |
Quotas | | | No plans |
API versioning | | | No plans |
Keystone style service lookup and URLs | | | No plans |
Create multiple servers in a single request | | | No plans |
Resize a server | | | No plans |
Server groups | | | No plans |
Change admin password | | | No plans |
Rebuild a server | | | No plans |
Shelve / unshelve | | | No plans |
Trigger crash dump | | | No plans |
Live migration | | | No plans |
Flavors | | | No plans |
Guest agents | | | No plans |
Host aggregates | | | No plans |
Server tags | v0.2, we call them "metadata" | | |
~~Persistence in MySQL~~ | v0.1 | | |
Distributed etcd for locking and persistence | v0.2 | | |
Production grade REST API via gunicorn | v0.2 | | |
Python REST API client | v0.1 | | |
golang REST API client | v0.2 | | |
Terraform provider | v0.2 | | |