You can now hot plug (add) a network interface to a running instance. This is
exposed by a POST to the /instance/...instanceref.../interfaces endpoint, passing
a network specification like you would at boot time. More details are available
in the
The Shaken Fist client now has a plugin system, which allows additional commands
that may not be of interest to all users to be added without cluttering the
main codebase. Orchestration of k3s clusters via the shakenfist-client-k3s
plugin is the first example of one of these plugins.
REST API
There is now an API call (GET /admin/resources) which exposes the resource
utilization of the cluster to admin users.
You can now POST to /instance/...instanceref.../interfaces as described
above to hot plug new network interfaces into an existing instance.
Supported distributions
Debian 12 is now supported as a host OS.
Ubuntu 22.04 is now supported as a host OS.
Fedora 34, 38, and 29, as well as Ubuntu 22.04 now have canned guest images.
Rocky 8 and 9 now have canned guest images.
Logging
REST API request traces are now logged via the event logging mechanism with
the object type "api-requests" and the UUID being the request id.
Containers and Kubernetes
The Shaken Fist client can now orchestrate k3s Kubernetes clusters for you. The
lifecycle support is relatively simple at the moment, with cluster creation and
deletion supported, as well as fetching the kubectl configuration from the
cluster. This will be expanded over time. This support is implemented entirely
in the Shaken Fist python client, and heavily utilizes the in guest agent
added in v0.7. The client side nature of the orchestration makes it easy for you
to customize the orchestration if desired without having to alter the main
server code.
Networking
IP address management has moved to a new baseobject called IPAM. Events are
therefore recorded for address management as you would expect.
Addresses released on any network (including the floating network) are now
quarantined for IP_DELETION_HALO_DURATION seconds after deletion before they
can be reused. The only exception to this is if a network is heavily congested
and an allocation attempt will fail. In that case the halo is temporarily
reduced to 30 seconds and a warning log message is emitted.
You can now list the addresses in use for a given network with the
sf-client network addresses ...uuid... command.
In order to support the K3S Kubernetes orchestration, the concept of routed
IPs was introduced. A routed IP is an address from the floating address pool
which uses routing to deliver traffic to the relevant virtual network. An
interface on the virtual network must then have been configured by the user to
answer ARP requests for that address. This works well with metallb, which our
K3S orchestration uses to expose services.
Network orchestration now waits for iptables locks, instead of failing
commands in high load situations.
Instances
Shaken Fist can now capture screenshots of instance consoles.
Pause and unpause are now retried several times on failure, as sometimes libvirt
does not respond correctly.
Specifying an incorrect disk bus now returns a more helpful error.
Power on now implies creation of a config drive is one is specified for the
instance. That is, you can force re-creation of the config drive by powering
an instance off and then on again.
Artifacts
When you refer to an artifact by name, and there is more than one match then
the match in your local namespace (if any) is now preferred instead of returning
an error.
Deployment
We no longer reset the authentication secret used to generate authentication
tokens on upgrade. This means tokens from before an upgrade will continue to work
for their normal lifetime.
We now lock versions of upstream Ansible Galaxy dependencies.
We now lock versions of all of our indirect python dependencies.
Nodes can now transition directly from the missing state to the stopping state.
The Ansible modules have been re-written to skip resources that are already in
existence and ask described by your request. The Ansible module is also now
documented at the user guide.
TLS certificates on hypervisors which are within 30 days of expiry are now
automatically replaced.
Performance
Events are no longer queued via etcd unless the eventlog node is down at the
time of the event. This reduces the number of etcd writes significantly during
CI runs and therefore improves the reliability of etcd. The new approach is
to make gRPC calls directly to the eventlog node if it is available.
We now use gRPC calls to compact etcd, instead of relying on a python client
wrapper. This means we can now update our gRPC and protobuf dependencies to
much more recent versions.
etcd traffic levels are now monitored in CI and we attempt to hold fewer
cluster level locks for local operations.
Minor changes
CI has been moved from relatively unreliable scraping of the instance serial
console over telnet to using the Shaken Fist in-guest agent to inspect the
state of instances for correctness.
The slow lock warning threshold is no longer configurable (SLOW_LOCK_THRESHOLD).
Instead, a warning is emitted if a lot takes more than half of the specified
timeout period to be acquired. This change was made because in some places we
expect to wait a long time for a lock -- for example serialized fetches of a
single resource from outside the cluster, but we also wanted to enforce locks
didn't take a long time to acquire in CI.
Shaken Fist now uses Renovate to keep the dependencies of the develop
branch up to date. This means that locking requirements at release time is no
longer required, and is therefore more reliable.
The qemu commands generated now vary based on the version of qemu installed
on the machine. This was required to support the newer qemu version in
Ubuntu 22.04.
The ansible modules have been rewritten to be more reliable.