Skip to content

v0.7 to v0.8 release notes

Major changes

  • You can now hot plug (add) a network interface to a running instance. This is exposed by a POST to the /instance/...instanceref.../interfaces endpoint, passing a network specification like you would at boot time. More details are available in the
  • The Shaken Fist client now has a plugin system, which allows additional commands that may not be of interest to all users to be added without cluttering the main codebase. Orchestration of k3s clusters via the shakenfist-client-k3s plugin is the first example of one of these plugins.

REST API

  • There is now an API call (GET /admin/resources) which exposes the resource utilization of the cluster to admin users.
  • You can now POST to /instance/...instanceref.../interfaces as described above to hot plug new network interfaces into an existing instance.

Supported distributions

  • Debian 12 is now supported as a host OS.
  • Ubuntu 22.04 is now supported as a host OS.
  • Fedora 34, 38, and 29, as well as Ubuntu 22.04 now have canned guest images.
  • Rocky 8 and 9 now have canned guest images.

Logging

  • REST API request traces are now logged via the event logging mechanism with the object type "api-requests" and the UUID being the request id.

Containers and Kubernetes

  • The Shaken Fist client can now orchestrate k3s Kubernetes clusters for you. The lifecycle support is relatively simple at the moment, with cluster creation and deletion supported, as well as fetching the kubectl configuration from the cluster. This will be expanded over time. This support is implemented entirely in the Shaken Fist python client, and heavily utilizes the in guest agent added in v0.7. The client side nature of the orchestration makes it easy for you to customize the orchestration if desired without having to alter the main server code.

Networking

  • IP address management has moved to a new baseobject called IPAM. Events are therefore recorded for address management as you would expect.
  • Addresses released on any network (including the floating network) are now quarantined for IP_DELETION_HALO_DURATION seconds after deletion before they can be reused. The only exception to this is if a network is heavily congested and an allocation attempt will fail. In that case the halo is temporarily reduced to 30 seconds and a warning log message is emitted.
  • You can now list the addresses in use for a given network with the sf-client network addresses ...uuid... command.
  • In order to support the K3S Kubernetes orchestration, the concept of routed IPs was introduced. A routed IP is an address from the floating address pool which uses routing to deliver traffic to the relevant virtual network. An interface on the virtual network must then have been configured by the user to answer ARP requests for that address. This works well with metallb, which our K3S orchestration uses to expose services.
  • Network orchestration now waits for iptables locks, instead of failing commands in high load situations.

Instances

  • Shaken Fist can now capture screenshots of instance consoles.
  • Pause and unpause are now retried several times on failure, as sometimes libvirt does not respond correctly.
  • Specifying an incorrect disk bus now returns a more helpful error.
  • Power on now implies creation of a config drive is one is specified for the instance. That is, you can force re-creation of the config drive by powering an instance off and then on again.

Artifacts

  • When you refer to an artifact by name, and there is more than one match then the match in your local namespace (if any) is now preferred instead of returning an error.

Deployment

  • We no longer reset the authentication secret used to generate authentication tokens on upgrade. This means tokens from before an upgrade will continue to work for their normal lifetime.
  • We now lock versions of upstream Ansible Galaxy dependencies.
  • We now lock versions of all of our indirect python dependencies.
  • Nodes can now transition directly from the missing state to the stopping state.
  • The Ansible modules have been re-written to skip resources that are already in existence and ask described by your request. The Ansible module is also now documented at the user guide.
  • TLS certificates on hypervisors which are within 30 days of expiry are now automatically replaced.

Performance

  • Events are no longer queued via etcd unless the eventlog node is down at the time of the event. This reduces the number of etcd writes significantly during CI runs and therefore improves the reliability of etcd. The new approach is to make gRPC calls directly to the eventlog node if it is available.
  • We now use gRPC calls to compact etcd, instead of relying on a python client wrapper. This means we can now update our gRPC and protobuf dependencies to much more recent versions.
  • etcd traffic levels are now monitored in CI and we attempt to hold fewer cluster level locks for local operations.

Minor changes

  • CI has been moved from relatively unreliable scraping of the instance serial console over telnet to using the Shaken Fist in-guest agent to inspect the state of instances for correctness.
  • The slow lock warning threshold is no longer configurable (SLOW_LOCK_THRESHOLD). Instead, a warning is emitted if a lot takes more than half of the specified timeout period to be acquired. This change was made because in some places we expect to wait a long time for a lock -- for example serialized fetches of a single resource from outside the cluster, but we also wanted to enforce locks didn't take a long time to acquire in CI.
  • Shaken Fist now uses Renovate to keep the dependencies of the develop branch up to date. This means that locking requirements at release time is no longer required, and is therefore more reliable.
  • The qemu commands generated now vary based on the version of qemu installed on the machine. This was required to support the newer qemu version in Ubuntu 22.04.
  • The ansible modules have been rewritten to be more reliable.