v0.7 to v0.8 release notes
Major changes
- You can now hot plug (add) a network interface to a running instance. This is
exposed by a POST to the
/instance/...instanceref.../interfaces endpoint, passing
a network specification like you would at boot time. More details are available
in the
- The Shaken Fist client now has a plugin system, which allows additional commands
that may not be of interest to all users to be added without cluttering the
main codebase. Orchestration of k3s clusters via the
shakenfist-client-k3s
plugin is the first example of one of these plugins.
- You can now optionally have DNS entries for the instances in a virtual network
provided to the instances on that network via the
provide_dns argument on
network creation. This included a re-write of DHCP server options in the
configuration file.
- We now use threads for most concurrency requirements. See
the operator page on threading for further details.
- Disk utilization is now a factor in scheduling decisions. This might produce
unexpected results if you have many very active filesystems mounted on a single
hypervisor.
REST API
- There is now an API call (GET /admin/resources) which exposes the resource
utilization of the cluster to admin users.
- You can now POST to
/instance/...instanceref.../interfaces as described
above to hot plug new network interfaces into an existing instance.
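As a rough illustration of the hot plug call, the sketch below only builds the request rather than sending it. The API base URL, the instance name, and the `network_uuid` field of the network specification are assumptions made for the example; check the API documentation for the actual network specification format.

```python
import json
from urllib.parse import quote


def build_hotplug_request(api_base, instance_ref, netdesc):
    """Return (method, url, body) for a hot plug request without sending it."""
    base = api_base.rstrip("/")
    url = f"{base}/instance/{quote(instance_ref, safe='')}/interfaces"
    return "POST", url, json.dumps(netdesc)


# Hypothetical values, for illustration only.
method, url, body = build_hotplug_request(
    "https://sf.example.com/api",
    "my-instance",
    {"network_uuid": "00000000-0000-4000-8000-000000000000"},
)
print(method, url)
print(body)
```

The returned tuple can be handed to whichever HTTP client you already use, along with your normal authentication headers.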
Supported distributions
- Debian 12 is now supported as a host OS. Support for Debian 11 as a host OS
has been dropped.
- Ubuntu 22.04 is now supported as a host OS.
- Fedora 34, 38, and 29, as well as Ubuntu 22.04, now have canned guest images.
- Rocky 8 and 9 now have canned guest images.
Logging
- REST API request traces are now logged via the event logging mechanism with
the object type "api-requests" and the UUID being the request id.
- Effort is now taken to report an event against all objects it affects, not
just an arbitrary single object. This should ease debugging.
Containers and Kubernetes
- The Shaken Fist client can now orchestrate k3s Kubernetes clusters for you. The
lifecycle support is relatively simple at the moment, with cluster creation and
deletion supported, as well as fetching the kubectl configuration from the
cluster. This will be expanded over time. This support is implemented entirely
in the Shaken Fist Python client, and relies heavily on the in-guest agent
added in v0.7. The client-side nature of the orchestration makes it easy for you
to customize the orchestration if desired without having to alter the main
server code.
Networking
- IP address management has moved to a new baseobject called IPAM. Events are
therefore recorded for address management as you would expect.
- Addresses released on any network (including the floating network) are now
quarantined for
IP_DELETION_HALO_DURATION seconds after deletion before they
can be reused. The only exception to this is if a network is heavily congested
and an allocation attempt will fail. In that case the halo is temporarily
reduced to 30 seconds and a warning log message is emitted.
- You can now list the addresses in use for a given network with the
sf-client network addresses ...uuid... command.
- To support the k3s Kubernetes orchestration, the concept of routed
IPs was introduced. A routed IP is an address from the floating address pool
which uses routing to deliver traffic to the relevant virtual network. An
interface on the virtual network must then have been configured by the user to
answer ARP requests for that address. This works well with MetalLB, which our
k3s orchestration uses to expose services.
- Network orchestration now waits for
iptables locks, instead of failing
commands in high load situations.
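The address quarantine described above can be pictured with the following minimal sketch. It is not the Shaken Fist IPAM implementation: the `HaloPool` class, the 300 second value used for IP_DELETION_HALO_DURATION, and the injectable clock are all assumptions made for illustration. Only the 30 second reduced halo and the warning on congestion come from the notes.

```python
import logging
import time

IP_DELETION_HALO_DURATION = 300  # assumed value, for illustration only
CONGESTED_HALO_DURATION = 30     # reduced halo used when the network is congested


class HaloPool:
    """Toy address pool that quarantines released addresses."""

    def __init__(self, addresses, clock=time.time):
        self.free = list(addresses)
        self.quarantined = {}  # address -> time of release
        self.clock = clock

    def release(self, address):
        self.quarantined[address] = self.clock()

    def _reap(self, halo):
        # Return addresses whose halo has elapsed to the free list.
        now = self.clock()
        for address, released_at in list(self.quarantined.items()):
            if now - released_at >= halo:
                del self.quarantined[address]
                self.free.append(address)

    def allocate(self):
        self._reap(IP_DELETION_HALO_DURATION)
        if not self.free:
            # The allocation would otherwise fail, so shorten the halo.
            self._reap(CONGESTED_HALO_DURATION)
            if self.free:
                logging.warning("network congested, halo reduced to %d seconds",
                                CONGESTED_HALO_DURATION)
        if not self.free:
            raise RuntimeError("no addresses available")
        return self.free.pop(0)
```

Passing a fake `clock` makes the behavior easy to exercise without waiting for real time to pass.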
Instances
- Shaken Fist can now capture screenshots of instance consoles.
- Pause and unpause are now retried several times on failure, as sometimes libvirt
does not respond correctly.
- Specifying an incorrect disk bus now returns a more helpful error.
- Power on now implies creation of a config drive if one is specified for the
instance. That is, you can force re-creation of the config drive by powering
an instance off and then on again.
- Deletion of an instance will now cause most outstanding cluster operations for
that instance to be cancelled. This is so these operations do not block the
delete operation. The one exception at this time is snapshots, which will
complete before the deletion occurs.
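The pause and unpause retry behavior follows a common pattern, sketched below. The helper name, the number of attempts, and the delay are assumptions; the notes only say that the operation is retried several times on failure.

```python
import time


def retry_operation(operation, attempts=3, delay=0.5):
    """Call operation() until it succeeds or attempts are exhausted."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as e:  # a real caller would catch a narrower error
            last_error = e
            time.sleep(delay)
    if last_error is None:
        raise ValueError("attempts must be at least 1")
    raise last_error
```

Here `operation` stands in for whatever callable performs the pause or unpause.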
Artifacts
- When you refer to an artifact by name and there is more than one match, the
match in your local namespace (if any) is now preferred instead of returning
an error.
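The name resolution rule can be sketched as follows. The dictionary layout and the function name are hypothetical, and the sketch assumes that an ambiguous name with no match in the local namespace is still an error, which the notes imply but do not state.

```python
def resolve_artifact(name, artifacts, local_namespace):
    """Pick the artifact for a name, preferring the caller's namespace."""
    matches = [a for a in artifacts if a["name"] == name]
    if not matches:
        raise LookupError(f"no artifact named {name!r}")
    if len(matches) == 1:
        return matches[0]

    # More than one match: prefer the one in the local namespace, if any.
    local = [a for a in matches if a["namespace"] == local_namespace]
    if len(local) == 1:
        return local[0]
    raise LookupError(f"artifact name {name!r} is ambiguous")
```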
Deployment
- We no longer reset the authentication secret used to generate authentication
tokens on upgrade. This means tokens from before an upgrade will continue to work
for their normal lifetime.
- We now lock versions of upstream Ansible Galaxy dependencies.
- We now lock versions of all of our indirect Python dependencies.
- Nodes can now transition directly from the missing state to the stopping state.
- The Ansible modules have been re-written to skip resources that already
exist and are as described by your request. The Ansible module is also now
documented in the user guide.
- TLS certificates on hypervisors which are within 30 days of expiry are now
automatically replaced.
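The certificate replacement rule amounts to a simple date comparison, sketched below. The function name is hypothetical, and treating a certificate with exactly 30 days remaining as due for replacement is an assumption about the boundary.

```python
from datetime import datetime, timedelta, timezone

RENEWAL_WINDOW = timedelta(days=30)


def needs_replacement(not_after, now=None):
    """Return True if the certificate expires within the renewal window."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Already expired certificates also satisfy this comparison.
    return not_after - now <= RENEWAL_WINDOW
```

Both arguments are expected to be timezone-aware datetimes, with `not_after` being the certificate's expiry time.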
Database
- Object state (created, deleted, error, etc.) is now stored in MariaDB instead
of etcd. This improves query performance for state-based lookups and reduces
etcd load. MariaDB is now required for all deployments.
- A new database microservice runs on etcd_master nodes and provides gRPC access
to MariaDB for all other nodes. This centralizes database access and means only
etcd_master nodes require direct MariaDB credentials.
- MariaDB schema versioning is now tracked in a
schema_versions table. This
enables automatic schema migrations when upgrading Shaken Fist. The
sf-ctl ensure-mariadb-schema command can be used to manually verify or apply
schema migrations.
- Existing deployments upgrading to v0.8 must run
sf-ctl migrate-state-to-mariadb
after stopping services and before starting the new version. Use --dry-run
to preview what will be migrated.
- The floating network now uses a well-known UUID
(f10a7f10-a7f1-4a7f-a10a-7f10a7f10a7f) instead of the invalid string
"floating". Existing deployments must run
sf-ctl migrate-floating-network-uuid after stopping services to migrate.
This change enables proper UUID4 type validation in the IPAM schema.
- Events are no longer queued via etcd unless the eventlog node is down at the
time of the event. This reduces the number of etcd writes significantly during
CI runs and therefore improves the reliability of etcd. The new approach is
to make gRPC calls directly to the eventlog node if it is available.
- We now use gRPC calls to compact etcd, instead of relying on a Python client
wrapper. This means we can now update our gRPC and protobuf dependencies to
much more recent versions.
- etcd traffic levels are now monitored in CI and we attempt to hold fewer
cluster level locks for local operations.
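The event delivery change described above is a direct-call-with-fallback pattern. In the sketch below, `send_direct` stands in for the gRPC call to the eventlog node and `enqueue` stands in for the etcd queue; both names, and the use of `ConnectionError` to signal an unavailable node, are assumptions made for illustration.

```python
def deliver_event(event, send_direct, enqueue):
    """Send an event directly, queueing it only if the direct path is down."""
    try:
        send_direct(event)
        return "direct"
    except ConnectionError:
        # The eventlog node is unavailable, so fall back to the queue.
        enqueue(event)
        return "queued"
```

In the common case nothing is written to the queue at all, which is where the reduction in etcd writes comes from.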
Minor changes
- Unhandled exceptions are now recorded to
/srv/shakenfist/exceptions/ for
later analysis. See the operator guide
for details.
- CI has been moved from relatively unreliable scraping of the instance serial
console over telnet to using the Shaken Fist in-guest agent to inspect the
state of instances for correctness.
- The slow lock warning threshold is no longer configurable (SLOW_LOCK_THRESHOLD).
Instead, a warning is emitted if a lock takes more than half of the specified
timeout period to be acquired. This change was made because in some places we
expect to wait a long time for a lock (for example, serialized fetches of a
single resource from outside the cluster), but we also wanted to ensure that
locks do not take a long time to acquire in CI.
- Shaken Fist now uses Renovate to keep the dependencies of the develop
branch up to date. Locking requirements at release time is therefore no
longer necessary, which makes releases more reliable.
- The
qemu commands generated now vary based on the version of qemu installed
on the machine. This was required to support the newer qemu version in
Ubuntu 22.04.
- The Ansible modules have been rewritten to be more reliable.
- The Shaken Fist client now uses HTTP sessions to reduce latency for requests.
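The slow lock warning rule can be illustrated with a local lock. This is a minimal sketch using `threading.Lock` rather than the cluster level locks Shaken Fist actually uses; the function name and the injectable clock are assumptions.

```python
import logging
import threading
import time


def acquire_with_warning(lock, timeout, clock=time.monotonic):
    """Acquire a lock, warning if it took more than half the timeout.

    Returns a tuple of (acquired, slow).
    """
    start = clock()
    acquired = lock.acquire(timeout=timeout)
    waited = clock() - start

    slow = acquired and waited > timeout / 2
    if slow:
        logging.warning("lock acquired after %.1f seconds (timeout %.1f)",
                        waited, timeout)
    return acquired, slow


acquired, slow = acquire_with_warning(threading.Lock(), timeout=10)
```

Tying the threshold to the timeout means callers that legitimately expect long waits simply pass a longer timeout, rather than needing a separate global setting.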