Artifact checksums

As of Shaken Fist v0.7, blob replicas are regularly checksummed to verify that data loss has not occurred. The following events imply a checksum operation:

  • snapshotting an NVRAM template.
  • creation of a new blob replica by transfer of a blob from another machine in the cluster (the destination is checksummed to verify the transfer).
  • transcode of a blob into a new format (the new format is stored as a separate blob).
  • conversion of an upload to an artifact.
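As an illustration of the transfer case above, here is a minimal sketch of checksumming a destination file and comparing it with the expected value. The function names, the chunk size, and the choice of SHA-512 are assumptions made for the example, not a description of Shaken Fist's actual implementation.

```python
import hashlib


def checksum_file(path, algorithm='sha512', chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a large blob is never fully in memory.

    The algorithm and chunk size are illustrative assumptions.
    """
    digest = hashlib.new(algorithm)
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps reading until read() returns b'' at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(destination_path, expected_checksum):
    """Return True if the transferred replica matches the expected checksum."""
    return checksum_file(destination_path) == expected_checksum
```

Reading in chunks matters here because blobs such as disk images can be far larger than available memory.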

The following events should also imply a checksum operation, but currently do not, because we found that performance suffered too much for very large blobs:

  • download of a new blob from an external source (artifact fetch for example).
  • snapshotting a disk.

Additionally, all blob replicas are regularly checksummed and compared with the value recorded in etcd. These comparisons are rate limited: each replica should be verified no more often than once every CHECKSUM_VERIFICATION_FREQUENCY seconds, which defaults to 24 hours. If a node holds a large number of blob replicas, it may be unable to keep up with checksum operations.
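The rate limiting can be pictured with a small sketch that selects which replicas are due for verification. The function name, the `last_verified` mapping, and the `budget` parameter are hypothetical and exist only for this example; only the CHECKSUM_VERIFICATION_FREQUENCY name and its 24 hour default come from the text above.

```python
import time

# Documented default: 24 hours, expressed in seconds.
CHECKSUM_VERIFICATION_FREQUENCY = 24 * 3600


def replicas_due_for_verification(last_verified, now=None,
                                  frequency=CHECKSUM_VERIFICATION_FREQUENCY,
                                  budget=None):
    """Return blob UUIDs that are due for a checksum, oldest check first.

    last_verified maps blob UUID -> timestamp of the last verification.
    budget (hypothetical) caps how many replicas one pass will handle.
    """
    if now is None:
        now = time.time()
    due = sorted(
        (checked_at, blob_uuid)
        for blob_uuid, checked_at in last_verified.items()
        if now - checked_at >= frequency
    )
    uuids = [blob_uuid for _, blob_uuid in due]
    return uuids if budget is None else uuids[:budget]
```

If each pass can only process `budget` replicas and more than that become due per pass, the backlog grows, which is the "unable to keep up" situation described above.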

If a blob replica fails checksum verification, CHECKSUM_ENFORCEMENT is set to True, and the replica is not in use on that node, then the replica is deleted and the cluster will re-replicate the blob as required. If the blob replica is in use, there isn't much Shaken Fist can do without disturbing running instances, so the error is logged and then ignored for now.
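The decision described above can be summarized in a short sketch. The function and the action names are hypothetical. The behavior when CHECKSUM_ENFORCEMENT is False is not stated in this document, so the example assumes the failure is only logged in that case.

```python
def action_for_failed_replica(enforcement_enabled, replica_in_use):
    """Decide what to do with a replica that failed checksum verification.

    Returns 'delete' or 'log'. Both names are illustrative only.
    """
    if not enforcement_enabled:
        # Assumption: without enforcement the failure is only recorded.
        return 'log'
    if replica_in_use:
        # Deleting would disturb running instances, so log and move on.
        return 'log'
    # Safe to remove; the cluster re-replicates the blob as required.
    return 'delete'
```

For example, `action_for_failed_replica(True, False)` returns `'delete'`, while any replica that is in use yields `'log'`.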

Checksums are also used when a new version of an artifact is created. If the checksum of the previous version is the same as the checksum of the proposed new version, the proposed new version is skipped. From v0.7, artifact uploads can also skip the actual upload of the artifact's contents if there is already a blob in the cluster with a matching checksum.
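A minimal sketch of these two shortcuts follows. The function name, the return values, and the `cluster_blobs_by_checksum` mapping are assumptions for illustration, not Shaken Fist's API.

```python
def plan_upload(new_checksum, previous_version_checksum,
                cluster_blobs_by_checksum):
    """Decide how to handle a proposed new artifact version.

    cluster_blobs_by_checksum (hypothetical) maps checksum -> blob UUID
    for blobs already stored in the cluster.
    """
    if new_checksum == previous_version_checksum:
        # Identical to the previous version, so no new version is created.
        return ('skip_version', None)

    existing_blob = cluster_blobs_by_checksum.get(new_checksum)
    if existing_blob is not None:
        # The content is already in the cluster, so reuse it without uploading.
        return ('reuse_blob', existing_blob)

    return ('upload', None)
```

The previous-version check comes first because it avoids creating a version at all, whereas reusing an existing blob still creates a new version that points at stored content.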