How to recover instances in case of disaster

LXD provides a tool for disaster recovery in case the LXD database is corrupted or otherwise lost.

The tool scans the storage pools for instances and imports the instances that it finds back into the database. You need to re-create the required entities that are missing (usually profiles, projects, and networks).

Important

This tool should be used for disaster recovery only. Do not rely on this tool as an alternative to proper backups; you will lose data like profiles, network definitions, or server configuration.

The tool must be run interactively and cannot be used in automated scripts.

The tool is available through the lxd recover command (note the lxd command rather than the normal lxc command).
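Because the tool needs an interactive terminal, a wrapper can refuse to start it when no terminal is attached, for example when it is called from a cron job or a pipeline by mistake. The following is a minimal sketch, not part of LXD. The function name run_recover_interactively is hypothetical, and the check assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LXD): start "lxd recover" only when
# both standard input and standard output are attached to a terminal.
run_recover_interactively() {
    if [ -t 0 ] && [ -t 1 ]; then
        lxd recover
    else
        echo "lxd recover must be run from an interactive terminal" >&2
        return 1
    fi
}

# Usage (from an interactive shell):
#   run_recover_interactively
```

When called with redirected input or output, the function prints a message to standard error and returns a non-zero status without starting the tool.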

Recovery process

When you run the tool, it scans all storage pools that still exist in the database, looking for missing volumes that can be recovered. You can also specify the details of any unknown storage pools (those that exist on disk but do not exist in the database), and the tool attempts to scan those too.

After mounting the specified storage pools (if not already mounted), the tool scans them for unknown volumes that look like they are associated with LXD. LXD maintains a backup.yaml file in each instance’s storage volume, which contains all the information necessary to recover that instance (including instance configuration, attached devices, storage volume, and pool configuration). This data can be used to rebuild the instance, storage volume, and storage pool database records. Before recovering an instance, the tool performs some consistency checks to compare what is in the backup.yaml file with what is actually on disk (for example, that the snapshots match). If all checks pass, the database records are re-created.
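To see which volumes carry a backup.yaml file before you start the tool, you can search a mounted pool directory yourself. The following is a rough sketch under stated assumptions: the function name list_backup_files is hypothetical, and it assumes that the instance volumes are mounted and visible below the directory you pass in, laid out as POOL_DIR/TYPE/INSTANCE/backup.yaml. Depending on the storage backend and how LXD is installed, the volumes might not be mounted or might not be visible from the host.

```shell
#!/bin/sh
# Hypothetical helper (not part of LXD): list backup.yaml files found at most
# three directory levels below the given pool directory.
# Assumed layout: POOL_DIR/<type>/<instance>/backup.yaml
list_backup_files() {
    pool_dir="$1"
    if [ ! -d "$pool_dir" ]; then
        echo "not a directory: $pool_dir" >&2
        return 1
    fi
    # -maxdepth is supported by GNU and BSD find.
    find "$pool_dir" -maxdepth 3 -type f -name backup.yaml | sort
}

# Usage (the path is an example; adjust it to your installation):
#   list_backup_files /var/snap/lxd/common/lxd/storage-pools/default
```

The listing is only a manual aid. The recovery tool performs its own scan and consistency checks.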

If the storage pool database record also needs to be created, the tool uses the information from an instance’s backup.yaml file as the basis of its configuration, rather than what the user provided during the discovery phase. However, if this information is not available, the tool falls back to restoring the pool’s database record with what was provided by the user.

The tool asks you to re-create missing entities like networks. However, the tool does not know how the instance was configured. That means that if some configuration was specified through the default profile, you must also re-add the required configuration to the profile. For example, if the lxdbr0 bridge is used in an instance and you are prompted to re-create it, you must add it back to the default profile so that the recovered instance uses it.

Example

This is how a recovery process could look:

user@host:~$ lxd recover
This LXD server currently has the following storage pools:
Would you like to recover another storage pool? (yes/no) [default=no]: yes
Name of the storage pool: default
Name of the storage backend (btrfs, ceph, cephfs, cephobject, dir, lvm, zfs): zfs
Source of the storage pool (block device, volume group, dataset, path, ... as applicable): /var/snap/lxd/common/lxd/storage-pools/default/containers
Additional storage pool configuration property (KEY=VALUE, empty when done): zfs.pool_name=default
Additional storage pool configuration property (KEY=VALUE, empty when done):
Would you like to recover another storage pool? (yes/no) [default=no]:
The recovery process will be scanning the following storage pools:
 - NEW: "default" (backend="zfs", source="/var/snap/lxd/common/lxd/storage-pools/default/containers")
Would you like to continue with scanning for lost volumes? (yes/no) [default=yes]: yes
Scanning for unknown volumes...
The following unknown volumes have been found:
 - Container "u1" on pool "default" in project "default" (includes 0 snapshots)
 - Container "u2" on pool "default" in project "default" (includes 0 snapshots)
You are currently missing the following:
 - Network "lxdbr0" in project "default"
Please create those missing entries and then hit ENTER: ^Z
[1]+  Stopped                 lxd recover
user@host:~$ lxc network create lxdbr0
Network lxdbr0 created
user@host:~$ fg
lxd recover
The following unknown volumes have been found:
 - Container "u1" on pool "default" in project "default" (includes 0 snapshots)
 - Container "u2" on pool "default" in project "default" (includes 0 snapshots)
Would you like those to be recovered? (yes/no) [default=no]: yes
Starting recovery...
user@host:~$ lxc list
+------+---------+------+------+-----------+-----------+
| NAME |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+------+---------+------+------+-----------+-----------+
| u1   | STOPPED |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+
| u2   | STOPPED |      |      | CONTAINER | 0         |
+------+---------+------+------+-----------+-----------+
user@host:~$ lxc profile device add default eth0 nic network=lxdbr0 name=eth0
Device eth0 added to default
user@host:~$ lxc start u1
user@host:~$ lxc list
+------+---------+-------------------+---------------------------------------------+-----------+-----------+
| NAME |  STATE  |       IPV4        |                    IPV6                     |   TYPE    | SNAPSHOTS |
+------+---------+-------------------+---------------------------------------------+-----------+-----------+
| u1   | RUNNING | 192.0.2.49 (eth0) | 2001:db8:8b6:abfe:216:3eff:fe82:918e (eth0) | CONTAINER | 0         |
+------+---------+-------------------+---------------------------------------------+-----------+-----------+
| u2   | STOPPED |                   |                                             | CONTAINER | 0         |
+------+---------+-------------------+---------------------------------------------+-----------+-----------+