How to set up a highly available virtual IP for clusters
This page describes how to enhance the high availability (HA) of the control plane for an LXD cluster by setting up a virtual IP (VIP) as a single access point.
By exposing cluster members to the network and configuring them as remote servers on a client machine, you can control the cluster over the network. This provides high availability: if one cluster member becomes unavailable, you can access the cluster through another.
You can enhance HA by adding a routing service that uses the Virtual Router Redundancy Protocol (VRRP) to configure a single VIP as the access point for the cluster. While the implementation differs, the concept is similar to a floating IP in cloud platforms.
For more information about HA in LXD clusters, including both the control and data planes, see: High availability.
Use Keepalived
While VRRP is implemented by various tools, Keepalived is the most commonly used implementation in Linux environments. The VIP configured with Keepalived routes to only one cluster member at any given time (called the MASTER), and Keepalived performs regular checks to reassign the VIP to another member (a BACKUP) if the MASTER fails to respond.
To install Keepalived, run the following commands on each cluster member:
sudo apt update
sudo apt -y install keepalived
The configuration file for Keepalived is typically stored at /etc/keepalived/keepalived.conf. You must create a configuration file for each cluster member, with one member set with state MASTER and the rest with state BACKUP.
Example minimal configuration
Example of a minimal Keepalived configuration for three LXD cluster members (m1, m2, and m3):

keepalived.conf on m1:

vrrp_instance VI_1 {
    state MASTER
    interface enp5s0
    virtual_router_id 41
    priority 200
    advert_int 1
    virtual_ipaddress {
        192.0.2.50/24
    }
}
keepalived.conf on m2 and m3:

vrrp_instance VI_1 {
    state BACKUP
    interface enp5s0
    virtual_router_id 41
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.50/24
    }
}
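Optionally, recent Keepalived releases (2.x and later) can check a configuration file for syntax errors before you apply it. Verify that your installed version supports this flag:

sudo keepalived --config-test
# equivalently: sudo keepalived -t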
Restart the keepalived service on each cluster member after creating or editing its configuration file:
sudo systemctl restart keepalived
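To confirm that the service came up and that each member entered the expected VRRP state, inspect the service status and logs. The exact log wording varies between Keepalived versions, so treat the grep pattern below as an approximation:

sudo systemctl enable keepalived                 # optional: start Keepalived at boot
sudo systemctl status keepalived --no-pager
sudo journalctl -u keepalived | grep -i state    # look for "Entering MASTER STATE" or "Entering BACKUP STATE"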
On m1 (the cluster member designated as MASTER), run the following command, using the interface you configured in keepalived.conf:
ubuntu@m1:~$ ip -br addr show <interface>
Confirm that in the output, you can see the VIP as an IP address of the interface.
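For example, assuming the interface is enp5s0 and m1's own address is 192.0.2.11/24 (both values are illustrative), the abbreviated output lists the VIP alongside the member's own address:

ubuntu@m1:~$ ip -br addr show enp5s0
enp5s0           UP             192.0.2.11/24 192.0.2.50/24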
Test the example configuration
This section describes how to conduct a basic test of the example Keepalived configuration, using the CLI.
First, create a container on each of the cluster members. This can be performed from any of the cluster members, using the --target flag. Example:
ubuntu@m1:~$ lxc init ubuntu:24.04 c1 --target m1
Creating c1
ubuntu@m1:~$ lxc init ubuntu:24.04 c2 --target m2
Creating c2
ubuntu@m1:~$ lxc init ubuntu:24.04 c3 --target m3
Creating c3
Run the following command from any cluster member to list the created instances:
lxc list
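The output should look similar to the following (the address columns are empty because containers created with lxc init remain stopped; the LOCATION column shows the member hosting each container):

+------+---------+------+------+-----------+-----------+----------+
| NAME |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS | LOCATION |
+------+---------+------+------+-----------+-----------+----------+
| c1   | STOPPED |      |      | CONTAINER | 0         | m1       |
+------+---------+------+------+-----------+-----------+----------+
| c2   | STOPPED |      |      | CONTAINER | 0         | m2       |
+------+---------+------+------+-----------+-----------+----------+
| c3   | STOPPED |      |      | CONTAINER | 0         | m3       |
+------+---------+------+------+-----------+-----------+----------+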
Confirm that the containers each exist on a different cluster member.
Next, you need a client machine with the LXD client installed that can access the network used by the cluster for external connectivity. On it, add the VIP as a remote server:
ubuntu@my-client:~$ lxc remote add my-cluster 192.0.2.50
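If a trust relationship does not yet exist, adding the remote prompts for authentication. In recent LXD releases, you can generate a one-time trust token on any cluster member and paste it on the client when prompted; the exact syntax varies by LXD version, so treat this as a sketch:

ubuntu@m1:~$ lxc config trust add --name my-client    # prints a one-time trust token
# On the client, run the remote add command above and paste the token when prompted.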
Then check the list of instances running on the cluster from the client machine. Example:
ubuntu@my-client:~$ lxc list my-cluster:
The output shown should match what you see when you run lxc list on any of the cluster members.
Finally, take the cluster member configured as the Keepalived MASTER offline so that it is no longer reachable. This should cause Keepalived to automatically move the VIP to one of the BACKUP servers.
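One way to simulate the failure (assuming m1 currently holds the VIP) is shown below. Note that stopping only the keepalived service moves the VIP but leaves LXD running; to reproduce the ERROR state described next, take the whole member offline:

ubuntu@m1:~$ sudo poweroff    # or, to test VIP failover alone: sudo systemctl stop keepalived
ubuntu@m2:~$ ip -br addr show enp5s0    # the VIP should now appear on one of the BACKUP members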
From the client server, list the instances running on the cluster once more, using the same command as before. You should see the same list as before, except that the container running on the offline cluster member now displays an ERROR state. Since the remote command still succeeds from the client, this confirms that Keepalived has reassigned the MASTER role, and with it the VIP, to another cluster member.
Configuration keys
In this section, we provide brief descriptions of the configuration keys used in this guide. Keep in mind that our example minimal configuration does not include authentication and other settings that might be relevant in production (a brief authentication sketch follows the list below). For full configuration details, refer to the official Keepalived documentation.
state
    Only one cluster member can be set to the MASTER state. The rest must be set as BACKUP.

interface
    This is the interface that carries the subnet used for client access to the cluster. On clusters using OVN networking, this is the uplink network.

virtual_router_id
    The virtual_router_id must be the same on all cluster members. This assigns the cluster members to the same virtual router.

priority
    This determines the order in which the VIP is allocated to a cluster member. The MASTER must always have a higher priority number than any BACKUP. The BACKUP servers can use the same number to let Keepalived choose the priority, or you can set a specific priority for each server.

advert_int
    The advert_int key sets the advertisement interval (in seconds). The MASTER sends VRRP advertisements at this interval to tell BACKUP servers that it's online. The BACKUP servers assume that the MASTER is offline if it stops sending these.

virtual_ipaddress
    This is the VIP exposed for access to the cluster. Select an unused IP address from the subnet used for external access by the cluster. It must be identical on all cluster members.
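As noted above, the minimal example omits authentication. Keepalived supports a simple shared VRRP password via an authentication block inside vrrp_instance. The secret below is a placeholder, auth_pass is limited to eight characters, and this mechanism is considered legacy (authentication was removed from VRRPv3), so treat it as an illustrative sketch rather than a production recommendation:

vrrp_instance VI_1 {
    ...
    authentication {
        auth_type PASS
        auth_pass s3cret12    # placeholder; must match on all cluster members
    }
    ...
}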
Load balancing
Using a VIP to route requests to a single cluster member can cause high load on that machine. Keepalived also provides a framework for load balancing, using the Linux Virtual Server (IPVS) kernel module. For details, see the Keepalived documentation.
Alternatively, consider combining Keepalived with an implementation of HAProxy. HAProxy is a reverse proxy that can redirect traffic for both TCP and HTTP protocols, which means that it can handle load balancing both API and UI traffic for LXD clusters.
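As an illustration of that combination, a minimal HAProxy configuration could pass TCP traffic for the LXD API through to all cluster members, leaving TLS termination to LXD itself. The member addresses (192.0.2.11 to 192.0.2.13) and port 8443 are assumptions; adapt them to your cluster:

defaults
    mode tcp
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend lxd_api
    bind *:8443
    default_backend lxd_members

backend lxd_members
    balance roundrobin
    server m1 192.0.2.11:8443 check
    server m2 192.0.2.12:8443 check
    server m3 192.0.2.13:8443 check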
HAProxy can also support the use of ACME (Automatic Certificate Management Environment) services such as Let’s Encrypt to automate renewing certificates for UI access. For details, see: TLS server certificate.