Site Recovery for OpenStack
Guide

Prerequisites

OpenStack version requirements, Cinder volume type configuration, Pure FlashArray connectivity


Overview

This page covers the environment requirements and configuration steps you must complete before deploying Trilio Site Recovery for OpenStack. Because the service orchestrates disaster recovery across two independent OpenStack clouds, several conditions must be true on both sites before any protection group can be created or any failover executed. Work through this page in order: verify your OpenStack versions and topology, configure Cinder volume types with replication properties, establish Pure Storage FlashArray connectivity, and apply the required OpenStack service policy changes that allow the Trilio protector service to perform DR operations on behalf of your tenants.


Prerequisites

Before you begin, confirm the following on both the primary site and the secondary (DR) site:

OpenStack

  • OpenStack Victoria or later is recommended on each site
  • Each site must run its own independent Nova, Cinder, Neutron, and Keystone endpoints — a single shared control plane is not supported
  • The following OpenStack services must be operational on both sites:
    • Keystone (Identity)
    • Nova (Compute)
    • Cinder (Block Storage)
    • Neutron (Networking)
    • Glance (Image) — required when using the Mock storage driver for end-to-end testing

Infrastructure

  • Two Pure Storage FlashArray systems with an async replication connection already established between them (or a sync ActiveCluster Pod link for synchronous replication)
  • Pure Storage management IPs must be reachable from both OpenStack controller nodes
  • Both sites must be able to reach each other's Keystone and Trilio protector API endpoints (default port: 8788)
  • RabbitMQ accessible from the protector engine on each site
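The reachability requirements above can be pre-checked from each controller before installation begins. This is a minimal sketch; the remote hostname and the array management IP are placeholders for your environment:

```shell
# Connectivity pre-check; run from each site's controller node.
# Hostnames and the array IP below are placeholders for your environment.
REMOTE_KEYSTONE="http://site-b-controller:5000/v3"
REMOTE_PROTECTOR="http://site-b-controller:8788/"
ARRAY_MGMT_IP="192.0.2.10"

# Remote Keystone (5000) and protector API (8788) must answer over HTTP
for url in "$REMOTE_KEYSTONE" "$REMOTE_PROTECTOR"; do
  if curl -fsS --max-time 5 "$url" > /dev/null; then
    echo "reachable: $url"
  else
    echo "NOT reachable: $url"
  fi
done

# FlashArray management interfaces accept HTTPS on 443
nc -z -w 5 "$ARRAY_MGMT_IP" 443 && echo "array mgmt reachable"
```

Run the same check from the secondary site against the primary site's endpoints.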

Databases and runtime

  • MariaDB or MySQL available on each site for the protector service database
  • Python 3.8 or later on each site

Credentials

  • Admin-level OpenStack credentials for both sites (required to create service users, configure volume types, and apply policy changes)
  • Pure Storage FlashArray API tokens for both arrays

Tooling

  • python-openstackclient with the protectorclient OSC plugin installed on the workstation you will use to coordinate operations across both sites
  • A clouds.yaml file configured with named entries for both sites (see the Configuration section)

Installation

Complete the following steps on both sites unless a step explicitly says otherwise.

Step 1: Create the protector service database

mysql -u root -p << EOF
CREATE DATABASE protector CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON protector.* TO 'protector'@'localhost' IDENTIFIED BY 'PROTECTOR_DBPASS';
GRANT ALL PRIVILEGES ON protector.* TO 'protector'@'%' IDENTIFIED BY 'PROTECTOR_DBPASS';
FLUSH PRIVILEGES;
EOF

Step 2: Create the protector service user in Keystone

# Source admin credentials for this site
source ~/admin-openrc

# Create protector user
openstack user create --domain default --password-prompt protector

# Grant admin role in the service project (standard for OpenStack service accounts)
openstack role add --project service --user protector admin

# Register the service and its endpoints
openstack service create --name protector \
  --description "OpenStack Disaster Recovery Service" protector

openstack endpoint create --region RegionOne \
  protector public http://controller:8788/v1/%\(tenant_id\)s

openstack endpoint create --region RegionOne \
  protector internal http://controller:8788/v1/%\(tenant_id\)s

openstack endpoint create --region RegionOne \
  protector admin http://controller:8788/v1/%\(tenant_id\)s

Repeat this step on the second site, substituting that site's controller address.
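A quick check confirms the service and all three endpoints registered correctly. Note that the endpoint URL still contains the literal %(tenant_id)s template at this stage; it is substituted per request:

```shell
# Expect one service row and three endpoint rows (public, internal, admin)
openstack service show protector
openstack endpoint list --service protector -c Interface -c URL
```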

Step 3: Apply Cinder service policy changes

The protector service uses Keystone trusts with the member role to act on behalf of tenants. Cinder restricts certain operations to the admin role by default; you must relax those restrictions.

For standard deployments, add the following to /etc/cinder/policy.yaml on each site:

# Allow member role to manage/unmanage volumes (required for DR failover)
"volume_extension:volume_manage": "rule:admin_or_owner"
"volume_extension:volume_unmanage": "rule:admin_or_owner"

# Allow member role to list volume services (for host discovery during manage)
"volume_extension:services:index": "rule:admin_or_owner"

For Kolla-Ansible deployments, create or update /etc/kolla/config/cinder/policy.yaml:

"volume_extension:volume_manage": "rule:admin_or_owner"
"volume_extension:volume_unmanage": "rule:admin_or_owner"
"volume_extension:services:index": "rule:admin_or_owner"

Then reconfigure Cinder:

kolla-ansible -i inventory reconfigure -t cinder

These three policies are required for the following reasons:

  • volume_manage — during failover, the protector service creates volumes on the Pure Storage array from replicated snapshots and imports them into Cinder using the manage API
  • volume_unmanage — during failback, volumes must be unmanaged from Cinder before storage-layer cleanup
  • services:index — used to discover the correct Cinder volume service host for the manage operation
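Before deploying the policy change, you can stage the snippet and sanity-check that all three rules are present; afterwards, verify the relaxation with a member-role (non-admin) user. The demo-openrc credentials file is a placeholder:

```shell
# Stage the policy snippet and confirm all three rules are present
cat > /tmp/cinder-policy-dr.yaml << 'EOF'
"volume_extension:volume_manage": "rule:admin_or_owner"
"volume_extension:volume_unmanage": "rule:admin_or_owner"
"volume_extension:services:index": "rule:admin_or_owner"
EOF
test "$(grep -c 'rule:admin_or_owner' /tmp/cinder-policy-dr.yaml)" -eq 3 \
  && echo "snippet OK"

# After merging into /etc/cinder/policy.yaml and reconfiguring, verify
# services:index now works for the member role ('demo-openrc' is a
# placeholder for non-admin credentials):
# source demo-openrc
# openstack volume service list
```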

Step 4: Apply Neutron service policy changes (optional — test failover auto-network only)

If you intend to use --auto-network during test failover operations, the protector service needs to create temporary isolated networks. If your Neutron policy restricts shared network creation, add:

# Allow service project to create shared networks (for test failover auto-network)
"create_network:shared": "rule:admin_only"

If you prefer not to modify the Neutron policy, you can instead create the test network manually and pass --network-mapping when executing a test failover. No other Neutron policy changes are required.
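If you take the manual route, a minimal isolated test network might look like the following; the network name and CIDR are illustrative, so pick a range unused in your environment:

```shell
# TEST_CIDR is illustrative; choose a range unused in your environment
TEST_CIDR="192.0.2.0/24"

# Create an isolated network and subnet for test failovers
openstack network create dr-test-net

openstack subnet create dr-test-subnet \
    --network dr-test-net \
    --subnet-range "$TEST_CIDR" \
    --no-dhcp
```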

Step 5: Initialize the protector database schema

Create /etc/protector/protector.conf first (see the Configuration section) so the [database] connection option is set, then run:

protector-manage db sync

Step 6: Create required directories

useradd --system --shell /bin/false protector

mkdir -p /var/log/protector
mkdir -p /var/lib/protector
mkdir -p /etc/protector

chown -R protector:protector /var/log/protector
chown -R protector:protector /var/lib/protector
chown -R protector:protector /etc/protector

Step 7: Enable and start services

systemctl daemon-reload
systemctl enable protector-api protector-engine
systemctl start protector-api protector-engine

# Confirm both services are running
systemctl status protector-api
systemctl status protector-engine

Configuration

protector.conf — key options

Create /etc/protector/protector.conf on each site. The file is site-local; there is no shared configuration between sites.

[DEFAULT]
debug = False
log_dir = /var/log/protector
state_path = /var/lib/protector

[api]
bind_host = 0.0.0.0
# Default API port. Both sites must expose this port.
bind_port = 8788
workers = 4

[database]
# Each site uses its own local database. Never point both sites at the same DB.
connection = mysql+pymysql://protector:PROTECTOR_DBPASS@controller/protector

[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = protector
password = PROTECTOR_PASS

[service_credentials]
# Roles delegated via Keystone trust. Both 'member' and '_member_' are listed
# for compatibility across OpenStack versions. All listed roles must exist in
# the target Keystone and be sufficient (with the policy changes above) for
# the protector service to perform DR operations.
default_trust_roles = member,_member_

[oslo_policy]
policy_file = /etc/protector/policy.yaml

Key options:

  • bind_port (default: 8788) — Port the protector API listens on. Must match the Keystone endpoint registered in Step 2.
  • debug (default: False) — Set to True for verbose logging during initial setup and troubleshooting.
  • default_trust_roles (default: member,_member_) — Roles delegated when a Keystone trust is created. Adjust only if your cloud uses non-standard role names.
  • connection (no default) — Database DSN. Each site must point to its own local database; shared databases between sites are not supported.

Mock storage mode (testing without physical arrays)

If you are running end-to-end tests without physical Pure Storage hardware, add the following to protector.conf on both sites:

[DEFAULT]
# Simulate Cinder Consistency Group operations via SQLite
use_mock_cinder = True

# Simulate Pure Storage FlashArray replication via SQLite
use_mock_storage = True

Mock mode stores all array state in SQLite files under /var/lib/protector/mock_storage/. Volume "replication" is simulated by creating new bootable volumes from Glance images on the target site, so the same image name must exist on both clusters.

Create the mock storage directory on both sites:

mkdir -p /var/lib/protector/mock_storage
chmod 755 /var/lib/protector/mock_storage
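The mock settings are read at startup, so restart both services after editing protector.conf and confirm the state directory is in place:

```shell
# Apply the mock settings and verify the state directory exists
systemctl restart protector-api protector-engine
test -d /var/lib/protector/mock_storage && echo "mock storage dir present"
```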

clouds.yaml — multi-site coordination

The protectorclient OSC plugin and Horizon dashboard authenticate to both sites simultaneously. Configure ~/.config/openstack/clouds.yaml on your workstation with named entries for each site:

clouds:
  site-a:
    auth:
      auth_url: http://site-a-controller:5000/v3
      project_name: admin
      username: admin
      password: password
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne

  site-b:
    auth:
      auth_url: http://site-b-controller:5000/v3
      project_name: admin
      username: admin
      password: password
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne

The cloud names (site-a, site-b) are referenced in all subsequent CLI commands as --primary-site and --secondary-site arguments.
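You can verify both entries authenticate before wiring them into DR commands; --os-cloud is standard openstackclient behavior for selecting a clouds.yaml entry:

```shell
# Sanity-check that both cloud entries exist in clouds.yaml
grep -E '^  (site-a|site-b):' ~/.config/openstack/clouds.yaml

# Each command should print a token table if the entry authenticates
openstack --os-cloud site-a token issue
openstack --os-cloud site-b token issue
```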

Cinder volume type properties

For a volume type to be eligible for protection, it must have the following extra specs set on both sites:

  • replication_enabled = '<is> True' — marks the type as replication-capable. Volumes whose type lacks this property cannot be added to a Protection Group.
  • replication_type = '<in> async' or '<in> sync' — selects the replication mode. Must match the replication link configured between your FlashArrays.
  • volume_backend_name = <your Cinder backend name> — routes volume creation to the correct backend. Must match an active cinder-volume host on each site (the part after @ in openstack volume service list).

The volume type name should be identical on both sites (for example, replicated-async) to avoid mapping confusion during failover.


Usage

Verify Cinder backend names before creating volume types

Before creating replicated volume types, confirm which Cinder backends are active on each site:

source /etc/kolla/admin-openrc.sh
openstack volume service list

The Host column shows values in the form controller@<backend-name>. Use the part after @ as your volume_backend_name value. If you set a backend name that does not match an active host, volume creation will fail with "No valid backend was found".
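The backend name can also be extracted directly; the awk step simply splits each Host value on @, and the --service filter is assumed to be available in your client version:

```shell
# Print only the backend-name portion of each cinder-volume host
openstack volume service list --service cinder-volume -f value -c Host \
  | awk -F@ '{print $2}'
```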

Create replicated volume types

Run the following on both sites, substituting your actual backend name for rbd-1:

Async replication:

openstack volume type create replicated-async \
    --description "Volumes with asynchronous replication"

openstack volume type set replicated-async \
    --property replication_enabled='<is> True' \
    --property replication_type='<in> async' \
    --property volume_backend_name='rbd-1'

Sync replication (ActiveCluster):

openstack volume type create replicated-sync \
    --description "Volumes with synchronous replication"

openstack volume type set replicated-sync \
    --property replication_enabled='<is> True' \
    --property replication_type='<in> sync' \
    --property volume_backend_name='rbd-1'

Configure tenant mapping

Because the two sites run independent Keystone deployments, the same tenant has different project UUIDs on each site. Create a mapping once per tenant pair:

openstack dr tenant mapping create \
    --local-tenant <local-project-id> \
    --remote-site <remote-site-id> \
    --remote-tenant <remote-project-id> \
    --description "Tenant mapping"

This mapping syncs automatically to the remote site. You do not need to run the command on both sites.

Validate sites after registration

After registering both sites (covered in the next workflow step), confirm connectivity and volume type availability:

openstack protector site validate site-a
openstack protector site validate site-b

openstack protector site list-volume-types site-a
openstack protector site list-volume-types site-b

Both sites must return the same volume type name (for example, replicated-async) before you proceed to create a Protection Group. If a volume type is missing on one site, Protection Group creation will be blocked.

Trust-based authentication

The protector service authenticates to OpenStack services on behalf of your tenants using Keystone trusts. Trusts are created automatically the first time a tenant interacts with the DR service — no manual setup is required. The trust delegates the roles listed in default_trust_roles (default: member,_member_). These trusts do not expire and persist until manually deleted.
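Because trusts persist until deleted, you may want to audit them periodically; trust list and trust delete are the standard Keystone commands, and the trust ID below is a placeholder:

```shell
# List existing trusts (admin view); each DR tenant gets one on first use
openstack trust list

# Remove a stale trust by ID if a tenant is deprovisioned
# openstack trust delete <trust-id>
```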


Examples

Example 1: List Cinder backends and create a matching volume type

Confirm the active backend name, then create the replicated volume type:

source /etc/kolla/admin-openrc.sh
openstack volume service list

Expected output:

+------------------+------------------+------+---------+-------+
| Binary           | Host             | Zone | Status  | State |
+------------------+------------------+------+---------+-------+
| cinder-scheduler | controller       | nova | enabled | up    |
| cinder-volume    | controller@rbd-1 | nova | enabled | up    |
+------------------+------------------+------+---------+-------+

Then create the volume type:

openstack volume type create replicated-async \
    --description "Volumes with asynchronous replication"

openstack volume type set replicated-async \
    --property replication_enabled='<is> True' \
    --property replication_type='<in> async' \
    --property volume_backend_name='rbd-1'

openstack volume type show replicated-async

Expected output:

+-------------+--------------------------------------------------------------------+
| Field       | Value                                                              |
+-------------+--------------------------------------------------------------------+
| id          | abc123-def456-...                                                  |
| name        | replicated-async                                                   |
| description | Volumes with asynchronous replication                              |
| is_public   | True                                                               |
| properties  | replication_enabled='<is> True', replication_type='<in> async',    |
|             | volume_backend_name='rbd-1'                                        |
+-------------+--------------------------------------------------------------------+

Repeat identically on the secondary site before proceeding.


Example 2: Verify Glance images match on both sites (mock storage mode only)

When using mock storage, volume replication is simulated by booting from Glance images. The image name must be identical on both sites.

# On site A
source cluster1rc
openstack image list --name cirros

# On site B
source cluster2rc
openstack image list --name cirros

If the image is missing on either site, upload it:

wget http://download.cirros-cloud.net/0.5.2/cirros-0.5.2-x86_64-disk.img

openstack image create cirros \
    --disk-format qcow2 \
    --container-format bare \
    --public \
    --file cirros-0.5.2-x86_64-disk.img
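Matching names are required, but comparing checksums as well catches the case where the same name points at different image content on each site. A minimal sketch, reusing the credential files from the commands above:

```shell
# Compare the image checksum on both sites; they should be identical
source cluster1rc
A=$(openstack image show cirros -f value -c checksum)

source cluster2rc
B=$(openstack image show cirros -f value -c checksum)

[ "$A" = "$B" ] && echo "images match" || echo "checksum mismatch"
```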

Example 3: Create a tenant mapping between sites

# Obtain project IDs
source site-a-openrc
LOCAL_PROJECT=$(openstack project show myproject -f value -c id)

source site-b-openrc
REMOTE_PROJECT=$(openstack project show myproject -f value -c id)
REMOTE_SITE=$(openstack protector site show site-b -f value -c id)

# Create the mapping (run from site A)
source site-a-openrc
openstack dr tenant mapping create \
    --local-tenant $LOCAL_PROJECT \
    --remote-site $REMOTE_SITE \
    --remote-tenant $REMOTE_PROJECT \
    --description "myproject tenant mapping"

Expected output:

+----------------+--------------------------------------+
| Field          | Value                                |
+----------------+--------------------------------------+
| id             | tm-aabbcc-1234-...                   |
| local_tenant   | <local-project-id>                   |
| remote_site    | <remote-site-id>                     |
| remote_tenant  | <remote-project-id>                  |
| description    | myproject tenant mapping             |
+----------------+--------------------------------------+

Example 4: Validate site readiness

openstack protector site validate site-a
openstack protector site validate site-b

Expected output for each site:

Site site-a: reachable
Keystone: OK
Cinder: OK
Nova: OK
Neutron: OK
Replicated volume types found: replicated-async

If a service check fails, resolve the connectivity or policy issue before continuing to the Protection Group creation step.


Troubleshooting

Volume creation fails with "No valid backend was found"

Symptom: openstack volume create returns an error like No valid backend was found for volume after you have created a replicated volume type.

Likely cause: The volume_backend_name property on the volume type does not match any active cinder-volume host.

Fix: Run openstack volume service list and check the Host column. The backend name is the portion after @. Update the volume type property to match exactly:

openstack volume type set replicated-async \
    --property volume_backend_name='<correct-backend-name>'

Protection Group creation blocked: volume type not found on remote site

Symptom: openstack protector protection-group create fails with a message indicating the volume type does not exist on the secondary site.

Likely cause: The replicated volume type was created on the primary site but not on the secondary site, or the names differ between sites.

Fix: SSH to the secondary site controller, source admin credentials, and create the volume type with the identical name and properties:

openstack volume type create replicated-async \
    --property replication_enabled='<is> True' \
    --property replication_type='<in> async' \
    --property volume_backend_name='<secondary-backend-name>'

Then re-run openstack protector site validate site-b to confirm.


Failover fails with permission denied on volume manage/unmanage

Symptom: A failover operation fails and the protector engine log shows a 403 response when attempting volume_manage or volume_unmanage.

Likely cause: The Cinder policy has not been updated on the target site, or the Cinder service was not reconfigured after the policy change.

Fix: Confirm the policy entries are present in /etc/cinder/policy.yaml (or the Kolla-Ansible equivalent) on the affected site, then reconfigure Cinder:

kolla-ansible -i inventory reconfigure -t cinder

For non-Kolla deployments, restart the cinder-api service after editing policy.yaml.


Keystone trust creation fails

Symptom: First-time use of the DR service returns an error such as Could not create trust or Role 'member' could not be found.

Likely cause: The roles listed in default_trust_roles do not exist on the target Keystone, or the protector service user does not have the admin role in the service project.

Fix: Verify the roles exist:

openstack role list

Verify the service user assignment:

openstack role assignment list --user protector --project service --names

If the admin assignment is missing:

openstack role add --project service --user protector admin

If the member role does not exist under that name, update default_trust_roles in protector.conf to match the actual role names in your Keystone, then restart both protector services.


Mock storage failover fails: image not found

Symptom: When running in mock storage mode, failover fails with an error like Image 'cirros' not found on target site.

Likely cause: The Glance image used by mock replication is present on the primary site but missing on the secondary site, or the image names differ.

Fix: Run openstack image list on both sites and confirm the same image name exists. Upload the missing image to the secondary site:

openstack image create cirros \
    --disk-format qcow2 \
    --container-format bare \
    --public \
    --file cirros-0.5.2-x86_64-disk.img

This issue only affects mock storage mode. Physical Pure Storage replication does not require Glance images during failover.


protector-api not responding on port 8788

Symptom: curl http://controller:8788/ returns a connection refused error.

Likely cause: The service failed to start, is listening on a different interface, or a firewall rule is blocking the port.

Fix:

# Check service status
systemctl status protector-api

# Check which address the process is bound to
ss -tlnp | grep 8788

# Review startup errors
journalctl -u protector-api -n 50

If the service exited immediately, the most common causes are a bad database DSN in protector.conf or a missing api-paste.ini. Verify both files are present and the database is reachable:

mysql -h controller -u protector -p protector -e "SELECT 1;"