Site Recovery for OpenStack
Guide

Service Installation

Installing and configuring protector-api and protector-engine on each site


Overview

This page walks you through installing and configuring the two Trilio Site Recovery services — protector-api and protector-engine — on each of your OpenStack sites. Because the two sites operate independently with no direct service-to-service communication, you must repeat this installation on both your primary and secondary clusters. After completing this guide, each site will have a running Protector service registered in Keystone, backed by a MariaDB database, and ready to be paired with its peer site for DR operations.


Prerequisites

Before you begin, confirm the following on each site where you are installing the service:

OpenStack environment

  • OpenStack Victoria or later
  • Nova, Cinder, Neutron, and Keystone endpoints operational
  • Admin credentials available (admin-openrc or equivalent)

Infrastructure

  • MariaDB or MySQL database server accessible from the controller node
  • Python 3.8 or later
  • pip available in the target Python environment
  • Ports open: 8788/tcp (Protector API), 3306/tcp (MariaDB)

Storage

  • Pure Storage FlashArray with async (or sync) replication configured between the two arrays
  • Pure Storage management IP reachable from the controller on each site
  • Cinder Pure Storage backend driver configured on each site

Both sites

  • Each OpenStack cluster must be able to reach the other site's Keystone endpoint and Protector API endpoint (port 8788)
  • You need the auth URL, project name, username, and password for an admin account on each site

Repeat all steps in this guide on both sites. The primary and secondary designations are workload-relative — both sites run identical service configurations.
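The software checks above can be scripted as a quick pre-flight pass. This is a sketch, not part of the product; extend it with reachability tests for your own controller and database hosts.

```shell
#!/bin/sh
# Pre-flight sketch: report each prerequisite as OK or MISS without aborting.
# Covers only the software items from the Prerequisites list; add your own
# port and host checks for 8788/tcp and 3306/tcp.
check() {
  if "$@" >/dev/null 2>&1; then echo "OK:   $*"; else echo "MISS: $*"; fi
}
check python3 -c 'import sys; assert sys.version_info >= (3, 8)'
check python3 -m pip --version
check mysql --version
```

Run it on both sites; any MISS line points at a prerequisite to resolve before Step 1.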


Installation

Perform all steps below on each site. Where commands differ between sites, the distinction is noted.

Step 1: Create the database

Connect to MariaDB and create a dedicated database and user for the Protector service:

mysql -u root -p << EOF
CREATE DATABASE protector CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON protector.* TO 'protector'@'localhost' IDENTIFIED BY 'PROTECTOR_DBPASS';
GRANT ALL PRIVILEGES ON protector.* TO 'protector'@'%' IDENTIFIED BY 'PROTECTOR_DBPASS';
FLUSH PRIVILEGES;
EOF

Replace PROTECTOR_DBPASS with a strong password. Use the same logical name (protector) on both sites, but each site connects to its own local database instance.
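Before moving on, it can be worth confirming that both grant paths work. The helper below is a sketch wrapping the mysql client; the password and hostnames are the placeholders from this step.

```shell
# Sketch: run a trivial query as the protector user against a given host.
# Both invocations should print "1" if the grants above took effect.
verify_db() {
  mysql -u protector -p"$1" -h "$2" protector -N -e "SELECT 1;"
}
# verify_db PROTECTOR_DBPASS localhost    # exercises 'protector'@'localhost'
# verify_db PROTECTOR_DBPASS 127.0.0.1    # exercises 'protector'@'%'
```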

Step 2: Create the service user and endpoints in Keystone

Source your admin credentials, then create the service identity:

source ~/admin-openrc

# Create the protector user
openstack user create --domain default --password-prompt protector

# Grant the admin role in the service project
openstack role add --project service --user protector admin

# Register the service in the catalog
openstack service create --name protector \
  --description "OpenStack Disaster Recovery Service" protector

# Create endpoints (adjust the controller hostname for each site)
openstack endpoint create --region RegionOne \
  protector public http://controller:8788/v1/%\(tenant_id\)s

openstack endpoint create --region RegionOne \
  protector internal http://controller:8788/v1/%\(tenant_id\)s

openstack endpoint create --region RegionOne \
  protector admin http://controller:8788/v1/%\(tenant_id\)s

Replace controller with the actual hostname or IP of the controller node on each site.
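Once the endpoints are created, the catalog should list exactly three protector entries (public, internal, admin). A sketch of a quick check using the openstack CLI:

```shell
# Count the protector endpoints registered in the Keystone catalog.
count_protector_endpoints() {
  openstack endpoint list --service protector -f value -c Interface | wc -l
}
# count_protector_endpoints   # expect: 3
```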

Step 3: Create the system user and directories

Create a dedicated non-login system account and the required directories:

useradd --system --shell /bin/false protector

mkdir -p /var/log/protector
mkdir -p /var/lib/protector
mkdir -p /etc/protector

chown -R protector:protector /var/log/protector
chown -R protector:protector /var/lib/protector
chown -R protector:protector /etc/protector

Step 4: Install the Protector package

git clone https://github.com/your-org/openstack-protector.git
cd openstack-protector

pip install -r requirements.txt
pip install .

After installation, verify the management command is available:

protector-manage --version

Step 5: Write configuration files

Create the three configuration files below. Detailed descriptions of every option appear in the Configuration section.

/etc/protector/protector.conf

[DEFAULT]
debug = False
log_dir = /var/log/protector
state_path = /var/lib/protector

[api]
bind_host = 0.0.0.0
bind_port = 8788
workers = 4

[database]
connection = mysql+pymysql://protector:PROTECTOR_DBPASS@controller/protector

[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = protector
password = PROTECTOR_PASS

[service_credentials]
default_trust_roles = member,_member_

[oslo_policy]
policy_file = /etc/protector/policy.yaml

Replace PROTECTOR_DBPASS, controller, and PROTECTOR_PASS with the values appropriate for each site.
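Rather than hand-editing the placeholders on each site, you can render the file from a template. The template filename and the DB_PASS, SVC_PASS, and CONTROLLER_HOST variables below are illustrative assumptions, not part of the product.

```shell
# Render protector.conf from a template, substituting the three site-specific
# placeholders. Template and output paths are arguments so the same function
# works unchanged on both sites.
render_conf() {
  sed -e "s|PROTECTOR_DBPASS|${DB_PASS}|g" \
      -e "s|PROTECTOR_PASS|${SVC_PASS}|g" \
      -e "s|controller|${CONTROLLER_HOST}|g" \
      "$1" > "$2"
}
# DB_PASS=... SVC_PASS=... CONTROLLER_HOST=site-a-ctl \
#   render_conf protector.conf.template /etc/protector/protector.conf
```

Note that the third substitution rewrites every occurrence of controller, including the memcached_servers and keystone URLs, which is usually what you want on a single-controller site.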

/etc/protector/policy.yaml

"context_is_admin": "role:admin"
"admin_or_owner": "is_admin:True or project_id:%(project_id)s"
"default": "rule:admin_or_owner"

# Protection Groups
"protector:protection_groups:index": "rule:default"
"protector:protection_groups:show": "rule:default"
"protector:protection_groups:create": "rule:default"
"protector:protection_groups:update": "rule:default"
"protector:protection_groups:delete": "rule:default"

# Members
"protector:members:index": "rule:default"
"protector:members:create": "rule:default"
"protector:members:delete": "rule:default"

# Operations
"protector:operations:index": "rule:default"
"protector:operations:show": "rule:default"
"protector:operations:action": "rule:default"

# Policies
"protector:policies:show": "rule:default"
"protector:policies:create": "rule:default"

/etc/protector/api-paste.ini

[composite:protector]
use = egg:Paste#urlmap
/: protectorversions
/v1: protectorapi_v1

[pipeline:protectorapi_v1]
pipeline = keystoneauth protectorapp

[app:protectorversions]
paste.app_factory = protector.api.versions:VersionsController.factory

[app:protectorapp]
paste.app_factory = protector.api.app:create_app

[filter:keystoneauth]
paste.filter_factory = keystonemiddleware.auth_token:filter_factory

Set ownership on all configuration files:

chown -R protector:protector /etc/protector
chmod 640 /etc/protector/protector.conf

Step 6: Apply required OpenStack policy changes

The Protector service acts on behalf of tenants using Keystone trusts. Cinder's default policy restricts several operations the service needs during failover. Add the following to Cinder's policy file on each site.

Standard deployments — edit /etc/cinder/policy.yaml:

# Required for DR failover: manage/unmanage volumes and discover service hosts
"volume_extension:volume_manage": "rule:admin_or_owner"
"volume_extension:volume_unmanage": "rule:admin_or_owner"
"volume_extension:services:index": "rule:admin_or_owner"

Kolla-Ansible deployments — create or update /etc/kolla/config/cinder/policy.yaml with the same content, then reconfigure:

kolla-ansible -i inventory reconfigure -t cinder

These changes are required because:

  • volume_manage — Protector imports replicated volumes into Cinder on the target site after a failover
  • volume_unmanage — Protector removes volumes from Cinder management during failback cleanup
  • services:index — Protector discovers the correct Cinder volume service host to target the manage operation
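Before relying on these rules in a failover, a grep-based sanity check can confirm all three are present. This is a sketch; pass the Kolla-Ansible path instead on containerized deployments.

```shell
# Count the three DR-related rules in a Cinder policy file.
check_cinder_policy() {
  grep -c -E '"volume_extension:(volume_manage|volume_unmanage|services:index)"' "$1"
}
# check_cinder_policy /etc/cinder/policy.yaml   # expect: 3
```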

Step 7: Initialize the database schema

protector-manage db sync

Run this command on each site after writing protector.conf. It is safe to re-run; subsequent executions apply only pending Alembic migrations.

Step 8: Install systemd service files

/etc/systemd/system/protector-api.service

[Unit]
Description=OpenStack Protector API Service
After=network.target

[Service]
Type=simple
User=protector
Group=protector
ExecStart=/usr/local/bin/protector-api --config-file /etc/protector/protector.conf
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

/etc/systemd/system/protector-engine.service

[Unit]
Description=OpenStack Protector Engine Service
After=network.target

[Service]
Type=simple
User=protector
Group=protector
ExecStart=/usr/local/bin/protector-engine --config-file /etc/protector/protector.conf
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Step 9: Enable and start the services

systemctl daemon-reload

systemctl enable protector-api
systemctl enable protector-engine

systemctl start protector-api
systemctl start protector-engine

# Verify both services are active
systemctl status protector-api
systemctl status protector-engine

Step 10: Verify the API is reachable

curl http://controller:8788/

A successful response returns the available API versions. Repeat this health check on both sites before proceeding to register the sites with each other.


Configuration

The primary configuration file is /etc/protector/protector.conf. The sections and options below govern service behavior.

[DEFAULT]

  • debug (default: False) — Set to True to enable verbose debug logging. Do not use in production.
  • log_dir (default: /var/log/protector) — Directory where protector-api.log and protector-engine.log are written.
  • state_path (default: /var/lib/protector) — Directory for ephemeral state files. Must be writable by the protector user.

[api]

  • bind_host (default: 0.0.0.0) — Interface the API process listens on. Use a specific IP to restrict access.
  • bind_port (default: 8788) — TCP port the API listens on. Both sites must expose the same port to each other.
  • workers (default: 4) — Number of API worker processes. Tune based on available CPU cores.

[database]

  • connection — SQLAlchemy connection string for the local MariaDB/MySQL instance. Format: mysql+pymysql://USER:PASS@HOST/DBNAME. Each site connects to its own independent database — there is no shared database between sites.

[keystone_authtoken]

This section configures the Keystonemiddleware token validation pipeline. All options follow the standard OpenStack auth_token middleware convention.

  • www_authenticate_uri — Public Keystone endpoint, returned to clients that need to authenticate.
  • auth_url — Keystone endpoint the service uses to validate tokens internally.
  • memcached_servers — Optional token cache. Omit to disable caching.
  • auth_type — Must be password.
  • project_name — Service project. Conventionally service.
  • username / password — Credentials of the protector service user created in Step 2.

[service_credentials]

  • default_trust_roles (default: member,_member_) — Roles the service requests when creating Keystone trusts on behalf of tenants. Both member and _member_ are listed for compatibility across OpenStack releases. These roles must exist in Keystone and, combined with the Cinder policy changes in Step 6, must be sufficient for the service to perform DR operations within tenant scope.

[oslo_policy]

  • policy_file (default: /etc/protector/policy.yaml) — Path to the RBAC policy file. The default policy grants all operations to admins and to the resource owner (admin_or_owner). Modify this file to enforce finer-grained access control.

API microversioning

The Protector API uses the OpenStack-API-Version: protector <version> header for microversioning. The base version is 1.0 and the current version is 1.2. Clients that do not send this header receive the base version response. You do not configure this in protector.conf — it is negotiated per request.
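To pin a microversion from the command line, send the header explicitly. The helper below is a sketch: the /v1/<tenant_id> resource path follows the endpoint format registered in Step 2, and the OS_TOKEN and TENANT_ID variables are assumptions.

```shell
# Issue a request pinned to a given protector microversion (defaults to the
# base version, 1.0).
pg_list() {
  curl -s -H "OpenStack-API-Version: protector ${1:-1.0}" \
    -H "X-Auth-Token: ${OS_TOKEN}" \
    "http://controller:8788/v1/${TENANT_ID}/protection_groups"
}
# pg_list 1.2
```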


Usage

Once both sites have running Protector services, the typical operational flow is:

  1. Configure clouds.yaml so that the OSC CLI plugin (protectorclient) can authenticate to both sites simultaneously.
  2. Register both sites with each Protector service.
  3. Prepare replication-enabled Cinder volume types on both sites.
  4. Create Protection Groups and add VMs.
  5. Execute DR operations (test failover, planned failover, failback).

This page covers only steps 1 and 2. For the full workflow, see the DR Workflow guide.

Configure multi-site credentials

Create ~/.config/openstack/clouds.yaml with an entry for each site:

clouds:
  site-a:
    auth:
      auth_url: http://site-a-controller:5000/v3
      project_name: admin
      username: admin
      password: password
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne

  site-b:
    auth:
      auth_url: http://site-b-controller:5000/v3
      project_name: admin
      username: admin
      password: password
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne

With this file in place, every openstack command accepts --os-cloud site-a or --os-cloud site-b to select the target site. The protectorclient plugin uses both entries when it needs to coordinate metadata across sites.

Register sites

After installation, register each site with the Protector service. You run this command once from a host that can reach both sites:

# Register the primary site
openstack --os-cloud site-a protector site create \
  --name site-a \
  --description "Primary datacenter" \
  --site-type primary \
  --auth-url http://site-a-controller:5000/v3 \
  --region-name RegionOne

# Register the secondary site
openstack --os-cloud site-a protector site create \
  --name site-b \
  --description "Secondary datacenter" \
  --site-type secondary \
  --auth-url http://site-b-controller:5000/v3 \
  --region-name RegionOne

The site-type values (primary and secondary) express the initial designation for these site records. In practice, primary and secondary are workload-relative — they swap on failover. Both sites run identical service configurations.

Validate that the service can reach each site's OpenStack endpoints:

openstack --os-cloud site-a protector site validate site-a
openstack --os-cloud site-a protector site validate site-b

Understand metadata synchronization behavior

Protector keeps a complete copy of all Protection Group metadata on both sites at all times. When you modify a Protection Group (add a member, change a policy, execute a failover), the service:

  1. Updates the local metadata and increments the version number.
  2. Checks that the peer site is reachable.
  3. Pushes the updated metadata to the peer site.
  4. Confirms the peer has accepted and written the update.

If the peer site is unreachable, the modification is blocked. This is intentional — allowing changes without synchronization would cause the two sites to diverge, making future failovers unreliable. If you encounter a blocked operation after a connectivity interruption, restore connectivity and then run openstack protector protection-group sync-force <pg-name> before retrying.
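The recovery sequence can be wrapped in a small helper so both commands run in order. This is a sketch over the CLI commands named above; it proceeds to the forced sync only if the status query succeeds.

```shell
# After restoring connectivity: check sync status, then force a metadata sync
# for the given Protection Group before retrying the blocked operation.
recover_pg() {
  openstack protector protection-group sync-status "$1" &&
  openstack protector protection-group sync-force "$1"
}
# recover_pg my-pg
```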


Examples

Example 1: Verify both services are running after installation

Run on each site after completing Step 9.

systemctl status protector-api protector-engine

Expected output (abbreviated):

● protector-api.service - OpenStack Protector API Service
   Loaded: loaded (/etc/systemd/system/protector-api.service; enabled)
   Active: active (running) since Mon 2025-01-01 08:00:00 UTC

● protector-engine.service - OpenStack Protector Engine Service
   Loaded: loaded (/etc/systemd/system/protector-engine.service; enabled)
   Active: active (running) since Mon 2025-01-01 08:00:01 UTC

If either service shows failed or activating, check the journal output shown in the Troubleshooting section.


Example 2: Health check the API endpoint

Confirm the API is accepting requests and returning version information:

curl -s http://controller:8788/

Expected output:

{
  "versions": [
    {
      "id": "v1",
      "status": "CURRENT",
      "min_version": "1.0",
      "max_version": "1.2"
    }
  ]
}

Example 3: Confirm database schema was applied

After running protector-manage db sync, verify the expected tables exist:

mysql -u protector -p protector -e "SHOW TABLES;"

Expected output (table names may vary by release):

+-------------------------+
| Tables_in_protector     |
+-------------------------+
| alembic_version         |
| consistency_groups      |
| cg_volumes              |
| dr_operations           |
| pg_members              |
| protection_groups       |
| replication_policies    |
| sites                   |
+-------------------------+

Example 4: Register both sites and validate connectivity

This example assumes clouds.yaml is configured with site-a and site-b entries.

# Register Site A
openstack --os-cloud site-a protector site create \
  --name site-a \
  --description "Primary datacenter - Boston" \
  --site-type primary \
  --auth-url http://10.0.1.10:5000/v3 \
  --region-name RegionOne

# Register Site B
openstack --os-cloud site-a protector site create \
  --name site-b \
  --description "Secondary datacenter - Seattle" \
  --site-type secondary \
  --auth-url http://10.0.2.10:5000/v3 \
  --region-name RegionOne

# Validate both sites
openstack --os-cloud site-a protector site validate site-a
openstack --os-cloud site-a protector site validate site-b

Expected output for each validate call:

+--------------------+--------+
| Field              | Value  |
+--------------------+--------+
| name               | site-a |
| status             | active |
| keystone_reachable | True   |
| nova_reachable     | True   |
| cinder_reachable   | True   |
| neutron_reachable  | True   |
+--------------------+--------+

If any endpoint shows False, resolve the connectivity issue before proceeding to create Protection Groups.


Troubleshooting

Use a consistent diagnostic approach for each issue: check systemctl status, then the service log at /var/log/protector/, then the systemd journal with journalctl -u <service> -n 100.


Service fails to start: protector-api or protector-engine enters failed state

Symptom: systemctl status protector-api shows Active: failed.

Likely causes and fixes:

  • Configuration syntax error — Run protector-api --config-file /etc/protector/protector.conf --help to surface parse errors before starting the service.
  • Database unreachable — Verify the connection string in [database] and test it manually: mysql -h <host> -u protector -p protector. Ensure MariaDB is running and port 3306 is open.
  • Port already in use — Check for a conflicting process: ss -tlnp | grep 8788. Change bind_port in [api] if needed.
  • Missing directories — Confirm /var/log/protector and /var/lib/protector exist and are owned by the protector user.

protector-manage db sync fails with Access denied

Symptom: (1044, "Access denied for user 'protector'@'%' to database 'protector'")

Likely cause: The database grants were not applied, or the hostname in the connection string does not match the GRANT statement.

Fix: Reconnect as root and re-issue the GRANT statements from Step 1, then retry db sync.


Keystone authentication errors in protector-api.log

Symptom: Log entries containing 401 Unauthorized or keystonemiddleware.auth_token [-] Unable to validate token.

Likely causes and fixes:

  • Incorrect credentials — Verify username and password in [keystone_authtoken] match the Keystone user: openstack user show protector.
  • Wrong auth_url — Confirm the URL points to the Keystone endpoint on the same site. Each site has its own auth_url.
  • Service user missing role — Re-run: openstack role add --project service --user protector admin.

API returns 503 or is unreachable after startup

Symptom: curl http://controller:8788/ times out or returns Connection refused.

Likely causes and fixes:

  • Service not running — Confirm systemctl status protector-api shows active (running).
  • Binding to wrong interface — If bind_host is set to a specific IP, ensure that IP is assigned to the controller: ip addr show. Use 0.0.0.0 to bind all interfaces.
  • Firewall blocking port — Check: iptables -L -n | grep 8788 or firewall-cmd --list-ports. Open the port if needed.

protector site validate reports one or more endpoints unreachable

Symptom: cinder_reachable: False or similar after registering sites.

Likely cause: Network path between the Protector controller and the remote site's OpenStack endpoints is not open, or the endpoint URL registered in Keystone is incorrect.

Fix: From the Protector controller, test connectivity directly:

curl -s http://<remote-site-controller>:5000/v3
curl -s http://<remote-site-controller>:8776/

Resolve firewall or routing issues, then re-run protector site validate.


Cinder volume_manage operations fail during failover with Policy doesn't allow

Symptom: DR operation log shows HTTP 403 when the engine attempts to manage a volume on the target site.

Likely cause: The Cinder policy changes from Step 6 were not applied, or were applied to the wrong site.

Fix: Confirm the policy entries exist in /etc/cinder/policy.yaml (or the Kolla-Ansible equivalent) on the target site, then restart the Cinder API and volume services:

systemctl restart cinder-api cinder-volume

Modification to a Protection Group is blocked with "remote site unreachable"

Symptom: Any write operation on a Protection Group returns an error stating the remote site cannot be reached and the operation is blocked.

Likely cause: This is expected behavior — Protector blocks modifications when it cannot synchronize metadata to the peer site, to prevent divergence.

Fix: Restore connectivity to the peer site, then force a metadata sync before retrying your operation:

openstack protector protection-group sync-status <pg-name>
openstack protector protection-group sync-force <pg-name>