Service Placement
Where protector-api, protector-engine, MariaDB, and RabbitMQ run
This page explains where each Trilio Site Recovery component (protector-api, protector-engine, MariaDB, and RabbitMQ) runs relative to your OpenStack infrastructure. Because Trilio Site Recovery requires two fully independent OpenStack clouds, you must deploy and configure these services on both sites: a primary site and a secondary (DR) site. Understanding service placement is foundational to every other deployment decision, from firewall rules to failover behavior, because there is no direct service-to-service communication between sites: the CLI plugin and Horizon dashboard are the sole coordination layer.
Before planning your service placement, confirm the following:
- Two independent OpenStack clouds are available, each with its own Nova, Cinder, Neutron, and Keystone endpoints. The sites may be in separate physical datacenters or in the same cluster using different regions.
- OpenStack Victoria or later is running on both sites.
- MariaDB (or MySQL-compatible) is available on each site to back the local Protector database. A shared database between sites is explicitly not supported; each site must have its own database instance.
- RabbitMQ is accessible from the protector-engine process on each site.
- Python 3.8 or later is installed on every host where you will run Protector services.
- Pure Storage FlashArray replication is configured between the two sites before you begin (async or sync, depending on your RPO requirements).
- You have admin credentials on both OpenStack clouds to register service users and create Keystone endpoints.
- Network connectivity exists between the two sites at the API plane: each site's Keystone and protector-api endpoints (default port 8788) must be reachable from the OSC CLI host and from the other site's management network.
Perform every step below on both sites unless a step is explicitly marked as site-specific.
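The local prerequisites can be sanity-checked with a short script before you begin. A sketch; it only inspects what is visible from the current host, and the WARN lines are informational rather than fatal:

```shell
#!/usr/bin/env bash
# Preflight sketch: check local prerequisites on a host that will run Protector.
fail=0

# Python 3.8 or later is required for the Protector services.
if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)'; then
    echo "OK   python3 >= 3.8"
else
    echo "FAIL python3 >= 3.8 required"
    fail=1
fi

# The openstack CLI is needed for the Keystone registration steps.
if command -v openstack > /dev/null; then
    echo "OK   openstack CLI found"
else
    echo "WARN openstack CLI not found on this host"
fi

# The mysql client is needed for the database creation step.
if command -v mysql > /dev/null; then
    echo "OK   mysql client found"
else
    echo "WARN mysql client not found on this host"
fi

echo "preflight finished (fail=${fail})"
```

Run the script on every host that will run a Protector service, on both sites.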
Step 1: Create the Protector system user and directories
# Create a non-login system user to run the services
useradd --system --shell /bin/false protector
# Create required directories
mkdir -p /var/log/protector
mkdir -p /var/lib/protector
mkdir -p /etc/protector
# Set ownership
chown -R protector:protector /var/log/protector
chown -R protector:protector /var/lib/protector
chown -R protector:protector /etc/protector
Step 2: Create the Protector database (per site)
Each site needs its own database. The database must be reachable from the host running protector-engine and protector-api.
mysql -u root -p << EOF
CREATE DATABASE protector CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON protector.* TO 'protector'@'localhost' IDENTIFIED BY 'PROTECTOR_DBPASS';
GRANT ALL PRIVILEGES ON protector.* TO 'protector'@'%' IDENTIFIED BY 'PROTECTOR_DBPASS';
FLUSH PRIVILEGES;
EOF
Replace PROTECTOR_DBPASS with a strong password. Record it; you will need it in protector.conf.
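One way to generate a suitable value, sketched with openssl (commonly present on controller nodes); a 32-character alphanumeric password avoids SQL-quoting and DSN-escaping surprises:

```shell
# Generate a 32-character alphanumeric password for PROTECTOR_DBPASS.
# tr strips the base64 symbols (+ / =) that would need escaping in the DSN.
PROTECTOR_DBPASS=$(openssl rand -base64 48 | tr -dc 'A-Za-z0-9' | head -c 32)
echo "${PROTECTOR_DBPASS}"
```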
Step 3: Register the Protector service in Keystone (per site)
Run the following on each site using that site's admin credentials.
# Source site admin credentials
source ~/admin-openrc
# Create the service user
openstack user create --domain default --password-prompt protector
# Grant admin role in the service project
openstack role add --project service --user protector admin
# Register the service catalog entry
openstack service create \
--name protector \
--description "OpenStack Disaster Recovery Service" \
protector
# Create the three endpoint types
openstack endpoint create --region RegionOne \
protector public http://controller:8788/v1/%\(tenant_id\)s
openstack endpoint create --region RegionOne \
protector internal http://controller:8788/v1/%\(tenant_id\)s
openstack endpoint create --region RegionOne \
protector admin http://controller:8788/v1/%\(tenant_id\)s
Replace controller with the hostname or IP of the node that will run protector-api on that site.
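Since the three endpoint create calls differ only in the interface, they can be scripted. A sketch that echoes the commands for review (drop the echo to execute them); controller and RegionOne are the placeholder values used above:

```shell
# Build the three endpoint-create commands from a single template.
CONTROLLER=controller   # placeholder: the host that will run protector-api
REGION=RegionOne
URL="http://${CONTROLLER}:8788/v1/%(tenant_id)s"

for iface in public internal admin; do
    # Remove the leading "echo" to actually create the endpoints.
    echo openstack endpoint create --region "$REGION" protector "$iface" "$URL"
done
```

Quoting the URL removes the need to backslash-escape the parentheses as in the commands above.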
Step 4: Install the Protector package (per site)
git clone https://github.com/your-org/openstack-protector.git
cd openstack-protector
pip install -r requirements.txt
pip install .
Step 5: Initialize the database schema (per site)
protector-manage db sync
This applies all Alembic migrations against the local database. Run it after every upgrade.
Step 6: Install systemd unit files (per site)
Create /etc/systemd/system/protector-api.service:
[Unit]
Description=OpenStack Protector API Service
After=network.target
[Service]
Type=simple
User=protector
Group=protector
ExecStart=/usr/local/bin/protector-api --config-file /etc/protector/protector.conf
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Create /etc/systemd/system/protector-engine.service:
[Unit]
Description=OpenStack Protector Engine Service
After=network.target
[Service]
Type=simple
User=protector
Group=protector
ExecStart=/usr/local/bin/protector-engine --config-file /etc/protector/protector.conf
Restart=on-failure
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Step 7: Enable and start services (per site)
systemctl daemon-reload
systemctl enable protector-api protector-engine
systemctl start protector-api protector-engine
# Verify both are running
systemctl status protector-api
systemctl status protector-engine
The primary configuration file is /etc/protector/protector.conf. You must create this file on each site with values appropriate to that site's infrastructure. The configuration on Site A and Site B will differ in their database connection strings, Keystone auth URLs, and bound addresses, but the structure is identical.
Minimal working configuration
[DEFAULT]
debug = False
log_dir = /var/log/protector
state_path = /var/lib/protector
[api]
bind_host = 0.0.0.0
bind_port = 8788
workers = 4
[database]
connection = mysql+pymysql://protector:PROTECTOR_DBPASS@controller/protector
[keystone_authtoken]
www_authenticate_uri = http://controller:5000
auth_url = http://controller:5000
memcached_servers = controller:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = protector
password = PROTECTOR_PASS
[oslo_policy]
policy_file = /etc/protector/policy.yaml
[service_credentials]
default_trust_roles = member,_member_
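One way to keep the two sites' files structurally identical is to generate them from a single template with per-site values substituted. A sketch of the minimal configuration above as a heredoc; it writes to /tmp for preview (install the result as /etc/protector/protector.conf), and the three variables at the top are the per-site values:

```shell
# Generate protector.conf from per-site values.
CONTROLLER=controller          # this site's controller host
DBPASS=PROTECTOR_DBPASS        # from the database creation step
SVCPASS=PROTECTOR_PASS         # the protector Keystone service user's password
CONF=/tmp/protector.conf       # preview path; install to /etc/protector/protector.conf

cat > "$CONF" << EOF
[DEFAULT]
debug = False
log_dir = /var/log/protector
state_path = /var/lib/protector

[api]
bind_host = 0.0.0.0
bind_port = 8788
workers = 4

[database]
connection = mysql+pymysql://protector:${DBPASS}@${CONTROLLER}/protector

[keystone_authtoken]
www_authenticate_uri = http://${CONTROLLER}:5000
auth_url = http://${CONTROLLER}:5000
memcached_servers = ${CONTROLLER}:11211
auth_type = password
project_domain_name = Default
user_domain_name = Default
project_name = service
username = protector
password = ${SVCPASS}

[oslo_policy]
policy_file = /etc/protector/policy.yaml

[service_credentials]
default_trust_roles = member,_member_
EOF

echo "wrote ${CONF}"
```

Run the same script on each site with that site's values; only the three variables change.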
Key options explained
| Option | Section | Default | Effect |
|---|---|---|---|
| bind_host | [api] | 0.0.0.0 | Interface protector-api listens on. Set to a specific IP to restrict access. |
| bind_port | [api] | 8788 | Port for the REST API. Must match the Keystone endpoint URLs you registered. |
| workers | [api] | 4 | Number of API worker processes. Increase for high-request-rate deployments. |
| connection | [database] | (none) | SQLAlchemy DSN for the local site's MariaDB instance. Each site points to its own database. |
| debug | [DEFAULT] | False | Set to True to emit verbose logs. Do not use in production; logs include sensitive metadata. |
| log_dir | [DEFAULT] | /var/log/protector | Directory for API and engine log files. |
| state_path | [DEFAULT] | /var/lib/protector | Working directory for runtime state files. |
| default_trust_roles | [service_credentials] | member,_member_ | Keystone roles delegated via trust when the service acts on behalf of a tenant. Both member and _member_ are listed for compatibility across OpenStack releases. |
| policy_file | [oslo_policy] | /etc/protector/policy.yaml | Path to the RBAC policy file. |
Why each site has its own database
protector-engine writes DR operation state, protection group metadata, and site registration records to the local database. Metadata synchronization between sites happens at the API layer through explicit sync calls, not through a shared database. This design means each site remains independently operable, which is critical: if Site A's database were shared with Site B, a network partition between sites would prevent both sites from recording state.
RBAC policy file
Create /etc/protector/policy.yaml on each site:
"context_is_admin": "role:admin"
"admin_or_owner": "is_admin:True or project_id:%(project_id)s"
"default": "rule:admin_or_owner"
"protector:protection_groups:index": "rule:default"
"protector:protection_groups:show": "rule:default"
"protector:protection_groups:create": "rule:default"
"protector:protection_groups:update": "rule:default"
"protector:protection_groups:delete": "rule:default"
"protector:members:index": "rule:default"
"protector:members:create": "rule:default"
"protector:members:delete": "rule:default"
"protector:operations:index": "rule:default"
"protector:operations:show": "rule:default"
"protector:operations:action": "rule:default"
"protector:policies:show": "rule:default"
"protector:policies:create": "rule:default"
Cinder policy adjustments (both sites)
The protector service needs permissions beyond the default member role for two Cinder operations used during failover. Add the following to /etc/cinder/policy.yaml on both sites:
# Required for importing replicated volumes into Cinder after failover
"volume_extension:volume_manage": "rule:admin_or_owner"
# Required for unmanaging volumes during failback
"volume_extension:volume_unmanage": "rule:admin_or_owner"
# Required to discover the correct Cinder volume service host
"volume_extension:services:index": "rule:admin_or_owner"
For Kolla-Ansible deployments, place these in /etc/kolla/config/cinder/policy.yaml and then run:
kolla-ansible -i inventory reconfigure -t cinder
Once both sites are running protector-api and protector-engine with their own databases, the two deployments operate independently; they do not communicate with each other directly. You interact with both sites through the openstack CLI (using the protectorclient plugin) or the Horizon dashboard, which authenticates to whichever site you target and pushes metadata sync calls to the peer site when needed.
Confirming services are reachable
Verify the health endpoint on each site before proceeding:
# Site A
curl http://site-a-controller:8788/
# Site B
curl http://site-b-controller:8788/
A successful response returns the API version discovery document.
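Immediately after a (re)start the API can take a few seconds to bind, so a retry loop is more robust than a single curl. A sketch; the URLs in the trailing comment are the placeholder hostnames used above:

```shell
# Poll a URL until it answers or the attempt budget is exhausted.
wait_for_api() {
    local url=$1 tries=${2:-30} i
    for ((i = 1; i <= tries; i++)); do
        if curl -fsS --max-time 2 "$url" > /dev/null 2>&1; then
            echo "up: ${url}"
            return 0
        fi
        sleep 1
    done
    echo "timed out waiting for ${url}" >&2
    return 1
}

# Example (placeholders):
#   wait_for_api http://site-a-controller:8788/
#   wait_for_api http://site-b-controller:8788/
```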
Where to run CLI commands
The protectorclient CLI plugin can be run from any host that has network access to both sites' Keystone and protector-api endpoints. It does not need to run on the controller nodes themselves. A typical operator workstation with ~/.config/openstack/clouds.yaml configured for both sites is the recommended pattern:
clouds:
site-a:
auth:
auth_url: http://site-a-controller:5000/v3
project_name: admin
username: admin
password: YOUR_PASSWORD
user_domain_name: Default
project_domain_name: Default
region_name: RegionOne
site-b:
auth:
auth_url: http://site-b-controller:5000/v3
project_name: admin
username: admin
password: YOUR_PASSWORD
user_domain_name: Default
project_domain_name: Default
region_name: RegionOne
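With both clouds defined, commands that must be issued against each site can be wrapped in a small helper. A sketch; run_on_both is a hypothetical helper name, and it assumes the site-a and site-b cloud names from the example above:

```shell
# Run one openstack CLI command against both sites in sequence.
run_on_both() {
    local cloud
    for cloud in site-a site-b; do
        echo "### ${cloud}"
        openstack --os-cloud "$cloud" "$@"
    done
}

# Example:
#   run_on_both endpoint list --service protector
```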
Understanding the active/standby split
Protector services run on both sites at all times; there is no concept of "the primary site runs the service and the secondary site does not." The site where VMs are currently running is considered authoritative for metadata, but the protector-api and protector-engine processes on the standby site are fully active and ready to receive failover commands. This symmetrical design means that after a failover, the site roles swap and no service restarts are required.
Checking service logs
# API service log
tail -f /var/log/protector/protector-api.log
# Engine service log
tail -f /var/log/protector/protector-engine.log
# Systemd journal (live)
journalctl -u protector-api -f
journalctl -u protector-engine -f
Example 1: Verify service placement after installation
Confirm that both services are listening on the expected port on each controller node.
# On the Site A controller
netstat -tlnp | grep 8788
Expected output:
tcp 0 0 0.0.0.0:8788 0.0.0.0:* LISTEN <pid>/protector-api
# Confirm systemd reports both services active
systemctl status protector-api protector-engine
Expected output (truncated):
● protector-api.service - OpenStack Protector API Service
Loaded: loaded (/etc/systemd/system/protector-api.service; enabled)
Active: active (running) since ...
● protector-engine.service - OpenStack Protector Engine Service
Loaded: loaded (/etc/systemd/system/protector-engine.service; enabled)
Active: active (running) since ...
Repeat this verification on the Site B controller.
Example 2: Confirm database connectivity from the service
Before registering sites, verify the protector-engine can reach its local database.
# Test the database credentials from the controller
mysql -h controller -u protector -p protector -e "SHOW TABLES;"
Expected output after db sync has run:
+----------------------+
| Tables_in_protector |
+----------------------+
| alembic_version |
| consistency_groups |
| cg_volumes |
| dr_operations |
| pg_members |
| protection_groups |
| replication_policies |
| sites |
+----------------------+
If the table list is empty, re-run protector-manage db sync.
Example 3: Validate API endpoint registration in Keystone
After completing the Keystone registration steps on both sites, confirm the endpoint is discoverable.
# On Site A
source ~/admin-openrc
openstack endpoint list --service protector
Expected output:
+------------------+-----------+--------------+--------------+---------+-----------+------------------------------------------+
| ID | Region | Service Name | Service Type | Enabled | Interface | URL |
+------------------+-----------+--------------+--------------+---------+-----------+------------------------------------------+
| <id> | RegionOne | protector | protector | True | public | http://site-a-controller:8788/v1/%(tenant_id)s |
| <id> | RegionOne | protector | protector | True | internal | http://site-a-controller:8788/v1/%(tenant_id)s |
| <id> | RegionOne | protector | protector | True | admin | http://site-a-controller:8788/v1/%(tenant_id)s |
+------------------+-----------+--------------+--------------+---------+-----------+------------------------------------------+
Repeat on Site B using ~/site-b-openrc.
Example 4: Confirm the Cinder policy changes are in effect
After updating /etc/cinder/policy.yaml, verify that the protector service user can list volume services (a proxy check for the policy changes).
# Authenticate as the protector service user
export OS_USERNAME=protector
export OS_PASSWORD=PROTECTOR_PASS
export OS_PROJECT_NAME=service
# ... (remaining auth env vars for Site A)
openstack volume service list
If the command returns the list of Cinder volume services without a 403 Forbidden error, the policy change is active. If it fails, restart Cinder after applying the policy file:
systemctl restart openstack-cinder-api
protector-api fails to start: database connection error
Symptom: systemctl status protector-api shows Active: failed and the journal contains OperationalError: (pymysql.err.OperationalError) Can't connect to MySQL server.
Likely cause: The connection DSN in protector.conf is wrong, the database does not exist, or the MariaDB service is not running.
Fix:
- Verify MariaDB is running: systemctl status mariadb
- Test the credentials directly: mysql -h <host> -u protector -p protector -e "SELECT 1;"
- Check that the protector database exists: mysql -u root -p -e "SHOW DATABASES;"
- If the database is missing, re-run the CREATE DATABASE and GRANT statements from the installation steps.
- If credentials are wrong, update protector.conf and restart: systemctl restart protector-api
protector-api returns 401 Unauthorized for all requests
Symptom: Every API call returns HTTP 401, even with a valid token.
Likely cause: The [keystone_authtoken] section in protector.conf is pointing to the wrong Keystone URL, or the protector service user does not exist or has the wrong password.
Fix:
- Confirm the service user exists and can authenticate: openstack token issue --os-username protector --os-project-name service
- Verify auth_url and www_authenticate_uri in [keystone_authtoken] match the Keystone v3 endpoint for that site.
- Confirm the password in protector.conf matches what was set in Keystone.
- Restart after any changes: systemctl restart protector-api
protector-engine starts but DR operations hang indefinitely
Symptom: DR operations are created (status: running) but never progress beyond 0% and never complete.
Likely cause: protector-engine cannot reach RabbitMQ, or RabbitMQ is not running.
Fix:
- Confirm RabbitMQ is running on the expected host: systemctl status rabbitmq-server
- Check the protector-engine log for AMQP connection errors: journalctl -u protector-engine | grep -i rabbit
- Verify RabbitMQ is accessible from the engine host on the expected port (default 5672): telnet <rabbitmq-host> 5672
- Review the [oslo_messaging_rabbit] section of protector.conf if you have customized the RabbitMQ connection.
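telnet is often not installed on modern hosts; the same reachability test can be sketched with bash's built-in /dev/tcp (the rabbitmq-host argument is a placeholder):

```shell
# Return 0 if the AMQP port accepts a TCP connection, non-zero otherwise.
amqp_reachable() {
    local host=$1 port=${2:-5672}
    timeout 3 bash -c "cat < /dev/null > /dev/tcp/${host}/${port}" 2>/dev/null
}

# Example (placeholder):
#   amqp_reachable rabbitmq-host && echo "RabbitMQ port open"
```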
Keystone endpoint registered but service not discoverable from CLI
Symptom: openstack endpoint list --service protector returns no rows, or the CLI reports EndpointNotFound.
Likely cause: The service type used during openstack service create does not match the type the client looks up, or the endpoint was created against the wrong region.
Fix:
- List all services and confirm the entry: openstack service list | grep protector
- Confirm the endpoint region matches your clouds.yaml region_name: openstack endpoint list --service protector
- If the service or endpoint is missing, re-run the Keystone registration steps from the installation section.
Metadata sync blocked: "Cannot modify protection group ā remote site unreachable"
Symptom: Any attempt to modify a Protection Group (add a member, update a policy) fails with a message indicating the peer site is unreachable.
Likely cause: The protector-api on the peer site is down, or network connectivity between the sites on port 8788 is blocked.
Fix:
- Confirm the peer site's protector-api is running: systemctl status protector-api on the remote controller.
- Test connectivity from the local site to the remote API: curl http://<remote-controller>:8788/
- Check that firewall rules on both sites allow TCP port 8788 between site management networks.
- Once connectivity is restored, check the sync status and push any pending changes: openstack protector protection-group sync-status <pg-name> followed by openstack protector protection-group sync-force <pg-name>.
This behavior is by design: modifications are blocked when the peer is unreachable to prevent metadata divergence between sites.
Cinder volume_manage call fails with 403 Forbidden during failover
Symptom: A failover DR operation fails at the "Manage volumes into Cinder" step with a 403 error in the engine log.
Likely cause: The Cinder policy changes from the prerequisites have not been applied on the target site, or Cinder was not restarted after the policy file was updated.
Fix:
- On the target site, verify /etc/cinder/policy.yaml contains the three required rules (volume_manage, volume_unmanage, services:index).
- Restart the Cinder API service: systemctl restart openstack-cinder-api
- For Kolla-Ansible deployments, re-run: kolla-ansible -i inventory reconfigure -t cinder
- Retry the failover operation.