Protection Groups Panel
Creating PGs, managing VM membership, viewing replication status
The Protection Groups panel is your primary workspace for defining which Nova VMs are protected under Trilio Site Recovery and for monitoring the health of their replication. A Protection Group (PG) is a logical unit: it groups one or more VM instances that must fail over together, binds them to a Cinder Consistency Group for crash-consistent replication, and tracks the lifecycle of every DR operation performed against those workloads. This page explains how to create a Protection Group, add VMs to it, interpret replication status, and manage group membership through both the OSC CLI plugin and the Horizon dashboard.
Before working with Protection Groups, confirm the following are in place:
- Two registered OpenStack sites: both the primary site and the secondary (DR) site must be registered with the Protector service and reachable. Verify with `openstack protector site list`.
- Sites validated: each site must pass connectivity and capability checks (`openstack protector site validate <site-name>`).
- Replication-enabled Cinder volume types exist on both sites. Each volume type must carry `replication_enabled='<is> True'` and a `replication_type='<in> async'` or `replication_type='<in> sync'` property. Only volume types with these properties present on both sites are eligible for Protection Group creation.
- All VM volumes use a replication-enabled type: any Nova instance you intend to add must have all attached Cinder volumes backed by a qualifying volume type on the same storage backend.
- Replication policy credentials available: you will need the Pure Storage FlashArray management URLs and API tokens for both arrays before you can validate replication readiness.
- `protectorclient` OSC plugin installed on your workstation, authenticated against the primary site, and configured with `clouds.yaml` entries for both sites.
- Trilio Site Recovery API version 1.1 or later: confirm with `openstack protector --version`.
The Protection Groups panel itself requires no separate installation; it is part of the `protectorclient` OSC plugin and the Protector Horizon extension deployed during the standard Trilio Site Recovery installation. If the panel is missing, verify the plugin is installed and the API endpoint is registered.
Step 1: Confirm the OSC plugin is installed
pip show python-protectorclient
The command should return package metadata. If it returns nothing, install the client:
pip install python-protectorclient
Step 2: Confirm the Protector API endpoint is registered in Keystone
openstack endpoint list --service protector
Expected output includes public, internal, and admin endpoints at port 8788.
Step 3: Confirm both Protector services are running on each site
On each controller node:
systemctl status protector-api
systemctl status protector-engine
Both services must be active (running). If either is stopped:
systemctl start protector-api protector-engine
Step 4: Configure multi-site credentials in clouds.yaml
Ensure ~/.config/openstack/clouds.yaml contains entries for both sites:
clouds:
  site-a:
    auth:
      auth_url: http://site-a-controller:5000/v3
      project_name: <your-project>
      username: <your-user>
      password: <your-password>
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne
  site-b:
    auth:
      auth_url: http://site-b-controller:5000/v3
      project_name: <your-project>
      username: <your-user>
      password: <your-password>
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne
Step 5: Verify Protection Groups panel access
openstack protector protection-group list
An empty list (not an error) confirms the panel is accessible.
Protection Group behavior is shaped by the parameters set at creation time and by the replication policy attached afterward. The following options are fixed at creation; changing them later requires deleting and recreating the group.
Protection Group creation parameters
| Parameter | Required | Valid values | Effect |
|---|---|---|---|
| `--name` | Yes | Any string, unique per tenant | Human-readable identifier for the group |
| `--description` | No | Any string | Informational label; stored in metadata |
| `--primary-site` | Yes | Registered site name | The site where VMs currently run. This designation is workload-relative: it swaps on failover. |
| `--secondary-site` | Yes | Registered site name (different from primary) | The DR site to which workloads will be promoted |
| `--replication-type` | Yes | async, sync | Controls replication mode for the underlying Pure Storage Protection Group. async uses periodic snapshots (configurable RPO); sync uses ActiveCluster Pods with zero RPO but requires quiescence on writes. |
| `--volume-type` | Yes | Cinder volume type name or ID | Must have `replication_enabled='<is> True'` and a matching `replication_type` property on both sites. All volumes added to the group must use this type. |
Automatic side-effects of creation
When you create a Protection Group, the service automatically:
- Creates a Cinder Consistency Group on the primary site.
- Creates a matching Cinder Consistency Group on the secondary site.
- Creates a metadata record (version 1) and synchronizes it to the secondary site.
- Establishes the 1:1:1 binding between the Protection Group, the Consistency Group, and the Pure Storage Protection Group.
You cannot override this automation; the 1:1:1 relationship is structural.
Metadata synchronization behavior
Every modification to a Protection Group (adding members, updating description, policy changes) increments a version number and immediately attempts to push the updated metadata to the peer site. If the peer site is unreachable, the modification is blocked. This is intentional: the service prevents metadata divergence because a diverged secondary cannot execute a reliable failover. You must wait for the peer to become reachable or use openstack protector protection-group sync-force once connectivity is restored.
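The gating rule described above can be expressed as a small predicate. This is an illustrative sketch only, not the service's actual code: a change commits only when the peer is reachable and both sites hold the same metadata version.

```shell
# Illustrative sketch of the metadata version-gating rule (not the
# Protector service's internal code): a modification is allowed only
# when the peer site is reachable and both metadata versions match.
can_modify() {
  peer_reachable=$1; local_version=$2; remote_version=$3
  [ "$peer_reachable" = "yes" ] && [ "$local_version" -eq "$remote_version" ]
}

can_modify yes 4 4 && echo "modification allowed; version will advance to 5"
can_modify no 5 4 || echo "modification blocked until the peer is reachable"
```

If the second case applies, restore connectivity first, then reconcile versions before retrying the change.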
Replication policy parameters
The replication policy is a separate object attached to the group after creation. It stores the Pure Storage FlashArray credentials and RPO target.
| Parameter | Required | Effect |
|---|---|---|
| `--primary-fa-url` | Yes | HTTPS management URL of the primary FlashArray |
| `--primary-fa-token` | Yes | API token for the primary FlashArray (stored encrypted) |
| `--secondary-fa-url` | Yes | HTTPS management URL of the secondary FlashArray |
| `--secondary-fa-token` | Yes | API token for the secondary FlashArray (stored encrypted) |
| `--pure-pg-name` | Yes | Name of the Pure Storage Protection Group; must match the name configured on the array |
| `--replication-interval` | Async only | Snapshot replication interval in seconds (e.g., 300 for 5 minutes) |
| `--rpo-minutes` | Async only | Recovery Point Objective in minutes; used for replication readiness validation |
Security note: FlashArray API tokens are encrypted at rest. Use Barbican or an equivalent secrets manager in production deployments rather than passing tokens directly on the command line.
Creating a Protection Group
Create the group from the site where the VMs currently reside. The OSC plugin authenticates to both sites automatically using your clouds.yaml.
openstack protector protection-group create \
--name prod-web-app \
--description "Production web application tier" \
--replication-type async \
--primary-site site-a \
--secondary-site site-b \
--volume-type replicated-ssd
Wait for status to transition from creating to active before proceeding. Poll with:
openstack protector protection-group show prod-web-app
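Polling can be scripted with a small wrapper. This is a sketch under two assumptions: the plugin's `show` command honors the standard OSC output flags `-f value -c status`, and the status column is named `status`; adjust if your deployment differs.

```shell
# Polling sketch: block until a Protection Group reaches 'active'.
# Assumes the standard OSC output-format flags (-f value -c status)
# and a status column named 'status'.
wait_active() {
  pg=$1
  while true; do
    status=$(openstack protector protection-group show "$pg" -f value -c status)
    case "$status" in
      active) echo "$pg is active"; return 0 ;;
      error)  echo "$pg entered error state" >&2; return 1 ;;
      *)      sleep 10 ;;
    esac
  done
}
# Usage: wait_active prod-web-app
```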
Attaching a replication policy
A Protection Group without a policy cannot validate replication readiness or execute failover. Attach a policy immediately after the group reaches active:
openstack protector protection-group policy-create prod-web-app \
--primary-fa-url https://flasharray-a.example.com \
--primary-fa-token "T-12345678-abcd-efgh-ijkl-mnopqrstuvwx" \
--secondary-fa-url https://flasharray-b.example.com \
--secondary-fa-token "T-87654321-dcba-hgfe-lkji-xwvutsrqponm" \
--pure-pg-name "pg-prod-web-app" \
--replication-interval 300 \
--rpo-minutes 15
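Per the security note earlier, tokens can be kept out of shell history by storing them in a secrets manager and resolving them at policy-create time. A hypothetical helper, assuming the Barbican OSC plugin (`openstack secret`) is installed; the secret href is whatever `openstack secret store` returned when the token was saved:

```shell
# Hypothetical helper: fetch a FlashArray API token stored in Barbican
# rather than pasting it inline. Assumes the Barbican OSC plugin is
# installed; <secret-href> comes from a prior 'openstack secret store'.
fa_token() {
  openstack secret get "$1" --payload -f value -c Payload
}

# Usage (substitute real secret hrefs):
#   openstack protector protection-group policy-create prod-web-app \
#     --primary-fa-token "$(fa_token <primary-secret-href>)" \
#     --secondary-fa-token "$(fa_token <secondary-secret-href>)" \
#     ... remaining options as shown above
```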
Adding VMs to the group
Add each VM by Nova instance ID. The service discovers all attached Cinder volumes, validates they use the group's volume type, and adds them to the Consistency Group automatically. Every addition triggers a metadata sync to the secondary site.
openstack protector protection-group member-add prod-web-app \
--instance-id <nova-instance-uuid>
Repeat for each VM that must fail over as part of this group. Volumes belonging to VMs with mixed volume types (some replicated, some not) will cause the member-add to fail: all volumes for a given VM must use the group's volume type.
Listing members
openstack protector protection-group member-list prod-web-app
Each member shows its status field. Healthy members show protected. A member in error state requires investigation before failover.
Viewing replication status
# Protection Group-level status
openstack protector protection-group show prod-web-app
# Consistency Group and volume-level replication detail
openstack protector consistency-group show prod-web-app
# Metadata sync status between sites
openstack protector protection-group sync-status prod-web-app
The sync-status output reports the local metadata version, the remote metadata version, and whether the two sites are SYNCED, FAILED, or UNREACHABLE. Both versions must match before you execute a failover.
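The version comparison can be checked mechanically before a failover. This sketch pulls `Version:`-labelled lines out of the `sync-status` text; the label format is an assumption about the plugin's output, so adjust the pattern if your output differs.

```shell
# Sketch: succeed only when sync-status text carries exactly two
# matching version numbers. The 'Version:' label format is an
# assumption about the CLI output.
versions_match() {
  awk '/Version:/ { v[++n] = $NF } END { exit !(n == 2 && v[1] == v[2]) }'
}

# Usage:
#   openstack protector protection-group sync-status prod-web-app | versions_match \
#     && echo "safe to proceed"
printf 'Version: 4\nRemote Version: 4\n' | versions_match && echo "versions match"
```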
Removing a VM from the group
Removing a member detaches its volumes from the Consistency Group and syncs the updated metadata to the secondary site. The VM itself is not affected; only its DR coverage is removed.
openstack protector protection-group member-remove prod-web-app \
--member-id <member-uuid>
Deleting a Protection Group
Deleting a Protection Group cascades to the Consistency Group and all member records on both sites. VMs are not deleted; only the DR configuration is removed.
openstack protector protection-group delete prod-web-app
Deletion is blocked if a DR operation is currently in progress (failing_over, failing_back). Wait for the operation to complete or reach a terminal state first.
Example 1: Create a Protection Group for an async-replicated application tier
This example creates a group for a three-VM web tier with a 5-minute RPO.
openstack protector protection-group create \
--name prod-web-app \
--description "Production web tier - async replication" \
--replication-type async \
--primary-site site-a \
--secondary-site site-b \
--volume-type replicated-ssd
Expected output:
+------------------------+----------------------------------------------+
| Field | Value |
+------------------------+----------------------------------------------+
| id | pg-12345678-1234-1234-1234-123456789abc |
| name | prod-web-app |
| status | creating |
| replication_type | async |
| primary_site | site-a |
| secondary_site | site-b |
| consistency_group_id | cg-87654321-4321-4321-4321-876543210def |
| failover_count | 0 |
| created_at | 2025-01-15T09:00:00Z |
+------------------------+----------------------------------------------+
Poll until status shows active before adding members.
Example 2: Add three VMs to the group
# Add web server 1
openstack protector protection-group member-add prod-web-app \
--instance-id a1b2c3d4-1111-2222-3333-aabbccddeeff
# Add web server 2
openstack protector protection-group member-add prod-web-app \
--instance-id b2c3d4e5-1111-2222-3333-bbccddeeffaa
# Add database server
openstack protector protection-group member-add prod-web-app \
--instance-id c3d4e5f6-1111-2222-3333-ccddeeffaabb
Expected output for each member-add:
+------------------------+----------------------------------------------+
| Field | Value |
+------------------------+----------------------------------------------+
| id | member-aaaabbbb-1234-5678-90ab-ccddeeff0011 |
| instance_id | a1b2c3d4-1111-2222-3333-aabbccddeeff |
| instance_name | web-server-1 |
| status | protected |
| volumes_added | 2 |
+------------------------+----------------------------------------------+
Example 3: List members and verify their protection status
openstack protector protection-group member-list prod-web-app
Expected output:
+------------------+---------------+---------------+-----------+----------------+
| ID | Instance Name | Instance ID | Status | Volumes |
+------------------+---------------+---------------+-----------+----------------+
| member-aaaa... | web-server-1 | a1b2c3d4-... | protected | 2 |
| member-bbbb... | web-server-2 | b2c3d4e5-... | protected | 2 |
| member-cccc... | db-server-1 | c3d4e5f6-... | protected | 3 |
+------------------+---------------+---------------+-----------+----------------+
All members must show protected before you execute a failover or DR drill.
Example 4: Check metadata sync status between sites
openstack protector protection-group sync-status prod-web-app
Expected output when fully synchronized:
Sync Status: ✅ IN SYNC
Local Metadata:
Version: 4
Current Site: site-a
Last Modified: 2025-01-15T09:15:00Z
Remote Sync:
Status: SYNCED
Remote Version: 4
Last Sync: 2025-01-15T09:15:05Z (5 seconds ago)
Validation:
✅ Versions match (4 = 4)
✅ Sync status is 'synced'
✅ Last sync is recent
Both sites have identical metadata.
Example 5: Force a metadata sync after a peer site outage
If site-b was briefly unreachable and the local version has advanced, run sync-force once the peer recovers:
openstack protector protection-group sync-force prod-web-app
Expected output:
Force Sync Initiated...
Checking remote site connectivity...
✅ site-b is reachable
Syncing metadata (version 5)...
Gathering current metadata... ✅
Calculating checksum... ✅
Pushing to site-b... ✅
Remote Site Response:
Status: success
Version: 5
Duration: 450ms
✅ Sync completed successfully
Both sites now at version 5
Example 6: View Consistency Group volume membership
openstack protector consistency-group show prod-web-app
Expected output:
+------------------------+----------------------------------------------+
| Field | Value |
+------------------------+----------------------------------------------+
| id | cg-87654321-4321-4321-4321-876543210def |
| protection_group_id | pg-12345678-1234-1234-1234-123456789abc |
| volume_type_name | replicated-ssd |
| backend_name | pure@backend-a |
| primary_cg_id | <cinder-cg-uuid-site-a> |
| secondary_cg_id | <cinder-cg-uuid-site-b> |
| status | active |
| volume_count | 7 |
+------------------------+----------------------------------------------+
The secondary_cg_id field being populated confirms that the secondary site Consistency Group was created successfully during Protection Group initialization.
Issue: protection-group create fails with "volume type not eligible for replication"
Symptom: The create command exits immediately with an error referencing the volume type.
Likely cause: The specified Cinder volume type is missing replication_enabled='<is> True' or replication_type properties on one or both sites.
Fix:
- On each site, inspect the volume type: `openstack volume type show replicated-ssd`.
- Confirm both properties are present; if missing, set them:
openstack volume type set replicated-ssd \
  --property replication_enabled='<is> True' \
  --property replication_type='<in> async'
- Repeat on the secondary site, then retry the Protection Group creation.
Issue: member-add fails with "volume type mismatch"
Symptom: Adding a VM returns an error stating that one or more of its volumes do not match the group's volume type.
Likely cause: The VM has volumes backed by a non-replication-enabled volume type (e.g., a local SSD type or an ephemeral-backed root disk). All Cinder volumes attached to a VM must use the Protection Group's volume type.
Fix:
- List the VM's volumes: `openstack server show <instance-id>` and inspect the `volumes_attached` field.
- For each volume, check its type: `openstack volume show <volume-id>`.
- Migrate non-conforming volumes to the replication-enabled type (e.g., `openstack volume set --type replicated-ssd --retype-policy on-demand <volume-id>`), or exclude VMs whose storage cannot be migrated.
Issue: Member shows status: error in member-list
Symptom: One or more members display error instead of protected after being added.
Likely cause: Volume addition to the Cinder Consistency Group failed, typically because the volume is on a different storage backend than the other volumes in the group (all volumes must share the same backend), or the backend reported a capacity or capability error.
Fix:
- Show the member detail and check the error message: `openstack protector protection-group member-show <pg> <member-id>`.
- Verify the volume's backend: `openstack volume show <volume-id>` and look for `os-vol-host-attr:host`.
- All volumes in the Consistency Group must share the same backend value (e.g., `pure@backend-a`). If the volume is on a different backend, it cannot be in this group.
- Remove the problematic member, resolve the storage placement, and re-add.
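The single-backend requirement can be spot-checked by piping each volume's `os-vol-host-attr:host` value through a small filter (a plain shell/awk sketch):

```shell
# Sketch: read one backend host string per line on stdin and succeed
# only if every line is identical (i.e., all volumes share one backend).
same_backend() {
  awk 'NR == 1 { first = $0 } $0 != first { bad = 1 } END { exit bad }'
}

# Usage: feed it each volume's os-vol-host-attr:host value, e.g.
#   for v in <vol-1> <vol-2>; do
#     openstack volume show "$v" -f value -c os-vol-host-attr:host
#   done | same_backend && echo "single backend" || echo "mixed backends"
printf 'pure@backend-a\npure@backend-a\n' | same_backend && echo "single backend"
```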
Issue: member-add or member-remove is blocked with "remote site unreachable"
Symptom: Modifications to the Protection Group are rejected even though your local site is healthy.
Likely cause: The service requires both sites to be reachable before committing any metadata change. This prevents the two sites from diverging into inconsistent states. If the peer site is down, modifications are intentionally blocked.
Fix:
- Check peer site reachability: `openstack protector site validate site-b`.
- If the peer site is temporarily offline, wait for it to recover.
- Once it recovers, check sync status: `openstack protector protection-group sync-status <pg>`.
- If versions differ, force a sync: `openstack protector protection-group sync-force <pg>`.
- Retry your membership change.
Issue: sync-status shows FAILED or OUT OF SYNC after a completed operation
Symptom: The sync-status command reports a version mismatch or a failed sync timestamp from a recent operation.
Likely cause: A transient network interruption occurred between sites during a metadata push. The local site completed the operation but the remote confirmation was not received.
Fix:
- Verify the peer site is now reachable: `openstack protector site validate <site>`.
- Review the sync history to understand what changed: `openstack protector protection-group sync-log <pg> --limit 10`.
- If the peer is reachable and the local version is higher, push the current state: `openstack protector protection-group sync-force <pg>`.
- Confirm both versions match before executing any DR operation.
Issue: Protection Group status is stuck in failing_over or failing_back
Symptom: A DR operation started but has not completed or failed. The Protection Group status has not moved for an extended period.
Likely cause: The protector-engine service may have crashed mid-operation, or a step in the workflow (e.g., volume promotion or VM recreation on the target site) encountered an unrecoverable error that did not transition the operation to failed.
Fix:
- Check the engine service on both sites: `systemctl status protector-engine`.
- Review engine logs for the specific operation: `journalctl -u protector-engine --since "30 minutes ago"`.
- Retrieve the operation detail: `openstack protector operation show <op-id>`.
- If the engine is running but the operation is genuinely stuck, contact support. Do not manually delete the Protection Group record while an operation is in an indeterminate state, as this can leave orphaned volumes on the secondary site.