Protection Groups
Creating and managing Protection Groups: the core DR unit.
A Protection Group (PG) is the fundamental unit of disaster recovery in Trilio Site Recovery for OpenStack. It defines a set of Nova VMs that must fail over together as an atomic unit: if any member needs to move to the secondary site, all members move together. Creating a Protection Group automatically provisions a Cinder Consistency Group on both the primary and secondary sites, along with a corresponding Pure Storage Protection Group (or Pod for sync replication), establishing the full replication chain before any DR event occurs. Understanding how to create, manage, and monitor Protection Groups is essential: every DR workflow (failover, failback, and DR drills) operates on a Protection Group as its target.
Before creating a Protection Group, ensure the following are in place:
- Two registered OpenStack sites: a primary site and a secondary (DR) site, each with independent Nova, Cinder, Neutron, and Keystone endpoints. Both sites must be registered and reachable via `openstack protector site validate`.
- Trilio Protector services running on both sites: `protector-api` and `protector-engine` must be active on each site independently.
- `protectorclient` OSC plugin installed: the CLI plugin (`protectorclient`) must be installed on the host from which you run commands. This is the coordination layer that authenticates to both sites.
- `clouds.yaml` configured for both sites: your `~/.config/openstack/clouds.yaml` must contain named entries for both the primary and secondary site credentials.
- Replication-enabled Cinder volume types on both sites: each site must have a Cinder volume type with `replication_enabled='<is> True'` and a `replication_type` property set to `'<in> async'` or `'<in> sync'`. The volume type name does not need to match across sites, but the replication type must be consistent. See [Prepare Replication-Enabled Volume Types] for setup instructions.
- All VM volumes must use a replication-enabled volume type: any Cinder volume attached to a member VM that does not use a qualifying volume type will block the member-add operation.
- Pure Storage FlashArray replication configured: the underlying storage arrays must already have replication configured between them. The Mock storage driver may be used for testing without physical arrays.
- Sufficient project quota: Cinder Consistency Groups consume quota on both sites at creation time.
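The volume-type prerequisite can be expressed as a small predicate. This is an illustrative Python sketch, not part of protectorclient; only the extra-spec names and values mirror the Cinder properties described above.

```python
# Illustrative sketch: decide whether a Cinder volume type's extra specs
# qualify it for DR use. Not protectorclient code; the property names
# mirror the prerequisites listed above.

def volume_type_qualifies(extra_specs: dict, replication_type: str) -> bool:
    """Return True if the extra specs satisfy the DR prerequisites."""
    return (
        extra_specs.get("replication_enabled") == "<is> True"
        and extra_specs.get("replication_type") == f"<in> {replication_type}"
    )

specs = {"replication_enabled": "<is> True", "replication_type": "<in> async"}
print(volume_type_qualifies(specs, "async"))   # True: both properties match
print(volume_type_qualifies(specs, "sync"))    # False: replication type mismatch
```

Note that the replication type must agree on both sites even when the volume type names differ.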
Protection Groups are created through the protectorclient OSC plugin or directly via the Protector REST API. No separate installation is required beyond the Protector service itself. The steps below walk through creating a Protection Group end-to-end.
Step 1: Source credentials for your primary site
source ~/site-a-openrc
# or use --os-cloud:
export OS_CLOUD=site-a
Step 2: Verify both sites are reachable
openstack protector site validate site-a
openstack protector site validate site-b
Both commands must return a successful connectivity status before you proceed. Protection Group creation is blocked if the secondary site is unreachable; the service enforces this to prevent metadata divergence between sites.
Step 3: Confirm replication-enabled volume types are available
openstack protector site list-volume-types site-a
openstack protector site list-volume-types site-b
Identify the volume type you will use. It must appear on both sites and have replication_enabled='<is> True'.
Step 4: Create the Protection Group
openstack protector protection-group create \
--name prod-web-app \
--description "Production web application" \
--replication-type async \
--primary-site site-a \
--secondary-site site-b \
--volume-type replicated-ssd
The service performs the following actions synchronously:
- Validates that both sites are reachable.
- Validates that the specified volume type exists on both sites and has `replication_enabled='<is> True'`.
- Creates a Cinder Consistency Group on the primary site.
- Creates a matching Cinder Consistency Group on the secondary site.
- Creates the Protection Group record and links it 1:1 to the Consistency Group.
- Pushes metadata to the secondary site (version 1).
- Transitions the Protection Group status to `active`.
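The synchronous creation sequence can be sketched as follows. The `FakeSite` class and all method names are hypothetical stand-ins; only the ordering and the fail-fast validation are taken from the documented behavior.

```python
# Hypothetical sketch of the synchronous create flow described above.
# Site objects and method names are illustrative, not the real service API.

def create_protection_group(primary, secondary, name, volume_type):
    # Fail fast: both sites must be reachable before anything is created.
    if not (primary.reachable() and secondary.reachable()):
        raise RuntimeError("remote site unreachable - creation blocked")
    # The volume type must exist and be replication-enabled on both sites.
    for site in (primary, secondary):
        if not site.has_replicated_volume_type(volume_type):
            raise RuntimeError("volume type not replication-enabled")
    primary_cg = primary.create_consistency_group(name)
    secondary_cg = secondary.create_consistency_group(name)
    pg = {
        "name": name,
        "consistency_group_id": primary_cg,   # 1:1 link to the primary CG
        "secondary_cg_id": secondary_cg,
        "metadata_version": 1,
        "status": "active",
    }
    secondary.push_metadata(pg)               # version 1 goes to the peer site
    return pg

class FakeSite:
    """Minimal stand-in for a registered site (illustration only)."""
    def __init__(self, up=True): self.up = up
    def reachable(self): return self.up
    def has_replicated_volume_type(self, vt): return vt == "replicated-ssd"
    def create_consistency_group(self, name): return f"cg-{name}"
    def push_metadata(self, pg): self.meta = dict(pg)

pg = create_protection_group(FakeSite(), FakeSite(), "prod-web-app", "replicated-ssd")
print(pg["status"], pg["metadata_version"])   # active 1
```

Because validation happens before either Consistency Group is created, a failed precondition leaves nothing behind on either site.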
Step 5: Verify the Protection Group was created
openstack protector protection-group show prod-web-app
Confirm status is active and consistency_group_id is populated before proceeding to add members or configure the replication policy.
Protection Group configuration is set at creation time; fields that are not immutable (see below) can be updated while the PG is in active status and both sites are reachable. The following table describes the key fields.
| Field | Required | Valid Values | Effect |
|---|---|---|---|
| name | Yes | Alphanumeric string | Human-readable identifier; used in CLI commands by name or UUID |
| description | No | Free text | Informational only |
| replication-type | Yes | async, sync | Determines how Pure Storage replicates data. async uses Protection Group snapshots with a configurable interval; sync uses ActiveCluster Pods for zero-RPO replication. Must match the replication_type property on the selected volume type. |
| primary-site | Yes | Registered site name or UUID | The site where workloads initially run. This designation is workload-relative and dynamic: it updates automatically after a failover. |
| secondary-site | Yes | Registered site name or UUID | The DR target site. Must be a different registered site from primary-site. |
| volume-type | Yes | Cinder volume type name or UUID | The volume type used for all volumes in this Protection Group's Consistency Group. All volumes attached to member VMs must use this type. The type must have replication_enabled='<is> True' on both sites. |
Immutable fields: replication-type, primary-site, secondary-site, and volume-type cannot be changed after creation. To change these, delete the Protection Group and create a new one.
Metadata synchronization behavior: Any modification to a Protection Group (adding members, updating the replication policy, removing members) requires that the peer site is reachable at the time of the change. If the remote site is unreachable, the operation is blocked and returns an error. This is by design: it prevents the two sites from holding divergent metadata, which would cause conflicts during a failover. Once the remote site recovers, use openstack protector protection-group sync-force <pg-name> to push the current metadata before retrying the blocked operation.
Status field: The status field is managed entirely by the service and reflects the current DR state of the Protection Group:
| Status | Meaning |
|---|---|
| active | Healthy, replicating normally from the current primary site |
| failing_over | A failover operation is in progress; no modifications allowed |
| failed_over | Workloads are running on the secondary site after a successful failover |
| failing_back | A failback operation is in progress; no modifications allowed |
| error | A DR operation failed; inspect the associated DR Operation record for details |
| deleting | A delete operation is in progress |
While the PG is in any transitional state (failing_over, failing_back, deleting), modifications are blocked.
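As a reading aid for the status table, the states can be laid out as per-operation flows. This mapping is an interpretation of the descriptions above, not service code; in particular, the assumption that a completed failback returns the PG to active is inferred from the table, and a completed delete removes the record entirely (hence no end state).

```python
# Interpretive state map, assumed from the status table above: each DR
# operation's (start, transitional, end) statuses. Not actual service code.
FLOW = {
    "failover": ("active", "failing_over", "failed_over"),
    "failback": ("failed_over", "failing_back", "active"),   # end state inferred
    "delete":   ("active", "deleting", None),                # record removed on success
}

for op, (start, during, end) in FLOW.items():
    print(f"{op}: {start} -> {during} -> {end}")
```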
The most common Protection Group operations are creating and deleting PGs, managing VM membership, checking sync status, and triggering force syncs when the remote site recovers from an outage.
List all Protection Groups
openstack protector protection-group list
Show details for a specific Protection Group
openstack protector protection-group show prod-web-app
This returns the PG status, the associated Consistency Group ID, the current primary site (which changes after failover), the failover count, and the last failover timestamp.
Add a VM to a Protection Group
When you add a VM, the service automatically discovers all Cinder volumes attached to that instance, validates that each volume uses the PG's designated replication-enabled volume type, and adds each volume to the Consistency Group on the primary site. If any attached volume uses a non-replication-enabled type, the operation fails and no changes are made.
openstack protector protection-group member-add prod-web-app \
--instance-id <nova-instance-uuid>
Each member-add also increments the metadata version and syncs to the secondary site. If the secondary site is unreachable at the time of the call, the operation is blocked.
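The all-or-nothing volume check can be sketched like this. The function, the `pg` dict, and the volume records are all hypothetical; the documented behavior being illustrated is that a single ineligible volume fails the whole operation with no partial changes.

```python
# Hypothetical sketch: validate every attached volume before touching the
# Consistency Group, so one ineligible volume leaves the group unchanged.

def add_member(pg, instance_volumes, pg_volume_type):
    bad = [v["id"] for v in instance_volumes
           if v["volume_type"] != pg_volume_type]
    if bad:
        # Nothing has been added yet, so there is nothing to roll back.
        raise ValueError(f"volumes not using a replication-enabled type: {bad}")
    pg["volumes"].extend(v["id"] for v in instance_volumes)
    return len(instance_volumes)

pg = {"volumes": []}
vols = [{"id": "vol-1", "volume_type": "replicated-ssd"},
        {"id": "vol-2", "volume_type": "standard"}]
try:
    add_member(pg, vols, "replicated-ssd")
except ValueError as exc:
    print(exc)          # vol-2 blocks the whole operation
print(pg["volumes"])    # [] - nothing was added
```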
List members of a Protection Group
openstack protector protection-group member-list prod-web-app
Remove a VM from a Protection Group
Removing a member also removes that VM's volumes from the Consistency Group. The VM continues running on the primary site; it simply loses DR protection.
openstack protector protection-group member-remove prod-web-app \
--member-id <member-uuid>
View the associated Consistency Group
openstack protector consistency-group show prod-web-app
This shows the Cinder Consistency Group IDs on both the primary and secondary sites, the backend name, the volume count, and the replication status of each volume.
Check metadata sync status
Use this after any outage or before executing DR operations to confirm that both sites hold identical metadata.
openstack protector protection-group sync-status prod-web-app
Force a metadata sync to the remote site
Use this after the remote site recovers from an outage. The local site (where VMs are currently running) is treated as authoritative.
openstack protector protection-group sync-force prod-web-app
Delete a Protection Group
Deleting a PG cascades to the Consistency Group on both sites. All member VMs are unregistered from DR protection, but the VMs and their volumes are not deleted from Nova or Cinder.
openstack protector protection-group delete prod-web-app
Deletion is blocked if the remote site is unreachable, for the same metadata-consistency reasons that apply to modifications.
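A minimal sketch of the delete semantics, assuming hypothetical site objects: the Consistency Groups on both sites are removed and members are unregistered, while nothing is deleted from Nova or Cinder.

```python
# Hypothetical sketch of the delete cascade described above. The _Site
# class and method names are invented; only the semantics are documented.

def delete_protection_group(pg, primary, secondary):
    if not secondary.reachable():
        # Same metadata-consistency rule as modifications.
        raise RuntimeError("remote site unreachable - deletion blocked")
    pg["status"] = "deleting"                 # transitional state; record is
    primary.delete_consistency_group(pg["consistency_group_id"])   # removed on success
    secondary.delete_consistency_group(pg["secondary_cg_id"])
    pg["members"].clear()                     # VMs lose DR protection but keep running

class _Site:
    """Stand-in for a registered site (illustration only)."""
    def __init__(self):
        self.deleted = []
    def reachable(self):
        return True
    def delete_consistency_group(self, cg_id):
        self.deleted.append(cg_id)

pg = {"consistency_group_id": "cg-a", "secondary_cg_id": "cg-b",
      "members": ["web-server-1"], "status": "active"}
a, b = _Site(), _Site()
delete_protection_group(pg, a, b)
print(a.deleted, b.deleted, pg["members"])   # ['cg-a'] ['cg-b'] []
```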
Example 1: Create an async Protection Group and add two web-tier VMs
This is the standard workflow for protecting a multi-VM application tier.
# Create the Protection Group
openstack protector protection-group create \
--name prod-web-tier \
--description "Web tier VMs - async replication to site-b" \
--replication-type async \
--primary-site site-a \
--secondary-site site-b \
--volume-type replicated-ssd
Expected output:
+------------------------+--------------------------------------+
| Field | Value |
+------------------------+--------------------------------------+
| id | pg-12345678-1234-1234-1234-123456789abc |
| name | prod-web-tier |
| status | active |
| replication_type | async |
| primary_site | site-a |
| secondary_site | site-b |
| consistency_group_id | cg-87654321-4321-4321-4321-87654321abcd |
| failover_count | 0 |
| last_failover_at | None |
+------------------------+--------------------------------------+
# Add the first web server
openstack protector protection-group member-add prod-web-tier \
--instance-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
Expected output:
+------------------------+--------------------------------------+
| Field | Value |
+------------------------+--------------------------------------+
| id | member-aaaa1111-... |
| instance_id | a1b2c3d4-e5f6-7890-abcd-ef1234567890 |
| instance_name | web-server-1 |
| status | protected |
| volumes_added | 2 |
+------------------------+--------------------------------------+
# Add the second web server
openstack protector protection-group member-add prod-web-tier \
--instance-id b2c3d4e5-f6a7-8901-bcde-f12345678901
# Confirm both members are protected
openstack protector protection-group member-list prod-web-tier
Expected output:
+-------------------+---------------------+------------------+-----------+
| id | instance_name | instance_id | status |
+-------------------+---------------------+------------------+-----------+
| member-aaaa1111-..| web-server-1 | a1b2c3d4-... | protected |
| member-bbbb2222-..| web-server-2 | b2c3d4e5-... | protected |
+-------------------+---------------------+------------------+-----------+
Example 2: Verify metadata sync status after a remote site outage
After the secondary site recovers, always check sync status before executing any DR operation or making PG modifications.
openstack protector protection-group sync-status prod-web-tier
Expected output when out of sync:
Sync Status: ✗ OUT OF SYNC
Local Metadata:
Version: 4
Current Site: Site A
Last Modified: 2025-06-10T09:15:00Z
Remote Sync:
Status: FAILED
Remote Version: 3
Last Sync: 2025-06-10T08:45:00Z (30 minutes ago)
Error: Connection timeout
Action Required:
1. Check remote site connectivity
2. Force sync once remote site is available
# Once site-b is confirmed reachable, push authoritative metadata from site-a
openstack protector protection-group sync-force prod-web-tier
Expected output:
Force Sync Initiated...
Checking remote site connectivity...
✓ Site B is reachable
Syncing metadata (version 4)...
Gathering current metadata... ✓
Calculating checksum... ✓
Pushing to Site B... ✓
Remote Site Response:
Status: success
Version: 4
Duration: 380ms
✓ Sync completed successfully
Both sites now at version 4
Example 3: Inspect the Consistency Group associated with a Protection Group
openstack protector consistency-group show prod-web-tier
Expected output:
+-------------------------+------------------------------------------+
| Field | Value |
+-------------------------+------------------------------------------+
| id | cg-87654321-4321-4321-4321-87654321abcd |
| protection_group_id | pg-12345678-... |
| volume_type_name | replicated-ssd |
| backend_name | pure@backend-a |
| primary_cg_id | cinder-cg-uuid-on-site-a |
| secondary_cg_id | cinder-cg-uuid-on-site-b |
| status | active |
| volume_count | 4 |
+-------------------------+------------------------------------------+
Example 4: Remove a VM from a Protection Group
This is safe to run while the PG is active. The VM keeps running; it simply loses DR protection.
# Find the member ID
openstack protector protection-group member-list prod-web-tier
# Remove by member ID
openstack protector protection-group member-remove prod-web-tier \
--member-id member-aaaa1111-bbbb-cccc-dddd-eeeeeeeeeeee
Expected output:
Member removed: web-server-1
✓ Volumes removed from consistency group (2 volumes)
✓ Local metadata updated (version 4 → 5)
✓ Synced to site-b (version 5)
Issue: Protection Group creation fails with "volume type not replication-enabled"
Symptom: openstack protector protection-group create returns an error indicating the volume type does not support replication.
Cause: The Cinder volume type is missing the replication_enabled='<is> True' property, or the replication_type property is absent or mismatched with the requested --replication-type flag.
Fix: On both sites, inspect the volume type:
openstack volume type show replicated-ssd
Verify the properties include:
replication_enabled : <is> True
replication_type : <in> async
If either property is missing or incorrect, set it:
openstack volume type set replicated-ssd \
--property replication_enabled='<is> True' \
--property replication_type='<in> async'
Repeat on both sites. Then retry the Protection Group creation.
Issue: Protection Group creation fails with "remote site unreachable"
Symptom: Creation is rejected immediately with an error stating the secondary site cannot be reached.
Cause: The Protector service cannot contact the secondary site's protector-api endpoint. Metadata cannot be synchronized, so creation is blocked by design.
Fix: Verify the secondary site's protector-api is running and the endpoint is reachable from the primary site's protector-engine:
openstack protector site validate site-b
Check protector-api status on the secondary site:
systemctl status protector-api
Verify the secondary site's auth URL and region are correctly registered:
openstack protector site show site-b
Once connectivity is restored, retry the create command.
Issue: member-add fails with "volume not using a replication-enabled type"
Symptom: Adding a VM returns an error indicating one or more of its attached volumes uses an ineligible volume type.
Cause: One or more Cinder volumes attached to the target VM were created with a volume type that does not have replication_enabled='<is> True'. Every volume attached to a member VM must use the Protection Group's designated replication-enabled volume type.
Fix: Identify the offending volumes:
openstack server show <instance-uuid> -f json | grep -i volume
openstack volume show <volume-uuid> | grep volume_type
Migrate the volume to the correct type using a Cinder volume retype, or create a new volume of the correct type, copy the data, and reattach it. There is no in-place fix: the volume type of an existing volume cannot be changed to one that requires backend migration without an explicit retype and backend support.
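To make the triage concrete, a throwaway helper like the following (hypothetical names and data) captures the selection logic: any attached volume whose type differs from the PG's designated type must be retyped before member-add can succeed.

```python
# Illustrative helper for the fix above: given the PG's volume type and a
# mapping of attached volume IDs to their current types, list the volumes
# that still need retyping. The data is hypothetical.

def volumes_needing_retype(attached: dict, pg_type: str) -> list:
    return sorted(vid for vid, vtype in attached.items() if vtype != pg_type)

attached = {"vol-1": "replicated-ssd", "vol-2": "standard", "vol-3": "standard"}
print(volumes_needing_retype(attached, "replicated-ssd"))   # ['vol-2', 'vol-3']
```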
Issue: PG modification blocked with "Cannot modify protection group - remote site unreachable"
Symptom: A member-add, member-remove, or policy update returns an error stating the remote site is unreachable and the operation cannot proceed.
Cause: This is intentional behavior. All modifications require a successful metadata sync to the peer site before they are committed, to prevent the two sites from holding divergent metadata.
Fix: Wait for the remote site to recover, then force a sync to confirm both sites are aligned:
openstack protector protection-group sync-status <pg-name>
openstack protector protection-group sync-force <pg-name>
Once the sync status shows IN SYNC, retry the blocked operation.
Issue: Protection Group stuck in error status
Symptom: The PG status field shows error and no DR operations can be initiated.
Cause: A previous DR operation (failover, failback, or test failover) failed mid-execution. The PG is locked in error to prevent further operations on an inconsistent state.
Fix: Identify the failed operation and review its error message:
openstack protector operation list --protection-group <pg-name>
openstack protector operation show <operation-uuid>
Review the error_message and steps_failed fields in the operation record. Resolve the underlying cause (e.g., missing flavor on the secondary site, snapshot not found, storage connectivity issue). For planned failover failures, the service automatically rolls back ā verify the rollback completed (rollback_status: completed in the operation response) before retrying. Contact your storage administrator if the failure involves Pure Storage snapshot or replication errors.
Issue: Consistency Group shows secondary_cg_id as null
Symptom: openstack protector consistency-group show <pg-name> shows secondary_cg_id as None or empty.
Cause: The Cinder Consistency Group creation on the secondary site failed during Protection Group creation, or the secondary site was unreachable at the time. The Protection Group may have been created in a degraded state.
Fix: Check the secondary site's Cinder service:
# On site-b
openstack volume service list
Verify the Cinder backend with the replication-enabled volume type is up. If Cinder is healthy, delete the Protection Group and recreate it once both sites are confirmed reachable and healthy via openstack protector site validate.