Protection Groups
Creating and managing Protection Groups: the core DR unit.
A Protection Group (PG) is the fundamental unit of disaster recovery in Trilio Site Recovery for OpenStack. It defines a set of Nova VMs that must fail over together as an atomic unit: if any member needs to move to the secondary site, all members move together. Creating a Protection Group automatically provisions a Cinder Consistency Group on both the primary and secondary sites, along with a corresponding Pure Storage Protection Group (or Pod for sync replication), establishing the full replication chain before any DR event occurs. Understanding how to create, manage, and monitor Protection Groups is essential: every DR workflow (failover, failback, and DR drills) operates on a Protection Group as its target.
Before creating a Protection Group, ensure the following are in place:
- Two registered OpenStack sites: a primary site and a secondary (DR) site, each with independent Nova, Cinder, Neutron, and Keystone endpoints. Both sites must be registered and reachable via `openstack protector site validate`.
- Trilio Protector services running on both sites: `protector-api` and `protector-engine` must be active on each site independently.
- `protectorclient` OSC plugin installed: the CLI plugin (`protectorclient`) must be installed on the host from which you run commands. This is the coordination layer that authenticates to both sites.
- `clouds.yaml` configured for both sites: your `~/.config/openstack/clouds.yaml` must contain named entries for both the primary and secondary site credentials.
- Replication-enabled Cinder volume types on both sites: each site must have a Cinder volume type with `replication_enabled='<is> True'` and a `replication_type` property set to `'<in> async'` or `'<in> sync'`. The volume type name does not need to match across sites, but the replication type must be consistent. See [Prepare Replication-Enabled Volume Types] for setup instructions.
- All VM volumes must use a replication-enabled volume type: any Cinder volume attached to a member VM that does not use a qualifying volume type will block the member-add operation.
- Pure Storage FlashArray replication configured: the underlying storage arrays must already have replication configured between them. The Mock storage driver may be used for testing without physical arrays.
- Sufficient project quota: Cinder Consistency Groups consume quota on both sites at creation time.
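The volume-type prerequisite can be expressed as a small predicate. This is an illustrative Python sketch, not part of protectorclient; only the extra-spec names and values mirror the Cinder properties described above.

```python
# Illustrative sketch: decide whether a Cinder volume type's extra specs
# qualify it for DR use. Not protectorclient code; the property names
# mirror the prerequisites listed above.

def volume_type_qualifies(extra_specs: dict, replication_type: str) -> bool:
    """Return True if the extra specs satisfy the DR prerequisites."""
    return (
        extra_specs.get("replication_enabled") == "<is> True"
        and extra_specs.get("replication_type") == f"<in> {replication_type}"
    )

specs = {"replication_enabled": "<is> True", "replication_type": "<in> async"}
print(volume_type_qualifies(specs, "async"))   # True: both properties match
print(volume_type_qualifies(specs, "sync"))    # False: replication type mismatch
```

Note that the replication type must agree on both sites even when the volume type names differ.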
Protection Groups are created through the protectorclient OSC plugin or directly via the Protector REST API. No separate installation is required beyond the Protector service itself. The steps below walk through creating a Protection Group end-to-end.
Step 1: Source credentials for your primary site
source ~/site-a-openrc
# or use --os-cloud:
export OS_CLOUD=site-a
Step 2: Verify both sites are reachable
openstack protector site validate site-a
openstack protector site validate site-b
Both commands must return a successful connectivity status before you proceed. Protection Group creation is blocked if the secondary site is unreachable; the service enforces this to prevent metadata divergence between sites.
Step 3: Confirm replication-enabled volume types are available
openstack protector site list-volume-types site-a
openstack protector site list-volume-types site-b
Identify the volume type you will use. It must appear on both sites and have replication_enabled='<is> True'.
Step 4: Create the Protection Group
openstack protector protection-group create \
--name prod-web-app \
--description "Production web application" \
--replication-type async \
--primary-site site-a \
--secondary-site site-b \
--volume-type replicated-ssd
The service performs the following actions synchronously:
- Validates that both sites are reachable.
- Validates that the specified volume type exists on both sites and has `replication_enabled='<is> True'`.
- Creates a Cinder Consistency Group on the primary site.
- Creates a matching Cinder Consistency Group on the secondary site.
- Creates the Protection Group record and links it 1:1 to the Consistency Group.
- Pushes metadata to the secondary site (version 1).
- Transitions the Protection Group status to `active`.
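The synchronous creation sequence can be sketched as follows. The `FakeSite` class and all method names are hypothetical stand-ins; only the ordering and the fail-fast validation are taken from the documented behavior.

```python
# Hypothetical sketch of the synchronous create flow described above.
# Site objects and method names are illustrative, not the real service API.

def create_protection_group(primary, secondary, name, volume_type):
    # Fail fast: both sites must be reachable before anything is created.
    if not (primary.reachable() and secondary.reachable()):
        raise RuntimeError("remote site unreachable - creation blocked")
    # The volume type must exist and be replication-enabled on both sites.
    for site in (primary, secondary):
        if not site.has_replicated_volume_type(volume_type):
            raise RuntimeError("volume type not replication-enabled")
    primary_cg = primary.create_consistency_group(name)
    secondary_cg = secondary.create_consistency_group(name)
    pg = {
        "name": name,
        "consistency_group_id": primary_cg,   # 1:1 link to the primary CG
        "secondary_cg_id": secondary_cg,
        "metadata_version": 1,
        "status": "active",
    }
    secondary.push_metadata(pg)               # version 1 goes to the peer site
    return pg

class FakeSite:
    """Minimal stand-in for a registered site (illustration only)."""
    def __init__(self, up=True): self.up = up
    def reachable(self): return self.up
    def has_replicated_volume_type(self, vt): return vt == "replicated-ssd"
    def create_consistency_group(self, name): return f"cg-{name}"
    def push_metadata(self, pg): self.meta = dict(pg)

pg = create_protection_group(FakeSite(), FakeSite(), "prod-web-app", "replicated-ssd")
print(pg["status"], pg["metadata_version"])   # active 1
```

Because validation happens before either Consistency Group is created, a failed precondition leaves nothing behind on either site.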
Step 5: Verify the Protection Group was created
openstack protector protection-group show prod-web-app
Confirm status is active and consistency_group_id is populated before proceeding to add members or configure the replication policy.
Protection Group configuration is set at creation time; fields that are not immutable (see below) can be updated while the PG is in active status and both sites are reachable. The following table describes the key fields.
| Field | Required | Valid Values | Effect |
|---|---|---|---|
| name | Yes | Alphanumeric string | Human-readable identifier; used in CLI commands by name or UUID |
| description | No | Free text | Informational only |
| replication-type | Yes | async, sync | Determines how Pure Storage replicates data. async uses Protection Group snapshots with a configurable interval; sync uses ActiveCluster Pods for zero-RPO replication. Must match the replication_type property on the selected volume type. |
| primary-site | Yes | Registered site name or UUID | The site where workloads initially run. This designation is workload-relative and dynamic: it updates automatically after a failover. |
| secondary-site | Yes | Registered site name or UUID | The DR target site. Must be a different registered site from primary-site. |
| volume-type | Yes | Cinder volume type name or UUID | The volume type used for all volumes in this Protection Group's Consistency Group. All volumes attached to member VMs must use this type. The type must have replication_enabled='<is> True' on both sites. |
Immutable fields: replication-type, primary-site, secondary-site, and volume-type cannot be changed after creation. To change these, delete the Protection Group and create a new one.
Metadata synchronization behavior: Any modification to a Protection Group (adding members, updating the replication policy, removing members) requires that the peer site is reachable at the time of the change. If the remote site is unreachable, the operation is blocked and returns an error. This is by design: it prevents the two sites from holding divergent metadata, which would cause conflicts during a failover. Once the remote site recovers, use openstack protector protection-group sync-force <pg-name> to push the current metadata before retrying the blocked operation.
Status field: The status field is managed entirely by the service and reflects the current DR state of the Protection Group:
| Status | Meaning |
|---|---|
| active | Healthy, replicating normally from the current primary site |
| failing_over | A failover operation is in progress; no modifications allowed |
| failed_over | Workloads are running on the secondary site after a successful failover |
| failing_back | A failback operation is in progress; no modifications allowed |
| error | A DR operation failed; inspect the associated DR Operation record for details |
| deleting | A delete operation is in progress |
While the PG is in any transitional state (failing_over, failing_back, deleting), modifications are blocked.
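As a reading aid for the status table, the states can be laid out as per-operation flows. This mapping is an interpretation of the descriptions above, not service code; in particular, the assumption that a completed failback returns the PG to active is inferred from the table, and a completed delete removes the record entirely (hence no end state).

```python
# Interpretive state map, assumed from the status table above: each DR
# operation's (start, transitional, end) statuses. Not actual service code.
FLOW = {
    "failover": ("active", "failing_over", "failed_over"),
    "failback": ("failed_over", "failing_back", "active"),   # end state inferred
    "delete":   ("active", "deleting", None),                # record removed on success
}

for op, (start, during, end) in FLOW.items():
    print(f"{op}: {start} -> {during} -> {end}")
```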
The most common Protection Group operations are creating and deleting PGs, managing VM membership, checking sync status, and triggering force syncs when the remote site recovers from an outage.
List all Protection Groups
openstack protector protection-group list
Show details for a specific Protection Group
openstack protector protection-group show prod-web-app
This returns the PG status, the associated Consistency Group ID, the current primary site (which changes after failover), the failover count, and the last failover timestamp.
Add a VM to a Protection Group
When you add a VM, the service automatically discovers all Cinder volumes attached to that instance, validates that each volume uses the PG's designated replication-enabled volume type, and adds each volume to the Consistency Group on the primary site. If any attached volume uses a non-replication-enabled type, the operation fails and no changes are made.
openstack protector protection-group member-add prod-web-app \
--instance-id <nova-instance-uuid>
Each member-add also increments the metadata version and syncs to the secondary site. If the secondary site is unreachable at the time of the call, the operation is blocked.
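The all-or-nothing volume check can be sketched like this. The function, the `pg` dict, and the volume records are all hypothetical; the documented behavior being illustrated is that a single ineligible volume fails the whole operation with no partial changes.

```python
# Hypothetical sketch: validate every attached volume before touching the
# Consistency Group, so one ineligible volume leaves the group unchanged.

def add_member(pg, instance_volumes, pg_volume_type):
    bad = [v["id"] for v in instance_volumes
           if v["volume_type"] != pg_volume_type]
    if bad:
        # Nothing has been added yet, so there is nothing to roll back.
        raise ValueError(f"volumes not using a replication-enabled type: {bad}")
    pg["volumes"].extend(v["id"] for v in instance_volumes)
    return len(instance_volumes)

pg = {"volumes": []}
vols = [{"id": "vol-1", "volume_type": "replicated-ssd"},
        {"id": "vol-2", "volume_type": "standard"}]
try:
    add_member(pg, vols, "replicated-ssd")
except ValueError as exc:
    print(exc)          # vol-2 blocks the whole operation
print(pg["volumes"])    # [] - nothing was added
```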
List members of a Protection Group
openstack protector protection-group member-list prod-web-app
Remove a VM from a Protection Group
Removing a member also removes that VM's volumes from the Consistency Group. The VM continues running on the primary site; it simply loses DR protection.
openstack protector protection-group member-remove prod-web-app \
--member-id <member-uuid>
View the associated Consistency Group
openstack protector consistency-group show prod-web-app
This shows the Cinder Consistency Group IDs on both the primary and secondary sites, the backend name, the volume count, and the replication status of each volume.
Check metadata sync status
Use this after any outage or before executing DR operations to confirm that both sites hold identical metadata.
openstack protector protection-group sync-status prod-web-app
Force a metadata sync to the remote site
Use this after the remote site recovers from an outage. The local site (where VMs are currently running) is treated as authoritative.
openstack protector protection-group sync-force prod-web-app
Delete a Protection Group
Deleting a PG cascades to the Consistency Group on both sites. All member VMs are unregistered from DR protection, but the VMs and their volumes are not deleted from Nova or Cinder.
openstack protector protection-group delete prod-web-app
Deletion is blocked if the remote site is unreachable, for the same metadata-consistency reasons that apply to modifications.
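A minimal sketch of the delete semantics, assuming hypothetical site objects: the Consistency Groups on both sites are removed and members are unregistered, while nothing is deleted from Nova or Cinder.

```python
# Hypothetical sketch of the delete cascade described above. The _Site
# class and method names are invented; only the semantics are documented.

def delete_protection_group(pg, primary, secondary):
    if not secondary.reachable():
        # Same metadata-consistency rule as modifications.
        raise RuntimeError("remote site unreachable - deletion blocked")
    pg["status"] = "deleting"                 # transitional state; record is
    primary.delete_consistency_group(pg["consistency_group_id"])   # removed on success
    secondary.delete_consistency_group(pg["secondary_cg_id"])
    pg["members"].clear()                     # VMs lose DR protection but keep running

class _Site:
    """Stand-in for a registered site (illustration only)."""
    def __init__(self):
        self.deleted = []
    def reachable(self):
        return True
    def delete_consistency_group(self, cg_id):
        self.deleted.append(cg_id)

pg = {"consistency_group_id": "cg-a", "secondary_cg_id": "cg-b",
      "members": ["web-server-1"], "status": "active"}
a, b = _Site(), _Site()
delete_protection_group(pg, a, b)
print(a.deleted, b.deleted, pg["members"])   # ['cg-a'] ['cg-b'] []
```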
Example 1: Create an async Protection Group and add two web-tier VMs
This is the standard workflow for protecting a multi-VM application tier.
# Create the Protection Group
openstack protector protection-group create \
--name prod-web-tier \
--description "Web tier VMs - async replication to site-b" \
--replication-type async \
--primary-site site-a \
--secondary-site site-b \
--volume-type replicated-ssd
Expected output:
+------------------------+--------------------------------------+
| Field | Value |
+------------------------+--------------------------------------+
| id | pg-12345678-1234-1234-1234-123456789abc |
| name | prod-web-tier |
| status | active |
| replication_type | async |
| primary_site | site-a |
| secondary_site | site-b |
| consistency_group_id | cg-87654321-4321-4321-4321-87654321abcd |
| failover_count | 0 |
| last_failover_at | None |
+------------------------+--------------------------------------+
# Add the first web server
openstack protector protection-group member-add prod-web-tier \
--instance-id a1b2c3d4-e5f6-7890-abcd-ef1234567890
Expected output:
+------------------------+--------------------------------------+
| Field | Value |
+------------------------+--------------------------------------+
| id | member-aaaa1111-... |
| instance_id | a1b2c3d4-e5f6-7890-abcd-ef1234567890 |
| instance_name | web-server-1 |
| status | protected |
| volumes_added | 2 |
+------------------------+--------------------------------------+
# Add the second web server
openstack protector protection-group member-add prod-web-tier \
--instance-id b2c3d4e5-f6a7-8901-bcde-f12345678901
# Confirm both members are protected
openstack protector protection-group member-list prod-web-tier
Expected output:
+-------------------+---------------------+------------------+-----------+
| id | instance_name | instance_id | status |
+-------------------+---------------------+------------------+-----------+
| member-aaaa1111-..| web-server-1 | a1b2c3d4-... | protected |
| member-bbbb2222-..| web-server-2 | b2c3d4e5-... | protected |
+-------------------+---------------------+------------------+-----------+
Example 2: Verify metadata sync status after a remote site outage
After the secondary site recovers, always check sync status before executing any DR operation or making PG modifications.
openstack protector protection-group sync-status prod-web-tier
Expected output when out of sync:
Sync Status: ✗ OUT OF SYNC
Local Metadata:
Version: 4
Current Site: Site A
Last Modified: 2025-06-10T09:15:00Z
Remote Sync:
Status: FAILED
Remote Version: 3
Last Sync: 2025-06-10T08:45:00Z (30 minutes ago)
Error: Connection timeout
Action Required:
1. Check remote site connectivity
2. Force sync once remote site is available
# Once site-b is confirmed reachable, push authoritative metadata from site-a
openstack protector protection-group sync-force prod-web-tier
Expected output:
Force Sync Initiated...
Checking remote site connectivity...
✓ Site B is reachable
Syncing metadata (version 4)...
Gathering current metadata... ✓
Calculating checksum... ✓
Pushing to Site B... ✓
Remote Site Response:
Status: success
Version: 4
Duration: 380ms
✓ Sync completed successfully
Both sites now at version 4
Example 3: Inspect the Consistency Group associated with a Protection Group
openstack protector consistency-group show prod-web-tier
Expected output:
+-------------------------+------------------------------------------+
| Field | Value |
+-------------------------+------------------------------------------+
| id | cg-87654321-4321-4321-4321-87654321abcd |
| protection_group_id | pg-12345678-... |
| volume_type_name | replicated-ssd |
| backend_name | pure@backend-a |
| primary_cg_id | cinder-cg-uuid-on-site-a |
| secondary_cg_id | cinder-cg-uuid-on-site-b |
| status | active |
| volume_count | 4 |
+-------------------------+------------------------------------------+
Example 4: Remove a VM from a Protection Group
This is safe to run while the PG is active. The VM keeps running; it simply loses DR protection.
# Find the member ID
openstack protector protection-group member-list prod-web-tier
# Remove by member ID
openstack protector protection-group member-remove prod-web-tier \
--member-id member-aaaa1111-bbbb-cccc-dddd-eeeeeeeeeeee
Expected output:
Member removed: web-server-1
✓ Volumes removed from consistency group (2 volumes)
✓ Local metadata updated (version 4 → 5)
✓ Synced to site-b (version 5)
Issue: Protection Group creation fails with "volume type not replication-enabled"
Symptom: openstack protector protection-group create returns an error indicating the volume type does not support replication.
Cause: The Cinder volume type is missing the replication_enabled='<is> True' property, or the replication_type property is absent or mismatched with the requested --replication-type flag.
Fix: On both sites, inspect the volume type:
openstack volume type show replicated-ssd
Verify the properties include:
replication_enabled : <is> True
replication_type : <in> async
If either property is missing or incorrect, set it:
openstack volume type set replicated-ssd \
--property replication_enabled='<is> True' \
--property replication_type='<in> async'
Repeat on both sites. Then retry the Protection Group creation.
Issue: Protection Group creation fails with "remote site unreachable"
Symptom: Creation is rejected immediately with an error stating the secondary site cannot be reached.
Cause: The Protector service cannot contact the secondary site's protector-api endpoint. Metadata cannot be synchronized, so creation is blocked by design.
Fix: Verify the secondary site's protector-api is running and the endpoint is reachable from the primary site's protector-engine:
openstack protector site validate site-b
Check protector-api status on the secondary site:
systemctl status protector-api
Verify the secondary site's auth URL and region are correctly registered:
openstack protector site show site-b
Once connectivity is restored, retry the create command.
Issue: member-add fails with "volume not using a replication-enabled type"
Symptom: Adding a VM returns an error indicating one or more of its attached volumes uses an ineligible volume type.
Cause: One or more Cinder volumes attached to the target VM were created with a volume type that does not have replication_enabled='<is> True'. Every volume attached to a member VM must use the Protection Group's designated replication-enabled volume type.
Fix: Identify the offending volumes:
openstack server show <instance-uuid> -f json | grep -i volume
openstack volume show <volume-uuid> | grep volume_type
Migrate the volume to the correct type using a Cinder volume retype, or create a new volume of the correct type, copy the data, and reattach it. There is no in-place fix: the volume type of an existing volume cannot be changed to one that requires backend migration without an explicit retype and backend support.
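To make the triage concrete, a throwaway helper like the following (hypothetical names and data) captures the selection logic: any attached volume whose type differs from the PG's designated type must be retyped before member-add can succeed.

```python
# Illustrative helper for the fix above: given the PG's volume type and a
# mapping of attached volume IDs to their current types, list the volumes
# that still need retyping. The data is hypothetical.

def volumes_needing_retype(attached: dict, pg_type: str) -> list:
    return sorted(vid for vid, vtype in attached.items() if vtype != pg_type)

attached = {"vol-1": "replicated-ssd", "vol-2": "standard", "vol-3": "standard"}
print(volumes_needing_retype(attached, "replicated-ssd"))   # ['vol-2', 'vol-3']
```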
Issue: PG modification blocked with "Cannot modify protection group - remote site unreachable"
Symptom: A member-add, member-remove, or policy update returns an error stating the remote site is unreachable and the operation cannot proceed.
Cause: This is intentional behavior. All modifications require a successful metadata sync to the peer site before they are committed, to prevent the two sites from holding divergent metadata.
Fix: Wait for the remote site to recover, then force a sync to confirm both sites are aligned:
openstack protector protection-group sync-status <pg-name>
openstack protector protection-group sync-force <pg-name>
Once the sync status shows IN SYNC, retry the blocked operation.
Issue: Protection Group stuck in error status
Symptom: The PG status field shows error and no DR operations can be initiated.
Cause: A previous DR operation (failover, failback, or test failover) failed mid-execution. The PG is locked in error to prevent further operations on an inconsistent state.
Fix: Identify the failed operation and review its error message:
openstack protector operation list --protection-group <pg-name>
openstack protector operation show <operation-uuid>
Review the error_message and steps_failed fields in the operation record. Resolve the underlying cause (e.g., missing flavor on the secondary site, snapshot not found, storage connectivity issue). For planned failover failures, the service automatically rolls back ā verify the rollback completed (rollback_status: completed in the operation response) before retrying. Contact your storage administrator if the failure involves Pure Storage snapshot or replication errors.
Issue: Consistency Group shows secondary_cg_id as null
Symptom: openstack protector consistency-group show <pg-name> shows secondary_cg_id as None or empty.
Cause: The Cinder Consistency Group creation on the secondary site failed during Protection Group creation, or the secondary site was unreachable at the time. The Protection Group may have been created in a degraded state.
Fix: Check the secondary site's Cinder service:
# On site-b
openstack volume service list
Verify the Cinder backend with the replication-enabled volume type is up. If Cinder is healthy, delete the Protection Group and recreate it once both sites are confirmed reachable and healthy via openstack protector site validate.