Site Recovery for OpenStack
Guide

Protection Group Issues

PG creation failures, Consistency Group errors, volume type compatibility


Overview

This page helps you diagnose and resolve the most common failures that occur when creating or managing Protection Groups (PGs) in Trilio Site Recovery. Because PG creation triggers a chain of interdependent operations — Cinder Consistency Group creation on both sites, Pure Storage Protection Group or Pod provisioning, and metadata synchronization — a failure at any stage lands the PG in an error state and blocks DR readiness. Understanding which layer failed, and why, lets you resolve the issue and restore protection without recreating the PG from scratch.


Prerequisites

Before using this guide, confirm the following:

  • Trilio Site Recovery is deployed on both your primary and secondary OpenStack sites, with protector-api and protector-engine running independently on each.
  • You have OpenStack CLI access to both sites (via clouds.yaml or separate openrc files).
  • Your user has sufficient privileges to inspect Cinder volume types, Consistency Groups, and Protector Protection Groups on both sites.
  • You have access to the protector-api and protector-engine logs on both sites (/var/log/protector/).
  • You know the FlashArray management URLs and API tokens configured in your Replication Policy.
  • For Pure Storage connectivity checks, you have network access from the controller nodes to the FlashArray management interfaces on both sites.
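
If both sites are registered in a single clouds.yaml, a small wrapper keeps two-site checks from silently running against the wrong cloud. A minimal sketch, assuming clouds.yaml entries named site-a and site-b (the function name and cloud names are illustrative):

```shell
# Run any openstack command against a named clouds.yaml entry.
# "site-a"/"site-b" are illustrative cloud names -- substitute your own.
run_on_site() {
  local site="$1"; shift
  # --os-cloud selects the matching clouds.yaml entry for this one invocation
  openstack --os-cloud "$site" "$@"
}

# Example:
#   run_on_site site-a volume type show replicated-ssd
#   run_on_site site-b volume type show replicated-ssd
```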

Baseline diagnostics

Before investigating a specific issue, collect baseline diagnostic output so you can correlate errors across layers.

Step 1: Show the Protection Group status

openstack protector protection-group show <pg-name-or-id>

Note the status field. A value of error means the PG creation or last operation failed. Also record the consistency_group_id.

Step 2: Inspect the Consistency Group

openstack protector protection-group consistency-group show <pg-name-or-id>

Record the status, primary_cg_id, and secondary_cg_id fields. A null secondary_cg_id indicates the secondary-site Cinder CG was never created.

Step 3: Check protector-engine logs on the primary site

tail -n 200 /var/log/protector/protector-engine.log | grep -i error

Step 4: Check protector-engine logs on the secondary site

# SSH to secondary controller, then:
tail -n 200 /var/log/protector/protector-engine.log | grep -i error

Step 5: Verify service health on both sites

# Primary site
systemctl status protector-api protector-engine

# Secondary site (SSH first)
systemctl status protector-api protector-engine

Step 6: Validate registered site connectivity

openstack protector site validate site-a
openstack protector site validate site-b

Keep this output on hand as you work through the issues below.
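
Steps 1 through 6 can be bundled into one collection script so the output lands in a single timestamped file. A sketch, assuming the log path and service names from the prerequisites above; the function name is illustrative:

```shell
# Capture the baseline diagnostics (Steps 1-6) for one PG into a single file.
# Run it on each site; on the secondary, invoke it over SSH.
collect_pg_diag() {
  local pg="$1" out="pg-diag-$(date +%Y%m%d-%H%M%S).txt"
  {
    echo "== protection-group show =="
    openstack protector protection-group show "$pg"
    echo "== consistency-group show =="
    openstack protector protection-group consistency-group show "$pg"
    echo "== engine errors (local) =="
    tail -n 200 /var/log/protector/protector-engine.log | grep -i error
    echo "== service health (local) =="
    systemctl status protector-api protector-engine --no-pager
    echo "== site validation =="
    openstack protector site validate site-a
    openstack protector site validate site-b
  } > "$out" 2>&1
  echo "$out"   # print the report filename
}

# Example: collect_pg_diag prod-web-app
```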


Configuration

The following configuration properties directly affect PG creation behavior. Misconfiguration in any of these is the root cause of most PG failures.

Cinder volume type properties

Every volume attached to a member VM must use a Cinder volume type with all of the following properties set correctly on both sites:

  • replication_enabled: required value '<is> True'. Marks the volume type as eligible for geo-replication. Without this, Protector rejects the volume type at PG creation time.
  • replication_type: required value '<in> async' or '<in> sync'. Must match the replication_type you specify when creating the Protection Group; a mismatch causes PG creation to fail with a compatibility error.
  • volume_backend_name: site-specific, e.g. pure@backend-a. Routes volumes to the correct Pure Storage backend and must reference a backend that actually has replication configured.

Verify these properties on each site:

# Primary site
openstack volume type show replicated-ssd

# Secondary site
openstack volume type show replicated-ssd

Replication Policy fields

The Replication Policy is attached per-PG and controls Pure Storage connectivity. The fields that cause the most failures are:

  • primary_fa_url: HTTPS URL of the primary FlashArray management interface. Common mistake: using HTTP instead of HTTPS, or an unreachable hostname.
  • primary_fa_api_token: API token for the primary FlashArray. Common mistake: an expired or incorrect token.
  • secondary_fa_url: HTTPS URL of the secondary FlashArray management interface. Common mistake: the wrong array URL after a hardware change.
  • secondary_fa_api_token: API token for the secondary FlashArray. Common mistake: a token scoped to the wrong array.
  • pure_pg_name: name of the Protection Group (async) or Pod (sync) on the Pure Storage arrays. Must match an existing, connected PG or Pod on the arrays.
  • replication_interval: seconds between async snapshot cycles. Must be ≥ the minimum interval supported by your FlashArray firmware.

View the current policy:

openstack protector protection-group policy-show <pg-name-or-id>

Secondary site Cinder quotas

Protector creates a Cinder Consistency Group and adds volumes to it on the secondary site during PG creation. If the secondary site tenant has insufficient Cinder quota, CG creation fails. Check and adjust quotas on the secondary site:

# Check current usage
openstack quota show --detail <project-id>

# Increase consistencygroups quota if needed (admin)
openstack quota set --consistencygroups <new-limit> <project-id>
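
To see the problem before PG creation fails, you can compare the consistencygroups limit against the number of CGs the project already has. A sketch using the CLI's machine-readable output; the quota field name and list column name can vary by client release, so treat them as assumptions:

```shell
# Report remaining Consistency Group quota headroom for a project.
# Field name "consistencygroups" and column "ID" may differ by CLI release.
cg_quota_headroom() {
  local project="$1" limit used
  limit=$(openstack quota show "$project" -f value -c consistencygroups)
  used=$(openstack consistency group list -f value -c ID | wc -l)
  echo "limit=$limit used=$used headroom=$((limit - used))"
}
```

Run this against the secondary-site project before creating the PG; a headroom of 0 predicts the CG creation failure covered in the Troubleshooting section.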

Usage

When a Protection Group lands in error state, your workflow is:

  1. Identify which of the four failure categories applies (volume type, secondary-site CG, Pure Storage, or member eligibility).
  2. Fix the underlying condition.
  3. Delete and recreate the PG, or use the force-sync / retry path if the PG record itself is intact.

Protector blocks modifications to a PG when the peer site is unreachable. If you need to fix a PG configuration while the secondary site is down, you must restore secondary-site connectivity first, then force a metadata sync before retrying.

Checking volume type compatibility before creating a PG

Run this check before creating a PG to avoid the most common creation failure:

# On the primary site — list volume types with replication properties
openstack volume type list --long

Confirm that the volume type you plan to use shows replication_enabled='<is> True' and a replication_type matching your intended PG replication mode (async or sync).

Then repeat the same check on the secondary site to confirm the volume type exists there with identical properties.
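
The two-site check can be automated so a property drift is caught before PG creation rather than after. A sketch, assuming clouds.yaml entries named site-a and site-b (illustrative); it compares the raw properties strings and fails on any difference:

```shell
# Compare a volume type's properties between the two sites.
# "site-a"/"site-b" are illustrative clouds.yaml entry names.
compare_volume_type() {
  local vt="$1" a b
  a=$(openstack --os-cloud site-a volume type show "$vt" -f value -c properties)
  b=$(openstack --os-cloud site-b volume type show "$vt" -f value -c properties)
  if [ "$a" = "$b" ]; then
    echo "OK: '$vt' properties match on both sites"
  else
    echo "MISMATCH for '$vt':"
    echo "  site-a: $a"
    echo "  site-b: $b"
    return 1
  fi
}
```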

Retrying after fixing a PG in error state

If the PG record exists but is in error state and you have resolved the underlying condition:

# Check current sync status
openstack protector protection-group sync-status <pg-name-or-id>

# Force metadata re-sync if the peer site was temporarily unreachable
openstack protector protection-group sync-force <pg-name-or-id>

If the PG cannot be recovered in place (for example, the Cinder CG on the secondary site is in a corrupt state), delete the PG and recreate it after resolving the root cause:

openstack protector protection-group delete <pg-name-or-id>
# Then recreate:
openstack protector protection-group create \
  --name <name> \
  --replication-type async \
  --primary-site site-a \
  --secondary-site site-b \
  --volume-type replicated-ssd
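
After recreating the PG, it is worth polling until it settles into active or error instead of assuming success. A sketch; it assumes the protector plugin honors the standard -f value -c formatter flags, which is typical for OpenStack CLI plugins but not confirmed here:

```shell
# Poll a recreated PG until it reaches "active" (success) or "error" (failure).
# Assumes the plugin supports the standard -f value -c formatter flags.
wait_for_active() {
  local pg="$1" tries="${2:-30}" i=0 status
  while [ "$i" -lt "$tries" ]; do
    status=$(openstack protector protection-group show "$pg" -f value -c status)
    echo "attempt $((i + 1)): status=$status"
    case "$status" in
      active) return 0 ;;
      error)  return 1 ;;
    esac
    i=$((i + 1))
    sleep 10
  done
  return 1
}
```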

Examples

Example 1: Diagnosing a PG stuck in error after creation

You create a PG and it transitions to error instead of active.

openstack protector protection-group show prod-web-app

Expected output (truncated):

+------------------------+------------------------------------------+
| Field                  | Value                                    |
+------------------------+------------------------------------------+
| id                     | pg-12345678-1234-1234-1234-123456789abc  |
| name                   | prod-web-app                             |
| status                 | error                                    |
| consistency_group_id   | cg-87654321-4321-4321-4321-87654321abcd  |
| primary_site           | site-a                                   |
| secondary_site         | site-b                                   |
+------------------------+------------------------------------------+

Inspect the Consistency Group:

openstack protector protection-group consistency-group show prod-web-app
+------------------------+------------------------------------------+
| Field                  | Value                                    |
+------------------------+------------------------------------------+
| status                 | error                                    |
| volume_type_name       | standard-ssd                             |
| primary_cg_id          | cinder-cg-uuid-primary                   |
| secondary_cg_id        | None                                     |
+------------------------+------------------------------------------+

The volume_type_name is standard-ssd, not a replication-enabled type. Check the volume type:

openstack volume type show standard-ssd
+--------------------+----------------------------+
| Field              | Value                      |
+--------------------+----------------------------+
| name               | standard-ssd               |
| properties         | volume_backend_name='pure' |
+--------------------+----------------------------+

replication_enabled is absent. The volume type is not eligible. Fix: migrate your VM volumes to a replication-enabled volume type, delete the PG, and recreate it with the correct volume type.


Example 2: Secondary-site Consistency Group creation failure

The primary-site CG was created (primary_cg_id is set) but secondary_cg_id is None and the engine log on the secondary site shows:

ERROR protector.engine.consistency_group: Failed to create consistency group on secondary site: QuotaError: Quota exceeded for resources: ['consistencygroups']

Fix the quota on the secondary site, then delete and recreate the PG:

# On secondary site (admin credentials)
openstack quota set --consistencygroups 20 <project-id>

# Verify
openstack quota show <project-id> | grep consistencygroup

Expected output:

| consistencygroups          | 20    |

Now delete the failed PG and recreate:

openstack protector protection-group delete prod-web-app

openstack protector protection-group create \
  --name prod-web-app \
  --replication-type async \
  --primary-site site-a \
  --secondary-site site-b \
  --volume-type replicated-ssd

Expected output after successful creation:

+------------------------+--------------------------------------+
| Field                  | Value                                |
+------------------------+--------------------------------------+
| status                 | active                               |
| consistency_group_id   | cg-new-uuid                          |
| primary_site           | site-a                               |
| secondary_site         | site-b                               |
+------------------------+--------------------------------------+

Example 3: Attempting to add an ephemeral-only VM

openstack protector protection-group member-add prod-web-app \
  --instance-id ephemeral-vm-uuid

Expected error:

ERROR: Instance 'ephemeral-vm-uuid' cannot be added to a Protection Group.
Reason: VM has no Cinder-backed volumes. Only VMs with at least one
attached Cinder volume are eligible for protection.

Verify the VM's attached volumes:

openstack server show ephemeral-vm-uuid -c volumes_attached
+------------------+-------+
| Field            | Value |
+------------------+-------+
| volumes_attached | []    |
+------------------+-------+

The VM uses only ephemeral storage. To protect this workload, rebuild the VM with a Cinder boot volume (--boot-from-volume) and replication-enabled volume type before adding it to the PG.


Troubleshooting

Each issue below follows the same format: Symptom → Likely cause → Fix.


Issue 1: PG stuck in error after creation — volume type not replication-enabled

Symptom: The Protection Group transitions to error immediately after creation. The Consistency Group status is error. The protector-engine log on the primary site contains a message such as:

Volume type 'standard-ssd' does not have replication_enabled='<is> True'

Likely cause: One or more of the Cinder volumes that would be placed into the Consistency Group uses a volume type that lacks the replication_enabled='<is> True' property. All volumes in a Consistency Group must belong to a volume type that has this property, because Pure Storage replication operates at the volume-type (backend) level.

Fix

  1. Identify which volume type is missing the property:
    openstack volume type list --long
    
  2. Set the missing property on the correct volume type (requires admin):
    openstack volume type set replicated-ssd \
      --property replication_enabled='<is> True' \
      --property replication_type='<in> async'
    
  3. Confirm the fix on both sites — the secondary site must have an identically configured volume type.
  4. Delete the failed PG and recreate it.
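
Step 1 can be narrowed with a filter that prints only the ineligible types, which helps on clouds with many volume types. A sketch; the function name is illustrative, and the property string must match exactly as Cinder stores it:

```shell
# List volume types that do NOT carry replication_enabled='<is> True'.
find_nonreplicated_types() {
  openstack volume type list --long -f value -c Name -c Properties |
    while read -r name props; do
      case "$props" in
        *"replication_enabled='<is> True'"*) ;;   # eligible -> skip
        *) echo "$name" ;;                        # missing the property
      esac
    done
}
```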

Issue 2: Consistency Group creation fails on secondary site

Symptom: The PG record shows status: error. The Consistency Group has a valid primary_cg_id but secondary_cg_id is null. The protector-engine log on the secondary site shows a connectivity error or quota error.

Likely cause A: protector-engine on the secondary site is not running or not reachable. Protector creates the Cinder CG on the secondary site by having the secondary protector-engine call the secondary Cinder API. If the service is stopped or its endpoint is not reachable from the primary site's coordination layer, the CG cannot be created.

Fix A

  1. Check the service on the secondary site:
    systemctl status protector-engine
    
  2. If stopped, start it:
    systemctl start protector-engine
    
  3. Validate the secondary site is reachable from the Protector control plane:
    openstack protector site validate site-b
    
  4. Delete the failed PG and recreate.

Likely cause B: Cinder quota exceeded on the secondary site. The secondary-site project does not have enough consistencygroups quota to accommodate the new CG.

Fix B

  1. Check the quota on the secondary site:
    openstack quota show --detail <project-id>
    
  2. Increase the limit:
    openstack quota set --consistencygroups <new-limit> <project-id>
    
  3. Delete the failed PG and recreate.

Issue 3: Pure Storage Protection Group or Pod creation fails

Symptom: The PG and Cinder CG records are created but the PG transitions to error after the Replication Policy is applied. The protector-engine log contains errors referencing the FlashArray API, such as authentication failures or missing peer connections.

Likely cause A: Invalid FlashArray credentials in the Replication Policy. The primary_fa_api_token or secondary_fa_api_token is expired, incorrect, or scoped to the wrong array.

Fix A

  1. Retrieve a fresh API token from each FlashArray management UI or CLI.
  2. Update the Replication Policy:
    openstack protector protection-group policy-create <pg-name-or-id> \
      --primary-fa-url https://flasharray-a.example.com \
      --primary-fa-token "T-new-token-a" \
      --secondary-fa-url https://flasharray-b.example.com \
      --secondary-fa-token "T-new-token-b" \
      --pure-pg-name "pg-prod-web-app" \
      --replication-interval 300 \
      --rpo-minutes 15
    

Likely cause B: Arrays are not connected (missing peer connection or replication target). For async replication, a replication target (remote array connection) must be configured between FlashArray A and FlashArray B. For sync replication (ActiveCluster), a peer connection and Pod must exist. If these are absent, Pure Storage cannot replicate the Protection Group.

Fix B

  1. Log in to each FlashArray management interface.
  2. For async: verify that a replication target pointing to the remote array is configured and shows connected status.
  3. For sync: verify that the ActiveCluster peer connection is established and that the Pod referenced in pure_pg_name exists on both arrays.
  4. After confirming array connectivity, re-apply the Replication Policy (see Fix A command above).
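
Before logging in to each array, a quick probe from a controller node confirms the management URLs in the policy are even reachable. A sketch using the unauthenticated Purity REST version endpoint; if your firmware exposes a different path, any HTTPS response at all still proves reachability:

```shell
# Probe a FlashArray management URL from a controller node.
# /api/api_version is an unauthenticated Purity REST endpoint on most firmware.
check_fa_reachable() {
  local url="$1" code
  code=$(curl -ks -o /dev/null -w '%{http_code}' --max-time 10 "$url/api/api_version")
  if [ "$code" = "000" ]; then
    echo "UNREACHABLE: $url (no HTTPS response)"
    return 1
  fi
  echo "reachable: $url (HTTP $code)"
}

# Example: check_fa_reachable https://flasharray-a.example.com
```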

Issue 4: Member VM cannot be added to the Protection Group

Symptom: Running openstack protector protection-group member-add returns an error indicating the VM is not eligible for protection.

Likely cause A: VM has no Cinder-backed volumes (ephemeral-only). Protector protects Nova VMs by replicating their Cinder volumes through a Consistency Group. A VM that boots from ephemeral storage and has no attached Cinder volumes has nothing to replicate, so it cannot be added as a PG member.

Fix A: Rebuild or re-deploy the VM to use a Cinder boot volume with a replication-enabled volume type:

openstack server create \
  --flavor m1.large \
  --volume <cinder-boot-volume-uuid> \
  --network <network-uuid> \
  <vm-name>

Ensure the boot volume uses a volume type with replication_enabled='<is> True'.

Likely cause B: VM volumes use a non-replication-enabled volume type. Even if the VM has Cinder volumes, Protector rejects any volume whose type lacks replication_enabled='<is> True'. All volumes belonging to a PG member must use the same replication-enabled volume type as the Protection Group's Consistency Group.

Fix B

  1. Check the volume type of each attached volume:
    openstack volume show <volume-id> -c volume_type
    
  2. Retype the volume to the correct volume type (this may require a Cinder migration):
    openstack volume set --type replicated-ssd --retype-policy on-demand <volume-id>
    
  3. After the retype completes, retry the member-add.
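
Steps 1 and 2 can be combined into a loop that retypes every volume attached to a VM. A sketch; `openstack server volume list` requires a recent python-openstackclient (on older clients, read the IDs from `openstack server show -c volumes_attached` instead), and the "Volume ID" column name is an assumption:

```shell
# Retype every Cinder volume attached to a VM to a replication-enabled type.
# Retype may trigger a data migration -- review the volume list first.
retype_vm_volumes() {
  local vm="$1" vt="$2"
  openstack server volume list "$vm" -f value -c "Volume ID" |
    while read -r vol; do
      echo "retyping $vol -> $vt"
      openstack volume set --type "$vt" --retype-policy on-demand "$vol"
    done
}

# Example: retype_vm_volumes my-vm-uuid replicated-ssd
```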

Issue 5: Protection Group modification blocked — peer site unreachable

Symptom: Any attempt to modify the PG (add/remove members, update policy) returns:

ERROR: Cannot modify protection group - remote site unreachable

Likely cause: Protector enforces strict metadata synchronization; a change to a PG must be written to both sites atomically. If the peer site's protector-api or protector-engine is unreachable, the modification is blocked to prevent metadata divergence.

Fix

  1. Restore connectivity to the peer site.
  2. Verify the peer site services are running:
    systemctl status protector-api protector-engine
    
  3. Validate the site:
    openstack protector site validate <peer-site-name>
    
  4. Once the peer site is reachable, force a metadata sync:
    openstack protector protection-group sync-force <pg-name-or-id>
    
  5. Confirm both sites are in sync:
    openstack protector protection-group sync-status <pg-name-or-id>
    
  6. Retry your modification.
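
Steps 4 and 5 can be wrapped in a polling loop so you only retry the modification once the sites actually converge. A sketch; the "in_sync" marker is an assumption about the sync-status output, so match whatever string your CLI actually prints:

```shell
# Poll sync-status until the PG reports in sync, or give up after N tries.
# The "in_sync" marker is an assumed output string -- adjust to your CLI.
wait_for_sync() {
  local pg="$1" tries="${2:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    if openstack protector protection-group sync-status "$pg" | grep -qi 'in_sync'; then
      echo "PG '$pg' is in sync"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "timed out waiting for '$pg' to sync"
  return 1
}
```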