Site Recovery for OpenStack
Guide

Protection Groups Panel

Creating PGs, managing VM membership, viewing replication status


Overview

The Protection Groups panel is your primary workspace for defining which Nova VMs are protected under Trilio Site Recovery and for monitoring the health of their replication. A Protection Group (PG) is a logical unit — it groups one or more VM instances that must fail over together, binds them to a Cinder Consistency Group for crash-consistent replication, and tracks the lifecycle of every DR operation performed against those workloads. This page explains how to create a Protection Group, add VMs to it, interpret replication status, and manage group membership through both the OSC CLI plugin and the Horizon dashboard.


Prerequisites

Before working with Protection Groups, confirm the following are in place:

  • Two registered OpenStack sites — both the primary site and the secondary (DR) site must be registered with the Protector service and reachable. Verify with openstack protector site list.
  • Sites validated — each site must pass connectivity and capability checks (openstack protector site validate <site-name>).
  • Replication-enabled Cinder volume types exist on both sites. Each volume type must carry replication_enabled='<is> True' and a replication_type='<in> async' or replication_type='<in> sync' property. Volume types with these properties present on both sites are the only ones eligible for Protection Group creation.
  • All VM volumes use a replication-enabled type — any Nova instance you intend to add must have all attached Cinder volumes backed by a qualifying volume type on the same storage backend.
  • Replication policy credentials available — you will need the Pure Storage FlashArray management URLs and API tokens for both arrays before you can validate replication readiness.
  • protectorclient OSC plugin installed on your workstation, authenticated against the primary site, and configured with clouds.yaml entries for both sites.
  • Trilio Site Recovery API version 1.1 or later — confirm with openstack protector --version.
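The checks above can be combined into a single preflight pass. This is a sketch, assuming `clouds.yaml` entries named site-a and site-b as configured later in this guide; the volume-type property strings are the ones required for eligibility, but the `-c properties` column name is an assumption about the OSC output.

```shell
# preflight sketch -- verify both sites and the volume type before creating a PG.
# Site and type names are illustrative.

# Pure helper: does a volume type's property string carry both
# replication properties required for Protection Group eligibility?
type_is_eligible() {
  case "$1" in
    *"replication_enabled='<is> True'"*)
      case "$1" in
        *"replication_type="*) echo "eligible" ;;
        *) echo "ineligible" ;;
      esac ;;
    *) echo "ineligible" ;;
  esac
}

preflight() {
  local site_a="$1" site_b="$2" vtype="$3"
  openstack protector site validate "$site_a" || return 1
  openstack protector site validate "$site_b" || return 1
  for cloud in "$site_a" "$site_b"; do
    props=$(openstack --os-cloud "$cloud" volume type show "$vtype" -f value -c properties)
    [ "$(type_is_eligible "$props")" = "eligible" ] || {
      echo "volume type $vtype is not replication-ready on $cloud" >&2; return 1; }
  done
  echo "preflight passed"
}
# Usage: preflight site-a site-b replicated-ssd
```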

Installation

The Protection Groups panel itself requires no separate installation — it is part of the protectorclient OSC plugin and the Protector Horizon extension deployed during the standard Trilio Site Recovery installation. If the panel is missing, verify the plugin is installed and the API endpoint is registered.

Step 1 — Confirm the OSC plugin is installed

pip show python-protectorclient

The command should return package metadata. If it returns nothing, install the client:

pip install python-protectorclient

Step 2 — Confirm the Protector API endpoint is registered in Keystone

openstack endpoint list --service protector

Expected output includes public, internal, and admin endpoints at port 8788.

Step 3 — Confirm both Protector services are running on each site

On each controller node:

systemctl status protector-api
systemctl status protector-engine

Both services must be active (running). If either is stopped:

systemctl start protector-api protector-engine

Step 4 — Configure multi-site credentials in clouds.yaml

Ensure ~/.config/openstack/clouds.yaml contains entries for both sites:

clouds:
  site-a:
    auth:
      auth_url: http://site-a-controller:5000/v3
      project_name: <your-project>
      username: <your-user>
      password: <your-password>
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne

  site-b:
    auth:
      auth_url: http://site-b-controller:5000/v3
      project_name: <your-project>
      username: <your-user>
      password: <your-password>
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne

Step 5 — Verify Protection Groups panel access

openstack protector protection-group list

An empty list (not an error) confirms the panel is accessible.


Configuration

Protection Group behavior is shaped by the parameters set at creation time and by the replication policy attached afterward. The following options are fixed at creation; changing them later requires deleting and recreating the group.

Protection Group creation parameters

| Parameter | Required | Valid values | Effect |
| --- | --- | --- | --- |
| --name | Yes | Any string, unique per tenant | Human-readable identifier for the group |
| --description | No | Any string | Informational label; stored in metadata |
| --primary-site | Yes | Registered site name | The site where VMs currently run. This designation is workload-relative and swaps on failover. |
| --secondary-site | Yes | Registered site name (different from primary) | The DR site to which workloads will be promoted |
| --replication-type | Yes | async, sync | Replication mode for the underlying Pure Storage Protection Group: async uses periodic snapshots (configurable RPO); sync uses ActiveCluster Pods with zero RPO but requires quiescence on writes. |
| --volume-type | Yes | Cinder volume type name or ID | Must carry replication_enabled='<is> True' and a matching replication_type property on both sites. All volumes added to the group must use this type. |

Automatic side-effects of creation

When you create a Protection Group, the service automatically:

  1. Creates a Cinder Consistency Group on the primary site.
  2. Creates a matching Cinder Consistency Group on the secondary site.
  3. Creates a metadata record (version 1) and synchronizes it to the secondary site.
  4. Establishes the 1:1:1 binding between the Protection Group, the Consistency Group, and the Pure Storage Protection Group.

You cannot override this automation; the 1:1:1 binding is structural.

Metadata synchronization behavior

Every modification to a Protection Group (adding members, updating description, policy changes) increments a version number and immediately attempts to push the updated metadata to the peer site. If the peer site is unreachable, the modification is blocked. This is intentional: the service prevents metadata divergence because a diverged secondary cannot execute a reliable failover. You must wait for the peer to become reachable or use openstack protector protection-group sync-force once connectivity is restored.
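The blocked-modification behavior can be handled with a small guard before any membership change. This is a sketch: the single-word state parsing and the `-c sync_state` column name are assumptions about the `sync-status` output, so adjust to what the command actually emits.

```shell
# Sketch of the blocked-modification recovery flow described above.
can_modify() {
  # Pure helper: only a SYNCED group should accept membership changes.
  [ "$1" = "SYNCED" ] && echo "yes" || echo "no"
}

guarded_member_add() {
  local pg="$1" instance="$2"
  state=$(openstack protector protection-group sync-status "$pg" -f value -c sync_state)
  if [ "$(can_modify "$state")" != "yes" ]; then
    echo "peer out of sync ($state); forcing sync first" >&2
    openstack protector protection-group sync-force "$pg" || return 1
  fi
  openstack protector protection-group member-add "$pg" --instance-id "$instance"
}
# Usage: guarded_member_add prod-web-app <nova-instance-uuid>
```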

Replication policy parameters

The replication policy is a separate object attached to the group after creation. It stores the Pure Storage FlashArray credentials and RPO target.

| Parameter | Required | Effect |
| --- | --- | --- |
| --primary-fa-url | Yes | HTTPS management URL of the primary FlashArray |
| --primary-fa-token | Yes | API token for the primary FlashArray (stored encrypted) |
| --secondary-fa-url | Yes | HTTPS management URL of the secondary FlashArray |
| --secondary-fa-token | Yes | API token for the secondary FlashArray (stored encrypted) |
| --pure-pg-name | Yes | Name of the Pure Storage Protection Group; must match the name configured on the array |
| --replication-interval | Async only | Snapshot replication interval in seconds (e.g., 300 for 5 minutes) |
| --rpo-minutes | Async only | Recovery Point Objective in minutes; used for replication readiness validation |

Security note: FlashArray API tokens are encrypted at rest. Use Barbican or an equivalent secrets manager in production deployments rather than passing tokens directly on the command line.
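One way to follow that advice is to store each token as a Barbican secret and fetch the payload only at policy-creation time. This is a sketch: the secret names and href plumbing are illustrative, and it assumes python-barbicanclient's `openstack secret store` / `openstack secret get` commands are available.

```shell
# Sketch: source FlashArray tokens from Barbican instead of the command line.
redact() {
  # Pure helper: log only a token prefix, never the full secret.
  printf '%s***\n' "$(printf '%s' "$1" | cut -c1-10)"
}

attach_policy_from_barbican() {
  local pg="$1" href_a="$2" href_b="$3"
  tok_a=$(openstack secret get "$href_a" --payload -f value -c Payload)
  tok_b=$(openstack secret get "$href_b" --payload -f value -c Payload)
  echo "using primary token $(redact "$tok_a")"
  openstack protector protection-group policy-create "$pg" \
    --primary-fa-url https://flasharray-a.example.com \
    --primary-fa-token "$tok_a" \
    --secondary-fa-url https://flasharray-b.example.com \
    --secondary-fa-token "$tok_b" \
    --pure-pg-name "pg-$pg" \
    --replication-interval 300 \
    --rpo-minutes 15
}
# Store each token once, e.g.:
#   openstack secret store --name fa-a-token --payload "$TOKEN"
```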


Usage

Creating a Protection Group

Create the group from the site where the VMs currently reside. The OSC plugin authenticates to both sites automatically using your clouds.yaml.

openstack protector protection-group create \
  --name prod-web-app \
  --description "Production web application tier" \
  --replication-type async \
  --primary-site site-a \
  --secondary-site site-b \
  --volume-type replicated-ssd

Wait for status to transition from creating to active before proceeding. Poll with:

openstack protector protection-group show prod-web-app
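The polling can be scripted so provisioning waits for a terminal state instead of re-running `show` by hand. A sketch, assuming the `-c status` column name from the show output and treating active/error as the only terminal states:

```shell
# Poll sketch: wait for the group to leave 'creating'.
is_terminal() {
  # Pure helper: creation is done once status is active or error.
  case "$1" in
    active|error) echo "yes" ;;
    *) echo "no" ;;
  esac
}

wait_for_active() {
  local pg="$1" tries="${2:-30}"
  for _ in $(seq "$tries"); do
    status=$(openstack protector protection-group show "$pg" -f value -c status)
    [ "$(is_terminal "$status")" = "yes" ] && { echo "$status"; return; }
    sleep 10
  done
  echo "timeout waiting for $pg" >&2
  return 1
}
# Usage: wait_for_active prod-web-app
```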

Attaching a replication policy

A Protection Group without a policy cannot validate replication readiness or execute failover. Attach a policy immediately after the group reaches active:

openstack protector protection-group policy-create prod-web-app \
  --primary-fa-url https://flasharray-a.example.com \
  --primary-fa-token "T-12345678-abcd-efgh-ijkl-mnopqrstuvwx" \
  --secondary-fa-url https://flasharray-b.example.com \
  --secondary-fa-token "T-87654321-dcba-hgfe-lkji-xwvutsrqponm" \
  --pure-pg-name "pg-prod-web-app" \
  --replication-interval 300 \
  --rpo-minutes 15

Adding VMs to the group

Add each VM by Nova instance ID. The service discovers all attached Cinder volumes, validates they use the group's volume type, and adds them to the Consistency Group automatically. Every addition triggers a metadata sync to the secondary site.

openstack protector protection-group member-add prod-web-app \
  --instance-id <nova-instance-uuid>

Repeat for each VM that must fail over as part of this group. Volumes belonging to VMs with mixed volume types (some replicated, some not) will cause the member-add to fail — all volumes for a given VM must use the group's volume type.
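The repetition can be wrapped in a loop that stops at the first failure, since each successful addition has already triggered a metadata sync. A minimal sketch:

```shell
# Batch sketch: add several instances to one Protection Group.
add_members() {
  local pg="$1"; shift
  for id in "$@"; do
    openstack protector protection-group member-add "$pg" --instance-id "$id" \
      || { echo "member-add failed for $id; fix volume types and retry" >&2; return 1; }
  done
}
# Usage: add_members prod-web-app <uuid-1> <uuid-2> <uuid-3>
```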

Listing members

openstack protector protection-group member-list prod-web-app

Each member shows its status field. Healthy members show protected. A member in error state requires investigation before failover.

Viewing replication status

# Protection Group-level status
openstack protector protection-group show prod-web-app

# Consistency Group and volume-level replication detail
openstack protector consistency-group show prod-web-app

# Metadata sync status between sites
openstack protector protection-group sync-status prod-web-app

The sync-status output reports the local metadata version, the remote metadata version, and whether the two sites are SYNCED, FAILED, or UNREACHABLE. Both versions must match before you execute a failover.
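That version check can be automated as a gate before any DR operation. A sketch, in which the `local_version` / `remote_version` column names are assumptions about the machine-readable `sync-status` output:

```shell
# Pre-failover gate sketch: both metadata versions must match.
versions_match() {
  # Pure helper: empty or differing versions mean the sites diverged.
  [ -n "$1" ] && [ "$1" = "$2" ] && echo "SYNCED" || echo "DIVERGED"
}

failover_gate() {
  local pg="$1"
  lv=$(openstack protector protection-group sync-status "$pg" -f value -c local_version)
  rv=$(openstack protector protection-group sync-status "$pg" -f value -c remote_version)
  [ "$(versions_match "$lv" "$rv")" = "SYNCED" ] \
    || { echo "metadata diverged ($lv vs $rv); run sync-force first" >&2; return 1; }
}
```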

Removing a VM from the group

Removing a member detaches its volumes from the Consistency Group and syncs the updated metadata to the secondary site. The VM itself is not affected — only its DR coverage is removed.

openstack protector protection-group member-remove prod-web-app \
  --member-id <member-uuid>

Deleting a Protection Group

Deleting a Protection Group cascades to the Consistency Group and all member records on both sites. VMs are not deleted — only the DR configuration is removed.

openstack protector protection-group delete prod-web-app

Deletion is blocked if a DR operation is currently in progress (failing_over, failing_back). Wait for the operation to complete or reach a terminal state first.
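A client-side guard that mirrors this server-side block can fail fast instead of waiting for the API to reject the request. A sketch, assuming the `-c status` column name and the failing_over / failing_back states named above:

```shell
# Safe-delete sketch: refuse to delete while a DR operation is running.
is_deletable() {
  # Pure helper: in-progress DR states block deletion.
  case "$1" in
    failing_over|failing_back) echo "no" ;;
    *) echo "yes" ;;
  esac
}

safe_delete() {
  local pg="$1"
  status=$(openstack protector protection-group show "$pg" -f value -c status)
  [ "$(is_deletable "$status")" = "yes" ] \
    || { echo "DR operation in progress ($status); not deleting" >&2; return 1; }
  openstack protector protection-group delete "$pg"
}
```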


Examples

Example 1 — Create a Protection Group for an async-replicated application tier

This example creates a group for a three-VM web tier with a 5-minute RPO.

openstack protector protection-group create \
  --name prod-web-app \
  --description "Production web tier - async replication" \
  --replication-type async \
  --primary-site site-a \
  --secondary-site site-b \
  --volume-type replicated-ssd

Expected output:

+------------------------+----------------------------------------------+
| Field                  | Value                                        |
+------------------------+----------------------------------------------+
| id                     | pg-12345678-1234-1234-1234-123456789abc      |
| name                   | prod-web-app                                 |
| status                 | creating                                     |
| replication_type       | async                                        |
| primary_site           | site-a                                       |
| secondary_site         | site-b                                       |
| consistency_group_id   | cg-87654321-4321-4321-4321-876543210def      |
| failover_count         | 0                                            |
| created_at             | 2025-01-15T09:00:00Z                         |
+------------------------+----------------------------------------------+

Poll until status shows active before adding members.


Example 2 — Add three VMs to the group

# Add web server 1
openstack protector protection-group member-add prod-web-app \
  --instance-id a1b2c3d4-1111-2222-3333-aabbccddeeff

# Add web server 2
openstack protector protection-group member-add prod-web-app \
  --instance-id b2c3d4e5-1111-2222-3333-bbccddeeffaa

# Add database server
openstack protector protection-group member-add prod-web-app \
  --instance-id c3d4e5f6-1111-2222-3333-ccddeeffaabb

Expected output for each member-add:

+------------------------+----------------------------------------------+
| Field                  | Value                                        |
+------------------------+----------------------------------------------+
| id                     | member-aaaabbbb-1234-5678-90ab-ccddeeff0011  |
| instance_id            | a1b2c3d4-1111-2222-3333-aabbccddeeff         |
| instance_name          | web-server-1                                 |
| status                 | protected                                    |
| volumes_added          | 2                                            |
+------------------------+----------------------------------------------+

Example 3 — List members and verify their protection status

openstack protector protection-group member-list prod-web-app

Expected output:

+------------------+---------------+---------------+-----------+----------------+
| ID               | Instance Name | Instance ID   | Status    | Volumes        |
+------------------+---------------+---------------+-----------+----------------+
| member-aaaa...   | web-server-1  | a1b2c3d4-...  | protected | 2              |
| member-bbbb...   | web-server-2  | b2c3d4e5-...  | protected | 2              |
| member-cccc...   | db-server-1   | c3d4e5f6-...  | protected | 3              |
+------------------+---------------+---------------+-----------+----------------+

All members must show protected before you execute a failover or DR drill.
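A drill-readiness check can enforce this mechanically. A sketch, assuming `member-list` can emit one status per line via `-f value -c Status`:

```shell
# Readiness sketch: every member must report 'protected'.
all_protected() {
  # Pure helper: reads statuses on stdin, fails on any non-protected member.
  while read -r s; do
    [ "$s" = "protected" ] || { echo "not ready: $s"; return 1; }
  done
  echo "ready"
}

check_group() {
  openstack protector protection-group member-list "$1" -f value -c Status | all_protected
}
# Usage: check_group prod-web-app
```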


Example 4 — Check metadata sync status between sites

openstack protector protection-group sync-status prod-web-app

Expected output when fully synchronized:

Sync Status: ✅ IN SYNC

Local Metadata:
  Version: 4
  Current Site: site-a
  Last Modified: 2025-01-15T09:15:00Z

Remote Sync:
  Status: SYNCED
  Remote Version: 4
  Last Sync: 2025-01-15T09:15:05Z (5 seconds ago)

Validation:
  ✅ Versions match (4 = 4)
  ✅ Sync status is 'synced'
  ✅ Last sync is recent

Both sites have identical metadata.

Example 5 — Force a metadata sync after a peer site outage

If site-b was briefly unreachable and the local version has advanced, use sync-force once the peer recovers:

openstack protector protection-group sync-force prod-web-app

Expected output:

Force Sync Initiated...

Checking remote site connectivity...
  ✅ site-b is reachable

Syncing metadata (version 5)...
  Gathering current metadata... ✓
  Calculating checksum... ✓
  Pushing to site-b... ✓

Remote Site Response:
  Status: success
  Version: 5
  Duration: 450ms

✅ Sync completed successfully
Both sites now at version 5

Example 6 — View Consistency Group volume membership

openstack protector consistency-group show prod-web-app

Expected output:

+------------------------+----------------------------------------------+
| Field                  | Value                                        |
+------------------------+----------------------------------------------+
| id                     | cg-87654321-4321-4321-4321-876543210def      |
| protection_group_id    | pg-12345678-1234-1234-1234-123456789abc      |
| volume_type_name       | replicated-ssd                               |
| backend_name           | pure@backend-a                               |
| primary_cg_id          | <cinder-cg-uuid-site-a>                      |
| secondary_cg_id        | <cinder-cg-uuid-site-b>                      |
| status                 | active                                       |
| volume_count           | 7                                            |
+------------------------+----------------------------------------------+

The secondary_cg_id field being populated confirms that the secondary site Consistency Group was created successfully during Protection Group initialization.


Troubleshooting

Issue: protection-group create fails with "volume type not eligible for replication"

Symptom: The create command exits immediately with an error referencing the volume type.

Likely cause: The specified Cinder volume type is missing replication_enabled='<is> True' or replication_type properties on one or both sites.

Fix:

  1. On each site, inspect the volume type: openstack volume type show replicated-ssd
  2. Confirm both properties are present:
    openstack volume type set replicated-ssd \
      --property replication_enabled='<is> True' \
      --property replication_type='<in> async'
    
  3. Repeat on the secondary site, then retry the Protection Group creation.

Issue: member-add fails with "volume type mismatch"

Symptom: Adding a VM returns an error stating that one or more of its volumes do not match the group's volume type.

Likely cause: The VM has volumes backed by a non-replication-enabled volume type (e.g., a local SSD type or an ephemeral-backed root disk). All Cinder volumes attached to a VM must use the Protection Group's volume type.

Fix:

  1. List the VM's volumes: openstack server show <instance-id> and inspect the volumes_attached field.
  2. For each volume, check its type: openstack volume show <volume-id>.
  3. Migrate non-conforming volumes to the replication-enabled type, or exclude VMs whose storage cannot be migrated.
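The migration in step 3 can often be done with a Cinder retype; whether a live retype across backends succeeds depends on the storage driver, so detaching the volume first is the conservative path. A hedged sketch:

```shell
# Retype sketch: move a non-conforming volume to the replication-enabled
# type, allowing migration between backends if Cinder requires it.
retype_volume() {
  local vol="$1"
  openstack volume set "$vol" --type replicated-ssd --retype-policy on-demand
  # Confirm the new type before re-adding the VM to the group:
  openstack volume show "$vol" -f value -c type
}
# Usage: retype_volume <volume-id>
```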

Issue: Member shows status: error in member-list

Symptom: One or more members display error instead of protected after being added.

Likely cause: Volume addition to the Cinder Consistency Group failed, typically because the volume is on a different storage backend than the other volumes in the group (all volumes must share the same backend), or the backend reported a capacity or capability error.

Fix:

  1. Show the member detail and check the error message: openstack protector protection-group member-show <pg> <member-id>.
  2. Verify the volume's backend: openstack volume show <volume-id> — look for os-vol-host-attr:host.
  3. All volumes in the Consistency Group must share the same backend value (e.g., pure@backend-a). If the volume is on a different backend, it cannot be in this group.
  4. Remove the problematic member, resolve the storage placement, and re-add.

Issue: member-add or member-remove is blocked with "remote site unreachable"

Symptom: Modifications to the Protection Group are rejected even though your local site is healthy.

Likely cause: The service requires both sites to be reachable before committing any metadata change. This prevents the two sites from diverging into inconsistent states. If the peer site is down, modifications are intentionally blocked.

Fix:

  1. Check peer site reachability: openstack protector site validate site-b.
  2. If the peer site is temporarily offline, wait for it to recover.
  3. Once it recovers, check sync status: openstack protector protection-group sync-status <pg>.
  4. If versions differ, force a sync: openstack protector protection-group sync-force <pg>.
  5. Retry your membership change.

Issue: sync-status shows FAILED or OUT OF SYNC after a completed operation

Symptom: The sync-status command reports a version mismatch or a failed sync timestamp from a recent operation.

Likely cause: A transient network interruption occurred between sites during a metadata push. The local site completed the operation but the remote confirmation was not received.

Fix:

  1. Verify the peer site is now reachable: openstack protector site validate <site>.
  2. Review the sync history to understand what changed: openstack protector protection-group sync-log <pg> --limit 10.
  3. If the peer is reachable and the local version is higher, push the current state: openstack protector protection-group sync-force <pg>.
  4. Confirm both versions match before executing any DR operation.

Issue: Protection Group status is stuck in failing_over or failing_back

Symptom: A DR operation started but has not completed or failed. The Protection Group status has not moved for an extended period.

Likely cause: The protector-engine service may have crashed mid-operation, or a step in the workflow (e.g., volume promotion or VM recreation on the target site) encountered an unrecoverable error that did not transition the operation to failed.

Fix:

  1. Check the engine service on both sites: systemctl status protector-engine.
  2. Review engine logs for the specific operation: journalctl -u protector-engine --since "30 minutes ago".
  3. Retrieve the operation detail: openstack protector operation show <op-id>.
  4. If the engine is running but the operation is genuinely stuck, contact support — do not manually delete the Protection Group record while an operation is in an indeterminate state, as this can leave orphaned volumes on the secondary site.