Storage Driver Model
Pluggable StorageDriver interface, PureStorageDriver, MockStorageDriver, and the 1:1:1 PG-CG-FlashArray PG mapping
The storage driver model defines how Trilio Site Recovery's protector-engine communicates with backend storage arrays to create, snapshot, and promote replicated volumes during DR operations. The model is pluggable: a StorageDriver base interface abstracts all array-level operations, with PureStorageDriver providing the production implementation against Pure Storage FlashArray and MockStorageDriver providing a full-fidelity simulation backed by SQLite. Central to the model is the strict 1:1:1 mapping between a Trilio Protection Group, a Cinder Consistency Group, and a Pure Storage Protection Group (or Pod for sync replication). Understanding this mapping is essential before configuring replication policies or troubleshooting failover behaviour.
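The pluggable interface can be pictured as a small abstract base class with one concrete driver plugged in. This is an illustrative sketch only: the method names (create_protection_group, create_pg_snapshot, promote_secondary) and the InMemoryDriver stand-in are hypothetical, not the actual protector.engine.storage API.

```python
from abc import ABC, abstractmethod

class StorageDriver(ABC):
    """Array-agnostic operations the engine needs (hypothetical sketch)."""

    @abstractmethod
    def create_protection_group(self, pg_name: str) -> None:
        """Create the array-side Protection Group (or Pod for sync)."""

    @abstractmethod
    def create_pg_snapshot(self, pg_name: str) -> str:
        """Take a crash-consistent snapshot; return its identifier."""

    @abstractmethod
    def promote_secondary(self, pg_name: str) -> None:
        """Make the replicated volumes writable on the secondary array."""

class InMemoryDriver(StorageDriver):
    """Minimal stand-in showing how a concrete driver plugs in."""

    def __init__(self):
        self.groups = {}  # pg_name -> list of snapshot ids

    def create_protection_group(self, pg_name):
        self.groups[pg_name] = []

    def create_pg_snapshot(self, pg_name):
        snap_id = f"{pg_name}.snap-{len(self.groups[pg_name])}"
        self.groups[pg_name].append(snap_id)
        return snap_id

    def promote_secondary(self, pg_name):
        pass  # nothing to promote in a toy driver
```

Both shipped drivers implement the same contract, which is why the CLI workflows later in this document are identical regardless of backend.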
Before working with the storage driver model, ensure the following are in place:
- Two independent OpenStack clouds – each with its own Nova, Cinder, Neutron, and Keystone endpoints. The protector-api and protector-engine services must be running independently on each site.
- Cinder volume types with replication_enabled='<is> True' and replication_type='<in> async' (or '<in> sync') configured on both sites before any Protection Group is created.
- Pure Storage FlashArray (production path): an async replication connection already established between the two arrays, and API access tokens available for both arrays.
- Mock driver (lab/CI path): no physical arrays required – the MockStorageDriver bundles its own SQLite backing store. Python 3.8+ and a local MariaDB/MySQL instance for the protector database are still required.
- The protectorclient OSC CLI plugin installed and a valid clouds.yaml referencing both sites.
- The protector-engine service on each site must be able to reach the Pure Storage management IP of both arrays (for the PureStorageDriver) or have filesystem write access for the SQLite file (for the MockStorageDriver).
The storage driver is selected by configuration, not by a separate package install. Both PureStorageDriver and MockStorageDriver ship as part of the core openstack-protector package.
Step 1 – Install the protector package on both sites
git clone https://github.com/your-org/openstack-protector.git
cd openstack-protector
pip install -r requirements.txt
python setup.py install
Repeat on the controller node of the secondary site.
Step 2 – Verify the driver modules are present
After installation, confirm the engine storage modules are importable:
python -c "from protector.engine.storage import pure; print('PureStorageDriver OK')"
python -c "from protector.engine.storage import mock; print('MockStorageDriver OK')"
Both commands should print the confirmation string with no traceback.
Step 3 – Confirm Cinder volume types exist on both sites
Run the following on each site before proceeding to configuration. Substitute your cloud names from clouds.yaml:
# On the primary site
openstack --os-cloud site-a volume type list --long
# On the secondary site
openstack --os-cloud site-b volume type list --long
Look for replication_enabled='<is> True' in the properties column. If the volume type is absent, create it now (see the Configuration section for the exact extra-specs required).
Step 4 – Initialize the protector database (if not already done)
protector-manage db sync
This step is the same regardless of which storage driver you choose.
Storage driver selection and per-driver options are set in /etc/protector/protector.conf on each site independently.
Driver selection
| Option | Section | Default | Valid values | Effect |
|---|---|---|---|---|
| storage_driver | [engine] | pure | pure, mock | Selects the active driver class loaded by protector-engine at startup |
[engine]
storage_driver = pure # or: mock
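At startup the engine resolves that option to a driver class. A minimal sketch of the lookup, assuming a simple registry (the names here are hypothetical; the real loader may use entry points or direct imports):

```python
# Hypothetical sketch of config-driven driver selection; the real
# protector-engine loader may differ in mechanism but not in effect.
DRIVER_REGISTRY = {
    "pure": "PureStorageDriver",
    "mock": "MockStorageDriver",
}

def resolve_driver(storage_driver: str) -> str:
    """Map the [engine] storage_driver option to a driver class name."""
    try:
        return DRIVER_REGISTRY[storage_driver]
    except KeyError:
        valid = ", ".join(sorted(DRIVER_REGISTRY))
        raise ValueError(
            f"unknown storage_driver {storage_driver!r}; valid values: {valid}"
        )
```

An unrecognized value fails fast at startup rather than surfacing later during a DR operation.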
PureStorageDriver options
These values are stored in the replication_policies table and set per Protection Group via the CLI (see the Usage section). They are not global protector.conf keys β each Protection Group carries its own array credentials.
| Field | Where set | Effect |
|---|---|---|
| primary_fa_url | Replication policy | HTTPS management URL of the primary FlashArray |
| primary_fa_api_token | Replication policy | API token for the primary array (stored encrypted) |
| secondary_fa_url | Replication policy | HTTPS management URL of the secondary FlashArray |
| secondary_fa_api_token | Replication policy | API token for the secondary array (stored encrypted) |
| pure_pg_name | Replication policy | Name of the Pure Storage Protection Group (must exist on the primary array before the first sync) |
| replication_interval | Replication policy | Snapshot interval in seconds (async only; ignored for sync) |
| rpo_minutes | Replication policy | Recovery Point Objective in minutes; used during validation to warn if the latest snapshot exceeds this age |
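The rpo_minutes check in the last row reduces to a simple age comparison. A sketch, assuming validation compares the newest snapshot timestamp against the configured RPO (function name hypothetical):

```python
from datetime import datetime, timedelta, timezone

def rpo_violated(last_snapshot_at, rpo_minutes, now=None):
    """Return True if the newest replicated snapshot is older than the RPO.
    Illustrative only; the real validation logic lives in protector-engine."""
    now = now or datetime.now(timezone.utc)
    return (now - last_snapshot_at) > timedelta(minutes=rpo_minutes)
```

This is the comparison behind the "snapshot age exceeds RPO threshold" warning discussed in Troubleshooting.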
Cinder volume type extra-specs (required on both sites)
Without these properties the protector-engine will refuse to add volumes to a Consistency Group:
# On each site
openstack volume type set <your-type-name> \
--property replication_enabled='<is> True' \
--property replication_type='<in> async' # or '<in> sync'
For sync replication the driver maps to Pure Storage ActiveCluster Pods rather than Protection Groups. Set replication_type='<in> sync' consistently on both sites.
MockStorageDriver options
The mock driver requires no array credentials. It reads a single optional key:
| Option | Section | Default | Effect |
|---|---|---|---|
| mock_db_path | [engine] | :memory: | Path to the SQLite file used to persist simulated array state across restarts. Use :memory: for ephemeral test runs. |
[engine]
storage_driver = mock
mock_db_path = /var/lib/protector/mock_array.db
With mock_db_path set to a file path, simulated replication state (Protection Groups, snapshots, volumes) survives a protector-engine restart, which is useful for multi-session DR drills.
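The persistence difference between :memory: and a file path is plain SQLite behaviour and is easy to demonstrate (the pg_snapshots table below is made up for illustration, not the mock driver's real schema):

```python
import os
import sqlite3
import tempfile

def write_state(db_path):
    """First 'engine session': record one simulated snapshot."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS pg_snapshots (name TEXT)")
        conn.execute("INSERT INTO pg_snapshots VALUES ('pg-drill.snap-1')")

def read_state(db_path):
    """Second 'session': reconnect and read whatever survived."""
    with sqlite3.connect(db_path) as conn:
        try:
            rows = conn.execute("SELECT name FROM pg_snapshots").fetchall()
        except sqlite3.OperationalError:  # no table: fresh database
            return []
    return [name for (name,) in rows]

path = os.path.join(tempfile.mkdtemp(), "mock_array.db")
write_state(path)
survived = read_state(path)       # file-backed state survives a "restart"
lost = read_state(":memory:")     # an in-memory database starts empty
```

Each connection to :memory: is a brand-new database, which is exactly why the default loses drill state across engine restarts.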
The 1:1:1 PG-CG-FlashArray PG mapping explained
Every Trilio Protection Group maps to exactly one Cinder Consistency Group and exactly one Pure Storage Protection Group (or Pod). This is enforced at creation time – you cannot attach an existing Cinder CG or an existing Pure PG to a new Protection Group; both are created automatically and owned exclusively by the Trilio Protection Group.
- Cinder Consistency Group – ensures that all volume snapshots taken during a DR operation are crash-consistent across every volume belonging to a VM member.
- Pure Storage Protection Group – the unit of replication on the array. All volumes that Cinder places on the backend for this Consistency Group are added to this Pure PG. Trilio does not mix volumes from different Trilio Protection Groups into the same Pure PG.
- Pure Storage Pod (sync replication only) – replaces the Protection Group concept for zero-RPO workloads, using ActiveCluster stretched volumes. The 1:1 relationship holds in the same way.
The replication_policies.pure_pg_name field records the canonical name of the Pure PG. Both sites reference this same name because Pure replication carries the PG name through to the secondary array.
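The exclusive-ownership rule can be sketched as a small registry that derives both names from the Trilio Protection Group and refuses reuse. The data structures here are hypothetical; the real engine persists this state in the consistency_groups and replication_policies tables:

```python
class MappingRegistry:
    """Toy model of the 1:1:1 rule: each Trilio PG exclusively owns one
    Cinder CG and one Pure PG, both created together with it."""

    def __init__(self):
        self._owned = {}             # trilio_pg -> (cinder_cg, pure_pg)
        self._claimed_pure_pgs = set()

    def create(self, trilio_pg):
        if trilio_pg in self._owned:
            raise ValueError(f"Protection Group {trilio_pg!r} already exists")
        cinder_cg = f"cg-{trilio_pg}"    # created, never attached
        pure_pg = f"pg-{trilio_pg}"      # created, never attached
        if pure_pg in self._claimed_pure_pgs:
            raise ValueError(f"Pure PG {pure_pg!r} is already owned")
        self._claimed_pure_pgs.add(pure_pg)
        self._owned[trilio_pg] = (cinder_cg, pure_pg)
        return cinder_cg, pure_pg
```

The key property is that both array-side objects are outputs of creation, never inputs, which is what makes the mapping strictly 1:1:1.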
Choosing a driver
Use PureStorageDriver (storage_driver = pure) in any environment where physical Pure Storage FlashArrays are present and replication has been pre-configured between the two arrays.
Use MockStorageDriver (storage_driver = mock) when you want to:
- Run end-to-end DR workflow tests in a lab without physical arrays.
- Develop or validate new DR workflows in CI pipelines.
- Train operators on failover and failback procedures before going to production.
The mock driver is a full-fidelity simulation: it exercises the same Protection Group creation, snapshot, volume-from-snapshot, and promotion code paths as the Pure driver, so operational behaviour is identical from the Trilio perspective.
Creating a Protection Group and its storage objects
Creating a Protection Group automatically triggers the driver to create the corresponding Cinder Consistency Groups on both sites; the Pure Storage Protection Group name is then registered by the subsequent replication policy:
# 1. Create the Protection Group
openstack protector protection-group create \
--name prod-web-app \
--description "Production web application" \
--replication-type async \
--primary-site site-a \
--secondary-site site-b \
--volume-type replicated-ssd
# 2. Attach the replication policy (links Trilio PG to the Pure PG)
openstack protector protection-group policy-create prod-web-app \
--primary-fa-url https://flasharray-a.example.com \
--primary-fa-token "T-12345678-abcd-..." \
--secondary-fa-url https://flasharray-b.example.com \
--secondary-fa-token "T-87654321-dcba-..." \
--pure-pg-name "pg-prod-web-app" \
--replication-interval 300 \
--rpo-minutes 15
After step 1, protector-engine has created Cinder CGs on both sites and stored their IDs in consistency_groups.primary_cg_id and consistency_groups.secondary_cg_id. After step 2, the engine knows which Pure PG to snapshot and promote during failover.
Adding VMs (and their volumes) to the Protection Group
When you add a VM, the driver validates that every attached volume belongs to the replication-enabled volume type, then adds those volumes to the Cinder Consistency Group. The Pure Storage driver then ensures those Cinder-managed volumes are included in the Pure PG:
openstack protector protection-group member-add prod-web-app \
--instance-id <nova-instance-uuid>
If any attached volume uses a volume type that does not have replication_enabled='<is> True', the member-add call will be rejected. All volumes must be on the same Cinder backend that backs the replicated volume type.
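The eligibility check amounts to filtering the VM's attached volumes on extra-specs and backend. A sketch, assuming volumes are represented as plain dicts (the real engine queries Cinder for this data):

```python
# Extra-specs a volume's type must carry to be replication-eligible.
REQUIRED_SPECS = {"replication_enabled": "<is> True"}

def ineligible_volumes(volumes, cg_backend):
    """Return ids of volumes that would cause member-add to be rejected:
    missing replication extra-specs, or hosted on a different backend
    than the one backing the Consistency Group."""
    rejected = []
    for vol in volumes:
        specs = vol.get("extra_specs", {})
        specs_ok = all(specs.get(k) == v for k, v in REQUIRED_SPECS.items())
        if not specs_ok or vol.get("backend") != cg_backend:
            rejected.append(vol["id"])
    return rejected
```

If this returns a non-empty list, member-add fails and the error names the offending volume, matching the behaviour described above.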
Forcing a consistency group sync
For async replication you can request an immediate snapshot outside of the scheduled interval:
openstack protector consistency-group sync prod-web-app
The driver translates this into a create_protection_group_snapshot call on the primary FlashArray (or a simulated equivalent for the mock driver).
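Because both drivers implement the same call, the sync command's dispatch is plain polymorphism. A toy illustration (classes and return strings are hypothetical stand-ins, not the real drivers):

```python
class PureLikeDriver:
    def create_protection_group_snapshot(self, pg_name):
        # Would issue the snapshot API call against the primary FlashArray.
        return f"{pg_name}: snapshot requested on FlashArray"

class MockLikeDriver:
    def create_protection_group_snapshot(self, pg_name):
        # Would insert a simulated snapshot row into the SQLite store.
        return f"{pg_name}: simulated snapshot recorded"

def handle_cg_sync(driver, pure_pg_name):
    """What `consistency-group sync` resolves to inside the engine."""
    return driver.create_protection_group_snapshot(pure_pg_name)
```

The CLI command is identical in both environments; only the driver behind handle_cg_sync changes.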
Switching drivers between environments
Because the driver is selected per site in protector.conf, you can run MockStorageDriver on a staging clone and PureStorageDriver in production without changing any workflow commands. The CLI and API surface is identical in both cases.
Example 1 – Full setup with PureStorageDriver
Create a Protection Group backed by real FlashArray replication, verify the 1:1:1 mapping, and confirm replication readiness.
# Create volume type on site-a (repeat on site-b with backend-b)
openstack --os-cloud site-a volume type create replicated-ssd \
--property volume_backend_name=pure@backend-a \
--property replication_enabled='<is> True' \
--property replication_type='<in> async'
# Create the Protection Group
openstack protector protection-group create \
--name prod-web-app \
--replication-type async \
--primary-site site-a \
--secondary-site site-b \
--volume-type replicated-ssd
Expected output (abbreviated):
+------------------------+--------------------------------------+
| Field | Value |
+------------------------+--------------------------------------+
| id | pg-12345678-1234-1234-1234-12345678 |
| name | prod-web-app |
| status | creating |
| consistency_group_id | cg-87654321-4321-4321-4321-87654321 |
| primary_site | site-a |
| secondary_site | site-b |
+------------------------+--------------------------------------+
# Attach the replication policy
openstack protector protection-group policy-create prod-web-app \
--primary-fa-url https://flasharray-a.example.com \
--primary-fa-token "T-12345678-abcd-efgh-ijkl-mnopqrstuvwx" \
--secondary-fa-url https://flasharray-b.example.com \
--secondary-fa-token "T-87654321-dcba-hgfe-lkji-xwvutsrqponm" \
--pure-pg-name "pg-prod-web-app" \
--replication-interval 300 \
--rpo-minutes 15
# Inspect the consistency group to confirm the 1:1:1 mapping
openstack protector consistency-group show prod-web-app
Expected output:
+----------------------+--------------------------------------+
| Field | Value |
+----------------------+--------------------------------------+
| id | cg-87654321-4321-4321-4321-87654321 |
| protection_group_id | pg-12345678-1234-1234-1234-12345678 |
| volume_type_name | replicated-ssd |
| backend_name | pure@backend-a |
| primary_cg_id | <cinder-cg-uuid-on-site-a> |
| secondary_cg_id | <cinder-cg-uuid-on-site-b> |
| status | active |
| volume_count | 0 |
+----------------------+--------------------------------------+
The pure_pg_name field in the replication policy (pg-prod-web-app) is the Pure Storage Protection Group name visible on both FlashArrays.
Example 2 – Full setup with MockStorageDriver (CI / DR drill)
Configure both sites to use the mock driver with a persistent SQLite file, then run a test failover.
# /etc/protector/protector.conf on BOTH sites
[engine]
storage_driver = mock
mock_db_path = /var/lib/protector/mock_array.db
# Restart protector-engine on both sites to pick up the driver change
systemctl restart protector-engine
# Verify the driver loaded without errors
journalctl -u protector-engine --no-pager | grep -i "storage driver"
# Expected: INFO protector.engine.manager Storage driver: MockStorageDriver
# Create the Protection Group exactly as you would in production
openstack protector protection-group create \
--name drill-web-app \
--replication-type async \
--primary-site site-a \
--secondary-site site-b \
--volume-type replicated-ssd
# No real array credentials needed – omit FA URLs and tokens
# The mock driver auto-generates a simulated pure_pg_name
openstack protector protection-group policy-create drill-web-app \
--pure-pg-name "mock-pg-drill-web-app" \
--replication-interval 300 \
--rpo-minutes 15
# Add a VM
openstack protector protection-group member-add drill-web-app \
--instance-id <nova-instance-uuid>
# Execute a test failover (non-disruptive, primary stays up)
openstack protector protection-group test-failover drill-web-app \
--retain-primary \
--network-mapping net-primary-web=net-secondary-web
# Monitor until complete
openstack protector operation list
openstack protector operation show <operation-id>
Expected final operation status:
+------------------+-------------------------------------+
| Field | Value |
+------------------+-------------------------------------+
| operation_type | test_failover |
| status | completed |
| progress | 100 |
| instances_failed | 0 |
+------------------+-------------------------------------+
Because mock_db_path is set to a file, you can restart protector-engine and re-run the drill without losing the simulated array state – useful for validating runbook steps across separate sessions.
Example 3 – Validating the 1:1:1 mapping after adding a VM member
# Add a VM with two attached volumes
openstack protector protection-group member-add prod-web-app \
--instance-id <web-server-1-uuid>
# Inspect the consistency group volumes to confirm both volumes are tracked
openstack protector consistency-group volumes prod-web-app
Expected output:
+--------------------------------------+-------------------+---------+-------------+
| volume_id                            | volume_name       | size_gb | status      |
+--------------------------------------+-------------------+---------+-------------+
| vol-aaaa-...                         | web-server-1-os   | 50      | replicating |
| vol-bbbb-...                         | web-server-1-data | 200     | replicating |
+--------------------------------------+-------------------+---------+-------------+
Both volumes now belong to the same Cinder CG (and therefore the same Pure PG), guaranteeing crash-consistent snapshots across the boot disk and data disk during failover.
Issue: protector-engine fails to start with ImportError: cannot import name 'PureStorageDriver'
Symptom: journalctl -u protector-engine shows an ImportError referencing protector.engine.storage.pure.
Likely cause: The package was installed without the Pure Storage Python SDK dependency, or the installation did not complete cleanly.
Fix:
pip install purity_fb py-pure-client # install Pure SDK dependencies
# OR reinstall the full package
pip install -r requirements.txt
python setup.py install
systemctl restart protector-engine
Issue: Protection Group creation fails with Volume type does not support replication
Symptom: openstack protector protection-group create returns a 400 error referencing the volume type.
Likely cause: The Cinder volume type on one or both sites is missing the required extra-specs.
Fix: Check the properties on both sites:
openstack --os-cloud site-a volume type show replicated-ssd -f json | python3 -m json.tool
openstack --os-cloud site-b volume type show replicated-ssd -f json | python3 -m json.tool
Both must include:
"replication_enabled": "<is> True"
"replication_type": "<in> async"
Add any missing property:
openstack --os-cloud site-a volume type set replicated-ssd \
--property replication_enabled='<is> True' \
--property replication_type='<in> async'
Repeat for site-b, then retry the Protection Group creation.
Issue: member-add rejected with Volume type mismatch or Volume not replication-eligible
Symptom: Adding a VM to a Protection Group fails and the error message names a specific volume.
Likely cause: One or more volumes attached to the VM use a volume type that does not have replication_enabled='<is> True', or they reside on a different Cinder backend than the one backing the Consistency Group.
Fix: Identify the volume type of each attached volume:
openstack volume show <volume-id> -c volume_type
Retype offending volumes to the replicated volume type, using on-demand migration so the data moves to the replicated backend:
openstack --os-cloud site-a volume set --type replicated-ssd --retype-policy on-demand <volume-id>
All volumes in a Consistency Group must share the same volume type and backend.
Issue: RPO violation warning during failover validation
Symptom: A planned failover prints a warning such as Latest snapshot age (47 min) exceeds RPO threshold (15 min) and the operation is blocked.
Likely cause: The most recent replicated snapshot on FlashArray B is older than the rpo_minutes value configured in the replication policy – either the replication interval was missed or the secondary array is lagging.
Fix: First, force a manual sync to bring the secondary up to date:
openstack protector consistency-group sync prod-web-app
Wait for the sync to complete, then re-examine snapshot age. If the secondary array is genuinely lagging, investigate Pure replication health directly on the array. For an emergency unplanned failover where you must proceed despite the RPO violation, use --force:
openstack protector protection-group failover prod-web-app --force
Note that --force skips primary-site validation and accepts potential data loss up to the actual snapshot age.
Issue: MockStorageDriver simulated state is lost after engine restart
Symptom: After restarting protector-engine, a DR drill cannot find previously created simulated snapshots or volumes.
Likely cause: mock_db_path is set to :memory: (the default) instead of a file path. In-memory SQLite does not persist across process restarts.
Fix: Set a file-backed path in protector.conf and restart the engine:
[engine]
storage_driver = mock
mock_db_path = /var/lib/protector/mock_array.db
systemctl restart protector-engine
Verify the file is created:
ls -lh /var/lib/protector/mock_array.db
Issue: Metadata sync blocked – Peer site unreachable
Symptom: A Protection Group update (adding a member, changing the replication policy) fails with an error stating the peer site is unreachable and the modification has been blocked.
Likely cause: The protector-api on the secondary site is down, or network connectivity between the CLI coordination layer and the secondary site's Keystone/protector endpoint is broken.
Fix: This behaviour is by design – metadata sync is intentionally strict to prevent Protection Group state from diverging between sites. Verify endpoint reachability:
openstack --os-cloud site-b catalog show protector
curl -v http://<site-b-controller>:8788/
Restore connectivity or bring the secondary protector-api back online before retrying the modification. Do not attempt to bypass this check by directly modifying the database.