Intended Audience
Who this product is for and what background knowledge is assumed.
This page describes who Trilio Site Recovery for OpenStack is designed for, what roles and responsibilities are assumed, and what background knowledge you should have before deploying or operating the product. Understanding the intended audience helps you determine whether this documentation applies directly to your role, or whether you should first build familiarity with prerequisite technologies.
Who This Product Is For
Trilio Site Recovery for OpenStack is built for cloud platform engineers and site reliability engineers (SREs) who are responsible for designing, deploying, and maintaining OpenStack-hosted infrastructure at an organizational or multi-tenant scale. If you are responsible for ensuring that virtual machine workloads can survive a datacenter-level failure — and for defining the recovery procedures that govern how and when that happens — this product is for you.
This product is not aimed at application developers or end users of OpenStack tenants. The protection model is tenant-driven: tenants can manage their own Protection Groups and replication policies through the CLI or the Horizon dashboard. The underlying infrastructure, however (Pure Storage FlashArray systems, Cinder volume type configuration, site registration, and service deployment), requires elevated OpenStack operator privileges and deep familiarity with the OpenStack control plane.
Assumed Role Responsibilities
You are expected to be comfortable with the following operational responsibilities:
- Deploying and configuring OpenStack services, including Nova, Cinder, Neutron, and Keystone, across two independent cloud environments
- Managing Cinder backends and volume type properties, including setting replication metadata such as replication_enabled='<is> True' and replication_type
- Administering storage infrastructure, specifically Pure Storage FlashArray systems, including understanding how array-level replication, Protection Groups, and Pods work in the context of synchronous and asynchronous replication
- Operating multi-site or multi-region cloud architectures, where each site runs its own full OpenStack control plane with independent Nova, Cinder, Neutron, and Keystone endpoints
- Writing and executing runbooks for disaster recovery drills, planned failovers, unplanned failovers, and failback procedures
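As a rough illustration of the volume type metadata mentioned above, the following Python sketch checks whether a set of volume type extra specs looks replication-ready. The function name is_replication_eligible is hypothetical, and both the accepted replication_type values ("async" and "sync") and the eligibility rule itself are assumptions made for illustration, not the product's documented validation logic.

```python
from typing import Mapping


def is_replication_eligible(extra_specs: Mapping[str, str]) -> bool:
    """Return True if the extra specs appear to describe a replicated volume type.

    Assumption: a type is eligible when replication_enabled evaluates to
    "True" and replication_type is either "async" or "sync".
    """
    enabled = extra_specs.get("replication_enabled", "")
    rep_type = extra_specs.get("replication_type", "")

    # Extra spec values may carry a scheduler operator prefix such as
    # "<is>" or "<in>"; strip it before comparing the remaining value.
    enabled_value = enabled.replace("<is>", "").strip().lower()
    type_value = rep_type.replace("<in>", "").strip().lower()

    return enabled_value == "true" and type_value in {"async", "sync"}


example_specs = {
    "replication_enabled": "<is> True",
    "replication_type": "<in> async",
}
eligible = is_replication_eligible(example_specs)
```

A check like this could run in a pre-flight script before volumes are added to a Protection Group, but the authoritative rules are those in the volume type configuration documentation.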
Assumed Technical Knowledge
This documentation is written at an expert level. The following knowledge is assumed and is not explained from first principles:
OpenStack
- How Nova manages virtual machine lifecycle (create, stop, rebuild, migrate)
- How Cinder manages block storage volumes, volume types, consistency groups, and backend drivers
- How Keystone authentication and service catalog endpoints work across multiple deployments
- How Neutron manages networks, subnets, ports, and security groups
- How to use the OpenStack CLI (openstack command) and configure clouds.yaml for multi-cloud authentication
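For readers who want a concrete picture of multi-cloud authentication, the sketch below mirrors the general shape of a clouds.yaml file as a Python mapping, with one entry per site. The cloud names (site-a, site-b), URLs, project names, and the helper distinct_keystone_endpoints are hypothetical placeholders chosen for illustration; only the clouds, auth, auth_url, and region_name keys follow the standard clouds.yaml layout.

```python
# A Python mapping that mirrors the structure of a two-site clouds.yaml.
# All names and URLs below are hypothetical placeholders.
CLOUDS = {
    "clouds": {
        "site-a": {
            "auth": {
                "auth_url": "https://keystone.site-a.example:5000/v3",
                "project_name": "demo",
            },
            "region_name": "RegionOne",
        },
        "site-b": {
            "auth": {
                "auth_url": "https://keystone.site-b.example:5000/v3",
                "project_name": "demo",
            },
            "region_name": "RegionOne",
        },
    }
}


def distinct_keystone_endpoints(config: dict, first: str, second: str) -> bool:
    """Return True if the two named clouds point at different Keystone URLs."""
    clouds = config["clouds"]
    first_url = clouds[first]["auth"]["auth_url"]
    second_url = clouds[second]["auth"]["auth_url"]
    return first_url != second_url


sites_are_independent = distinct_keystone_endpoints(CLOUDS, "site-a", "site-b")
```

Each site has its own Keystone endpoint, so a quick sanity check like this can catch a configuration in which both entries accidentally point at the same deployment.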
Pure Storage FlashArray
- What FlashArray Protection Groups are and how they govern snapshot replication schedules
- The difference between asynchronous replication (Protection Groups) and synchronous replication (ActiveCluster Pods)
- How array-to-array replication relationships are established and monitored
Disaster Recovery Concepts
- Recovery Point Objective (RPO) and Recovery Time Objective (RTO) and how replication policies affect them
- The difference between a test failover (DR drill), a planned failover, an unplanned failover, and a point-in-time failover
- The concept of primary and secondary site designations, and why these are workload-relative rather than fixed — designations swap when a failover is executed
- Failback: the process of returning workloads to the original site after a failover event
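The workload-relative nature of site designations can be sketched in a few lines of Python. The SiteRoles class and the site names are hypothetical, and modeling failback as a second role swap is a simplifying assumption for illustration rather than a description of the product's internal state machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteRoles:
    """Primary and secondary designations for one protected workload."""

    primary: str
    secondary: str

    def after_failover(self) -> "SiteRoles":
        # A failover swaps the designations: the former secondary site
        # becomes the primary site for this workload, and vice versa.
        return SiteRoles(primary=self.secondary, secondary=self.primary)


initial = SiteRoles(primary="site-a", secondary="site-b")
failed_over = initial.after_failover()      # site-b is now primary
failed_back = failed_over.after_failover()  # roles return to the original
```

Because the roles belong to the workload rather than to the site, two Protection Groups could have opposite designations across the same pair of sites at the same time.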
Distributed Systems
- Why strict metadata consistency between sites matters: Trilio Site Recovery blocks modifications to a Protection Group when the peer site is unreachable, preventing metadata divergence that could compromise recovery integrity
- How a coordination layer (the OSC CLI plugin protectorclient and the Horizon dashboard) authenticates to both sites independently and orchestrates metadata synchronization, given that there is no direct service-to-service communication between the primary and secondary Trilio Site Recovery services
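The consistency rule described above can be illustrated with a minimal Python sketch: a change to Protection Group metadata is applied to both sites or to neither. The names PeerUnreachableError and apply_change, and the use of plain dictionaries as per-site metadata stores, are hypothetical and do not reflect the actual protectorclient implementation.

```python
class PeerUnreachableError(RuntimeError):
    """Raised when a change is refused because the peer site is unreachable."""


def apply_change(local: dict, peer: dict, changes: dict, peer_reachable: bool) -> None:
    """Apply metadata changes to both sites, or to neither.

    Assumption: reachability is determined by the caller before this
    function runs; here it is passed in as a simple flag.
    """
    if not peer_reachable:
        # Refusing the change keeps the two copies of the metadata identical.
        raise PeerUnreachableError("peer site unreachable; modification blocked")
    local.update(changes)
    peer.update(changes)


local_meta = {"name": "pg-1", "members": 2}
peer_meta = {"name": "pg-1", "members": 2}

apply_change(local_meta, peer_meta, {"members": 3}, peer_reachable=True)

try:
    apply_change(local_meta, peer_meta, {"members": 4}, peer_reachable=False)
    blocked = False
except PeerUnreachableError:
    blocked = True
```

After the blocked attempt, both dictionaries still hold the last change that reached both sites, which is the property that keeps recovery metadata trustworthy.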
What You Do Not Need to Know in Advance
You do not need physical Pure Storage FlashArray hardware to evaluate this product. The Mock storage driver simulates FlashArray behavior using a local SQLite database, enabling you to deploy, configure, and execute complete DR workflows — including failover and failback — in a lab or CI environment without access to physical arrays. This documentation covers Mock driver usage wherever it differs from production array configuration.
You also do not need prior experience with Trilio's backup and recovery products. Trilio Site Recovery for OpenStack is a standalone product with its own service architecture (protector-api and protector-engine), its own CLI plugin, and its own operational model.
Cloud platform engineer — A professional responsible for building, configuring, and maintaining shared cloud infrastructure, including compute, storage, and networking services. In this context, they hold OpenStack operator-level privileges and are responsible for deploying Trilio Site Recovery services and configuring Cinder backends for replication.
SRE (Site Reliability Engineer) — A role focused on the reliability, availability, and recoverability of production systems. SREs using this product are typically responsible for defining DR policies, executing failover runbooks, and validating recovery objectives.
Tenant-driven DR — A disaster recovery model in which individual OpenStack tenants (projects) can self-manage their Protection Groups, replication policies, and failover operations, within the boundaries established by the cloud operator. Operators configure the underlying infrastructure; tenants control their own workload protection.
Primary site — The OpenStack cloud where protected virtual machines are actively running. This designation is workload-relative and dynamic: after a failover, what was the secondary site becomes the primary site for that workload.
Secondary (DR) site — The OpenStack cloud to which volume data is replicated and to which workloads are recovered during a failover. Like the primary designation, this is dynamic and swaps on failover.
Protection Group — The central unit of protection in Trilio Site Recovery. Each Protection Group has a 1:1:1 mapping with a Cinder Consistency Group and a Pure Storage Protection Group (or Pod for synchronous replication). VMs are added to a Protection Group to bring them under replication policy.
protector-api / protector-engine — The two Trilio Site Recovery services deployed independently on each OpenStack site. They do not communicate directly with each other; coordination is handled by the CLI plugin or Horizon dashboard acting as an orchestration layer.
protectorclient — The OpenStack CLI plugin that extends the openstack command with Trilio Site Recovery operations. It authenticates to both the primary and secondary site simultaneously and is the primary coordination layer for metadata synchronization between sites.
Mock storage driver — A software-only backend that simulates Pure Storage FlashArray replication behavior using SQLite. It enables full end-to-end DR workflow testing without physical storage hardware.
RPO (Recovery Point Objective) — The maximum acceptable amount of data loss measured in time. RPO is directly influenced by the replication interval configured in the replication policy.
RTO (Recovery Time Objective) — The maximum acceptable duration of service downtime following a failure. RTO is influenced by the speed of the failover procedure and the time required to bring VMs online at the secondary site.
- Architecture overview: Explains how the two-site topology works, how protector-api and protector-engine interact with OpenStack services, and why the CLI plugin is the coordination layer rather than a centralized server.
- Mock storage driver: Covers how to use the SQLite-backed Mock driver to simulate Pure FlashArray replication, enabling lab and CI evaluation without physical hardware.
- Cinder volume type configuration for replication: Details the required volume type properties (replication_enabled, replication_type) that make a volume eligible for inclusion in a Protection Group.
- Protection Groups: Explains the 1:1:1 mapping between Protection Groups, Cinder Consistency Groups, and Pure Storage Protection Groups or Pods, and the metadata consistency guarantees that govern them.
- Failover types: Distinguishes between test failover (DR drill), planned failover, unplanned failover, and point-in-time failover, and describes when to use each.
- Site registration: Describes how to register the primary and secondary sites with the protectorclient plugin, which is the first operational step after deploying the Trilio Site Recovery services.