Site Recovery for OpenStack
Guide

Installing the Horizon Dashboard Plugin

Deploying the Trilio Site Recovery Horizon plugin for tenant self-service

Overview

This guide walks you through deploying the Trilio Site Recovery Horizon dashboard plugin, which adds a Disaster Recovery panel to the OpenStack Horizon web interface. The plugin gives tenant users and DR administrators a graphical interface for managing protection groups, executing failover operations, monitoring DR progress, and running test failovers — without requiring CLI access. Because Trilio Site Recovery coordinates across two independent OpenStack sites, the Horizon plugin is configured with credentials for both sites and acts as the coordination layer, authenticating to each site's Keystone endpoint and synchronizing metadata between them.


Prerequisites

Before installing the Horizon dashboard plugin, ensure the following are in place:

  • Two independent OpenStack clouds are deployed and reachable: a primary site and a secondary (DR) site, each with its own Nova, Cinder, Neutron, and Keystone endpoints.
  • Trilio Site Recovery services (protector-api and protector-engine) are installed and running on both sites. The Horizon plugin communicates with these API endpoints — it does not connect site services to each other.
  • Horizon is installed and operational on at least one site. The plugin can be installed on either or both Horizon instances.
  • Python 3.8 or later is available in the environment where Horizon runs.
  • python-protectorclient (the OSC CLI plugin) is installed in the same Python environment as Horizon. The dashboard plugin depends on this library for API client logic.
  • You have admin access to the Horizon server to install packages and restart services.
  • Cinder volume types with replication_enabled='<is> True' are already configured on both sites. Protection groups cannot be created until replicated volume types exist.
  • Network connectivity exists from the Horizon server to both sites' Keystone and Protector API endpoints. The Horizon plugin makes direct API calls to both sites.

Installation

Step 1: Install the plugin package

Install the Horizon plugin from PyPI into the Python environment that runs your Horizon service. If Horizon runs in a virtualenv, activate it first.

pip install trilio-site-recovery-ui

Step 2: Enable the plugin in Horizon

Horizon discovers plugins by reading enabled configuration files from its openstack_dashboard/enabled/ directory. The installed package ships its enabled file, _7000_disaster_recovery.py, in the enabled/ subdirectory of the trilio_site_recovery_ui package. Copy or symlink it into Horizon's enabled directory:

cp /path/to/site-packages/trilio_site_recovery_ui/enabled/_7000_disaster_recovery.py \
    /opt/stack/horizon/openstack_dashboard/enabled/

Replace /opt/stack/horizon with the actual path to your Horizon installation.
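
If you prefer to script this step, the copy can be sketched in Python as follows. This is a minimal sketch, not part of the plugin: the function name copy_enabled_files is hypothetical, and the _<number>_<name>.py pattern is assumed from the file name shown in the command above.

```python
import shutil
from pathlib import Path


def copy_enabled_files(package_dir, horizon_enabled_dir):
    """Copy Horizon 'enabled' files from a plugin package into Horizon.

    package_dir: directory of the installed plugin package, assumed to
        contain an 'enabled/' subdirectory as in the cp command above.
    horizon_enabled_dir: Horizon's openstack_dashboard/enabled/ directory.
    Returns the list of destination paths that were written.
    """
    source_dir = Path(package_dir) / "enabled"
    target_dir = Path(horizon_enabled_dir)
    copied = []
    # Enabled files follow the _<number>_<name>.py naming convention,
    # so __init__.py and other helpers are skipped.
    for src in sorted(source_dir.glob("_[0-9]*_*.py")):
        dest = target_dir / src.name
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

Pass the package directory from your site-packages path as the first argument and Horizon's enabled directory as the second. The returned list tells you which files were copied, which is useful when checking the result before restarting the web server.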

Step 3: Configure site credentials

The plugin must know how to reach both the primary and secondary Protector API endpoints. Add the following block to your Horizon local_settings.py (typically at /etc/openstack-dashboard/local_settings.py or /opt/stack/horizon/openstack_dashboard/local_settings.py):

# Trilio Site Recovery - Primary site
PROTECTOR_PRIMARY_AUTH_URL = "http://site-a-controller:5000/v3"
PROTECTOR_PRIMARY_API_URL = "http://site-a-controller:8788"

# Trilio Site Recovery - Secondary (DR) site
PROTECTOR_SECONDARY_AUTH_URL = "http://site-b-controller:5000/v3"
PROTECTOR_SECONDARY_API_URL = "http://site-b-controller:8788"

Adjust the hostnames and ports to match your environment.
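
Before restarting Horizon, it can help to sanity-check these four values. The following is a minimal sketch under stated assumptions: check_site_settings is a hypothetical helper, not part of the plugin, and it only checks that each URL is set, is a well-formed http(s) URL, and that the API URLs carry no trailing slash (see the Configuration section).

```python
from urllib.parse import urlparse

REQUIRED_SETTINGS = (
    "PROTECTOR_PRIMARY_AUTH_URL",
    "PROTECTOR_PRIMARY_API_URL",
    "PROTECTOR_SECONDARY_AUTH_URL",
    "PROTECTOR_SECONDARY_API_URL",
)


def check_site_settings(settings):
    """Return a list of problems found; an empty list means none were found.

    settings: a dict of setting name -> value, for example the relevant
        entries copied from local_settings.py.
    """
    problems = []
    for name in REQUIRED_SETTINGS:
        value = settings.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        parsed = urlparse(value)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            problems.append(f"{name} is not a valid http(s) URL: {value!r}")
        # The API base URL must not end with a slash.
        if name.endswith("_API_URL") and value.endswith("/"):
            problems.append(f"{name} must not end with a trailing slash")
    return problems
```

An empty result does not prove the endpoints are reachable. It only catches missing values and formatting mistakes.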

Step 4: Collect static assets

Horizon requires static files (JavaScript, CSS) to be collected after plugin installation:

cd /opt/stack/horizon
python manage.py collectstatic --noinput

Step 5: Compress static assets (if compression is enabled)

If your Horizon deployment uses Django's static asset compression, run:

cd /opt/stack/horizon
python manage.py compress --force

Step 6: Restart the Horizon web server

Restart the web server process serving Horizon to load the new plugin. The exact command depends on your deployment:

# Apache on Debian/Ubuntu (on RHEL-family systems the service is named httpd)
sudo systemctl restart apache2

# Or for nginx + uWSGI
sudo systemctl restart uwsgi

Step 7: Verify the panel is visible

Log in to the Horizon dashboard. Under the Project menu, you should now see a Disaster Recovery panel. Admin users will additionally see a Disaster Recovery section under the Admin menu for site-level operations.

If the panel does not appear, check your web server error log and the Django log for import errors related to the plugin.
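
One quick way to surface an import error is to try the import yourself from the Python environment that runs Horizon. The helper below is a small sketch. The function name is hypothetical, and the module name trilio_site_recovery_ui is assumed from the package path used in Step 2.

```python
import importlib


def import_error_for(module_name):
    """Try to import a module; return None on success or the error text."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # a plugin can fail with more than ImportError
        return f"{type(exc).__name__}: {exc}"
    return None
```

Calling import_error_for("trilio_site_recovery_ui") should return None when the package and its dependencies are importable. Any other result names the exception to look for in the web server log.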


Configuration

The plugin's runtime behavior is controlled through settings in Horizon's local_settings.py. The following options are available:


PROTECTOR_PRIMARY_AUTH_URL

Required. The Keystone v3 authentication URL for the primary OpenStack site.

PROTECTOR_PRIMARY_AUTH_URL = "http://site-a-controller:5000/v3"

The plugin uses this to authenticate tenant users against the primary site. This value must match the Keystone endpoint your primary protector-api service is registered with.


PROTECTOR_PRIMARY_API_URL

Required. The base URL for the protector-api service on the primary site.

PROTECTOR_PRIMARY_API_URL = "http://site-a-controller:8788"

Do not include a trailing slash or version path — the plugin appends /v1/<tenant_id>/ automatically.
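
To illustrate why the trailing slash matters, here is a sketch of how a base URL and the /v1/<tenant_id>/ suffix might be joined. This is an assumption about simple string composition, not the plugin's actual code, and tenant_api_root is a hypothetical name.

```python
def tenant_api_root(api_url, tenant_id):
    """Compose the per-tenant API root from the configured base URL."""
    return f"{api_url}/v1/{tenant_id}/"


# A base URL without a trailing slash composes cleanly.
clean = tenant_api_root("http://site-a-controller:8788", "abc123")

# A trailing slash in the setting would produce a double slash in the path.
doubled = tenant_api_root("http://site-a-controller:8788/", "abc123")
```

Here clean is "http://site-a-controller:8788/v1/abc123/", while doubled contains "8788//v1/", which some servers and proxies treat as a different path.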


PROTECTOR_SECONDARY_AUTH_URL

Required. The Keystone v3 authentication URL for the secondary (DR) OpenStack site.

PROTECTOR_SECONDARY_AUTH_URL = "http://site-b-controller:5000/v3"

The plugin authenticates independently to this site to coordinate metadata sync and to execute or monitor DR operations on the secondary site.


PROTECTOR_SECONDARY_API_URL

Required. The base URL for the protector-api service on the secondary site.

PROTECTOR_SECONDARY_API_URL = "http://site-b-controller:8788"

PROTECTOR_DEFAULT_FAILOVER_TYPE

Optional. Sets the default failover type pre-selected in the Failover dialog. Valid values: "planned", "unplanned". Defaults to "planned" to reduce the risk of accidental unplanned failovers.

PROTECTOR_DEFAULT_FAILOVER_TYPE = "planned"

PROTECTOR_OPERATION_POLL_INTERVAL

Optional. How frequently (in seconds) the dashboard polls the API for DR operation progress updates. Lower values give more responsive progress bars but increase API load. Defaults to 5.

PROTECTOR_OPERATION_POLL_INTERVAL = 5
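
The trade-off behind this setting can be seen in a generic polling loop. The sketch below is illustrative only: poll_operation and fetch_status are hypothetical names, and the terminal status values "completed" and "failed" are assumptions.

```python
import time


def poll_operation(fetch_status, interval=5, max_polls=120, sleep=time.sleep):
    """Poll until the operation reaches a terminal state or polls run out.

    fetch_status: hypothetical callable returning a dict such as
        {"status": "running", "progress": 60}.
    interval: seconds to wait between polls (one API call per poll).
    Returns the last status dict seen.
    """
    terminal = {"completed", "failed"}  # assumed terminal status names
    last = None
    for attempt in range(max_polls):
        last = fetch_status()
        if last.get("status") in terminal:
            break
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval)
    return last
```

With an interval of 5, a ten-minute operation costs roughly 120 API calls per open progress view. Halving the interval doubles that load.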

Role-based access

The plugin enforces the same RBAC policies as the underlying API. Specifically:

  • Tenant users can manage their own protection groups, execute per-group failovers, and monitor operations.
  • Admin users can access the Admin > Disaster Recovery panel for site registration and site-level operations.
  • Site failover (the "Big Red Button" for failing over all protection groups at a site simultaneously) requires the dr_site_admin role. Assign it before users need to use that feature:
openstack role create dr_site_admin
openstack role add --user <username> --project <project> dr_site_admin
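
The role rules above can be summarized in a few lines of Python. This is a sketch of the documented behavior, not the plugin's policy code. The action labels are hypothetical, and site failover is treated as requiring both admin and dr_site_admin, as described for the Admin panel.

```python
def allowed_actions(roles):
    """Map a user's role names to the DR actions described in this guide."""
    roles = set(roles)
    # Every tenant user can work with their own project's protection groups.
    actions = {"manage_protection_groups", "group_failover", "monitor_operations"}
    if "admin" in roles:
        actions.add("site_registration")
        # Site failover needs dr_site_admin in addition to admin.
        if "dr_site_admin" in roles:
            actions.add("site_failover")
    return actions
```

For example, allowed_actions(["admin"]) includes site_registration but not site_failover, which matches the permission error described in the Troubleshooting section.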

Usage

Once the plugin is installed, the Horizon dashboard exposes Trilio Site Recovery functionality in two locations:

Project > Disaster Recovery

This is the primary interface for tenant engineers. From here you can:

Manage protection groups

  • View all protection groups for your project, their status, current primary site, member VM count, and replication health.
  • Create a new protection group by clicking Create Protection Group. You will select a replicated volume type, the primary and secondary sites, and the replication type (async or sync).
  • Click a protection group name to open its detail view, where you can add or remove VMs, inspect consistency group volumes, configure resource mappings, and view recent DR operations.

Execute DR operations from the protection group detail view

  • Failover — opens the Failover dialog. Select failover type (Planned or Unplanned), optionally choose a specific recovery point from the dropdown (for point-in-time recovery), and confirm. The plugin initiates the operation on the appropriate site and opens a live progress view.
  • Test Failover — opens the Test Failover dialog. Map production networks to an isolated test network, optionally select a recovery point, and set an instance name prefix. Test VMs are created on the DR site without affecting production workloads. A Cleanup Test Failover button appears once test resources exist.
  • Failback — available after a failover has occurred. Returns workloads to the original primary site.
  • Sync — forces an immediate consistency group sync.

Monitor operations

  • The Operations tab on the protection group detail page shows all DR operations with real-time progress, status, and step-by-step detail.

Admin > Disaster Recovery

This panel is visible to users with the admin role and provides site management functions:

  • DR Sites table — register new sites, view site status (active, unreachable, error), and validate site connectivity.
  • Site Failover — the "Big Red Button" in the DR Sites table triggers a site-level failover that fails over all protection groups at a site in parallel. This operation requires the dr_site_admin role in addition to admin.
  • Site Operations table — monitor the progress of site-level failover operations, including per-protection-group status and error summaries for partial failures.

Important behavioral notes

  • The plugin authenticates to both sites on your behalf when you perform any cross-site operation. It uses your current Horizon session credentials against the primary site and the configured secondary site endpoint.
  • Modifications to a protection group are blocked if the peer site is unreachable. The UI will display an error and prevent the change to avoid metadata divergence between sites.
  • Primary and secondary site designations are workload-relative and swap on failover. After a failover, the protection group detail view reflects the new current primary site.
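
The last two notes can be modeled with a small data structure. The sketch below is purely illustrative and uses hypothetical class and method names. It shows the role swap on failover and the refusal to change metadata while the peer site is unreachable.

```python
from dataclasses import dataclass


class PeerUnreachable(Exception):
    """Raised when a change is refused because the peer site is down."""


@dataclass
class ProtectionGroup:
    name: str
    current_primary: str
    current_secondary: str

    def failover(self):
        """Swap roles: the former secondary becomes the current primary."""
        self.current_primary, self.current_secondary = (
            self.current_secondary,
            self.current_primary,
        )

    def rename(self, new_name, peer_reachable):
        """Apply a metadata change only if the peer site can be updated too."""
        if not peer_reachable:
            raise PeerUnreachable(
                f"cannot modify {self.name}: "
                f"peer site {self.current_secondary} is unreachable"
            )
        self.name = new_name
```

Refusing the change outright, rather than queuing it, keeps both sites' copies of the metadata identical at all times, which is the property the second note describes.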

Examples

Example 1: Navigating to the Disaster Recovery panel

After logging in to Horizon, expand the Project menu in the left sidebar and click Disaster Recovery.

Project
└── Disaster Recovery
    ├── Protection Groups
    └── Operations

The Protection Groups list shows all protection groups for your project:

+-----------------+-----+----------+--------------+
| Name            | VMs | Status   | Current Site |
+-----------------+-----+----------+--------------+
| prod-web-app    |  3  | Active   | Site A       |
| prod-db         |  2  | Active   | Site A       |
| analytics       |  1  | Error    | Site B       |
+-----------------+-----+----------+--------------+

Example 2: Creating a protection group

  1. Click Create Protection Group.
  2. Fill in the form:
    • Name: prod-web-app
    • Description: Production web application
    • Primary Site: site-a
    • Secondary Site: site-b
    • Volume Type: replicated-ssd (only volume types with replication_enabled='<is> True' appear in this list)
    • Replication Type: async
  3. Click Create.

The protection group is created. A Cinder Consistency Group is automatically provisioned on the primary site and linked to a Pure Storage Protection Group on the backend.


Example 3: Running a test failover (DR drill)

  1. Navigate to Project > Disaster Recovery > Protection Groups and click prod-web-app.
  2. From the Actions dropdown, select Test Failover.
  3. In the dialog:
    • Recovery Point: select protector-dr-test-pg.29 (6h 15m old, replicated) from the dropdown.
    • Network Mapping: map web-net (site-a) → dr-test-net (site-b).
    • Instance Prefix: drill-
  4. Click Start Test Failover.

The operation progress panel opens:

Test Failover: prod-web-app
────────────────────────────────────────
✓  Recovery point selected: .29
✓  Volumes cloned on secondary array
✓  Cloned volumes imported into Cinder
⟳  Creating test VMs on dr-test-net...

When complete, test VMs named drill-web-server-1, drill-web-server-2, and drill-db-server appear on the DR site, connected only to the isolated dr-test-net.

After validation, click Cleanup Test Failover on the protection group detail page to delete the test VMs and cloned volumes.


Example 4: Executing a planned failover

  1. Navigate to the prod-web-app protection group detail page.
  2. Click Failover.
  3. In the Failover dialog:
    • Failover Type: Planned
    • Recovery Point: Latest (create new snapshot)
    • Leave Force unchecked.
  4. Click Initiate Failover.

The wizard runs pre-flight checks before proceeding:

Pre-Flight Checks
────────────────────────────────────────
✓  Site B is reachable
✓  Metadata is in sync
✓  All volumes replicated
✓  RPO compliant (5 min < 15 min)

After checks pass, the failover executes. Primary VMs are shut down gracefully, a final snapshot is created and replicated, and VMs are started on Site B. The protection group status updates to failed_over and Current Site changes to Site B.
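
The pre-flight logic amounts to four boolean checks that must all pass. A minimal sketch follows. The function name, parameter names, and check labels are hypothetical, and the RPO check is assumed to compare the current replication lag against the target, as the "5 min < 15 min" line suggests.

```python
def preflight(site_reachable, metadata_in_sync, volumes_replicated,
              replication_lag_min, rpo_target_min):
    """Return (all_passed, results) for the four pre-flight checks.

    results is a list of (label, passed) pairs in display order.
    """
    results = [
        ("Target site is reachable", site_reachable),
        ("Metadata is in sync", metadata_in_sync),
        ("All volumes replicated", volumes_replicated),
        (
            f"RPO (lag {replication_lag_min} min, target {rpo_target_min} min)",
            replication_lag_min < rpo_target_min,
        ),
    ]
    return all(passed for _, passed in results), results
```

With a lag of 5 minutes against a 15-minute target, every check passes. A lag of 20 minutes fails the RPO check and therefore the overall result.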


Example 5: Triggering a site-level failover (Admin panel)

  1. Navigate to Admin > Disaster Recovery.
  2. In the DR Sites table, locate site-a.
  3. Click the red Failover Site button.
  4. In the confirmation modal:
    • Confirm the site name: site-a
    • Failover Type: Unplanned
    • Check Force failover (required when Site A is unreachable).
    • Check the confirmation checkbox.
  5. Click Execute Failover.

The Site Operations table shows the operation with real-time progress:

+--------------------+---------------+---------------+----------+----------+
| Operation ID       | Site          | Type          | Status   | Progress |
+--------------------+---------------+---------------+----------+----------+
| a1b2c3d4-...       | site-a        | site_failover | running  | 60%      |
+--------------------+---------------+---------------+----------+----------+

Click View Details to see per-protection-group status and any error summaries for groups that fail to recover.
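
To make the relationship between per-group results and the site-level row concrete, here is one way such a roll-up could be computed. This is a sketch under assumptions: the status names, the "partial_failure" label, and the progress formula (share of groups that have finished) are all hypothetical and may differ from what the product reports.

```python
from collections import Counter


def summarize_site_operation(group_statuses):
    """Roll per-protection-group statuses up into one site-level summary.

    group_statuses: dict mapping protection group name -> status string
        ("running", "completed", or "failed" in this sketch).
    """
    counts = Counter(group_statuses.values())
    total = len(group_statuses)
    finished = counts["completed"] + counts["failed"]
    if finished < total:
        status = "running"
    elif counts["failed"] == 0:
        status = "completed"
    elif counts["completed"] == 0:
        status = "failed"
    else:
        status = "partial_failure"
    progress = round(100 * finished / total) if total else 0
    failed = sorted(n for n, s in group_statuses.items() if s == "failed")
    return {"status": status, "progress": progress, "failed_groups": failed}
```

Under these assumptions, a site with five protection groups, three of which have finished, would show as running at 60%.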


Troubleshooting

The Disaster Recovery panel does not appear in Horizon

Symptom: After installation, there is no Disaster Recovery entry under the Project menu.

Likely causes:

  • The enabled file was not copied to Horizon's enabled/ directory.
  • Static assets were not collected after installation.
  • The web server was not restarted.

Fix:

  1. Confirm the enabled file is present:
    ls /opt/stack/horizon/openstack_dashboard/enabled/ | grep disaster
    
  2. Re-run static asset collection and compression:
    cd /opt/stack/horizon
    python manage.py collectstatic --noinput
    python manage.py compress --force
    
  3. Restart the web server:
    sudo systemctl restart apache2
    
  4. Check the web server error log for Python import errors related to the plugin.

"Unable to reach secondary site" error when opening a protection group

Symptom: The protection group detail page displays an error banner stating the secondary site is unreachable. Create, update, and delete actions are blocked.

Likely cause: The Horizon server cannot reach the secondary site's Keystone or Protector API endpoint. This is by design — metadata modifications are blocked when the peer site is unreachable to prevent state divergence.

Fix:

  1. Verify network connectivity from the Horizon server to the secondary site:
    curl -s http://site-b-controller:5000/v3 | python3 -m json.tool
    curl -s http://site-b-controller:8788
    
  2. Check PROTECTOR_SECONDARY_AUTH_URL and PROTECTOR_SECONDARY_API_URL in local_settings.py for typos.
  3. Confirm the secondary protector-api service is running on Site B:
    # On the Site B controller
    systemctl status protector-api
    
  4. Check firewall rules between the Horizon server and Site B.
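
If curl is not available on the Horizon server, a plain TCP check from Python gives a similar first answer. The helper below is a sketch with a hypothetical name. It only confirms that the port accepts connections, not that Keystone or protector-api is healthy.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

Run it against the values of PROTECTOR_SECONDARY_AUTH_URL and PROTECTOR_SECONDARY_API_URL. A False result for only one of the two points to a service or firewall problem on that specific port rather than a general network outage.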

The Volume Type dropdown is empty when creating a protection group

Symptom: The Volume Type field in the Create Protection Group dialog shows no options.

Likely cause: No Cinder volume types on the primary site have replication_enabled='<is> True' set. The plugin filters volume types to show only replication-capable types.

Fix: Confirm that replicated volume types exist on the primary site:

openstack volume type list --long | grep replication_enabled

If none are listed, create a volume type with the required properties and ensure the Cinder backend supports replication. Refer to the "Prepare replication-enabled volume types" guide.
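
The filter the plugin applies can be approximated as follows. This sketch uses a simplified list of dicts in place of real Cinder API objects, and the function name is hypothetical. The extra spec key and value are the ones named in this guide.

```python
def replicated_volume_types(volume_types):
    """Return names of volume types whose extra specs enable replication.

    volume_types: iterable of dicts with "name" and "extra_specs" keys,
        a simplified stand-in for what the Cinder API returns.
    """
    return [
        vt["name"]
        for vt in volume_types
        if vt.get("extra_specs", {}).get("replication_enabled") == "<is> True"
    ]
```

Note that the comparison is an exact string match in this sketch, so a value such as "True" without the "<is> " prefix would not qualify.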


Failover dialog shows no recovery points in the dropdown

Symptom: When initiating a failover or test failover, the Recovery Point dropdown contains only "Latest (create new snapshot)" and no historical snapshots.

Likely cause: No replicated snapshots exist on the secondary array, or the replication policy has not yet produced a snapshot cycle.

Fix:

  1. Verify that replication is healthy and that at least one snapshot has been replicated:
    openstack dr recovery point list <protection-group>
    
  2. If the list is empty, check the replication policy interval and confirm the Pure Storage Protection Group is actively replicating.
  3. For async replication, wait for at least one replication cycle to complete before recovery points appear.

"Site actions require dr_site_admin role" error in the Admin panel

Symptom: Clicking Failover Site in the Admin panel returns a permission error.

Likely cause: Your user does not have the dr_site_admin role, which is required for site-level failover operations (separate from the generic admin role).

Fix:

openstack role add --user <your-username> --project <your-project> dr_site_admin

Then log out and back in to Horizon for the role assignment to take effect in your session.


Test failover VMs are not visible after the operation completes

Symptom: The test failover operation shows as completed, but no test VMs appear in the Horizon Instances list.

Likely cause: The test VMs were created on the secondary site, but Horizon is scoped to the primary site's Nova endpoint.

Fix: Switch your Horizon session to the secondary site, or use the secondary site's Nova endpoint directly to list instances:

# Source secondary site credentials
source site-b-openrc.sh
# Filter on the instance prefix set in the Test Failover dialog (for example, drill-)
openstack server list --all-projects | grep <instance-prefix>

Test VMs are always created on the DR (secondary) site and will not appear in the primary site's instance list.