ProtectionRequest
Request VM protection (DRBD Operator deployments)
A ProtectionRequest (short names: pr, protect) is the DRBD Operator deployment's primary mechanism for placing a single KubeVirt VM under DRBD-backed disaster recovery protection. When you create a ProtectionRequest, the Protection Controller, running on the quorum cluster, creates DRBDVolume resources for each of the VM's PVCs and waits for initial data synchronization to the DR cluster while the VM keeps running. It then stops the VM, switches it to DRBD-backed frontend PVCs, and restarts it. From that point forward, every write the VM makes is synchronously replicated to the DR cluster via DRBD Protocol C, giving you an RPO of zero. This resource is specific to DRBD Operator deployments; if you are using a LINSTOR-based deployment, use a ProtectionGroup instead.
Before creating a ProtectionRequest, confirm the following are in place:
- Deployment model: You are running a DRBD Operator deployment (two or three clusters). This resource does not apply to LINSTOR deployments.
- Clusters: A primary workload cluster and a DR workload cluster are deployed and reachable. A quorum cluster is strongly recommended for management plane isolation; the Protection Controller must be running there.
- DRBD Operator: Installed and healthy on both the primary and DR workload clusters.
- DRBDReplicationPolicy: A DRBDReplicationPolicy exists in the same namespace as the ProtectionRequest, with storage class mappings covering the VM's PVCs.
- Controllers: The Protection Controller (src/protection-controller.py) is deployed to the quorum cluster and has kubeconfig secrets for both workload clusters (primary-kubeconfig, dr-kubeconfig).
- VM: The target VM exists on the primary cluster, is reachable via the kubeconfig secret, and all of its PVCs use storage classes covered by your DRBDReplicationPolicy.
- Kubernetes: 1.28+ on all clusters; OpenShift 4.17+ is supported.
- Network: Less than 10 ms round-trip latency between primary and DR cluster nodes on DRBD replication ports (7000–7999). DRBD replication traffic flows directly between primary and DR cluster nodes — the quorum cluster does not relay it.
The ProtectionRequest CRD is installed as part of the DRBD Operator deployment. If you have already run the Ansible playbooks for your DRBD Operator deployment, the CRD is present on the quorum cluster. To verify:
kubectl --kubeconfig $KUBECONFIG_QUORUM get crd protectionrequests.siterecovery.trilio.io
If the CRD is missing, re-run the Ansible infrastructure playbook targeting the quorum cluster:
ansible-playbook ansible/deploy-dr-deployment.yml \
--extra-vars "target_cluster=quorum"
Confirm the Protection Controller pod is running on the quorum cluster:
kubectl --kubeconfig $KUBECONFIG_QUORUM get pods \
-n <your-dr-namespace> \
-l app=protection-controller
The controller must be in Running state with no crash loops before you create a ProtectionRequest.
A ProtectionRequest manifest requires the following fields:
| Field | Required | Description |
|---|---|---|
| metadata.name | Yes | Name for this protection request. Typically descriptive of the VM being protected. |
| metadata.namespace | Yes | The namespace on the quorum cluster where DR resources are managed (e.g., dr-deployment). |
| spec.vmName | Yes | Name of the VirtualMachine resource on the primary cluster. |
| spec.vmNamespace | Yes | Namespace of the VM on the primary cluster. |
| spec.sourceCluster | Yes | Identifier for the primary cluster, matching the cluster name in your DRBDReplicationPolicy. |
The target DR cluster is derived from the DRBDReplicationPolicy in the same namespace — you do not specify it directly in the ProtectionRequest.
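As an illustration of the field table above, the following Python sketch builds a ProtectionRequest manifest as a plain dictionary and checks that the five required fields are non-empty before you hand it to kubectl. The helper names (`missing_fields`, `build_protection_request`) are hypothetical and not part of the product; the apiVersion is the one used in the manifest example elsewhere on this page.

```python
import json

# Required spec fields, taken from the field table above.
REQUIRED_SPEC_FIELDS = ("vmName", "vmNamespace", "sourceCluster")


def missing_fields(manifest):
    """Return the dotted paths of required fields that are absent or empty."""
    missing = []
    for key in ("name", "namespace"):
        if not manifest.get("metadata", {}).get(key):
            missing.append(f"metadata.{key}")
    for key in REQUIRED_SPEC_FIELDS:
        if not manifest.get("spec", {}).get(key):
            missing.append(f"spec.{key}")
    return missing


def build_protection_request(name, namespace, vm_name, vm_namespace, source_cluster):
    """Build a ProtectionRequest manifest dict (hypothetical helper)."""
    manifest = {
        "apiVersion": "siterecovery.trilio.io/v1alpha1",
        "kind": "ProtectionRequest",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "vmName": vm_name,
            "vmNamespace": vm_namespace,
            "sourceCluster": source_cluster,
        },
    }
    missing = missing_fields(manifest)
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return manifest


if __name__ == "__main__":
    pr = build_protection_request(
        "protect-web-server", "dr-deployment", "web-server", "production", "cluster1"
    )
    # kubectl apply -f accepts JSON as well as YAML, so this output can be
    # written to a file and applied directly.
    print(json.dumps(pr, indent=2))
```

This only checks that the fields are present. Whether the VM exists and whether its storage classes are covered is still decided by the controller during the Validating phase.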
Status phases (written by the controller, read-only for you):
| Phase | Meaning |
|---|---|
| Pending | Resource created; controller has not yet begun processing. |
| Validating | Controller is verifying the VM exists, its PVCs are accessible, and a matching DRBDReplicationPolicy covers all storage classes. |
| CreatingDRBD | Controller is creating DRBDVolume resources for each PVC on both workload clusters. |
| Syncing | DRBD initial sync (full data copy) is in progress from primary to DR. |
| ReadyToActivate | Initial sync is complete; all data is consistent across clusters. |
| Activating | Controller is stopping the VM and switching it to DRBD-backed frontend PVCs. |
| Protected | VM is running on DRBD-backed PVCs; synchronous replication is active. |
A terminal Failed phase may also appear if the controller encounters an unrecoverable error. Inspect .status.message and the Protection Controller logs for details.
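If you script around these phases, it can help to encode their order once. The sketch below is a minimal illustration based only on the phase table above; the names `PHASES`, `is_settled`, and `remaining_phases` are hypothetical and do not come from the controller's source.

```python
# Phase order as listed in the status table above.
PHASES = [
    "Pending",
    "Validating",
    "CreatingDRBD",
    "Syncing",
    "ReadyToActivate",
    "Activating",
    "Protected",
]


def is_settled(phase):
    """True once the request will not advance on its own: success or failure."""
    return phase in ("Protected", "Failed")


def remaining_phases(phase):
    """Return the phases still ahead of `phase` on the success path."""
    if phase == "Failed":
        return []
    # list.index raises ValueError for a phase name this sketch does not know.
    return PHASES[PHASES.index(phase) + 1:]


if __name__ == "__main__":
    print(remaining_phases("Syncing"))
```

A polling script could stop waiting as soon as `is_settled` returns True and then branch on whether the phase is Protected or Failed.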
You manage ProtectionRequest resources with kubectl against the quorum cluster. The DRBD Operator's Protection Controller watches for new ProtectionRequest objects and drives the VM through the protection workflow automatically.
List all protection requests in a namespace:
kubectl --kubeconfig $KUBECONFIG_QUORUM get protectionrequest -n dr-deployment
# or using short names:
kubectl --kubeconfig $KUBECONFIG_QUORUM get pr -n dr-deployment
kubectl --kubeconfig $KUBECONFIG_QUORUM get protect -n dr-deployment
Watch protection progress in real time:
kubectl --kubeconfig $KUBECONFIG_QUORUM get pr -n dr-deployment -w
Inspect a specific request for detailed status:
kubectl --kubeconfig $KUBECONFIG_QUORUM get pr protect-my-vm -n dr-deployment -o yaml
Pay attention to .status.phase and .status.message. During Syncing, the controller will surface sync progress information in .status.conditions if available.
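For scripted checks, the same fields can be read from `-o json` output instead of YAML. The following sketch assumes you have captured that JSON as a string (for example via `kubectl ... get pr protect-my-vm -n dr-deployment -o json`); the function name `summarize_status` is hypothetical, and the shape of individual `.status.conditions` entries is not documented here, so the sketch returns them untouched.

```python
import json


def summarize_status(pr_json):
    """Extract name, phase, message, and raw conditions from a ProtectionRequest."""
    obj = json.loads(pr_json)
    status = obj.get("status") or {}  # a freshly created object may have no status yet
    return {
        "name": obj.get("metadata", {}).get("name"),
        "phase": status.get("phase"),
        "message": status.get("message"),
        # Passed through as-is; the entry format is an unknown in this sketch.
        "conditions": status.get("conditions") or [],
    }


if __name__ == "__main__":
    sample = json.dumps({
        "metadata": {"name": "protect-my-vm"},
        "status": {
            "phase": "Syncing",
            "message": "DRBD initial sync in progress for 2 volumes",
        },
    })
    print(summarize_status(sample))
```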
What the controller does on your behalf (you do not need to perform these steps manually):
- Validates the VM and all of its PVCs on the primary cluster.
- Confirms a DRBDReplicationPolicy covers every storage class in use.
- Creates DRBDVolume resources on both workload clusters for each PVC.
- Waits for DRBD initial synchronization to complete (the VM remains running during this phase).
- Stops the VM, replaces its PVCs with DRBD-backed frontend PVCs, and restarts it.
- Sets the phase to Protected once the VM is running on replicated storage.
All API calls from the controller flow from the quorum cluster outward to the workload clusters. The workload clusters have no direct knowledge of the quorum cluster.
Example 1: Protect a single VM
Create a manifest for a VM named web-server running in the production namespace on cluster1:
apiVersion: siterecovery.trilio.io/v1alpha1
kind: ProtectionRequest
metadata:
name: protect-web-server
namespace: dr-deployment
spec:
vmName: web-server
vmNamespace: production
sourceCluster: cluster1
Apply it to the quorum cluster:
kubectl --kubeconfig $KUBECONFIG_QUORUM apply -f protect-web-server.yaml
Expected progression when you watch:
NAME                 PHASE             AGE
protect-web-server   Pending           0s
protect-web-server   Validating        3s
protect-web-server   CreatingDRBD      12s
protect-web-server   Syncing           30s
protect-web-server   ReadyToActivate   4m
protect-web-server   Activating        4m15s
protect-web-server   Protected         4m45s
The Syncing phase duration depends on the size of the VM's disks and available network bandwidth between clusters.
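A rough lower bound for the Syncing phase is total disk size divided by usable bandwidth. The sketch below shows that arithmetic only; the function name and the `efficiency` parameter are assumptions for illustration, and real sync times also depend on disk throughput, DRBD resync settings, and competing traffic, none of which are modeled here.

```python
def estimate_sync_seconds(total_disk_gib, bandwidth_mbit_per_s, efficiency=1.0):
    """Estimate full-copy time in seconds for an initial sync.

    total_disk_gib: combined size of the VM's disks in GiB (2**30 bytes each)
    bandwidth_mbit_per_s: link speed in megabits per second (10**6 bits)
    efficiency: assumed fraction of the link actually usable, in (0, 1]
    """
    if bandwidth_mbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    total_bits = total_disk_gib * 1024**3 * 8
    return total_bits / (bandwidth_mbit_per_s * 1_000_000 * efficiency)


if __name__ == "__main__":
    # 100 GiB over a fully usable 1 Gbit/s link: about 859 s, a little over 14 minutes.
    seconds = estimate_sync_seconds(100, 1000)
    print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")
```

If the observed Syncing time is far above this estimate, treat it as a hint to check the network path between the workload clusters rather than as a precise threshold.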
Example 2: Inspect detailed status during sync
kubectl --kubeconfig $KUBECONFIG_QUORUM get pr protect-web-server \
-n dr-deployment -o yaml
Expected .status block while syncing:
status:
phase: Syncing
message: "DRBD initial sync in progress for 2 volumes"
Example 3: List all protected VMs across a deployment namespace
kubectl --kubeconfig $KUBECONFIG_QUORUM get pr -n dr-deployment
Example output:
NAME                 PHASE       AGE
protect-web-server   Protected   2d
protect-database     Protected   2d
protect-cache        Syncing     5m
Example 4: Check what DRBDVolumes were created for a protected VM
Once the ProtectionRequest reaches Protected, you can inspect the resulting DRBDVolume resources on the quorum cluster:
kubectl --kubeconfig $KUBECONFIG_QUORUM get drbdvolumes \
-n dr-deployment \
-l siterecovery.trilio.io/protected-vm=web-server
Issue: ProtectionRequest stays in Pending indefinitely
Symptom: Phase does not advance past Pending after several minutes.
Likely cause: The Protection Controller is not running or is not watching the namespace.
Fix:
# Check controller pod status on the quorum cluster
kubectl --kubeconfig $KUBECONFIG_QUORUM get pods \
-n dr-deployment -l app=protection-controller
# View controller logs
kubectl --kubeconfig $KUBECONFIG_QUORUM logs \
-n dr-deployment -l app=protection-controller --tail=50
If the pod is not present, redeploy it:
kubectl --kubeconfig $KUBECONFIG_QUORUM apply \
-f deploy/crds/protection-controller-deployment.yaml
Issue: Validating phase fails — VM or PVCs not found
Symptom: Phase transitions to Failed; .status.message references a missing VM or PVC.
Likely cause: spec.vmName, spec.vmNamespace, or spec.sourceCluster does not match the actual VM, or the primary-kubeconfig secret does not have access to that cluster/namespace.
Fix: Verify the VM name and namespace directly on the primary cluster:
kubectl --kubeconfig $KUBECONFIG_CLUSTER1 get vm \
-n <vmNamespace> <vmName>
Also confirm the kubeconfig secret is present on the quorum cluster:
kubectl --kubeconfig $KUBECONFIG_QUORUM get secret primary-kubeconfig \
-n dr-deployment
Issue: Validating phase fails — no matching DRBDReplicationPolicy
Symptom: Phase transitions to Failed; message indicates a storage class is not covered.
Likely cause: One or more of the VM's PVCs uses a storage class not listed in any DRBDReplicationPolicy in the same namespace.
Fix: Check which storage classes the VM's PVCs use:
kubectl --kubeconfig $KUBECONFIG_CLUSTER1 get pvc \
-n <vmNamespace> -o wide
Then inspect your DRBDReplicationPolicy and add any missing storageClassMappings:
kubectl --kubeconfig $KUBECONFIG_QUORUM get drbdreplicationpolicies \
-n dr-deployment -o yaml
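The comparison between the two outputs can be sketched in Python. This example assumes you have the PVC list as JSON (`kubectl ... get pvc -n <vmNamespace> -o json`) and have read the covered storage class names out of your DRBDReplicationPolicy by hand; the policy's exact schema is not assumed. The function name `uncovered_storage_classes` is hypothetical. Note that the PVC list covers the whole namespace, so it may include PVCs that do not belong to the VM.

```python
import json


def uncovered_storage_classes(pvc_list_json, covered_classes):
    """Map each storage class not in `covered_classes` to the PVCs that use it."""
    covered = set(covered_classes)
    uncovered = {}
    for pvc in json.loads(pvc_list_json).get("items", []):
        # spec.storageClassName is the standard PVC field; it may be unset.
        storage_class = pvc.get("spec", {}).get("storageClassName") or "<none>"
        if storage_class not in covered:
            uncovered.setdefault(storage_class, []).append(pvc["metadata"]["name"])
    return uncovered


if __name__ == "__main__":
    # Hypothetical PVC and storage class names, for illustration only.
    sample = json.dumps({"items": [
        {"metadata": {"name": "web-server-root"}, "spec": {"storageClassName": "fast-ssd"}},
        {"metadata": {"name": "web-server-data"}, "spec": {"storageClassName": "bulk-hdd"}},
    ]})
    print(uncovered_storage_classes(sample, ["fast-ssd"]))
```

Any storage class reported here needs a mapping in the policy before the ProtectionRequest can pass validation.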
Issue: Syncing phase takes much longer than expected or never completes
Symptom: Phase stays in Syncing for hours; expected sync time based on disk size and bandwidth is much lower.
Likely cause: Network latency or bandwidth between primary and DR cluster nodes is insufficient. DRBD replication runs directly between workload cluster nodes — if those nodes cannot reach each other on ports 7000–7999, sync stalls.
Fix: Verify DRBD port connectivity between a primary cluster node and a DR cluster node:
# From a node on the primary cluster:
nc -zv <dr-node-ip> 7000
Check DRBD resource status on the primary cluster nodes:
kubectl --kubeconfig $KUBECONFIG_CLUSTER1 exec -n <drbd-namespace> \
<drbd-node-agent-pod> -- drbdadm status
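Where nc is not available on a node, a similar probe can be written with Python's standard library. The sketch below attempts a TCP connection and reports how long the connect took in milliseconds; the function name `probe_tcp` is hypothetical. TCP connect time is only a rough stand-in for round-trip latency, and a port answers only while something (here, a DRBD resource) is actually listening on it.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Return TCP connect time in milliseconds, or None if the port is unreachable."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return None


if __name__ == "__main__":
    # Replace with a DR node IP; 7000 is the first port of the DRBD range.
    elapsed = probe_tcp("127.0.0.1", 7000, timeout=1.0)
    print("unreachable" if elapsed is None else f"connected in {elapsed:.1f} ms")
```

A result of None points to a firewall or routing problem, while connect times well above the 10 ms target suggest the latency requirement is not being met.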
Issue: VM does not restart after Activating phase
Symptom: Phase reaches Activating but the VM does not come back up; phase may regress or show Failed.
Likely cause: The frontend PVC swap encountered an error, or the node where the VM was scheduled does not yet have DRBD resources up (the node agent applies a NoSchedule taint during startup).
Fix: Check node taints on primary cluster nodes:
kubectl --kubeconfig $KUBECONFIG_CLUSTER1 get nodes \
-o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'
If siterecovery.trilio.io/not-ready:NoSchedule is present on the target node, wait for the DRBD node agent to complete its startup sequence and remove the taint. This taint is automatically removed once all DRBD resources on that node are in a Connected state (up to a 120-second timeout). Also inspect the Protection Controller logs for the specific error encountered during the PVC swap.
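To pick out only the nodes that still carry this taint, you can filter the node list as JSON (`kubectl ... get nodes -o json`). The sketch below relies on the standard `.spec.taints` node field and the taint key named above; the function name `nodes_with_not_ready_taint` is hypothetical.

```python
import json

# Taint key and effect as described in the troubleshooting text above.
TAINT_KEY = "siterecovery.trilio.io/not-ready"
TAINT_EFFECT = "NoSchedule"


def nodes_with_not_ready_taint(node_list_json):
    """Return the names of nodes that still carry the DRBD not-ready taint."""
    blocked = []
    for node in json.loads(node_list_json).get("items", []):
        taints = node.get("spec", {}).get("taints") or []  # field is absent when untainted
        if any(t.get("key") == TAINT_KEY and t.get("effect") == TAINT_EFFECT for t in taints):
            blocked.append(node["metadata"]["name"])
    return blocked


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    sample = json.dumps({"items": [
        {"metadata": {"name": "worker-1"},
         "spec": {"taints": [{"key": TAINT_KEY, "effect": TAINT_EFFECT}]}},
        {"metadata": {"name": "worker-2"}, "spec": {}},
    ]})
    print(nodes_with_not_ready_taint(sample))
```

An empty list means no node is blocked by this taint, so the cause is more likely in the PVC swap itself and the controller logs are the next place to look.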