
Phased deployment is how most data center teams make liquid-adjacent upgrades operationally survivable: you prove assumptions on a contained slice of production, harden the playbook, then scale row-by-row or pod-by-pod.
This guide is written for project directors and build/expansion leads evaluating rear-door heat exchangers (RDHx) as a high-density rack cooling retrofit path. It is an evaluation-stage guide by design: it covers criteria, deliverables, responsibilities, and risk controls, and it is not a product pitch.
Key Takeaway: Treat RDHx as a program with gated artifacts (readiness pack → PoC report → rollout runbook), not an “install.” Your schedule stays intact when every phase has a pass/fail definition and a rollback plan.
Definitions (keep everyone aligned)
RDHx: A rear-door heat exchanger mounted on the back of a rack to capture heat at the exhaust and reject it to a water loop.
FAT / SAT: Factory vs. site acceptance testing—evidence that equipment and the installed system meet predefined criteria.
Change window: A pre-approved period for cutover work with a defined rollback path.
KPIs: The specific, measurable outcomes you’ll use to decide whether to scale.
Step 1: Align on scope and success criteria (before anyone draws piping)
Inputs
Current rack density bands, target density, and growth assumptions
Constraints: aisle geometry, rear clearance, floor loading, maintenance access
Operational constraints: approved change windows and downtime tolerance
Actions
Define the initial deployment boundary (e.g., a single row/pod or a defined set of racks).
Set acceptance criteria in three buckets:
Thermal: inlet temperature compliance band at top/mid/bottom; stable exhaust temperature; acceptable ΔT across the RDHx.
Operational: alarm response time; leak-event severity model; mean time to isolate and recover.
Program: schedule adherence by gate; installation time per rack; documentation completeness.
Decide “must-have” vs. “nice-to-have” instrumentation (sensors, trending cadence, alerting).
Outputs
Deployment charter (scope + gates + owners)
KPI sheet (pass/fail thresholds + measurement method)
Change control outline (how approvals happen)
Done when…
Every KPI has: owner, measurement method, frequency, and a decision rule (what happens if it fails).
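As a minimal sketch of that "done when" check, the KPI sheet can be held as structured records and scanned for blank fields. The KPI names, owners, and cadences below are hypothetical placeholders, not values from any standard.

```python
from dataclasses import dataclass, fields

@dataclass
class Kpi:
    name: str
    owner: str
    method: str          # how the value is measured
    frequency: str       # how often it is measured
    decision_rule: str   # what happens if the KPI fails

def missing_fields(kpi: Kpi) -> list[str]:
    """Return the names of fields left blank on a KPI entry."""
    return [f.name for f in fields(kpi) if not getattr(kpi, f.name).strip()]

# Hypothetical entries for illustration only.
kpi_sheet = [
    Kpi("inlet_temp_compliance", "facilities lead",
        "top/mid/bottom inlet sensors", "1 min trend",
        "fail: pause rollout, review at next gate"),
    Kpi("mean_time_to_isolate", "ops lead",
        "timed isolation drill", "per drill", ""),
]

incomplete = {k.name: missing_fields(k) for k in kpi_sheet if missing_fields(k)}
print(incomplete)  # {'mean_time_to_isolate': ['decision_rule']}
```

An empty result would mean every KPI carries an owner, method, frequency, and decision rule.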
Step 2: Run a site readiness assessment (civil, mechanical, controls, ops)
RDHx retrofits usually fail in the gaps between disciplines: physical clearance, loop interfaces, or monitoring assumptions.
Inputs
Existing mechanical drawings and as-builts (or a validated field survey)
Existing monitoring stack (BMS/DCIM), alarm routing, and on-call model
Actions
Validate readiness items that show up repeatedly in planning guidance, including rear clearance, piping routing, and the need for added sensors/valves. IBM’s planning checklist is a credible baseline for pre-work you should expect before installation (IBM “Planning for the installation of rear door heat exchangers”).
Use an engineering-style checklist mindset: the LBNL/DOE-FEMP RDHx guide calls out practical readiness details such as isolation/balancing valves and extra temperature sensors (LBNL “Data Center Rack Cooling with Rear-door Heat Exchanger (RDHx)” PDF).
Identify integration touchpoints you will treat as deliverables, not assumptions:
isolation and balancing valves placement
leak detection and response (detection, isolation, notification, documentation)
sensor placement and naming conventions for trending
BMS alarming paths and escalation
Outputs
Readiness punch list (ranked by criticality + lead time)
Integration map (what must connect to what, and who owns each interface)
Risk register v1 (top failure modes + mitigations)
Done when…
Every readiness item has an owner and a “ready-by” date aligned to the PoC change window.
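One way to keep the punch list honest is to rank it and flag dates that miss the PoC window automatically. The sketch below assumes a hypothetical 1-to-3 criticality scale, placeholder dates, and illustrative item names.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ReadinessItem:
    title: str
    owner: str
    criticality: int      # assumed scale: 1 = blocks PoC, 3 = minor
    lead_time_days: int
    ready_by: date

poc_window = date(2030, 3, 15)  # placeholder PoC change-window date

# Hypothetical punch-list entries for illustration only.
items = [
    ReadinessItem("Rear clearance survey", "facilities", 1, 10, date(2030, 2, 1)),
    ReadinessItem("Isolation valve procurement", "integrator", 1, 45, date(2030, 3, 20)),
    ReadinessItem("Sensor naming convention", "IT", 2, 5, date(2030, 2, 15)),
]

# Most critical first; within a criticality level, longest lead time first.
ranked = sorted(items, key=lambda i: (i.criticality, -i.lead_time_days))

# Items whose ready-by date falls after the PoC change window.
late = [i.title for i in items if i.ready_by > poc_window]

print([i.title for i in ranked])
print(late)  # ['Isolation valve procurement']
```

Anything in `late` either needs an earlier date or a later PoC window before the step can be called done.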
Step 3: Design the phased architecture (RDHx + loop interface + controls)
This step is where you decide whether the rollout is repeatable.
Inputs
Target deployment boundary from Step 1
Readiness findings from Step 2
Actions
Define the rollout unit: per rack (slowest, lowest risk) vs. per row/pod (faster, needs tighter change control).
Specify the loop interface requirements you’ll validate during PoC:
required flow/pressure envelope at the door interface
isolation strategy (valves, quick disconnects, dripless couplings where applicable)
drain/fill/purge procedures and who performs them
If your design uses a coolant distribution unit (CDU) on the secondary loop, define it as its own workstream with a separate acceptance checklist (don’t hide it inside “piping”).
Define monitoring and control responsibilities:
what you will trend (temps, flow, alarms) and at what cadence
which alarms are “stop work” vs. “log and proceed”
Define the “what if RDHx is offline” posture:
do you have an air-cooling fallback plan and a scripted response during door-open or RDHx-offline conditions?
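For the flow envelope item in the actions above, a first-pass estimate can come from the water-side heat balance Q = ṁ · cp · ΔT. This is a rough sketch that assumes plain water, a known water-side temperature rise, and a placeholder rack load. Glycol mixtures and vendor performance data will shift the result, so treat it as a sanity check rather than a design value.

```python
# Heat balance on the water side: Q = m_dot * cp * dT
CP_WATER_KJ_PER_KG_K = 4.186   # specific heat of plain water (approximate)
DENSITY_KG_PER_L = 1.0         # approximate; glycol mixes differ

def required_flow_l_per_min(heat_load_kw: float, water_delta_t_k: float) -> float:
    """Estimate the water flow needed to carry a heat load at a given water-side rise."""
    if water_delta_t_k <= 0:
        raise ValueError("water-side delta T must be positive")
    mass_flow_kg_s = heat_load_kw / (CP_WATER_KJ_PER_KG_K * water_delta_t_k)
    return mass_flow_kg_s / DENSITY_KG_PER_L * 60.0

# Placeholder inputs: a 30 kW rack and a 6 K water-side temperature rise.
flow = required_flow_l_per_min(heat_load_kw=30.0, water_delta_t_k=6.0)
print(f"{flow:.1f} L/min")  # 71.7 L/min
```

Running the same function across the density bands from Step 1 gives a quick view of whether the existing loop can plausibly serve the rollout unit.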
Outputs
Reference design package (repeatable for rollout)
Controls/alarming spec (points list + naming + thresholds)
Cutover/rollback playbook draft
Done when…
A second team could deploy the next row using only your design package and runbook—no tribal knowledge required.
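The "stop work" versus "log and proceed" split in the controls/alarming spec can be written down as data so a second team does not have to guess. The point names and their assignments below are hypothetical; the only design choice being illustrated is that unknown alarms fall back to the conservative action.

```python
from enum import Enum

class Action(Enum):
    STOP_WORK = "stop work"
    LOG_AND_PROCEED = "log and proceed"

# Hypothetical alarm names and assignments for illustration only.
ALARM_POLICY = {
    "leak_detected": Action.STOP_WORK,
    "loop_flow_low": Action.STOP_WORK,
    "inlet_temp_high": Action.STOP_WORK,
    "sensor_comm_loss": Action.LOG_AND_PROCEED,
}

def classify(alarm: str) -> Action:
    """Look up the response for an alarm; unlisted alarms default to stop work."""
    return ALARM_POLICY.get(alarm, Action.STOP_WORK)

print(classify("sensor_comm_loss").value)  # log and proceed
print(classify("unmapped_alarm").value)    # stop work
```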
Step 4: Execute a PoC (prove performance and operability)
A PoC that only proves “it cools” is incomplete. You need to prove the operations model: alarms, isolation, recovery, documentation.
Inputs
Reference design package
KPI sheet
Change window approval
Actions
Build a PoC test plan that covers:
normal load operation
transient behavior (load steps)
alarm validation (sensor values, alert routing)
isolation drills (how quickly you can isolate and recover)
documentation completeness (as-builts, labels, point list)
Outputs
PoC report (results vs. KPIs + issues + remediation plan)
Updated risk register (new failure modes found in practice)
Rollout playbook v1 (what changes for scale)
Done when…
You can answer, in writing: “What did we learn that changes the rollout sequence, staffing, or change-window length?”
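The "results vs. KPIs" section of the PoC report can be generated mechanically once thresholds exist. The sketch below uses made-up KPI names, limits, and measurements, and it treats a missing measurement as its own outcome rather than a silent pass.

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    kpi: str
    limit: float
    higher_is_better: bool

def evaluate(thresholds: list[Threshold], measured: dict[str, float]) -> dict[str, str]:
    """Compare measured PoC values with thresholds and label each KPI."""
    report = {}
    for t in thresholds:
        value = measured.get(t.kpi)
        if value is None:
            report[t.kpi] = "NO DATA"
        elif (value >= t.limit) if t.higher_is_better else (value <= t.limit):
            report[t.kpi] = "PASS"
        else:
            report[t.kpi] = "FAIL"
    return report

# Hypothetical thresholds and PoC measurements for illustration only.
thresholds = [
    Threshold("inlet_compliance_pct", 99.0, higher_is_better=True),
    Threshold("isolate_minutes", 10.0, higher_is_better=False),
    Threshold("alarm_ack_minutes", 5.0, higher_is_better=False),
]
measured = {"inlet_compliance_pct": 99.6, "isolate_minutes": 14.0}

report = evaluate(thresholds, measured)
print(report)
# {'inlet_compliance_pct': 'PASS', 'isolate_minutes': 'FAIL', 'alarm_ack_minutes': 'NO DATA'}
```

A "FAIL" or "NO DATA" row is the natural starting point for the remediation plan and for any change to change-window length.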
Step 5: Plan the rollout gates (row-by-row, with deliverables at each gate)
This is where phased deployment becomes a schedule tool.
Inputs
PoC report
Rollout unit definition
Actions
Define gates that match how procurement and operations actually work:
Gate A — Ready to procure and stage
Materials list, staging plan, spares, and logistics
Labeling standard and documentation structure
Gate B — Ready to install (per rollout unit)
Change ticket approved
Risk controls ready (isolation, leak response, comms plan)
Gate C — Ready to commission
SAT scripts, point-to-point checks, sign-off templates
Gate D — Ready for steady state
Monitoring dashboards live
Ops runbook + on-call training completed
Outputs
Rollout gate checklist (prereqs and pass/fail criteria)
RACI (who does what) per gate
Communications plan (who is informed at each checkpoint)
Done when…
Procurement can see what needs to be purchased and operations can see what needs to be trained and tested.
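The gate sequence above can be expressed as an ordered checklist, which makes it easy to report which gate a rollout unit is stuck at and why. The prerequisite strings are shorthand for the items listed under each gate (plus the rollback plan this guide requires at Gate B); the function name is an assumption for illustration.

```python
from typing import Optional

# Gates in order; prerequisite labels are shorthand for the items in this step.
GATES = {
    "A": ["materials list", "staging plan", "spares", "labeling standard"],
    "B": ["change ticket approved", "risk controls ready", "rollback plan"],
    "C": ["SAT scripts", "point-to-point checks", "sign-off templates"],
    "D": ["dashboards live", "ops runbook", "on-call training"],
}

def first_blocked_gate(completed: set[str]) -> Optional[tuple[str, list[str]]]:
    """Return the first gate with missing prerequisites, or None if all pass."""
    for gate, prereqs in GATES.items():
        missing = [p for p in prereqs if p not in completed]
        if missing:
            return gate, missing
    return None

# Hypothetical status for one rollout unit.
done = {
    "materials list", "staging plan", "spares", "labeling standard",
    "change ticket approved", "risk controls ready",
}
print(first_blocked_gate(done))  # ('B', ['rollback plan'])
```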
Step 6: Define RDHx acceptance testing and commissioning deliverables
Acceptance is the bridge between “installed” and “operable.”
Inputs
KPI sheet
Controls/alarming spec
Actions
Define evidence packages that are hard to argue with:
FAT package (vendor-provided): pressure/leak tightness evidence, QC docs, as-built component list.
SAT package (site): pressure test after install; sensor point-to-point validation; alarm routing confirmation; labeling verification.
Integrated checks: load validation and trending; failure-mode drills (e.g., isolate a unit, confirm fallback posture).
Outputs
SAT scripts and sign-off forms
Evidence folder structure (where documents live and how they’re named)
Done when…
You can hand the acceptance package to a third party (or your own QA) and they can verify readiness without interviews.
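A naming convention for the evidence folder is easier to enforce when it can be checked by script. The pattern below is a hypothetical convention (`<unit>_<package>_<artifact>_<YYYYMMDD>.pdf`), not one prescribed by any vendor or standard. Adapt the fields to your own document structure.

```python
import re

# Assumed convention: <unit>_<package>_<artifact>_<YYYYMMDD>.pdf
NAME_RE = re.compile(
    r"^(?P<unit>row\d{2})_(?P<package>FAT|SAT|INT)_"
    r"(?P<artifact>[a-z0-9-]+)_(?P<date>\d{8})\.pdf$"
)

def misnamed(files: list[str]) -> list[str]:
    """Return the file names that do not follow the assumed convention."""
    return [f for f in files if not NAME_RE.match(f)]

# Hypothetical evidence files for illustration only.
files = [
    "row03_SAT_pressure-test_20300312.pdf",
    "row03_SAT_alarm-routing_20300312.pdf",
    "Row3 labeling photos final.pdf",
]
print(misnamed(files))  # ['Row3 labeling photos final.pdf']
```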
Step 7: Lock KPIs and monitoring (so optimization is measurable)
A phased rollout is only rational if the KPIs decide the next action.
Inputs
Final KPI sheet
Monitoring/controls points list
Actions
Implement a simple KPI dashboard and review cadence:
thermal compliance trends (inlet temps by rack position)
incident log (leak alarms, false positives, isolation events)
change-window metrics (planned vs. actual cutover time)
Report efficiency metrics as observed trends, with the measurement conditions stated. Do not present them as guaranteed savings.
Outputs
KPI dashboard definition (data sources, owners, cadence)
Ops review cadence (weekly during rollout; monthly after steady state)
Done when…
A rollout gate cannot be passed without KPI evidence attached.
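Two of the dashboard metrics above reduce to simple arithmetic: cutover overrun as a percentage of the planned window, and the share of inlet readings inside the compliance band at each rack position. The readings, durations, and the 18 to 27 °C band below are placeholders. Use the band from your own KPI sheet.

```python
# Hypothetical cutover records for illustration only.
cutovers = [
    {"unit": "row01", "planned_min": 240, "actual_min": 300},
    {"unit": "row02", "planned_min": 240, "actual_min": 250},
]

def overrun_pct(c: dict) -> float:
    """Overrun of actual vs. planned cutover time, as a percentage of planned."""
    return 100.0 * (c["actual_min"] - c["planned_min"]) / c["planned_min"]

overruns = {c["unit"]: round(overrun_pct(c), 1) for c in cutovers}

BAND_C = (18.0, 27.0)  # placeholder compliance band in degrees C

# Hypothetical inlet temperature samples by rack position.
readings = {
    "top": [24.1, 26.8, 27.4, 25.0],
    "mid": [22.0, 22.5, 23.1, 22.8],
    "bottom": [19.5, 20.1, 19.9, 20.3],
}

def compliance_pct(temps: list[float]) -> float:
    """Percentage of readings that fall inside the compliance band."""
    in_band = sum(1 for t in temps if BAND_C[0] <= t <= BAND_C[1])
    return 100.0 * in_band / len(temps)

compliance = {pos: compliance_pct(t) for pos, t in readings.items()}

print(overruns)    # {'row01': 25.0, 'row02': 4.2}
print(compliance)  # {'top': 75.0, 'mid': 100.0, 'bottom': 100.0}
```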
Step 8: Training and handover (commissioning isn’t complete until ops can run it)
Inputs
Rollout playbook
SAT package
Actions
Train on what actually causes incidents:
alarm triage and escalation
isolation and recovery steps
maintenance intervals and spares
documentation expectations after every change
Outputs
Ops runbook + quick reference
Training completion log (who is signed off)
Done when…
On-call can respond using the runbook without calling engineering.
Step 9: Scale phased RDHx deployment with controlled change windows
Inputs
Rollout gates + acceptance packages
Post-PoC lessons learned
Actions
Deploy the next unit using the exact same gate sequence.
After each unit:
record deviations and why
update the playbook
update the risk register
If you’re using RDHx as one step in a broader high-density roadmap, use internal resources to help stakeholders understand where RDHx fits relative to other cooling paths (see the Coolnet AI data center cooling selection guide).
Outputs
Rollout playbook vN (living document)
As-built + evidence package per unit
Done when…
The Nth rollout is faster, safer, and more predictable than the first.
A practical RACI you can reuse (example)
| Workstream / Deliverable | Operator | Facilities | IT | Integrator | Coolnet |
|---|---|---|---|---|---|
| Scope + KPI sheet | A | R | C | C | C |
| Readiness punch list | C | A/R | C | R | C |
| Reference design package | C | A | C | R | C |
| RDHx equipment supply | C | C | C | C | A/R |
| Installation execution | C | A | C | R | C |
| Commissioning & SAT support | C | A | C | R | R |
| Monitoring points + alarm routing | R | C | A/R | R | C |
| Training + runbook | A/R | R | C | C | R |
Legend: R = Responsible, A = Accountable, C = Consulted
Common risks (and mitigations that survive audits)
Integration ambiguity (who owns the interface?) → Write interface owners into the RACI and acceptance package.
Leak-risk anxiety blocks approvals → Treat leak detection + isolation drills as required SAT evidence; document response times.
Change window overruns → Reduce rollout unit size; require a rollback plan at Gate B.
Documentation drift → Make “evidence package complete” a gate criterion.
Skills gap → Training sign-off is required before Gate D.
For stakeholders who want a high-level retrofit rationale (and tradeoffs to consider), Data Center Dynamics offers a useful frame in its 2026 opinion piece on RDHx refurbishments (“Top ten reasons to consider rear door heat exchangers when refurbishing data centers”).
⚠️ Warning: If the project can’t define “offline posture” (what happens when RDHx is unavailable), you’re not ready for production rollout—regardless of cooling performance.
Next steps (resources and how to engage)
If you want internal references your stakeholders can read alongside this playbook:
Coolnet liquid cooling solutions (where RDHx fits within the product family)
High-density compute clusters and advanced liquid cooling for AI (program context for AI/HPC density)
A validation mindset you can adapt to acceptance evidence: Coolnet modular data center validation checklist
To take the next step, request a commissioning checklist and SAT script aligned to your gate model, then book a technical fit call to validate readiness items and rollout sequencing.







