Phased RDHx Deployment With Coolnet: From Assessment to Rollout

Phased deployment is how most data center teams make liquid-adjacent upgrades operationally survivable: you prove assumptions on a contained slice of production, harden the playbook, then scale row-by-row or pod-by-pod.

This guide is written for project directors and build/expansion leads evaluating rear-door heat exchangers (RDHx) as a high-density rack cooling retrofit path. It is an evaluation-stage guide: it covers criteria, deliverables, responsibilities, and risk controls, and it is not a product pitch.

Key Takeaway: Treat RDHx as a program with gated artifacts (readiness pack → PoC report → rollout runbook), not an “install.” Your schedule stays intact when every phase has a pass/fail definition and a rollback plan.

Definitions (keep everyone aligned)

RDHx: A rear-door heat exchanger mounted on the back of a rack to capture heat at the exhaust and reject it to a water loop.
FAT / SAT: Factory vs. site acceptance testing—evidence that equipment and the installed system meet predefined criteria.
Change window: A pre-approved period for cutover work with a defined rollback path.
KPIs: The specific, measurable outcomes you’ll use to decide whether to scale.

Step 1: Align on scope and success criteria (before anyone draws piping)

Inputs

Current rack density bands, target density, and growth assumptions
Constraints: aisle geometry, rear clearance, floor loading, maintenance access
Operational constraints: approved change windows and downtime tolerance

Actions

Define the initial deployment boundary (e.g., a single row/pod or a defined set of racks).
Set acceptance criteria in three buckets:
- Thermal: inlet temperature compliance band at top/mid/bottom; stable exhaust temperature; acceptable ΔT across the RDHx.
- Operational: alarm response time; leak-event severity model; mean time to isolate and recover.
- Program: schedule adherence by gate; installation time per rack; documentation completeness.
Decide “must-have” vs. “nice-to-have” instrumentation (sensors, trending cadence, alerting).

Outputs

Deployment charter (scope + gates + owners)
KPI sheet (pass/fail thresholds + measurement method)
Change control outline (how approvals happen)

Done when…

Every KPI has: owner, measurement method, frequency, and a decision rule (what happens if it fails).
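
One way to make this check mechanical is to keep the KPI sheet as structured data and flag any entry with a blank field. The sketch below is illustrative only: the `Kpi` class, the field names, and the sample entries are assumptions, not part of any Coolnet tooling.

```python
from dataclasses import dataclass, fields


@dataclass
class Kpi:
    name: str
    owner: str
    measurement_method: str
    frequency: str
    decision_rule: str  # what happens if the KPI fails


def incomplete_kpis(kpis):
    """Return {KPI name: [blank field names]} for every KPI with a gap."""
    gaps = {}
    for kpi in kpis:
        missing = [f.name for f in fields(kpi) if not getattr(kpi, f.name).strip()]
        if missing:
            gaps[kpi.name] = missing
    return gaps


# Hypothetical entries; replace with your own KPI sheet.
kpi_sheet = [
    Kpi("inlet_temp_compliance", "Facilities lead",
        "top/mid/bottom inlet sensors", "1 min trend",
        "pause rollout and re-balance loop"),
    Kpi("mean_time_to_isolate", "Ops manager",
        "timed isolation drill", "per change window", ""),
]

gaps = incomplete_kpis(kpi_sheet)
print(gaps)  # {'mean_time_to_isolate': ['decision_rule']}
```

An empty result means every KPI meets the "done when" rule for this step.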

Step 2: Run a site readiness assessment (civil, mechanical, controls, ops)

RDHx retrofits usually fail in the gaps between disciplines: physical clearance, loop interfaces, or monitoring assumptions.

Inputs

Existing mechanical drawings and as-builts (or a validated field survey)
Existing monitoring stack (BMS/DCIM), alarm routing, and on-call model

Actions

Validate readiness items that show up repeatedly in planning guidance, including rear clearance, piping routing, and the need for added sensors/valves. IBM’s planning checklist is a credible baseline for pre-work you should expect before installation (IBM “Planning for the installation of rear door heat exchangers”).
Work from an engineering checklist: the LBNL/DOE-FEMP RDHx guide calls out practical readiness details such as isolation/balancing valves and extra temperature sensors (LBNL “Data Center Rack Cooling with Rear-door Heat Exchanger (RDHx)” PDF).
Identify integration touchpoints you will treat as deliverables, not assumptions:
- isolation and balancing valves placement
- leak detection and response (detection, isolation, notification, documentation)
- sensor placement and naming conventions for trending
- BMS alarming paths and escalation

Outputs

Readiness punch list (ranked by criticality + lead time)
Integration map (what must connect to what, and who owns each interface)
Risk register v1 (top failure modes + mitigations)

Done when…

Every readiness item has an owner and a “ready-by” date aligned to the PoC change window.
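
The punch list ranking and the ready-by check can be sketched in a few lines. The criticality scale (1 = blocks the PoC), the item fields, and the sample dates below are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ReadinessItem:
    description: str
    owner: str
    criticality: int      # assumed scale: 1 = blocks the PoC, 3 = minor
    lead_time_days: int
    ready_by: date


def rank_punch_list(items):
    """Most critical first; within a criticality band, longest lead time first."""
    return sorted(items, key=lambda i: (i.criticality, -i.lead_time_days))


def late_items(items, poc_window_start):
    """Items whose ready-by date falls after the PoC change window opens."""
    return [i for i in items if i.ready_by > poc_window_start]


# Hypothetical punch list and change window.
poc_window_start = date(2025, 3, 15)
items = [
    ReadinessItem("Add balancing valves", "Integrator", 1, 42, date(2025, 3, 1)),
    ReadinessItem("Rename BMS points", "IT", 2, 7, date(2025, 3, 20)),
    ReadinessItem("Confirm rear clearance", "Facilities", 1, 3, date(2025, 2, 10)),
]

ranked = rank_punch_list(items)
late = late_items(items, poc_window_start)
print([i.description for i in ranked])
print([i.description for i in late])
```

Any item returned by `late_items` either needs an earlier date or forces the PoC window to move.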

Step 3: Design the phased architecture (RDHx + loop interface + controls)

This step is where you decide whether the rollout is repeatable.

Inputs

Target deployment boundary from Step 1
Readiness findings from Step 2

Actions

Define the rollout unit: per rack (slowest, lowest risk) vs. per row/pod (faster, needs tighter change control).
Specify the loop interface requirements you’ll validate during PoC:
- required flow/pressure envelope at the door interface
- isolation strategy (valves, quick disconnects, dripless couplings where applicable)
- drain/fill/purge procedures and who performs them
If your design uses a coolant distribution unit (CDU) on the secondary loop, define it as its own workstream with a separate acceptance checklist (don’t hide it inside “piping”).
Define monitoring and control responsibilities:
- what you will trend (temps, flow, alarms) and at what cadence
- which alarms are “stop work” vs. “log and proceed”
Define the “what if RDHx is offline” posture:
- do you have an air-cooling fallback plan and a scripted response during door-open or RDHx-offline conditions?
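
The flow envelope at the door interface can be sanity-checked with a basic heat balance, Q = m_dot x c_p x delta-T, before the PoC. The sketch below assumes plain water and a water-side temperature rise; the 30 kW load and 6 K rise are placeholder inputs, not vendor data, and glycol mixtures would need different property values.

```python
WATER_CP_KJ_PER_KG_K = 4.18    # approximate specific heat of water
WATER_DENSITY_KG_PER_L = 1.0   # approximate; glycol mixes differ


def required_flow_lpm(heat_kw, water_delta_t_k):
    """Water flow (L/min) needed to carry heat_kw at a given water-side rise."""
    if water_delta_t_k <= 0:
        raise ValueError("water_delta_t_k must be positive")
    mass_flow_kg_s = heat_kw / (WATER_CP_KJ_PER_KG_K * water_delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60


# Placeholder inputs: 30 kW rack load, 6 K water-side temperature rise.
flow = required_flow_lpm(heat_kw=30, water_delta_t_k=6)
print(round(flow, 1))  # 71.8
```

Compare the result with the flow range the door vendor specifies; this estimate does not replace the vendor's performance data.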

Outputs

Reference design package (repeatable for rollout)
Controls/alarming spec (points list + naming + thresholds)
Cutover/rollback playbook draft

Done when…

A second team could deploy the next row using only your design package and runbook—no tribal knowledge required.
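
A points list is easier to hand to a second team when names follow a checkable pattern and each measurement maps to an alarm class. The naming pattern (`R<row>-K<rack>-<measurement>[-<position>]`) and the class assignments below are hypothetical; substitute your own BMS/DCIM conventions.

```python
import re

# Hypothetical naming convention: R<row>-K<rack>-<measurement>[-<position>]
POINT_NAME = re.compile(
    r"R\d{2}-K\d{2}-(SUPPLY_T|RETURN_T|INLET_T|FLOW|LEAK)(-(TOP|MID|BOT))?"
)

# Hypothetical alarm classes per measurement type.
ALARM_CLASS = {
    "LEAK": "stop_work",
    "FLOW": "stop_work",
    "INLET_T": "log_and_proceed",
}


def invalid_points(names):
    """Return point names that do not follow the naming convention."""
    return [n for n in names if not POINT_NAME.fullmatch(n)]


def alarm_action(point_name):
    """Look up the alarm class; unclassified measurements default to stop_work."""
    measurement = point_name.split("-")[2]
    return ALARM_CLASS.get(measurement, "stop_work")


points = ["R01-K04-INLET_T-TOP", "R01-K04-LEAK", "row1_rack4_temp"]
bad = invalid_points(points)
print(bad)                                  # ['row1_rack4_temp']
print(alarm_action("R01-K04-LEAK"))         # stop_work
print(alarm_action("R01-K04-INLET_T-TOP"))  # log_and_proceed
```

Defaulting unknown measurements to "stop work" is a deliberately conservative choice in this sketch: an unclassified alarm halts the job until someone classifies it.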

Step 4: Execute a PoC (prove performance and operability)

A PoC that only proves “it cools” is incomplete. You need to prove the operations model: alarms, isolation, recovery, documentation.

Inputs

Reference design package
KPI sheet
Change window approval

Actions

Build a PoC test plan that covers:
- normal load operation
- transient behavior (load steps)
- alarm validation (sensor values, alert routing)
- isolation drills (how quickly you can isolate and recover)
- documentation completeness (as-builts, labels, point list)

Outputs

PoC report (results vs. KPIs + issues + remediation plan)
Updated risk register (new failure modes found in practice)
Rollout playbook v1 (what changes for scale)

Done when…

You can answer, in writing: “What did we learn that changes the rollout sequence, staffing, or change-window length?”
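
The "results vs. KPIs" part of the PoC report can be generated from the KPI sheet rather than written by hand. This is a minimal sketch; the KPI names and threshold values are placeholders, and a KPI with no measurement is reported as "missing" instead of being silently passed.

```python
def evaluate_poc(results, thresholds):
    """Return a pass/fail/missing verdict per KPI.

    thresholds maps KPI name -> (limit, direction), where direction is
    "max" (measured must be <= limit) or "min" (measured must be >= limit).
    """
    verdicts = {}
    for name, (limit, direction) in thresholds.items():
        measured = results.get(name)
        if measured is None:
            verdicts[name] = "missing"
        elif direction == "max":
            verdicts[name] = "pass" if measured <= limit else "fail"
        else:
            verdicts[name] = "pass" if measured >= limit else "fail"
    return verdicts


# Placeholder thresholds and measurements.
thresholds = {
    "max_inlet_temp_c": (27.0, "max"),
    "isolation_time_s": (120, "max"),
    "alarm_routing_success_pct": (100, "min"),
}
results = {"max_inlet_temp_c": 25.5, "isolation_time_s": 150}

verdicts = evaluate_poc(results, thresholds)
print(verdicts)
```

Here the isolation drill fails and alarm routing was never measured, so both items belong in the remediation plan.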

Step 5: Plan the rollout gates (row-by-row, with deliverables at each gate)

This is where phased deployment becomes a schedule tool.

Inputs

PoC report
Rollout unit definition

Actions

Define gates that match how procurement and operations actually work:

Gate A — Ready to procure and stage
- Materials list, staging plan, spares, and logistics
- Labeling standard and documentation structure
Gate B — Ready to install (per rollout unit)
- Change ticket approved
- Risk controls ready (isolation, leak response, comms plan)
Gate C — Ready to commission
- SAT scripts, point-to-point checks, sign-off templates
Gate D — Ready for steady state
- Monitoring dashboards live
- Ops runbook + on-call training completed

Outputs

Rollout gate checklist (prereqs and pass/fail criteria)
RACI (who does what) per gate
Communications plan (who is informed at each checkpoint)

Done when…

Procurement can see what needs to be purchased and operations can see what needs to be trained and tested.
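
The gate sequence can be expressed as an ordered checklist in which a rollout unit sits at the first gate with an unmet prerequisite. The prerequisite names below are taken from the gates in this step; the data layout and function names are assumptions.

```python
GATES = ["A", "B", "C", "D"]


def next_open_gate(checklist):
    """Return the first gate with an unmet (or undefined) prerequisite, else None."""
    for gate in GATES:
        prereqs = checklist.get(gate, {})
        if not prereqs or not all(prereqs.values()):
            return gate
    return None


def blockers(checklist, gate):
    """List the prerequisites still open for a gate."""
    return [name for name, done in checklist.get(gate, {}).items() if not done]


# Hypothetical status for one rollout unit.
row_03 = {
    "A": {"Materials list": True, "Staging plan": True, "Labeling standard": True},
    "B": {"Change ticket approved": True, "Risk controls ready": False},
    "C": {"SAT scripts": False, "Sign-off templates": False},
    "D": {"Dashboards live": False, "On-call training completed": False},
}

gate = next_open_gate(row_03)
print(gate, blockers(row_03, gate))  # B ['Risk controls ready']
```

A gate with no prerequisites defined is treated as not passed, so an empty checklist cannot wave a unit through.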

Step 6: Define RDHx acceptance testing and commissioning deliverables

Acceptance is the bridge between “installed” and “operable.”

Inputs

KPI sheet
Controls/alarming spec

Actions

Define evidence packages that are hard to argue with:
- FAT package (vendor-provided): pressure/leak tightness evidence, QC docs, as-built component list.
- SAT package (site): pressure test after install; sensor point-to-point validation; alarm routing confirmation; labeling verification.
- Integrated checks: load validation and trending; failure-mode drills (e.g., isolate a unit, confirm fallback posture).

Outputs

SAT scripts and sign-off forms
Evidence folder structure (where documents live and how they’re named)

Done when…

You can hand the acceptance package to a third party (or your own QA) and they can verify readiness without interviews.
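
If the evidence folder follows a naming convention, completeness can be verified without interviews by comparing the expected file set with what is actually present. The convention `<unit>_<package>_<document>.pdf` and the document keys below are hypothetical.

```python
# Hypothetical required documents per package, based on the lists above.
REQUIRED_EVIDENCE = {
    "FAT": ["pressure_test", "qc_docs", "component_list"],
    "SAT": ["pressure_test", "point_to_point", "alarm_routing", "labeling"],
}


def expected_filenames(unit_id):
    """Build the expected file names: <unit>_<package>_<document>.pdf."""
    return {
        f"{unit_id}_{package}_{doc}.pdf"
        for package, docs in REQUIRED_EVIDENCE.items()
        for doc in docs
    }


def missing_evidence(unit_id, present_files):
    """Return the expected files that are not in the evidence folder."""
    return sorted(expected_filenames(unit_id) - set(present_files))


present = [
    "ROW01_FAT_pressure_test.pdf",
    "ROW01_FAT_qc_docs.pdf",
    "ROW01_FAT_component_list.pdf",
    "ROW01_SAT_pressure_test.pdf",
    "ROW01_SAT_point_to_point.pdf",
]

missing = missing_evidence("ROW01", present)
print(missing)  # ['ROW01_SAT_alarm_routing.pdf', 'ROW01_SAT_labeling.pdf']
```

In practice `present` would come from a directory listing of the evidence folder for that unit.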

Step 7: Lock KPIs and monitoring (so optimization is measurable)

A phased rollout is only rational if the KPIs decide the next action.

Inputs

Final KPI sheet
Monitoring/controls points list

Actions

Implement a simple KPI dashboard and review cadence:
- thermal compliance trends (inlet temps by rack position)
- incident log (leak alarms, false positives, isolation events)
- change-window metrics (planned vs. actual cutover time)
Report efficiency metrics as measured trends, with the baseline and measurement method stated. Do not present them as guaranteed savings.

Outputs

KPI dashboard definition (data sources, owners, cadence)
Ops review cadence (weekly during rollout; monthly after steady state)

Done when…

A rollout gate cannot be passed without KPI evidence attached.
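
Two of the dashboard metrics above reduce to simple arithmetic: the share of inlet readings inside the compliance band per rack position, and cutover overrun against the planned window. The band limits and readings below are placeholders, not recommended setpoints.

```python
def compliance_by_position(readings, low_c, high_c):
    """readings: list of (position, inlet_temp_c). Returns in-band fraction per position."""
    totals, in_band = {}, {}
    for position, temp in readings:
        totals[position] = totals.get(position, 0) + 1
        if low_c <= temp <= high_c:
            in_band[position] = in_band.get(position, 0) + 1
    return {p: in_band.get(p, 0) / n for p, n in totals.items()}


def cutover_overrun_pct(planned_min, actual_min):
    """Percentage by which the actual cutover exceeded (or undercut) the plan."""
    return (actual_min - planned_min) / planned_min * 100


# Placeholder readings and band limits.
readings = [("top", 26.0), ("top", 28.5), ("mid", 24.0), ("bottom", 22.0)]
compliance = compliance_by_position(readings, low_c=18.0, high_c=27.0)
overrun = cutover_overrun_pct(planned_min=240, actual_min=300)

print(compliance)  # {'top': 0.5, 'mid': 1.0, 'bottom': 1.0}
print(overrun)     # 25.0
```

Attaching these two outputs to each gate review gives the "KPI evidence attached" rule something concrete to check.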

Step 8: Training and handover (commissioning isn’t complete until ops can run it)

Inputs

Rollout playbook
SAT package

Actions

Train on what actually causes incidents:
- alarm triage and escalation
- isolation and recovery steps
- maintenance intervals and spares
- documentation expectations after every change

Outputs

Ops runbook + quick reference
Training completion log (who is signed off)

Done when…

On-call can respond using the runbook without calling engineering.

Step 9: Scale phased RDHx deployment with controlled change windows

Inputs

Rollout gates + acceptance packages
Post-PoC lessons learned

Actions

Deploy the next unit using the exact same gate sequence.
After each unit:
- record deviations and why
- update the playbook
- update the risk register

If you’re using RDHx as one step in a broader high-density roadmap, use internal resources to help stakeholders understand where RDHx fits relative to other cooling paths (see the Coolnet AI data center cooling selection guide).

Outputs

Rollout playbook vN (living document)
As-built + evidence package per unit

Done when…

The Nth rollout is faster, safer, and more predictable than the first.
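
Whether the rollout is actually getting faster and more predictable can be read from a per-unit log. This sketch assumes each unit records its cutover time and deviation count; the field names and figures are illustrative.

```python
from statistics import mean


def trend_summary(unit_log):
    """Compare the latest unit with the first; unit_log is in deployment order."""
    first, latest = unit_log[0], unit_log[-1]
    return {
        "faster": latest["cutover_min"] < first["cutover_min"],
        "fewer_deviations": latest["deviations"] < first["deviations"],
        "mean_cutover_min": mean(u["cutover_min"] for u in unit_log),
    }


# Illustrative log for three rollout units.
unit_log = [
    {"unit": "ROW01", "cutover_min": 300, "deviations": 4},
    {"unit": "ROW02", "cutover_min": 260, "deviations": 2},
    {"unit": "ROW03", "cutover_min": 220, "deviations": 1},
]

summary = trend_summary(unit_log)
print(summary)
```

If "faster" or "fewer_deviations" turns false, that is a prompt to revisit the playbook before the next unit.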

A practical RACI you can reuse (example)

Workstream / Deliverable	Operator	Facilities	IT	Integrator	Coolnet
Scope + KPI sheet	A	R	C	C	C
Readiness punch list	C	A/R	C	R	C
Reference design package	C	A	C	R	C
RDHx equipment supply	C	C	C	C	A/R
Installation execution	C	A	C	R	C
Commissioning & SAT support	C	A	C	R	R
Monitoring points + alarm routing	R	C	A/R	R	C
Training + runbook	A/R	R	C	C	R

Legend: R = Responsible, A = Accountable, C = Consulted
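
If you adapt this table, a quick consistency check helps catch rows with no owner. The sketch below applies the common RACI convention of exactly one Accountable party and at least one Responsible party per deliverable; the data mirrors the example table above, and the function name is an assumption.

```python
ROLES = ["Operator", "Facilities", "IT", "Integrator", "Coolnet"]

RACI = {
    "Scope + KPI sheet": ["A", "R", "C", "C", "C"],
    "Readiness punch list": ["C", "A/R", "C", "R", "C"],
    "Reference design package": ["C", "A", "C", "R", "C"],
    "RDHx equipment supply": ["C", "C", "C", "C", "A/R"],
    "Installation execution": ["C", "A", "C", "R", "C"],
    "Commissioning & SAT support": ["C", "A", "C", "R", "R"],
    "Monitoring points + alarm routing": ["R", "C", "A/R", "R", "C"],
    "Training + runbook": ["A/R", "R", "C", "C", "R"],
}


def raci_problems(raci):
    """Flag rows without exactly one 'A' or without any 'R'."""
    problems = []
    for deliverable, cells in raci.items():
        letters = [part for cell in cells for part in cell.split("/")]
        if len(cells) != len(ROLES):
            problems.append(f"{deliverable}: expected {len(ROLES)} columns")
        if letters.count("A") != 1:
            problems.append(f"{deliverable}: needs exactly one A")
        if "R" not in letters:
            problems.append(f"{deliverable}: needs at least one R")
    return problems


problems = raci_problems(RACI)
print(problems)  # []
```

The example table passes both checks; rerun the function after editing rows for your own organization.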

Common risks (and mitigations that survive audits)

Integration ambiguity (who owns the interface?) → Write interface owners into the RACI and acceptance package.
Leak-risk anxiety blocks approvals → Treat leak detection + isolation drills as required SAT evidence; document response times.
Change window overruns → Reduce rollout unit size; require a rollback plan at Gate B.
Documentation drift → Make “evidence package complete” a gate criterion.
Skills gap → Training sign-off is required before Gate D.

For stakeholders who want a high-level retrofit rationale (and tradeoffs to consider), Data Center Dynamics offers a useful frame in its 2026 opinion piece on RDHx refurbishments (“Top ten reasons to consider rear door heat exchangers when refurbishing data centers”).

⚠️ Warning: If the project can’t define “offline posture” (what happens when RDHx is unavailable), you’re not ready for production rollout—regardless of cooling performance.

Next steps (resources and how to engage)

If you want internal references your stakeholders can read alongside this playbook:

Coolnet liquid cooling solutions (where RDHx fits within the product family)
High-density compute clusters and advanced liquid cooling for AI (program context for AI/HPC density)
A validation mindset you can adapt to acceptance evidence: Coolnet modular data center validation checklist

To go further, request a commissioning checklist and SAT script aligned to your gate model, then book a technical fit call to validate readiness items and rollout sequencing.