The Migration Window Playbook: Cutover Without Catastrophe
Every migration window is a bet. You’re betting that your planning was complete, your testing was sufficient, and your rollback will work if needed.
Most migration plans are documents. Most rollback procedures are theoretical. Most migration windows run late because nobody actually verified the Go/No-Go criteria before starting.
This playbook is for the teams that want to win the bet.
The Anatomy of a Migration Window
A migration window has four phases:
- Pre-window preparation (T-7 to T-1 days): Final verification, last-minute change tickets, team briefings
- Migration execution (T+0): The window itself — the work happens here
- Validation and stabilization (T+0 to T+4 hours post-window): Confirm everything works
- Post-window monitoring (T+4 to T+72 hours): Catch the delayed failures
Most teams plan the execution and ignore the rest. That’s where overruns come from.
Wave Planning: The Foundation
A wave is a group of users, workloads, or systems that migrate together. The wave plan is the sequence in which those groups move.
How to build a wave plan
Step 1: Map every dependency
For each application in scope, document:
- What it depends on (databases, APIs, file shares, identity providers)
- What depends on it (users, other applications, scheduled tasks)
- What happens to it during migration (does it move, get replicated, or get cut over?)
Step 2: Score each application by migration complexity
Complexity factors:
- User count (how many people does this affect)
- Data volume (how much data needs to move)
- Dependency depth (how many upstream/downstream dependencies)
- Availability requirements (can it tolerate downtime, and how much)
- Regulatory constraints (does data residency affect where it can live)
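As a sketch, the factors above can be collapsed into a single score per application. The weights, caps, and field names below are illustrative assumptions, not prescribed values:

```python
# Hypothetical complexity scoring sketch; the weights, caps, and
# field names are illustrative assumptions, not prescribed values.
def complexity_score(app: dict) -> int:
    """Score one application on the five factors (higher = harder)."""
    score = 0
    score += min(app["user_count"] // 100, 10)              # user count, capped
    score += min(app["data_volume_gb"] // 500, 10)          # data volume, capped
    score += 2 * len(app["dependencies"])                   # dependency depth
    score += 5 if app["max_downtime_minutes"] < 60 else 0   # tight availability
    score += 5 if app["data_residency_constrained"] else 0  # regulatory constraint
    return score

payroll = {
    "user_count": 1200,
    "data_volume_gb": 800,
    "dependencies": ["hr-db", "idp"],
    "max_downtime_minutes": 30,
    "data_residency_constrained": True,
}
print(complexity_score(payroll))
```

Score high-complexity applications into their own waves; do not bury them in a wave full of simple workloads.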
Step 3: Order waves by dependency
The goal: no wave should contain two applications where one depends on the other, and no application should migrate before its dependencies. If Application A authenticates to Application B, and A is in Wave 2 while B waits until Wave 3, A lands in the target still authenticating to a system that has not yet moved, and Wave 2 cannot meet its exit criteria until Wave 3 is done.
Graph the dependencies. Find the critical path. Build waves that respect the dependency order.
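One way to sketch the ordering in code: assign each application a wave one level after the latest wave of anything it depends on (a topological leveling). The application names and dependency map below are hypothetical:

```python
from collections import defaultdict

# Sketch of dependency-ordered wave assignment. 'deps' maps each app
# to the apps it depends on (hypothetical names). Each app lands one
# wave after the latest wave of its dependencies, so no wave contains
# a pair where one depends on the other.
def build_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    wave_of: dict[str, int] = {}

    def assign(app: str, seen=()) -> int:
        if app in wave_of:
            return wave_of[app]
        if app in seen:
            raise ValueError(f"dependency cycle at {app}")
        level = 1 + max((assign(d, seen + (app,)) for d in deps.get(app, [])),
                        default=-1)
        wave_of[app] = level
        return level

    for app in deps:
        assign(app)
    waves = defaultdict(list)
    for app, level in wave_of.items():
        waves[level].append(app)
    return [sorted(waves[i]) for i in sorted(waves)]

deps = {"idp": [], "warehouse": [], "crm": ["idp"], "reporting": ["crm", "warehouse"]}
print(build_waves(deps))
```

Python's standard-library `graphlib.TopologicalSorter` gives the same ordering (its `get_ready()` batches correspond to these levels) if you prefer not to hand-roll it.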
Step 4: Define the wave boundary
Every wave should have a clear definition of:
- What is included (which users, which systems, which data sets)
- What is excluded (what will be handled in a later wave)
- Entry criteria (what must be true before this wave starts)
- Exit criteria (what must be true before this wave is considered complete)
Go/No-Go Gates
A Go/No-Go gate is a formal decision point where the team evaluates whether to proceed. The gate is either green (proceed) or red (halt and reassess).
Gate 1: Pre-Wave Validation (T-24 hours)
Checklist:
- All pre-migration changes deployed and stable
- Target environment prepared and verified
- Rollback procedure documented and tested (at least one dry run)
- Communication plan sent to affected users
- Support team briefed and available
- Change ticket approved
- Go/No-Go decision meeting held with migration lead and escalation contact
If any item is not confirmed: No-Go until resolved.
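Each gate reduces to the same all-or-nothing rule: every item explicitly confirmed, or the gate is red. A minimal sketch, with illustrative item names:

```python
# Minimal Go/No-Go gate evaluation: every checklist item must be
# explicitly confirmed, or the gate is red. Item names are
# illustrative, not a complete checklist.
def evaluate_gate(checklist: dict[str, bool]) -> str:
    unconfirmed = [item for item, ok in checklist.items() if not ok]
    if unconfirmed:
        return "NO-GO: " + ", ".join(unconfirmed)
    return "GO"

gate1 = {
    "pre-migration changes stable": True,
    "target environment verified": True,
    "rollback dry run completed": False,  # not yet tested
    "change ticket approved": True,
}
print(evaluate_gate(gate1))
```

The point of encoding it: an unconfirmed item defaults to No-Go. Nobody has to argue for halting; somebody has to prove it is safe to proceed.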
Gate 2: Migration Readiness (T-2 hours)
Checklist:
- Source system frozen (no new changes being made)
- Data backup completed and verified
- Target environment reachable
- Network connectivity verified between source and target
- Identity mapping validated (all users mapped to target accounts)
- Application migration scripts tested
- Rollback scripts tested
- Team in position and confirmed
If any item is not confirmed: No-Go until resolved.
Gate 3: Mid-Wave Checkpoint (at each 25% completion milestone within the window)
At each checkpoint:
- Error rate within acceptable threshold
- No P1 or P2 incidents triggered
- Data integrity checks passing
- User impact within expected range
- Remaining time sufficient for remaining work
If any item is outside acceptable range: Pause and evaluate. Do not proceed without explicit sign-off from migration lead.
Gate 4: Wave Completion (at window end)
Checklist:
- All users migrated and validated
- All applications functional in target environment
- Data integrity confirmed (row counts, file counts, checksum validation)
- Identity and access confirmed (all users can authenticate)
- No outstanding errors or exceptions
- Monitoring and alerting active on target environment
If any item is not confirmed: Do not close the window. Extend or rollback.
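The data integrity check in this gate can be sketched as a checksum diff between source and target file trees. The paths are hypothetical, and a real run would stream large trees and log mismatches rather than return them:

```python
import hashlib
from pathlib import Path

# Sketch of the checksum-validation step from the wave-completion gate:
# hash every file under the source and target roots, then diff.
def tree_checksums(root: Path) -> dict[str, str]:
    sums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            sums[str(path.relative_to(root))] = digest
    return sums

def integrity_report(source: Path, target: Path) -> list[str]:
    src, tgt = tree_checksums(source), tree_checksums(target)
    problems = [f"missing in target: {p}" for p in sorted(src.keys() - tgt.keys())]
    problems += [f"checksum mismatch: {p}" for p in src if p in tgt and src[p] != tgt[p]]
    return problems
```

An empty report is a Gate 4 pass for this item; anything else means the window stays open.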
The Rollback Procedure
A rollback is only worth having if it’s been tested. A rollback procedure that has never been executed is a theoretical document, not a safety net.
What a real rollback looks like
Trigger conditions (define these before the window opens):
- Error rate exceeds X% within Y minutes of starting
- A P1 incident is declared
- A Go/No-Go gate fails
- The migration lead calls it
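The trigger conditions can be encoded so the rollback decision is mechanical rather than debated mid-incident. The 5% error rate and 15-minute window below are placeholders for whatever X and Y the team agrees on before the window opens:

```python
# Sketch of automated rollback-trigger evaluation. The 5% / 15-minute
# thresholds are illustrative placeholders for the agreed X% / Y minutes.
ERROR_RATE_LIMIT = 0.05
EARLY_WINDOW_MINUTES = 15

def should_roll_back(error_rate: float, minutes_elapsed: int,
                     p1_declared: bool, gate_failed: bool,
                     lead_called_it: bool) -> bool:
    # Any single trigger is sufficient; triggers never have to agree.
    if lead_called_it or p1_declared or gate_failed:
        return True
    return minutes_elapsed <= EARLY_WINDOW_MINUTES and error_rate > ERROR_RATE_LIMIT

print(should_roll_back(0.08, 10, False, False, False))
```

Note the last trigger is deliberately human: the migration lead can call a rollback no automated threshold caught.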
Rollback sequence (reverse order of migration):
- Stop the migration process (halt any running scripts)
- Notify affected users and support team
- Execute rollback scripts (restore identities, revert permissions, remove migrated objects)
- Validate rollback completion (confirm source environment is in pre-migration state)
- Root cause the failure before scheduling a retry
What rollback doesn’t fix:
- Data that was already written to the target and can’t be reversed cleanly
- Users who already started working in the target environment and created new data there
- Downstream systems that already received data from the migration
These are why you need a data freeze period before the window. If users can still create data in the source system while migration is running, rollback leaves that data stranded.
Timing: How Long Should a Window Be?
Most teams underestimate window duration. The correct calculation:
Estimated migration time × 1.5 + 30 minutes per Go/No-Go gate + 60 minutes post-window stabilization buffer
If your estimate is 4 hours and a single gate falls inside the window: (4 × 1.5) + 0.5 + 1 = 7.5 hours minimum. Count every in-window gate, though; three mid-wave checkpoints plus the wave-completion gate make it (4 × 1.5) + (4 × 0.5) + 1 = 9 hours.
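The calculation as a function, so the gate count is an explicit input rather than a buried assumption:

```python
# The window-duration formula above as a function. in_window_gates is
# the number of Go/No-Go gates that fall inside the window itself.
def window_hours(estimated_hours: float, in_window_gates: int) -> float:
    return estimated_hours * 1.5 + 0.5 * in_window_gates + 1.0

print(window_hours(4, 1))  # 7.5
```

If the result does not fit the time you actually have, shrink the wave, not the buffer.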
Better yet: ask how much time you have. If the answer is “we need to be done by 6am so the Tokyo team can log in,” work backward from that constraint. Size the wave to fit the window, not the other way around.
Night and weekend windows
Migration windows during business hours are high-risk. The best windows:
- Friday night / Saturday morning (weekend, low utilization, can extend if needed)
- Saturday / Sunday (full weekend, maximum flexibility)
- Holiday periods (Christmas, Easter, July 4th — but plan for reduced support availability)
Avoid: Monday morning, Thursday night (an overrun collides with Friday business hours), and the last business day before a holiday.
Post-Window Monitoring: The 72-Hour Rule
The most common migration failure pattern: window completes successfully, everyone goes home, problems emerge overnight or over the weekend, and the first anyone hears about it is a call at 7am Monday from a user in Singapore who can’t access their files.
The fix: 72 hours of active monitoring after every wave.
Monitoring checklist:
- Application availability and response time
- Error rates in application logs
- Failed authentication attempts
- Unusual data access patterns (potential permission issues)
- Scheduled task failures (something in the migration may have disrupted a scheduled job)
- Email and Teams functionality (communication is often the first thing to break)
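A minimal sketch of that monitoring loop, with placeholder probes standing in for whatever checks your tooling actually exposes:

```python
import time

# Hypothetical 72-hour post-window monitoring loop. The probe
# functions are placeholders; real checks would hit application
# health endpoints, log aggregators, and the identity provider.
CHECKS = {
    "application availability": lambda: True,
    "error rate below threshold": lambda: True,
    "authentication failures below threshold": lambda: True,
    "scheduled tasks healthy": lambda: True,
}

def monitor_once(alert=print) -> int:
    """Run every probe once; alert on and count failures."""
    failures = 0
    for name, probe in CHECKS.items():
        if not probe():
            alert(f"ALERT: {name} failed")
            failures += 1
    return failures

def monitor_72h(interval_s: int = 300):
    deadline = time.time() + 72 * 3600
    while time.time() < deadline:
        monitor_once()
        time.sleep(interval_s)
```

The mechanism matters less than the commitment: the alerts route to someone on duty for the full 72 hours, not to an unwatched channel.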
For a 47,000-identity migration across 6 waves over 18 weeks: every wave gets 72 hours of post-window monitoring before the next wave is authorized to start.
Common Failure Modes
1. Dependency surprise
What happened: Migration was scoped assuming Application A was independent. During execution, discovered Application A authenticates to Application B via a service account in the source forest being decommissioned.
Prevention: Dependency mapping during discovery phase. Every service account, SPN, and cross-application authentication path documented and verified.
2. Identity mapping error
What happened: Users migrated with wrong UPNs. Post-migration, applications that use UPN for authentication rejected the migrated users.
Prevention: Identity mapping validation T-24 hours. Automated UPN validation against all integrated applications. Don’t wait until users call to find out the mapping was wrong.
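A sketch of that automated validation: every mapped UPN must be well-formed and resolve to a target account. The directory lookup is stubbed with a set here; a real check would query the target identity provider:

```python
import re

# Hypothetical pre-flight UPN validation. The regex is a simple
# sanity check, not a full RFC-grade address parser, and the target
# directory is stubbed with a set for illustration.
UPN_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def validate_upns(mapped_upns: list[str], target_directory: set[str]) -> list[str]:
    errors = []
    for upn in mapped_upns:
        if not UPN_PATTERN.match(upn):
            errors.append(f"malformed UPN: {upn}")
        elif upn not in target_directory:
            errors.append(f"no target account for: {upn}")
    return errors

target = {"ana@contoso.com", "bo@contoso.com"}
print(validate_upns(["ana@contoso.com", "bo@contoso", "cy@contoso.com"], target))
```

Run it at T-24 hours so a mapping failure is a No-Go item, not a Monday-morning support queue.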
3. Data freeze failure
What happened: Data continued being written to the source system during migration. Post-rollback, the data written during migration was lost. Post-re-migration, the lost data had to be reconstructed manually.
Prevention: Strict data freeze. Source systems in read-only or locked state during window. No exceptions without explicit migration lead approval.
4. Rollback script error
What happened: Rollback procedure failed to fully revert the source environment. Team spent 3 hours manually fixing what the rollback should have handled automatically.
Prevention: Test the rollback. Actually run it. Not in theory — execute the rollback procedure against a pre-production environment before the window opens.
ACQI Migration runs 62+ migration views with wave planning, Go/No-Go gates, and real-time cutover monitoring. Request a demo →