The Migration Window Playbook: Cutover Without Catastrophe
Every migration window is a bet. You’re betting that your planning was complete, your testing was sufficient, and your rollback will work if needed.
Most migration plans are documents. Most rollback procedures are theoretical. Most migration windows run late because nobody actually verified the Go/No-Go criteria before starting.
This playbook is for the teams that want to win the bet.
The Anatomy of a Migration Window
A migration window has four phases:
- Pre-window preparation (T-7 to T-1 days): Final verification, last-minute change tickets, team briefings
- Migration execution (T+0): The window itself — the work happens here
- Validation and stabilization (T+0 to T+4 hours post-window): Confirm everything works
- Post-window monitoring (T+4 to T+72 hours): Catch the delayed failures
Most teams plan the execution and ignore the rest. That’s where overruns come from.
Wave Planning: The Foundation
A wave is a group of users, workloads, or systems that migrate together. The wave plan is the sequence in which those groups move.
How to build a wave plan
Step 1: Map every dependency
For each application in scope, document:
- What it depends on (databases, APIs, file shares, identity providers)
- What depends on it (users, other applications, scheduled tasks)
- What happens to it during migration (does it move, get replicated, or get cut over?)
Step 2: Score each application by migration complexity
Complexity factors:
- User count (how many people does this affect)
- Data volume (how much data needs to move)
- Dependency depth (how many upstream/downstream dependencies)
- Availability requirements (can it tolerate downtime, and how much)
- Regulatory constraints (does data residency affect where it can live)
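As a sketch, the factors above can be collapsed into a single score per application. The weights, caps, and field names below are illustrative assumptions, not prescribed values:

```python
# Hypothetical complexity scoring sketch; the weights, caps, and
# field names are illustrative assumptions, not prescribed values.
def complexity_score(app: dict) -> int:
    """Score one application on the five factors (higher = harder)."""
    score = 0
    score += min(app["user_count"] // 100, 10)              # user count, capped
    score += min(app["data_volume_gb"] // 500, 10)          # data volume, capped
    score += 2 * len(app["dependencies"])                   # dependency depth
    score += 5 if app["max_downtime_minutes"] < 60 else 0   # tight availability
    score += 5 if app["data_residency_constrained"] else 0  # regulatory constraint
    return score

payroll = {
    "user_count": 1200,
    "data_volume_gb": 800,
    "dependencies": ["hr-db", "idp"],
    "max_downtime_minutes": 30,
    "data_residency_constrained": True,
}
print(complexity_score(payroll))
```

Score high-complexity applications into their own waves; do not bury them in a wave full of simple workloads.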
Step 3: Order waves by dependency
The goal: no wave should contain two applications where one depends on the other, and no application should migrate before its dependencies. If Application A authenticates to Application B, and A is in Wave 2 while B waits until Wave 3, A lands in the target still authenticating to a system that has not yet moved, and Wave 2 cannot meet its exit criteria until Wave 3 is done.
Graph the dependencies. Find the critical path. Build waves that respect the dependency order.
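One way to sketch the ordering in code: assign each application a wave one level after the latest wave of anything it depends on (a topological leveling). The application names and dependency map below are hypothetical:

```python
from collections import defaultdict

# Sketch of dependency-ordered wave assignment. 'deps' maps each app
# to the apps it depends on (hypothetical names). Each app lands one
# wave after the latest wave of its dependencies, so no wave contains
# a pair where one depends on the other.
def build_waves(deps: dict[str, list[str]]) -> list[list[str]]:
    wave_of: dict[str, int] = {}

    def assign(app: str, seen=()) -> int:
        if app in wave_of:
            return wave_of[app]
        if app in seen:
            raise ValueError(f"dependency cycle at {app}")
        level = 1 + max((assign(d, seen + (app,)) for d in deps.get(app, [])),
                        default=-1)
        wave_of[app] = level
        return level

    for app in deps:
        assign(app)
    waves = defaultdict(list)
    for app, level in wave_of.items():
        waves[level].append(app)
    return [sorted(waves[i]) for i in sorted(waves)]

deps = {"idp": [], "warehouse": [], "crm": ["idp"], "reporting": ["crm", "warehouse"]}
print(build_waves(deps))
```

Python's standard-library `graphlib.TopologicalSorter` gives the same ordering (its `get_ready()` batches correspond to these levels) if you prefer not to hand-roll it.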
Step 4: Define the wave boundary
Every wave should have a clear definition of:
- What is included (which users, which systems, which data sets)
- What is excluded (what will be handled in a later wave)
- Entry criteria (what must be true before this wave starts)
- Exit criteria (what must be true before this wave is considered complete)
Go/No-Go Gates
A Go/No-Go gate is a formal decision point where the team evaluates whether to proceed. The gate is either green (proceed) or red (halt and reassess).
Gate 1: Pre-Wave Validation (T-24 hours)
Checklist:
- All pre-migration changes deployed and stable
- Target environment prepared and verified
- Rollback procedure documented and tested (at least one dry run)
- Communication plan sent to affected users
- Support team briefed and available
- Change ticket approved
- Go/No-Go decision meeting held with migration lead and escalation contact
If any item is not confirmed: No-Go until resolved.
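Each gate reduces to the same all-or-nothing rule: every item explicitly confirmed, or the gate is red. A minimal sketch, with illustrative item names:

```python
# Minimal Go/No-Go gate evaluation: every checklist item must be
# explicitly confirmed, or the gate is red. Item names are
# illustrative, not a complete checklist.
def evaluate_gate(checklist: dict[str, bool]) -> str:
    unconfirmed = [item for item, ok in checklist.items() if not ok]
    if unconfirmed:
        return "NO-GO: " + ", ".join(unconfirmed)
    return "GO"

gate1 = {
    "pre-migration changes stable": True,
    "target environment verified": True,
    "rollback dry run completed": False,  # not yet tested
    "change ticket approved": True,
}
print(evaluate_gate(gate1))
```

The point of encoding it: an unconfirmed item defaults to No-Go. Nobody has to argue for halting; somebody has to prove it is safe to proceed.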
Gate 2: Migration Readiness (T-2 hours)
Checklist:
- Source system frozen (no new changes being made)
- Data backup completed and verified
- Target environment reachable
- Network connectivity verified between source and target
- Identity mapping validated (all users mapped to target accounts)
- Application migration scripts tested
- Rollback scripts tested
- Team in position and confirmed
If any item is not confirmed: No-Go until resolved.
Gate 3: Mid-Wave Checkpoint (at each 25% completion milestone within the window)
At each checkpoint:
- Error rate within acceptable threshold
- No P1 or P2 incidents triggered
- Data integrity checks passing
- User impact within expected range
- Remaining time sufficient for remaining work
If any item is outside acceptable range: Pause and evaluate. Do not proceed without explicit sign-off from migration lead.
Gate 4: Wave Completion (at window end)
Checklist:
- All users migrated and validated
- All applications functional in target environment
- Data integrity confirmed (row counts, file counts, checksum validation)
- Identity and access confirmed (all users can authenticate)
- No outstanding errors or exceptions
- Monitoring and alerting active on target environment
If any item is not confirmed: Do not close the window. Extend or rollback.
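The data integrity check in this gate can be sketched as a checksum diff between source and target file trees. The paths are hypothetical, and a real run would stream large trees and log mismatches rather than return them:

```python
import hashlib
from pathlib import Path

# Sketch of the checksum-validation step from the wave-completion gate:
# hash every file under the source and target roots, then diff.
def tree_checksums(root: Path) -> dict[str, str]:
    sums = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            sums[str(path.relative_to(root))] = digest
    return sums

def integrity_report(source: Path, target: Path) -> list[str]:
    src, tgt = tree_checksums(source), tree_checksums(target)
    problems = [f"missing in target: {p}" for p in sorted(src.keys() - tgt.keys())]
    problems += [f"checksum mismatch: {p}" for p in src if p in tgt and src[p] != tgt[p]]
    return problems
```

An empty report is a Gate 4 pass for this item; anything else means the window stays open.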
The Rollback Procedure
A rollback is only worth having if it’s been tested. A rollback procedure that has never been executed is a theoretical document, not a safety net.
What a real rollback looks like
Trigger conditions (define these before the window opens):
- Error rate exceeds X% within Y minutes of starting
- A P1 incident is declared
- A Go/No-Go gate fails
- The migration lead calls it
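The trigger conditions can be encoded so the rollback decision is mechanical rather than debated mid-incident. The 5% error rate and 15-minute window below are placeholders for whatever X and Y the team agrees on before the window opens:

```python
# Sketch of automated rollback-trigger evaluation. The 5% / 15-minute
# thresholds are illustrative placeholders for the agreed X% / Y minutes.
ERROR_RATE_LIMIT = 0.05
EARLY_WINDOW_MINUTES = 15

def should_roll_back(error_rate: float, minutes_elapsed: int,
                     p1_declared: bool, gate_failed: bool,
                     lead_called_it: bool) -> bool:
    # Any single trigger is sufficient; triggers never have to agree.
    if lead_called_it or p1_declared or gate_failed:
        return True
    return minutes_elapsed <= EARLY_WINDOW_MINUTES and error_rate > ERROR_RATE_LIMIT

print(should_roll_back(0.08, 10, False, False, False))
```

Note the last trigger is deliberately human: the migration lead can call a rollback no automated threshold caught.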
Rollback sequence (reverse order of migration):
- Stop the migration process (halt any running scripts)
- Notify affected users and support team
- Execute rollback scripts (restore identities, revert permissions, remove migrated objects)
- Validate rollback completion (confirm source environment is in pre-migration state)
- Root cause the failure before scheduling a retry
What rollback doesn’t fix:
- Data that was already written to the target and can’t be reversed cleanly
- Users who already started working in the target environment and created new data there
- Downstream systems that already received data from the migration
These are why you need a data freeze period before the window. If users can still create data in the source system while migration is running, rollback leaves that data stranded.
Timing: How Long Should a Window Be?
Most teams underestimate window duration. The correct calculation:
Estimated migration time × 1.5 + 30 minutes per Go/No-Go gate + 60 minutes post-window stabilization buffer
If your estimate is 4 hours and a single gate falls inside the window: (4 × 1.5) + 0.5 + 1 = 7.5 hours minimum. Count every in-window gate, though; three mid-wave checkpoints plus the wave-completion gate make it (4 × 1.5) + (4 × 0.5) + 1 = 9 hours.
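The calculation as a function, so the gate count is an explicit input rather than a buried assumption:

```python
# The window-duration formula above as a function. in_window_gates is
# the number of Go/No-Go gates that fall inside the window itself.
def window_hours(estimated_hours: float, in_window_gates: int) -> float:
    return estimated_hours * 1.5 + 0.5 * in_window_gates + 1.0

print(window_hours(4, 1))  # 7.5
```

If the result does not fit the time you actually have, shrink the wave, not the buffer.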
Better yet: ask how much time you have. If the answer is “we need to be done by 6am so the Tokyo team can log in,” work backward from that constraint. Size the wave to fit the window, not the other way around.
Night and weekend windows
Migration windows during business hours are high-risk. The best windows:
- Friday night / Saturday morning (weekend, low utilization, can extend if needed)
- Saturday / Sunday (full weekend, maximum flexibility)
- Holiday periods (Christmas, Easter, July 4th — but plan for reduced support availability)
Avoid: Monday morning, Thursday night (an overrun collides with Friday business hours), and the last business day before a holiday.
Post-Window Monitoring: The 72-Hour Rule
The most common migration failure pattern: window completes successfully, everyone goes home, problems emerge overnight or over the weekend, and the first anyone hears about it is a call at 7am Monday from a user in Singapore who can’t access their files.
The fix: 72 hours of active monitoring after every wave.
Monitoring checklist:
- Application availability and response time
- Error rates in application logs
- Failed authentication attempts
- Unusual data access patterns (potential permission issues)
- Scheduled task failures (something in the migration may have disrupted a scheduled job)
- Email and Teams functionality (communication is often the first thing to break)
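A minimal sketch of that monitoring loop, with placeholder probes standing in for whatever checks your tooling actually exposes:

```python
import time

# Hypothetical 72-hour post-window monitoring loop. The probe
# functions are placeholders; real checks would hit application
# health endpoints, log aggregators, and the identity provider.
CHECKS = {
    "application availability": lambda: True,
    "error rate below threshold": lambda: True,
    "authentication failures below threshold": lambda: True,
    "scheduled tasks healthy": lambda: True,
}

def monitor_once(alert=print) -> int:
    """Run every probe once; alert on and count failures."""
    failures = 0
    for name, probe in CHECKS.items():
        if not probe():
            alert(f"ALERT: {name} failed")
            failures += 1
    return failures

def monitor_72h(interval_s: int = 300):
    deadline = time.time() + 72 * 3600
    while time.time() < deadline:
        monitor_once()
        time.sleep(interval_s)
```

The mechanism matters less than the commitment: the alerts route to someone on duty for the full 72 hours, not to an unwatched channel.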
For a 47,000-identity migration across 6 waves over 18 weeks: every wave gets 72 hours of post-window monitoring before the next wave is authorized to start.
Common Failure Modes
1. Dependency surprise
What happened: Migration was scoped assuming Application A was independent. During execution, discovered Application A authenticates to Application B via a service account in the source forest being decommissioned.
Prevention: Dependency mapping during discovery phase. Every service account, SPN, and cross-application authentication path documented and verified.
2. Identity mapping error
What happened: Users migrated with wrong UPNs. Post-migration, applications that use UPN for authentication rejected the migrated users.
Prevention: Identity mapping validation T-24 hours. Automated UPN validation against all integrated applications. Don’t wait until users call to find out the mapping was wrong.
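A sketch of that automated validation: every mapped UPN must be well-formed and resolve to a target account. The directory lookup is stubbed with a set here; a real check would query the target identity provider:

```python
import re

# Hypothetical pre-flight UPN validation. The regex is a simple
# sanity check, not a full RFC-grade address parser, and the target
# directory is stubbed with a set for illustration.
UPN_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def validate_upns(mapped_upns: list[str], target_directory: set[str]) -> list[str]:
    errors = []
    for upn in mapped_upns:
        if not UPN_PATTERN.match(upn):
            errors.append(f"malformed UPN: {upn}")
        elif upn not in target_directory:
            errors.append(f"no target account for: {upn}")
    return errors

target = {"ana@contoso.com", "bo@contoso.com"}
print(validate_upns(["ana@contoso.com", "bo@contoso", "cy@contoso.com"], target))
```

Run it at T-24 hours so a mapping failure is a No-Go item, not a Monday-morning support queue.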
3. Data freeze failure
What happened: Data continued being written to the source system during migration. Post-rollback, the data written during migration was lost. Post-re-migration, the lost data had to be reconstructed manually.
Prevention: Strict data freeze. Source systems in read-only or locked state during window. No exceptions without explicit migration lead approval.
4. Rollback script error
What happened: Rollback procedure failed to fully revert the source environment. Team spent 3 hours manually fixing what the rollback should have handled automatically.
Prevention: Test the rollback. Actually run it. Not in theory — execute the rollback procedure against a pre-production environment before the window opens.
ACQI Migration runs 62+ migration views with wave planning, Go/No-Go gates, and real-time cutover monitoring. Request a demo →