Data Estate Due Diligence: The M&A Discovery Checklist That Actually Works

The IT due diligence questionnaire arrives in the data room. It asks 40 questions about infrastructure, applications, and licensing. The target’s IT team fills it in. The buyer’s technical team reviews it. The deal team includes it in the integration planning deck.

The problem: the questionnaire captures what the target’s IT team knows. And in any acquisition — particularly one where IT was not a primary driver of the deal thesis — what IT knows is a subset of what exists.

A buyer’s technical team that relies on questionnaires alone is flying blind.

Why Questionnaires Fail

A questionnaire produces answers in the format the target chooses to provide them. The target’s IT team may not know about the shadow SaaS subscriptions in marketing. The CFO may have signed an enterprise agreement that IT hasn’t tracked properly. The acquired entity may be running a legacy ERP that nobody documented because the person who set it up left three years ago.

The questionnaire also rewards incompleteness. If the target writes “see attached” for a section, the buyer’s team has to ask a follow-up question, wait for a response, and hope the answer is accurate. There’s no mechanism for verifying what’s not in the document.

Discovery scanning solves both problems. It produces structured, machine-readable data. It finds things the target didn’t know to report. And it can be completed in days, not weeks.

The Discovery Checklist That Matters

Identity and Access

For every identity system in scope:

Active Directory: Forest structure, domain memberships, OU hierarchy, trust relationships, group membership (including nested groups), service accounts, and password policies. Every AD forest that will be in scope post-close needs a complete map.
Entra ID / Azure AD: Tenant structure, conditional access policies, PIM assignments, managed identities, application registrations, and API permissions granted to third-party apps. If the target uses M365, this is where the security exposure hides.
Privileged accounts: Any account with permanent privileged access — not just AD admins, but application owners, service account administrators, and emergency access accounts. These are your highest-risk inheritance.
Orphaned accounts: Accounts that belong to departed users but were never disabled. These accumulate in every organization and represent credential risk.
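As a rough illustration of the orphaned-account check, the sketch below compares an exported account list against an HR roster. The field names (`user`, `enabled`, `last_logon`), the sample rows, and the 90-day threshold are all assumptions for illustration, not the schema of any particular directory export.

```python
from datetime import date, timedelta

# Hypothetical export rows; field names are assumptions, not a real schema.
accounts = [
    {"user": "a.khan", "enabled": True, "last_logon": date(2024, 5, 30)},
    {"user": "j.doe", "enabled": True, "last_logon": date(2023, 1, 12)},
    {"user": "svc-backup", "enabled": True, "last_logon": None},
    {"user": "m.lee", "enabled": False, "last_logon": date(2022, 11, 2)},
]
active_employees = {"a.khan"}  # hypothetical HR roster


def find_orphan_candidates(accounts, roster, as_of, stale_after_days=90):
    """Return (user, reason) pairs for enabled accounts with no current owner."""
    cutoff = as_of - timedelta(days=stale_after_days)
    candidates = []
    for acct in accounts:
        if not acct["enabled"] or acct["user"] in roster:
            continue  # already disabled, or owned by a current employee
        last = acct["last_logon"]
        if last is None:
            reason = "never logged on"
        elif last < cutoff:
            reason = "stale logon"
        else:
            reason = "recent logon, no owner on roster"
        candidates.append((acct["user"], reason))
    return candidates


for user, reason in find_orphan_candidates(
    accounts, active_employees, as_of=date(2024, 6, 30)
):
    print(f"{user}: {reason}")
```

Service accounts will show up here because they never appear on an HR roster. That is intentional in this sketch: each one still needs a named owner before close.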

Cloud Infrastructure

For every cloud subscription or environment:

Azure: Subscriptions, resource groups, VNets, subnets, network security groups, storage accounts, VMs, PaaS services, and RBAC assignments. Who has what access at the subscription and resource group level? What public endpoints exist? What conditional access policies apply to Azure resources?
AWS: Account structure, VPCs, IAM roles, security groups, S3 bucket policies, EC2 instances, Lambda functions, and any cross-account trust relationships. If the target uses AWS, the same visibility gap exists as in Azure, with the added complication of a different permission model.
GCP (if applicable): Project structure, IAM bindings, VPC networks, storage buckets, and service account keys.
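One way to answer the "what public endpoints exist?" question is to normalize firewall rules from each cloud into a common shape and flag inbound rules that are open to the whole internet on administrative or database ports. The rule format, resource names, and port list below are assumptions for illustration; real NSG and security group exports would need to be mapped into this shape first.

```python
import ipaddress

# Illustrative set of ports that should rarely be internet-facing.
SENSITIVE_PORTS = {22, 3389, 1433, 3306}

# Hypothetical normalized rules (not a real Azure or AWS export format).
rules = [
    {"resource": "nsg-web", "direction": "inbound", "source": "0.0.0.0/0", "port": 443},
    {"resource": "nsg-jump", "direction": "inbound", "source": "0.0.0.0/0", "port": 3389},
    {"resource": "sg-db", "direction": "inbound", "source": "10.0.0.0/8", "port": 1433},
    {"resource": "sg-admin", "direction": "inbound", "source": "::/0", "port": 22},
]


def is_open_to_internet(cidr):
    """True when the CIDR covers the entire IPv4 or IPv6 address space."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def exposed_rules(rules):
    """Inbound rules that expose a sensitive port to any source address."""
    return [
        r
        for r in rules
        if r["direction"] == "inbound"
        and is_open_to_internet(r["source"])
        and r["port"] in SENSITIVE_PORTS
    ]


for rule in exposed_rules(rules):
    print(f'{rule["resource"]}: port {rule["port"]} open to {rule["source"]}')
```

The HTTPS rule on `nsg-web` is deliberately not flagged: public web endpoints are expected, and the point of the check is to separate them from exposed management and database ports.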

Application Portfolio

For every application in use:

Business-critical applications: The ERP, the CRM, the HR system, the engineering tools — whatever runs the business. These need dependency maps: what talks to what, what data lives where, what would break if this moved.
Productivity stack: M365 or Google Workspace. Email, file storage, collaboration tools, project management. For M365 specifically, inventory Exchange mailboxes, SharePoint sites, Teams, OneDrive, and Power Platform, and capture license utilization, data distribution, and collaboration patterns.
SaaS subscriptions: Every paid SaaS application, what it costs, what it does, who uses it, and what data it holds. This is where shadow IT hides. An OAuth audit of M365 or Google Workspace will surface the third-party apps that have been granted API access to the tenant, including ones nobody tracks.
Custom/legacy applications: Internal applications, older systems that are still in production, anything running on-premises or in a data centre that the business depends on.
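The OAuth audit mentioned above can be approximated with a simple filter once the consent grants are exported. In this sketch the grant records, app names, and the choice of which scopes count as high risk are assumptions; the scope strings follow Microsoft Graph permission naming, and a Google Workspace audit would use that platform's scope names instead.

```python
# Illustrative choice of broad scopes; tune this list to the deal's risk appetite.
HIGH_RISK_SCOPES = {"Mail.ReadWrite", "Files.ReadWrite.All", "Directory.ReadWrite.All"}

# Hypothetical exported consent grants (app names are made up).
grants = [
    {"app": "CalendarHelper", "publisher_verified": True,
     "scopes": ["User.Read", "Calendars.Read"]},
    {"app": "BulkMailer", "publisher_verified": False,
     "scopes": ["Mail.ReadWrite", "User.Read"]},
    {"app": "DocSync", "publisher_verified": True,
     "scopes": ["Files.ReadWrite.All"]},
]


def risky_grants(grants):
    """Apps holding at least one high-risk scope, unverified publishers first."""
    flagged = []
    for grant in grants:
        hits = sorted(HIGH_RISK_SCOPES.intersection(grant["scopes"]))
        if hits:
            flagged.append({
                "app": grant["app"],
                "scopes": hits,
                "unverified": not grant["publisher_verified"],
            })
    # False sorts before True, so negate to put unverified publishers on top.
    return sorted(flagged, key=lambda f: (not f["unverified"], f["app"]))


for finding in risky_grants(grants):
    label = "UNVERIFIED" if finding["unverified"] else "verified"
    print(f'{finding["app"]} ({label}): {", ".join(finding["scopes"])}')
```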

Technical Debt

For every environment:

Security findings: Unpatched systems, expired certificates, missing MFA, insecure configurations. What would a penetration tester find on day one?
Network documentation: What is the network topology? What are the segmentation boundaries? What is the MPLS or VPN structure? What needs to change post-close?
Data residency: Where does regulated data live? Is it in the right jurisdiction? What cross-border transfer mechanisms exist? If you inherit GDPR-relevant data, where is it and what are the lawful bases for processing?
Licensing liability: What are the actual license assignments vs. entitlements? What auto-renewals exist? What happens to licenses when users leave?
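The assignments-versus-entitlements question reduces to a per-SKU comparison. The sketch below uses made-up SKU counts; in practice the entitlement numbers would come from contracts and the assignment numbers from the tenant.

```python
# Hypothetical figures: what the contracts allow vs. what is assigned.
entitlements = {"M365 E3": 500, "Visio Plan 2": 20, "Power BI Pro": 50}
assignments = {"M365 E3": 540, "Visio Plan 2": 8, "Power BI Pro": 50}


def license_gaps(entitlements, assignments):
    """Per-SKU shortfall (compliance exposure) and unused seats (savings)."""
    report = {}
    for sku in sorted(set(entitlements) | set(assignments)):
        owned = entitlements.get(sku, 0)
        assigned = assignments.get(sku, 0)
        report[sku] = {
            "owned": owned,
            "assigned": assigned,
            "shortfall": max(assigned - owned, 0),
            "unused": max(owned - assigned, 0),
        }
    return report


for sku, row in license_gaps(entitlements, assignments).items():
    print(f'{sku}: shortfall={row["shortfall"]}, unused={row["unused"]}')
```

A shortfall is a true-up liability the buyer inherits; unused seats are a candidate synergy. Taking the union of both key sets also catches SKUs that are assigned with no recorded entitlement at all.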

How to Get This Data in a Due Diligence Timeline

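Due diligence typically runs 6-12 weeks. For a target of moderate complexity, the scan itself completes in days, and scanning plus first-pass review takes 1-2 weeks. The full sequence below, through to a technical risk register, fits in about four weeks.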

The sequencing:

Week 1: Get access credentials or read-only integration points. Deploy ACQI across Azure, AWS, Active Directory, and M365 simultaneously. Run all 124 discovery modules.

Week 2: First-pass data review. What does the identity inventory look like? What’s the application count? Are there obvious red flags — excessive privileged accounts, expired security certificates, unknown SaaS subscriptions?

Week 2-3: Cross-reference discovery findings with the seller’s data room responses. Where does the questionnaire match the scan data? Where does it diverge? The divergences are your risk areas.
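The cross-reference step is essentially a set comparison between what the seller declared and what the scan found. A minimal sketch, assuming both sides have been reduced to lists of application names (the names here are examples, and real matching would need fuzzier normalization than lowercasing):

```python
def normalize(name):
    """Crude normalization so 'SAP ECC ' and 'sap ecc' compare equal."""
    return name.strip().lower()


def cross_reference(declared, discovered):
    """Split names into confirmed, undeclared (scan only), unverified (questionnaire only)."""
    declared_set = {normalize(n) for n in declared}
    discovered_set = {normalize(n) for n in discovered}
    return {
        "confirmed": sorted(declared_set & discovered_set),
        "undeclared": sorted(discovered_set - declared_set),
        "unverified": sorted(declared_set - discovered_set),
    }


# Hypothetical inputs: questionnaire answers vs. scan results.
questionnaire = ["Salesforce", "Workday", "SAP ECC "]
scan = ["salesforce", "workday", "Dropbox", "Trello"]

for bucket, names in cross_reference(questionnaire, scan).items():
    print(f"{bucket}: {names}")
```

Both divergence buckets matter. "Undeclared" items are shadow IT the seller did not report; "unverified" items are things the seller reported that the scan could not see, which may point to a scoping gap in the scan.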

Week 3-4: Produce the technical risk register. Map findings to integration complexity, remediation cost, and regulatory risk. This becomes the input to integration planning.
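A risk register of this kind can start as a scored, sorted list. The sketch below is one possible scheme, not ACQI’s scoring method: the 1-to-5 scales, the weights, and the sample findings are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    integration_complexity: int  # 1 (low) to 5 (high)
    remediation_cost: int        # 1 (low) to 5 (high)
    regulatory_risk: int         # 1 (low) to 5 (high)


# Illustrative weights; regulatory risk counts double in this sketch.
WEIGHTS = {
    "integration_complexity": 1.0,
    "remediation_cost": 1.0,
    "regulatory_risk": 2.0,
}


def score(finding):
    """Weighted sum of the three dimensions."""
    return sum(weight * getattr(finding, name) for name, weight in WEIGHTS.items())


def build_register(findings):
    """Findings ordered from highest to lowest score."""
    return sorted(findings, key=score, reverse=True)


# Hypothetical findings with made-up ratings.
findings = [
    Finding("Undocumented legacy ERP", 5, 4, 2),
    Finding("EU customer data stored outside the EU", 3, 3, 5),
    Finding("Expired TLS certificates", 1, 1, 2),
]

for item in build_register(findings):
    print(f"{score(item):5.1f}  {item.title}")
```

Keeping the three dimensions as separate fields, rather than storing only the total, lets the integration team re-weight the register later without re-rating every finding.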

The Output That Actually Helps

The goal is not a slide deck. The goal is a structured data asset that integration teams can query and act on.

ACQI’s discovery outputs include:

Identity inventory: Complete user list, privilege levels, account status, and cross-tenant mapping for consolidated identities
Application register: Every application, license status, business criticality, data classification, and migration path
Infrastructure map: Cloud resource inventory, network topology, security configuration status
Risk heatmap: Findings ranked by exploitability, business impact, and remediation cost

This data feeds directly into the migration planner. Wave plans are built from the application register. Go/No-Go gates use the security findings. Synergy tracking starts from the license inventory.

Discovery data is not a deliverable. It’s infrastructure for every decision that follows.

Running a due diligence process now? ACQI can scan a target environment in days. Request a discovery sprint →
