Incident Summary
Between 19:46 UTC on February 2, 2026 and 06:05 UTC on February 3, 2026, Microsoft Azure experienced a critical multi-service outage affecting compute and identity services across multiple regions.
| Field | Details |
|---|---|
| Duration | 10 hours 19 minutes |
| Root Cause | Configuration change restricting storage access |
| Impact | Global, multiple regions |
| Services Affected | Azure Virtual Machines, VMSS, AKS, Azure DevOps, GitHub Actions, Managed Identities |
Timeline
Phase 1: VM and Compute Failure (Feb 2, 19:46 - Feb 3, 00:30 UTC)
| Time (UTC) | Event |
|---|---|
| Feb 2, 19:46 | Customer reports of Azure VM connectivity issues begin to surface |
| Feb 2, 21:30 | Microsoft confirms "a subset of customers" affected |
| Feb 3, 00:00 | Scope expands to include AKS, VMSS, and identity services |
| Feb 3, 00:30 | Recovery begins as configuration rollback is initiated |
Root Cause: A configuration change inadvertently restricted public access to Microsoft-managed storage accounts used to host virtual machine extension packages.
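For customers, one way to detect this failure mode is to inspect each VM's extension provisioning status. The snippet below is a minimal sketch, assuming the Azure Python SDK packages azure-identity and azure-mgmt-compute; the subscription ID, resource group, and VM name are placeholders.

```python
# Minimal sketch (not Microsoft's tooling): poll a VM's extension provisioning
# status to surface Phase 1-style extension installation failures.
# The subscription ID and resource names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
RESOURCE_GROUP = "my-resource-group"                       # placeholder
VM_NAME = "my-vm"                                          # placeholder


def check_extension_health() -> list[str]:
    """Return the names of VM extensions that are not in a succeeded state."""
    client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
    # expand="instanceView" includes per-extension runtime status in the response.
    vm = client.virtual_machines.get(RESOURCE_GROUP, VM_NAME, expand="instanceView")
    unhealthy = []
    for ext in (vm.instance_view.extensions or []):
        codes = [s.code or "" for s in (ext.statuses or [])]
        if not any("succeeded" in code.lower() for code in codes):
            unhealthy.append(ext.name)
    return unhealthy


if __name__ == "__main__":
    failing = check_extension_health()
    if failing:
        print(f"Extensions not healthy: {failing}")
```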
Phase 2: Managed Identities Overload (Feb 3, 00:10 - 06:05 UTC)
| Time (UTC) | Event |
|---|---|
| Feb 3, 00:10 | Surge of queued operations overwhelms Managed Identities platform |
| Feb 3, 02:00 | Identity services in East US and West US experience elevated failures |
| Feb 3, 04:30 | Identity platform capacity scaled up |
| Feb 3, 06:05 | Full recovery confirmed across all regions |
Root Cause: When VM services recovered, a backlog of queued identity operations created a surge that overwhelmed the Managed Identities platform in East US and West US.
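Aggressive client retries can add to this kind of post-recovery pressure. The snippet below is a minimal sketch of one mitigation pattern, not an official recommendation: acquire managed identity tokens with capped exponential backoff plus jitter so that many clients retrying at once do not retry in lockstep. The scope shown (Azure Resource Manager) is illustrative.

```python
# Minimal sketch: token acquisition with capped exponential backoff and full
# jitter, spreading retries from many clients over time instead of in bursts.
import random
import time

from azure.identity import ManagedIdentityCredential


def get_token_with_backoff(scope: str = "https://management.azure.com/.default",
                           max_attempts: int = 6):
    credential = ManagedIdentityCredential()
    for attempt in range(max_attempts):
        try:
            return credential.get_token(scope)
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at 60 seconds, multiplied by a random
            # factor (full jitter) so retry waves stay decorrelated.
            delay = min(60, 2 ** attempt) * random.random()
            time.sleep(delay)
```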
Services Impacted
Azure Virtual Machines
- Failed to start or provision
- Lost connectivity to storage
- Extension installations failed
Virtual Machine Scale Sets (VMSS)
- Auto-scaling operations failed
- New instance provisioning blocked
- Health probes failed
Azure Kubernetes Service (AKS)
- Pod scheduling disrupted
- Node provisioning failed
- Cluster operations timed out
Azure DevOps & GitHub Actions
- Pipeline runs failed
- Azure-hosted runners unavailable
- Build and deployment operations blocked
Azure Managed Identities
- Authentication requests failed
- Token acquisition timed out (a token-caching sketch follows this list)
- Identity-dependent operations failed
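A related customer-side mitigation is to cache acquired tokens and keep serving them until close to expiry, so a brief identity-platform disruption does not immediately fail every dependent call. The Azure Identity credentials already cache tokens internally; the sketch below only illustrates the pattern, and the refresh margin is an assumed parameter.

```python
# Minimal sketch of a token cache that tolerates short identity outages by
# reusing a still-valid token when a refresh attempt fails.
import time
from typing import Optional

from azure.core.credentials import AccessToken
from azure.identity import ManagedIdentityCredential


class CachingTokenSource:
    def __init__(self, scope: str, refresh_margin_s: int = 300):
        self._credential = ManagedIdentityCredential()
        self._scope = scope
        self._margin = refresh_margin_s
        self._cached: Optional[AccessToken] = None

    def get(self) -> AccessToken:
        now = time.time()
        # Reuse the cached token while it is comfortably within its lifetime.
        if self._cached and self._cached.expires_on - now > self._margin:
            return self._cached
        try:
            self._cached = self._credential.get_token(self._scope)
        except Exception:
            # If refresh fails but the old token has not yet expired, keep it.
            if self._cached and self._cached.expires_on > now:
                return self._cached
            raise
        return self._cached
```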
Impact Assessment
Global Impact: Customers across all Azure regions experienced service disruptions, though East US and West US were most severely affected during the identity overload phase.
Business Impact:
- CI/CD pipelines halted
- Auto-scaling failed during peak traffic
- New deployments blocked for 10+ hours
- Identity-based authentication unavailable
Microsoft's Response
Microsoft's incident response included:
- Immediate Rollback — Configuration change reverted within 5 hours of detection
- Capacity Scaling — Managed Identities platform scaled up to handle surge
- Customer Communication — Status updates via Azure Service Health Dashboard
- RCA Pending — Full Root Cause Analysis to be published
Lessons Learned
For Azure Customers
- Multi-Region Deployments — Avoid concentrating critical workloads in a single region
- Failover Planning — Have documented failover procedures for cloud outages
- Status Monitoring — Implement automated monitoring of Azure Service Health (see the sketch after this list)
- SLA Review — Check Azure SLA for service credit eligibility
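As a sketch of the status-monitoring recommendation, the snippet below polls the public Azure status RSS feed. The feed URL is an assumption on my part; Azure Service Health alerts scoped to your subscriptions remain the supported mechanism for outage notifications.

```python
# Minimal sketch: poll the public Azure status feed and list current items.
# The feed URL is an assumption; prefer Azure Service Health alerts in practice.
import urllib.request
import xml.etree.ElementTree as ET

STATUS_FEED_URL = "https://status.azure.com/en-us/status/feed/"  # assumed URL


def fetch_incident_titles() -> list[str]:
    """Return the titles of items currently published on the status feed."""
    with urllib.request.urlopen(STATUS_FEED_URL, timeout=10) as resp:
        root = ET.fromstring(resp.read())
    return [item.findtext("title", default="") for item in root.iter("item")]


if __name__ == "__main__":
    for title in fetch_incident_titles():
        print(title)
```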
For Cloud Providers
This incident highlights the cascading-failure risk that arises when core infrastructure (storage access) disrupts compute services, whose recovery then generates surge load on identity services.
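One general mitigation for this class of failure is to drain recovery backlogs at a capped rate rather than all at once, so downstream services see a controlled ramp instead of a surge. The sketch below is illustrative only and is not Microsoft's implementation; the rate cap and handler are placeholders.

```python
# Illustrative sketch: drain a backlog of queued operations no faster than a
# fixed rate, giving downstream services (e.g. an identity platform) a
# controlled ramp after recovery instead of a thundering herd.
import time
from collections import deque
from typing import Callable


def drain_backlog(backlog: deque, handle: Callable[[object], None],
                  max_ops_per_second: float = 50.0) -> None:
    """Process queued items while keeping throughput under max_ops_per_second."""
    interval = 1.0 / max_ops_per_second
    while backlog:
        started = time.monotonic()
        handle(backlog.popleft())
        # Sleep off any remaining per-item budget to stay under the rate cap.
        elapsed = time.monotonic() - started
        if elapsed < interval:
            time.sleep(interval - elapsed)
```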
Current Status
Resolved: All services fully recovered as of 06:05 UTC on February 3, 2026.
Microsoft is conducting a full Root Cause Analysis and will publish detailed findings. Customers who experienced SLA violations should file service credit requests through the Azure Portal.