Azure infrastructure integrity
All software components installed in the Azure environment are custom built. This refers to the software that Microsoft installs and manages as part of Azure Service Fabric. Custom software, including operating system images and SQL databases, is built using Microsoft's Security Development Lifecycle (SDL) process. All software deployment follows a strictly defined change management and release management process. All nodes and fabric controllers use customized versions of Windows Server 2019. Installing unauthorized software is not allowed.
VMs running in Azure are grouped into clusters, each containing around 1,000 VMs. The VMs in a cluster are managed by a Fabric Controller (FC), which is scaled out and redundant. Each FC is responsible for the life cycle management of applications running in its own cluster, including the provisioning and monitoring of hardware in that cluster. If any server fails, the FC automatically builds a new instance of that server.
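The automatic rebuild behavior described above can be illustrated with a small reconciliation loop. This is only a sketch under stated assumptions: the names FabricControllerSketch, Server, provision, and reconcile are hypothetical and do not correspond to any real Azure API, and the actual Fabric Controller is far more involved.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Server:
    """Hypothetical stand-in for a server tracked by the controller."""
    server_id: int
    healthy: bool = True


@dataclass
class FabricControllerSketch:
    """Illustrative model only; not the real Fabric Controller."""
    servers: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def provision(self) -> Server:
        # Create a fresh, healthy server and start tracking it.
        server = Server(next(self._ids))
        self.servers[server.server_id] = server
        return server

    def reconcile(self) -> list:
        # Replace every unhealthy server with a newly provisioned one.
        failed = [sid for sid, s in self.servers.items() if not s.healthy]
        replaced = []
        for server_id in failed:
            del self.servers[server_id]
            replacement = self.provision()
            replaced.append((server_id, replacement.server_id))
        return replaced


fc = FabricControllerSketch()
first, second = fc.provision(), fc.provision()
second.healthy = False          # simulate a server failure
replacements = fc.reconcile()   # pairs of (failed id, replacement id)
```

The sketch replaces a failed server rather than repairing it in place, which mirrors the text's statement that a new instance is built when a server fails.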
As part of the release management process, each Azure software component goes through a build process that includes virus scans using endpoint protection anti-virus tools. Nothing goes to production without a clean virus scan. Each virus scan creates a log in the build directory and, if any issues are detected, the process for that component is frozen. Any component with a detected issue is then inspected by Microsoft security teams to identify the exact problem.
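The scan gate described above can be sketched as follows. This is a minimal illustration, not Microsoft's build tooling: scan_component is a hypothetical placeholder that matches substrings instead of calling a real anti-virus engine, and the log file naming and the "released"/"frozen" status values are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def scan_component(payload: str, signatures: list) -> list:
    """Hypothetical scanner: return every signature found in the payload."""
    return [sig for sig in signatures if sig in payload]


def release_gate(components: dict, build_dir, signatures: list) -> dict:
    """Scan each component, write a log to the build directory, and
    freeze any component whose scan is not clean."""
    build_dir = Path(build_dir)
    status = {}
    for name, payload in components.items():
        findings = scan_component(payload, signatures)
        log_path = build_dir / f"{name}.scan.log"  # assumed naming scheme
        if findings:
            log_path.write_text("detected: " + ", ".join(findings) + "\n")
            status[name] = "frozen"      # held for security team inspection
        else:
            log_path.write_text("clean\n")
            status[name] = "released"    # clean scan, may go to production
    return status


with tempfile.TemporaryDirectory() as tmp:
    result = release_gate(
        {"web": "normal build output", "agent": "contains BAD_SIG here"},
        tmp,
        signatures=["BAD_SIG"],
    )
    agent_log = (Path(tmp) / "agent.scan.log").read_text()
    web_log = (Path(tmp) / "web.scan.log").read_text()
```

A log is written for every component, clean or not, so that each scan leaves a record in the build directory as the text describes.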
Azure is a closed and locked-down environment. All nodes and guest VMs have their default Windows administrator account disabled, and no user accounts are created directly on any of the nodes or guest VMs. Administrators from Azure support can connect to them only with proper authorization, to perform maintenance tasks and emergency repairs.
Even with all these precautions in place to provide maximum availability and security, incidents may occur from time to time. To detect these issues and mitigate them as quickly as possible, Microsoft has implemented monitoring and incident management.