Maintenance Configurations

Overview

Azure Maintenance Configurations provide automated patch management for VMs through scheduled maintenance windows. This implementation uses dynamic scope assignments to automatically target VMs based on filters (tags, resource groups, locations), eliminating the need to explicitly list individual VMs.

The system integrates Event Grid, Automation Runbooks, and Azure Resource Graph to orchestrate pre- and post-maintenance workflows, including automated VM power state management. VM discovery within runbooks is performed dynamically at runtime using Azure Resource Graph queries based on maintenance correlation IDs.

Module Structure

The patch automation system consists of several integrated modules:

  • maintenance_configuration - Defines maintenance schedules, patching rules, and recurrence patterns
  • maintenance_assignment_dynamic_scope - Automatically targets VMs using tag-based filters (no explicit VM lists required)
  • eventgrid_system_topic - Creates Event Grid topics that emit pre/post maintenance events
  • eventgrid_event_subscription - Routes maintenance events to automation webhooks with configurable event types
  • automation_account - Provides system-assigned managed identity for secure runbook execution
  • automation_runbook - PowerShell scripts using Azure Resource Graph for dynamic VM discovery and power management
  • automation_webhook - HTTP endpoints that receive Event Grid events and trigger runbooks

Note: This automation requires a custom RBAC role with specific permissions for VM power management and tagging.

Architecture Overview

The patch automation workflow operates as follows:

  1. Maintenance Configuration - Defines when patching occurs (schedule, duration, recurrence, patch classifications)
  2. Dynamic Scope Assignments - Azure automatically discovers target VMs at runtime using tag filters (e.g., PatchGroup = "1")
  3. Event Grid System Topics - Emit events when maintenance windows start and complete
  4. Event Grid Event Subscriptions - Route PreMaintenanceEvent and PostMaintenanceEvent to webhooks
  5. Automation Webhooks - Receive events with maintenance correlation IDs and trigger runbooks
  6. Automation Runbooks - Execute PowerShell scripts under the managed identity. Each runbook:
    • Parses the Event Grid payload to extract the correlation ID
    • Queries Azure Resource Graph to find the VMs in the maintenance window
    • Starts or stops those VMs and updates the AutomationPoweredOn tag (this step requires the custom RBAC role with VM power management and tagging permissions)

Usage

1. Configure Patch Group Tagging on VMs

First, enable patch group tagging on your Windows VMs:

windows_vms = {
  hsw = {
    names = ["azwu2nhsw001", "azwu2nhsw002"]
    patch_group_tagging = true      # Enable patch group tagging
    patch_group_tag = "PatchGroup"  # Tag key (default: "PatchGroup")
    # ... other VM configuration
  }
}

This will automatically tag VMs with:

  • PatchGroup = "1" for VMs whose names end in an odd number (001, 003, etc.)
  • PatchGroup = "2" for VMs whose names end in an even number (002, 004, etc.)
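
The odd/even rule can be expressed directly in Terraform. The following is only a sketch of one way to derive the tag value, not necessarily how the module implements it. The local names vm_names and patch_group_by_vm are hypothetical, and the rule is assumed to depend only on the last character of the VM name.

```hcl
locals {
  # VM names taken from the example above.
  vm_names = ["azwu2nhsw001", "azwu2nhsw002"]

  # Assumed rule: a name whose last character is an odd digit goes to
  # group "1"; everything else (including non-numeric endings) goes to "2".
  patch_group_by_vm = {
    for name in local.vm_names :
    name => contains(["1", "3", "5", "7", "9"], substr(name, length(name) - 1, 1)) ? "1" : "2"
  }
}

output "patch_group_by_vm" {
  value = local.patch_group_by_vm
}
```

With the two names above, azwu2nhsw001 maps to "1" and azwu2nhsw002 maps to "2".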

2. Create Maintenance Configurations

Define maintenance schedules in your terraform.tfvars:

maintenance_configurations = {
  patch_tuesday = {
    resource_group = "hsw"
    window = {
      start_date_time = "2026-02-10 02:00"  # Second Tuesday of the month
      duration = "03:00"                    # 3 hour window
      recur_every = "1Month Second Tuesday" # Monthly recurrence on Second Tuesday
    }
    install_patches = {
      reboot = "IfRequired"
      windows = [{
        classifications_to_include = ["Critical", "Security", "UpdateRollup", "ServicePack"]
      }]
    }
  }

  patch_thursday = {
    resource_group = "hsw"
    window = {
      start_date_time = "2026-02-12 02:00"  # Second Thursday of the month
      duration = "03:00"
      recur_every = "1Month Second Thursday"
    }
    install_patches = {
      reboot = "IfRequired"
      windows = [{
        classifications_to_include = ["Critical", "Security", "UpdateRollup"]
      }]
    }
  }
}
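
Because the start date is a plain string, a malformed value is easy to miss until apply time. As a sketch, a validation block could reject values that do not follow the YYYY-MM-DD hh:mm shape used above. The variable name maintenance_windows and its simplified type are hypothetical and are not part of the module's interface.

```hcl
variable "maintenance_windows" {
  description = "Hypothetical, simplified stand-in for the window settings."
  type = map(object({
    start_date_time = string
    duration        = string
    recur_every     = string
  }))

  default = {
    patch_tuesday = {
      start_date_time = "2026-02-10 02:00"
      duration        = "03:00"
      recur_every     = "1Month Second Tuesday"
    }
  }

  validation {
    # can() turns the "no match" error from regex() into false.
    condition = alltrue([
      for w in values(var.maintenance_windows) :
      can(regex("^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$", w.start_date_time))
    ])
    error_message = "Each start_date_time must use the YYYY-MM-DD hh:mm format."
  }
}
```

This only checks the shape of the string. It does not verify that the date actually falls on the weekday named in recur_every.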

3. Create Dynamic Scope Assignments

Create dynamic assignments that automatically target VMs based on filters:

dynamic_scope_assignments = {
    patch_group_1 = {
        maintenance_configuration = "patch_tuesday"
        tag_filters = [
            {
                tag = "PatchGroup"
                values = ["1"]
            },
            {
                tag = "environment"
                values = ["dev"]
            }
        ]
    }
    patch_group_2 = {
        maintenance_configuration = "patch_thursday"
        tag_filters = [
            {
                tag = "PatchGroup"
                values = ["2"]
            },
            {
                tag = "environment"
                values = ["dev"]
            }
        ]
    }
}
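
When several tag filters are listed, it matters whether a VM must match all of them or any one of them. In the azurerm provider this is controlled by the tag_filter argument of azurerm_maintenance_assignment_dynamic_scope. Whether and how this module sets that argument is not stated here, so treat the following as a sketch of the underlying resource under that assumption. The variable name and the resource name are hypothetical.

```hcl
variable "maintenance_configuration_id" {
  description = "Hypothetical input: resource ID of the patch_tuesday maintenance configuration."
  type        = string
}

resource "azurerm_maintenance_assignment_dynamic_scope" "patch_group_1" {
  name                         = "patch-group-1" # hypothetical name
  maintenance_configuration_id = var.maintenance_configuration_id

  filter {
    resource_types = ["Microsoft.Compute/virtualMachines"]

    # "All": a VM must carry every tag below.
    # "Any": a VM matching either tag would be included.
    tag_filter = "All"

    tags {
      tag    = "PatchGroup"
      values = ["1"]
    }

    tags {
      tag    = "environment"
      values = ["dev"]
    }
  }
}
```

With "All", only dev VMs in patch group 1 are targeted. With "Any", every dev VM would be included regardless of patch group, which is probably not what the example above intends.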

4. Create Automation Account

Create an automation account with a system-assigned managed identity:

automation_account = {
  maintenance = {
    resource_group = "hsw"
    sku_name = "Basic"
    identity = {
      type = "SystemAssigned"
    }
    tags = {}
  }
}

5. Create Automation Runbooks

Define PowerShell runbooks that respond to maintenance events:

automation_runbook = { 
    pre_maintenance_power_on = {
        automation_account = "maintenance"
        runbook_type = "PowerShell"
        description = "Pre-maintenance task - Power on VMs"
        content = <<-EOT
            param(
                [object]$WebhookData
            )

            # Parse correlation ID from webhook input
            if ($WebhookData -is [string]) {
                # Direct string input for manual runs/testing
                $CorrelationId = $WebhookData
            } else {
                # Event Grid webhook - RequestBody is a JSON string
                $events = $WebhookData.RequestBody | ConvertFrom-Json
                $CorrelationId = $events[0].data.CorrelationId
            }

            Write-Output "Starting pre-maintenance power on for correlation ID: $CorrelationId"

            # Connect to Azure and get access token
            Connect-AzAccount -Identity | Out-Null
            $resource = "https://management.azure.com/"
            $tokenAuthUri = $env:IDENTITY_ENDPOINT + "?resource=$resource&api-version=2019-08-01"
            $tokenResponse = Invoke-RestMethod -Headers @{"X-IDENTITY-HEADER" = $env:IDENTITY_HEADER } -Method GET -Uri $tokenAuthUri
            $accessToken = $tokenResponse.access_token

            # Query for powered off VMs in this maintenance run
            $Query = @"
maintenanceresources
| where properties.correlationId =~ '$CorrelationId'
| where type =~ 'microsoft.maintenance/applyupdates'
| extend targetResourceId=tostring(properties.resourceId)
| extend targetResourceIdLower=tolower(targetResourceId)
| join kind=inner (
    resources
    | where type =~ 'microsoft.compute/virtualmachines'
    | extend powerState = tostring(properties.extended.instanceView.powerState.code)
    | extend idLower = tolower(id)
) on `$left.targetResourceIdLower == `$right.idLower
| where powerState !~ 'PowerState/running'
| extend vmName = tostring(split(targetResourceId, '/')[-1])
| extend vmResourceGroup = tostring(split(targetResourceId, '/')[4])
| project id=targetResourceId, name=vmName, resourceGroup=vmResourceGroup
"@

            $argUrl = "https://management.azure.com/providers/Microsoft.ResourceGraph/resources?api-version=2021-03-01"
            $body = @{ query = $Query } | ConvertTo-Json
            $response = Invoke-RestMethod -Method POST -Uri $argUrl -Headers @{ Authorization = "Bearer $accessToken" } -ContentType "application/json" -Body $body

            # Retrieve VMs from response
            $VMs = @($response.data)

            if (-not $VMs) {
                Write-Output "No powered off VMs found for this maintenance run"
                exit
            }

            Write-Output "Found $($VMs.Count) powered off VMs to start"

            $Timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"

            # Start all VMs
            foreach ($VM in $VMs) {
                Write-Output "Starting VM: $($VM.name) in $($VM.resourceGroup)"
                Start-AzVM -ResourceGroupName $VM.resourceGroup -Name $VM.name -NoWait
            }

            # Tag all VMs to track they were powered on by automation
            foreach ($VM in $VMs) {
                Update-AzTag -ResourceId $VM.id -Tag @{AutomationPoweredOn = $Timestamp} -Operation Merge
                Write-Output "Tagged $($VM.name) with AutomationPoweredOn=$Timestamp"
            }

            Write-Output "Pre-maintenance power on completed"
        EOT
    }

    post_maintenance_power_off = {
        automation_account = "maintenance"
        runbook_type = "PowerShell"
        description = "Post-maintenance task - Power off VMs that were powered on by automation"
        content = <<-EOT
            param(
                [object]$WebhookData
            )

            # Parse correlation ID from webhook input
            if ($WebhookData -is [string]) {
                # Direct string input for manual runs/testing
                $CorrelationId = $WebhookData
            } else {
                # Event Grid webhook - RequestBody is a JSON string
                $events = $WebhookData.RequestBody | ConvertFrom-Json
                $CorrelationId = $events[0].data.CorrelationId
            }

            Write-Output "Starting post-maintenance power off for correlation ID: $CorrelationId"

            # Connect to Azure and get access token
            Connect-AzAccount -Identity | Out-Null
            $resource = "https://management.azure.com/"
            $tokenAuthUri = $env:IDENTITY_ENDPOINT + "?resource=$resource&api-version=2019-08-01"
            $tokenResponse = Invoke-RestMethod -Headers @{"X-IDENTITY-HEADER" = $env:IDENTITY_HEADER } -Method GET -Uri $tokenAuthUri
            $accessToken = $tokenResponse.access_token

            # Query for running VMs with AutomationPoweredOn tag in this maintenance run
            $Query = @"
maintenanceresources
| where properties.correlationId =~ '$CorrelationId'
| where type =~ 'microsoft.maintenance/applyupdates'
| extend targetResourceId=tostring(properties.resourceId)
| extend targetResourceIdLower=tolower(targetResourceId)
| join kind=inner (
    resources
    | where type =~ 'microsoft.compute/virtualmachines'
    | extend powerState = tostring(properties.extended.instanceView.powerState.code)
    | extend automationPoweredOn = tostring(tags.AutomationPoweredOn)
    | extend idLower = tolower(id)
) on `$left.targetResourceIdLower == `$right.idLower
| where powerState =~ 'PowerState/running'
| where isnotempty(automationPoweredOn)
| extend vmName = tostring(split(targetResourceId, '/')[-1])
| extend vmResourceGroup = tostring(split(targetResourceId, '/')[4])
| project id=targetResourceId, name=vmName, resourceGroup=vmResourceGroup
"@

            $argUrl = "https://management.azure.com/providers/Microsoft.ResourceGraph/resources?api-version=2021-03-01"
            $body = @{ query = $Query } | ConvertTo-Json
            $response = Invoke-RestMethod -Method POST -Uri $argUrl -Headers @{ Authorization = "Bearer $accessToken" } -ContentType "application/json" -Body $body

            # Retrieve VMs from response
            $VMs = @($response.data)

            if (-not $VMs) {
                Write-Output "No running VMs with AutomationPoweredOn tag found for this maintenance run"
                exit
            }

            Write-Output "Found $($VMs.Count) VMs to power off"

            # Stop all VMs
            foreach ($VM in $VMs) {
                Write-Output "Stopping VM: $($VM.name) in $($VM.resourceGroup)"
                Stop-AzVM -ResourceGroupName $VM.resourceGroup -Name $VM.name -Force -NoWait
            }

            # Remove AutomationPoweredOn tag from all VMs
            foreach ($VM in $VMs) {
                Update-AzTag -ResourceId $VM.id -Tag @{AutomationPoweredOn = ""} -Operation Delete
                Write-Output "Removed AutomationPoweredOn tag from $($VM.name)"
            }

            Write-Output "Post-maintenance power off completed"
        EOT
    }
}

6. Create Automation Webhooks

Create webhooks that will be called by Event Grid:

automation_webhook = {
  pre_maintenance_webhook = {
    automation_account = "maintenance"
    runbook = "pre_maintenance_power_on"
    expiry_time = "2036-01-25T00:00:00Z"
    enabled = true
  }

  post_maintenance_webhook = {
    automation_account = "maintenance"
    runbook = "post_maintenance_power_off"
    expiry_time = "2036-01-25T00:00:00Z"
    enabled = true
  }
}
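
For orientation, the pre-maintenance entry above corresponds roughly to an azurerm_automation_webhook resource like the one sketched here. This is an assumption about how the module maps its inputs, not its actual source. All three variables are hypothetical placeholders.

```hcl
variable "resource_group_name" {
  description = "Hypothetical input: resource group of the automation account."
  type        = string
}

variable "automation_account_name" {
  description = "Hypothetical input: name of the automation account."
  type        = string
}

variable "runbook_name" {
  description = "Hypothetical input: deployed name of the pre-maintenance runbook."
  type        = string
}

resource "azurerm_automation_webhook" "pre_maintenance" {
  name                    = "pre-maintenance-webhook" # hypothetical name
  resource_group_name     = var.resource_group_name
  automation_account_name = var.automation_account_name
  runbook_name            = var.runbook_name
  expiry_time             = "2036-01-25T00:00:00Z"
  enabled                 = true
}

# The webhook URI embeds a secret token, so it is marked sensitive.
output "pre_maintenance_webhook_uri" {
  value     = azurerm_automation_webhook.pre_maintenance.uri
  sensitive = true
}
```

Anyone holding the URI can trigger the runbook, so it is worth keeping it out of logs and plan output.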

7. Create Event Grid System Topic

Create an Event Grid system topic for the maintenance configuration:

eventgrid_system_topic = {
  maintenance_events = {
    resource_group = "hsw"
    maintenance_configuration = "patch_tuesday"  # References your maintenance configuration
    topic_type = "Microsoft.Maintenance.MaintenanceConfigurations"
  }
}

8. Create Event Grid Event Subscriptions

Subscribe to maintenance events and route them to automation webhooks:

eventgrid_event_subscription = {
  pre_maintenance_sub = {
    system_topic = "maintenance_events"
    event_delivery_schema = "EventGridSchema"
    included_event_types = ["Microsoft.Maintenance.PreMaintenanceEvent"]
    webhook_endpoint = {
      automation_webhook = "pre_maintenance_webhook"
    }
  }

  post_maintenance_sub = {
    system_topic = "maintenance_events"
    event_delivery_schema = "EventGridSchema"
    included_event_types = ["Microsoft.Maintenance.PostMaintenanceEvent"]
    webhook_endpoint = {
      automation_webhook = "post_maintenance_webhook"
    }
  }
}
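
The automation_webhook reference above presumably resolves to the webhook's URI. A sketch of the equivalent azurerm resource, assuming the URI, topic name, and resource group are passed in as variables (all three variable names are hypothetical):

```hcl
variable "system_topic_name" {
  description = "Hypothetical input: name of the Event Grid system topic."
  type        = string
}

variable "resource_group_name" {
  description = "Hypothetical input: resource group of the system topic."
  type        = string
}

variable "pre_maintenance_webhook_uri" {
  description = "Hypothetical input: URI of the pre-maintenance automation webhook."
  type        = string
  sensitive   = true
}

resource "azurerm_eventgrid_system_topic_event_subscription" "pre_maintenance" {
  name                  = "pre-maintenance-sub" # hypothetical name
  system_topic          = var.system_topic_name
  resource_group_name   = var.resource_group_name
  event_delivery_schema = "EventGridSchema"

  # Only pre-maintenance events reach this webhook; the post-maintenance
  # subscription would list Microsoft.Maintenance.PostMaintenanceEvent instead.
  included_event_types = ["Microsoft.Maintenance.PreMaintenanceEvent"]

  webhook_endpoint {
    url = var.pre_maintenance_webhook_uri
  }
}
```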

9. Create Custom RBAC Role and Assign to Automation Account

The automation account's managed identity requires specific permissions to manage VMs. Create a custom Azure role with the following permissions:

Required Permissions:

{
  "actions": [
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/powerOff/action",
    "Microsoft.Compute/virtualMachines/deallocate/action",
    "Microsoft.Compute/virtualMachines/read",
    "Microsoft.Resources/tags/write",
    "Microsoft.Resources/tags/delete",
    "Microsoft.ResourceGraph/resources/read",
    "Microsoft.Maintenance/applyUpdates/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read"
  ]
}
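
The permissions above can be turned into a role definition and assignment with the azurerm provider. The following is a minimal sketch, assuming the role is scoped to a whole subscription. The role name, both variables, and the choice of scope are assumptions rather than part of the module.

```hcl
variable "subscription_id" {
  description = "Hypothetical input: subscription that contains the target VMs."
  type        = string
}

variable "automation_principal_id" {
  description = "Hypothetical input: principal ID of the automation account's system-assigned identity."
  type        = string
}

locals {
  role_scope = "/subscriptions/${var.subscription_id}"
}

resource "azurerm_role_definition" "patch_automation" {
  name        = "Patch Automation VM Operator" # hypothetical role name
  scope       = local.role_scope
  description = "Start and stop VMs and manage tags during maintenance windows."

  permissions {
    actions = [
      "Microsoft.Compute/virtualMachines/start/action",
      "Microsoft.Compute/virtualMachines/powerOff/action",
      "Microsoft.Compute/virtualMachines/deallocate/action",
      "Microsoft.Compute/virtualMachines/read",
      "Microsoft.Resources/tags/write",
      "Microsoft.Resources/tags/delete",
      "Microsoft.ResourceGraph/resources/read",
      "Microsoft.Maintenance/applyUpdates/read",
      "Microsoft.Resources/subscriptions/resourceGroups/read",
    ]
    not_actions = []
  }

  assignable_scopes = [local.role_scope]
}

# Grant the custom role to the automation account's managed identity.
resource "azurerm_role_assignment" "patch_automation" {
  scope              = local.role_scope
  role_definition_id = azurerm_role_definition.patch_automation.role_definition_resource_id
  principal_id       = var.automation_principal_id
}
```

A narrower scope, such as the resource groups that hold the patched VMs, would also work as long as it covers every VM the runbooks need to query, start, stop, and tag.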

Event Flow Diagram

  1. Maintenance window approaches its scheduled start
  2. Pre-maintenance event emitted through Event Grid (40 minutes prior to scheduled maintenance)
  3. Pre-maintenance webhook triggered
  4. Pre-maintenance runbook executes
  5. Azure performs maintenance (patching)
  6. Post-maintenance event emitted through Event Grid (15 minutes after maintenance completes)
  7. Post-maintenance webhook triggered
  8. Post-maintenance runbook executes

Variable Reference

maintenance_configurations

Field | Type | Description | Default
resource_group | string | Resource group for the configuration | Required
scope | string | Maintenance scope | "InGuestPatch"
visibility | string | Configuration visibility | "Custom"
window.start_date_time | string | Start date/time in YYYY-MM-DD hh:mm format | Required
window.duration | string | Maintenance window duration in hh:mm format | "02:00"
window.time_zone | string | Time zone for the schedule | Defaults to global timezone
window.recur_every | string | Recurrence pattern | Required
tags | map(string) | Resource tags | {}

dynamic_scope_assignments

Field | Type | Description | Default
maintenance_configuration | string | Maintenance configuration key | Required
resource_types | list(string) | Resource types to target | ["microsoft.compute/virtualmachines"]
resource_groups | list(string) | Resource groups to include | null (all)
locations | list(string) | Azure regions to include | Defaults to global location
tag_filters | list(object) | Tag-based filtering | null
tag_filters[].tag | string | Tag name to filter by | Required
tag_filters[].values | list(string) | Tag values to match | Required

automation_account

Field | Type | Description | Default
resource_group | string | Resource group key | Required
sku_name | string | SKU name | "Basic"
identity.type | string | Managed identity type | "SystemAssigned"
tags | map(string) | Resource tags | {}

automation_runbook

Field | Type | Description | Default
automation_account | string | Automation account key | Required
runbook_type | string | Runbook type | "PowerShell", "PowerShell72", "Python3", etc.
log_progress | bool | Enable progress logging | true
log_verbose | bool | Enable verbose logging | true
description | string | Runbook description | null
content | string | Inline PowerShell script content | null
publish_content_link | object | External script source | null
tags | map(string) | Resource tags | {}

automation_webhook

Field | Type | Description | Default
automation_account | string | Automation account key | Required
runbook | string | Runbook key | Required
expiry_time | string | Webhook expiry (ISO 8601) | "2036-01-25T00:00:00Z"
enabled | bool | Enable webhook | true
parameters | map(string) | Static parameters to pass to runbook | null

eventgrid_system_topic

Field | Type | Description | Default
resource_group | string | Resource group key | Required
maintenance_configuration | string | Maintenance configuration key | null
source_resource_id | string | Direct Azure resource ID | null
topic_type | string | Event Grid topic type | "Microsoft.Maintenance.MaintenanceConfigurations"
tags | map(string) | Resource tags | {}

Note: Use either maintenance_configuration (module reference) or source_resource_id (direct ARM ID).

eventgrid_event_subscription

Field | Type | Description | Default
system_topic | string | Event Grid system topic key | Required
event_delivery_schema | string | Event schema format | "EventGridSchema"
included_event_types | list(string) | Event types to subscribe to | null (all)
webhook_endpoint.automation_webhook | string | Automation webhook key | null
webhook_endpoint.url | string | Direct webhook URL | null
labels | list(string) | Subscription labels | null

Common Event Types:

  • Microsoft.Maintenance.PreMaintenanceEvent - Triggered before maintenance starts
  • Microsoft.Maintenance.PostMaintenanceEvent - Triggered after maintenance completes

Azure Resource Graph Queries

The automation runbooks use Azure Resource Graph to find the VMs affected by a maintenance event. The query below is a simplified form of the one embedded in the runbooks; it omits the power-state and tag filters:

Find VMs in Maintenance Window

maintenanceresources
| where type =~ 'microsoft.maintenance/applyupdates'
| where properties.correlationId =~ '<CORRELATION_ID>'
// Lowercase the ID so it matches tolower(id) on the right side of the join
| project resourceId = tolower(tostring(properties.resourceId))
| join kind=inner (
    resources
    | where type == 'microsoft.compute/virtualmachines'
    | project resourceId = tolower(id), vmName = name, resourceGroup
) on resourceId
| project vmName, resourceGroup

This query:

  1. Queries the maintenanceresources table for apply updates matching the correlation ID
  2. Joins with the resources table to get VM details
  3. Returns VM names and resource groups

Naming Convention

Resources are named using the standard pattern:

  • Maintenance Configuration: {prefix}{key}{suffix} (e.g., prod-patch_tuesday-eastus2-mc)
  • Maintenance Assignment: {prefix}{key}{suffix} (e.g., prod-patch_group_1-eastus2-ma)
  • Automation Account: {prefix}{key}{suffix} (e.g., prod-maintenance-eastus2-aa)
  • Automation Runbook: {prefix}{key}{suffix} (e.g., prod-pre_maintenance_power_on-eastus2-runbook)
  • Automation Webhook: {prefix}{key}{suffix} (e.g., prod-pre_maintenance_webhook-eastus2-webhook)
  • Event Grid System Topic: {prefix}{key}{suffix} (e.g., prod-maintenance_events-eastus2-egst)
  • Event Grid Event Subscription: {prefix}{key}{suffix} (e.g., prod-pre_maintenance_sub-eastus2-eges)

Add to your name_prefixes and name_suffixes:

name_prefixes = {
  maintenance_configuration = "prod-"
  maintenance_assignment = "prod-"
  automation_account = "prod-"
  automation_runbook = "prod-"
  automation_webhook = "prod-"
  eventgrid_system_topic = "prod-"
  eventgrid_event_subscription = "prod-"
}

name_suffixes = {
  maintenance_configuration = "-eastus2-mc"
  maintenance_assignment = "-eastus2-ma"
  automation_account = "-eastus2-aa"
  automation_runbook = "-eastus2-runbook"
  automation_webhook = "-eastus2-webhook"
  eventgrid_system_topic = "-eastus2-egst"
  eventgrid_event_subscription = "-eastus2-eges"
}
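
To illustrate how the pattern combines these maps, the sketch below builds the maintenance configuration names from the prefix, key, and suffix. The local names are hypothetical; the module's internal naming logic may be organized differently.

```hcl
locals {
  # Values copied from the example above (maintenance configurations only).
  name_prefixes = { maintenance_configuration = "prod-" }
  name_suffixes = { maintenance_configuration = "-eastus2-mc" }

  # Hypothetical helper: one name per map key, built as {prefix}{key}{suffix}.
  maintenance_configuration_names = {
    for key in ["patch_tuesday", "patch_thursday"] :
    key => "${local.name_prefixes.maintenance_configuration}${key}${local.name_suffixes.maintenance_configuration}"
  }
}

output "maintenance_configuration_names" {
  value = local.maintenance_configuration_names
}
```

For the key patch_tuesday this yields prod-patch_tuesday-eastus2-mc, matching the example in the list above.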