Your AVD deployment is already implementing many best practices! This document outlines additional enhancements to achieve enterprise-grade production readiness.
| Feature | Status | Implementation |
|---|---|---|
| Modular Architecture | β Complete | 6 specialized modules for maintainability |
| Network Flexibility | β Complete | 3 deployment scenarios (new/existing VNet/subnet) |
| Availability Zones | β Complete | Session hosts distributed across zones |
| Monitoring | β Complete | Log Analytics with VM Insights agents |
| Auto-Shutdown | β Complete | Cost optimization for dev/test |
| Private Endpoints | β Complete | Secure storage connectivity |
| Key Vault | β Complete | Secure credential management |
| Scaling Plans | β Complete | Dynamic capacity management |
| Domain Join Options | β Complete | Both Azure AD and AD DS support |
| Simplified Deployment | β Complete | Deploy to Azure button + wizard |
Why: Protect against data loss, accidental deletion, or ransomware.
Implementation: New backup module created
Usage in main.tf:
module "backup" {
source = "./modules/backup"
enable_backup = var.enable_backup
recovery_vault_name = "${local.resource_prefix}-rsv"
resource_group_name = azurerm_resource_group.main.name
location = var.location
# Backup schedule
backup_frequency = "Daily"
backup_time = "02:00"
backup_timezone = var.timezone
# Retention
daily_retention_count = 30
weekly_retention_count = 12
monthly_retention_count = 12
yearly_retention_count = 5
# VMs to protect
vm_ids = module.compute.vm_ids
tags = local.common_tags
}
Benefits:
Why: Standardize session host images, accelerate deployments, ensure consistency.
Implementation: New image_gallery module created
Usage in main.tf:
module "image_gallery" {
source = "./modules/image_gallery"
enable_image_gallery = var.enable_image_gallery
gallery_name = replace("${local.resource_prefix}_sig", "-", "_")
resource_group_name = azurerm_resource_group.main.name
location = var.location
create_win11_definition = true
create_win10_definition = false
# Grant CI/CD pipeline access
image_builder_principal_id = var.image_builder_principal_id
tags = local.common_tags
}
Update compute module to use custom images:
# In modules/compute/variables.tf, add:
variable "custom_image_id" {
description = "Custom image ID from Shared Image Gallery"
type = string
default = ""
}
# In modules/compute/main.tf, modify source_image_reference:
dynamic "source_image_reference" {
for_each = var.custom_image_id == "" ? [1] : []
content {
publisher = "MicrosoftWindowsDesktop"
offer = "Windows-11"
sku = "win11-22h2-avd"
version = "latest"
}
}
source_image_id = var.custom_image_id != "" ? var.custom_image_id : null
Benefits:
Why: Enforce compliance, security standards, and operational best practices.
Implementation: New policy module created
Usage in main.tf:
module "policy" {
source = "./modules/policy"
enable_policies = var.enable_policies
resource_group_id = azurerm_resource_group.main.id
location = var.location
# Governance
require_environment_tag = true
allowed_vm_sizes = var.allowed_vm_sizes
# Security
deploy_antimalware = true
audit_disk_encryption = true
# Monitoring
enable_vm_diagnostics = true
log_analytics_workspace_id = module.monitoring.workspace_id
}
Policies Enforced:
Benefits:
Why: User profiles contain critical data and settings.
Implementation: Add to storage module
Add to modules/storage/main.tf:
# Backup vault for storage account
resource "azurerm_data_protection_backup_vault" "main" {
count = var.enable_fslogix_backup ? 1 : 0
name = "${var.storage_account_name}-bvault"
resource_group_name = var.resource_group_name
location = var.location
datastore_type = "VaultStore"
redundancy = "LocallyRedundant"
tags = var.tags
}
# Backup policy for Azure Files
resource "azurerm_data_protection_backup_policy_blob_storage" "main" {
count = var.enable_fslogix_backup ? 1 : 0
name = "fslogix-backup-policy"
vault_id = azurerm_data_protection_backup_vault.main[0].id
retention_duration = "P30D"
}
# Backup instance for storage account
resource "azurerm_data_protection_backup_instance_blob_storage" "main" {
count = var.enable_fslogix_backup ? 1 : 0
name = "${var.storage_account_name}-backup"
vault_id = azurerm_data_protection_backup_vault.main[0].id
location = var.location
storage_account_id = azurerm_storage_account.main.id
backup_policy_id = azurerm_data_protection_backup_policy_blob_storage.main[0].id
}
Benefits:
Strategy: Deploy updates in waves to minimize risk.
Implementation:
# In modules/compute/variables.tf
variable "update_ring" {
description = "Update ring for patching strategy"
type = string
default = "production"
validation {
condition = contains(["pilot", "testing", "production"], var.update_ring)
error_message = "Update ring must be pilot, testing, or production."
}
}
# In modules/compute/main.tf
resource "azurerm_virtual_machine_extension" "update_management" {
count = var.enable_update_management ? var.vm_count : 0
name = "UpdateManagement"
virtual_machine_id = azurerm_windows_virtual_machine.main[count.index].id
publisher = "Microsoft.SoftwareUpdateManagement"
type = "WindowsOSUpdateExtension"
type_handler_version = "1.0"
settings = jsonencode({
updateRing = var.update_ring
autoUpdateEnabled = true
maintenanceWindow = var.update_ring == "pilot" ? "Sunday 02:00" : var.update_ring == "testing" ? "Tuesday 02:00" : "Friday 02:00"
})
}
Update Rings:
Benefits:
Add to modules/monitoring/main.tf:
# Cost Management Exports
resource "azurerm_cost_management_export" "daily" {
count = var.enable_cost_exports ? 1 : 0
name = "avd-daily-costs"
resource_group_id = var.resource_group_id
recurrence_type = "Daily"
recurrence_period_start_date = timestamp()
recurrence_period_end_date = timeadd(timestamp(), "8760h") # 1 year
delivery_info {
storage_account_id = var.cost_export_storage_account_id
container_name = "cost-exports"
root_folder_path = "/avd"
}
query {
type = "Usage"
time_frame = "MonthToDate"
}
}
# Workbook for Cost Analysis
resource "azurerm_application_insights_workbook" "cost_analysis" {
count = var.enable_cost_workbook ? 1 : 0
name = "AVD-Cost-Analysis"
resource_group_name = var.resource_group_name
location = var.location
display_name = "AVD Cost Analysis"
data_json = jsonencode({
version = "Notebook/1.0"
items = [
{
type = 3
content = {
version = "KqlItem/1.0"
query = <<-EOT
AzureDiagnostics
| where ResourceType == "DESKTOPS"
| summarize Cost=sum(todouble(Cost_USD)) by bin(TimeGenerated, 1d), ResourceId
| render timechart
EOT
}
}
]
})
tags = var.tags
}
Benefits:
Why: Fresh VMs prevent configuration drift and security issues.
Implementation: Add scheduled replacement
Add to modules/compute/main.tf:
# Random rotation for VM lifecycle
resource "time_rotating" "vm_lifecycle" {
rotation_days = var.vm_replacement_days
}
# Add lifecycle meta-argument to VMs
resource "azurerm_windows_virtual_machine" "main" {
# ... existing config ...
lifecycle {
create_before_destroy = true
replace_triggered_by = [
time_rotating.vm_lifecycle.id
]
}
tags = merge(
var.tags,
{
"ProvisionedDate" = timestamp()
"ReplacementDate" = timeadd(timestamp(), "${var.vm_replacement_days * 24}h")
}
)
}
Recommendation: Replace session hosts every 60-90 days
Benefits:
Strategy: Deploy secondary AVD environment in paired region.
High-Level Architecture:
Primary Region (East US) Secondary Region (West US)
βββ Host Pool (Active) <--> βββ Host Pool (Standby)
βββ Session Hosts (10) βββ Session Hosts (3)
βββ Storage (FSLogix) ---> βββ Storage (Replica)
βββ Workspace βββ Workspace (Hidden)
Implementation Approach:
Option A: Active-Standby
Option B: Active-Active (Geo-Distribution)
Key Components:
# In main.tf
module "avd_secondary" {
source = "./modules/avd"
location = var.secondary_region
resource_group_name = azurerm_resource_group.secondary.name
host_pool_name = "${local.resource_prefix}-secondary-hp"
# Minimal capacity for DR
validation_environment = false
preferred_app_group_type = "Desktop"
tags = merge(local.common_tags, {
Role = "DisasterRecovery"
})
}
# Storage replication
module "storage_secondary" {
source = "./modules/storage"
location = var.secondary_region
resource_group_name = azurerm_resource_group.secondary.name
account_replication_type = "GZRS" # Geo-zone-redundant
tags = local.common_tags
}
Benefits:
Add to modules/security/main.tf:
# AVD User Role Assignment
resource "azurerm_role_assignment" "avd_users" {
count = var.enable_rbac ? 1 : 0
scope = var.workspace_id
role_definition_name = "Desktop Virtualization User"
principal_id = var.avd_users_group_id
}
# AVD Admin Role Assignment
resource "azurerm_role_assignment" "avd_admins" {
count = var.enable_rbac ? 1 : 0
scope = var.resource_group_id
role_definition_name = "Desktop Virtualization Contributor"
principal_id = var.avd_admins_group_id
}
# JIT VM Access (Azure Security Center)
resource "azurerm_security_center_jit_network_access_policy" "main" {
count = var.enable_jit_access ? 1 : 0
name = "avd-jit-policy"
resource_group_name = var.resource_group_name
location = var.location
kind = "Basic"
virtual_machine {
virtual_machine_id = var.vm_id
port {
number = 3389
protocol = "TCP"
max_request_access_duration = "PT3H"
}
}
}
Benefits:
Add to modules/monitoring/main.tf:
# Alert Rule - Session Host Unavailable
resource "azurerm_monitor_metric_alert" "session_host_down" {
count = var.enable_advanced_alerts ? 1 : 0
name = "avd-session-host-unavailable"
resource_group_name = var.resource_group_name
scopes = [var.host_pool_id]
description = "Alert when session host becomes unavailable"
severity = 1
frequency = "PT5M"
window_size = "PT15M"
criteria {
metric_namespace = "Microsoft.DesktopVirtualization/hostpools"
metric_name = "SessionHostHealthCheckFailurePercent"
aggregation = "Average"
operator = "GreaterThan"
threshold = 10
}
action {
action_group_id = azurerm_monitor_action_group.main.id
}
}
# Alert Rule - High CPU Usage
resource "azurerm_monitor_metric_alert" "high_cpu" {
count = var.enable_advanced_alerts ? 1 : 0
name = "avd-high-cpu-usage"
resource_group_name = var.resource_group_name
scopes = var.vm_ids
description = "Alert when CPU exceeds 80%"
severity = 2
frequency = "PT5M"
window_size = "PT30M"
criteria {
metric_namespace = "Microsoft.Compute/virtualMachines"
metric_name = "Percentage CPU"
aggregation = "Average"
operator = "GreaterThan"
threshold = 80
}
action {
action_group_id = azurerm_monitor_action_group.main.id
}
}
# Alert Rule - Storage Low Space
resource "azurerm_monitor_metric_alert" "storage_low_space" {
count = var.enable_advanced_alerts ? 1 : 0
name = "avd-storage-low-space"
resource_group_name = var.resource_group_name
scopes = [var.storage_account_id]
description = "Alert when FSLogix storage exceeds 80%"
severity = 2
frequency = "PT15M"
window_size = "PT1H"
criteria {
metric_namespace = "Microsoft.Storage/storageAccounts"
metric_name = "UsedCapacity"
aggregation = "Average"
operator = "GreaterThan"
threshold = 858993459200 # 800GB (80% of 1TB)
}
action {
action_group_id = azurerm_monitor_action_group.main.id
}
}
# Action Group for Alerts
resource "azurerm_monitor_action_group" "main" {
count = var.enable_advanced_alerts ? 1 : 0
name = "avd-alerts"
resource_group_name = var.resource_group_name
short_name = "avdalerts"
email_receiver {
name = "Admin Email"
email_address = var.alert_email
use_common_alert_schema = true
}
webhook_receiver {
name = "Teams Webhook"
service_uri = var.teams_webhook_url
}
}
Alerts:
| Priority | Feature | Impact | Complexity | Timeline |
|---|---|---|---|---|
| π΄ HIGH | Azure Backup | High | Low | 1 week |
| π΄ HIGH | Shared Image Gallery | High | Medium | 2 weeks |
| π΄ HIGH | FSLogix Backup | High | Low | 1 week |
| π΄ HIGH | Session Host Replacement | Medium | Low | 1 week |
| π΄ HIGH | Advanced Monitoring | High | Medium | 1 week |
| π‘ MEDIUM | Azure Policy | Medium | Low | 1 week |
| π‘ MEDIUM | Update Rings | Medium | Medium | 2 weeks |
| π‘ MEDIUM | Cost Optimization | Medium | Low | 1 week |
| π‘ MEDIUM | RBAC & JIT | Medium | Low | 1 week |
| π‘ MEDIUM | Multi-Region DR | High | High | 4 weeks |
# Backup
variable "enable_backup" {
description = "Enable Azure Backup for session hosts"
type = bool
default = true
}
# Image Gallery
variable "enable_image_gallery" {
description = "Enable Shared Image Gallery"
type = bool
default = false
}
variable "image_builder_principal_id" {
description = "Service principal ID for image building"
type = string
default = ""
}
# Policy
variable "enable_policies" {
description = "Enable Azure Policy governance"
type = bool
default = true
}
variable "allowed_vm_sizes" {
description = "Allowed VM sizes for session hosts"
type = list(string)
default = ["Standard_D2s_v5", "Standard_D4s_v5", "Standard_D8s_v5"]
}
# Monitoring
variable "enable_advanced_alerts" {
description = "Enable advanced monitoring alerts"
type = bool
default = true
}
variable "alert_email" {
description = "Email for alert notifications"
type = string
default = ""
}
# DR
variable "enable_dr" {
description = "Enable disaster recovery in secondary region"
type = bool
default = false
}
variable "secondary_region" {
description = "Azure region for disaster recovery"
type = string
default = "westus2"
}
Your implementation is production-ready with the current features! The recommendations above will take you from good to exceptional by adding:
β
Business Continuity: Backup, DR, session host replacement
β
Operational Excellence: Golden images, update rings, monitoring
β
Security & Compliance: Azure Policy, RBAC, JIT access
β
Cost Optimization: Cost tracking, auto-shutdown, right-sizing
Estimated effort: 6-8 weeks for full implementation
ROI: Reduced downtime, faster recovery, improved compliance, lower costs