Pipeline Troubleshooting¶
Common Azure DevOps pipeline issues and solutions for Forge projects.
📋 Overview¶
This guide covers common pipeline failures in Azure DevOps and how to diagnose and resolve them.
🏗️ Pipeline Architecture¶
Forge Pipeline Structure¶
.azdo/
├── azure-pipelines-api.yml # Main API deployment pipeline
├── azure-pipelines-api-pr.yml # API PR validation pipeline
├── azure-pipelines-web.yml # Web app deployment pipeline
├── azure-pipelines-web-pr.yml # Web PR validation pipeline
├── azure-pipelines-auth.yml # Auth configuration pipeline
├── azure-pipelines-auth-pr.yml # Auth PR validation pipeline
├── azure-pipelines-bootstrap.yml # Infrastructure bootstrap
└── vars/
    ├── base.yml # Shared variables
    ├── api.yml # API-specific variables
    ├── web.yml # Web-specific variables
    └── auth.yml # Auth-specific variables
Pipeline Templates¶
Forge uses centralized templates from SAIF/pipeline-templates:
| Template | Purpose |
|---|---|
| azure-dotnet-api-v3.yml | .NET API build and deploy |
| azure-react-web-v3.yml | React web app build and deploy |
| azure-auth.yml | Auth configuration deployment |
| azure-terraform.yml | Infrastructure deployment |
❌ Common Pipeline Failures¶
1. Template Reference Errors¶
Error: "Template reference not found" or "Unable to find template"
Cause: Missing or incorrect ref for the templates repository
Solution:
# Correct template reference
resources:
  repositories:
    - repository: templates
      type: git
      name: SAIF/pipeline-templates
      ref: refs/heads/releases/v3 # Must specify version
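Once the repository resource is declared, the pipeline refers to a template file by appending @templates, which must match the repository alias. A minimal sketch, assuming the template file sits at the root of SAIF/pipeline-templates and is a stage template (both are assumptions, so check the actual layout of the templates repository):

```yaml
# Sketch only: the template path and its being a stage template are assumptions.
resources:
  repositories:
    - repository: templates          # alias referenced after the @ below
      type: git
      name: SAIF/pipeline-templates
      ref: refs/heads/releases/v3

stages:
  - template: azure-dotnet-api-v3.yml@templates
```

If the alias after the @ does not match the repository value, or the file does not exist on the referenced branch, the pipeline fails at compile time with the errors listed above.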
2. Variable Group Access¶
Error: Variable group 'X' is not authorized for use
Cause: Pipeline not authorized to access variable group
Solution:
- Go to Azure DevOps → Library → Variable Groups
- Click on the variable group
- Go to "Pipeline permissions"
- Authorize the pipeline
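For context, the authorization check is triggered by a group reference in the pipeline YAML. A minimal sketch, where the group name forge-shared is hypothetical:

```yaml
variables:
  - group: forge-shared   # hypothetical variable group name; must be authorized for this pipeline
```

Each pipeline that references the group needs its own authorization, so a new PR pipeline can fail even when the main pipeline already works.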
3. Agent Pool Issues¶
Error: "No agent pool found" or "No hosted parallelism"
Cause: Agent pool not available or parallelism quota exhausted
Solution:
# Option 1: Microsoft-hosted agent
pool:
  vmImage: 'ubuntu-latest'

# Option 2: specific agent pool (use one pool block, not both)
pool:
  name: 'SAIF-AgentPool'
4. .NET SDK Not Found¶
Error: SDK 'Microsoft.NET.Sdk' not found
Cause: .NET SDK version not installed on agent
Solution:
# Ensure UseDotNet task runs first
- task: UseDotNet@2
  inputs:
    version: '10.x'
    includePreviewVersions: true
5. Node.js Version Mismatch¶
Error: node: /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_X.XX' not found
Cause: Node.js version incompatible with agent OS
Solution:
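This error means the Node.js binary was built against a newer glibc than the agent's operating system provides. One way to address it is to run on a newer agent image, or to select a Node.js version the agent OS still supports. A minimal sketch, where the image and the Node.js version are assumptions to be matched to the project:

```yaml
# Sketch only: image and Node.js version are assumptions.
pool:
  vmImage: 'ubuntu-latest'   # a newer image ships a newer glibc

steps:
  - task: UseNode@1
    inputs:
      version: '22.x'
  - script: |
      node --version
      ldd --version | head -n 1   # first line reports the glibc version on the agent
    displayName: 'Show Node.js and glibc versions'
```

On a self-hosted agent whose OS cannot be upgraded, lowering the Node.js version is usually the only option short of running the job in a container.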
6. Docker Build Failures¶
Error: Cannot connect to Docker daemon
Cause: Docker service not running or insufficient permissions
Solution:
- Verify agent has Docker installed
- Check service account has Docker permissions
- Use hosted agent with Docker pre-installed
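The checks above can be run from the pipeline itself. A minimal diagnostic sketch, assuming a Linux agent:

```yaml
steps:
  - script: |
      id               # shows the agent user and its groups (look for "docker")
      docker version   # the Server section is missing if the daemon is unreachable
      docker info
    displayName: 'Diagnose Docker access'
```

If docker version prints only the Client section, the daemon is not running or the agent user cannot reach its socket.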
7. Terraform State Lock¶
Error: "Error locking state" or "state is locked by another process"
Cause: Previous pipeline run crashed without releasing lock
Solution:
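Terraform prints a lock ID in the error message, and terraform force-unlock releases the lock with that ID. A minimal sketch of a manually triggered pipeline that does this; the parameter name and the working directory are hypothetical, and the job needs the same backend credentials as the normal Terraform pipeline:

```yaml
# Sketch only: parameter name and working directory are hypothetical.
parameters:
  - name: lockId
    displayName: 'Lock ID from the error message'
    type: string

trigger: none   # run manually only

steps:
  - script: |
      terraform init -input=false
      terraform force-unlock -force "${{ parameters.lockId }}"
    workingDirectory: '$(System.DefaultWorkingDirectory)/infra'   # hypothetical path
    displayName: 'Release stale Terraform state lock'
```

Confirm that no other run is still applying changes before unlocking, since force-unlock removes the lock even if its owner is still active.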
8. Artifact Publishing Failures¶
Error: Failed to publish artifact
Cause: Artifact path doesn't exist or permissions issue
Solution:
# Publish the staging directory; an earlier step must copy build output into it
- task: PublishBuildArtifacts@1
  inputs:
    pathToPublish: '$(Build.ArtifactStagingDirectory)'
    artifactName: 'drop'
🔍 Diagnostic Approaches¶
1. Enable Verbose Logging¶
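Setting the system.debug variable to true makes the agent and tasks write debug-level lines to the logs. A minimal sketch using the list form of variables, which can sit alongside variable groups and templates:

```yaml
variables:
  - name: system.debug
    value: true
```

For a one-off run, the same effect is available without a commit by selecting "Enable system diagnostics" when queuing the pipeline manually.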
2. Check Pipeline Logs¶
Look for these key sections:
- Initialize job - Agent setup issues
- Checkout - Repository access issues
- Build tasks - Compilation errors
- Test tasks - Test failures
- Deploy tasks - Deployment errors
3. Run Locally¶
Reproduce the issue locally:
# Simulate pipeline environment
$env:BUILD_BUILDNUMBER = "1.0.0"
$env:BUILD_SOURCESDIRECTORY = (Get-Location).Path
# Run the same commands
dotnet build
dotnet test
4. Check Terraform State¶
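A minimal sketch of a read-only inspection step, assuming Terraform is installed on the agent, backend credentials are available to the job, and the step runs in the directory that holds the Terraform configuration:

```yaml
steps:
  - script: |
      terraform init -input=false
      terraform workspace show      # confirm the expected workspace is selected
      terraform state list          # resources currently tracked in state
      terraform plan -input=false   # drift between state, config, and real resources
    displayName: 'Inspect Terraform state'
```

None of these commands modify infrastructure, although plan takes the state lock while it runs.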
📊 Stage-Specific Issues¶
Build Stage¶
| Issue | Symptom | Resolution |
|---|---|---|
| Restore fails | NU1101: Unable to find package | Check nuget.config, verify feed access |
| Build fails | CSxxxx error | Fix code issue, check package versions |
| Test fails | XUnit test failed | Review test output, check test dependencies |
Deploy Stage¶
| Issue | Symptom | Resolution |
|---|---|---|
| Slot swap fails | Slot busy | Retry or check App Service status |
| Config update fails | KeyVault access denied | Check managed identity permissions |
| Health check fails | 503 Service Unavailable | Check app startup, review logs |
Auth Stage¶
| Issue | Symptom | Resolution |
|---|---|---|
| Okta API error | 401 Unauthorized | Rotate Okta API credentials |
| Entra ID error | Insufficient privileges | Check service principal permissions |
| Role assignment fails | Principal not found | Verify user/group exists |
🌍 Environment-Specific Issues¶
Development (DEV)¶
- More permissive, may have different variable values
- Uses non-prod Okta tenant
- Terraform workspaces are suffixed with -dev
UAT¶
- Mirrors production configuration
- May have restricted access
- Requires approval gates
Production (PROD)¶
- Strict approval requirements
- Uses production Okta tenant
- Blue-green deployment slots
- Extended health check timeouts
🔄 Recovery Patterns¶
Failed Deployment Rollback¶
# Pipeline includes rollback logic
- task: AzureFunctionApp@1
  inputs:
    deployToSlotOrASE: true
    slotName: 'staging'
# If health check fails, no slot swap occurs
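The gate itself can be sketched as a probe followed by a swap. Because each step runs only if the previous one succeeded, a failed probe leaves production untouched. The variables appName, resourceGroup, and serviceConnection are hypothetical, and the staging URL assumes the default App Service slot hostname pattern:

```yaml
# Sketch only: variable names and the slot hostname pattern are assumptions.
steps:
  # A non-2xx response makes curl exit non-zero, which fails the step and stops the job.
  - script: curl --fail --silent --show-error --max-time 30 "https://$(appName)-staging.azurewebsites.net/health"
    displayName: 'Check staging health'

  # Runs only when the probe succeeded (the default step condition).
  - task: AzureAppServiceManage@0
    displayName: 'Swap staging into production'
    inputs:
      azureSubscription: '$(serviceConnection)'
      Action: 'Swap Slots'
      WebAppName: '$(appName)'
      ResourceGroupName: '$(resourceGroup)'
      SourceSlot: 'staging'
      SwapWithProduction: true
```

After a bad release has already been swapped, running the same swap again moves the previous version back into production.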
Terraform State Recovery¶
# Import existing resource
terraform import azurerm_storage_account.main /subscriptions/.../storageAccounts/xxx
# Remove orphaned state
terraform state rm azurerm_storage_account.old
Retry Failed Stage¶
- Go to pipeline run
- Click on failed stage
- Click "Rerun failed jobs"
📝 Pipeline Variables Reference¶
Built-in Variables¶
| Variable | Description |
|---|---|
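| $(Build.BuildNumber) | Name of the pipeline run (build number) |
| $(Build.SourceBranch) | Full Git ref of the triggering branch, e.g. refs/heads/main |
| $(Build.Repository.Name) | Repository name |
| $(System.DefaultWorkingDirectory) | Local path on the agent where source files are downloaded |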
Forge Variables¶
| Variable | Description |
|---|---|
| $(ProjectId) | Forge project identifier |
| $(Environment) | Deployment environment |
| $(IsProduction) | Boolean for prod checks |
🛡️ Prevention Strategies¶
1. Pin Versions¶
# Pin .NET version
- task: UseDotNet@2
  inputs:
    version: '10.0.x'
# Pin Node version
- task: UseNode@1
  inputs:
    version: '22.x'
2. Use Lock Files¶
- packages.lock.json for NuGet
- package-lock.json for npm
- .terraform.lock.hcl for Terraform
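Lock files only help if the pipeline refuses to deviate from them. A minimal sketch of restore steps that fail instead of silently updating the lock file; working directories are omitted and would need to match the project layout:

```yaml
steps:
  - script: dotnet restore --locked-mode
    displayName: 'Restore NuGet packages from packages.lock.json'
  - script: npm ci
    displayName: 'Install npm packages from package-lock.json'
  - script: terraform init -input=false -lockfile=readonly
    displayName: 'Initialize Terraform without changing .terraform.lock.hcl'
```

Each command exits with an error when the lock file is missing or out of date, so dependency drift shows up in the PR pipeline and not at deploy time.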
3. Validate Before Deploy¶
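# Add validation stage
- stage: Validate
  jobs:
    - job: ValidateTerraform
      steps:
        # validate needs an initialized directory; -backend=false skips state access
        - script: |
            terraform init -backend=false
            terraform validate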
4. Health Checks¶
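# Probe the health endpoint after deployment (AzureWebApp@1 has no health check inputs)
- script: |
    curl --fail --silent --show-error --retry 10 --retry-delay 30 --retry-max-time 300 "https://$(appHostName)/health"
  displayName: 'Health check'   # appHostName is a hypothetical variable
  timeoutInMinutes: 6           # leaves headroom over the 300-second retry budget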
📚 Related Resources¶
- Aspire Publish - Generate pipeline YAML from your AppHost instead of maintaining it by hand
- Aspire Troubleshooting - Debug Aspire startup issues
- Azure DevOps Pipeline Documentation
- Azure DevOps Services Reference
- Forge Pipeline Templates