
Pipeline Troubleshooting

Common Azure DevOps pipeline issues and solutions for Forge projects.


📋 Overview

This guide covers common pipeline failures in Azure DevOps and how to diagnose and resolve them.


🏗️ Pipeline Architecture

Forge Pipeline Structure

.azdo/
├── azure-pipelines-api.yml        # Main API deployment pipeline
├── azure-pipelines-api-pr.yml     # API PR validation pipeline
├── azure-pipelines-web.yml        # Web app deployment pipeline
├── azure-pipelines-web-pr.yml     # Web PR validation pipeline
├── azure-pipelines-auth.yml       # Auth configuration pipeline
├── azure-pipelines-auth-pr.yml    # Auth PR validation pipeline
├── azure-pipelines-bootstrap.yml  # Infrastructure bootstrap
└── vars/
    ├── base.yml                   # Shared variables
    ├── api.yml                    # API-specific variables
    ├── web.yml                    # Web-specific variables
    └── auth.yml                   # Auth-specific variables

Pipeline Templates

Forge uses centralized templates from SAIF/pipeline-templates:

| Template | Purpose |
|----------|---------|
| azure-dotnet-api-v3.yml | .NET API build and deploy |
| azure-react-web-v3.yml | React web app build and deploy |
| azure-auth.yml | Auth configuration deployment |
| azure-terraform.yml | Infrastructure deployment |

❌ Common Pipeline Failures

1. Template Reference Errors

Error: "Template reference not found" or "Unable to find template"

Cause: The ref for the templates repository is missing or incorrect, so the pipeline looks for templates on the wrong branch

Solution:

# Correct template reference
resources:
  repositories:
    - repository: templates
      type: git
      name: SAIF/pipeline-templates
      ref: refs/heads/releases/v3  # Must specify version
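
Once the repository resource resolves, a pipeline consumes a template by appending the resource alias to the file name. The following is a minimal sketch, assuming the template sits at the root of SAIF/pipeline-templates and is a stage-level template. The parameter name projectId is hypothetical; check the template itself for its real parameters.

```yaml
resources:
  repositories:
    - repository: templates            # alias used after the @ below
      type: git
      name: SAIF/pipeline-templates
      ref: refs/heads/releases/v3

stages:
  # Syntax: <file path in the templates repo>@<repository alias>
  - template: azure-dotnet-api-v3.yml@templates
    parameters:
      projectId: 'my-project'          # hypothetical parameter, for illustration only
```

If the alias after the @ does not match the repository value under resources, the template lookup fails in the same way as a bad ref.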

2. Variable Group Access

Error: Variable group 'X' is not authorized for use

Cause: Pipeline not authorized to access variable group

Solution:

  1. Go to Azure DevOps → Pipelines → Library → Variable groups
  2. Click on the variable group
  3. Go to "Pipeline permissions"
  4. Authorize the pipeline
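
For reference, the authorization applies to the pipeline whose YAML references the group. A minimal sketch of such a reference follows, assuming the files under vars/ are variable templates. The group name forge-shared-dev is hypothetical.

```yaml
variables:
  - group: forge-shared-dev        # hypothetical variable group name
  - template: vars/base.yml        # path is relative to the pipeline file in .azdo/
  - template: vars/api.yml
```

Each pipeline that lists the group needs its own authorization, so a group that works for the API pipeline can still fail for the web pipeline.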

3. Agent Pool Issues

Error: "No agent pool found" or "No hosted parallelism"

Cause: Agent pool not available or parallelism quota exhausted

Solution:

# Option 1: Microsoft-hosted agent
pool:
  vmImage: 'ubuntu-latest'

# Option 2: self-hosted agent pool (use one pool block, not both)
pool:
  name: 'SAIF-AgentPool'

4. .NET SDK Not Found

Error: SDK 'Microsoft.NET.Sdk' not found

Cause: .NET SDK version not installed on agent

Solution:

# Ensure the UseDotNet task runs before any dotnet command
- task: UseDotNet@2
  inputs:
    version: '10.x'
    includePreviewVersions: true  # Allows preview SDKs to satisfy '10.x'

5. Node.js Version Mismatch

Error: node: /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_X.XX' not found

Cause: The Node.js binary was built against a newer glibc than the agent OS provides

Solution:

# Install a Node.js version that the agent image supports
- task: UseNode@1
  inputs:
    version: '22.x'

6. Docker Build Failures

Error: Cannot connect to Docker daemon

Cause: Docker service not running or insufficient permissions

Solution:

  1. Verify agent has Docker installed
  2. Check service account has Docker permissions
  3. Use hosted agent with Docker pre-installed
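
A quick way to confirm the first two points is a diagnostic step early in the job. This is a minimal sketch, not part of the Forge templates:

```yaml
steps:
  # docker info talks to the daemon, so it surfaces both a stopped
  # service and a permissions problem on the agent account
  - script: docker info
    displayName: 'Verify Docker daemon is reachable'
```

If this step reports the same "Cannot connect to Docker daemon" error, the problem lies with the agent rather than with the build itself.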

7. Terraform State Lock

Error: "Error locking state" or "state is locked by another process"

Cause: Previous pipeline run crashed without releasing lock

Solution:

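# Force unlock (only after confirming no other run is still applying)
terraform force-unlock <lock-id>

# Or retry and let Terraform wait for the lock to be released
terraform plan -lock-timeout=5m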

8. Artifact Publishing Failures

Error: Failed to publish artifact

Cause: Artifact path doesn't exist or permissions issue

Solution:

# Publish the staging directory; the build must have copied its output there first
- task: PublishBuildArtifacts@1
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)'
    ArtifactName: 'drop'
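
When the path is the problem, it usually means nothing was written to the staging directory. The sketch below shows one way to check, assuming a Linux agent and a .NET project that can be published from the repository root:

```yaml
steps:
  # Write the build output into the directory that gets published
  - script: dotnet publish --configuration Release --output "$(Build.ArtifactStagingDirectory)"
    displayName: 'Publish build output to the staging directory'

  # List the directory so an empty or missing path is visible in the log
  - script: ls -la "$(Build.ArtifactStagingDirectory)"
    displayName: 'List staging directory contents'

  - task: PublishBuildArtifacts@1
    inputs:
      PathtoPublish: '$(Build.ArtifactStagingDirectory)'
      ArtifactName: 'drop'
```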

🔍 Diagnostic Approaches

1. Enable Verbose Logging

variables:
  System.Debug: true

2. Check Pipeline Logs

Look for these key sections:

  • Initialize job - Agent setup issues
  • Checkout - Repository access issues
  • Build tasks - Compilation errors
  • Test tasks - Test failures
  • Deploy tasks - Deployment errors

3. Run Locally

Reproduce the issue locally:

# Simulate pipeline environment (PowerShell)
$env:BUILD_BUILDNUMBER = "1.0.0"
$env:BUILD_SOURCESDIRECTORY = (Get-Location).Path

# Run the same commands
dotnet build
dotnet test

4. Check Terraform State

# List workspaces
terraform workspace list

# Check state
terraform state list
terraform show

📊 Stage-Specific Issues

Build Stage

| Issue | Symptom | Resolution |
|-------|---------|------------|
| Restore fails | NU1101: Unable to find package | Check nuget.config, verify feed access |
| Build fails | CSxxxx error | Fix code issue, check package versions |
| Test fails | xUnit test failed | Review test output, check test dependencies |
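
For the restore failure in particular, feed access often comes down to missing credentials on the agent. A minimal sketch, assuming the packages come from an Azure Artifacts feed and that nuget.config sits at the repository root:

```yaml
steps:
  # Supplies credentials for Azure Artifacts feeds to later steps in the job
  - task: NuGetAuthenticate@1

  # Restore using the feeds declared in the repository's nuget.config
  - script: dotnet restore --configfile nuget.config
    displayName: 'Restore packages'
```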

Deploy Stage

| Issue | Symptom | Resolution |
|-------|---------|------------|
| Slot swap fails | Slot busy | Retry or check App Service status |
| Config update fails | KeyVault access denied | Check managed identity permissions |
| Health check fails | 503 Service Unavailable | Check app startup, review logs |

Auth Stage

| Issue | Symptom | Resolution |
|-------|---------|------------|
| Okta API error | 401 Unauthorized | Rotate Okta API credentials |
| Entra ID error | Insufficient privileges | Check service principal permissions |
| Role assignment fails | Principal not found | Verify user/group exists |

🌍 Environment-Specific Issues

Development (DEV)

  • More permissive, may have different variable values
  • Uses non-prod Okta tenant
  • Terraform workspaces suffixed with -dev

UAT

  • Mirrors production configuration
  • May have restricted access
  • Requires approval gates

Production (PROD)

  • Strict approval requirements
  • Uses production Okta tenant
  • Blue-green deployment slots
  • Extended health check timeouts
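
Approval gates like those listed for UAT and PROD are commonly attached to Azure DevOps environments. The following is a sketch under that assumption; the environment name forge-uat is hypothetical, and the approvals themselves are configured on the environment in the Azure DevOps UI, not in YAML.

```yaml
stages:
  - stage: DeployUat
    jobs:
      # A deployment job targets an environment; any approvals and checks
      # defined on that environment must pass before the job starts
      - deployment: DeployApi
        environment: 'forge-uat'       # hypothetical environment name
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploying to UAT"
```

A run that appears stuck before a UAT or PROD stage is often just waiting on one of these approvals.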

🔄 Recovery Patterns

Failed Deployment Rollback

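# Rollback by design: deploy to the staging slot first, leaving production untouched
- task: AzureFunctionApp@1
  inputs:
    # azureSubscription, appType, appName, and resourceGroupName are also required (omitted here)
    deployToSlotOrASE: true
    slotName: 'staging'
    # If the health check fails, no slot swap occurs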
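
The swap itself can be sketched as a separate step that runs only when every earlier step, including any health check, has succeeded. This is an illustration, not the Forge templates' actual logic; the service connection, app, and resource group names are placeholders.

```yaml
steps:
  - task: AzureAppServiceManage@0
    displayName: 'Swap staging into production'
    condition: succeeded()                       # skipped when an earlier step failed
    inputs:
      azureSubscription: 'my-service-connection' # placeholder service connection
      Action: 'Swap Slots'
      WebAppName: 'my-app'                       # placeholder app name
      ResourceGroupName: 'my-resource-group'     # placeholder resource group
      SourceSlot: 'staging'
      SwapWithProduction: true
```

Because production only changes at the swap, a failed deployment to staging needs no explicit rollback.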

Terraform State Recovery

# Import existing resource
terraform import azurerm_storage_account.main /subscriptions/.../storageAccounts/xxx

# Remove orphaned state
terraform state rm azurerm_storage_account.old

Retry Failed Stage

  1. Go to pipeline run
  2. Click on failed stage
  3. Click "Rerun failed jobs"

📝 Pipeline Variables Reference

Built-in Variables

| Variable | Description |
|----------|-------------|
| $(Build.BuildNumber) | Pipeline build number |
| $(Build.SourceBranch) | Git branch |
| $(Build.Repository.Name) | Repository name |
| $(System.DefaultWorkingDirectory) | Agent working directory |

Forge Variables

| Variable | Description |
|----------|-------------|
| $(ProjectId) | Forge project identifier |
| $(Environment) | Deployment environment |
| $(IsProduction) | Boolean for prod checks |
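
As an illustration of how these variables might be used, the sketch below prints the deployment target and gates one step on IsProduction. It assumes the variable holds the string 'true' in production; the step contents are placeholders.

```yaml
steps:
  - script: echo "Deploying $(ProjectId) to $(Environment)"
    displayName: 'Show deployment target'

  # Pipeline variables are strings, so compare against 'true' explicitly
  - script: echo "Running production-only checks"
    displayName: 'Production checks'
    condition: and(succeeded(), eq(variables['IsProduction'], 'true'))
```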

🛡️ Prevention Strategies

1. Pin Versions

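# Pin .NET to the 10.0 release line (the patch version still floats)
- task: UseDotNet@2
  inputs:
    version: '10.0.x'

# Pin Node.js to major version 22 (minor and patch versions still float)
- task: UseNode@1
  inputs:
    version: '22.x'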
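
For an exact SDK pin shared between local builds and the pipeline, UseDotNet@2 can read the version from global.json instead. This is a minimal sketch, assuming a global.json file exists in the repository:

```yaml
steps:
  - task: UseDotNet@2
    displayName: 'Install the SDK pinned in global.json'
    inputs:
      packageType: 'sdk'
      useGlobalJson: true    # version comes from global.json, not from this file
```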

2. Use Lock Files

  • packages.lock.json for NuGet
  • package-lock.json for npm
  • .terraform.lock.hcl for Terraform
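
Lock files only help if the pipeline refuses to drift from them. The sketch below shows the strict variant of each restore command. It assumes each step runs in the directory that holds the corresponding lock file, which in practice means setting workingDirectory per step.

```yaml
steps:
  # Fails if packages.lock.json does not match the resolved packages
  - script: dotnet restore --locked-mode
    displayName: 'NuGet restore (locked mode)'

  # Installs exactly what package-lock.json specifies
  - script: npm ci
    displayName: 'npm clean install'

  # Fails instead of rewriting .terraform.lock.hcl
  - script: terraform init -lockfile=readonly
    displayName: 'Terraform init (read-only lock file)'
```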

3. Validate Before Deploy

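# Add validation stage
- stage: Validate
  jobs:
    - job: ValidateTerraform
      steps:
        # validate needs an initialized directory; -backend=false skips remote state
        - script: |
            terraform init -backend=false
            terraform validate
          displayName: 'Terraform validate'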

4. Health Checks

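# Poll the health endpoint after deployment (about 300 seconds in total).
# The built-in AzureWebApp@1 task has no health-check inputs, so this runs as its own step.
# $(AppUrl) is a placeholder variable, not a Forge-defined name.
- script: |
    curl --fail --silent --show-error --retry 10 --retry-delay 30 --retry-connrefused "$(AppUrl)/health"
  displayName: 'Health check'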