Principles¶

Decision-making frameworks for building the Forge platform.

📋 Overview¶

These principles guide platform engineering decisions. Every feature, template, and tool we build is evaluated against these principles to ensure we're creating a platform that enables developers to build secure, scalable, and maintainable applications efficiently.

For Platform Engineers: Use these principles as decision-making criteria when:

Designing new platform features
Reviewing pull requests to platform repositories
Evaluating third-party tools and libraries
Making architectural trade-offs

🔗 See Also: Forge History for how these principles evolved.

🔒 Principle of Zero-Trust Architecture¶

Platform features must be secure without requiring additional configuration.

Security should not be an afterthought or an optional add-on. By making applications secure by default, Forge reduces the risk of vulnerabilities introduced by misconfiguration or oversight.

🎯 Guidelines¶

Default to locked down; teams should explicitly grant access

New resources should have no external access by default
Authentication and authorization should be required, not optional
Teams must explicitly configure access when needed
Zero permissions by default, grant least privilege incrementally
Application permissions are scoped to specific resources (e.g., app reads only its secret, not all secrets)
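
The default-deny and least-privilege guidelines above can be sketched in a few lines. This is a minimal illustration in Python, not Forge's implementation: the `AccessPolicy` class, the principal name `orders-api`, and the secret paths are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical default-deny policy: nothing is allowed until granted."""

    # Each grant is an exact (principal, action, resource) triple.
    _grants: set[tuple[str, str, str]] = field(default_factory=set)

    def grant(self, principal: str, action: str, resource: str) -> None:
        """Explicitly allow one action on one specific resource."""
        if "*" in resource:
            # Least privilege: refuse "all secrets"-style wildcard grants.
            raise ValueError("wildcard grants are not allowed; name the resource")
        self._grants.add((principal, action, resource))

    def is_allowed(self, principal: str, action: str, resource: str) -> bool:
        """Return True only when an exact matching grant exists."""
        return (principal, action, resource) in self._grants


policy = AccessPolicy()
policy.grant("orders-api", "read", "secrets/orders-api-db-password")

# The app can read its own secret, and nothing else.
print(policy.is_allowed("orders-api", "read", "secrets/orders-api-db-password"))
print(policy.is_allowed("orders-api", "read", "secrets/billing-api-key"))
```

The point of the sketch is that the empty policy denies everything, so forgetting to configure access fails closed rather than open.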

Enforce security at infrastructure/platform layers, not in app templates

Applications shouldn't be responsible for network security, encryption, or access control
Infrastructure modules should enforce security policies transparently
API Management handles authentication and authorization policies
Separation of concerns: apps implement business logic, platform handles security

Automate credential management; developers shouldn't handle secrets

Managed identities eliminate the need for service principals and secrets
Secrets should be injected at runtime, never committed to code
Automatic rotation of credentials reduces risk
Developers should never see or manage production credentials
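
From the application's side, runtime injection might look like the sketch below. It assumes the platform exposes secrets as environment variables, which is only one possible mechanism; the helper `read_injected_secret` and the variable name `ORDERS_DB_PASSWORD` are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when an expected secret was not injected by the platform."""


def read_injected_secret(name, env=os.environ):
    """Return a secret injected at runtime, failing fast if it is absent.

    The value never appears in source control: the code only knows the
    name of the secret, not its content.
    """
    value = env.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} was not injected at runtime")
    return value


# Stand-in for the environment the platform would provide (assumption).
fake_env = {"ORDERS_DB_PASSWORD": "injected-by-platform"}
password = read_injected_secret("ORDERS_DB_PASSWORD", env=fake_env)
print("secret loaded:", bool(password))
```

Failing fast on a missing secret keeps developers from working around the gap with a hard-coded fallback value.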

If a template can be deployed insecurely, redesign it

Templates should make insecure configurations difficult or impossible
Security should be the path of least resistance
If security requires extra steps, developers will skip them
Make secure-by-default easier than insecure configurations

📊 Platform Capabilities¶

No Default API Authorization: APIs ship with no out-of-the-box access; teams must explicitly grant it
API Management: All access flows through Azure API Management with enforced policies
Managed Service Principals: Automatic rotation of keys via Terraform and managed identities
Token-Based Context: User context passed across APIs using Entra ID and Okta tokens
Network Isolation: Private endpoints and VNet integration by default

🔗 Learn More: Security Overview for implementation details.

📘 Principle of Proven Patterns¶

Choose secure, tested defaults over flexibility when designing platform capabilities.

Flexibility sounds valuable but often results in inconsistency, security vulnerabilities, and maintenance overhead. By being opinionated about how applications should be built, Forge reduces cognitive load on developers and ensures security and compliance by default.

🎯 Guidelines¶

When adding features, prefer opinionated defaults that enforce security/compliance

Security should be the default path, not an opt-in configuration
Compliance requirements should be baked into templates, not external checklists
If there's a "right way" to do something, make that the only way
Less choice means fewer decisions developers need to make

Provide escape hatches only when there's a documented business need

Start opinionated, add flexibility later only when teams hit real limitations
Require justification for deviating from standard patterns
Escape hatches should be clearly documented with warnings about trade-offs
Most teams should never need to use escape hatches
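
One way to picture an escape hatch that requires justification is a template setting that only validates when the deviation is explained. The sketch below is illustrative Python under that assumption; `StorageConfig`, its fields, and `validate` are hypothetical names, not part of any Forge template.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageConfig:
    """Hypothetical template settings with an opinionated, secure default."""

    name: str
    public_network_access: bool = False  # the standard pattern
    deviation_justification: Optional[str] = None  # required for the escape hatch


def validate(config: StorageConfig) -> list[str]:
    """Return validation errors; an empty list means the config is accepted."""
    errors = []
    justification = (config.deviation_justification or "").strip()
    if config.public_network_access and not justification:
        errors.append(
            f"{config.name}: public_network_access deviates from the standard "
            "pattern and needs a documented business justification"
        )
    return errors


print(validate(StorageConfig(name="reports")))
print(validate(StorageConfig(name="reports", public_network_access=True)))
```

Teams on the standard path never touch the extra field, which matches the expectation that most of them never need the escape hatch.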

Document the "why" behind every opinionated choice in our templates

Developers are more likely to follow patterns when they understand the reasoning
Include comments explaining security decisions, compliance requirements, and architectural choices
Link to documentation that provides deeper context
Make it easy for developers to learn while using templates

Constraint is a feature, not a limitation

Constraints reduce decision fatigue and enable faster development
Standardization makes troubleshooting easier across teams
Opinionated platforms create common vocabulary and shared understanding
Flexibility has a cost: maintenance, security review, support burden

📊 Platform Capabilities¶

Terraform Modules: Pre-configured infrastructure modules vetted by SAIF and Microsoft
Security Best Practices: Modules enforce least-privilege access, managed identities, and encryption
Baked-in Best Practices: Compliance, monitoring, and security integrated by default in all templates
Standardized Patterns: Consistent API structure, authentication flows, and deployment models

🛠️ Principle of Leverage Over Invention¶

Build on existing solutions rather than creating custom platform abstractions.

Custom code is a liability. Every line of custom framework code we write must be maintained, documented, tested, and supported. Managed services and vendor-provided frameworks shift this burden to companies with dedicated teams and deeper expertise.

🎯 Guidelines¶

Before building custom tooling, exhaust vendor and OSS options

Research existing solutions before writing code
Evaluate Azure-managed services, .NET platform features, and established OSS libraries
Prefer leveraging managed services over building custom solutions
Custom code should be a last resort, not a first instinct

Prefer integrating managed services over building platform services

Azure PaaS offerings handle scaling, patching, monitoring, and security
Managed services reduce on-call burden and maintenance overhead
Teams can focus on business logic instead of infrastructure management
Managed services often have better SLAs than custom solutions

Contribute upstream to vendor ecosystems instead of wrapping them

If a vendor tool is missing a feature, contribute to the upstream project
Avoid creating thin wrappers that add maintenance without value
Wrappers become outdated as upstream tools evolve
Engage with vendor communities to influence roadmaps

📊 Platform Capabilities¶

Vendor Frameworks: Leverage Aspire, Entity Framework, Kiota, and other Microsoft-supported tools
Managed Services: Prioritize Azure PaaS (App Service, Cosmos DB, API Management) over custom solutions
Service Discovery & Telemetry: Use Aspire for service orchestration and OpenTelemetry for observability
API Security: Azure API Management handles authentication, authorization, and policy enforcement

⚡ Principle of Speed-to-Value¶

When designing platform features, prioritize reducing time-to-first-deployment.

Traditional software platforms often require extensive setup, configuration, and coordination before developers can deploy their first application. This delays value delivery and frustrates teams. Forge inverts this model: same-day deployments are possible because the platform automates everything required to go from idea to production.

🎯 Guidelines¶

Every new template or tool should enable deployment within 24 hours

New project templates must include all infrastructure, pipelines, and configuration needed for immediate deployment
Don't gate deployment on manual approval processes unless legally or audit-required (e.g., production approvals)
Pre-configure sensible defaults that work in production environments
Templates should create Azure DevOps repositories, pipelines, and infrastructure definitions automatically

Default to automation; manual steps are technical debt

If a process requires human intervention, automate it or document why it can't be automated
Manual steps should be exceptional, not standard
Build automation into templates, not external runbooks
When automation isn't possible, create self-service tools

If a feature adds deployment friction, it needs strong justification

New dependencies, approval gates, or configuration requirements slow teams down
Weigh the cost of friction against the benefit provided
Can the same outcome be achieved with less friction?
If friction is unavoidable, ensure clear documentation and error messages guide developers

Measure platform success using DORA and SPACE metrics

DORA metrics measure delivery performance: Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Time to Restore Service
The SPACE framework measures developer productivity: Satisfaction and well-being, Performance, Activity, Communication and collaboration, and Efficiency and flow
Track time-to-first-deployment as a leading indicator of platform effectiveness
Use both frameworks together—DORA for delivery, SPACE for developer experience
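
As a rough illustration of how the four DORA metrics fall out of deployment records, here is a small Python sketch. The `Deployment` record and `dora_summary` function are hypothetical, the sample data is invented for the example, and time to restore is approximated as the gap between a failing deployment and its recovery.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of one production deployment."""

    committed_at: datetime
    deployed_at: datetime
    caused_failure: bool = False
    restored_at: Optional[datetime] = None


def hours(delta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deployments: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a reporting period."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    failures = [d for d in deployments if d.caused_failure]
    restores = [hours(d.restored_at - d.deployed_at) for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "median_lead_time_hours": median(
            [hours(d.deployed_at - d.committed_at) for d in deployments]
        ),
        "change_failure_rate": len(failures) / len(deployments),
        # None when no failure was restored within the period.
        "median_time_to_restore_hours": median(restores) if restores else None,
    }


# Invented sample data: four deployments in one week, one of which failed.
sample = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 13)),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 11),
               caused_failure=True, restored_at=datetime(2024, 1, 2, 12)),
    Deployment(datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 15)),
    Deployment(datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 17)),
]
summary = dora_summary(sample, period_days=7)
print(summary)
```

SPACE dimensions such as satisfaction come from surveys rather than pipeline data, so they are not covered by this sketch.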

📊 Platform Capabilities¶

CLI Tools: The saif new command creates fully configured projects in minutes
Project Templates: Experience APIs, Process APIs, System APIs, event services, and composable features (databases, frontends, event subscriptions)
Automated Repos & Pipelines: Code-to-production workflow using Azure DevOps with zero manual configuration
Infrastructure as Code: Terraform modules that provision environments consistently and repeatably
Pipeline Templates: Standardized CI/CD pipelines with security scanning, testing, and deployment built in

🔧 Principle of Team Autonomy¶

Design platform features that eliminate dependencies on central teams.

Waiting on central teams (platform, security, infrastructure) creates bottlenecks that slow down development. Self-service capabilities and automation reduce coordination overhead and enable teams to move independently.

🎯 Guidelines¶

Every operational capability should be self-service or automated

Teams should be able to provision infrastructure, deploy applications, and manage configuration without tickets
Self-service doesn't mean uncontrolled: use guardrails and policy enforcement
Automation should be triggered by code commits, not manual processes
Central team intervention should be exceptional, not routine

If teams are filing tickets for something, it's a platform gap

Repeated tickets for the same operation indicate missing self-service capability
Track common requests and prioritize automation
Tickets should be for exceptions and edge cases, not standard workflows
Every ticket is an opportunity to improve the platform
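
Tracking common requests can start very simply. The sketch below counts tickets by category and surfaces the ones that recur; the ticket shape, the category names, and the threshold of three are assumptions made for illustration.

```python
from collections import Counter


def recurring_requests(tickets, threshold=3):
    """Return (category, count) pairs that recur at least `threshold` times.

    Categories that repeat are candidates for self-service automation;
    one-off tickets are treated as genuine exceptions.
    """
    counts = Counter(ticket["category"] for ticket in tickets)
    return [(category, n) for category, n in counts.most_common() if n >= threshold]


# Invented sample tickets.
tickets = [
    {"id": 1, "category": "grant-api-access"},
    {"id": 2, "category": "new-environment"},
    {"id": 3, "category": "grant-api-access"},
    {"id": 4, "category": "grant-api-access"},
    {"id": 5, "category": "dns-change"},
    {"id": 6, "category": "grant-api-access"},
]
gaps = recurring_requests(tickets)
print(gaps)
```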

Build observability into everything; teams should self-diagnose

Applications should expose metrics, logs, and traces by default
Teams need visibility to troubleshoot without escalation
Clear error messages reduce support burden
Dashboards and alerts should be provisioned automatically

Prefer guardrails over approval gates

Prevent bad outcomes with automated policy enforcement instead of manual approvals
Approvals slow teams down and don't scale
Use infrastructure-as-code validation, security scanning, and policy-as-code
Let teams move fast within secure boundaries
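
A guardrail in the policy-as-code sense is a check that runs automatically and blocks only the changes that violate a rule. The following Python sketch shows the shape of such a check; the resource dictionaries are a simplified, hypothetical format (not real Terraform plan output), and the two rules are examples rather than Forge's actual policies.

```python
def deny_public_access(resource):
    """Rule: planned resources must not expose public network access."""
    if resource.get("public_access", False):
        return f"{resource['name']}: public network access is not allowed"
    return None


def require_encryption(resource):
    """Rule: planned resources must have encryption at rest enabled."""
    if not resource.get("encrypted", False):
        return f"{resource['name']}: encryption at rest must be enabled"
    return None


# Hypothetical rule set; a real platform would likely use a policy engine.
GUARDRAILS = [deny_public_access, require_encryption]


def evaluate_plan(resources):
    """Run every guardrail against every planned resource."""
    violations = []
    for resource in resources:
        for rule in GUARDRAILS:
            message = rule(resource)
            if message is not None:
                violations.append(message)
    return violations


plan = [
    {"name": "orders-db", "public_access": False, "encrypted": True},
    {"name": "reports-storage", "public_access": True, "encrypted": True},
]
violations = evaluate_plan(plan)
for violation in violations:
    print(violation)
```

A pipeline step could fail the run whenever the returned list is non-empty, so compliant changes proceed without waiting on a reviewer.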

📊 Platform Capabilities¶

Scaling Control: Teams manage manual and automatic scaling via Terraform variables
Full Visibility: Logs, tracing, metrics, and telemetry via Dynatrace (deployed) and Aspire (local)
API Authorization: Self-service pipeline workflows with approval gates only where required
Infrastructure Provisioning: Teams deploy infrastructure changes through GitOps workflows

🖥️ Principle of Local-First Orchestration¶

Development environments must mirror production architecture through local orchestration.

Tooling should abstract infrastructure dependencies by swapping cloud-managed resources for local equivalents, so that the full application stack can run and be observed offline while preserving parity with the production topology.

🎯 Guidelines¶

Templates should include local development configuration by default

Every project template must include a local development setup
Local development should work out of the box after cloning the repository
No manual configuration, credentials, or external dependencies required
Use Docker Compose, Aspire, or similar tools for local orchestration

Provide mocks/emulators for all platform-provided services

Azure Cosmos DB → Use the Cosmos DB emulator
Azure Blob Storage → Use Azurite
Downstream APIs → Use WireMock CLI for service virtualization
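
The mapping above amounts to resolving each service to a local stand-in when running on a developer machine. The Python sketch below illustrates that idea; the function `resolve_endpoint` and the service keys are hypothetical, and the local addresses reflect the emulators' usual defaults as an assumption that should be checked against the actual local setup.

```python
# Assumed local defaults: Cosmos DB emulator, Azurite blob endpoint, and a
# locally running WireMock instance. Verify these against your own setup.
LOCAL_ENDPOINTS = {
    "cosmos": "https://localhost:8081",
    "blob": "http://127.0.0.1:10000/devstoreaccount1",
    "downstream-api": "http://localhost:8080",
}


def resolve_endpoint(service, environment, cloud_endpoints):
    """Pick the emulator or mock locally, and the managed service elsewhere."""
    if environment == "local":
        if service not in LOCAL_ENDPOINTS:
            # A service without a local equivalent breaks local development.
            raise LookupError(f"no local emulator or mock configured for {service!r}")
        return LOCAL_ENDPOINTS[service]
    return cloud_endpoints[service]


# Hypothetical cloud endpoint for comparison.
cloud = {"blob": "https://example.blob.core.windows.net"}
print(resolve_endpoint("blob", "local", cloud))
print(resolve_endpoint("blob", "production", cloud))
```

Raising an error for a service with no local equivalent makes the gap visible early, in line with treating broken local development as a breaking change.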

Local development should work offline (no VPN, no cloud auth)

Developers shouldn't need VPN access to run applications locally
No dependency on cloud-hosted services for local development
Emulators and mocks should provide sufficient fidelity for development
Integration tests use the local development setup; cloud environments are for deployment verification and production

If a feature breaks local dev, it's a breaking change

Local development experience is a first-class requirement
Features that only work in cloud environments need strong justification
Test locally before deploying to cloud
Local-first development reduces cloud costs and increases developer productivity

📊 Platform Capabilities¶

Mock Services: WireMock CLI for mocking downstream APIs and external dependencies
Orchestration: Aspire manages service dependencies and orchestration locally
Minimal Dependencies: Run applications with emulators and mocks instead of live cloud services

🔗 Examples: See Foundry for experimental prototypes exploring local development patterns.

🔄 Principle of Platform as a Product¶

Manage the platform with a roadmap, defined lifecycle, and rigorous deprecation policies.

A platform that only adds features eventually collapses under its own weight. We treat Forge as a product with customers (developers), not just a collection of scripts. This means being aggressive about removing operational debt and retiring patterns that no longer serve the organization while continuously evolving based on evidence and feedback.

🎯 Guidelines¶

Deprecation is a feature, not a failure

Support contracts must be finite; maintaining multiple versions of templates slows innovation
Every new feature should have a plan for how it will eventually be retired or superseded
Provide comprehensive migration guides with before/after examples when introducing breaking changes
Version-based lifecycle management (e.g., Forge 2.x → 3.0) with clear compatibility matrices
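
Under semantic versioning, a breaking change is signalled by a major version bump, which is what makes a step like Forge 2.x → 3.0 a migration rather than a routine upgrade. The sketch below shows that check in Python; `parse_version` and `is_breaking_upgrade` are hypothetical helpers, and the parser is deliberately lenient in accepting short forms such as "3.0".

```python
def parse_version(text):
    """Parse 'MAJOR[.MINOR[.PATCH]]' into a (major, minor, patch) tuple."""
    parts = text.strip().split(".")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported version format: {text!r}")
    numbers = [int(part) for part in parts]  # raises ValueError on non-digits
    while len(numbers) < 3:
        numbers.append(0)  # treat "3.0" as 3.0.0
    return tuple(numbers)


def is_breaking_upgrade(current, target):
    """Return True when moving to `target` crosses a major version boundary."""
    return parse_version(target)[0] > parse_version(current)[0]


print(is_breaking_upgrade("2.4.1", "2.5.0"))
print(is_breaking_upgrade("2.5.0", "3.0"))
```

A template tool could use a check like this to decide when to point teams at a migration guide before upgrading.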

The "Golden Path" evolves; it's not permanent

When industry standards shift (e.g., .NET 8 to .NET 10), the platform must lead the migration
Don't preserve legacy patterns to avoid friction; friction is acceptable when moving to better security or maintainability
Document why a pattern changed, not just what changed
Use the roadmap to communicate upcoming changes and deprecations transparently

Make decisions based on evidence, not assumptions

Use SPACE metrics to validate that platform changes improve developer productivity and satisfaction
Gather feedback from teams before and after major changes
Track support requests as signals for platform gaps
Prototype and experiment when data is unavailable, then measure outcomes

Iterate toward the ideal, don't wait for perfection

Ship improvements incrementally rather than waiting for complete solutions
Each iteration should deliver measurable value
Define clear "done" criteria to prevent endless refinement
Balance progress with quality—don't sacrifice stability for speed

📊 Platform Capabilities¶

Developer Productivity Measurement: SPACE framework surveys to capture developer experience and satisfaction
Feedback Channels: Teams channel, backlog, and regular surveys
Migration Guides: Documented upgrade paths with before/after examples and version compatibility matrices
Versioning Strategy: Semantic versioning with clear breaking change indicators (Forge 2.x → 3.0)
Public Roadmap: Now-Next-Later format with principle-based objectives visible in the repository

🔑 Key Concepts¶

Principles vs. Capabilities¶

Principles are the how: they guide the way we make platform engineering decisions
Platform capabilities are the what: specific implementations in templates, CLI tools, and modules

Using These Principles¶

In Design Reviews:

Does this feature reduce time-to-deployment or add friction?
Are we building custom code when a managed service exists?
Does this require manual intervention or enable self-service?
Is this secure by default or does it require configuration?
Can developers use this feature locally without cloud access?
How will we measure the success of this feature?

In PR Reviews:

Does this change align with our principles?
Are we adding technical debt (custom code) or reducing it (using vendor solutions)?
Does this improve developer experience or add complexity?
Are security best practices enforced at the platform layer?

In Roadmap Planning:

Which principle does this initiative support?
Does this enable teams to move faster independently?
Are we reducing manual processes and approval gates?
Is this solving a real platform gap or adding unnecessary flexibility?
What metrics will indicate success, and how will we track them?

📚 Related Documentation¶

Forge History: Evolution of the platform and lessons learned
Roadmap: Platform initiatives with principle-based objectives
Security Guides: Implementation of security principles
Development Guides: Applying principles in practice
