Principles¶
Decision-making frameworks for building the Forge platform.
π Overview¶
These principles guide platform engineering decisions. Every feature, template, and tool we build is evaluated against these principles to ensure we're creating a platform that enables developers to build secure, scalable, and maintainable applications efficiently.
For Platform Engineers: Use these principles as decision-making criteria when:
- Designing new platform features
- Reviewing pull requests to platform repositories
- Evaluating third-party tools and libraries
- Making architectural trade-offs
π See Also: Forge History for how these principles evolved.
π Principle of Zero-Trust Architecture¶
Platform features must be secure without requiring additional configuration.
Security should not be an afterthought or an optional add-on. By making applications secure by default, Forge reduces the risk of vulnerabilities introduced by misconfiguration or oversight.
π― Guidelines¶
Default to locked down; teams should explicitly grant access
- New resources should have no external access by default
- Authentication and authorization should be required, not optional
- Teams must explicitly configure access when needed
- Zero permissions by default, grant least privilege incrementally
- Application permissions are scoped to specific resources (e.g., app reads only its secret, not all secrets)
Enforce security at infrastructure/platform layers, not in app templates
- Applications shouldn't be responsible for network security, encryption, or access control
- Infrastructure modules should enforce security policies transparently
- API Management handles authentication and authorization policies
- Separation of concerns: apps implement business logic, platform handles security
Automate credential management; developers shouldn't handle secrets
- Managed identities eliminate the need for service principals and secrets
- Secrets should be injected at runtime, never committed to code
- Automatic rotation of credentials reduces risk
- Developers should never see or manage production credentials
If a template can be deployed insecurely, redesign it
- Templates should make insecure configurations difficult or impossible
- Security should be the path of least resistance
- If security requires extra steps, developers will skip them
- Make secure-by-default easier than insecure configurations
π Platform Capabilities¶
- No Default API Authorization: Secure APIs with no out-of-box access; teams must explicitly grant access
- API Management: All access flows through Azure API Management with enforced policies
- Managed Service Principals: Automatic rotation of keys via Terraform and managed identities
- Token-Based Context: User context passed across APIs using Entra ID and Okta tokens
- Network Isolation: Private endpoints and VNet integration by default
π Learn More: Security Overview for implementation details.
π Principle of Proven Patterns¶
Choose secure, tested defaults over flexibility when designing platform capabilities.
Flexibility sounds valuable but often results in inconsistency, security vulnerabilities, and maintenance overhead. By being opinionated about how applications should be built, Forge reduces cognitive load on developers and ensures security and compliance by default.
π― Guidelines¶
When adding features, prefer opinionated defaults that enforce security/compliance
- Security should be the default path, not an opt-in configuration
- Compliance requirements should be baked into templates, not external checklists
- If there's a "right way" to do something, make that the only way
- Less choice means fewer decisions developers need to make
Provide escape hatches only when there's a documented business need
- Start opinionated, add flexibility later only when teams hit real limitations
- Require justification for deviating from standard patterns
- Escape hatches should be clearly documented with warnings about trade-offs
- Most teams should never need to use escape hatches
Document the "why" behind every opinionated choice in our templates
- Developers are more likely to follow patterns when they understand the reasoning
- Include comments explaining security decisions, compliance requirements, and architectural choices
- Link to documentation that provides deeper context
- Make it easy for developers to learn while using templates
Constraint is a feature, not a limitation
- Constraints reduce decision fatigue and enable faster development
- Standardization makes troubleshooting easier across teams
- Opinionated platforms create common vocabulary and shared understanding
- Flexibility has a cost: maintenance, security review, support burden
π Platform Capabilities¶
- Terraform Modules: Pre-configured infrastructure modules vetted by SAIF and Microsoft
- Security Best Practices: Modules enforce least-privilege access, managed identities, and encryption
- Baked-in Best Practices: Compliance, monitoring, and security integrated by default in all templates
- Standardized Patterns: Consistent API structure, authentication flows, and deployment models
π οΈ Principle of Leverage Over Invention¶
Build on existing solutions rather than creating custom platform abstractions.
Custom code is a liability. Every line of custom framework code we write must be maintained, documented, tested, and supported. Managed services and vendor-provided frameworks shift this burden to companies with dedicated teams and deeper expertise.
π― Guidelines¶
Before building custom tooling, exhaust vendor and OSS options
- Research existing solutions before writing code
- Evaluate Azure-managed services, .NET platform features, and established OSS libraries
- Prefer leveraging managed services over building custom solutions
- Custom code should be a last resort, not a first instinct
Prefer integrating managed services over building platform services
- Azure PaaS offerings handle scaling, patching, monitoring, and security
- Managed services reduce on-call burden and maintenance overhead
- Teams can focus on business logic instead of infrastructure management
- Managed services often have better SLAs than custom solutions
Contribute upstream to vendor ecosystems instead of wrapping them
- If a vendor tool is missing a feature, contribute to the upstream project
- Avoid creating thin wrappers that add maintenance without value
- Wrappers become outdated as upstream tools evolve
- Engage with vendor communities to influence roadmaps
π Platform Capabilities¶
- Vendor Frameworks: Leverage Aspire, Entity Framework, Kiota, and other Microsoft-supported tools
- Managed Services: Prioritize Azure PaaS (App Services, Cosmos DB, API Management) over custom solutions
- Service Discovery & Telemetry: Use Aspire for service orchestration and OpenTelemetry for observability
- API Security: Azure API Management handles authentication, authorization, and policy enforcement
β‘ Principle of Speed-to-Value¶
When designing platform features, prioritize reducing time-to-first-deployment.
Traditional software platforms often require extensive setup, configuration, and coordination before developers can deploy their first application. This delays value delivery and frustrates teams. Forge inverts this model: same-day deployments are possible because the platform automates everything required to go from idea to production.
π― Guidelines¶
Every new template or tool should enable deployment within 24 hours
- New project templates must include all infrastructure, pipelines, and configuration needed for immediate deployment
- Don't gate deployment on manual approval processes unless legally or audit-required (e.g., production approvals)
- Pre-configure sensible defaults that work in production environments
- Templates should create Azure DevOps repositories, pipelines, and infrastructure definitions automatically
Default to automation; manual steps are technical debt
- If a process requires human intervention, automate it or document why it can't be automated
- Manual steps should be exceptional, not standard
- Build automation into templates, not external runbooks
- When automation isn't possible, create self-service tools
If a feature adds deployment friction, it needs strong justification
- New dependencies, approval gates, or configuration requirements slow teams down
- Weigh the cost of friction against the benefit provided
- Can the same outcome be achieved with less friction?
- If friction is unavoidable, ensure clear documentation and error messages guide developers
Measure platform success using DORA and SPACE metrics
- DORA Metrics measure delivery performance:
- Deployment Frequency, Lead Time for Changes, Change Failure Rate, Time to Restore Service
- SPACE Framework measures developer productivity:
- Satisfaction and well-being, Performance, Activity, Communication and collaboration, Efficiency and flow
- Track time-to-first-deployment as a leading indicator of platform effectiveness
- Use both frameworks togetherβDORA for delivery, SPACE for developer experience
π Platform Capabilities¶
- CLI Tools:
saif newcommand creates fully-configured projects in minutes - Project Templates: Experience APIs, Process APIs, System APIs, event services, and composable features (databases, frontends, event subscriptions)
- Automated Repos & Pipelines: Code-to-production workflow using Azure DevOps with zero manual configuration
- Infrastructure as Code: Terraform modules that provision environments consistently and repeatably
- Pipeline Templates: Standardized CI/CD pipelines with security scanning, testing, and deployment built-in
π§ Principle of Team Autonomy¶
Design platform features that eliminate dependencies on central teams.
Waiting on central teams (platform, security, infrastructure) creates bottlenecks that slow down development. Self-service capabilities and automation reduce coordination overhead and enable teams to move independently.
π― Guidelines¶
Every operational capability should be self-service or automated
- Teams should be able to provision infrastructure, deploy applications, and manage configuration without tickets
- Self-service doesn't mean uncontrolled: use guardrails and policy enforcement
- Automation should be triggered by code commits, not manual processes
- Central team intervention should be exceptional, not routine
If teams are filing tickets for something, it's a platform gap
- Repeated tickets for the same operation indicate missing self-service capability
- Track common requests and prioritize automation
- Tickets should be for exceptions and edge cases, not standard workflows
- Every ticket is an opportunity to improve the platform
Build observability into everything; teams should self-diagnose
- Applications should expose metrics, logs, and traces by default
- Teams need visibility to troubleshoot without escalation
- Clear error messages reduce support burden
- Dashboards and alerts should be provisioned automatically
Guardrails > approval gates
- Prevent bad outcomes with automated policy enforcement instead of manual approvals
- Approvals slow teams down and don't scale
- Use infrastructure-as-code validation, security scanning, and policy-as-code
- Let teams move fast within secure boundaries
π Platform Capabilities¶
- Scaling Control: Teams manage manual and automatic scaling via Terraform variables
- Full Visibility: Logs, tracing, metrics, and telemetry via Dynatrace (deployed) and Aspire (local)
- API Authorization: Self-service pipeline workflows with approval gates only where required
- Infrastructure Provisioning: Teams deploy infrastructure changes through GitOps workflows
π₯οΈ Principle of Local-First Orchestration¶
Development environments must mirror production architecture through local orchestration.
Tooling should abstract infrastructure dependenciesβswapping cloud-managed resources for local equivalentsβto ensure the full application stack is runnable and observable offline, maintaining strict parity with production topology.
π― Guidelines¶
Templates should include local development configuration by default
- Every project template must include a local development setup
- Local development should work out-of-the-box after cloning the repository
- No manual configuration, credentials, or external dependencies required
- Use Docker Compose, Aspire, or similar tools for local orchestration
Provide mocks/emulators for all platform-provided services
- Azure Cosmos DB β Use the Cosmos DB emulator
- Azure Blob Storage β Use Azurite
- Downstream APIs β Use WireMock CLI for service virtualization
Local development should work offline (no VPN, no cloud auth)
- Developers shouldn't need VPN access to run applications locally
- No dependency on cloud-hosted services for local development
- Emulators and mocks should provide sufficient fidelity for development
- Integration tests leverage local development setup; cloud environments are for deployment verification and production
If a feature breaks local dev, it's a breaking change
- Local development experience is a first-class requirement
- Features that only work in cloud environments need strong justification
- Test locally before deploying to cloud
- Local-first development reduces cloud costs and increases developer productivity
π Platform Capabilities¶
- Mock Services: WireMock CLI for mocking downstream APIs and external dependencies
- Orchestration: Aspire manages service dependencies and orchestration locally
- Minimal Dependencies: Run applications with emulators and mocks instead of live cloud services
π Examples: See Foundry for experimental prototypes exploring local development patterns.
π Principle of Platform as a Product¶
Manage the platform with a roadmap, defined lifecycle, and rigorous deprecation policies.
A platform that only adds features eventually collapses under its own weight. We treat Forge as a product with customers (developers), not just a collection of scripts. This means being aggressive about removing operational debt and retiring patterns that no longer serve the organization while continuously evolving based on evidence and feedback.
π― Guidelines¶
Deprecation is a feature, not a failure
- Support contracts must be finite; maintaining multiple versions of templates slows innovation
- Every new feature should have a plan for how it will eventually be retired or superseded
- Provide comprehensive migration guides with before/after examples when introducing breaking changes
- Version-based lifecycle management (e.g., Forge 2.x β 3.0) with clear compatibility matrices
The "Golden Path" evolves; it's not permanent
- When industry standards shift (e.g., .NET 8 to .NET 10), the platform must lead the migration
- Don't preserve legacy patterns to avoid friction; friction is acceptable when moving to better security or maintainability
- Document why a pattern changed, not just what changed
- Use the roadmap to communicate upcoming changes and deprecations transparently
Make decisions based on evidence, not assumptions
- Use SPACE metrics to validate that platform changes improve developer productivity and satisfaction
- Gather feedback from teams before and after major changes
- Track support requests as signals for platform gaps
- Prototype and experiment when data is unavailable, then measure outcomes
Iterate toward the ideal, don't wait for perfection
- Ship improvements incrementally rather than waiting for complete solutions
- Each iteration should deliver measurable value
- Define clear "done" criteria to prevent endless refinement
- Balance progress with qualityβdon't sacrifice stability for speed
π Platform Capabilities¶
- Developer Productivity Measurement: SPACE framework surveys to capture developer experience and satisfaction
- Feedback Channels: Teams channel, backlog, and regular surveys
- Migration Guides: Documented upgrade paths with before/after examples and version compatibility matrices
- Versioning Strategy: Semantic versioning with clear breaking change indicators (Forge 2.x β 3.0)
- Public Roadmap: Now-Next-Later format with principle-based objectives visible in the repository
π Key Concepts¶
Principles vs. Capabilities¶
- Principles guide how we make platform engineering decisions
- Platform capabilities are the what β specific implementations in templates, CLI tools, and modules
Using These Principles¶
In Design Reviews:
- Does this feature reduce time-to-deployment or add friction?
- Are we building custom code when a managed service exists?
- Does this require manual intervention or enable self-service?
- Is this secure by default or does it require configuration?
- Can developers use this feature locally without cloud access?
- How will we measure the success of this feature?
In PR Reviews:
- Does this change align with our principles?
- Are we adding technical debt (custom code) or reducing it (using vendor solutions)?
- Does this improve developer experience or add complexity?
- Are security best practices enforced at the platform layer?
In Roadmap Planning:
- Which principle does this initiative support?
- Does this enable teams to move faster independently?
- Are we reducing manual processes and approval gates?
- Is this solving a real platform gap or adding unnecessary flexibility?
- What metrics will indicate success, and how will we track them?
π Related Documentation¶
- Forge History - Evolution of the platform and lessons learned
- Roadmap - Platform initiatives with principle-based objectives
- Security Guides - Implementation of security principles
- Development Guides - Applying principles in practice