
Platform Engineering: The Complete Guide to Building Internal Developer Platforms

Elena Rodriguez
14 min read

As organizations scale their engineering teams and adopt complex cloud-native architectures, developers often find themselves drowning in operational overhead. Deploying a new service can mean writing Kubernetes YAML, setting up CI/CD pipelines, provisioning databases, and configuring monitoring: a multi-day endeavor.

According to a 2023 Puppet State of DevOps report, developers spend 30-40% of their time on operational tasks instead of writing code. That's an expensive productivity tax.

Platform Engineering addresses this by creating a paved road for developers.


The Rise of Platform Engineering

Evolution of Developer Productivity
──────────────────────────────────────────────────────────────────

Era 1: Ops Does Everything (2000s)
─────────────────────────────────
Developers: "I need a server"
Ops: "Submit a ticket, 2-week SLA"
Result: Slow, but developers focused on code

Era 2: DevOps / "You Build It, You Run It" (2010s)
─────────────────────────────────────────────────
Developers: "I own everything from code to production"
Reality: Developers become part-time ops engineers
Result: Faster, but cognitive overload

Era 3: Platform Engineering (2020s)
───────────────────────────────────
Platform Team: "Here's a self-service platform"
Developers: "I can deploy in minutes with golden paths"
Result: Fast AND developers focus on business logic

Why DevOps Alone Isn't Enough

DevOps Challenge                  Platform Engineering Solution
────────────────                  ─────────────────────────────
Every team reinvents the wheel    Golden paths with best practices baked in
Inconsistent tooling              Standardized, curated toolchain
Steep learning curve              Self-service with guardrails
Tribal knowledge                  Documented, discoverable APIs
Security as afterthought          Security built into templates

What is an Internal Developer Platform (IDP)?

An IDP is a self-service layer that sits on top of your existing infrastructure. It provides a curated set of tools, services, and APIs that abstract away the underlying complexity.

IDP Architecture
──────────────────────────────────────────────────────────────────

                    ┌─────────────────────────────────────────┐
                    │         Developer Portal                │
                    │      (Backstage, Port, Cortex)          │
                    │                                         │
                    │  ┌──────────┐ ┌──────────┐ ┌──────────┐│
                    │  │ Service  │ │   API    │ │  Docs    ││
                    │  │ Catalog  │ │   Docs   │ │  Search  ││
                    │  └──────────┘ └──────────┘ └──────────┘│
                    └──────────────────┬──────────────────────┘
                                       │
                    ┌──────────────────┴──────────────────────┐
                    │            Platform APIs                 │
                    │                                         │
                    │  ┌─────────┐ ┌─────────┐ ┌─────────┐   │
                    │  │ Deploy  │ │  Infra  │ │ Secrets │   │
                    │  │   API   │ │   API   │ │   API   │   │
                    │  └─────────┘ └─────────┘ └─────────┘   │
                    └──────────────────┬──────────────────────┘
                                       │
    ┌──────────────────────────────────┼──────────────────────────┐
    │                                  │                          │
    ▼                                  ▼                          ▼
┌────────────┐                  ┌────────────┐              ┌────────────┐
│   CI/CD    │                  │ Kubernetes │              │   Cloud    │
│            │                  │            │              │            │
│ • GitHub   │                  │ • ArgoCD   │              │ • AWS      │
│   Actions  │                  │ • Helm     │              │ • Terraform│
│ • Jenkins  │                  │ • Kustomize│              │ • Vault    │
└────────────┘                  └────────────┘              └────────────┘

The IDP abstracts complexity while maintaining flexibility


Key Components of an IDP

1. Service Catalog

A searchable registry of all services, APIs, and infrastructure in your organization.

# Example: Service catalog entry (Backstage format)
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: Handles payment processing and billing
  annotations:
    github.com/project-slug: myorg/payment-service
    backstage.io/techdocs-ref: dir:.
  tags:
    - python
    - payments
    - critical
  links:
    - url: https://grafana.internal/d/payments
      title: Grafana Dashboard
    - url: https://runbooks.internal/payments
      title: Runbooks
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: billing-platform
  dependsOn:
    - component:user-service
    - resource:payments-db
  providesApis:
    - payments-api
  consumesApis:
    - user-api
    - notification-api
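
Entries like this are what make the registry queryable. As a rough illustration in Python (not Backstage's own search API), the sketch below assumes entities have already been parsed from catalog-info.yaml files into dicts of the same shape. The second entry and its owner `team-identity` are made up for the example.

```python
# Minimal in-memory catalog queries (hypothetical; not the Backstage API)
CATALOG = [
    {
        "metadata": {"name": "payment-service", "tags": ["python", "payments", "critical"]},
        "spec": {"owner": "team-payments", "consumesApis": ["user-api", "notification-api"]},
    },
    {
        "metadata": {"name": "user-service", "tags": ["go"]},
        "spec": {"owner": "team-identity", "providesApis": ["user-api"]},  # made-up entry
    },
]


def find_by_tag(entities: list[dict], tag: str) -> list[str]:
    """Names of entities whose metadata.tags include `tag`."""
    return [e["metadata"]["name"] for e in entities if tag in e["metadata"].get("tags", [])]


def consumers_of(entities: list[dict], api: str) -> list[str]:
    """Names of entities listing `api` under spec.consumesApis."""
    return [e["metadata"]["name"] for e in entities if api in e["spec"].get("consumesApis", [])]


def providers_of(entities: list[dict], api: str) -> list[str]:
    """Names of entities listing `api` under spec.providesApis."""
    return [e["metadata"]["name"] for e in entities if api in e["spec"].get("providesApis", [])]


print(find_by_tag(CATALOG, "critical"))   # ['payment-service']
print(consumers_of(CATALOG, "user-api"))  # ['payment-service']
print(providers_of(CATALOG, "user-api"))  # ['user-service']
```

A catalog can answer "who consumes user-api?" before you change that API. A wiki page usually cannot.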

2. Golden Paths (Service Templates)

Pre-configured templates that encode best practices.

# Example: Golden path for new microservice
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: microservice-template
  title: Production-Ready Microservice
  description: |
    Creates a new microservice with:
    - FastAPI application structure
    - Dockerfile optimized for production
    - Kubernetes manifests
    - CI/CD pipeline (GitHub Actions)
    - Observability (Prometheus, Grafana, Jaeger)
    - Security scanning (Snyk, Trivy)
spec:
  owner: platform-team
  type: service

  parameters:
    - title: Service Information
      required:
        - name
        - description
        - owner
      properties:
        name:
          title: Service Name
          type: string
          pattern: "^[a-z0-9-]+$"
        description:
          title: Description
          type: string
        owner:
          title: Owner Team
          type: string
          ui:field: OwnerPicker

    - title: Technical Options
      properties:
        database:
          title: Database
          type: string
          enum:
            - none
            - postgresql
            - mongodb
          default: none
        messaging:
          title: Message Queue
          type: string
          enum:
            - none
            - kafka
            - rabbitmq
          default: none

  steps:
    - id: fetch-template
      name: Fetch Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
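          database: ${{ parameters.database }}
          messaging: ${{ parameters.messaging }}

    # publish:github creates the repository AND pushes the rendered skeleton;
    # its outputs (remoteUrl, repoContentsUrl) are used by the steps below
    - id: create-repo
      name: Create GitHub Repository
      action: publish:github
      input:
        repoUrl: github.com?owner=myorg&repo=${{ parameters.name }}
        description: ${{ parameters.description }}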

    - id: register-catalog
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.create-repo.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml

    - id: create-argocd-app
      name: Create ArgoCD Application
      action: argocd:create-application
      input:
        name: ${{ parameters.name }}
        namespace: ${{ parameters.name }}
        repoUrl: ${{ steps.create-repo.output.remoteUrl }}

  output:
    links:
      - title: Repository
        url: ${{ steps.create-repo.output.remoteUrl }}
      - title: ArgoCD
        url: https://argocd.internal/applications/${{ parameters.name }}
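
Backstage renders the `parameters` block as a JSON Schema-style form and validates it for you. To show what those constraints mean in practice, here is a hypothetical Python function that applies the same rules: required fields, the name pattern, and the two enums with their `none` defaults. It mirrors the schema above and is not the scaffolder's implementation.

```python
import re

# Constraints copied from the template's parameters schema above
REQUIRED = ("name", "description", "owner")
NAME_PATTERN = re.compile(r"^[a-z0-9-]+$")
ENUMS = {
    "database": ("none", "postgresql", "mongodb"),
    "messaging": ("none", "kafka", "rabbitmq"),
}


def validate_parameters(params: dict) -> dict:
    """Return params with enum defaults applied; raise ValueError on violations."""
    missing = [key for key in REQUIRED if not params.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # fullmatch so a trailing newline cannot slip past the `$` anchor
    if not NAME_PATTERN.fullmatch(params["name"]):
        raise ValueError(f"invalid service name: {params['name']!r}")
    values = dict(params)
    for field, allowed in ENUMS.items():
        values.setdefault(field, "none")  # both enums default to "none"
        if values[field] not in allowed:
            raise ValueError(f"{field} must be one of {allowed}")
    return values


print(validate_parameters({"name": "orders-api", "description": "Order intake", "owner": "team-orders"}))
```

Rejecting bad input before any repository is created keeps failed scaffolder runs from leaving half-provisioned services behind.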

3. Self-Service Portal

A web interface where developers can discover, create, and manage services.

Developer Portal Features
──────────────────────────────────────────────────────────────────

┌─────────────────────────────────────────────────────────────────┐
│  🏠 Developer Portal                      [Search...]   👤 Jane │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Quick Actions                                                  │
│  ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐  │
│  │ + New      │ │ 📊 Metrics │ │ 📚 Docs    │ │ 🔧 Tools   │  │
│  │   Service  │ │            │ │            │ │            │  │
│  └────────────┘ └────────────┘ └────────────┘ └────────────┘  │
│                                                                 │
│  My Services (3)                                                │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ 🟢 payment-service    Python │ Production │ Healthy    │   │
│  │ 🟢 user-service       Go     │ Production │ Healthy    │   │
│  │ 🟡 notification-svc   Node   │ Staging    │ Deploying  │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  Recent Activity                                                │
│  • payment-service deployed v2.3.1 (5 min ago)                 │
│  • user-service scaled to 5 replicas (1 hour ago)              │
│  • notification-svc CI failed - fix pending                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

4. Platform APIs

Programmatic access to platform capabilities.

// Example: Platform SDK for developers
import { Platform } from "@myorg/platform-sdk";

const platform = new Platform();

// Create a new environment
const env = await platform.environments.create({
  name: "feature-xyz",
  type: "ephemeral",
  baseTemplate: "staging",
  ttl: "7d",
});

// Deploy a service
await platform.deployments.create({
  service: "payment-service",
  environment: env.id,
  version: "v2.3.1",
  config: {
    replicas: 2,
    resources: {
      cpu: "500m",
      memory: "512Mi",
    },
  },
});

// Provision infrastructure
await platform.infrastructure.create({
  type: "postgresql",
  name: "payments-db",
  environment: env.id,
  size: "small",
});

// Get service status
const status = await platform.services.getStatus("payment-service");
console.log(status);
// {
//   health: 'healthy',
//   replicas: { ready: 2, desired: 2 },
//   latestDeployment: { version: 'v2.3.1', status: 'complete' }
// }
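
The `ttl: "7d"` on the ephemeral environment implies a cleanup job on the platform side. The SDK above is illustrative, so the following Python is only a sketch of such a job. The TTL format (an integer followed by `m`, `h` or `d`) and the `Environment` record are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed TTL format: integer followed by m (minutes), h (hours) or d (days)
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(ttl: str) -> timedelta:
    value, unit = ttl[:-1], ttl[-1:]
    if unit not in _UNITS or not value.isdigit():
        raise ValueError(f"unsupported TTL: {ttl!r}")
    return timedelta(**{_UNITS[unit]: int(value)})


@dataclass
class Environment:  # hypothetical record kept by the platform
    name: str
    created_at: datetime
    ttl: str


def expired(envs: list[Environment], now: datetime) -> list[str]:
    """Names of environments whose TTL has elapsed and that should be torn down."""
    return [e.name for e in envs if now >= e.created_at + parse_ttl(e.ttl)]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
envs = [
    Environment("feature-xyz", datetime(2024, 1, 1, tzinfo=timezone.utc), "7d"),
    Environment("feature-abc", datetime(2024, 1, 9, tzinfo=timezone.utc), "7d"),
]
print(expired(envs, now))  # ['feature-xyz']
```

Running a sweep like this on a schedule is what makes "ephemeral" true in practice. Without it, preview environments tend to accumulate cost indefinitely.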

Measuring Platform Success

DORA Metrics Connection

A well-adopted platform can move all four DORA metrics. An illustrative before/after:

Impact on DORA Metrics
──────────────────────────────────────────────────────────────────

                            Before IDP    After IDP    Improvement
                            ──────────    ─────────    ───────────
Deployment Frequency        Weekly        Daily        7x ⬆
Lead Time for Changes       2 weeks       2 days       7x ⬇
Change Failure Rate         15%           5%           3x ⬇
Mean Time to Recovery       4 hours       30 min       8x ⬇


DORA Performance Level:     Medium        High         🚀
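
To track these numbers rather than estimate them, the platform can derive all four metrics from its own deployment history. The Python sketch below assumes a hypothetical `Deployment` record with a commit time, a deploy time, a failure flag, and a restore time, with one commit per deployment for simplicity. Definitions such as what counts as a "failure" vary between teams and tools.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:  # hypothetical record shape
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False                    # did this change cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics over a reporting period."""
    if not deploys:
        raise ValueError("no deployments in period")
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


t0 = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(t0, t0 + timedelta(hours=4)),
    Deployment(t0, t0 + timedelta(hours=8), failed=True,
               restored_at=t0 + timedelta(hours=8, minutes=30)),
    Deployment(t0, t0 + timedelta(hours=6)),
]
print(dora_metrics(history, period_days=3))
```

Medians are used for the time-based metrics because a single stuck change would otherwise dominate a mean.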

Developer Experience Metrics

Metric                   Definition                                Target
──────                   ──────────                                ──────
Time to First Deploy     New dev → first production deploy         < 1 day
Service Creation Time    Request → running service                 < 1 hour
Cognitive Load Score     Survey rating (higher = lower load)       > 4/5
Platform NPS             Would you recommend?                      > 50
Self-Service Rate        Requests handled without tickets          > 90%
Onboarding Time          Time until a new engineer is productive   < 2 weeks
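
Two of these targets are simple ratios. Platform NPS follows the standard Net Promoter formula: the percentage of promoters (scores 9-10) minus the percentage of detractors (scores 0-6). Self-service rate is the share of requests completed without a ticket. The Python sketch below uses made-up sample data.

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6), from -100 to 100."""
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def self_service_rate(self_served: int, ticketed: int) -> float:
    """Share of platform requests fulfilled without filing a ticket."""
    return self_served / (self_served + ticketed)


survey = [10, 9, 9, 8, 7, 10, 6, 9, 10, 3]  # made-up sample responses
print(nps(survey))                 # 40.0
print(self_service_rate(460, 40))  # 0.92
```

With these sample numbers, the self-service rate clears the 90% target, but an NPS of 40 falls short of the > 50 goal. Passives (7-8) count in the denominator but in neither group.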

Platform Analytics

# Platform usage analytics
from dataclasses import dataclass

@dataclass
class PlatformMetrics:
    total_services: int
    active_developers: int
    deployments_today: int
    self_service_rate: float
    avg_deploy_time_minutes: float
    template_adoption_rate: float

@dataclass
class TemplateUsage:
    template_name: str
    times_used: int
    success_rate: float
    avg_time_to_production_hours: float

def calculate_platform_roi(
    metrics: PlatformMetrics,
    developer_hourly_cost: float = 100,
    platform_team_size: int = 3,
    hours_per_year: int = 2000,
) -> dict:
    """Estimate the ROI of a platform engineering investment.

    The per-activity savings below are illustrative assumptions; replace
    them with your own measurements.
    """
    developers_count = metrics.active_developers

    # Time saved per deployment (assumed: 4 hours → 0.5 hours)
    hours_saved_per_deploy = 3.5
    deploys_per_dev_per_week = 2
    weekly_hours_saved = (
        hours_saved_per_deploy *
        deploys_per_dev_per_week *
        developers_count
    )

    # Time saved on new service creation (assumed: 2 working days → 1 hour)
    new_services_per_month = 10
    hours_saved_per_new_service = 15
    monthly_service_creation_savings = (
        new_services_per_month *
        hours_saved_per_new_service
    )

    # Reduced incidents due to golden paths
    incidents_prevented_monthly = 5
    avg_incident_cost_hours = 8  # Developer hours per incident
    monthly_incident_savings = (
        incidents_prevented_monthly *
        avg_incident_cost_hours
    )

    # Total monthly savings (4 weeks per month, slightly conservative)
    monthly_hours_saved = (
        (weekly_hours_saved * 4) +
        monthly_service_creation_savings +
        monthly_incident_savings
    )

    monthly_cost_savings = monthly_hours_saved * developer_hourly_cost
    annual_savings = monthly_cost_savings * 12

    # Annual cost of one engineer (same hourly rate assumed) and of the platform team
    engineer_annual_cost = developer_hourly_cost * hours_per_year
    platform_team_cost = platform_team_size * engineer_annual_cost

    return {
        "monthly_hours_saved": monthly_hours_saved,
        "monthly_cost_savings": monthly_cost_savings,
        "annual_savings": annual_savings,
        # Team size at which the platform's cost would equal its savings
        "breakeven_platform_team_size": annual_savings / engineer_annual_cost,
        # ROI = (savings - cost) / cost
        "roi_percentage": (annual_savings - platform_team_cost) / platform_team_cost * 100,
    }

# Example calculation
metrics = PlatformMetrics(
    total_services=150,
    active_developers=100,
    deployments_today=45,
    self_service_rate=0.92,
    avg_deploy_time_minutes=8,
    template_adoption_rate=0.85
)

roi = calculate_platform_roi(metrics)
print(f"Annual savings: ${roi['annual_savings']:,.0f}")
print(f"ROI: {roi['roi_percentage']:.0f}%")
# Annual savings: $3,588,000
# ROI: 498%

Building Your IDP: Implementation Roadmap

Phase 1: Foundation (Months 1-3)

Phase 1 Deliverables
──────────────────────────────────────────────────────────────────

1. Service Catalog MVP
   ├── Inventory existing services
   ├── Define metadata schema
   └── Basic search and discovery

2. First Golden Path
   ├── Choose most common service type
   ├── Encode current best practices
   └── Test with 2-3 pilot teams

3. Developer Portal Setup
   ├── Deploy Backstage or equivalent
   ├── Integrate with GitHub/GitLab
   └── Basic documentation hosting

Success Criteria:
├── 50%+ of services registered in catalog
├── First golden path deployed 5+ times
└── Developer NPS baseline established
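
The first two success criteria can be measured rather than estimated. The hypothetical Python check below assumes one service per repository, with a `catalog-info.yaml` at the repository root (the path the golden-path template registers). The repository inventory is made up.

```python
# Hypothetical repo inventory: repository name -> files at the repo root
REPOS = {
    "payment-service": {"catalog-info.yaml", "Dockerfile"},
    "user-service": {"catalog-info.yaml", "go.mod"},
    "notification-svc": {"package.json"},
    "legacy-billing": {"pom.xml"},
}


def catalog_coverage(repos: dict[str, set[str]]) -> tuple[float, list[str]]:
    """Share of repos with a catalog-info.yaml, plus the ones still missing it."""
    missing = sorted(name for name, files in repos.items() if "catalog-info.yaml" not in files)
    coverage = 1 - len(missing) / len(repos) if repos else 0.0
    return coverage, missing


def phase1_met(coverage: float, golden_path_runs: int) -> bool:
    """Quantitative Phase 1 criteria: 50%+ catalog coverage and 5+ golden path runs."""
    return coverage >= 0.5 and golden_path_runs >= 5


coverage, missing = catalog_coverage(REPOS)
print(coverage, missing)                         # 0.5 ['legacy-billing', 'notification-svc']
print(phase1_met(coverage, golden_path_runs=6))  # True
```

The list of missing repositories doubles as the to-do list for the catalog inventory work. The NPS baseline remains a yes/no milestone.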

Phase 2: Self-Service (Months 4-6)

# Phase 2: Expand self-service capabilities
phase_2_goals:
  service_creation:
    - name: "Additional service templates"
      templates:
        - frontend-app
        - background-worker
        - api-gateway
    - name: "Database provisioning"
      supported:
        - postgresql
        - redis
        - mongodb
    - name: "Environment management"
      features:
        - create ephemeral environments
        - clone production data (sanitized)
        - automatic cleanup

  observability:
    - name: "Automatic instrumentation"
      features:
        - prometheus metrics
        - distributed tracing
        - structured logging
    - name: "Dashboard generation"
      auto_create:
        - service health dashboard
        - SLO tracking
        - error rate monitoring

  security:
    - name: "Secrets management"
      integration: vault
    - name: "Security scanning"
      tools:
        - dependency scanning
        - container scanning
        - SAST/DAST
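
Auto-generated SLO tracking usually centers on the error budget. For an availability target T over a window W, the allowed unavailability is (1 - T) x W. The Python sketch below uses 99.9% over 30 days purely as example values; the roadmap does not prescribe them. It also shows the common request-based variant.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Allowed unavailability for an availability SLO over a time window."""
    return window * (1 - slo_target)


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget left (negative means overspent)."""
    allowed_failures = total_requests * (1 - slo_target)
    if allowed_failures <= 0:
        raise ValueError("a 100% target (or no traffic) leaves no error budget")
    return 1 - failed_requests / allowed_failures


print(error_budget(0.999, timedelta(days=30)))  # 0:43:12
print(budget_remaining(0.999, 1_000_000, 250))  # about 0.75
```

A generated dashboard can plot `budget_remaining` over time. Burning through the budget is a clearer signal for pausing risky releases than raw error counts.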

Phase 3: Optimization (Months 7-12)

Capability                 Implementation
──────────                 ──────────────
Cost visibility            Show cloud costs per service
Compliance automation      Auto-enforce policies
Advanced analytics         Platform usage insights
Self-healing               Automated incident response
Preview environments       PR-based deployments
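
Per-service cost visibility usually comes down to grouping billing line items by an ownership tag. The Python sketch below assumes the line items have already been exported from the cloud bill and carry a hypothetical `service` tag. Tag keys and export formats differ between providers. Untagged spend is reported rather than dropped, because that is often where the surprises are.

```python
from collections import defaultdict


def cost_by_service(line_items: list[dict]) -> dict[str, float]:
    """Sum cost per 'service' tag, largest first; untagged spend goes under '(untagged)'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        service = item.get("tags", {}).get("service", "(untagged)")
        totals[service] += item["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


items = [
    {"cost": 120.0, "tags": {"service": "payment-service"}},
    {"cost": 80.0, "tags": {"service": "user-service"}},
    {"cost": 30.0, "tags": {}},
    {"cost": 45.5, "tags": {"service": "payment-service"}},
]
print(cost_by_service(items))
# {'payment-service': 165.5, 'user-service': 80.0, '(untagged)': 30.0}
```

Golden-path templates can apply the tag automatically, which keeps the untagged bucket small.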

Platform Team Structure

Platform Team Organization
──────────────────────────────────────────────────────────────────

Rule-of-thumb ratio: 1 platform engineer per 15-20 developers

Small Org (50 devs):
├── 3 Platform Engineers
└── Focus: Core IDP, CI/CD, basic self-service

Medium Org (200 devs):
├── 10-15 Platform Engineers
└── Specialized roles:
    ├── Developer Portal (2)
    ├── CI/CD & Deployments (3)
    ├── Infrastructure Automation (3)
    ├── Observability (2)
    └── Security (2)

Large Org (1000+ devs):
├── 50-75 Platform Engineers
├── Dedicated teams per capability
├── Platform product managers
└── Developer advocacy / evangelism
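
The ratio translates into a headcount range with simple arithmetic. The hypothetical Python helper below rounds up, since you cannot hire a fraction of an engineer, and roughly matches the sizes above.

```python
import math


def platform_team_range(developers: int, ratio_range: tuple[int, int] = (15, 20)) -> tuple[int, int]:
    """(fewest, most) platform engineers at 1 engineer per 15-20 developers, rounded up."""
    low, high = ratio_range
    return math.ceil(developers / high), math.ceil(developers / low)


for devs in (50, 200, 1000):
    print(devs, platform_team_range(devs))
# 50 (3, 4)
# 200 (10, 14)
# 1000 (50, 67)
```

The 50-75 range quoted for large orgs extends past 67 because it covers organizations with more than 1,000 developers.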

Common Anti-Patterns

Anti-Pattern                  Why It Fails             Better Approach
────────────                  ────────────             ───────────────
Building everything custom    Reinventing the wheel    Use open source (Backstage, Crossplane)
Too many options              Decision paralysis       Opinionated golden paths
Mandating without value       Developers resist        Earn adoption through value
Ignoring developer feedback   Platform misses needs    Product mindset, continuous research
One team to rule all          Bottleneck               Platform as product, not gatekeeper
Perfect platform syndrome     Never ships              Start small, iterate fast

Technology Landscape

Developer Portal Options

Tool        Type           Best For
────        ────           ────────
Backstage   Open source    Large orgs, customization needs
Port        Commercial     Fast implementation
Cortex      Commercial     Scorecards, compliance
OpsLevel    Commercial     Service ownership
Kratix      Open source    Platform-as-a-Product (orchestration framework; pairs with a portal UI)

Infrastructure Abstraction

Tool         Purpose
────         ───────
Crossplane   Infrastructure as code with K8s
Pulumi       Modern IaC with real languages
Terraform    Industry standard IaC
CDK          AWS-native abstractions

CI/CD Integration

# Example: Standardized GitHub Actions workflow
name: Platform CI/CD

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  platform-pipeline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Platform-provided actions
      - uses: myorg/platform-actions/setup@v1
        with:
          service-name: ${{ github.repository }}

      - uses: myorg/platform-actions/build@v1
        with:
          dockerfile: Dockerfile

      - uses: myorg/platform-actions/security-scan@v1

      - uses: myorg/platform-actions/test@v1
        with:
          coverage-threshold: 80

      - uses: myorg/platform-actions/deploy@v1
        if: github.ref == 'refs/heads/main'
        with:
          environment: production
          strategy: canary
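
The `strategy: canary` input belongs to the hypothetical `myorg/platform-actions/deploy` action, so its behaviour is not specified here. To illustrate the decision such a step typically automates, here is a simplified Python check that compares the canary's error rate with the stable baseline. The thresholds are assumptions. Production tools such as Argo Rollouts or Flagger run richer analysis driven by metrics providers.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 1.5,        # canary may reach 1.5x the baseline error rate
    min_threshold: float = 0.001,  # floor so a zero-error baseline doesn't block everything
    min_requests: int = 500,       # don't judge on too little traffic
) -> str:
    """Return 'wait', 'promote' or 'rollback' for a canary release."""
    if canary_requests < min_requests:
        return "wait"
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    threshold = max(baseline_rate * max_ratio, min_threshold)
    return "promote" if canary_rate <= threshold else "rollback"


print(canary_decision(50, 10_000, 4, 1_000))   # promote: 0.4% vs 0.75% threshold
print(canary_decision(50, 10_000, 20, 1_000))  # rollback: 2.0% vs 0.75% threshold
print(canary_decision(50, 10_000, 1, 100))     # wait: only 100 canary requests
```

Encoding this once in a shared platform action means every service gets the same rollback behaviour without each team writing its own.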

Key Takeaways

  1. Platform Engineering is product development—treat developers as customers
  2. Golden paths accelerate everyone—encode best practices, don't gatekeep
  3. Start with the service catalog—you can't improve what you can't see
  4. Self-service is the goal—reduce tickets, increase velocity
  5. Measure developer experience—DORA metrics + developer surveys
  6. Open source first—Backstage, Crossplane, ArgoCD are production-ready
  7. Small team, big impact—1 platform engineer per 15-20 developers
  8. Iterate based on feedback—continuous discovery, not big bang

Platform Engineering is about empowering developers to deliver business value faster, safer, and with less frustration. The best platforms are invisible—they just work.


Ready to build your Internal Developer Platform? Contact EGI Consulting for a platform engineering assessment and implementation roadmap tailored to your organization.
