
GitHub Management Plane

Split-mode Terraform architecture for managing 40+ GitHub repositories at scale.

The Problem Space

Managing GitHub organizations through the web UI works fine for a handful of repositories, but breaks down at scale. At 40+ repositories across a homelab environment, three problems emerge: visibility into what changed and when disappears, enforcing naming conventions and organizational standards becomes manual and error-prone, and consistency across repositories drifts over time.

This project tackles those problems by treating GitHub organization management as infrastructure as code. Every repository, team, secret, and setting lives in version control. Changes go through code review. Drift is detected and corrected automatically.

The design evolved through two architectural iterations to reach its current form.

First Iteration: Monolithic State

The initial implementation placed all resources into a single Terraform state. This worked at first but revealed scaling challenges: every terraform plan or apply refreshed every resource in the organization, GitHub API rate limits became a concern during concurrent operations, and a single misconfiguration could block changes across unrelated resources.

The monolithic approach also meant that CI/CD pipelines couldn’t target specific resources—any change required planning and applying against the full state.

Split-Mode Architecture: Design Decision

The solution separates state into two independent stacks with distinct responsibilities:

```mermaid
graph TB
    subgraph "stacks/organization"
        O[Organization Stack]
    end

    subgraph "stacks/repository"
        R[Repository Stack]
    end

    O --> OrgSet[Organization Settings]
    O --> CustomProps[Custom Properties]
    O --> Teams[Teams]
    O --> SecVars[Org Secrets/Variables]

    R --> Repo1[tf-infra-homelab]
    R --> Repo2[tf-module-proxmox-talos]
    R --> Repo3[applications-homelab]
    R --> RepoN[...N repositories]
```

Why separate organization and repository stacks?

Organization-wide resources (settings, teams, custom properties, secrets) change infrequently but affect everything. Repository resources change frequently as new services are added or configurations evolve. Keeping these separate allows targeted planning: changing a team’s membership doesn’t require refreshing every repository’s state.

Why per-repository state keys?

Each repository gets its own state key within the repository stack. This enables parallel planning across repositories and prevents a single repository’s configuration error from blocking the rest of the organization. The legacy root module at the top of the repository remains only as a migration source artifact; the live state now sits in the split stacks.
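To make the per-repository key idea concrete, here is a minimal sketch of how a wrapper might point the repository stack at one repository’s state. It assumes a backend with a key setting (such as the S3 backend) supplied through terraform init -backend-config; the repository/<name>.tfstate layout and both helper names are hypothetical, not the project’s actual scheme.

```python
"""Hypothetical helpers: select one repository's state in the repository stack.

The "repository/<name>.tfstate" key layout is an assumption for illustration.
"""
import re

# Allow only characters that are safe inside a backend key (no slashes).
_SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")


def state_key(repo_name: str, prefix: str = "repository") -> str:
    """Return the backend state key for one repository (hypothetical layout)."""
    if not _SAFE_NAME.match(repo_name):
        raise ValueError(f"unsafe repository name for a state key: {repo_name!r}")
    return f"{prefix}/{repo_name}.tfstate"


def init_args(repo_name: str) -> list[str]:
    """Build `terraform init` arguments that bind the stack to one repo's state."""
    return [
        "terraform",
        "init",
        "-reconfigure",  # switching keys in the same directory needs a fresh backend config
        f"-backend-config=key={state_key(repo_name)}",
    ]


if __name__ == "__main__":
    print(" ".join(init_args("tf-infra-homelab")))
```

Running the script prints terraform init -reconfigure -backend-config=key=repository/tf-infra-homelab.tfstate. The name check keeps a malformed repository name from writing state to an unexpected path.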

This architecture currently manages 26 repositories and is designed to scale beyond that without hitting GitHub API limits during normal operations.

Implementation Components

The codebase implements the split-mode design through four reusable modules and two stack roots:

| Module | Purpose |
|---|---|
| modules/organization | Organization settings, custom properties, teams, secrets, variables |
| modules/repository | Repository creation, features, topics, labels per repository |
| modules/ruleset | Branch protection rulesets via the Rulesets API |
| modules/secrets_variables | Organization-level Actions secrets and variables |
| stacks/organization, stacks/repository | Split state backends |

Repository Configuration Design

The repository module doesn’t hardcode repository definitions. Instead, it consumes YAML configurations, so adding a repository means adding a file rather than modifying Terraform code:

```yaml
# configurations/repository/tf-infra-homelab.yaml
name: tf-infra-homelab
description: A terraform infrastructure repository for managing my homelab environment
enabled: true
archived: false
visibility: private
type: terraform-infrastructure

topics:
  - homelab
  - proxmox

enabled_features:
  vulnerability_alerts: true
  issues: true
  wiki: false
  projects: false
  discussions: false
```

The type field drives defaults—different repository types get different license templates, default topics, and initialization settings. This abstraction keeps the Terraform module generic while allowing domain-specific defaults.
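A minimal sketch of how type-driven defaults could be layered under a repository’s YAML, assuming the file has already been parsed into a dict (for example with PyYAML’s yaml.safe_load). The TYPE_DEFAULTS table, its contents, and the resolve_repository helper are hypothetical; the real module does this inside Terraform and its defaults may differ.

```python
"""Sketch: merge hypothetical per-type defaults under a repository config."""
from copy import deepcopy

# Hypothetical defaults keyed by the YAML `type` field.
TYPE_DEFAULTS = {
    "terraform-infrastructure": {
        "license_template": "mit",
        "topics": ["terraform"],
        "auto_init": True,
    },
    "terraform-module": {
        "license_template": "apache-2.0",
        "topics": ["terraform", "terraform-module"],
        "auto_init": True,
    },
}


def resolve_repository(config: dict) -> dict:
    """Return effective settings for one repository.

    Keys set in the YAML override the type's defaults; topics are unioned so
    the type's baseline topics are always present.
    """
    # deepcopy so callers can never mutate the shared defaults table
    defaults = deepcopy(TYPE_DEFAULTS.get(config.get("type"), {}))
    merged = {**defaults, **config}
    topics = defaults.get("topics", []) + config.get("topics", [])
    merged["topics"] = sorted(set(topics))
    return merged
```

With the tf-infra-homelab YAML above, the effective topics become homelab, proxmox, and terraform, and the license template comes from the type. Keeping the merge rule this simple is what lets a new repository stay a one-file change.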

Custom Properties Design

Custom properties give every repository a consistent set of organization-wide classification fields. Three properties are defined:

  • can-be-public: Controls whether a repository can be made public (boolean)
  • managed-by: Identifies the management mechanism (“github-management-plane” vs “manual”)
  • repository-type: Classifies the repository type for filtering and automation

The repository module sets these values per repository, but the properties function as organization-wide classification metadata. That makes it possible to query repositories by type across the entire organization, which is useful for bulk operations or compliance checks.
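As an illustration of the kind of query this enables, the sketch below works on a mapping from repository name to its custom property values, as you might assemble from the YAML configurations or from the API. The function names and the "unclassified" fallback bucket are hypothetical.

```python
"""Sketch: query repositories by their custom property values."""
from collections import defaultdict


def repositories_by_type(properties: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Group repository names by their repository-type property."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for repo, props in properties.items():
        # Repos without a type land in a hypothetical "unclassified" bucket.
        grouped[props.get("repository-type", "unclassified")].append(repo)
    return {rtype: sorted(repos) for rtype, repos in sorted(grouped.items())}


def not_managed_by(
    properties: dict[str, dict[str, str]], manager: str = "github-management-plane"
) -> list[str]:
    """List repositories whose managed-by property is not the given manager."""
    return sorted(repo for repo, props in properties.items() if props.get("managed-by") != manager)
```

A compliance check might fail CI whenever not_managed_by returns anything other than an agreed list of manual exceptions.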

Secrets and Variables Pattern

Organization-level secrets and variables require a different handling strategy than repository resources. Some secrets should never enter Terraform state (tokens managed outside the system), while others need to be synced from a secrets manager.

The implementation distinguishes between manual secrets, where Terraform manages only the secret’s existence and metadata while the value is set outside the system, and synced secrets, whose values are pulled from Bitwarden at plan/apply time. Manual secret values therefore never enter Terraform state, synced values never live in the repository, and the set of secrets that should exist is still declared in code.
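The manual/synced split can be sketched as a small resolution step that runs before Terraform sees any values. The config shape (a source field of manual or bitwarden, plus a bitwarden_id) and the injected fetch_secret callable are hypothetical stand-ins for whatever the project actually uses to talk to Bitwarden.

```python
"""Sketch: resolve secret declarations, fetching values only for synced ones."""
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class SecretSpec:
    name: str
    visibility: str  # org secret visibility: "all", "private", or "selected"
    # None means "manual": the secret is declared, its value is set elsewhere.
    value: Optional[str] = field(default=None, repr=False)  # keep values out of logs


def resolve_secrets(
    configs: list[dict], fetch_secret: Callable[[str], str]
) -> list[SecretSpec]:
    """Turn declarations into specs; only synced secrets trigger a fetch."""
    specs: list[SecretSpec] = []
    for cfg in configs:
        source = cfg.get("source", "manual")
        if source == "manual":
            value = None  # never fetched here, so no value is handed to Terraform
        elif source == "bitwarden":
            value = fetch_secret(cfg["bitwarden_id"])  # hypothetical lookup key
        else:
            raise ValueError(f"unknown secret source {source!r} for {cfg['name']}")
        specs.append(SecretSpec(cfg["name"], cfg.get("visibility", "private"), value))
    return specs
```

Injecting fetch_secret keeps the Bitwarden dependency at the edge, so the resolution logic can be tested with a plain dict standing in for the vault.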

Branch Protection via Rulesets

Branch protection rules are implemented using the GitHub Rulesets API (not the legacy branch protection API). This provides more granular control and supports organization-level rulesets that apply across repositories.
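For reference, a branch ruleset sent to the REST rulesets endpoints is a JSON body with a target, an enforcement mode, ref_name conditions, and a list of rules. The sketch below builds such a body in Python; the helper name, its parameters, and the particular rules chosen are assumptions for illustration. An organization-level ruleset would additionally need a repository condition (for example a repository_name include list).

```python
"""Sketch: build a repository branch ruleset body for the REST Rulesets API."""
import json
from typing import Optional


def branch_ruleset(
    name: str, required_approvals: int = 1, include: Optional[list[str]] = None
) -> dict:
    """Return a ruleset protecting the given refs (hypothetical helper)."""
    if not 0 <= required_approvals <= 10:
        raise ValueError("required_approvals must be between 0 and 10")
    return {
        "name": name,
        "target": "branch",
        "enforcement": "active",
        "conditions": {
            "ref_name": {
                # ~DEFAULT_BRANCH matches whatever the default branch is called
                "include": include if include is not None else ["~DEFAULT_BRANCH"],
                "exclude": [],
            }
        },
        "rules": [
            {"type": "deletion"},  # block deleting the branch
            {"type": "non_fast_forward"},  # block force pushes
            {
                "type": "pull_request",
                "parameters": {
                    "required_approving_review_count": required_approvals,
                    "dismiss_stale_reviews_on_push": True,
                    "require_code_owner_review": False,
                    "require_last_push_approval": False,
                    "required_review_thread_resolution": True,
                },
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(branch_ruleset("protect-default-branch"), indent=2))
```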

Standardized Labels

Shared labels apply to every repository, with repository-specific labels layered on top. The merge pattern ensures baseline labels exist everywhere while allowing per-repository customization.
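The merge pattern can be sketched as a dictionary overlay. The baseline labels below are hypothetical examples; the real shared set lives in the project’s configuration.

```python
"""Sketch: layer repository-specific labels over a shared baseline."""

# Hypothetical baseline labels applied to every repository.
SHARED_LABELS: dict[str, dict[str, str]] = {
    "bug": {"color": "d73a4a", "description": "Something isn't working"},
    "enhancement": {"color": "a2eeef", "description": "New feature or request"},
    "dependencies": {"color": "0366d6", "description": "Dependency updates"},
}


def effective_labels(repo_labels: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return the shared labels plus repo-specific ones.

    A repository may add labels or override attributes of a shared label,
    but it cannot drop one, so the baseline exists everywhere.
    """
    merged = {name: dict(attrs) for name, attrs in SHARED_LABELS.items()}  # copy, don't mutate
    for name, attrs in repo_labels.items():
        merged[name] = {**merged.get(name, {}), **attrs}
    return merged
```

Overriding at the attribute level means a repository can recolor bug without restating its description.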

Repository Landscape

The configuration directory contains 26 repository definitions. Representative examples by category:

  • Infrastructure: tf-infra-homelab, tf-infra-github-management-plane
  • Terraform Modules: tf-module-proxmox-{lxc,vm,talos,docker}
  • Cloudflare Workers: cf-worker-terraform-registry, cf-worker-apt-repository
  • Python Applications: python-docker-{email-ingest-elasticsearch,komgah-organizer,maybankforme,restic-backup}, python-restic-backup
  • Templates: template-terraform-basic, template-cloudflare-worker-python
  • Generic: generic-{backstage,packer-image-build,resume,reusable-workflows,zharif-my,applications-homelab}

Each repository is represented as a single YAML file, making the repository inventory self-documenting and code reviewable.
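Because the inventory is just files, a pre-commit or CI check can enforce naming conventions before Terraform runs. This is a hypothetical validator that assumes each YAML file has already been parsed (for example with PyYAML’s yaml.safe_load) into a dict keyed by file path; the lowercase kebab-case rule is an assumed convention inferred from the repository names above.

```python
"""Sketch: validate repository YAML files before Terraform consumes them."""
import re
from pathlib import PurePosixPath

# Assumed convention: lowercase kebab-case names such as tf-infra-homelab.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate_inventory(files: dict[str, dict]) -> list[str]:
    """Return problems found across parsed repository files (path -> YAML dict)."""
    errors: list[str] = []
    seen: dict[str, str] = {}
    for path, cfg in sorted(files.items()):
        stem = PurePosixPath(path).stem
        name = cfg.get("name")
        if not name:
            errors.append(f"{path}: missing 'name'")
            continue
        if name != stem:  # file name and repository name must agree
            errors.append(f"{path}: name {name!r} does not match file name {stem!r}")
        if not KEBAB_CASE.match(name):
            errors.append(f"{path}: {name!r} is not lowercase kebab-case")
        if name in seen:
            errors.append(f"{path}: duplicate name {name!r} (first seen in {seen[name]})")
        else:
            seen[name] = path
    return errors
```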

Directory Structure

The repository organizes into three layers:

```text
github-management-plane/
├── configurations/           # Data layer (YAML)
│   ├── repository/           # Repository definitions
│   ├── secrets_variables/    # Secret/variable configurations
│   └── rulesets/             # Branch protection rulesets
├── modules/                  # Reusable Terraform modules
│   ├── organization/         # Organization resources
│   ├── repository/           # Repository resources
│   ├── ruleset/              # Ruleset resources
│   └── secrets_variables/    # Secret/variable resources
├── stacks/                   # Split state backends
│   ├── organization/         # Org-level state
│   └── repository/           # Per-repo state (keyed by repo name)
├── main.tf                   # Legacy orchestration (migration artifact)
├── locals.tf                 # Configuration assembly
└── providers.tf              # Provider configuration
```

Migration Strategy

The split-mode migration followed a cautious approach: keep the legacy path working while validating the new architecture in parallel.

During migration, both execution paths existed:

  • Monolithic: make plan / make apply against the legacy root
  • Split: make plan-org, make apply-org, make plan-repo REPO=<name>, make apply-repo REPO=<name>

The scripts/terraform_targets.py resolver determines which state to target based on what changed:

  • Organization configuration changes → organization stack
  • Repository configuration changes → per-repository state key in repository stack
  • Both → both targets executed
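The mapping above can be sketched as a pure function over the list of changed paths (for example the output of git diff --name-only). This is not the actual scripts/terraform_targets.py; the path prefixes, and in particular the rule that a change to a shared repository module or ruleset fans out to every repository, are assumptions.

```python
"""Hypothetical change-to-target resolver (not the project's real script)."""
from pathlib import PurePosixPath

ORG_PREFIXES = (
    "configurations/secrets_variables/",
    "modules/organization/",
    "modules/secrets_variables/",
    "stacks/organization/",
)
REPO_CONFIG_DIR = "configurations/repository/"
# Assumed: shared repository code affects every repository state.
SHARED_REPO_PREFIXES = (
    "configurations/rulesets/",
    "modules/repository/",
    "modules/ruleset/",
    "stacks/repository/",
)


def resolve_targets(changed: list[str], all_repos: list[str]) -> dict:
    """Map changed file paths to the organization stack and/or repository keys."""
    org = False
    repos: set[str] = set()
    for path in changed:
        if path.startswith(ORG_PREFIXES):
            org = True
        elif path.startswith(REPO_CONFIG_DIR) and path.endswith((".yaml", ".yml")):
            repos.add(PurePosixPath(path).stem)  # one file -> one state key
        elif path.startswith(SHARED_REPO_PREFIXES):
            repos.update(all_repos)
    return {"organization": org, "repositories": sorted(repos)}
```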

Post-migration, the CI configuration (.github/terraform-state-layout) switches from monolith to split, enabling targeted execution in GitHub Actions. This flag-based approach prevents premature split-mode adoption before all states are migrated.

The migration must complete organization resources first (since repositories may reference teams), then each repository state sequentially.
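A CI step could combine the layout flag with the resolved targets roughly as follows, always running the organization stack before any repository stack, the same ordering the migration relies on. This is a sketch: it assumes the flag file contains the single word monolith or split, that targets is a dict with an organization flag and a list of repository names, and it reuses the Make targets listed above; the function name is hypothetical.

```python
"""Sketch: turn the layout flag and resolved targets into make invocations."""


def ci_commands(layout: str, targets: dict, action: str = "plan") -> list[list[str]]:
    """Return make commands in execution order (organization first)."""
    if action not in ("plan", "apply"):
        raise ValueError(f"unsupported action {action!r}")
    if not targets["organization"] and not targets["repositories"]:
        return []  # nothing relevant changed
    layout = layout.strip()  # the flag file may end with a newline
    if layout == "monolith":
        return [["make", action]]  # legacy path: always the full state
    if layout != "split":
        raise ValueError(f"unknown state layout {layout!r}")
    commands: list[list[str]] = []
    if targets["organization"]:
        commands.append(["make", f"{action}-org"])  # teams etc. must exist first
    for repo in sorted(targets["repositories"]):
        commands.append(["make", f"{action}-repo", f"REPO={repo}"])
    return commands
```

Keeping the monolith branch means the same CI step works before and after the flag flips.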

Observations from Operation

A few things became apparent after running this in production:

Rate limiting matters. GitHub enforces per-token hourly request quotas plus secondary limits on bursts of concurrent requests, and concurrent operations hit them quickly. Split-state targeting reduces the scope of each operation, but larger changes still call for running during off-peak hours or adding retry logic.
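Retry logic in the surrounding scripts can be as simple as capped exponential backoff with jitter around each API call. The sketch assumes a hypothetical API wrapper that raises RateLimited on a rate-limit response and passes along the Retry-After value when GitHub provides one; sleep is injectable so the behavior can be tested without waiting.

```python
"""Sketch: retry a rate-limited call with capped, jittered exponential backoff."""
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by an API wrapper on a 403/429 rate-limit response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("GitHub API rate limit hit")
        self.retry_after = retry_after  # seconds, e.g. from a Retry-After header


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimited until it succeeds or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the rate-limit error
            if exc.retry_after is not None:
                delay = exc.retry_after  # the server said how long to wait
            else:
                # 1s, 2s, 4s, ... capped, with jitter so parallel jobs don't retry in lockstep
                delay = min(max_delay, base_delay * 2**attempt) * random.uniform(0.5, 1.0)
            sleep(delay)
    raise AssertionError("unreachable")
```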

Drift happens. Manual changes through the GitHub UI will inevitably occur: team members add collaborators, change settings, or enable features outside of Terraform. Scheduled drift detection in CI (for example terraform plan -refresh-only, which replaces the deprecated terraform refresh) catches these. Using lifecycle { ignore_changes } on specific attributes (homepage_url, auto_init, template, pages) deliberately stops Terraform from reacting to drift on those fields, keeping plans clean and focused on intentional changes.
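One way to wire drift detection into CI is to run a non-interactive plan per stack and branch on its exit code: with -detailed-exitcode, terraform plan exits 0 when there are no changes, 1 on error, and 2 when it would change something, which for unchanged code means the live organization has drifted. The sketch assumes the working directory has already been initialized for the right state key; the function names are hypothetical.

```python
"""Sketch: classify drift from `terraform plan -detailed-exitcode`."""
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map plan exit codes: 0 = no changes, 2 = changes pending, anything else = error."""
    if code == 0:
        return "clean"
    if code == 2:
        return "drift"
    return "error"


def check_drift(workdir: str) -> str:
    """Run a non-interactive plan in one initialized stack directory."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,  # exit code 2 is expected when drift exists, so don't raise
    )
    return classify_plan_exit(result.returncode)
```

Attributes covered by ignore_changes never show up in the plan, so this check only reports drift the configuration actually cares about.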

Custom properties need maintenance. As repository types evolve, the allowed_values list grows. This is a deliberate trade-off—storing classification metadata enables powerful queries but requires ongoing curation.
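A small audit can keep that curation honest by flagging repositories whose type is missing from allowed_values (which GitHub should reject, assuming repository-type is a single-select property) and allowed values that no repository uses anymore. The input shapes and function name are hypothetical.

```python
"""Sketch: audit repository types against a custom property's allowed_values."""


def audit_repository_types(
    allowed_values: list[str], repo_types: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Return (repos with a type outside allowed_values, allowed values nobody uses)."""
    allowed = set(allowed_values)
    unknown = {repo: rtype for repo, rtype in sorted(repo_types.items()) if rtype not in allowed}
    unused = sorted(allowed - set(repo_types.values()))
    return unknown, unused
```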

Where This Could Go

Future enhancements that align with the current design:

  • Repository invitations: Extending beyond members to manage outside collaborators
  • Security advisories: Automated vulnerability reporting integration
  • Dependabot configuration: Standardized dependency update rules across repositories

Common Misconceptions

The approach trips up people expecting a simple repo creator:

  1. “GitHub Terraform is just for repos” — The provider covers the entire organization: teams, custom properties, secrets, variables, rulesets, and settings. Treating it as a repo-only tool misses most of the value.

  2. “Single Terraform run scales” — At 40+ repositories plus organization resources, one run becomes slow and risky. Split-state targeting is what makes this workable at scale.

  3. “Terraform reverts manual changes on its own.” Drift accumulates silently between runs. A team member installs an integration, turns on discussions, or changes branch protection, and none of it appears in your code. Without drift detection, the live organization diverges from what your code declares.

Applicability

This architecture makes sense when:

  • The organization has 20+ repositories requiring consistent management
  • Team collaboration requires audit trails for permission changes
  • Secret and variable management needs organization-wide control
  • Enforcing naming conventions or classification matters

It doesn’t make sense for:

  • A handful of personal repositories
  • One-off experiments where speed matters more than consistency
  • Scenarios where GitHub’s built-in organization settings are sufficient

This design treats GitHub as infrastructure—every change flows through version control, code review, and drift detection. The split-mode architecture emerged from operational needs at scale and continues to evolve as the organization grows.