Infrastructure as Code: Managing Your Stack

Terraform for Reliable Deployments

16 min read · Infrastructure

Infrastructure as code eliminates manual deployment steps. No more "click this button then that button." Your infrastructure is version controlled, reviewable, and reproducible.

Why Infrastructure as Code Matters

Manual infrastructure is:

  • Unreproducible: Disaster recovery requires remembering steps. People forget. Systems break.
  • Undocumented: Your infrastructure lives in someone's head. That person leaves. Knowledge walks out the door.
  • Unmaintainable: Changes are ad-hoc. No audit trail. No rollback.

Terraform Basics

Terraform describes your infrastructure as code. You define what you want. Terraform figures out what to create, update, or destroy:

# main.tf
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

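# Define your infrastructure
resource "aws_instance" "web" {
  # count makes aws_instance.web a list, which the instance_ids output relies on
  count         = var.instance_count
  ami           = "ami-0c55b159cbfafe1f0" # AMI IDs are region-specific
  instance_type = "t3.micro"

  # Attach the security group defined below
  vpc_security_group_ids = [aws_security_group.web.id]

  tags = {
    Name = "web-server-${count.index + 1}"
  }
}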

resource "aws_security_group" "web" {
  name = "web-security-group"
  
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

State Management

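Terraform records every resource it manages in a state file and uses that file to map your configuration to real infrastructure. Lose the state and Terraform loses track of what it owns. Store it remotely, encrypted and locked: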

# terraform.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

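# CRITICAL: keep state locking enabled.
# The dynamodb_table setting above turns it on; the table must have
# a string partition key named "LockID". Locking prevents concurrent
# runs from corrupting state.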
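
The backend above presupposes that the bucket and the lock table already exist. A minimal sketch of how they might be created in a separate bootstrap configuration follows. The resource labels (state, locks) are illustrative assumptions, and the bucket and table names simply mirror the backend block:

```hcl
# bootstrap/main.tf (hypothetical file, applied once before the backend is used)
resource "aws_s3_bucket" "state" {
  bucket = "my-terraform-state"
}

# Versioning keeps earlier copies of the state file for recovery
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# The S3 backend expects a string partition key named "LockID"
resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping the bootstrap resources in their own configuration avoids the circular problem of storing state in a bucket that the same state is responsible for creating.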

Variables and Outputs

Extract values into variables. Make your code reusable:

# variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "instance_count" {
  description = "Number of instances"
  type        = number
  default     = 2
  
  validation {
    condition     = var.instance_count > 0 && var.instance_count <= 10
    error_message = "Instance count must be between 1 and 10."
  }
}

# outputs.tf
output "instance_ids" {
  value       = aws_instance.web[*].id
  description = "IDs of the instances"
}

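# Assumes an aws_api_gateway_deployment named "api" is defined elsewhere
# in this configuration; it is not shown in this article.
output "api_endpoint" {
  value       = aws_api_gateway_deployment.api.invoke_url
  description = "API Gateway endpoint"
}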
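
The module example later in this article passes var.db_password, which the variables.tf above does not declare. One way it might be declared is sketched below; the 16-character minimum is an illustrative assumption, not a requirement stated by the article:

```hcl
# variables.tf (addition)
variable "db_password" {
  description = "Master password for the database"
  type        = string
  sensitive   = true # redacted in plan and apply output

  # No default on purpose: the value must be supplied at run time
  validation {
    condition     = length(var.db_password) >= 16
    error_message = "Database password must be at least 16 characters."
  }
}
```

The value can be supplied through the TF_VAR_db_password environment variable rather than a committed .tfvars file. Note that sensitive only hides the value in CLI output; it is still written to state, which is one more reason to encrypt and restrict access to the state bucket.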

Organizing with Modules

Modules allow you to package and reuse infrastructure components:

# modules/database/main.tf
resource "aws_db_instance" "default" {
  allocated_storage    = var.storage_size
  storage_type         = "gp2"
  engine               = "postgres"
  engine_version       = "14"
  instance_class       = var.instance_class
  db_name              = var.db_name
  username             = var.db_user
  password             = var.db_password
  parameter_group_name = "default.postgres14"
  skip_final_snapshot  = false
  final_snapshot_identifier = "${var.environment}-snapshot"
}

# main.tf - Using the module
module "database" {
  source = "./modules/database"
  
  environment     = "production"
  storage_size    = 100
  instance_class  = "db.t3.micro"
  db_name         = "myapp"
  db_user         = "myapp_admin" # "admin" is a reserved word for RDS PostgreSQL
  db_password     = var.db_password
}
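
The module above reads several var.* inputs, so it needs matching declarations inside the module directory. A minimal sketch of those files follows. The file names follow common Terraform convention, and the endpoint output is an illustrative addition rather than something the article defines:

```hcl
# modules/database/variables.tf
variable "environment" {
  description = "Deployment environment, used in the final snapshot name"
  type        = string
}

variable "storage_size" {
  description = "Allocated storage in GiB"
  type        = number
}

variable "instance_class" {
  description = "RDS instance class, for example db.t3.micro"
  type        = string
}

variable "db_name" {
  description = "Name of the initial database"
  type        = string
}

variable "db_user" {
  description = "Master username"
  type        = string
}

variable "db_password" {
  description = "Master password"
  type        = string
  sensitive   = true
}

# modules/database/outputs.tf
output "endpoint" {
  description = "Connection endpoint in address:port form"
  value       = aws_db_instance.default.endpoint
}
```

The calling configuration could then read the endpoint as module.database.endpoint instead of reaching into the module's resources.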

Common Pitfalls

  • Hardcoding values: Use variables. Don't hardcode passwords or IPs.
  • Ignoring state: Treat state like your database. Back it up. Lock it.
  • No destruction testing: Test your terraform destroy. You'll need it in disaster recovery.
  • Skipping version control: All Terraform goes in git. Every change is reviewed.
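
As one illustration of the first pitfall, the literal AMI ID used earlier in this article could be replaced with a lookup. The sketch below assumes Amazon Linux 2023 is the intended image and that its AMI names match the pattern shown; both are assumptions, not details from the article:

```hcl
# Look up the AMI instead of hardcoding a region-specific ID
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"] # assumed naming pattern
  }
}

# Expose the resolved ID so plan output shows which image was selected.
# In aws_instance.web the literal would become:
#   ami = data.aws_ami.amazon_linux.id
output "web_ami_id" {
  value = data.aws_ami.amazon_linux.id
}
```

There is a trade-off: most_recent = true means a newly published image can change the plan and force instance replacement. Teams that want fully deterministic plans may prefer to pass a reviewed AMI ID in through a variable instead.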

Recommended Workflow

# 1. Plan changes (always review first)
terraform plan -out=tfplan

# 2. Review the plan carefully (the plan file is binary, so use terraform show)
terraform show tfplan

# 3. Apply only if it looks right
terraform apply tfplan

# 4. Commit the configuration to git (not the tfplan file)
git add main.tf terraform.tf variables.tf outputs.tf modules/
git commit -m "Infrastructure: Add RDS database"

Why This Topic Matters in Production

Infrastructure choices define operational behavior long after features ship. Small shortcuts in build determinism, image hardening, or runtime assumptions often become recurring incident patterns once traffic and deployment frequency increase.

Production-grade infrastructure is not just provisioning. It is the discipline of reproducible builds, secure defaults, dependency-aware health checks, and controlled rollout behavior across environments.

Core Concepts

Deterministic builds reduce environment drift and improve rollback confidence.

Runtime assumptions should be explicit: ports, env vars, health semantics, and resource limits.

Image minimization and least privilege reduce attack surface and startup variance.

State and migration workflows require reversible, audited execution paths.

  • Prefer deterministic, versioned infrastructure definitions over manual operations.
  • Treat runtime configuration and secrets as controlled system inputs.
  • Build with immutable artifacts and explicit runtime assumptions.
  • Define health and readiness semantics as deployment gates.

Real-World Mistakes

  • Unpinned tooling versions and mutable runtime dependencies or environments.
  • Missing health checks, or checks that do not validate critical dependencies.
  • Shipping containers as root with broad filesystem permissions, treating images as build outputs that need no security hardening.
  • Skipping disaster-recovery and restoration drills for stateful systems and the infrastructure changes that affect them.

The corresponding practices:

  • Use multi-stage builds, minimal runtime images, and non-root, least-privilege runtime users.
  • Validate runtime configuration before process start and fail fast on invalid critical settings.
  • Implement deploy smoke checks and post-release verification for critical routes.
  • Keep infrastructure changes in version control with plan/apply review discipline.

Implementation Checklist

  • Pin build/runtime dependencies and validate reproducibility.
  • Enforce non-root runtime and minimal image footprint.
  • Define dependency-aware health checks and deploy gates.
  • Test backup and restore paths regularly.

Architecture Notes

Infrastructure drift is often a governance issue before it becomes an outage issue.

Deterministic builds reduce rollback ambiguity and simplify incident diagnosis.

Security and reliability both improve when runtime assumptions are explicit and validated.

Applied Example

Container Readiness Contract

type Health = { app: boolean; db: boolean; queue: boolean };

export function readinessStatus(health: Health) {
  const ready = health.app && health.db && health.queue;
  return {
    ready,
    status: ready ? "ready" : "degraded",
    details: health,
  };
}
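
On the infrastructure side, a readiness contract like this is typically consumed by a load balancer health check. The sketch below assumes the application serves the readiness result at /ready and returns HTTP 200 only when ready is true. The path, the vpc_id variable, and the resource names are hypothetical:

```hcl
variable "vpc_id" {
  description = "VPC that hosts the web instances (hypothetical input)"
  type        = string
}

resource "aws_lb_target_group" "web" {
  name     = "web-targets"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  # Traffic is routed only to targets whose readiness endpoint returns 200
  health_check {
    path                = "/ready"
    matcher             = "200"
    interval            = 15 # seconds between checks
    timeout             = 5  # must be shorter than the interval
    healthy_threshold   = 2
    unhealthy_threshold = 3
  }
}
```

Because the check depends on the database and queue as well as the app, a dependency outage takes targets out of rotation. That is the intended gate, but it also means the thresholds deserve tuning so that brief dependency blips do not drain the whole pool.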

Trade-offs

  • Hardening and deterministic builds add setup cost but reduce incident frequency and runtime risk.
  • Strict startup checks can fail releases early, which is preferable to hidden partial boot failures.
  • Operational controls can slow release pace slightly while improving confidence and reliability.

Production Perspective

  • Reliability improves when every deploy has explicit readiness and rollback criteria.
  • Security improves with smaller images, least-privilege non-root runtime, and secret hygiene.
  • Performance stability depends on proper resource limits and health-driven orchestration.
  • Maintainability improves when infrastructure and runtime behavior is observable, testable, and documented.

Final Takeaway

Infrastructure quality is the discipline of making runtime behavior predictable, secure, and recoverable under change.

Infrastructure is part of product quality.

Predictable runtime behavior is the baseline for safe delivery at scale.

Key Takeaways

  • Infrastructure as code prevents configuration drift
  • State is critical: secure it, back it up, lock it
  • Modules make infrastructure reusable and maintainable
  • Always test terraform destroy before you need it
  • Code review for infrastructure is as important as code review for application code

Future Improvements

  • Implement Terraform testing with Terratest
  • Create policy-as-code with OPA/Sentinel
  • Set up infrastructure change notifications
  • Document disaster recovery procedures