Deploying Next.js to Production

What Actually Matters

15 min read · DevOps

Production is not a larger localhost. It is a different beast. Your code runs on unknown hardware, at scale, with real people depending on it. Treat it differently.

Pre-deployment Checklist

Before moving anything to production, ensure these are done:

  • Environment validation: Your app should crash on startup if env vars are missing or invalid.
  • Database migrations: Automate them. Never run migrations manually.
  • Error tracking: Sentry, Rollbar, or similar. Silent errors kill businesses.
  • Health checks: Implement /health endpoints that verify database connectivity.
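
The last item can be sketched as a small dependency-aware helper. This is a minimal sketch under assumed names (`checkHealth` and `DependencyCheck` are illustrative, not from any framework); each dependency supplies a `ping` that throws when the dependency is down:

```typescript
// Hypothetical dependency-aware health check helper (names are illustrative).
type DependencyCheck = {
  name: string;
  ping: () => Promise<void>; // should reject/throw when the dependency is down
};

export async function checkHealth(checks: DependencyCheck[]) {
  const results = await Promise.all(
    checks.map(async ({ name, ping }) => {
      try {
        await ping();
        return { name, ok: true };
      } catch {
        return { name, ok: false };
      }
    })
  );
  // Healthy only if every dependency answered
  return { healthy: results.every((r) => r.ok), results };
}
```

In a Next.js app this could back an app/health/route.ts handler, with `ping` implementations such as `pool.query("SELECT 1")` for Postgres, returning a 503 when `healthy` is false so the load balancer stops routing to the instance.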

Environment Configuration

Validate all environment variables at startup. Fail loudly if anything is missing:

// next.config.ts
import type { NextConfig } from "next";
import { z } from "zod";

// Parsing throws at startup if any variable is missing or malformed,
// so a misconfigured deploy fails before it serves traffic
const envSchema = z.object({
  DATABASE_URL: z.string().url(),
  REDIS_URL: z.string().url(),
  NODE_ENV: z.enum(["development", "production", "test"]),
  NEXT_PUBLIC_API_URL: z.string().url(),
});

envSchema.parse(process.env);

const nextConfig: NextConfig = {
  // ... rest of config
};

export default nextConfig;

Connection Pooling

Never create a new database connection per request. Connection pools manage reuse efficiently:

// src/lib/db.ts
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20, // Max connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

pool.on("error", (error) => {
  console.error("Unexpected error on idle client", error);
  process.exit(-1);
});

export async function query(text: string, params?: unknown[]) {
  const start = Date.now();
  try {
    const result = await pool.query(text, params);
    const duration = Date.now() - start;
    
    if (duration > 1000) {
      console.warn(
        `Slow query (${duration}ms): ${text}`
      );
    }
    
    return result.rows;
  } catch (error) {
    console.error("Database error:", error);
    throw error;
  }
}

Graceful Shutdown

When SIGTERM arrives, finish in-flight requests before exiting:

// main.ts
process.on("SIGTERM", async () => {
  console.log("SIGTERM received. Shutting down gracefully...");

  // Stop accepting new connections and wait for in-flight
  // requests to finish before touching shared resources
  await new Promise<void>((resolve, reject) => {
    server.close((err) => (err ? reject(err) : resolve()));
  });
  console.log("HTTP server closed");

  // Only now is it safe to close database connections
  await pool.end();
  console.log("Database pool closed");

  // Close any other resources
  await redis.quit();
  console.log("Redis connection closed");

  // Exit
  process.exit(0);
});

Monitoring in Production

You cannot fix what you cannot see. Implement monitoring from day one:

// lib/monitor.ts
import * as Sentry from "@sentry/nextjs";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  // Sample 10% of transactions; the Next.js SDK wires up HTTP
  // instrumentation automatically, so no manual integration is needed
  tracesSampleRate: 0.1,
});

export function captureException(error: unknown) {
  Sentry.captureException(error);
}

// In your API routes, e.g. app/api/data/route.ts
import { NextRequest, NextResponse } from "next/server";
import { captureException } from "@/lib/monitor";

export async function GET(request: NextRequest) {
  try {
    const data = await fetchData();
    return NextResponse.json(data);
  } catch (error) {
    captureException(error);
    return NextResponse.json(
      { error: "Internal server error" },
      { status: 500 }
    );
  }
}

Why This Topic Matters in Production

Operational quality is decided before launch. Teams that treat observability, rollback strategy, and deployment discipline as post-launch work end up spending their delivery time on avoidable incidents, noisy alerts, and unclear release decisions.

Production outcomes are largely determined by release discipline: startup validation, rollout strategy, dependency-aware health checks, and meaningful observability. Without those controls, deployment velocity creates instability rather than business value.

Core Concepts

Release safety is built from deterministic pipelines, not manual heroics:

  • Treat deployment as a repeatable system, not a sequence of manual steps.
  • Validate configuration at startup so failure happens early and visibly.
  • Collect logs, metrics, and traces with consistent naming, clear ownership, and runbook links.
  • Define health checks that represent dependency readiness, not process liveness alone.
  • Decide rollback criteria at design time, not during an incident.

Real-World Mistakes

  • Shipping without rollback conditions or error-budget-based release gates.
  • Alerting on raw exception noise instead of user-impacting SLO conditions.
  • Allowing environment drift between local, CI, staging, and production.
  • Treating incident response as tribal knowledge with no runbooks.

What To Do Instead

  • Adopt progressive delivery: canary analysis with pre-defined rollback triggers.
  • Fail fast on invalid runtime configuration before accepting traffic.
  • Use immutable build artifacts with environment-specific configuration boundaries.
  • Automate pre-deploy checks for schema, env, and service readiness.
  • Propagate request correlation IDs across logs and traces for triage speed.
  • Fold post-incident actions back into deployment and monitoring policy.
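
The correlation-ID practice can be sketched with two small helpers (names here are illustrative assumptions, not a library API): reuse a well-formed caller-supplied ID so one request keeps one ID across services, otherwise mint a fresh one, and stamp it on every log line:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical correlation-ID helpers (names are illustrative).
const UUID_RE =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Accept an incoming ID only when it looks like a UUID; otherwise mint one
export function correlationId(incoming?: string | null): string {
  return incoming && UUID_RE.test(incoming) ? incoming : randomUUID();
}

// Prefix log lines with the ID so one grep finds a request's full trail
export function logWithId(id: string, message: string): string {
  return `[${id}] ${message}`;
}
```

In Next.js middleware you might read an incoming `x-request-id` header, pass it through `correlationId`, and set the result on both the request context and the response headers.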

Implementation Checklist

  • Define release and rollback criteria before every production change.
  • Instrument golden signals with owner-mapped alerts.
  • Verify dependency-aware health checks for critical paths.
  • Run periodic failure drills for rollback and alert handling.

Architecture Notes

Release pipelines should be treated as production systems with their own reliability posture and observability requirements.

Deployment maturity is reflected by decision latency during incidents: teams with predefined thresholds recover materially faster.

Operational consistency improves when CI/CD and runbooks use the same service ownership model.

Applied Example

Release Gate + Rollback Trigger

type ReleaseSignals = {
  errorRate: number;    // fraction of failed requests, e.g. 0.03 = 3%
  p95LatencyMs: number; // 95th-percentile request latency
  queueBacklog: number; // jobs waiting in the work queue
};

export function shouldRollback(signals: ReleaseSignals): boolean {
  // Each entry is true when its threshold is breached
  const breaches = [
    signals.errorRate > 0.03,
    signals.p95LatencyMs > 1200,
    signals.queueBacklog > 5000,
  ];

  // Require two simultaneous breaches so a single noisy
  // metric cannot trigger a rollback on its own
  return breaches.filter(Boolean).length >= 2;
}
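
A gate like `shouldRollback` only matters when something polls it during rollout. A minimal sketch of that loop, with assumed names (`monitorCanary` and `fetchSignals` are illustrative; the type is redeclared so the sketch is self-contained):

```typescript
type ReleaseSignals = {
  errorRate: number;
  p95LatencyMs: number;
  queueBacklog: number;
};

// Hypothetical canary monitor: poll live signals a fixed number of
// times and stop at the first sample that trips the rollback decision
export async function monitorCanary(
  fetchSignals: () => Promise<ReleaseSignals>,
  shouldRollback: (s: ReleaseSignals) => boolean,
  samples = 5
): Promise<"promote" | "rollback"> {
  for (let i = 0; i < samples; i++) {
    const signals = await fetchSignals();
    if (shouldRollback(signals)) return "rollback";
  }
  return "promote";
}
```

In practice `fetchSignals` would query your metrics backend, and a sleep between samples would spread the observation window; both are omitted here for brevity.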

Trade-offs

  • More release controls add pipeline complexity and process overhead, but prevent expensive outages.
  • Deeper telemetry has tooling cost, but pays back in diagnosis time during every incident.
  • Strict preflight checks can block releases early, which is cheaper than partial runtime failures.
  • Tighter alerting thresholds increase pager volume unless tuned to business impact.

Production Perspective

  • Reliability improves when deploys are gated by measurable dependency and user-flow health.
  • Security improves when secrets and configuration handling are centralized and validated at startup.
  • Performance regressions are easier to isolate when releases carry baseline markers.
  • Maintainability improves when operational ownership is explicit and incident learnings feed back into deployment policy.
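
The baseline-marker point can be as simple as tagging every event with the deployed release. A sketch assuming Sentry (the `release` option is part of its init config; `SENTRY_RELEASE` is an assumed variable your CI would set, e.g. to the git SHA):

```typescript
import * as Sentry from "@sentry/nextjs";

// Tag every event with the deployed release so a regression can be
// compared against the previous release's baseline in the dashboard.
// SENTRY_RELEASE is an assumed CI-provided variable, e.g. the git SHA.
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  release: process.env.SENTRY_RELEASE,
});
```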

Final Takeaway

DevOps maturity is the ability to change systems quickly without sacrificing confidence, auditability, or recovery speed.

DevOps is not tooling breadth. It is disciplined change management under uncertainty.

A fast team is one that can release and recover predictably, repeatedly, and safely.

Key Takeaways

  • Production requires explicit monitoring and alerting
  • Connection pooling is non-negotiable at scale
  • Graceful shutdown prevents data corruption
  • Slow query logging catches performance regressions early
  • Environment validation catches configuration errors before users see them

Future Improvements

  • Implement load testing to find bottlenecks
  • Create deployment checklist automation
  • Set up performance monitoring dashboards
  • Document runbooks for common incidents