Your container works. It builds, it runs, you can hit localhost:3000 and the app loads. So you tag it, push it to a registry, point your host at it, and call it production. It even works for a while. Then it falls over at 2am, doesn’t come back on its own, the logs are nowhere to be found, and the image you shipped turns out to contain your database password and 900MB of stuff nobody needed.

None of that is a Docker bug. It’s the gap between the container that’s fine on your laptop and the container you actually ship. If you’re still fuzzy on what an image or a container even is, go read Docker without the buzzwords first, then come back. This guide assumes you know docker build and docker run and want to know why the version of those you’ve been using is not the version that should be touching real users.

Let’s walk through what changes between dev and prod, one piece at a time, with the reason for each.

Why your dev image is not your prod image

A dev image is built for you, sitting at your keyboard, wanting fast rebuilds and a working terminal. A production image is built for a server you’ll never log into, wanting to be small, boring, and hard to break into. Those are different goals, and chasing both with one image is how you end up shipping a 900MB container that runs a file-watching dev server as the root user.

Here’s the same Node and Express app, the naive way and the production way, lined up so you can see what actually moves.

| Axis | Naive dev image | Production image |
| --- | --- | --- |
| Base | node:20 (full Debian, ~1GB) | node:20-alpine (~50MB base) |
| Final size | 900MB+ | 150MB or less |
| Contents | all deps, build tools, source | built output + prod deps only |
| User | root | a non-root user you created |
| Command | npm run dev | node dist/server.js |
| Secrets | .env copied in or ENV baked | injected at runtime by the host |
| Logs | to a file inside the container | to stdout and stderr |

Every row in that table is a decision, not a default. Docker’s defaults are tuned for “make it run,” not “make it safe to leave running.” Let’s go row by row.

Smaller images, and what you give up

The official node:20 image is full Debian with a complete build toolchain. Handy when you’re poking around, but at runtime your app does not need a C compiler, git, or the headers for libraries it already finished linking against. All of that is weight you upload, store, and pull on every deploy.

Two easy wins:

  • Start from a slim or alpine base. Alpine is a tiny Linux distribution (about 5MB), so node:20-alpine is a fraction of the full image.
  • Don’t ship your dev dependencies. Things like your test runner, ESLint, and TypeScript matter while building and mean nothing once the app is built and running.

The tradeoff with Alpine is real, so I’ll say it plainly. Alpine uses musl instead of glibc (two different implementations of the standard C library that programs link against). Most Node packages don’t care, but a few native modules ship prebuilt binaries for glibc only and will either fall back to a slow compile or break outright. If you hit a weird native-module error on Alpine, that’s usually why, and node:20-slim (Debian, still much smaller than the full image, still glibc) is the safe middle ground. Start with Alpine, drop to slim if something native fights you.

Multi-stage builds: build big, ship small

Here’s the tension. To build a Node app you need everything: all dependencies, the source, the compiler. To run it you need almost none of that, just the built output and your production dependencies. A single-stage Dockerfile can’t have it both ways, so it keeps all the build junk in the final image forever.

A multi-stage build fixes this by using more than one FROM in the same Dockerfile. The first stage (call it the builder) installs everything and produces your build. The second stage starts fresh from a clean base and copies only the finished artifacts out of the builder. The builder, with all its bulk, gets thrown away. Only the lean final stage ships.

Here’s a production-minded Dockerfile for a Node and Express app that compiles TypeScript to a dist/ folder.

# ---- Stage 1: builder ----
# Full toolchain lives here. This whole stage gets discarded.
FROM node:20-alpine AS builder
WORKDIR /app

# Copy lockfiles first so this layer caches when only source changes.
COPY package*.json ./

# Install ALL dependencies, including devDependencies, to build the app.
RUN npm ci

# Copy source and build (e.g. tsc compiling src/ into dist/).
COPY . .
RUN npm run build

# Drop devDependencies now that the build is done, leaving only
# what runtime actually needs in node_modules.
RUN npm prune --omit=dev

# ---- Stage 2: runtime ----
# A clean, separate image. None of the builder's clutter comes along.
FROM node:20-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production

# Create a non-root user and group to run the app (more on this below).
RUN addgroup -S app && adduser -S app -G app

# Copy ONLY the built output and pruned production deps from the builder.
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./package.json

# Stop being root before the app ever runs.
USER app

# Tell the orchestrator the app reports its own health (see below).
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s \
  CMD node -e "fetch('http://localhost:3000/healthz').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"

EXPOSE 3000

# Run the built app. Not the dev server. Never the dev server.
CMD ["node", "dist/server.js"]

The key line is COPY --from=builder. That’s the seam between the two stages. Everything the builder downloaded and compiled stays in the builder and never lands in the image you ship. Your final image is the runtime base plus your dist/ folder plus production node_modules, and nothing else.

A quick note on caching, because it carries straight over from the basics guide: copy package*.json and run npm ci before you copy your source. Your dependencies change far less often than your code, so Docker can reuse that cached install layer on most rebuilds instead of redownloading everything every time you fix a typo.

Stop running as root

By default, the process inside your container runs as root, the superuser that can do anything on the system. That feels harmless because it’s “just a container,” but a container shares the host’s kernel, and root inside the container is a much shorter hop to trouble on the host than an unprivileged user would be. If someone finds a hole in your app, you’d rather they land as a user who can barely do anything than as root.

The fix is two lines, both already in the Dockerfile above. Create a user, then switch to it with USER before your CMD.

# Alpine's syntax: -S makes a system user/group with no password or login.
RUN addgroup -S app && adduser -S app -G app
USER app

From that point on, the app runs as app, not root. One thing that bites beginners: do this switch after you’ve finished installing things, because a non-root user can’t write to system directories or run apk add. Build and install as root, then drop privileges right before the app starts. The order in the Dockerfile is the order it happens.
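If you want a belt-and-braces check that the USER switch actually took effect, the app can look at its own user ID at startup. Here’s a minimal sketch in TypeScript. The helper names (isRunningAsRoot, warnIfRoot) are made up for illustration, not anything Node or Docker provides, and whether you warn or refuse to start is your call.

```typescript
// Returns true when the process is running as UID 0 (root).
// `getuid` is injectable so the logic can be exercised without being root.
// process.getuid does not exist on Windows, hence the optional call.
function isRunningAsRoot(
  getuid: () => number | undefined = () => process.getuid?.(),
): boolean {
  return getuid() === 0;
}

// Hypothetical startup guard: warns on stderr and reports what it found.
function warnIfRoot(getuid?: () => number | undefined): boolean {
  const root = isRunningAsRoot(getuid);
  if (root) {
    console.warn("Running as root inside the container; add a USER line to the Dockerfile.");
  }
  return root;
}

// Usage at the top of your server entry point:
//   warnIfRoot();
```

With the Dockerfile above this should stay silent, because the process starts as app rather than root.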

Health checks: “running” is not “working”

Your process can be alive and your app still broken. The Node process is up, but it’s stuck in a loop, or it lost its database connection and every request returns a 500. As far as a naive supervisor is concerned, the process is running, so all is well. It is not well.

A health check is a small command the orchestrator runs on a schedule to ask the app “are you actually okay?” rather than just checking that the process exists. You expose a route like /healthz that returns 200 when the app can really serve traffic, and the platform pings it. If it fails enough times, the platform knows to restart or stop sending traffic.

HEALTHCHECK --interval=30s --timeout=3s --start-period=10s \
  CMD node -e "fetch('http://localhost:3000/healthz').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"

That checks every 30 seconds, fails a check that takes longer than 3 seconds, and gives the app a 10-second grace period on startup before counting failures. Exit 0 means healthy, anything else means sick. Hosted platforms like Railway and Render usually let you configure a health check path in their dashboard instead, which does the same job from the outside. Either way, give them a real endpoint to hit.
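What a “real endpoint” looks like depends on your app, but here’s one rough sketch. It uses Node’s built-in http module instead of Express so it runs without installing anything. The isReady callback is a placeholder for whatever check proves your app can serve traffic (a cheap database query, say), and the JSON body shape is an assumption, not a standard.

```typescript
import { createServer, Server } from "node:http";

// Placeholder type for a readiness probe you supply yourself.
type ReadinessCheck = () => Promise<boolean>;

// Map "ready or not" to an HTTP status and body: 200 when healthy, 503 when not.
function healthPayload(ready: boolean): { status: number; body: string } {
  return ready
    ? { status: 200, body: JSON.stringify({ status: "ok" }) }
    : { status: 503, body: JSON.stringify({ status: "unavailable" }) };
}

// Build a server that answers /healthz and returns 404 for everything else.
function createHealthServer(isReady: ReadinessCheck): Server {
  return createServer(async (req, res) => {
    if (req.url !== "/healthz") {
      res.writeHead(404).end();
      return;
    }
    // A rejected check counts as "not ready" instead of crashing the handler.
    const ready = await isReady().catch(() => false);
    const { status, body } = healthPayload(ready);
    res.writeHead(status, { "Content-Type": "application/json" }).end(body);
  });
}

// Usage (the check here is a stand-in that always passes):
//   createHealthServer(async () => true).listen(3000);
```

The point of the 503 branch is that a health check which always returns 200 only proves the process is alive, which is the thing you were trying to get past.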

Restart policies: come back after a crash

Things crash. A memory spike, an unhandled exception, the host rebooting for maintenance. On your laptop you just run the container again. In production nobody is watching at 2am, so the platform has to bring it back for you.

In Docker Compose that’s one line:

services:
  app:
    build: .
    restart: unless-stopped
    ports:
      - "3000:3000"

restart: unless-stopped means “if this container dies, start it again, but if I deliberately stop it, stay stopped.” That last part matters, because the cruder restart: always will relaunch a container you stopped on purpose the next time the Docker daemon restarts, for example after a host reboot. Managed hosts handle restarts automatically, so you don’t write this yourself, which is one of the reasons people reach for them.

Logs go to stdout, not into the container

Reflex says write logs to a file, app.log or similar. Inside a container that’s close to useless. The container is disposable, so when it’s replaced the file goes with it, and your platform’s logging tools won’t see a file buried in a container’s private filesystem anyway.

The container convention is to write logs to stdout and stderr (standard output and standard error, the two text streams every program gets for free). The platform captures those streams and routes them to wherever you read logs. console.log and console.error in Node already go there, so most of the time the move is to stop redirecting logs into a file and let them flow to the terminal. Then docker logs, Railway’s log view, Render’s log view, and friends all just work.
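If plain console.log feels too loose, a tiny logger that writes one line per event to the standard streams is enough. This is a sketch, not a recommendation of a format: the JSON-lines shape and the field names (time, level, message) are my assumptions, and a real logging library would do the same job.

```typescript
type Level = "info" | "warn" | "error";

// Build one JSON line per log event. `now` is injectable to keep output predictable.
function formatLogLine(
  level: Level,
  message: string,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({ time: now.toISOString(), level, message, ...fields });
}

// Errors go to stderr, everything else to stdout. No files involved.
function log(level: Level, message: string, fields: Record<string, unknown> = {}): void {
  const stream = level === "error" ? process.stderr : process.stdout;
  stream.write(formatLogLine(level, message, fields) + "\n");
}

// Usage:
//   log("info", "server listening", { port: 3000 });
//   log("error", "database connection lost");
```

Because nothing here touches the filesystem, the platform’s log capture sees every line as it is written.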

Secrets: an image is shippable, so a baked-in secret is a leaked secret

This is the one that ends up on the internet, so read it twice. An image is built to be copied, pushed to registries, and pulled by machines you don’t control. Anyone who can pull your image can also unpack its layers and read what’s inside. So a secret baked into the image isn’t being hidden. It’s being handed out to everyone who pulls the image.

That rules out a few tempting shortcuts:

  • Don’t COPY your .env file into the image.
  • Don’t hardcode keys with ENV DATABASE_URL=... in the Dockerfile.
  • Don’t commit secrets to the repo your image builds from.

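Anything in a Dockerfile layer can be read back, even if a later instruction appears to delete it. docker history shows ENV values and the commands that built each layer, and unpacking the image (with docker save, for instance) exposes the files in every layer. The layer is still there underneath.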

Instead, inject secrets at runtime through the platform’s secret store. Railway, Render, and Cloudflare all have a place to set environment variables and secrets that get handed to the container when it starts and are never part of the image. Locally with Compose you point at an env file that lives only on your machine and stays out of git.

services:
  app:
    build: .
    env_file: .env        # local only; add .env to .gitignore
    environment:
      - NODE_ENV=production
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}   # read from your shell or .env

The mental model: build-time is public, runtime is private. Configuration that’s safe to ship can live in the image. Anything that grants access (database URLs, API keys, tokens) gets handed in when the container starts, by something that isn’t the image.
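On the app side, “handed in at runtime” usually means reading environment variables at startup and failing loudly if one is missing. Here’s a small sketch of that. requireEnv is a hypothetical helper, and DATABASE_URL is just the example name used earlier in this guide.

```typescript
// Read a required variable from the environment, or stop with a clear error.
// `env` defaults to process.env but can be swapped out for testing.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    // Name the missing variable, never print a secret's value.
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage at startup, before the server begins accepting traffic:
//   const databaseUrl = requireEnv("DATABASE_URL");
```

Failing at startup is deliberate. A container that refuses to boot without its secrets is easier to spot than one that boots and throws on the first request.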

Scan the image before you trust it

Your image is a small Linux system plus your dependency tree, and both pick up known vulnerabilities over time. A package that was clean when you wrote the Dockerfile can have a reported flaw six months later. You don’t have to audit this by hand.

Two tools point at an image and list known problems by severity:

# Docker's built-in scanner.
docker scout cves my-app:latest

# Trivy, a popular open-source scanner.
trivy image my-app:latest

Read the report top down. Critical and high findings in something you actually use at runtime are worth acting on, usually by bumping the offending package or rebasing onto a newer base image. A pile of low-severity notes in a transitive dependency you never call is good to know about but rarely an emergency. Scanning won’t make the image safe, it just tells you what you’re shipping so “I had no idea” stops being the reason something got popped. Trivy drops cleanly into a GitHub Actions workflow if you want it to run on every build.

Why npm run dev is not a production command

This deserves its own section because it’s the single most common mistake, and it’s an easy one to make: the command that’s worked for you since day one is right there, so people ship it.

npm run dev starts the dev server, and the dev server is built to make your life pleasant while you write code, not to serve strangers at scale. Concretely:

  • It’s unoptimized. No production build, no minification, no tree-shaking. It serves source, not the lean compiled output.
  • It watches your files for changes and rebuilds on save. In production nothing is changing, so that’s pure wasted CPU and memory sitting there waiting.
  • It’s often single-process and tuned for one developer’s traffic, not many concurrent users.
  • It can leak debug behavior: verbose errors, stack traces, source maps, and dev-only routes that you do not want pointed at the public internet.
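To make that last bullet concrete, here’s a sketch of the kind of switch that keeps stack traces out of production responses. toErrorBody is a hypothetical helper and the response shape is an assumption. It keys off NODE_ENV, which the Dockerfile above sets to production.

```typescript
// Decide what an error response may reveal, based on the environment.
// In production the client gets a generic message and no stack trace.
function toErrorBody(
  err: Error,
  nodeEnv: string | undefined = process.env.NODE_ENV,
): { error: string; stack?: string } {
  if (nodeEnv === "production") {
    return { error: "Internal Server Error" };
  }
  // Outside production, the details are useful while you debug.
  return { error: err.message, stack: err.stack };
}

// Usage inside an error handler (write the full error to your logs either way):
//   res.statusCode = 500;
//   res.end(JSON.stringify(toErrorBody(err)));
```

Express’s default error handler already omits the stack trace when NODE_ENV is production, so this mostly matters once you write a handler of your own.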

Production runs the built app. For a plain Node and Express service that’s node dist/server.js. For something with a framework, it’s the framework’s production start, like npm start pointed at the build output, or node ./dist/server/entry.mjs for an Astro app with a server adapter. The shape is always the same: build once, then run the result. Don’t make production rebuild your app on the fly every time it starts.

Common mistakes

The same handful of slip-ups account for most “it worked in dev” production fires.

| Mistake | What goes wrong | The fix |
| --- | --- | --- |
| Shipping the dev image | Bloated, runs the dev server, leaks debug behavior | Multi-stage build, run the built app |
| Running as root | A compromise lands with admin privileges | Create a user, USER app before CMD |
| Secrets baked into the image | The key ships inside the image for anyone to read | Inject at runtime via the platform’s secret store |
| npm run dev in prod | Unoptimized, file-watching, single-process | node dist/server.js or the framework’s start |
| No restart policy | A 2am crash stays down until you notice | restart: unless-stopped or a managed host |
| Giant images | Slow pulls, slow deploys, more to scan | Alpine or slim base, drop dev dependencies |

Tying it back to shipping something real

Picture the project you’d actually deploy: an Express API talking to PostgreSQL and Redis, maybe an Astro front end. The dev setup from Docker without the buzzwords, one Compose file that brings up the app, the database, and the cache, is still exactly how you should work locally. Nothing here replaces that. Local dev wants convenience, and that guide gives it to you.

Production is the same app with the goals flipped. Build it in two stages so the image is small and clean. Run it as a non-root user. Give it a health check so the platform knows the difference between running and working, and a restart policy so a crash comes back. Let logs flow to stdout. Hand secrets in at runtime and keep them out of the image. Scan before you trust. Run the built app, never the dev server.

Push that image to a container registry (GitHub Container Registry, for example), point Railway or Render at it (or build straight from your repo and let them handle the restart policy and secrets for you), and you have a container that’s safe to leave running while you sleep. That’s the whole difference between a container that runs and a container you can ship.