
Torah Study AI

in progress

Production RAG pipeline on 3.5M sacred texts. Hybrid search, Cohere reranking, strict anti-hallucination guardrails. Built with FastAPI, Weaviate, and Gemini.

Python · FastAPI · Weaviate · Gemini 2.5 Flash · Cohere Rerank · Next.js · shadcn/ui · Docker

Step 7: Docker - How I Build

Containerizing FastAPI and Next.js with Docker Compose, layer caching, and .dockerignore best practices.

5 min read

The Goal

Make the entire app runnable with a single command: docker compose up. Anyone who clones the repo can start the full stack without installing Python, Node, or any dependencies locally.

FastAPI Dockerfile

FROM python:3.14-slim

WORKDIR /app

# Copy requirements first for layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy source code
COPY . .

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Next.js Dockerfile

FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

FROM node:22-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production

# Copy only what is needed to run
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public

EXPOSE 3000

CMD ["node", "server.js"]

The Next.js Dockerfile uses a multi-stage build. Three stages:

  1. deps: Install node_modules. This layer is cached unless package.json changes.
  2. builder: Build the Next.js app. Source code changes invalidate this layer.
  3. runner: The final image. Only contains the built output, not source code or dev dependencies.

Result: the production image is ~150MB instead of ~1.2GB.
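The runner stage copies .next/standalone, which Next.js only emits when standalone output is enabled. A minimal sketch of what that config flag looks like (the repo's actual next.config.js may contain more than this):

```javascript
// next.config.js — 'standalone' makes `next build` emit
// .next/standalone/server.js, the file the runner stage's CMD executes
/** @type {import('next').NextConfig} */
const nextConfig = {
  output: 'standalone',
};

module.exports = nextConfig;
```

Without this flag, the build produces no .next/standalone directory and the COPY --from=builder lines in the runner stage fail.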

Docker Compose

services:
  api:
    build: ./api
    ports:
      - "8000:8000"
    environment:
      - GOOGLE_API_KEY=${GOOGLE_API_KEY}
      - JWT_SECRET=${JWT_SECRET}
      - DATABASE_PATH=/data/torah_study.db
    volumes:
      - db_data:/data
    restart: unless-stopped

  frontend:
    build: ./frontend
    ports:
      - "3000:3000"
    environment:
      - NEXT_PUBLIC_API_URL=http://api:8000
    depends_on:
      - api
    restart: unless-stopped

volumes:
  db_data:

Key decisions:

  • No Weaviate in compose. Weaviate is already running on Elestio as a managed service. Adding it to compose would mean managing data persistence, backups, and memory allocation locally. Not worth it for development.
  • Named volume for SQLite. The db_data volume ensures the database survives container restarts. Without it, every docker compose down would delete all conversations.
  • Environment variables from .env. Docker Compose reads from .env file automatically. Secrets never go in the Dockerfile or docker-compose.yml.
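One caveat about depends_on as written: it only waits for the api container to start, not for FastAPI to actually accept requests. A healthcheck closes that gap — a sketch, assuming a /health endpoint on the API (hypothetical; point it at a real route), using Python's stdlib so the slim image needs no curl:

```yaml
services:
  api:
    build: ./api
    healthcheck:
      # /health is a hypothetical endpoint — adjust to whatever the API serves
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 10s
      timeout: 3s
      retries: 5

  frontend:
    build: ./frontend
    depends_on:
      api:
        condition: service_healthy
```

With condition: service_healthy, the frontend container is not started until the API reports healthy, rather than merely created.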

.dockerignore Importance

This file is more important than people think. Without it, Docker copies everything into the build context: caches, virtual environments, secrets, the entire .git history. These are the entries I exclude:

# .dockerignore for the API
__pycache__/
*.pyc
.venv/
.env
.git/
*.db
tests/
.pytest_cache/

# .dockerignore for the frontend
node_modules/
.next/
.env.local
.git/

Why it matters:

  • Build speed. Without .dockerignore, COPY . . sends the entire 500MB node_modules to the Docker daemon before the build even starts. With it, only source files are sent.
  • Security. Without .dockerignore, your .env file with API keys ends up in the image. Anyone who pulls the image can extract your secrets.
  • Cache invalidation. If node_modules is in the build context, any npm install locally changes the context checksum and invalidates the Docker cache for ALL subsequent layers.

Layer Caching Strategy

The order of instructions in a Dockerfile matters enormously for build speed.

# WRONG: Copy everything first, then install
COPY . .
RUN pip install -r requirements.txt
# Problem: ANY source file change re-installs all dependencies

# RIGHT: Copy requirements first, then source
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
# Benefit: Dependencies are re-installed ONLY when requirements.txt changes

The principle: put things that change rarely at the top, things that change often at the bottom. Dependencies change monthly. Source code changes hourly. If you put source code before dependency installation, you rebuild everything on every commit.

Same pattern for Next.js:

# Copy package files first
COPY package.json package-lock.json ./
RUN npm ci

# Then copy source (changes more often)
COPY . .
RUN npm run build

The .env File

# .env (not committed to git)
GOOGLE_API_KEY=your-gemini-api-key
JWT_SECRET=a-long-random-string-for-jwt-signing
WEAVIATE_URL=https://your-weaviate.elestio.app
WEAVIATE_API_KEY=your-weaviate-key

The .env file is in .gitignore. A .env.example with placeholder values is committed so anyone cloning the repo knows what variables they need.
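The committed companion file mirrors the real .env but carries no values — only the keys a new contributor must fill in:

```
# .env.example (committed to git — placeholders only, never real secrets)
GOOGLE_API_KEY=
JWT_SECRET=
WEAVIATE_URL=
WEAVIATE_API_KEY=
```

Copying it is the first setup step: cp .env.example .env, then fill in real values.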

Running the Stack

# Start everything
docker compose up --build

# Start in background
docker compose up -d --build

# View logs
docker compose logs -f api

# Stop and remove containers
docker compose down

# Stop and remove containers AND volumes (deletes database)
docker compose down -v

Testing Inside Docker

For CI/CD, I run tests inside the container to ensure the environment matches production:

# Run API tests
docker compose exec api pytest -v

# Run with coverage
docker compose exec api pytest --cov=. --cov-report=term-missing
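Wired into a pipeline, that looks roughly like the sketch below — assuming GitHub Actions, which the repo may or may not actually use. Note the -T flag: it disables TTY allocation, which exec requires in non-interactive runners.

```yaml
# .github/workflows/ci.yml (hypothetical sketch — secrets such as
# GOOGLE_API_KEY would need to be provided via repository secrets)
name: ci
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and start the stack
        run: docker compose up -d --build
      - name: Run API tests inside the container
        run: docker compose exec -T api pytest -v
      - name: Tear down
        if: always()
        run: docker compose down -v
```

Running the tests through exec, rather than on the runner directly, keeps the test environment byte-identical to the image that ships.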

Lessons Learned

  • .dockerignore is not optional. It is a security and performance requirement. Create it before your first build.
  • Multi-stage builds cut image size by 80%. The final image only needs runtime files, not build tools, source maps, or test fixtures.
  • Layer order determines build speed. Dependencies before source code. Always.
  • Named volumes for persistent data. Without them, docker compose down destroys your database.
  • Managed services stay outside compose. Weaviate today, and PostgreSQL and Redis in production, all live on Elestio. Compose is for the application layer only.

What's Next

Step 8 (deploying to Elestio) is skipped for now: the RAG pipeline needs to be complete before deploying. Step 9 loads the Sefaria datasets.