Step 1: First LLM Call - How I Build
Building the first Gemini API call with BDD/TDD, input validation, and a chavruta system prompt.
The Goal
Get a Python script that sends a Torah question to Gemini and returns a meaningful answer. Sounds simple, but this is where you establish the patterns that carry through the entire project.
BDD First: Define the Behavior
Before writing any code, I wrote BDD scenarios describing what the system should do. This is the Azerhad methodology: define expected behavior in plain English, then translate to tests.
Feature: Torah Question Answering

  Scenario: User asks a basic Torah question
    Given the API key is configured
    When I ask "What is the Shema?"
    Then I receive a non-empty answer about the Shema

  Scenario: Missing API key
    Given the API key is NOT configured
    When I ask any question
    Then I get a clear error message about the missing key

  Scenario: Empty question
    Given the API key is configured
    When I ask ""
    Then I get a validation error
The ask_torah() Function
The core function is dead simple. One function, one responsibility: take a question, return an answer.
import google.generativeai as genai
import os

SYSTEM_PROMPT = """You are a chavruta (Torah study partner).
You help beginners understand Jewish texts.
Always cite your sources with book and chapter.
If you are unsure, say so - never invent a source.
Add a disclaimer: your answers are for study only, not halakhic rulings."""

def ask_torah(question: str) -> str:
    """Send a Torah question to Gemini and return the answer."""
    api_key = os.getenv("GOOGLE_API_KEY")
    if not api_key:
        raise ValueError("GOOGLE_API_KEY environment variable is not set")
    if not question or not question.strip():
        raise ValueError("Question cannot be empty")

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(
        model_name="gemini-2.0-flash",
        system_instruction=SYSTEM_PROMPT,
    )
    response = model.generate_content(question)
    return response.text
Key decisions here:
- Input validation first. Check API key, check empty string. Fail fast with clear messages.
- System prompt as a constant. Not buried inside the function, not in a config file (yet). Visible, testable, changeable.
- The google-generativeai SDK, not LangChain. For Step 1, I want zero abstractions between me and the API. I need to understand exactly what goes over the wire.
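To see the fail-fast behavior in isolation, here is a condensed sketch of just the two guard clauses. The `validate` helper is illustrative only, not part of the actual module:

```python
import os

def validate(question: str) -> None:
    """Condensed copy of ask_torah()'s guard clauses (illustrative only)."""
    if not os.getenv("GOOGLE_API_KEY"):
        raise ValueError("GOOGLE_API_KEY environment variable is not set")
    if not question or not question.strip():
        raise ValueError("Question cannot be empty")

# With no key set, the first guard fires before any network call:
os.environ.pop("GOOGLE_API_KEY", None)
try:
    validate("What is the Shema?")
except ValueError as e:
    print(e)  # GOOGLE_API_KEY environment variable is not set
```

Because the guards run before `genai.configure()`, a misconfigured environment fails in microseconds with an actionable message instead of timing out against the API.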
The 6 Tests
TDD means write tests first, watch them fail (RED), then implement until they pass (GREEN).
import pytest
from unittest.mock import patch, MagicMock

from torah_chat import ask_torah

def test_missing_api_key():
    """Should raise ValueError when GOOGLE_API_KEY is not set."""
    with patch.dict("os.environ", {}, clear=True):
        with pytest.raises(ValueError, match="GOOGLE_API_KEY"):
            ask_torah("What is Shabbat?")

def test_empty_question():
    """Should raise ValueError for empty question."""
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        with pytest.raises(ValueError, match="empty"):
            ask_torah("")

def test_whitespace_question():
    """Should raise ValueError for whitespace-only question."""
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        with pytest.raises(ValueError, match="empty"):
            ask_torah(" ")

@patch("torah_chat.genai")
def test_basic_question(mock_genai):
    """Should return a non-empty response for a valid question."""
    mock_response = MagicMock()
    mock_response.text = "The Shema is the central prayer..."
    mock_genai.GenerativeModel.return_value.generate_content.return_value = mock_response
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        result = ask_torah("What is the Shema?")
    assert len(result) > 0
    assert "Shema" in result

@patch("torah_chat.genai")
def test_system_prompt_passed(mock_genai):
    """Should configure the model with chavruta system prompt."""
    mock_response = MagicMock()
    mock_response.text = "Answer"
    mock_genai.GenerativeModel.return_value.generate_content.return_value = mock_response
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        ask_torah("Test question")
    call_kwargs = mock_genai.GenerativeModel.call_args
    assert "chavruta" in call_kwargs.kwargs["system_instruction"].lower()

@patch("torah_chat.genai")
def test_french_question(mock_genai):
    """Should handle French questions without error."""
    mock_response = MagicMock()
    mock_response.text = "Le Shabbat est le jour de repos..."
    mock_genai.GenerativeModel.return_value.generate_content.return_value = mock_response
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        result = ask_torah("Qu'est-ce que le Shabbat?")
    assert len(result) > 0
Why google-generativeai First, LangChain Later
I see many tutorials that start with LangChain from day one. That is backwards. LangChain is an abstraction over LLM APIs. If you do not understand what the underlying API does, you cannot debug LangChain when it breaks.
My approach:
- Steps 1-3: Raw google-generativeai SDK. Learn the API surface; understand tokens, system prompts, and streaming.
- Step 12+: Refactor to LangChain LCEL when I need chains (search -> rerank -> generate). At that point, I know exactly what each link in the chain should do.
This is the same principle as learning React before Next.js, or SQL before an ORM. Abstractions are powerful only when you understand what they abstract.
Update: I completed this refactoring at Step 12. The pipeline now uses LangChain LCEL chains (prompt | llm | parser), with GoogleGenerativeAIEmbeddings for embeddings and ChatGoogleGenerativeAI for generation. The hybrid search and Cohere reranking stayed as raw SDK calls because LangChain does not fully support hybrid search or score-based threshold gating. Knowing the raw API first made it easy to identify what to wrap in LangChain and what to keep manual.
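The pipe-style composition behind LCEL can be illustrated without installing LangChain at all. The toy `Runnable` class below is my own stand-in for LangChain's Runnable protocol, with a fake prompt/model/parser, purely to show what `prompt | llm | parser` means mechanically:

```python
class Runnable:
    """Toy stand-in for LangChain's Runnable, just to show LCEL-style piping."""
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # a | b produces a new step: run a, feed its output into b
        return Runnable(lambda x: other.invoke(self.invoke(x)))

prompt = Runnable(lambda q: f"[chavruta] {q}")   # format the prompt
llm    = Runnable(lambda p: f"Answer to: {p}")   # fake model call
parser = Runnable(lambda r: r.strip())           # parse the raw output

chain = prompt | llm | parser
print(chain.invoke("What is the Shema?"))  # Answer to: [chavruta] What is the Shema?
```

Each step has one job and a uniform `invoke` interface, which is exactly why knowing the raw API first matters: you can tell which link in a real chain misbehaves.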
Running the Tests
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install google-generativeai pytest
# Run tests (all should pass)
pytest test_torah_chat.py -v
Output:
test_missing_api_key PASSED
test_empty_question PASSED
test_whitespace_question PASSED
test_basic_question PASSED
test_system_prompt_passed PASSED
test_french_question PASSED
6 passed in 0.12s
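The tests never touch the network, so they need no credentials. To try the function against the real API, the key has to be in the environment first. The key value and the one-liner below are placeholders, assuming the function lives in torah_chat.py:

```shell
# Placeholder value - use your real key from Google AI Studio
export GOOGLE_API_KEY="your-key-here"

# Then a real call would look like (uncomment once the key is valid):
# python -c "from torah_chat import ask_torah; print(ask_torah('What is the Shema?'))"
```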
Lessons Learned
- Start with validation, not the happy path. The first two tests are error cases. This forces you to handle errors before you even think about the main logic.
- Mock external APIs in tests. Never call the real Gemini API in unit tests. It costs money, it is slow, and it makes tests flaky.
- One file, one function, one responsibility. The entire Step 1 is a single ask_torah() function in a single file. No classes, no inheritance, no framework. Just a function that does one thing.
- System prompts are product decisions. The word "chavruta" is not random. It positions the AI as a study partner, not an authority. This shapes every answer the model gives.
What is Next
Step 2 adds a Next.js frontend with SSE streaming. The ask_torah() function becomes the core of a backend endpoint that streams tokens word by word to the browser.