Step 1: First LLM Call - How I Build
Building the first Gemini API call with BDD/TDD, input validation, and a chavruta system prompt.
The Goal
Get a Python script that sends a Torah question to Gemini and returns a meaningful answer. Sounds simple, but this is where you establish the patterns that carry through the entire project.
BDD First: Define the Behavior
Before writing any code, I wrote BDD scenarios describing what the system should do. This is the Azerhad methodology: define expected behavior in plain English, then translate to tests.
Feature: Torah Question Answering

  Scenario: User asks a basic Torah question
    Given the API key is configured
    When I ask "What is the Shema?"
    Then I receive a non-empty answer about the Shema

  Scenario: Missing API key
    Given the API key is NOT configured
    When I ask any question
    Then I get a clear error message about the missing key

  Scenario: Empty question
    Given the API key is configured
    When I ask ""
    Then I get a validation error
The ask_torah() Function
The core function is dead simple. One function, one responsibility: take a question, return an answer.
import google.generativeai as genai
import os

SYSTEM_PROMPT = """You are a chavruta (Torah study partner).
You help beginners understand Jewish texts.
Always cite your sources with book and chapter.
If you are unsure, say so - never invent a source.
Add a disclaimer: your answers are for study only, not halakhic rulings."""

def ask_torah(question: str) -> str:
    """Send a Torah question to Gemini and return the answer."""
    api_key = os.getenv("GOOGLE_API_KEY")
    if not api_key:
        raise ValueError("GOOGLE_API_KEY environment variable is not set")
    if not question or not question.strip():
        raise ValueError("Question cannot be empty")

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(
        model_name="gemini-2.0-flash",
        system_instruction=SYSTEM_PROMPT,
    )
    response = model.generate_content(question)
    return response.text
Key decisions here:
- Input validation first. Check API key, check empty string. Fail fast with clear messages.
- System prompt as a constant. Not buried inside the function, not in a config file (yet). Visible, testable, changeable.
- The google-generativeai SDK, not LangChain. For Step 1, I want zero abstractions between me and the API. I need to understand exactly what goes over the wire.
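To see the fail-fast behavior in isolation, here is a condensed sketch of just the two guard clauses. The `validate` helper is illustrative only, not part of the actual module:

```python
import os

def validate(question: str) -> None:
    """Condensed copy of ask_torah()'s guard clauses (illustrative only)."""
    if not os.getenv("GOOGLE_API_KEY"):
        raise ValueError("GOOGLE_API_KEY environment variable is not set")
    if not question or not question.strip():
        raise ValueError("Question cannot be empty")

# With no key set, the first guard fires before any network call:
os.environ.pop("GOOGLE_API_KEY", None)
try:
    validate("What is the Shema?")
except ValueError as e:
    print(e)  # GOOGLE_API_KEY environment variable is not set
```

Because the guards run before `genai.configure()`, a misconfigured environment fails in microseconds with an actionable message instead of timing out against the API.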
The 6 Tests
TDD means write tests first, watch them fail (RED), then implement until they pass (GREEN).
import pytest
from unittest.mock import patch, MagicMock

from torah_chat import ask_torah

def test_missing_api_key():
    """Should raise ValueError when GOOGLE_API_KEY is not set."""
    with patch.dict("os.environ", {}, clear=True):
        with pytest.raises(ValueError, match="GOOGLE_API_KEY"):
            ask_torah("What is Shabbat?")

def test_empty_question():
    """Should raise ValueError for empty question."""
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        with pytest.raises(ValueError, match="empty"):
            ask_torah("")

def test_whitespace_question():
    """Should raise ValueError for whitespace-only question."""
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        with pytest.raises(ValueError, match="empty"):
            ask_torah(" ")

@patch("torah_chat.genai")
def test_basic_question(mock_genai):
    """Should return a non-empty response for a valid question."""
    mock_response = MagicMock()
    mock_response.text = "The Shema is the central prayer..."
    mock_genai.GenerativeModel.return_value.generate_content.return_value = mock_response
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        result = ask_torah("What is the Shema?")
    assert len(result) > 0
    assert "Shema" in result

@patch("torah_chat.genai")
def test_system_prompt_passed(mock_genai):
    """Should configure the model with chavruta system prompt."""
    mock_response = MagicMock()
    mock_response.text = "Answer"
    mock_genai.GenerativeModel.return_value.generate_content.return_value = mock_response
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        ask_torah("Test question")
    call_kwargs = mock_genai.GenerativeModel.call_args
    assert "chavruta" in call_kwargs.kwargs["system_instruction"].lower()

@patch("torah_chat.genai")
def test_french_question(mock_genai):
    """Should handle French questions without error."""
    mock_response = MagicMock()
    mock_response.text = "Le Shabbat est le jour de repos..."
    mock_genai.GenerativeModel.return_value.generate_content.return_value = mock_response
    with patch.dict("os.environ", {"GOOGLE_API_KEY": "test-key"}):
        result = ask_torah("Qu'est-ce que le Shabbat?")
    assert len(result) > 0
Why google-generativeai First, LangChain Later
I see many tutorials that start with LangChain from day one. That is backwards. LangChain is an abstraction over LLM APIs. If you do not understand what the underlying API does, you cannot debug LangChain when it breaks.
My approach:
- Steps 1-3: Raw google-generativeai SDK. Learn the API surface; understand tokens, system prompts, and streaming.
- Step 12+: Refactor to LangChain LCEL when I need chains (search -> rerank -> generate). At that point, I know exactly what each link in the chain should do.
This is the same principle as learning React before Next.js, or SQL before an ORM. Abstractions are powerful only when you understand what they abstract.
Update: I completed this refactoring at Step 12. The pipeline now uses LangChain LCEL chains (prompt | llm | parser), with GoogleGenerativeAIEmbeddings for embeddings and ChatGoogleGenerativeAI for generation. The hybrid search and Cohere reranking stayed as raw SDK calls because LangChain does not fully support hybrid search or score-based threshold gating. Knowing the raw API first made it easy to identify what to wrap in LangChain and what to keep manual.
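The pipe-style composition behind LCEL can be illustrated without installing LangChain at all. The toy `Runnable` class below is my own stand-in for LangChain's Runnable protocol, with a fake prompt/model/parser, purely to show what `prompt | llm | parser` means mechanically:

```python
class Runnable:
    """Toy stand-in for LangChain's Runnable, just to show LCEL-style piping."""
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # a | b produces a new step: run a, feed its output into b
        return Runnable(lambda x: other.invoke(self.invoke(x)))

prompt = Runnable(lambda q: f"[chavruta] {q}")   # format the prompt
llm    = Runnable(lambda p: f"Answer to: {p}")   # fake model call
parser = Runnable(lambda r: r.strip())           # parse the raw output

chain = prompt | llm | parser
print(chain.invoke("What is the Shema?"))  # Answer to: [chavruta] What is the Shema?
```

Each step has one job and a uniform `invoke` interface, which is exactly why knowing the raw API first matters: you can tell which link in a real chain misbehaves.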
Running the Tests
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install google-generativeai pytest
# Run tests (all should pass)
pytest test_torah_chat.py -v
Output:
test_missing_api_key PASSED
test_empty_question PASSED
test_whitespace_question PASSED
test_basic_question PASSED
test_system_prompt_passed PASSED
test_french_question PASSED
6 passed in 0.12s
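The tests never touch the network, so they need no credentials. To try the function against the real API, the key has to be in the environment first. The key value and the one-liner below are placeholders, assuming the function lives in torah_chat.py:

```shell
# Placeholder value - use your real key from Google AI Studio
export GOOGLE_API_KEY="your-key-here"

# Then a real call would look like (uncomment once the key is valid):
# python -c "from torah_chat import ask_torah; print(ask_torah('What is the Shema?'))"
```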
Lessons Learned
- Start with validation, not the happy path. The first two tests are error cases. This forces you to handle errors before you even think about the main logic.
- Mock external APIs in tests. Never call the real Gemini API in unit tests. It costs money, it is slow, and it makes tests flaky.
- One file, one function, one responsibility. The entire Step 1 is a single ask_torah() function in a single file. No classes, no inheritance, no framework. Just a function that does one thing.
- System prompts are product decisions. The word "chavruta" is not random. It positions the AI as a study partner, not an authority. This shapes every answer the model gives.
What is Next
Step 2 adds a Next.js frontend with SSE streaming. The ask_torah() function becomes the core of a backend endpoint that streams tokens word by word to the browser.