LLM Configuration

The examples in this directory support multiple LLM backends: OpenAI, FIRST (HPC inference), Ollama (local), or mock mode (no setup required).

Supported Modes

Mode	Environment Variable	Description
OpenAI	`OPENAI_API_KEY`	Uses OpenAI (gpt-4o-mini by default)
FIRST	`FIRST_API_KEY`	Uses FIRST HPC inference service
Ollama	`OLLAMA_MODEL`	Uses Ollama for local LLM inference
Mock	(none)	Demonstrates patterns with hardcoded responses

Precedence when multiple variables are set: OpenAI > FIRST > Ollama > Mock.

Configuration

OpenAI

export OPENAI_API_KEY=<your_key>
python main.py

FIRST (HPC Inference)

export FIRST_API_KEY=<your_token>
export FIRST_API_BASE=https://your-first-endpoint/v1
export FIRST_MODEL=meta-llama/Meta-Llama-3.1-70B-Instruct
python main.py

Ollama (Local LLM)

Ollama runs LLMs locally on your machine.

# Install Ollama and pull a model
ollama pull llama3.2

# Run with Ollama
export OLLAMA_MODEL=llama3.2
python main.py

Optional: Set OLLAMA_HOST if Ollama is running on a different host (default: http://localhost:11434).

Mock Mode (No Setup Required)

python main.py

Mock mode runs without any API key, showing realistic example outputs to demonstrate the patterns.

Mode Detection Output

When you run an example, it prints which mode was selected and why:

============================================================
LLM Mode: OpenAI (gpt-4o-mini)
  Reason: OPENAI_API_KEY found in environment
============================================================

Or in mock mode:

============================================================
LLM Mode: Mock
  Reason: No API key or OLLAMA_MODEL found; using hardcoded responses
============================================================

Using Argonne’s FIRST Service

Argonne National Laboratory provides access to FIRST through the ALCF (Argonne Leadership Computing Facility).

Getting Access

Get an ALCF account if you don’t have one
Obtain an API token following the instructions at: docs.alcf.anl.gov/services/inference-endpoints/#api-access

Argonne Configuration

export FIRST_API_KEY=<your_token>
export FIRST_API_BASE=https://inference-api.alcf.anl.gov/resource_server/metis/api/v1
export FIRST_MODEL=gpt-oss-120b
python main.py

Available Models at Argonne

Check the ALCF documentation for the current list of available models. Common options include:

gpt-oss-120b - Large general-purpose model
meta-llama/Meta-Llama-3.1-70B-Instruct - Llama 3.1 70B

Model Recommendations for Tool Calling

Not all models handle tool calling equally well. Based on testing:

Recommended Models

Use Case	Recommended Models
Simple tools (1-2 tools, clear inputs)	Any model, including `llama3.2` (3B)
Multiple tools (3+ tools)	`llama3.2:70b`, `mistral`, `gpt-4o-mini`
Complex workflows (multi-step, conditional)	`gpt-4o`, `gpt-4o-mini`, `llama3.1:70b`

Known Limitations with Smaller Models

When using smaller local models like llama3.2 (3B parameters), you may observe:

Code generation instead of tool calls: Model outputs Python code describing what it would do, rather than invoking the tool
Parameter hallucination: Model invents parameter values instead of using provided options
Incomplete tool sequences: Model describes remaining steps instead of executing them

Example of correct tool call:

Agent calls: calculate({'expression': '347 * 892'})
Tool result: 309524

Example of problematic behavior (smaller models):

Agent: I'll calculate this using the following code:
```python
result = 347 * 892
print(result)  # 309524

```

Recommendations

Start with mock mode to understand the expected flow
Use OpenAI or larger models for production or complex examples
Ollama with small models works well for:
- Simple calculator-style tools
- Single-tool scenarios
- Learning and experimentation
Increase model size if you see code generation instead of tool calls