RAG Agent Example

An agent that answers questions about scientific documents by retrieving relevant passages and passing them to an LLM, a pattern known as Retrieval-Augmented Generation (RAG).

Code: github.com/agents4science/agents4science.github.io/tree/main/Capabilities/local-agents/AgentsRAG

What It Does

  1. The agent loads the scientific documents and creates embeddings for them
  2. The user asks a question about the documents
  3. The agent retrieves the most relevant passages
  4. The agent synthesizes an answer from the retrieved context
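
The four steps above can be sketched without any external services. This is a toy illustration only: the bag-of-words `embed` function stands in for a real embedding model, the list scan stands in for the vector store, and the passages are placeholder strings based on the sample file descriptions. None of these names come from the example's source code.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Step 3: return the k passages most similar to the question."""
    q = embed(question)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Step 4 (input side): combine retrieved context and question for the LLM."""
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n\n{joined}\n\nQuestion: {question}"

# Step 1: placeholder passages (hypothetical, not the real sample data)
passages = [
    "Copper-based catalyst research for CO2 conversion",
    "Gold and silver catalysts for CO2 conversion",
    "Conversion challenges and barriers",
]

# Steps 2-4: ask, retrieve, and assemble the prompt
question = "copper catalyst research"
top = retrieve(question, passages, k=1)
prompt = build_prompt(question, top)
```

In the real example, the LLM receives the retrieved passages through a tool call rather than a hand-built prompt, but the retrieve-then-answer flow is the same.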

The Code

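# Imports assume the standard LangChain and LangGraph packages
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# `vectorstore` is built at startup from the files in data/ (not shown here)
@tool
def search_documents(query: str) -> str:
    """Search the document collection for relevant passages."""
    docs = vectorstore.similarity_search(query, k=3)  # top 3 matches
    return "\n\n".join(d.page_content for d in docs)

llm = ChatOpenAI(model="gpt-4o-mini")
agent = create_react_agent(llm, [search_documents])
result = agent.invoke({"messages": [HumanMessage(content="What catalysts work for CO2 conversion?")]})
print(result["messages"][-1].content)  # the last message holds the final answer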

Running the Example

cd Capabilities/local-agents/AgentsRAG
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python main.py

To ask your own question, pass it with --question:

python main.py --question "What are the challenges with room temperature catalysis?"
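
For readers adapting the script, a `--question` flag like the one above could be handled with `argparse` roughly as follows. This is a minimal sketch, not the contents of main.py: the `parse_args` helper and the choice of default question (borrowed from the code sample) are assumptions.

```python
import argparse

# Assumed default, taken from the code sample; main.py may use a different one
DEFAULT_QUESTION = "What catalysts work for CO2 conversion?"

def parse_args(argv=None) -> argparse.Namespace:
    """Parse command-line options; argv=None means read from sys.argv."""
    parser = argparse.ArgumentParser(description="Ask the RAG agent a question.")
    parser.add_argument(
        "--question",
        default=DEFAULT_QUESTION,
        help="Question to ask about the loaded documents",
    )
    return parser.parse_args(argv)

# Passing an explicit list makes the parser easy to exercise without a shell
args = parse_args(["--question", "What are the challenges with room temperature catalysis?"])
```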

LLM Configuration

The example can run against OpenAI, FIRST (HPC inference), or Ollama (local), or in mock mode.

See LLM Configuration for details on configuring LLM backends, including Argonne’s FIRST service.
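
As a rough illustration of how a script might switch between these four options, the sketch below reads a backend name from the environment and falls back to mock mode. Everything here is hypothetical: the `LLM_BACKEND` variable, the `choose_backend` and `mock_answer` helpers, and the fallback behavior are invented for illustration, and the actual configuration mechanism is the one described in the LLM Configuration page.

```python
import os

# The four options named in this section
SUPPORTED_BACKENDS = ("openai", "first", "ollama", "mock")

def choose_backend(env=None) -> str:
    """Return the backend name from a (hypothetical) LLM_BACKEND variable."""
    env = os.environ if env is None else env
    name = env.get("LLM_BACKEND", "mock").strip().lower()
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}")
    return name

def mock_answer(question: str) -> str:
    """Canned reply so the pipeline can be exercised without calling an LLM."""
    return f"[mock] No LLM was called for: {question}"

backend = choose_backend({"LLM_BACKEND": "Ollama"})
```

A mock path like this is useful for checking document loading and retrieval before any API key or inference endpoint is available.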

Sample Data

The data/ directory contains sample documents about CO2 conversion catalysts:

data/
├── challenges.txt             # Conversion challenges and barriers
├── copper_catalysts.txt       # Copper-based catalyst research
├── emerging_catalysts.txt     # Single-atom catalysts (SACs) and metal-organic frameworks (MOFs)
└── noble_metal_catalysts.txt  # Gold and silver catalysts

To use your own documents, add .txt files to the data/ directory. The agent loads all .txt files at startup.
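
Loading every .txt file from a directory can be sketched with `pathlib`. This is an assumption about the general approach, not the example's actual loader; the `load_text_files` name is hypothetical, and the temporary directory stands in for data/.

```python
import tempfile
from pathlib import Path

def load_text_files(data_dir) -> dict[str, str]:
    """Map each .txt file name in data_dir to its text; other files are skipped."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(data_dir).glob("*.txt"))
    }

# Demonstrate with a throwaway directory instead of the real data/ folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text("example passage", encoding="utf-8")
    Path(tmp, "ignored.md").write_text("not loaded", encoding="utf-8")
    documents = load_text_files(tmp)
```

Because the glob matches only `*.txt` at the top level, files with other extensions or in subdirectories would need to be converted or moved before this sketch picks them up.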

Key Points

Requirements