CharacterizeChemicals

An LLM-Planned, Tool-Executing Molecular Property Agent

This project demonstrates a full LLM-planned computational chemistry agent built on the Academy agent framework.
Given one or more SMILES strings, the agent:

  1. Uses a local LLM (default: microsoft/Phi-3.5-mini-instruct) to plan a multi-step workflow.
  2. Executes real computational tools (RDKit, xTB, xTB GBSA) according to that plan.
  3. Parses quantum-chemical outputs (energies, dipole, HOMO–LUMO gap, solvation).
  4. Returns a structured properties dictionary per molecule.
  5. Logs the entire plan and every step executed for transparency and debugging.

Main entry point:

python run_chem_agent.py

Source code: View on GitHub


✨ Features

🔬 RDKit descriptors

⚛️ xTB (GFN2-xTB) quantum properties

Extracted from real xTB output:

🌊 xTB GBSA solvation

📋 Transparency and debugging


🔧 Installation

The easiest way to install RDKit and xTB is with conda.

1. Create a conda environment

conda create -n chem-agent python=3.11 rdkit xtb -c conda-forge
conda activate chem-agent

2. Install Python dependencies

pip install torch transformers accelerate academy-py huggingface_hub

3. Verify xTB installation

xtb --version

If this command fails, the agent cannot run any xTB-dependent step (geometry optimization or GBSA solvation).
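The same check can be done from Python before the agent starts. The following is a minimal sketch, not code from this project: the function name `xtb_available` is hypothetical, and it assumes that `xtb --version` exits with status 0 on a working installation.

```python
import shutil
import subprocess


def xtb_available(executable: str = "xtb") -> bool:
    """Return True if `executable` is on PATH and `--version` exits cleanly.

    Hypothetical helper; assumes `xtb --version` returns exit code 0.
    """
    path = shutil.which(executable)
    if path is None:
        # Not on PATH, so xTB-dependent steps cannot run.
        return False
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `xtb_available()` once at startup makes it possible to fail early with a clear message instead of failing in the middle of a plan.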


🚀 Usage

Run with default settings:

python run_chem_agent.py

Command-line arguments

python run_chem_agent.py [--model MODEL] [--smiles SMILES ...] [--props PROPERTIES ...] [--accuracy-profile {fast,balanced,high}]

🔣 Arguments

--model, -m

Hugging Face model ID for the planner LLM. Default:

microsoft/Phi-3.5-mini-instruct

--smiles, -s

One or more SMILES strings. Default:

CCO  c1ccccc1  CC(=O)O

--props, -p

Desired properties (used as hints for the planner). Default:

logP dipole_moment solvation_free_energy

--accuracy-profile, -a

Planner behavior hint:
fast | balanced | high
Default: balanced
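For reference, the options above map naturally onto argparse. This is a sketch reconstructed from the flags and defaults listed in this README, not the actual contents of run_chem_agent.py; the function name `build_parser` and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run_chem_agent.py")
    parser.add_argument(
        "--model", "-m",
        default="microsoft/Phi-3.5-mini-instruct",
        help="Hugging Face model ID for the planner LLM",
    )
    parser.add_argument(
        "--smiles", "-s",
        nargs="+",
        default=["CCO", "c1ccccc1", "CC(=O)O"],
        help="one or more SMILES strings",
    )
    parser.add_argument(
        "--props", "-p",
        nargs="+",
        default=["logP", "dipole_moment", "solvation_free_energy"],
        help="desired properties, passed to the planner as hints",
    )
    parser.add_argument(
        "--accuracy-profile", "-a",
        choices=["fast", "balanced", "high"],
        default="balanced",
        help="planner behavior hint",
    )
    return parser
```

Quote SMILES strings that contain shell metacharacters such as parentheses, for example "CC(=O)O".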


🧪 Examples

Default run:

python run_chem_agent.py

Single molecule:

python run_chem_agent.py --smiles "CCO"

Custom planner model:

python run_chem_agent.py --model Qwen/Qwen2.5-7B-Instruct

Multiple molecules:

python run_chem_agent.py -s CCO "c1ccccc1" "CC(=O)O"

🧱 Architecture Overview

1. Planner LLM → JSON Plan

The LLM receives:

It outputs a structured plan like:

{
  "steps": [
    {
      "id": "s1_rdkit",
      "tool": "rdkit_descriptors",
      "inputs": { "smiles": "CCO", "descriptor_set": ["logP", "TPSA"] },
      "depends_on": []
    },
    {
      "id": "s2_xtb",
      "tool": "xtb_opt",
      "inputs": { "smiles": "CCO", "level": "GFN2-xTB" },
      "depends_on": ["s1_rdkit"]
    },
    {
      "id": "s3_solv",
      "tool": "solvation_energy_from_xtb",
      "inputs": {
        "geometry_path": "step:s2_xtb.optimized_geometry",
        "solvent": "water"
      },
      "depends_on": ["s2_xtb"]
    }
  ]
}

The agent prints both the raw plan and the parsed version.
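Because the plan comes from an LLM, it is worth validating before anything is executed. The sketch below checks the four fields that appear in the example plan and resolves references of the form "step:<id>.<key>". The function names `parse_plan` and `resolve_input` are hypothetical, and the reference syntax is inferred from the single example above, so the project's real parser may differ.

```python
import json
from typing import Any

# Fields present on every step in the example plan.
REQUIRED_KEYS = {"id", "tool", "inputs", "depends_on"}


def parse_plan(raw: str) -> list[dict[str, Any]]:
    """Parse planner output and reject malformed steps (illustrative)."""
    steps = json.loads(raw)["steps"]
    known_ids: set[str] = set()
    for step in steps:
        missing = REQUIRED_KEYS - set(step)
        if missing:
            raise ValueError(f"step {step.get('id')!r} is missing {sorted(missing)}")
        if step["id"] in known_ids:
            raise ValueError(f"duplicate step id {step['id']!r}")
        known_ids.add(step["id"])
    for step in steps:
        unknown = set(step["depends_on"]) - known_ids
        if unknown:
            raise ValueError(f"step {step['id']!r} depends on unknown {sorted(unknown)}")
    return steps


def resolve_input(value: Any, outputs: dict[str, dict[str, Any]]) -> Any:
    """Replace a 'step:<id>.<key>' reference with that step's recorded output."""
    if isinstance(value, str) and value.startswith("step:"):
        step_id, _, key = value[len("step:"):].partition(".")
        return outputs[step_id][key]
    # Plain literals such as "water" pass through unchanged.
    return value
```

Rejecting unknown dependencies up front keeps a malformed plan from failing halfway through an xTB run.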


2. Execution DAG

The executor:
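One way to turn the `depends_on` fields into a run order is a topological sort. The following is a sketch using the standard-library `graphlib` module, not the project's executor; `execution_order`, `run_plan`, and the tool call signature `(inputs, outputs_so_far)` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

Step = dict[str, Any]
# Hypothetical tool signature: (inputs, outputs_so_far) -> outputs of this step.
Tool = Callable[[dict[str, Any], dict[str, dict[str, Any]]], dict[str, Any]]


def execution_order(steps: list[Step]) -> list[str]:
    """Return step ids so that every step comes after its dependencies.

    Raises graphlib.CycleError if the plan contains a dependency cycle.
    """
    graph = {step["id"]: set(step["depends_on"]) for step in steps}
    return list(TopologicalSorter(graph).static_order())


def run_plan(steps: list[Step], tools: dict[str, Tool]) -> dict[str, dict[str, Any]]:
    """Run each step in dependency order and collect outputs by step id."""
    by_id = {step["id"]: step for step in steps}
    outputs: dict[str, dict[str, Any]] = {}
    for step_id in execution_order(steps):
        step = by_id[step_id]
        outputs[step_id] = tools[step["tool"]](step["inputs"], outputs)
    return outputs
```

Passing the outputs collected so far to each tool lets a later step, such as the solvation run, read the optimized geometry produced by an earlier one.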


3. Tool Layer

RDKit

xTB optimization

xtb structure.xyz --opt --gfn 2

Parsers extract:
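As an illustration of this kind of parsing, the sketch below pulls the total energy and HOMO-LUMO gap out of captured xTB standard output. It assumes the final summary box contains lines of the form `| TOTAL ENERGY <value> Eh |` and `| HOMO-LUMO GAP <value> eV |`; check this against the xTB version you have installed. The function name and dictionary keys are hypothetical, and dipole parsing is not covered.

```python
import re

_FLOAT = r"(-?\d+\.\d+)"
# Assumed layout of the final xTB summary box; verify against your xTB version.
TOTAL_ENERGY_RE = re.compile(r"\|\s*TOTAL ENERGY\s+" + _FLOAT + r"\s+Eh")
GAP_RE = re.compile(r"\|\s*HOMO-LUMO GAP\s+" + _FLOAT + r"\s+eV")


def parse_xtb_summary(stdout: str) -> dict[str, float]:
    """Extract total energy (Eh) and HOMO-LUMO gap (eV) if present."""
    props: dict[str, float] = {}
    energies = TOTAL_ENERGY_RE.findall(stdout)
    gaps = GAP_RE.findall(stdout)
    # Use the last match so that the final, converged values win.
    if energies:
        props["total_energy_eh"] = float(energies[-1])
    if gaps:
        props["homo_lumo_gap_ev"] = float(gaps[-1])
    return props
```

Returning only the keys that were found lets the caller tell a missing value apart from a real zero.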

xTB GBSA solvation

xtb structure.xyz --gfn 2 --gbsa water

Parses:
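A hedged sketch of reading the solvation free energy from a GBSA run follows. It assumes the xTB summary contains a line of the form `:: -> Gsolv <value> Eh ::`, which should be verified for your xTB version, and converts Hartree to kcal/mol with the standard factor 627.509474. The function name `parse_gsolv_kcal` is hypothetical.

```python
import re
from typing import Optional

# 1 Hartree expressed in kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509474

# Assumed layout of the Gsolv line in the xTB summary block.
GSOLV_RE = re.compile(r"->\s*Gsolv\s+(-?\d+\.\d+)\s+Eh")


def parse_gsolv_kcal(stdout: str) -> Optional[float]:
    """Return the solvation free energy in kcal/mol, or None if not found."""
    matches = GSOLV_RE.findall(stdout)
    if not matches:
        return None
    # Use the last occurrence, which belongs to the final SCC cycle.
    return float(matches[-1]) * HARTREE_TO_KCAL_PER_MOL
```

Returning None rather than raising lets the aggregation step record that the solvation value is unavailable without aborting the whole molecule.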


📊 Property Aggregation

Final results include:

Returned as:

{
  "status": "success",
  "molecule_smiles": "...",
  "properties": {...},
  "plan_used": {...},
  "provenance": {...}
}
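The sketch below shows one way per-step outputs could be merged into a dictionary of this shape. It is illustrative only: the function name `aggregate` is hypothetical, and the layout of the `provenance` entries (which step and tool produced each property) is an assumption, since this README does not spell it out.

```python
from typing import Any


def aggregate(
    smiles: str,
    plan: dict[str, Any],
    step_outputs: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Merge per-step outputs into one result record (illustrative)."""
    properties: dict[str, Any] = {}
    provenance: dict[str, dict[str, str]] = {}
    for step in plan["steps"]:
        for name, value in step_outputs.get(step["id"], {}).items():
            # Later steps overwrite earlier ones if they report the same name.
            properties[name] = value
            provenance[name] = {"step": step["id"], "tool": step["tool"]}
    return {
        "status": "success",
        "molecule_smiles": smiles,
        "properties": properties,
        "plan_used": plan,
        "provenance": provenance,
    }
```

Recording the producing step next to each value makes it possible to trace any property in the final output back to the tool call that generated it.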

🐞 Logging & Debugging

You will see:

This makes it easy to identify:


⚠️ Caveats


📚 Attribution

This project uses: