This project demonstrates a full LLM-planned computational chemistry agent built on the Academy agent framework.
Given one or more SMILES strings, the agent:
- Uses a Hugging Face LLM (by default microsoft/Phi-3.5-mini-instruct) to plan a multi-step workflow.

Main entry point:
python run_chem_agent.py
Computed with RDKit:
- logP
- TPSA

Extracted from real xTB output:
- E_scc_hartree, E_scc_eV
- E_total_hartree, E_total_eV
- HOMO_LUMO_gap_eV
- dipole_moment_D (from the "molecular dipole" full: line)
- solvation_free_energy_kcal_per_mol (from the Gsolv term in the SUMMARY block)

The easiest way to install RDKit and xTB is with conda.
conda create -n chem-agent python=3.11 rdkit xtb -c conda-forge
conda activate chem-agent
pip install torch transformers accelerate academy-py huggingface_hub
xtb --version
If this fails, your agent will not be able to run xTB-dependent steps.
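The same check can be done from Python before any xTB-dependent step is attempted. The following is a minimal sketch, not the project's own code: the function name `xtb_available` is hypothetical, and it only assumes that `xtb --version` exits with status 0 when the installation works.

```python
import shutil
import subprocess


def xtb_available(executable: str = "xtb") -> bool:
    """Return True if `<executable> --version` runs and exits with status 0.

    Hypothetical helper for illustration; not part of run_chem_agent.py.
    """
    path = shutil.which(executable)
    if path is None:
        return False  # executable is not on PATH
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # could not be started, or hung
    return result.returncode == 0
```

Calling `xtb_available()` once at startup lets a script fail early with a clear message instead of failing in the middle of a plan.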
Run with default settings:
python run_chem_agent.py
python run_chem_agent.py [--model MODEL] [--smiles SMILES ...] [--props PROPERTIES ...] [--accuracy-profile {fast,balanced,high}]
--model, -m: Hugging Face model ID for the planner LLM. Default:
microsoft/Phi-3.5-mini-instruct
--smiles, -s: One or more SMILES strings. Default:
CCO c1ccccc1 CC(=O)O
--props, -p: Desired properties (used as hints for the planner). Default:
logP dipole_moment solvation_free_energy
--accuracy-profile, -a: Planner behavior hint:
fast | balanced | high
Default: balanced
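For readers who want to see how such a command line maps onto Python, here is a sketch of an equivalent `argparse` setup. The flags, choices, and defaults are taken from the option list above; the function name `build_parser` and the help strings are assumptions, and the actual script may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="run_chem_agent.py")
    parser.add_argument(
        "--model", "-m",
        default="microsoft/Phi-3.5-mini-instruct",
        help="Hugging Face model ID for the planner LLM",
    )
    parser.add_argument(
        "--smiles", "-s",
        nargs="+",  # one or more SMILES strings
        default=["CCO", "c1ccccc1", "CC(=O)O"],
        help="SMILES strings to process",
    )
    parser.add_argument(
        "--props", "-p",
        nargs="+",
        default=["logP", "dipole_moment", "solvation_free_energy"],
        help="desired properties, passed to the planner as hints",
    )
    parser.add_argument(
        "--accuracy-profile", "-a",
        choices=["fast", "balanced", "high"],
        default="balanced",
        help="planner behavior hint",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["-s", "CCO", "-a", "fast"])` yields a namespace whose `smiles` is `["CCO"]` and whose `accuracy_profile` is `"fast"`. Quoting SMILES on the shell, as in the examples that follow, avoids problems with characters such as parentheses.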
Default run:
python run_chem_agent.py
Single molecule:
python run_chem_agent.py --smiles "CCO"
Custom planner model:
python run_chem_agent.py --model Qwen/Qwen2.5-7B-Instruct
Multiple molecules:
python run_chem_agent.py -s CCO "c1ccccc1" "CC(=O)O"
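The LLM receives the input SMILES strings, along with the desired properties and the accuracy profile as planning hints.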
It outputs a structured plan like:
{
  "steps": [
    {
      "id": "s1_rdkit",
      "tool": "rdkit_descriptors",
      "inputs": { "smiles": "CCO", "descriptor_set": ["logP", "TPSA"] },
      "depends_on": []
    },
    {
      "id": "s2_xtb",
      "tool": "xtb_opt",
      "inputs": { "smiles": "CCO", "level": "GFN2-xTB" },
      "depends_on": ["s1_rdkit"]
    },
    {
      "id": "s3_solv",
      "tool": "solvation_energy_from_xtb",
      "inputs": {
        "geometry_path": "step:s2_xtb.optimized_geometry",
        "solvent": "water"
      },
      "depends_on": ["s2_xtb"]
    }
  ]
}
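Two features of this plan format lend themselves to a short illustration: the `depends_on` lists define an execution order, and values such as `"step:s2_xtb.optimized_geometry"` point at an output of an earlier step. The sketch below shows one way both could be handled. It is an assumption about how an executor might work, not the project's implementation: `execution_order`, `resolve_inputs`, and the `outputs` mapping (step id to a dict of named outputs) are hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any

STEP_PREFIX = "step:"  # marks a reference of the form "step:<id>.<output>"


def execution_order(plan: dict[str, Any]) -> list[str]:
    """Return step ids in an order that respects every `depends_on` list."""
    steps = plan["steps"]
    ids = [step["id"] for step in steps]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate step ids in plan")
    graph: dict[str, set[str]] = {}
    for step in steps:
        deps = set(step.get("depends_on", []))
        unknown = deps - set(ids)
        if unknown:
            raise ValueError(
                f"{step['id']} depends on unknown steps: {sorted(unknown)}"
            )
        graph[step["id"]] = deps  # node -> set of predecessors
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"dependency cycle in plan: {exc.args[1]}") from exc


def resolve_inputs(
    inputs: dict[str, Any], outputs: dict[str, dict[str, Any]]
) -> dict[str, Any]:
    """Replace "step:<id>.<output>" strings with values from finished steps."""
    resolved: dict[str, Any] = {}
    for key, value in inputs.items():
        if isinstance(value, str) and value.startswith(STEP_PREFIX):
            step_id, _, field = value[len(STEP_PREFIX):].partition(".")
            if step_id not in outputs:
                raise KeyError(f"step {step_id!r} has not produced outputs yet")
            if field not in outputs[step_id]:
                raise KeyError(f"step {step_id!r} has no output {field!r}")
            resolved[key] = outputs[step_id][field]
        else:
            resolved[key] = value  # literal input, passed through unchanged
    return resolved
```

For the plan shown above, `execution_order` returns `["s1_rdkit", "s2_xtb", "s3_solv"]`. Rejecting unknown dependencies and cycles up front is useful because an LLM-generated plan is not guaranteed to be well formed.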
The agent prints both the raw plan and its parsed version.
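The gap between the raw and the parsed plan exists because chat models often wrap JSON in extra prose. As a hedged illustration of that parsing step, the sketch below scans the raw text for the first JSON object that contains a `steps` list. The function name `extract_plan` is hypothetical and the project's own parser may work differently.

```python
import json
from typing import Any


def extract_plan(raw_text: str) -> dict[str, Any]:
    """Return the first JSON object in `raw_text` that has a "steps" list."""
    decoder = json.JSONDecoder()
    start = raw_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            candidate, _end = decoder.raw_decode(raw_text, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict) and isinstance(candidate.get("steps"), list):
            return candidate
        start = raw_text.find("{", start + 1)  # try the next opening brace
    raise ValueError("no JSON object with a 'steps' list found in model output")
```

Raising an error when no plan is found, rather than returning an empty plan, makes planner failures visible immediately.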
The executor:
- resolves references to earlier step outputs, such as "step:s2_xtb.optimized_geometry"

xtb structure.xyz --opt --gfn 2
Parsers extract:
xtb structure.xyz --gfn 2 --gbsa water
Parses:
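To make the parsing step concrete, here is a regex-based sketch that pulls the properties listed earlier out of xTB's text output. Everything in it is an assumption for illustration: the function name `parse_xtb_output`, the line formats (modeled on typical xTB 6.x output and worth checking against your version), the choice to read Gsolv in Hartree and convert it to kcal/mol, and the sample text, whose numbers are made-up placeholders rather than real results.

```python
import re

HARTREE_TO_EV = 27.211386
HARTREE_TO_KCAL_PER_MOL = 627.509474

_FLOAT = r"(-?\d+\.\d+)"
# Assumed line formats; each pattern's last group is the value of interest.
PATTERNS = {
    "E_total_hartree": re.compile(r"total energy\s+" + _FLOAT + r"\s+Eh", re.I),
    "E_scc_hartree": re.compile(r"SCC energy\s+" + _FLOAT + r"\s+Eh"),
    "HOMO_LUMO_gap_eV": re.compile(r"HOMO-LUMO gap\s+" + _FLOAT + r"\s+eV", re.I),
    "Gsolv_hartree": re.compile(r"->\s*Gsolv\s+" + _FLOAT + r"\s+Eh"),
    # "full:" line of the molecular dipole block: x, y, z, total (Debye)
    "dipole_moment_D": re.compile(r"full:\s+" + (_FLOAT + r"\s+") * 3 + _FLOAT),
}


def parse_xtb_output(text: str) -> dict[str, float]:
    """Extract energies, gap, dipole, and solvation energy from xTB stdout."""
    props: dict[str, float] = {}
    for name, pattern in PATTERNS.items():
        matches = list(pattern.finditer(text))
        if matches:
            # Use the last occurrence, i.e. the final printout of the run.
            props[name] = float(matches[-1].groups()[-1])
    for key in ("E_total", "E_scc"):
        if f"{key}_hartree" in props:
            props[f"{key}_eV"] = props[f"{key}_hartree"] * HARTREE_TO_EV
    gsolv = props.pop("Gsolv_hartree", None)
    if gsolv is not None:
        props["solvation_free_energy_kcal_per_mol"] = gsolv * HARTREE_TO_KCAL_PER_MOL
    return props


# Placeholder excerpt with invented numbers, shaped like an xTB SUMMARY block.
SAMPLE = """
         :: total energy              -5.070000000000 Eh    ::
         :: HOMO-LUMO gap             14.400000000000 eV    ::
         :: SCC energy                -5.100000000000 Eh    ::
         :: -> Gsolv                  -0.010000000000 Eh    ::
molecular dipole:
                 x           y           z       tot (Debye)
 q only:       -0.000       0.000      -0.600
   full:       -0.000       0.000      -0.900       2.300
"""

if __name__ == "__main__":
    for name, value in parse_xtb_output(SAMPLE).items():
        print(f"{name}: {value:.6f}")
```

Keys that do not match are simply left out of the result, so a run without implicit solvation returns no `solvation_free_energy_kcal_per_mol` entry instead of a misleading zero.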
Final results include:
- logP
- dipole_moment_D
- solvation_free_energy_kcal_per_mol
- E_total_hartree, E_total_eV
- E_scc_hartree, E_scc_eV
- HOMO_LUMO_gap_eV

Returned as:
{
  "status": "success",
  "molecule_smiles": "...",
  "properties": {...},
  "plan_used": {...},
  "provenance": {...}
}
You will see:
This makes it easy to identify:
max_new_tokens or use a smaller model.

This project uses: