Datamol simplifies cheminformatics and drug discovery workflows through a Pythonic interface to RDKit.
Datamol provides a lightweight, high-level abstraction layer over RDKit, making complex molecular operations accessible and efficient for AI-assisted coding. It streamlines essential tasks such as SMILES parsing, structure standardization, 3D conformer generation, and batch descriptor computation with built-in parallelization and cloud storage support. By returning native RDKit molecular objects, Datamol ensures full compatibility with the broader cheminformatics ecosystem while offering sensible defaults for standard drug discovery pipelines and chemical data analysis.
Key Features
01 Efficient batch computation of molecular descriptors and fingerprints
02 Built-in parallelization for high-throughput chemical data processing
03 Advanced 3D conformer generation and RMSD clustering
04 Simplified molecular format conversion and structure standardization
05 Native cloud storage support for S3, GCS, and HTTP molecular files
Use Cases
01 Visualizing molecular scaffolds and analyzing structural diversity in chemical datasets
02 Standardizing and cleaning large chemical libraries for machine learning models
03 Building high-throughput screening pipelines for drug discovery