Simplifies molecular featurization by providing access to over 100 pre-trained embeddings and hand-crafted featurizers for machine learning.
Molfeat is a Python library that converts chemical structures into inputs for machine learning models. It unifies a wide range of featurizers, from traditional fingerprints such as ECFP and MACCS to pre-trained deep learning embeddings such as ChemBERTa and Graphormer, behind a single scikit-learn compatible interface. Whether you are performing virtual screening, building QSAR models, or analyzing chemical space, this skill provides the tools to convert SMILES strings or RDKit molecules into numerical representations, with parallel processing and built-in caching.
Key Features
01 Scikit-learn compatible transformers for seamless ML pipeline integration
02 Unified interface for 100+ molecular featurizers and pre-trained embeddings
03 Supports diverse representations including fingerprints, 2D descriptors, and GNNs
04 High-performance batch processing with automatic parallelization and caching
05 8 GitHub stars
06 State management for saving and loading exact featurizer configurations
Use Cases
01 Building QSAR/QSPR models for drug discovery and molecular property prediction
02 Visualizing and clustering chemical space for exploratory data analysis
03 Conducting large-scale virtual screening and similarity searches in chemical libraries
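To make the similarity-search use case concrete, here is a small sketch of ranking a library by Tanimoto similarity. It uses only the Python standard library and is not part of Molfeat's API: the fingerprints are made-up sets of on-bit indices, and the names `tanimoto`, `rank_by_similarity`, and `mol_a` to `mol_c` are hypothetical. In practice the bit sets would come from a fingerprint featurizer such as ECFP.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: set[int], library: dict[str, set[int]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Return the top_k library entries most similar to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy fingerprints (assumption): each set holds the indices of the on-bits.
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3, 5},
}
query = {1, 4, 7}

hits = rank_by_similarity(query, library, top_k=2)
print(hits)  # [('mol_a', 0.75), ('mol_b', 0.5)]
```

For large libraries, the same scoring is usually vectorized over a fingerprint matrix rather than computed pair by pair.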