Builds machine learning models and unsupervised embeddings for genomic interval data and single-cell chromatin accessibility datasets.
Geniml is a toolkit for computational biologists and data scientists working with genomic interval data (BED files). It provides methods for learning dense vector embeddings of genomic regions, single cells, and metadata. These embeddings support similarity search, clustering, and cross-modal queries, such as retrieving region sets that match a metadata label. Whether you are building consensus peak universes, training Region2Vec models on bulk data, or analyzing single-cell ATAC-seq with scEmbed, Geniml provides the statistical and machine learning methods needed to turn raw genomic coordinates into numeric features for downstream analysis.
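Once each region set or cell is represented as a vector, a similarity search reduces to ranking stored vectors by their closeness to a query vector. The sketch below is not Geniml's API. It assumes cosine similarity as the distance measure, and the file names and three-dimensional vectors are hypothetical toy data standing in for the output of a trained model.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank every stored embedding by cosine similarity to the query and keep the best k."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real vectors would come from a trained embedding model.
index = {
    "liver_dnase.bed": [0.9, 0.1, 0.0],
    "hepg2_atac.bed": [0.8, 0.2, 0.1],
    "cd4_tcell_atac.bed": [0.0, 0.1, 0.95],
}
query = [0.85, 0.15, 0.05]
print(top_k(query, index, k=2))
```

A brute-force scan like this is fine for a handful of vectors; large collections would normally use an approximate nearest-neighbor index instead.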
Key Features
01 Cell-level embeddings for single-cell ATAC-seq data (scEmbed) with scanpy integration
02 Utility suite for BED file randomization, caching, and evaluation
03 Joint region and metadata embedding for cross-modal similarity search with BEDspace
04 Statistical consensus peak (universe) building using coverage cutoff (CC), maximum likelihood (ML), and hidden Markov model (HMM) methods
05 Unsupervised learning of genomic region embeddings using Region2Vec
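To make the idea of a consensus "universe" concrete, the sketch below shows a simplified coverage-cutoff approach: count how many input BED files cover each base, keep bases covered by at least a chosen number of files, and merge consecutive kept bases into intervals. This is an illustration under stated assumptions, not Geniml's CC implementation. In particular, the cutoff here is a fixed, user-supplied integer, and the function name `coverage_cutoff_universe` is hypothetical. Coordinates are treated as BED-style zero-based, half-open intervals.

```python
from collections import Counter

# (chrom, start, end) with zero-based, half-open coordinates, as in BED files.
Interval = tuple[str, int, int]


def coverage_cutoff_universe(bed_files: list[list[Interval]], cutoff: int) -> list[Interval]:
    """Keep bases covered by at least `cutoff` files, merged into contiguous intervals.

    Hypothetical simplified sketch: a fixed cutoff and per-base counting, which is
    only practical for small toy inputs.
    """
    coverage: Counter = Counter()
    for regions in bed_files:
        covered = set()
        for chrom, start, end in regions:
            covered.update((chrom, pos) for pos in range(start, end))
        coverage.update(covered)  # a file contributes at most 1 to each base
    kept = sorted(key for key, count in coverage.items() if count >= cutoff)

    universe: list[Interval] = []
    for chrom, pos in kept:
        last = universe[-1] if universe else None
        if last is not None and last[0] == chrom and last[2] == pos:
            universe[-1] = (chrom, last[1], pos + 1)  # extend the current run
        else:
            universe.append((chrom, pos, pos + 1))  # start a new interval
    return universe


files = [
    [("chr1", 100, 200)],
    [("chr1", 150, 250)],
    [("chr1", 180, 300), ("chr2", 0, 50)],
]
print(coverage_cutoff_universe(files, cutoff=2))  # → [('chr1', 150, 250)]
```

Lowering the cutoff widens the universe toward the union of all inputs, while raising it shrinks the universe toward their intersection. The statistical methods listed above exist to choose region boundaries more carefully than a single hand-picked threshold.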
Use Cases
01 Cell-type annotation and clustering for single-cell chromatin accessibility analysis
02 Building searchable genomic databases that link experimental metadata to specific region sets
03 Dimensionality reduction and feature engineering for large-scale BED file collections
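A common feature-engineering step for BED collections is to express every file in the same fixed vocabulary: map its regions onto a shared universe and record which universe regions they overlap. The resulting vectors all have the same length and can feed clustering or dimensionality reduction. The sketch below assumes simple binary overlap encoding and a brute-force overlap check; the function names and example coordinates are hypothetical and do not reflect Geniml's tokenizer API.

```python
# (chrom, start, end) with zero-based, half-open coordinates, as in BED files.
Interval = tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def tokenize(regions: list[Interval], universe: list[Interval]) -> list[int]:
    """Binary feature vector: entry i is 1 if any input region overlaps universe[i].

    Hypothetical sketch: O(len(regions) * len(universe)); real tools would use an
    interval index instead of nested loops.
    """
    return [int(any(overlaps(r, u) for r in regions)) for u in universe]


universe = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 100)]
sample = [("chr1", 150, 160), ("chr2", 90, 120)]
print(tokenize(sample, universe))  # → [1, 0, 1]
```

Because the coordinates are half-open, a region starting exactly where a universe region ends, such as `("chr1", 200, 300)` against `("chr1", 100, 200)`, does not count as an overlap.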