Builds machine learning models and unsupervised embeddings for genomic interval data and single-cell chromatin accessibility datasets.
Geniml is a toolkit for computational biologists and data scientists working with genomic interval data (BED files). It provides methods for learning vector embeddings of genomic regions, single cells, and metadata, which support tasks such as similarity search, clustering, and cross-modal queries. It covers building consensus peak universes, training Region2Vec models on bulk data, and analyzing single-cell ATAC-seq data with scEmbed, turning raw genomic coordinates into features for downstream analysis.
Key Features
01 Cell-level embeddings for single-cell ATAC-seq data, with scanpy integration
02 Utility suite for BED file randomization, caching, and evaluation
03 Joint region and metadata embedding for cross-modal similarity search with BEDspace
04 Statistical consensus peak (universe) building using CC, ML, and HMM methods
05 Unsupervised learning of genomic region embeddings using Region2Vec
Use Cases
01 Cell-type annotation and clustering for single-cell chromatin accessibility analysis
02 Building searchable genomic databases that link experimental metadata to specific region sets
03 Dimensionality reduction and feature engineering for large-scale BED file collections
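
To make the idea of feature engineering against a consensus universe more concrete, here is a minimal sketch in plain Python. It does not use Geniml's API. The universe, the sample region sets, and the helper names (`build_index`, `tokenize`, `jaccard`) are all hypothetical and exist only for illustration. The sketch maps each BED-style region to the universe intervals it overlaps (assuming half-open coordinates) and then compares two region sets by Jaccard similarity over those tokens, a simple stand-in for the learned embeddings described above.

```python
from bisect import bisect_left

# Hypothetical consensus universe: (chrom, start, end), half-open coordinates.
UNIVERSE = [
    ("chr1", 100, 200),
    ("chr1", 300, 400),
    ("chr1", 500, 600),
    ("chr2", 50, 150),
]


def build_index(universe):
    """Group universe intervals by chromosome, keeping each interval's token id."""
    index = {}
    for token_id, (chrom, start, end) in enumerate(universe):
        index.setdefault(chrom, []).append((start, end, token_id))
    for intervals in index.values():
        intervals.sort()
    return index


def tokenize(regions, index):
    """Return the set of universe token ids overlapped by any query region."""
    tokens = set()
    for chrom, start, end in regions:
        intervals = index.get(chrom, [])
        starts = [iv[0] for iv in intervals]
        # Only universe intervals that start before the query ends can overlap it.
        hi = bisect_left(starts, end)
        for u_start, u_end, token_id in intervals[:hi]:
            if u_end > start:  # universe interval ends after the query starts
                tokens.add(token_id)
    return tokens


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


index = build_index(UNIVERSE)
sample_a = [("chr1", 150, 350), ("chr2", 0, 60)]  # hypothetical BED records
sample_b = [("chr1", 320, 520)]

tokens_a = tokenize(sample_a, index)
tokens_b = tokenize(sample_b, index)
similarity = jaccard(tokens_a, tokens_b)
print(sorted(tokens_a), sorted(tokens_b), similarity)
```

Token sets like these are a sparse, set-based representation. An embedding model such as Region2Vec would instead learn dense vectors for the universe tokens, so the Jaccard score here is only a baseline for the kind of similarity search the listing describes.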