7 April 2022 - DBM Guest Lecture Series #3: Application prospects of machine learning and data science in biomedicine and cancer treatment


Abstract:

Single-cell technologies have received extensive attention from the bioinformatics and computational biology communities because of their revolutionary impact on uncovering novel cell types and intra-population heterogeneity in various domains of biology and medicine. Recent advances in single-cell RNA-sequencing (scRNA-seq) technologies have enabled parallel transcriptomic profiling of millions of cells. However, existing scRNA-seq clustering methods lack scalability, are time-consuming, and are prone to information loss during dimension reduction. To address these concerns, we present SHARP, an ensemble random-projection-based algorithm that scales to clustering 10 million cells. By adopting a divide-and-conquer strategy, sparse random projection, and two-layer meta-clustering, SHARP offers the following advantages: (1) it is much faster than existing algorithms; (2) it scales to 10 million cells; (3) it is accurate in terms of clustering performance; (4) it preserves cell-to-cell distances during dimension reduction; and (5) it is robust to dropouts in scRNA-seq data. Comprehensive benchmarking tests on 20 scRNA-seq datasets demonstrate that SHARP remarkably outperforms state-of-the-art methods in terms of speed and accuracy. To the best of our knowledge, SHARP is the only R-based tool that scales to clustering 10 million cells. With an avalanche of single cells from different tissues to be sequenced in multiple international projects such as the Human Cell Atlas, we believe SHARP will serve as a useful and important tool for large-scale single-cell data analysis.
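
For readers unfamiliar with the distance-preservation claim, the following is a minimal illustrative sketch of sparse random projection in general. It is not SHARP's implementation (SHARP is an R tool), and every name in it (sparse_projection_matrix, project, distance_ratios), the sparsity parameter s=3, and the toy Gaussian data are assumptions made for illustration only. It projects toy "cells" from many dimensions down to fewer and reports how well pairwise Euclidean distances survive.

```python
import math
import random


def sparse_projection_matrix(d, k, s=3, seed=0):
    """Build a k x d sparse random matrix, stored as one dict per row.

    Each entry is +sqrt(s/k) or -sqrt(s/k) with probability 1/(2s) each,
    and 0 otherwise, so squared lengths are preserved in expectation.
    (Hypothetical helper; s=3 is an assumed, commonly used sparsity.)
    """
    rng = random.Random(seed)
    scale = math.sqrt(s / k)
    matrix = []
    for _ in range(k):
        row = {}
        for j in range(d):
            u = rng.random()
            if u < 1 / (2 * s):
                row[j] = scale
            elif u < 1 / s:
                row[j] = -scale
        matrix.append(row)
    return matrix


def project(matrix, x):
    """Project a d-dimensional vector x down to k dimensions."""
    return [sum(w * x[j] for j, w in row.items()) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def distance_ratios(cells, k, s=3, seed=0):
    """Return projected/original distance ratios for every pair of cells."""
    matrix = sparse_projection_matrix(len(cells[0]), k, s, seed)
    projected = [project(matrix, c) for c in cells]
    ratios = []
    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            original = euclidean(cells[i], cells[j])
            ratios.append(euclidean(projected[i], projected[j]) / original)
    return ratios


if __name__ == "__main__":
    # Toy data: 20 "cells" with 500 "genes" each (random, not real scRNA-seq).
    rng = random.Random(1)
    cells = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(20)]
    ratios = distance_ratios(cells, k=200)
    print("min ratio:", min(ratios))
    print("max ratio:", max(ratios))
```

Ratios close to 1 indicate that cell-to-cell distances are approximately preserved after the reduction. Because most matrix entries are zero, the projection is cheap to compute, which is one reason this family of methods is attractive at large scale.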


About the speaker:

Dr. Wan Shibiao

Bioinformatics Research Scientist at St. Jude Children's Research Hospital; Postdoc at University of Pennsylvania; Postdoc at Princeton University; PhD in Machine Learning and Bioinformatics from Hong Kong Polytechnic University