Reading Week Lecture 2: Application Prospects of Machine Learning and Data Science in Biomedicine and Cancer Treatment

On the morning of April 7th, Dr. Shibiao Wan of St. Jude Children’s Research Hospital delivered a lecture titled "The Application Prospect of Machine Learning and Data Science in Biomedical and Cancer Treatment" on the UIC campus. Before the lecture, Dr. Junyi Chai, Program Director of e-Business Management and Information Systems, gave a welcome speech.



Dr. Wan opened by introducing the application of machine learning to big data from the human genome and NGS (Next Generation Sequencing). He outlined three machine learning paradigms: supervised learning, semi-supervised learning and unsupervised learning. Before explaining his research topics in detail, he introduced the research background and methodology of the NGS area.


He then presented two topics in the NGS area. The first was Unsupervised Learning for Transcriptomics Data, aimed at single-cell data analysis. Here Dr. Wan introduced transcriptomics data, gene expression, SHARP (Single-Cell RNA-Seq Hyper-Fast and Accurate Clustering via Ensemble Random Projection) and its framework, random projection, and clustering. His conclusions for this topic were: 1. Unsupervised learning is useful for exploratory data analysis; 2. SHARP is applicable to large-scale, high-dimensional biological data such as single-cell RNA-seq data; 3. Random projection can be an alternative to PCA for reducing the dimension of high-dimensional biological data; 4. Ensemble learning can yield robust clustering performance.
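The idea behind points 3 and 4 can be illustrated with a small sketch. This is not the SHARP implementation: it is a toy example, assuming synthetic "cells" drawn from two Gaussian groups, a hand-written 2-means in place of SHARP's own clustering step, and a simple majority vote in place of its ensemble scheme. All function names and parameter values here are hypothetical.

```python
import math
import random


def random_projection(rows, k, seed):
    """Map d-dimensional rows to k dimensions with a Gaussian random matrix."""
    rng = random.Random(seed)
    d = len(rows[0])
    scale = 1.0 / math.sqrt(k)  # keeps expected squared distances unchanged
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [
        [sum(row[i] * matrix[i][j] for i in range(d)) for j in range(k)]
        for row in rows
    ]


def two_means(rows, iters=10):
    """Toy 2-means, seeded with the first row and the row farthest from it."""
    centers = [rows[0], max(rows, key=lambda r: math.dist(r, rows[0]))]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [
            0 if math.dist(r, centers[0]) <= math.dist(r, centers[1]) else 1
            for r in rows
        ]
        for c in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_labels(rows, k, runs):
    """Cluster several random projections and combine them by majority vote."""
    n = len(rows)
    together = [0] * n  # runs in which row b shares a cluster with row 0
    for seed in range(runs):
        labels = two_means(random_projection(rows, k, seed))
        for b in range(n):
            together[b] += labels[b] == labels[0]
    return [0 if 2 * together[b] > runs else 1 for b in range(n)]


# Synthetic data (an assumption for illustration): 20 cells x 200 genes,
# two groups whose mean expression differs.
data_rng = random.Random(42)
genes = 200
group_a = [[data_rng.gauss(0.0, 1.0) for _ in range(genes)] for _ in range(10)]
group_b = [[data_rng.gauss(5.0, 1.0) for _ in range(genes)] for _ in range(10)]
cells = group_a + group_b

labels = consensus_labels(cells, k=20, runs=5)
print(labels)
```

Each run reduces 200 dimensions to 20 without computing any eigenvectors, which is what makes random projection cheap compared with PCA. Because any single projection is noisy, the vote across several runs is what gives the more stable grouping.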


The second topic was (Semi-)Supervised Learning for Proteomics Data. Dr. Wan explained how machine learning, and semi-supervised learning in particular, can be used to mine and process data on proteins and their subcellular locations, and introduced two models, PseAA (Pseudo Amino Acid Composition) and PA (Profile Alignment). His conclusions for this topic were: 1. Semi-supervised learning is useful when labelled data are limited; 2. Semi-supervised learning can leverage information from both annotated and unannotated data; 3. Ensemble features perform better than individual features; 4. Evolutionary features (PA) and sequence features (PseAA) are complementary to each other.
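Points 1 and 2 can be made concrete with a minimal sketch of self-training, one common semi-supervised scheme. The report does not say which scheme Dr. Wan used, so this is an illustration only. The one-dimensional feature values, the two location labels, the nearest-centroid classifier and the `margin` threshold are all assumptions chosen to keep the example small.

```python
import math


def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def predict(centers, x):
    """Return the label whose centroid is closest to x."""
    return min(centers, key=lambda lab: math.dist(x, centers[lab]))


def self_train(labelled, unlabelled, rounds=5, margin=0.5):
    """Grow the labelled set with confidently pseudo-labelled points."""
    labelled = {lab: list(pts) for lab, pts in labelled.items()}
    pool = list(unlabelled)
    for _ in range(rounds):
        centers = {lab: centroid(pts) for lab, pts in labelled.items()}
        remaining = []
        for x in pool:
            ranked = sorted(centers, key=lambda lab: math.dist(x, centers[lab]))
            best, second = ranked[0], ranked[1]
            gap = math.dist(x, centers[second]) - math.dist(x, centers[best])
            if gap >= margin:
                labelled[best].append(x)  # confident: adopt the pseudo-label
            else:
                remaining.append(x)  # ambiguous: retry in the next round
        if len(remaining) == len(pool):
            break  # nothing new was labelled, so stop early
        pool = remaining
    return {lab: centroid(pts) for lab, pts in labelled.items()}


# Hypothetical toy data: one annotated protein per location, six unannotated.
annotated = {"nucleus": [[0.0]], "cytoplasm": [[5.0]]}
unannotated = [[1.0], [2.0], [2.4], [6.0], [7.0], [8.0]]

supervised = {lab: centroid(pts) for lab, pts in annotated.items()}
semi_supervised = self_train(annotated, unannotated)

query = [3.0]  # assumed to lie in the nucleus group of this toy data
print(predict(supervised, query), predict(semi_supervised, query))
```

With only the two annotated points, the decision boundary sits halfway between them and the query falls on the cytoplasm side. Once the unannotated points are absorbed, both centroids shift towards the bulk of their groups and the query is assigned to the nucleus group instead.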


The Lecture


In the question-and-answer session, the audience was very active, asking questions both online and in person, and Dr. Wan answered them patiently. The discussion covered topics including SHARP, clustering analysis, and COVID-19 vaccines, and the atmosphere was lively and positive.



About the Speaker


Dr. Wan Shibiao

Bioinformatics Research Scientist at St. Jude Children's Research Hospital; Postdoc at University of Pennsylvania; Postdoc at Princeton University; PhD in Machine Learning and Bioinformatics, Hong Kong Polytechnic University