Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data
Affiliation
School of Information Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
Issue Date
2020
Abstract
Single-cell protein abundance is a fundamental type of information for characterizing cell states. Due to high cost and technical barriers, however, direct quantification of proteins is difficult. Single-cell RNA sequencing (scRNA-seq) data, which serve as a cost-effective substitute for single-cell proteomics, may not accurately reflect protein expression levels because of measurement error, noise, and post-transcriptional and translational regulation. Recently emerging single-cell multimodal omics technologies, e.g. CITE-seq and REAP-seq, can simultaneously profile RNA and protein abundances in single cells, providing labeled data for predictive modeling in a supervised learning framework. A deep neural network-based transfer learning method has been applied to the imputation of surface protein abundance from single-cell transcriptomic data. However, it is unclear whether an artificial neural network is the best model, and it is desirable to improve the prediction performance (e.g. accuracy, interpretability) of machine learning models. In this paper, we compared several tree-based ensemble learning methods with neural network models and found that ensemble learning often performed better than neural networks, with Random Forest (RF) performing best overall. Moreover, we used the feature importance scores from RF to interpret the biological mechanisms underlying the prediction. Our study demonstrates the effectiveness of ensemble learning for reliable protein abundance prediction using single-cell multimodal omics data, and paves the way for knowledge discovery by mining single-cell multi-omics data at large scale.
Citation
Xu F, Wang S, Dai X, Mundra PA, Zheng J. Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data. Methods. 2020.
Journal
Methods
DOI
10.1016/j.ymeth.2020.10.001
PubMed ID
33039573
Additional Links
https://dx.doi.org/10.1016/j.ymeth.2020.10.001
Type
Article
Related articles
- Surface protein imputation from single cell transcriptomes by deep neural networks.
- Authors: Zhou Z, Ye C, Wang J, Zhang NR
- Issue date: 2020 Jan 31
- A multi-use deep learning method for CITE-seq and single-cell RNA-seq data integration with cell surface protein prediction and imputation.
- Authors: Lakkis J, Schroeder A, Su K, Lee MYY, Bashore AC, Reilly MP, Li M
- Issue date: 2022 Nov
- scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data.
- Authors: Ye W, Ji G, Ye P, Long Y, Xiao X, Li S, Su Y, Wu X
- Issue date: 2019 May 8
- A hybrid deep clustering approach for robust cell type profiling using single-cell RNA-seq data.
- Authors: Srinivasan S, Leshchyk A, Johnson NT, Korkin D
- Issue date: 2020 Oct
- Accurate Single-Cell Clustering through Ensemble Similarity Learning.
- Authors: Jeong H, Shin S, Yeom HG
- Issue date: 2021 Oct 22