An innovative computational tool, scSemiProfiler, makes powerful single-cell sequencing technology more accessible for health research

Single-cell sequencing is a breakthrough technology in biological research and personalized medicine. It can provide information at the individual cell level, allowing for increased understanding of cellular complexity, as well as identification and characterization of cellular subpopulations in patient samples, biomarker discovery and personalized therapies. However, the prohibitive costs of traditional methodologies substantially limit the application of this technology in health research and personalized medicine, particularly in large-scale studies.

Researcher Jun Ding, PhD, who holds a prestigious FRQS award in Artificial Intelligence and Health, is changing this paradigm. With his team at the Research Institute of the McGill University Health Centre (RI-MUHC), he has led the development of an innovative computational tool known as scSemiProfiler. This new tool, described in a recent Nature Communications publication and highlighted by the editors as among the 50 best papers in the field, will make single-cell sequencing technology more accessible for research. It combines deep generative artificial intelligence (AI) and active learning to create detailed single-cell data profiles, achieving high-quality results at just a fraction of the cost of traditional single-cell sequencing.

Jun Ding (left) conducts research in the Translational Research in Respiratory Diseases Program at the RI-MUHC. PhD candidate Jingtao Wang (centre) is first author on the publication in Nature Communications.

“This breakthrough tool makes it feasible to extend single-cell sequencing to broader research applications and complex disease cohorts,” says Prof. Ding, a junior scientist in the Translational Research in Respiratory Diseases Program at the RI-MUHC and Assistant Professor in the Department of Medicine at McGill University. “According to 2023 estimates from the McGill University Health Centre, sequencing 20,000 cells can cost approximately $6,000, making it impractical for large-scale research projects. But with scSemiProfiler, we can change that.”

Developing a “semi-profiling” approach

Until now, researchers have used different methods to draw inferences from more affordable bulk sequencing data. While useful, these methods lack the resolution needed for single-cell-level analyses, which are crucial for understanding complex diseases.

In contrast, the RI-MUHC team designed a method to “semi-profile” disease cohorts at the single-cell level accurately and efficiently. The scSemiProfiler tool combines bulk data with single-cell templates from representative samples. The researchers have shown that it produces datasets that are highly accurate and consistent with real, fully profiled data, so a study can draw on inexpensive bulk data and still obtain single-cell data at a lower cost.

“We can broadly categorize the entire framework into two major parts, both of which are critical for delivering accurate and effective semi-profiling,” says Jingtao Wang, PhD candidate in the Computational Biology program in Experimental Medicine at McGill University’s Faculty of Medicine and Health Sciences and first author on the publication. “The first part is what we call ‘representative selection,’ in which we choose the best representative single-cell samples. The second part is called ‘in silico inference,’ which refers to the process of inferring the target single-cell data from the bulk.”
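To make the idea of representative selection concrete, here is a minimal Python sketch. It is not the scSemiProfiler code: it assumes, for illustration only, that each sample is described by a bulk expression vector, groups the samples with a small k-means routine, and picks the sample closest to each group centre as the one to profile at single-cell resolution. The function name `select_representatives` and the toy data are hypothetical, and the published tool additionally uses active learning, which this sketch does not attempt.

```python
import numpy as np

def select_representatives(bulk, n_reps, n_iter=20):
    """Toy representative selection (illustrative assumption, not the tool's API).

    bulk   : (n_samples, n_genes) array of bulk expression profiles
    n_reps : number of samples to send for real single-cell sequencing
    Returns (reps, labels): the index of the chosen sample for each group,
    and the group each sample was assigned to.
    """
    bulk = np.asarray(bulk, dtype=float)

    # Farthest-point initialisation: deterministic, spreads the centres out.
    chosen = [0]
    while len(chosen) < n_reps:
        d = np.linalg.norm(bulk[:, None, :] - bulk[chosen][None, :, :], axis=2)
        chosen.append(int(d.min(axis=1).argmax()))
    centroids = bulk[chosen].copy()

    # Plain k-means updates on the bulk profiles.
    for _ in range(n_iter):
        dists = np.linalg.norm(bulk[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_reps):
            if np.any(labels == k):
                centroids[k] = bulk[labels == k].mean(axis=0)

    dists = np.linalg.norm(bulk[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)   # group of every sample
    reps = dists.argmin(axis=0)     # sample nearest to each group centre
    return reps, labels

# Hypothetical cohort: six samples, three genes, two obvious groups.
toy_bulk = np.array([
    [0, 0, 0], [1, 0, 0], [0, 1, 0],
    [10, 10, 10], [11, 10, 10], [10, 11, 10],
], dtype=float)
reps, labels = select_representatives(toy_bulk, n_reps=2)
print("representative samples:", reps.tolist())
print("group of each sample:", labels.tolist())
```

In this toy setting, only the chosen representatives would be sequenced cell by cell; every other sample would rely on its bulk profile and on the representative of its group.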

The deep generative learning model first reconstructs the single-cell reference data, then introduces the target sample’s information by requiring the generated single-cell data to have an average value similar to the target sample’s bulk data.
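The bulk-matching constraint described above can be illustrated with a short Python sketch. This is a deliberately simplified stand-in, not the authors' model: in place of a deep generative network it adjusts the reference cells directly by gradient descent. The loss has two terms, one asking the average of the generated cells (the "pseudobulk") to match the target sample's bulk profile, and one keeping the generated cells close to the reference. The function names, the weight `lam`, the step size and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def bulk_matching_loss(cells, reference, target_bulk, lam=0.1):
    """Squared gap between pseudobulk and target bulk, plus a stay-close penalty."""
    pseudobulk = cells.mean(axis=0)
    bulk_term = np.sum((pseudobulk - target_bulk) ** 2)
    stay_close = lam * np.mean(np.sum((cells - reference) ** 2, axis=1))
    return bulk_term + stay_close

def infer_cells(reference, target_bulk, lam=0.1, lr=0.2, n_steps=200):
    """Toy 'in silico inference': nudge reference cells toward the target bulk."""
    reference = np.asarray(reference, dtype=float)
    cells = reference.copy()
    for _ in range(n_steps):
        pseudobulk = cells.mean(axis=0)
        # Gradient of the loss for each cell, multiplied by the number of
        # cells so that the step size does not depend on the cell count.
        grad = 2 * (pseudobulk - target_bulk) + 2 * lam * (cells - reference)
        cells -= lr * grad
    return cells

# Hypothetical data: 50 reference cells, 4 genes, and a shifted target bulk.
rng = np.random.default_rng(0)
reference = rng.poisson(5.0, size=(50, 4)).astype(float)
target_bulk = reference.mean(axis=0) + np.array([2.0, -1.0, 0.5, 0.0])

generated = infer_cells(reference, target_bulk)
gap_before = np.linalg.norm(reference.mean(axis=0) - target_bulk)
gap_after = np.linalg.norm(generated.mean(axis=0) - target_bulk)
print("gap to target bulk before:", gap_before)
print("gap to target bulk after: ", gap_after)
```

The `lam` weight controls the trade-off: a larger value keeps the generated cells nearer to the reference, while a smaller one lets their average move closer to the target bulk.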

“Single-cell sequencing is a crucial tool for dissecting the cellular intricacies of complex diseases,” says Prof. Ding. “We are pleased that our tool may circumvent its prohibitive cost, particularly for expansive biomedical studies.”

The next steps for the work are the improvement, maintenance and democratization of this computational tool, say the researchers. They have already made the scSemiProfiler tool accessible to the research community on the GitHub platform. Next, a cloud service will be established to facilitate adoption of the tool by researchers who lack access to extensive computational resources.

“We are really honoured that the editors at Nature Communications have decided to feature our publication about scSemiProfiler in their recent Editors’ Highlights section,” adds Prof. Ding. “This recognition underscores the importance of cost-effective tools in advancing research on the cellular complexities of complex diseases.”

The researchers gratefully acknowledge funding support from the Meakins-Christie Laboratories, via the Research Chair in Respiratory Research awarded to Jun Ding, as well as grants from the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council (NSERC) and the Fonds de recherche du Québec – Santé (FRQS).

The researchers heartily thank the Information Technology (IT) services team at the RI-MUHC for technical and logistical support.

About the publication

Wang, J., Fonseca, G.J. & Ding, J. scSemiProfiler: Advancing large-scale single-cell studies through semi-profiling with deep generative models and active learning. Nat Commun 15, 5989 (2024). https://www.nature.com/articles/s41467-024-50150-1