Novel evaluation approach for molecular signature-based deconvolution methods (2023)


Since the therapeutic immuno-oncological approaches for treating tumors emerged as a solid way to eliminate cancer cells, there are concerns about why there is a considerable number of patients that do not benefit from such treatments. The interplay between tumor cells and the surrounding immune cells (i.e the Tumor Immune Microenvironment or TIME) has been largely acknowledged as a main determinant of tumor fate during treatments such as checkpoint blockade therapy and other therapeutic approaches that lead to immune activation. [1], [2], [3]. Despite flow cytometry [4] or single cell sequencing approaches being accepted as the preferred methods to interrogate the TIME, both are usually impractical to be massively applied on large cohorts. However, current availability of gene expression profiles from bulk tumor tissues on large data sets, allow catching up on the content and characteristics of the TIME by means of computational tumor immune cell deconvolution methods (DM) that, by predicting cell type content and proportions, will allow finding TIME association with prognosis and elucidating the intrinsic and acquired mechanism of resistance to immunotherapy [5].

Several computational approaches had been proposed to evaluate the TIME, each one with its own strengths and weaknesses. They can be divided into two groups, reference-free and reference-based models [6]. Reference-free models can be useful in situations when the cell types present in the samples to be quantified are unknown to the user, or when the cell types present in the samples have not been previously characterized [6]. On the other hand, reference-based models interrogate the presence/absence of specific cell types on well-studied tissues or tumors to further evaluate their cell-type diversity and/or proportions association with prognosis, or other important clinical factors. In this work, the focus is set on the latter ones, where for certain deconvolution algorithms it is assumed that the cell-type composition of a bulk sample can be inferred as a linear combination of the gene expression profiles (as suggested by Zhong et al.[7]). Such profiles are stored in a molecular signature (MS) matrix as specific cell-type transcriptomic templates, taken as a reference to identify each cell-type content on bulk samples by DMs.

The linear deconvolution of the cell types present in an MS, holding N genes for k cell types (MSNxk), suspected to be the components of a mixture of cell types present in a tissue gene expression profile (Y), involves solving the following regression model equation, Y=MSB, where the proportions for all cell types in the admixture sample is represented by the column vector B (the vector of “k” regression coefficients) which should satisfy both the non-negativity B=β1,,βk,βj0j=1...k and sum-to-one constraints i=1kβi=1. In order to preserve the linear relationship of the observed gene expression profile and the gene signature matrix, this equation should be addressed with non-log RNA-Seq data or one-color microarray data [7]. As a result, the DM provides an estimate of the cell type proportions as a B^ vector (where “^” means estimated). In this sense, every DM output is composed by two terms: i) the cell type identification (i.e. those cell types for which β^j>0 represents the detected cell types in a sample) and ii) their corresponding proportion values (i.e. β^j).

When analyzing cell type proportion prediction accuracy, current evaluations and comparisons were made using metrics like i) the correlation coefficient (ρ), ii) the goodness of fit or coefficient of determination from a regression technique (R2) and/or iii) the root mean squared error (RMSE) or mean absolute deviation (mAD), involving their predictions and the ground truth or, alternatively, the predicted proportions estimated by a competing method. However, such evaluation values may provide wrong clues about proportion prediction accuracy and bias since they only evaluate the linear association of the estimated proportion related to the expected one (i.e. the overall error), thus not giving any insight about how the error varies across different cell proportion scenarios (prediction-dependent bias trend).

In particular, correlation refers to the presence of a relationship between two different variables (measures the strength of the relation), whereas agreement is defined as the concordance between two measurements of one variable [8], [9], [10]). Highly correlated variables may have a low level of agreement and show significant systematic or prediction-dependent error bias with an ascending trend (see [9], [10]). Correlation analysis may result appropriate for comparing score-based methods like xCell [11] or imsig [12] where proportion estimates errors can not be evaluated, thus being advised for cell-type content analysis of score-based TIME predictors. On the other hand, the coefficient of determination, R2, only gives information about the proportion of variance in common with the ground truth or, alternatively, a gold standard method [13]. RMSE and mAD are averaged measures of error’s standard deviation or the difference between one DM’s predicted proportions against the ground-truth-proportions. The RMSE may be useful when the differences between the ground truth and the predicted value are normally distributed, which may not always be the case, and two methods may have similar RMSE values but still have significant differences in the distribution of their error measurements (for instance, heteroscedasticity or prediction-dependent over/underestimation trends). Despite metrics like mAD can be used when the differences are not normally distributed, they do not provide, as the RMSE, information about the direction (over/underestimation) of the errors, thus not being fully suitable for evaluating the agreement between predicted and true proportions/cell-type coefficients. In any case, correlation, RMSE and mAD are just single values which does not allow the differentiation between systematic and prediction-dependent biased results (i.e. the first means a constant error across prediction values, and the second refers to predictions where the error or its variance augments or decreases depending on the predicted value). In order to qualitatively evaluate prediction-dependent patterns, graphical methods are crucial to provide information about the magnitude and direction of the differences between the prediction and the ground truth, helping to identify any kind of bias or variance variations.

Even though these metrics may still be used to evaluate MS-DM pairs, they are not sufficient to completely assess their performance nor to compare two or more of them. Any benchmarking protocol should necessarily include metrics and methods to explore the accuracy of a MS-DM pair estimations, not only comparing the averaged error between them or how their results correlate with the expected values, but measuring and displaying how the error evolves across different scenarios, if it has a homogeneous distribution, and their efficiency in detecting the presence of the questioned cell-types in the tumor sample.

In this sense, current proposed benchmark studies do not evaluate the cell-type identification abilities of each tested DM. This capability must be assessed by comparing the number of positively identified cell types with the true cell types present in the sample. Regarding this, measuring how accurately a DM detects which cell types are present in a sample is crucial since i) the β^ coefficients that any DM estimates are constrained by the number of cell types identified at each sample and ii) they may provide cell types which are not truly present in the TIME. Consequently, over or under cell-type detection will lead to a biased estimation of β^ coefficients (i.e predicted proportions).

In essence, the objective of any MS-DM pair is twofold: it should provide accurate cell type identification (low false discovery rates) present in the admixture sample and consequently provide unbiased β^ coefficients. So, any benchmark protocol should appropriately evaluate these two MS-DM pair objectives defining appropriate evaluation scenarios, metrics and diagnostic plots as proposed in this work.

Independent of the evaluation scenario and/or evaluated feature (i.e. data scaling/normalization, distribution of generated noise, background prediction, minimal fraction detection, etc. [14], [15]), unbiased and objective metrics and methods are required to fairly benchmark MS-DM pairs. In this work, we propose a protocol and appropriate metrics to evaluate the proportion estimation accuracy and the cell-type detection capabilities of each MS-DM pair by defining true and false positive/negative predictions.

In order to evaluate overall estimation accuracy, an enhanced Bland-Altman method [13], [16] is proposed, providing a graphical support (diagnostic plots) and metrics for agreement evaluation. To perform a case-specific analysis, by defining which β^ coefficients corresponded to cell types that were or were not present in a sample, the range of falsely detected β^ coefficients can be evaluated and compared between DMs.

Cell-type detection performance can be evaluated after analyzing every tested DM’s output as a “binary” classification result, and then calculating classification accuracy metrics such as sensitivity (Se), specificity (Sp), positive/negative predictive values (PPV, NPV), distance to the optimal ROC point (DOP, evaluates detection capabilities and certainty) [17], F1-score (evaluates the amount and confidence of positive detections) [18], and the error rate (ER). The number of predicted cell types (NCT) detected in each sample across different scenarios must be evaluated too.

By means of these proposed methods and metrics, the accuracy of the predicted proportions, which is affected by false positive/negative cell type detections (using true negative and positive scenarios respectively) can be evaluated.

Section snippets

Proposed benchmark evaluation tests

  • Self-Test: Any DM should accurately estimate the proportions of the expression profiles present in the MS that is paired to it, i.e. solving the regression problem by replacing Y with the columns (i:1..k) of the MS. In this case, it is expected that the solution provides only one coefficient β^i=1 for Y=MS.i where “.i” represents each column i=1..k of the MS. This test allows evaluating false positive detections, their impact on proportion estimates and the effect or robustness to

Self-Test analysis reveals collinearity problems and floating point errors

The estimated vs. true proportion coefficients βi=1&βj=0i,j=1..k&ijY=X.i(i.e. only the main diagonal is expected to be present) are shown in Fig. 1 for all MS-DM pairs. The leftmost panels correspond to the autocorrelation matrix of each MS, where high levels of multicollinearity patterns between the cell-type profiles were devised, showing correlation values of up to 0.9 (see Supplementary Table 1). The lowest correlation between its cell types was found in the murine kidney’s immune cells


Due to the massive availability of tumor gene expression data from both public repositories and from new clinical trials evaluating immunotherapies, the development of new techniques allowing the exploration of the complex interplay between the tumor cells and its immune microenvironment makes it crucial to decipher the mechanism driving therapy success, resistance and prediction response [31]. The deconvolution methods and the molecular signatures for tumoral-sample deconvolution are


Here it is demonstrated that by means of metrics not previously used in the field it is possible to assess the cell-type detection capabilities of each MS-DM pair, and how they play a key role in their proportion estimation accuracy. As a conclusion, we suggest the inclusion of this protocol’s metrics and methods when evaluating/developing MS-DM pairs for a fair and comprehensive comparison.

Statement of significance


This work was supported by the Universidad Católica de Córdoba (80020180100029CC) to E.A.F., Universidad Nacional de Córdoba (33620180100993CB) to E.A.F and the Argentinean Agency for Promotion of Science and Technology (PCE-GLAXO-2020–00004) to G. M.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Recommended articles (6)

  • Research article

    Integration of multiple lineage measurements from the same cell reconstructs parallel tumor evolution

    Cell Genomics, Volume 2, Issue 2, 2022, Article 100096

    Organoid evolution models complemented with integrated single-cell sequencing technology provide a powerful platform to characterize intra-tumor heterogeneity (ITH) and tumor evolution. Here, we conduct a parallel evolution experiment to mimic the tumor evolution process by evolving a colon cancer organoid model over 100 generations, spanning 6months in time. We use single-cell whole-genome sequencing (WGS) in combination with viral lineage tracing at 12 time points to simultaneously monitor clone size, CNV states, SNV states, and viral lineage barcodes for 1,641 single cells. We integrate these measurements to construct clonal evolution trees with high resolution. We characterize the order of events in which chromosomal aberrations occur and identify aberrations that recur multiple times within the same tumor sub-population. We observe recurrent sequential loss of chromosome 4 after loss of chromosome 18 in four unique tumor clones. SNVs and CNVs identified in our organoid experiments are also frequently reported in colorectal carcinoma samples, and out of 334 patients with chromosome 18 loss in a Memorial Sloan Kettering colorectal cancer cohort, 99 (29.6%) also harbor chromosome 4 loss. Our study reconstructs tumor evolution in a colon cancer organoid model at high resolution, demonstrating an approach to identify potentially clinically relevant genomic aberrations in tumor evolution.

  • Research article

    Towards texture accurate slice interpolation of medical images using PixelMiner

    Computers in Biology and Medicine, 2023, Article 106701

    Quantitative image analysis models are used for medical imaging tasks such as registration, classification, object detection, and segmentation. For these models to be capable of making accurate predictions, they need valid and precise information. We propose PixelMiner, a convolution-based deep-learning model for interpolating computed tomography (CT) imaging slices.

    PixelMiner was designed to produce texture-accurate slice interpolations by trading off pixel accuracy for texture accuracy. PixelMiner was trained on a dataset of 7829 CT scans and validated using an external dataset. We demonstrated the model's effectiveness by using the structural similarity index (SSIM), peak signal to noise ratio (PSNR), and the root mean squared error (RMSE) of extracted texture features. Additionally, we developed and used a new metric, the mean squared mapped feature error (MSMFE). The performance of PixelMiner was compared to four other interpolation methods: (tri-)linear, (tri-)cubic, windowed sinc (WS), and nearest neighbor (NN).

    PixelMiner produced texture with a significantly lowest average texture error compared to all other methods with a normalized root mean squared error (NRMSE) of 0.11 (p < .01), and the significantly highest reproducibility with a concordance correlation coefficient (CCC) ≥ 0.85 (p < .01).

    PixelMiner was not only shown to better preserve features but was also validated using an ablation study by removing auto-regression from the model and was shown to improve segmentations on interpolated slices.

  • Research article

    Association of Depression With Selenium Deficiency and Nutritional Markers in the Patients With End-Stage Renal Disease on Hemodialysis

    Journal of Renal Nutrition, Volume 25, Issue 4, 2015, pp. 381-387

    Depression is considered as the most common psychological problem in hemodialysis (HD) patients. As there is little evidence regarding the association of depression with serum selenium level as an antioxidant in these patients, the current survey investigates the possible relationship between depression and nutritional status including serum selenium levels.

    Cross-sectional study.

    A total of 110 HD patients and 40 healthy controls were enrolled in the study. The patients were in the age range of 18 to 85 years, who had been on hemodialysis for at least 3months without any acute illness.

    Beck Depression Inventory was used for assessing the severity of depression. Malnutrition was evaluated through subjective global assessment (SGA) and malnutrition inflammation score (MIS). Serum selenium levels and routine laboratory markers were measured from fasting samples.

    Sixty-two percent of the patients had some degree of depression based on Beck Depression Inventory score. HD patients were considered to be selenium deficient after comparing the mean value of serum selenium between the patients and controls (P<.001). No significant difference was found in serum selenium levels between depressed HD patients and the rest of patients without depression. The mean level of SGA and MIS in the depressed patients was significantly higher than the rest of patients (P=.03 and P=.04, respectively). Also lower levels of hemoglobin and serum albumin were significantly seen in depressed patients compared with nondepressed ones (P=.004 and P=.04, respectively).

    Although the HD patients in this study were selenium deficient, no significant association was found between depression and selenium. In addition, depression was more prevalent in malnourished HD patients with higher SGA and MIS scores and lower serum albumin and hemoglobin levels.

  • Research article

    The electronic density obtained from a QTAIM analysis used as molecular descriptor. A study performed in a new series of DHFR inhibitors

    Journal of Molecular Structure, Volume 1134, 2017, pp. 464-474

    The results reported here indicate that the electron density obtained from a QTAIM analysis is an excellent descriptor of molecular interactions that stabilize and destabilize the formation of the ligand-receptor (L-R) complex. The study was conducted on a series of 25 compounds that have inhibitory effects on DHFR. Besides the synthesis and bioassays performed for some of these compounds, various types of molecular calculations were performed. Thus, we performed MD simulations, computations at different levels of theory (ab initio and DFT) using reduced models and a QTAIM study on the different complexes.

    The resulting model has allowed us to differentiate not only highly active compounds with respect to compounds weakly active, but also among compounds that have similar affinities in this series. The model also showed a high degree of predictability which allows predicting the affinity of non-synthesized compounds. Very important additional information can be obtained through this type of study, it is possible to visualize which amino acids are involved in the interactions determining the different affinities of the ligands.

  • Research article

    A FGFR3-related immune prognostic score model effectively predicts prognosis in bladder cancer

    Computers in Biology and Medicine, 2023, Article 106976

    Immunotherapy and FGFR3-targeted therapy play an important role in the management of locally advanced and metastatic bladder cancer (BLCA). Previous studies indicated that FGFR3 mutation (mFGFR3) may be involved in the alterations of immune infiltration, which may affect the priority or combination of these two treatment regimes. However, the specific impact of mFGFR3 on the immunity and how FGFR3 regulates the immune response in BLCA to affect prognosis remain unclear. In this study, we aimed to elucidate the immune landscape associated with mFGFR3 status in BLCA, screen immune-related gene signatures with prognostic value, and construct and validate a prognostic model.

    ESTIMATE and TIMER were used to assess the immune infiltration within tumors in the TCGA BLCA cohort based on transcriptome data. Further, the mFGFR3 status and mRNA expression profiles were analyzed to identify immune-related genes that were differentially expressed between patients with BLCA with wild-type FGFR3 or mFGFR3 in the TCGA training cohort. An FGFR3-related immune prognostic score (FIPS) model was established in the TCGA training cohort. Furthermore, we validated the prognostic value of FIPS with microarray data in the GEO database and tissue microarray from our center. Multiple fluorescence immunohistochemical analysis was performed to confirm the relationship between FIPS and immune infiltration.

    mFGFR3 resulted in differential immunity in BLCA. In total, 359 immune-related biological processes were enriched in the wild-type FGFR3 group, whereas none were enriched in the mFGFR3 group. FIPS could effectively distinguish high-risk patients with poor prognosis from low-risk patients. The high-risk group was characterized by a higher abundance of neutrophils; macrophages; and follicular helper, CD4, and CD8 T-cells than the low-risk group. In addition, the high-risk group exhibited higher expression of PD-L1, PD-1, CTLA-4, LAG-3, and TIM-3 than the low-risk group, indicating an immune-infiltrated but functionally suppressed immune microenvironment. Furthermore, patients in the high-risk group exhibited a lower mutation rate of FGFR3 than those in the low-risk group.

    FIPS effectively predicted survival in BLCA. Patients with different FIPS exhibited diverse immune infiltration and mFGFR3 status. FIPS might be a promising tool for selecting targeted therapy and immunotherapy for patients with BLCA.

  • Research article

    Cellular and humoral functional responses after BNT162b2 mRNA vaccination differ longitudinally between naive and subjects recovered from COVID-19

    Cell Reports, Volume 38, Issue 2, 2022, Article 110235

    We have analyzed BNT162b2 vaccine-induced immune responses in naive subjects and individuals recovered from coronavirus disease 2019 (COVID-19), both soon after (14days) and later after (almost 8months) vaccination. Plasma spike (S)-specific immunoglobulins peak after one vaccine shot in individuals recovered from COVID-19, while a second dose is needed in naive subjects, although the latter group shows reduced levels all along the analyzed period. Despite how the neutralization capacity against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) mirrors this behavior early after vaccination, both groups show comparable neutralizing antibodies and S-specific B cell levels late post-vaccination. When studying cellular responses, naive individuals exhibit higher SARS-CoV-2-specific cytokine production, CD4+ Tcell activation, and proliferation than do individuals recovered from COVID-19, with patent inverse correlations between humoral and cellular variables early post-vaccination. However, almost 8months post-vaccination, SARS-CoV-2-specific responses are comparable between both groups. Our data indicate that a previous history of COVID-19 differentially determines the functional T and B cell-mediated responses to BNT162b2 vaccination over time.

© 2023 Elsevier Inc. All rights reserved.

Top Articles
Latest Posts
Article information

Author: Cheryll Lueilwitz

Last Updated: 12/01/2023

Views: 6294

Rating: 4.3 / 5 (74 voted)

Reviews: 89% of readers found this page helpful

Author information

Name: Cheryll Lueilwitz

Birthday: 1997-12-23

Address: 4653 O'Kon Hill, Lake Juanstad, AR 65469

Phone: +494124489301

Job: Marketing Representative

Hobby: Reading, Ice skating, Foraging, BASE jumping, Hiking, Skateboarding, Kayaking

Introduction: My name is Cheryll Lueilwitz, I am a sparkling, clean, super, lucky, joyous, outstanding, lucky person who loves writing and wants to share my knowledge and understanding with you.