CancerSubtypes: an R/Bioconductor package for molecular cancer subtype identification, validation and visualization
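Introduction
The package was published in 2017, which is a while ago, but the subtyping methods it covers are classics.
Background: CancerSubtypes is an R package for cancer subtyping that implements well-known cancer subtype discovery methods behind a common input and output format.
Methods included: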
- Consensus clustering (CC) (Monti et al., 2003) is an unsupervised clustering method that is frequently used and has several successful applications in cancer subtype discovery.
- Consensus non-negative matrix factorization (CNMF) (Brunet et al., 2004) is an effective dimension reduction method for finding molecular patterns in high-dimensional datasets.
- Integrative clustering (iCluster) (Shen et al., 2009) uses a joint latent variable model for integrative clustering of multi-omics data.
- Similarity network fusion (SNF) (Wang et al., 2014) aggregates multi-omics data by fusing patient similarity networks in order to discover similarities between patients.
- SNF-CC, a new method proposed by the package authors, combines SNF and CC to take advantage of both for cancer subtype identification.
- Weighted SNF (WSNF) (Xu et al., 2016) is similar to SNF but takes the importance of each gene into account. The gene weights are calculated from the number of links each gene has in the miRNA-Transcription Factor-mRNA regulatory network.
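To make the idea behind consensus clustering concrete, here is a minimal base-R sketch, not the package's implementation. It repeatedly subsamples the samples, clusters each subsample hierarchically on a Pearson-correlation distance, and records how often each pair of samples ends up together. The function name `consensus_cluster`, its parameters, and the toy data are all assumptions made for illustration.

```r
# Minimal consensus clustering sketch (base R only; illustrative, not the
# CancerSubtypes implementation). Rows of `x` are samples, columns are features.
consensus_cluster <- function(x, k, reps = 50, p_item = 0.8) {
  n <- nrow(x)
  together <- matrix(0, n, n)  # times a pair fell into the same cluster
  sampled  <- matrix(0, n, n)  # times a pair was drawn in the same subsample
  for (r in seq_len(reps)) {
    idx <- sort(sample(n, size = floor(p_item * n)))
    # Pearson-correlation distance between samples, then average-linkage HC
    d  <- as.dist(1 - cor(t(x[idx, , drop = FALSE])))
    cl <- cutree(hclust(d, method = "average"), k = k)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==") * 1
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  # Consensus index: fraction of co-sampled runs in which a pair co-clustered
  consensus <- ifelse(sampled > 0, together / sampled, 0)
  diag(consensus) <- 1
  # Final labels come from clustering the consensus matrix itself
  final <- cutree(hclust(as.dist(1 - consensus), method = "average"), k = k)
  list(group = final, consensus = consensus)
}

# Toy data (hypothetical): two groups of 10 samples with opposite profiles
set.seed(1)
profile_a <- c(rep(3, 10), rep(0, 10))
profile_b <- c(rep(0, 10), rep(3, 10))
x <- rbind(
  t(replicate(10, profile_a + rnorm(20))),
  t(replicate(10, profile_b + rnorm(20)))
)
res <- consensus_cluster(x, k = 2)
table(res$group, rep(c("A", "B"), each = 10))
```

Pairs that co-cluster in nearly every subsample get a consensus value close to 1, so a clean block structure in `res$consensus` suggests a stable choice of `k`.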
Validation and visualization methods provided
Survival analysis, differential expression, silhouette width, and so on.
- Statistical significance of clustering (SigClust) (Liu et al., 2008) tests the significance of the difference in data distribution between subtypes.
- Silhouette width (Rousseeuw, 1987) measures how well a sample matches its identified subtype compared to other subtypes. A high silhouette value indicates that the sample is well matched.
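As a rough illustration of the silhouette width, the sketch below computes the standard distance-based definition s(i) = (b - a) / max(a, b) in base R, where a is the mean distance from sample i to its own cluster and b is the mean distance to the nearest other cluster. The helper `silhouette_width` and the toy points are assumptions for illustration. The package itself works on a similarity matrix, so its numbers will not match this sketch exactly.

```r
# Distance-based silhouette width per sample (base R; illustrative helper,
# not a CancerSubtypes function).
silhouette_width <- function(d, group) {
  d <- as.matrix(d)
  clusters <- unique(group)
  stopifnot(length(clusters) >= 2, nrow(d) == length(group))
  sapply(seq_len(nrow(d)), function(i) {
    own <- group == group[i]
    own[i] <- FALSE                 # exclude the sample itself
    if (!any(own)) return(0)        # singleton cluster: s(i) = 0 by convention
    a <- mean(d[i, own])            # mean distance to own cluster
    others <- setdiff(clusters, group[i])
    # mean distance to the nearest other cluster
    b <- min(sapply(others, function(g) mean(d[i, group == g])))
    (b - a) / max(a, b)
  })
}

# Toy data (hypothetical): two well-separated 2-D clusters
set.seed(1)
pts <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
             matrix(rnorm(20, mean = 8), ncol = 2))
group <- rep(1:2, each = 10)
sil <- silhouette_width(dist(pts), group)
mean(sil)                                        # high for a good partition

# Deliberately scrambled labels give a much lower average width
sil_bad <- silhouette_width(dist(pts), rep(1:2, times = 10))
mean(sil_bad)
```

Comparing the average width across candidate partitions, as in the last two lines, is one common way to judge which subtype assignment fits the data better.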
GitHub and user manual
taoshengxu/CancerSubtypes (github.com)
Installation
devtools::install_github("taoshengxu/CancerSubtypes")
The paper's supplementary file provides detailed usage code: bioinformatics_33_19_3131_s2.pdf (silverchair-cdn.com)
For example, consensus clustering:
library(CancerSubtypes)

# Load the GBM gene expression, miRNA expression, and clinical data
load("GBM_GeneEXp.rda")
load("GBM_miRNA_8x15k.rda")
load("GBM_clinical.rda")

## The input dataset is multi-genomics data as a list
GBM=list(GeneExp=GBM_GeneEXp,miRNAExp=GBM_miRNA_8x15k)

# Consensus clustering with hierarchical clustering on Pearson distance
result8=ExecuteCC(clusterNum=3,d=GBM,maxK=3,clusterAlg="hc",distance="pearson",title="GBM")
group=result8$group
distanceMatrix=result8$distanceMatrix

# Survival analysis of the identified subtypes
p_value=survAnalysis(mainTitle="GBM Consensus Clustering-Cluster=3",GBM_clinical$time,GBM_clinical$status,group,distanceMatrix=distanceMatrix,similarity=TRUE)
## *****************************************************
## GBM Consensus Clustering-Cluster=3 Cluster= 3 Call:
## survdiff(formula = Surv(time, status) ~ group)
##
##           N Observed Expected (O-E)^2/E (O-E)^2/V
## group=1  58       56     65.5    1.3755     2.112
## group=2 214      161    152.0    0.5320     1.848
## group=3   4        3      2.5    0.0996     0.101
##
## Chisq= 2.2 on 2 degrees of freedom, p= 0.339
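The output above comes from a `survdiff` call, that is, a log-rank test across the subtypes. For readers without the GBM files, the same kind of test can be sketched directly with the `survival` package on simulated data. Everything below (sample size, hazard rates, censoring probability) is made up for illustration and has nothing to do with the GBM results.

```r
library(survival)

# Simulated data (hypothetical): three groups, group 3 with a higher hazard
set.seed(1)
n <- 90
group  <- rep(1:3, each = 30)
time   <- rexp(n, rate = c(0.05, 0.05, 0.15)[group])
status <- rbinom(n, size = 1, prob = 0.8)   # 1 = event observed, 0 = censored

# Log-rank test of equal survival across groups
fit <- survdiff(Surv(time, status) ~ group)
print(fit)

# p-value from the chi-square statistic with (number of groups - 1) df
p_value <- pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
p_value

# The (O-E)^2/E column of the printed table, recomputed by hand
oe_term <- (fit$obs - fit$exp)^2 / fit$exp
oe_term
```

In the printed table the observed and expected event counts always sum to the same total, which is a quick sanity check when reading output like the GBM table above (56 + 161 + 3 = 65.5 + 152.0 + 2.5 = 220).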
References:
1: CancerSubtypes: an R/Bioconductor package for molecular cancer subtype identification, validation and visualization
2: Every cancer has its own subtypes - Zhihu (zhihu.com)