close

前幾期推文介紹了Seurat官網的CCA和RPCA方法進行單細胞多樣本整合,詳見

單細胞多樣本整合和插槽選擇(一)
單細胞多樣本整合之RPCA

實際上,除了這兩種外,整合去批次的方法還有很多。例如2020年的Genome Biol《A benchmark of batch-effect correction methods for single-cell RNA sequencing data》測評了14種方法,最後的結論是:

Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.

2021年Nat Biotechnol也有一篇測評文章《A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples》對目前主流的單細胞整合方法做了測評和總結,大致結論如下:

可以看到Harmony依舊是能打。Harmony發表在2019年的nature methods上《Fast, sensitive and accurate integration of single-cell data with Harmony》,相較於別的整合方法,其優點在於:

運行速度比較快;
節省內存;
不同於RPCA,Harmony可用於跨平台、跨組織的多樣本整合。

下圖是同一個數據集四種整合方法所占的內存情況:

本文主要介紹Harmony和LIGER的代碼實戰。最後,使用LISI指數評價CCA,RPCA,Harmony和LIGER的整合效果,可以看到Harmony和LEGER的整合效果更佳。LISI結果如下:

一. Harmony實戰

R包需提前安裝好:

rm(list=ls())#devtools::install_github('MacoskoLab/liger')#devtools::install_github('immunogenomics/harmony')library(Seurat)library(dplyr)library(patchwork)library(readr)library(ggplot2)library(RColorBrewer)library(future)library(clustree)library(cowplot)library(stringr)library(SeuratDisk)library(SeuratWrappers)library(harmony)library(rliger)library(reshape2)#changethecurrentplantoaccessparallelizationplan("multisession",workers=2)plan()#設置可用的內存options(future.globals.maxSize=10*1024^3)#installdatasetInstallData("ifnb")####1.loaddatasetifnb.data=LoadData("ifnb")####2.normalize/HVG/scale三步走ifnb.data<-ifnb.data%>%NormalizeData(verbose=F)%>%FindVariableFeatures(selection.method="vst",nfeatures=2000,verbose=F)%>%ScaleData(verbose=F)%>%RunPCA(npcs=30,verbose=F)####3.harmonyifnb.data<-ifnb.data%>%RunHarmony("orig.ident",plot_convergence=T)#Checkthegeneratedembeddings:harmony_embeddings<-Embeddings(ifnb.data,'harmony')harmony_embeddings[1:5,1:5]#### 4.降維聚類:進行 UMAP 和 clustering:n.pcs=20ifnb.data<-ifnb.data%>%RunUMAP(reduction="harmony",dims=1:n.pcs,verbose=F)%>%RunTSNE(reduction="harmony",dims=1:n.pcs,verbose=F)%>%FindNeighbors(reduction="harmony",k.param=10,dims=1:n.pcs)ifnb.data<-FindClusters(ifnb.data,resolution=0.5,algorithm=1)%>%identity()####5.umap可視化p1<-DimPlot(ifnb.data,reduction="umap",group.by="stim")p2<-DimPlot(ifnb.data,reduction="umap",group.by="seurat_annotations",label=TRUE,repel=TRUE)P.total=p1+p2ggsave(P.total,filename="Output/integrated_snn_res.harmony.pdf",width=15,height=6)saveRDS(ifnb.data,file="Output/integrated.harmony.rds")

Harmony整合效果還是不錯的:

Harmony二. LIGER實戰

LIGER能夠跨個體、物種以識別共有的細胞類型,以及數據集特有的特徵,提供對不同單細胞數據集的統一分析,LIGER還用於單細胞轉錄組和scATAC-seq等多擬態數據的整合。相較Harmony,LIGER的運行速度慢一些:

rm(list=ls())#####1.loaddatasetifnb.data=LoadData("ifnb")####2.normalize/HVG/scale三步走ifnb.data<-ifnb.data%>%NormalizeData(verbose=F)%>%FindVariableFeatures(selection.method="vst",nfeatures=2000,verbose=F)%>%ScaleData(verbose=F,do.center=F)%>%RunPCA(npcs=30,verbose=F)####3.使用orig.ident作為batch運行rliger#k是矩陣分解的因子數,一般取值20-40ifnb.data<-RunOptimizeALS(ifnb.data,k=30,lambda=5,split.by="orig.ident")ifnb.data<-RunQuantileNorm(ifnb.data,split.by="orig.ident")#### 4.降維聚類:進行 UMAP 和 clustering:n.pcs=ncol(ifnb.data[["iNMF"]])ifnb.data<-ifnb.data%>%RunUMAP(reduction="iNMF",dims=1:n.pcs,verbose=F)%>%RunTSNE(reduction="iNMF",dims=1:n.pcs,verbose=F)%>%FindNeighbors(reduction="iNMF",k.param=10,dims=1:30)ifnb.data<-FindClusters(ifnb.data,resolution=0.5,algorithm=1)####5.umap可視化p1<-DimPlot(ifnb.data,reduction="umap",group.by="stim")p2<-DimPlot(ifnb.data,reduction="umap",group.by="seurat_annotations",label=TRUE,repel=TRUE)P.total=p1+p2ggsave(P.total,filename="Output/integrated_snn_res.LIGER.pdf",width=15,height=6)saveRDS(ifnb.data,file="Output/integrated.LIGER.rds")

居然有一簇Ctrl特有的CD14 Mono,是真實存在的差異,還是整合未充分呢?

LIGER三. LISI進行評估library(lisi)library(ggthemes)library(readr)library(ggplot2)library(ggpubr)ifnb.data.liger=read_rds("./Output/integrated.LIGER.rds")ifnb.data.harmony=read_rds("./Output/integrated.harmony.rds")ifnb.data.RPCA=read_rds("./Output/integrated.RPCA.rds")ifnb.data.CCA=read_rds("./Output/integrated.CCA.rds")#pcapca.score=lisi::compute_lisi(X=Embeddings(ifnb.data.harmony,reduction="pca"),meta_data=ifnb.data.harmony@meta.data,label_colnames="stim")%>%dplyr::mutate(type="pca")pca.score$type="raw"#harmonyharmony.score=lisi::compute_lisi(X=Embeddings(ifnb.data.harmony,reduction="harmony"),meta_data=ifnb.data.harmony@meta.data,label_colnames="stim")%>%dplyr::mutate(type="harmony")#CCACCA.score=lisi::compute_lisi(X=Embeddings(ifnb.data.CCA,reduction="pca"),meta_data=ifnb.data.CCA@meta.data,label_colnames="stim")%>%dplyr::mutate(type="pca")CCA.score$type="CCA"#RPCARPCA.score=lisi::compute_lisi(X=Embeddings(ifnb.data.RPCA,reduction="pca"),meta_data=ifnb.data.RPCA@meta.data,label_colnames="stim")%>%dplyr::mutate(type="pca")RPCA.score$type="RPCA"#LIGERLIGER.score=lisi::compute_lisi(X=Embeddings(ifnb.data.liger,reduction="iNMF"),meta_data=ifnb.data.liger@meta.data,label_colnames="stim")%>%dplyr::mutate(type="iNMF")LIGER.score$type="LIGER"####可視化lisi.res<-rbind(pca.score,CCA.score,RPCA.score,harmony.score,LIGER.score)%>%tidyr::gather(key,val,"stim")ggboxplot(lisi.res,x="type",y="val",color="type",palette="jco",#add="jitter",short.panel.labs=FALSE)+labs(y=("LISIscore"),x=NULL,title="Integrationmethod")LISI score

可以看到Harmony和LEGER的整合效果更佳。

- END -
arrow
arrow
    全站熱搜
    創作者介紹
    創作者 鑽石舞台 的頭像
    鑽石舞台

    鑽石舞台

    鑽石舞台 發表在 痞客邦 留言(0) 人氣()