cuttag学习笔记

由于课题可能用上cut&tag这个技术,遂跟教程学习一波,记录一下以便后续的学习(主要是怕忘了)
教程网址cut&tag教程

背景知识:靶标下裂解与标记(Cleavage Under Targets & Tagmentation,CUT&Tag)是一种使用蛋白-A-Tn5(pA-Tn5)转座子融合蛋白的系留方法(图 1b)。在 CUT&Tag 中,用特定染色质蛋白抗体孵育透化细胞或细胞核,然后将装有镶嵌末端适配体的 pA-Tn5 依次拴系在抗体结合位点上。通过添加镁离子激活转座子,将适配体整合到附近的 DNA 中。然后对这些基因进行扩增,生成测序文库。基于抗体拴系 Tn5 的方法灵敏度高,因为 pA-Tn5 拴系后会对样本进行严格清洗,而且适配体整合效率高。与 ChIP-seq 相比,信噪比的提高使绘制染色质特征图所需的测序量减少了一个数量级,这样就可以通过对文库进行条形码 PCR,在 Illumina NGS 测序仪上进行成对端测序。通过CUT&Tag可以检测组蛋白修饰状态。

环境要求:
在这里插入图片描述
在这里插入图片描述
一、安装环境所需要的所有包
参考我前面的博客,通过conda安装
二、数据准备
在这里插入图片描述
1.下载数据
首先获取SRR_Acc_List.txt,然后批量下载,下载的SRA数据存放路径~/my_project/PRJNA606273/raw

cd raw/
cat ../SRR_Acc_List.txt |while read id;do (prefetch ${id} &);done

2.SRR数据转fastq

cd ../ 
mkdir -p PRJNA606273/data
for i in `ls raw`;do fastq-dump --split-3 $i --gzip -O ./data;done

3.FastQC质量检测

cd data/
ls *fastq.gz|while read id;do fastqc $id;done

二、比对
1.Alignment to HG38

1)需要先建立hg38索引

cd my_project/PRJNA606273
mkdir -p ref/hg38
mkdir -p ref/index
cd ref/hg38
nohup wget http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz  &
cd ref
bowtie2-build ./hg38/hg38.fa ./index/hg38 #耗时比较久,可以挂后台运行

2)比对

#!/bin/bashcores=8
projPath=~/my_project/PRJNA606273
ref="${projPath}/ref/index/hg38"mkdir -p ${projPath}/alignment/sam/bowtie2_summary
mkdir -p ${projPath}/alignment/bam
mkdir -p ${projPath}/alignment/bed
mkdir -p ${projPath}/alignment/bedgraphls *_1.fastq.gz|while read id; do id=${id/_1.fastq.gz/} #将字符串 id 中的_1.fq.gz 部分替换为空字符串,即将_1.fq.gz删除
bowtie2 --end-to-end --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700 -p ${cores} -x ${ref} -1 \
${projPath}/data/${id}_1.fastq.gz -2 ${projPath}/data/${id}_2.fastq.gz \
-S ${projPath}/alignment/sam/${id}_bowtie2.sam &> ${projPath}/alignment/sam/bowtie2_summary/${id}_bowtie2.txt
done

比对完之后可以在alignment/sam/bowtie2_summary/${id}_bowtie2.txt看到比对结果的summary,如下
在这里插入图片描述
我发现我做出来的结果测序深度一致,但是比对的结果有一点差异,不知道是我的操作问题,还是参考基因组更新了导致比对结果出现了些许不同

2.Alignment to spike-in genome for spike-in calibration

1)需要先建立Ecoli索引

cd my_project/PRJNA606273
mkdir -p ref/Ecoli
cd ./ref/Ecoli
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz
gzip -dc GCF_000005845.2_ASM584v2_genomic.fna.gz > E.coli_K12_MG1655.fa
cd ../
bowtie2-build ./Ecoli/E.coli_K12_MG1655.fa ./index/Ecoli
wget https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes

2)比对

#!/bin/bash
cores=8
projPath=~/my_project/PRJNA606273
spikeInRef="${projPath}/ref/index/Ecoli"
chromSize="~/my_project/PRJNA606273/ref/hg38.chrom.sizes"ls *_1.fastq.gz|while read id; do id=${id/_1.fastq.gz/} #将字符串 id 中的_1.fq.gz 部分替换为空字符串,即将_1.fq.gz删除
bowtie2 --end-to-end --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700 -p ${cores} -x ${spikeInRef} -1 ${projPath}/data/${id}_1.fastq.gz -2 ${projPath}/data/${id}_2.fastq.gz -S $projPath/alignment/sam/${id}_bowtie2_spikeIn.sam &> $projPath/alignment/sam/bowtie2_summary/${id}_bowtie2_spikeIn.txtseqDepthDouble=`samtools view -F 0x04 $projPath/alignment/sam/${id}_bowtie2_spikeIn.sam | wc -l`
seqDepth=$((seqDepthDouble/2))
echo $seqDepth >$projPath/alignment/sam/bowtie2_summary/${id}_bowtie2_spikeIn.seqDepth
done

在运行上面的代码的时候遇到一个关于samtools的报错
samtools: /home/administrator/miniconda3/envs/breakseq/bin/…/lib/libtinfow.so.6: no version information available (required by samtools)
samtools: /home/administrator/miniconda3/envs/breakseq/bin/…/lib/libncursesw.so.6: no version information available (required by samtools)
samtools: /home/administrator/miniconda3/envs/breakseq/bin/…/lib/libncursesw.so.6: no version information available (required by samtools)
解决办法:

conda install -c conda-forge ncurses 

3.汇报比对结果
此部分在R上进行
因为前面部分我一直采用都是SRR的编号命名,而教程提供的代码都是用K4me3和K27me3命名,为了直接套用这里的代码,我就按照对应的关系将文件重命名

cd sam
rename 's/SRR11074254/K4me3_rep1/g' *
rename 's/SRR11074240/K27me3_rep2/g' *
rename 's/SRR11074258/K4me3_rep2/g' *
rename 's/SRR12246717/K27me3_rep1/g' *
rename 's/SRR8754611/IgG_rep2/g' *
rename 's/SRR8754612/IgG_rep1/g' *cd bowtie2_summary/
rename 's/SRR11074254/K4me3_rep1/g' *
rename 's/SRR11074240/K27me3_rep2/g' *
rename 's/SRR11074258/K4me3_rep2/g' *
rename 's/SRR12246717/K27me3_rep1/g' *
rename 's/SRR8754611/IgG_rep2/g' *
rename 's/SRR8754612/IgG_rep1/g' *

1)Sequencing depth

library(dplyr)
library(stringr)
library(ggplot2)
library(viridis)
library(GenomicRanges)
library(chromVAR) ## For FRiP analysis and differential analysis
library(DESeq2) ## For differential analysis section
library(ggpubr) ## For customizing figures
library(corrplot) ## For correlation plot
##=== R command ===## 
## Path to the project and histone list
projPath = "~/my_project/PRJNA606273"
sampleList = c("K27me3_rep1", "K27me3_rep2", "K4me3_rep1", "K4me3_rep2", "IgG_rep1", "IgG_rep2")
histList = c("K27me3", "K4me3", "IgG")## Collect the alignment results from the bowtie2 alignment summary files
alignResult = c()
for(hist in sampleList){alignRes = read.table(paste0(projPath, "/alignment/sam/bowtie2_summary/", hist, "_bowtie2.txt"), header = FALSE, fill = TRUE)alignRate = substr(alignRes$V1[6], 1, nchar(as.character(alignRes$V1[6]))-1)histInfo = strsplit(hist, "_")[[1]]alignResult = data.frame(Histone = histInfo[1], Replicate = histInfo[2], SequencingDepth = alignRes$V1[1] %>% as.character %>% as.numeric, MappedFragNum_hg38 = alignRes$V1[4] %>% as.character %>% as.numeric + alignRes$V1[5] %>% as.character %>% as.numeric, AlignmentRate_hg38 = alignRate %>% as.numeric)  %>% rbind(alignResult, .)
}
alignResult$Histone = factor(alignResult$Histone, levels = histList)
alignResult %>% mutate(AlignmentRate_hg38 = paste0(AlignmentRate_hg38, "%"))

在这里插入图片描述
在这里插入图片描述

上图是我得到的结果,下图是教程的结果,比对结果有一些差异,我也不知道具体原因是啥。特别是IgG_rep1,我后续直接按照教程下载的fastq文件重新跑也是一样结果,奇怪的很。

2)Spike-in alignment

##=== R command ===## 
spikeAlign = c()
for(hist in sampleList){spikeRes = read.table(paste0(projPath, "/alignment/sam/bowtie2_summary/", hist, "_bowtie2_spikeIn.txt"), header = FALSE, fill = TRUE)alignRate = substr(spikeRes$V1[6], 1, nchar(as.character(spikeRes$V1[6]))-1)histInfo = strsplit(hist, "_")[[1]]spikeAlign = data.frame(Histone = histInfo[1], Replicate = histInfo[2], SequencingDepth = spikeRes$V1[1] %>% as.character %>% as.numeric, MappedFragNum_spikeIn = spikeRes$V1[4] %>% as.character %>% as.numeric + spikeRes$V1[5] %>% as.character %>% as.numeric, AlignmentRate_spikeIn = alignRate %>% as.numeric)  %>% rbind(spikeAlign, .)
}
spikeAlign$Histone = factor(spikeAlign$Histone, levels = histList)
spikeAlign %>% mutate(AlignmentRate_spikeIn = paste0(AlignmentRate_spikeIn, "%"))

在这里插入图片描述

在这里插入图片描述

上图是我得到的结果,下图是教程的结果,除了IgG_rep1外比对结果一致。感觉是IgG_rep1给错了数据编号。

3)Summarize the alignment to hg38 and E.coli

##=== R command ===## 
alignSummary = left_join(alignResult, spikeAlign, by = c("Histone", "Replicate", "SequencingDepth")) %>%mutate(AlignmentRate_hg38 = paste0(AlignmentRate_hg38, "%"), AlignmentRate_spikeIn = paste0(AlignmentRate_spikeIn, "%"))
alignSummary

4)Visualizing the sequencing depth and alignment results

##=== R command ===## 
## Generate sequencing depth boxplot
fig3A = alignResult %>% ggplot(aes(x = Histone, y = SequencingDepth/1000000, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("Sequencing Depth per Million") +xlab("") + ggtitle("A. Sequencing Depth")fig3B = alignResult %>% ggplot(aes(x = Histone, y = MappedFragNum_hg38/1000000, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("Mapped Fragments per Million") +xlab("") +ggtitle("B. Alignable Fragment (hg38)")fig3C = alignResult %>% ggplot(aes(x = Histone, y = AlignmentRate_hg38, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("% of Mapped Fragments") +xlab("") +ggtitle("C. Alignment Rate (hg38)")fig3D = spikeAlign %>% ggplot(aes(x = Histone, y = AlignmentRate_spikeIn, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("Spike-in Alignment Rate") +xlab("") +ggtitle("D. Alignment Rate (E.coli)")ggarrange(fig3A, fig3B, fig3C, fig3D, ncol = 2, nrow=2, common.legend = TRUE, legend="bottom")

在这里插入图片描述

三、去重
1.安装picard

conda install picard

在使用picard时出现了报错,Unable to access jarfile picard.jar请添加图片描述
谷歌了一下,发现原因可能是因为环境变量的问题,系统不知道picard的具体位置,需要设置全局环境变量,遂查了一下picard的位置
在这里插入图片描述
应该是位于上一级share目录中,知道具体picard具体位置后,就去设置环境变量

cd
vim .bashrc
#将下面这两句命令添加至.bashrc文件末尾
PICARD=~/miniconda3/envs/rna_seq/share/picard-3.1.1-0/picard.jar
alias picard="java -jar $PICARD"

当我以为成功的时候,再用picard的时候再次出现了报错
在这里插入图片描述

Error: A JNI error has occurred, please check your installation and try again
Exception in thread “main” java.lang.UnsupportedClassVersionError: picard/cmdline/PicardCommandLine has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader 1. r u n ( U R L C l a s s L o a d e r . j a v a : 363 ) a t j a v a . s e c u r i t y . A c c e s s C o n t r o l l e r . d o P r i v i l e g e d ( N a t i v e M e t h o d ) a t j a v a . n e t . U R L C l a s s L o a d e r . f i n d C l a s s ( U R L C l a s s L o a d e r . j a v a : 362 ) a t j a v a . l a n g . C l a s s L o a d e r . l o a d C l a s s ( C l a s s L o a d e r . j a v a : 418 ) a t s u n . m i s c . L a u n c h e r 1.run(URLClassLoader.java:363) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:362) at java.lang.ClassLoader.loadClass(ClassLoader.java:418) at sun.misc.Launcher 1.run(URLClassLoader.java:363)atjava.security.AccessController.doPrivileged(NativeMethod)atjava.net.URLClassLoader.findClass(URLClassLoader.java:362)atjava.lang.ClassLoader.loadClass(ClassLoader.java:418)atsun.misc.LauncherAppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:621)
发现可能是因为picard版本太新的原因,遂卸载当前版本,下载旧版本2.25.5的picard

conda remove picard
conda install picard=2.25.5

然后终于成功了,可以使用了
在这里插入图片描述
这个时候再去更新.bashrc文件,就可以随处可用picard了

2.去重

#!/bin/bashprojPath=~/my_project/PRJNA606273mkdir -p $projPath/alignment/removeDuplicate/picard_summaryls *bowtie2.sam|while read histName; do histName=${histName/_bowtie2.sam/} #将字符串 id 中的_bowtie2.sam 部分替换为空字符串,即将_bowtie2.sam删除
## Sort by coordinate
picard SortSam I=$projPath/alignment/sam/${histName}_bowtie2.sam O=$projPath/alignment/sam/${histName}_bowtie2.sorted.sam SORT_ORDER=coordinate## mark duplicates
picard MarkDuplicates I=$projPath/alignment/sam/${histName}_bowtie2.sorted.sam O=$projPath/alignment/removeDuplicate/${histName}_bowtie2.sorted.dupMarked.sam METRICS_FILE=$projPath/alignment/removeDuplicate/picard_summary/${histName}_picard.dupMark.txt## remove duplicates
picard MarkDuplicates I=$projPath/alignment/sam/${histName}_bowtie2.sorted.sam O=$projPath/alignment/removeDuplicate/${histName}_bowtie2.sorted.rmDup.sam REMOVE_DUPLICATES=true METRICS_FILE=$projPath/alignment/removeDuplicate/picard_summary/${histName}_picard.rmDup.txt
done 

3.汇报结果

##=== R command ===## 
## Summarize the duplication information from the picard summary outputs.
dupResult = c()
for(hist in sampleList){dupRes = read.table(paste0(projPath, "/alignment/removeDuplicate/picard_summary/", hist, "_picard.rmDup.txt"), header = TRUE, fill = TRUE)histInfo = strsplit(hist, "_")[[1]]dupResult = data.frame(Histone = histInfo[1], Replicate = histInfo[2], MappedFragNum_hg38 = dupRes$READ_PAIRS_EXAMINED[1] %>% as.character %>% as.numeric, DuplicationRate = dupRes$PERCENT_DUPLICATION[1] %>% as.character %>% as.numeric * 100, EstimatedLibrarySize = dupRes$ESTIMATED_LIBRARY_SIZE[1] %>% as.character %>% as.numeric) %>% mutate(UniqueFragNum = MappedFragNum_hg38 * (1-DuplicationRate/100))  %>% rbind(dupResult, .)
}
dupResult$Histone = factor(dupResult$Histone, levels = histList)
alignDupSummary = left_join(alignSummary, dupResult, by = c("Histone", "Replicate", "MappedFragNum_hg38")) %>% mutate(DuplicationRate = paste0(DuplicationRate, "%"))
alignDupSummary

在这里插入图片描述

在这里插入图片描述

上图是我的结果,下图是教程的结果,结果不太一致,不知道是具体的原因是什么。

可视化结果:

##=== R command ===## 
## generate boxplot figure for the  duplication rate
fig4A = dupResult %>% ggplot(aes(x = Histone, y = DuplicationRate, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("Duplication Rate (*100%)") +xlab("") fig4B = dupResult %>% ggplot(aes(x = Histone, y = EstimatedLibrarySize, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("Estimated Library Size") +xlab("") fig4C = dupResult %>% ggplot(aes(x = Histone, y = UniqueFragNum, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("# of Unique Fragments") +xlab("")ggarrange(fig4A, fig4B, fig4C, ncol = 3, common.legend = TRUE, legend="bottom")

四. Assess mapped fragment size distribution

#!/bin/bash
projPath=~/my_project/PRJNA606273
mkdir -p $projPath/alignment/sam/fragmentLenls *bowtie2.sam|while read histName; do histName=${histName/_bowtie2.sam/} #将字符串 id 中的_bowtie2.sam 部分替换为空字符串,即将_bowtie2.sam删除
## Extract the 9th column from the alignment sam file which is the fragment length
samtools view -F 0x04 $projPath/alignment/sam/${histName}_bowtie2.sam | awk -F'\t' 'function abs(x){return ((x < 0.0) ? -x : x)} {print abs($9)}' | sort | uniq -c | awk -v OFS="\t" '{print $2, $1/2}' >$projPath/alignment/sam/fragmentLen/${histName}_fragmentLen.txt
done
##=== R command ===## 
## Collect the fragment size information
fragLen = c()
for(hist in sampleList){histInfo = strsplit(hist, "_")[[1]]fragLen = read.table(paste0(projPath, "/alignment/sam/fragmentLen/", hist, "_fragmentLen.txt"), header = FALSE) %>% mutate(fragLen = V1 %>% as.numeric, fragCount = V2 %>% as.numeric, Weight = as.numeric(V2)/sum(as.numeric(V2)), Histone = histInfo[1], Replicate = histInfo[2], sampleInfo = hist) %>% rbind(fragLen, .) 
}
fragLen$sampleInfo = factor(fragLen$sampleInfo, levels = sampleList)
fragLen$Histone = factor(fragLen$Histone, levels = histList)
## Generate the fragment size density plot (violin plot)
fig5A = fragLen %>% ggplot(aes(x = sampleInfo, y = fragLen, weight = Weight, fill = Histone)) +geom_violin(bw = 5) +scale_y_continuous(breaks = seq(0, 800, 50)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 20) +ggpubr::rotate_x_text(angle = 20) +ylab("Fragment Length") +xlab("")fig5B = fragLen %>% ggplot(aes(x = fragLen, y = fragCount, color = Histone, group = sampleInfo, linetype = Replicate)) +geom_line(size = 1) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma") +theme_bw(base_size = 20) +xlab("Fragment Length") +ylab("Count") +coord_cartesian(xlim = c(0, 500))ggarrange(fig5A, fig5B, ncol = 2)

在这里插入图片描述
在这里插入图片描述

五.Alignment results filtering and file format conversion

1.Filtering mapped reads by the mapping quality filtering (可选)

#!/bin/bash
projPath=~/my_project/PRJNA606273
minQualityScore=2
ls *bowtie2.sam|while read histName; do histName=${histName/_bowtie2.sam/} #将字符串 id 中的_bowtie2.sam 部分替换为空字符串,即将_bowtie2.sam删除
samtools view -q $minQualityScore ${projPath}/alignment/sam/${histName}_bowtie2.sam >${projPath}/alignment/sam/${histName}_bowtie2.qualityScore$minQualityScore.sam
done

2.File format conversion

#!/bin/bash
projPath=~/my_project/PRJNA606273
ls *bowtie2.sam|while read histName; do histName=${histName/_bowtie2.sam/} #将字符串 id 中的_bowtie2.sam 部分替换为空字符串,即将_bowtie2.sam删除
## Filter and keep the mapped read pairs
samtools view -bS -F 0x04 $projPath/alignment/sam/${histName}_bowtie2.sam >$projPath/alignment/bam/${histName}_bowtie2.mapped.bam## Convert into bed file format
bedtools bamtobed -i $projPath/alignment/bam/${histName}_bowtie2.mapped.bam -bedpe >$projPath/alignment/bed/${histName}_bowtie2.bed## Keep the read pairs that are on the same chromosome and fragment length less than 1000bp.
awk '$1==$4 && $6-$2 < 1000 {print $0}' $projPath/alignment/bed/${histName}_bowtie2.bed >$projPath/alignment/bed/${histName}_bowtie2.clean.bed## Only extract the fragment related columns
cut -f 1,2,6 $projPath/alignment/bed/${histName}_bowtie2.clean.bed | sort -k1,1 -k2,2n -k3,3n  >$projPath/alignment/bed/${histName}_bowtie2.fragments.bed
done

3.Assess replicate reproducibility

#!/bin/bash
projPath=~/my_project/PRJNA606273
ls *bowtie2.sam|while read histName; do histName=${histName/_bowtie2.sam/} #将字符串 id 中的_bowtie2.sam 部分替换为空字符串,即将_bowtie2.sam删除
## We use the mid point of each fragment to infer which 500bp bins does this fragment belong to.
binLen=500
awk -v w=$binLen '{print $1, int(($2 + $3)/(2*w))*w + w/2}' $projPath/alignment/bed/${histName}_bowtie2.fragments.bed | sort -k1,1V -k2,2n | uniq -c | awk -v OFS="\t" '{print $2, $3, $1}' |  sort -k1,1V -k2,2n  >$projPath/alignment/bed/${histName}_bowtie2.fragmentsCount.bin$binLen.bed
done
##== R command ==##
reprod = c()
fragCount = NULL
for(hist in sampleList){if(is.null(fragCount)){fragCount = read.table(paste0(projPath, "/alignment/bed/", hist, "_bowtie2.fragmentsCount.bin500.bed"), header = FALSE) colnames(fragCount) = c("chrom", "bin", hist)}else{fragCountTmp = read.table(paste0(projPath, "/alignment/bed/", hist, "_bowtie2.fragmentsCount.bin500.bed"), header = FALSE)colnames(fragCountTmp) = c("chrom", "bin", hist)fragCount = full_join(fragCount, fragCountTmp, by = c("chrom", "bin"))}
}M = cor(fragCount %>% select(-c("chrom", "bin")) %>% log2(), use = "complete.obs") corrplot(M, method = "color", outline = T, addgrid.col = "darkgray", order="hclust", addrect = 3, rect.col = "black", rect.lwd = 3,cl.pos = "b", tl.col = "indianred4", tl.cex = 1, cl.cex = 1, addCoef.col = "black", number.digits = 2, number.cex = 1, col = colorRampPalette(c("midnightblue","white","darkred"))(100))

在这里插入图片描述

六. Spike-in calibration

#!/bin/bash
projPath=~/my_project/PRJNA606273
chromSize=$projPath/ref/hg38.chrom.sizes
ls *bowtie2.sam|while read histName; do histName=${histName/_bowtie2.sam/} #将字符串 id 中的_bowtie2.sam 部分替换为空字符串,即将_bowtie2.sam删除
seqDepth=`cat $projPath/alignment/sam/bowtie2_summary/${histName}_bowtie2_spikeIn.seqDepth`if [[ "$seqDepth" -gt "1" ]]; thenmkdir -p $projPath/alignment/bedgraphscale_factor=`echo "10000 / $seqDepth" | bc -l`echo "Scaling factor for $histName is: $scale_factor!"bedtools genomecov -bg -scale $scale_factor -i $projPath/alignment/bed/${histName}_bowtie2.fragments.bed -g $chromSize > $projPath/alignment/bedgraph/${histName}_bowtie2.fragments.normalized.bedgraph
fidone
##=== R command ===## 
scaleFactor = c()
multiplier = 10000
for(hist in sampleList){spikeDepth = read.table(paste0(projPath, "/alignment/sam/bowtie2_summary/", hist, "_bowtie2_spikeIn.seqDepth"), header = FALSE, fill = TRUE)$V1[1]histInfo = strsplit(hist, "_")[[1]]scaleFactor = data.frame(scaleFactor = multiplier/spikeDepth, Histone = histInfo[1], Replicate = histInfo[2])  %>% rbind(scaleFactor, .)
}
scaleFactor$Histone = factor(scaleFactor$Histone, levels = histList)
left_join(alignDupSummary, scaleFactor, by = c("Histone", "Replicate"))
##=== R command ===##
## Generate sequencing depth boxplot
fig6A = scaleFactor %>% ggplot(aes(x = Histone, y = scaleFactor, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 20) +ylab("Spike-in Scalling Factor") +xlab("")normDepth = inner_join(scaleFactor, alignResult, by = c("Histone", "Replicate")) %>% mutate(normDepth = MappedFragNum_hg38 * scaleFactor)fig6B = normDepth %>% ggplot(aes(x = Histone, y = normDepth, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.9, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 20) +ylab("Normalization Fragment Count") +xlab("") + coord_cartesian(ylim = c(1000000, 130000000))
ggarrange(fig6A, fig6B, ncol = 2, common.legend = TRUE, legend="bottom")

在这里插入图片描述
在这里插入图片描述

七. Peak calling

#!/bin/bash
projPath=~/my_project/PRJNA606273
mkdir -p $projPath/peakCalling/SEACR
seacr="$projPath/SEACR_1.3.sh"
ls *bowtie2.sam|while read histName; do histName=${histName/_bowtie2.sam/} #将字符串 id 中的_bowtie2.sam 部分替换为空字符串,即将_bowtie2.sam删除
histControl=IgG_rep1bash $seacr $projPath/alignment/bedgraph/${histName}_bowtie2.fragments.normalized.bedgraph \$projPath/alignment/bedgraph/${histControl}_bowtie2.fragments.normalized.bedgraph \non stringent $projPath/peakCalling/SEACR/${histName}_seacr_control.peaksbash $seacr $projPath/alignment/bedgraph/${histName}_bowtie2.fragments.normalized.bedgraph 0.01 non stringent $projPath/peakCalling/SEACR/${histName}_seacr_top0.01.peaks
done

1.Number of peaks called

##=== R command ===## 
peakN = c()
peakWidth = c()
peakType = c("control", "top0.01")
for(hist in sampleList){histInfo = strsplit(hist, "_")[[1]]if(histInfo[1] != "IgG"){for(type in peakType){peakInfo = read.table(paste0(projPath, "/peakCalling/SEACR/", hist, "_seacr_", type, ".peaks.stringent.bed"), header = FALSE, fill = TRUE)  %>% mutate(width = abs(V3-V2))peakN = data.frame(peakN = nrow(peakInfo), peakType = type, Histone = histInfo[1], Replicate = histInfo[2]) %>% rbind(peakN, .)peakWidth = data.frame(width = peakInfo$width, peakType = type, Histone = histInfo[1], Replicate = histInfo[2])  %>% rbind(peakWidth, .)}}
}
peakN %>% select(Histone, Replicate, peakType, peakN)

IgG_rep1在这里插入图片描述

IgG_rep2在这里插入图片描述
2.Reproducibility of the peak across biological replicates

##=== R command ===## 
histL = c("K27me3", "K4me3")
repL = paste0("rep", 1:2)
peakType = c("control", "top0.01")
peakOverlap = c()
for(type in peakType){for(hist in histL){overlap.gr = GRanges()for(rep in repL){peakInfo = read.table(paste0(projPath, "/peakCalling/SEACR/", hist, "_", rep, "_seacr_", type, ".peaks.stringent.bed"), header = FALSE, fill = TRUE)peakInfo.gr = GRanges(peakInfo$V1, IRanges(start = peakInfo$V2, end = peakInfo$V3), strand = "*")if(length(overlap.gr) >0){overlap.gr = overlap.gr[findOverlaps(overlap.gr, peakInfo.gr)@from]}else{overlap.gr = peakInfo.gr}}peakOverlap = data.frame(peakReprod = length(overlap.gr), Histone = hist, peakType = type) %>% rbind(peakOverlap, .)}
}peakReprod = left_join(peakN, peakOverlap, by = c("Histone", "peakType")) %>% mutate(peakReprodRate = peakReprod/peakN * 100)
peakReprod %>% select(Histone, Replicate, peakType, peakN, peakReprodNum = peakReprod, peakReprodRate)

在这里插入图片描述
3.FRagment proportion in Peaks regions (FRiPs)

##=== R command ===## 
library(chromVAR)bamDir = paste0(projPath, "/alignment/bam")
inPeakData = c()
## overlap with bam file to get count
for(hist in histL){for(rep in repL){peakRes = read.table(paste0(projPath, "/peakCalling/SEACR/", hist, "_", rep, "_seacr_control.peaks.stringent.bed"), header = FALSE, fill = TRUE)peak.gr = GRanges(seqnames = peakRes$V1, IRanges(start = peakRes$V2, end = peakRes$V3), strand = "*")bamFile = paste0(bamDir, "/", hist, "_", rep, "_bowtie2.mapped.bam")fragment_counts <- getCounts(bamFile, peak.gr, paired = TRUE, by_rg = FALSE, format = "bam")inPeakN = counts(fragment_counts)[,1] %>% suminPeakData = rbind(inPeakData, data.frame(inPeakN = inPeakN, Histone = hist, Replicate = rep))}
}frip = left_join(inPeakData, alignResult, by = c("Histone", "Replicate")) %>% mutate(frip = inPeakN/MappedFragNum_hg38 * 100)
frip %>% select(Histone, Replicate, SequencingDepth, MappedFragNum_hg38, AlignmentRate_hg38, FragInPeakNum = inPeakN, FRiPs = frip)

在这里插入图片描述
4.Visualization of peak number, peak width, peak reproducibility and FRiPs

fig7A = peakN %>% ggplot(aes(x = Histone, y = peakN, fill = Histone)) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +facet_grid(~peakType) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.55, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("Number of Peaks") +xlab("")fig7B = peakWidth %>% ggplot(aes(x = Histone, y = width, fill = Histone)) +geom_violin() +facet_grid(Replicate~peakType) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.55, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +scale_y_continuous(trans = "log", breaks = c(400, 3000, 22000)) +theme_bw(base_size = 18) +ylab("Width of Peaks") +xlab("")fig7C = peakReprod %>% ggplot(aes(x = Histone, y = peakReprodRate, fill = Histone, label = round(peakReprodRate, 2))) +geom_bar(stat = "identity") +geom_text(vjust = 0.1) +facet_grid(Replicate~peakType) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.55, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("% of Peaks Reproduced") +xlab("")fig7D = frip %>% ggplot(aes(x = Histone, y = frip, fill = Histone, label = round(frip, 2))) +geom_boxplot() +geom_jitter(aes(color = Replicate), position = position_jitter(0.15)) +scale_fill_viridis(discrete = TRUE, begin = 0.1, end = 0.55, option = "magma", alpha = 0.8) +scale_color_viridis(discrete = TRUE, begin = 0.1, end = 0.9) +theme_bw(base_size = 18) +ylab("% of Fragments in Peaks") +xlab("")ggarrange(fig7A, fig7B, fig7C, fig7D, ncol = 2, nrow=2, common.legend = TRUE, legend="bottom")

在这里插入图片描述
八.Visualization
1.Heatmap over transcription units

#!/bin/bash
projPath=~/my_project/PRJNA606273
mkdir -p $projPath/alignment/bigwig  
ls *bowtie2.sam|while read histName; do histName=${histName/_bowtie2.sam/} #将字符串 id 中的_bowtie2.sam 部分替换为空字符串,即将_bowtie2.sam删除                                                                                                                                      
samtools sort -o $projPath/alignment/bam/${histName}.sorted.bam $projPath/alignment/bam/${histName}_bowtie2.mapped.bam                                                     
samtools index $projPath/alignment/bam/${histName}.sorted.bam                                                                                                              
bamCoverage -b $projPath/alignment/bam/${histName}.sorted.bam -o $projPath/alignment/bigwig/${histName}_raw.bw  
cores=8
computeMatrix scale-regions -S $projPath/alignment/bigwig/K27me3_rep1_raw.bw \$projPath/alignment/bigwig/K27me3_rep2_raw.bw \$projPath/alignment/bigwig/K4me3_rep1_raw.bw \$projPath/alignment/bigwig/K4me3_rep2_raw.bw \-R $projPath/hg38_gene \--beforeRegionStartLength 3000 \--regionBodyLength 5000 \--afterRegionStartLength 3000 \--skipZeros -o $projPath/hg38_heatmap/matrix_gene.mat.gz -p $coresplotHeatmap -m $projPath/hg38_heatmap/matrix_gene.mat.gz -out $projPath/hg38_heatmap/Histone_gene.png --sortUsing sum
done

hg38_gene获取:
https://genome.ucsc.edu/cgi-bin/hgTables
output format改为BED,其他不变
在这里插入图片描述
跑出了一个奇奇怪怪的图,怎么那么多黑线啊
在这里插入图片描述

2.Heatmap on CUT&Tag peaks

#!/bin/bash
projPath=~/my_project/PRJNA606273
for histName in K27me3 K4me3
dofor repName in rep1 rep1 rep2doawk '{split($6, summit, ":"); split(summit[2], region, "-"); print summit[1]"\t"region[1]"\t"region[2]}' $projPath/peakCalling/SEACR/${histName}_${repName}_seacr_control.peaks.stringent.bed >$projPath/peakCalling/SEACR/${histName}_${repName}_seacr_control.peaks.summitRegion.bedcomputeMatrix reference-point -S $projPath/alignment/bigwig/${histName}_${repName}_raw.bw \-R $projPath/peakCalling/SEACR/${histName}_${repName}_seacr_control.peaks.summitRegion.bed \--skipZeros -o $projPath/peakCalling/SEACR/${histName}_${repName}_SEACR.mat.gz -p 8 -a 3000 -b 3000 --referencePoint centerplotHeatmap -m $projPath/peakCalling/SEACR/${histName}_${repName}_SEACR.mat.gz -out $projPath/peakCalling/SEACR/${histName}_${repName}_SEACR_heatmap.png --sortUsing sum --startLabel "Peak Start" --endLabel "Peak End" --xAxisLabel "" --regionsLabel "Peaks" --samplesLabel "${histName} ${repName}"done
done

在这里插入图片描述

九. Differential analysis

  1. Create a master peak list merging all the peaks called for each sample
##=== R command ===## 
mPeak = GRanges()
## overlap with bam file to get count
for(hist in histL){for(rep in repL){peakRes = read.table(paste0(projPath, "/peakCalling/SEACR/", hist, "_", rep, "_seacr_control.peaks.stringent.bed"), header = FALSE, fill = TRUE)mPeak = GRanges(seqnames = peakRes$V1, IRanges(start = peakRes$V2, end = peakRes$V3), strand = "*") %>% append(mPeak, .)}
}
masterPeak = reduce(mPeak)
  1. Get the fragment counts for each peak in the master peak list
##=== R command ===## 
library(DESeq2)
bamDir = paste0(projPath, "/alignment/bam")
countMat = matrix(NA, length(masterPeak), length(histL)*length(repL))
## overlap with bam file to get count
i = 1
for(hist in histL){for(rep in repL){bamFile = paste0(bamDir, "/", hist, "_", rep, "_bowtie2.mapped.bam")fragment_counts <- getCounts(bamFile, masterPeak, paired = TRUE, by_rg = FALSE, format = "bam")countMat[, i] = counts(fragment_counts)[,1]i = i + 1}
}
colnames(countMat) = paste(rep(histL, 2), rep(repL, each = 2), sep = "_")
  1. Sequencing depth normalization and differential enriched peaks detection
##=== R command ===## 
selectR = which(rowSums(countMat) > 5) ## remove low count genes
dataS = countMat[selectR,]
condition = factor(rep(histL, each = length(repL)))
dds = DESeqDataSetFromMatrix(countData = dataS,colData = DataFrame(condition),design = ~ condition)
DDS = DESeq(dds)
normDDS = counts(DDS, normalized = TRUE) ## normalization with respect to the sequencing depth
colnames(normDDS) = paste0(colnames(normDDS), "_norm")
res = results(DDS, independentFiltering = FALSE, altHypothesis = "greaterAbs")countMatDiff = cbind(dataS, normDDS, res)
head(countMatDiff)

在这里插入图片描述

总算是跑完一轮没有报错了,但是结果有一些差异,还不清楚原因是什么,后面再看看是啥问题吧

长腿猴子请来的救兵写于2024.5.13

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.hqwc.cn/news/696010.html

如若内容造成侵权/违法违规/事实不符,请联系编程知识网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

每日互动(个推)与您相约2024 AI+研发数字峰会(AiDD)上海站

伴随着人工智能在众多行业领域的广泛应用及其带来的颠覆性变革&#xff0c;软件的开发模式、方式和实践也将发生巨大的变化。 5月17-18日&#xff0c;2024 AI研发数字峰会&#xff08;AiDD&#xff09;上海站即将重磅开幕。峰会设置了15个主题论坛&#xff0c;策划60精彩议题内…

CoSeg: Cognitively Inspired Unsupervised Generic Event Segmentation

名词解释 1.特征重建 特征重建是一种机器学习中常用的技术&#xff0c;通常用于自监督学习或无监督学习任务。在特征重建中&#xff0c;模型被要求将输入数据经过编码器&#xff08;encoder&#xff09;转换成某种表示&#xff0c;然后再经过解码器&#xff08;decoder&#x…

有了这玩意,分分钟开发公众号功能!

大家好&#xff0c;我是程序员鱼皮。 不论在企业、毕设还是个人练手项目中&#xff0c;很多同学或多或少都会涉及微信相关生态的开发&#xff0c;例如微信支付、开放平台、公众号等等。 一般情况下&#xff0c;我们需要到官网查阅这些模块对应的 API 接口&#xff0c;自己编写…

windows打开防火墙指定端口(局域网访问本地项目)

windows打开防火墙指定端口(局域网访问本地项目) 本地运行了Vue前端项目&#xff0c;部署在5173端口&#xff0c;想让同事从局域网内访问项目&#xff0c;开放本机端口5173允许访问 在 Windows 上使用自带的防火墙&#xff0c;你可以按照以下步骤来允许局域网内其他设备对特定端…

【刷题】一篇文章搞定“位运算”

只要春天不死&#xff0c;就有迎春的花朵年年岁岁开放&#xff0c;生命讲涅槃&#xff0c;生生不息&#xff0c;并会以另一种形式永存。 – 路遥 《平凡的世界》 (◦′ᆺ‵◦) ♬ ✧❥✧.•✧♡✧ ℒℴѵℯ ✧♡✧•.❥ (◦′ᆺ‵◦) ♬ ✧❥✧.•✧♡✧ ℒℴѵℯ ✧♡✧•.❥…

C++——缺省参数与重载函数

目录 ​前言 一.缺省参数 1.1缺省参数概念 1.2缺省参数分类 注意事项&#xff1a; 二.函数重载 2.1函数重载概念 2.2c支持函数重载原理——命名修饰 前言 本篇文章主要讲述c中有关于缺少参数与函数重载的相关概念与实例&#xff0c;以下是本人拙见&#xff0c;如有错误…

Apple store 静安·苹果店欣赏

官网&#xff1a; https://www.apple.com/today/Apple 亚洲第一大商店&#xff1a;Apple 静安零售店现已在上海开幕 静安苹果欣赏

MS31912半桥电机驱动器可pin to pin替代DRV8912

主要特点 工作电压 4.5V-32V 每个半桥支持1A电流&#xff0c;并联输出支持6A最大电流 支持3.3V和5V逻辑输入 低功耗睡眠模式 (1.5μA) 带菊花链功能的5MHz 16位SPI通信 可通过SPI&#xff0c;配置PWM发生器的频率和占空比 集成多种保护和诊断功能nFAULT引脚输出、VM欠压锁定 、…

家政服务新体验——家政小程序开发,让生活更轻松!

一、引言 随着现代生活节奏的加快&#xff0c;家政服务已经成为越来越多家庭不可或缺的一部分。然而&#xff0c;传统家政服务方式往往存在预约不便、服务质量参差不齐等问题。为了解决这些问题&#xff0c;我们精心打造了一款家政小程序&#xff0c;为您带来全新的家政服务体…

电子作业指导书系统如何提升医疗设备工厂的生产效率

在医疗设备工厂中&#xff0c;电子作业指导书&#xff08;ESOP&#xff09;正逐渐成为提升生产效率的关键因素。 一、电子作业指导书系统提供了即时可得的准确信息。 电子作业指导书系统与传统的纸质作业指导书相比&#xff0c;员工可以在工作现场通过电子设备随时查阅最新、最…

【数据库原理及应用】期末复习汇总高校期末真题试卷11

试卷 一、填空题(每题 1 分&#xff0c;共10 分)    1. 数据库管理技术的发展经历了三个阶段&#xff1a;人工管理阶段&#xff0c;文件系统阶段和__________阶段。 2.实体完整性约束规定__________的取值不能为空值。 3. 计算机系统有三类安全性问题&#xff0c;即_____…

【机器学习】人力资源管理的新篇章:AI驱动的高效与智能化

&#x1f9d1; 作者简介&#xff1a;阿里巴巴嵌入式技术专家&#xff0c;深耕嵌入式人工智能领域&#xff0c;具备多年的嵌入式硬件产品研发管理经验。 &#x1f4d2; 博客介绍&#xff1a;分享嵌入式开发领域的相关知识、经验、思考和感悟&#xff0c;欢迎关注。提供嵌入式方向…