Using demultiplexed FastQ files as input, performs all necessary steps to end up with a tsv file summarizing the restriction enzyme fragments and the number of UMIs supporting that specific contact with the viewpoint (bait) of interest.

contactsUMI4C(
  fastq_dir,
  wk_dir,
  file_pattern = NULL,
  bait_seq,
  bait_pad,
  res_enz,
  cut_pos,
  digested_genome,
  bowtie_index,
  threads = 1,
  numb_reads = 1e+11,
  rm_tmp = TRUE,
  min_flen = 20,
  filter_bp = 1e+07,
  ref_gen
)

Arguments

fastq_dir

Path of the directory containing the FastQ files (compressed or uncompressed).

wk_dir

Working directory where to save the outputs generated by the UMI-4c analysis.

file_pattern

Character that can be used to filter the files you want to analyze in the fastq_dir.

bait_seq

Character containing the bait primer sequence.

bait_pad

Character containing the pad sequence (sequence between the bait primer and the restriction enzyme sequence).

res_enz

Character containing the restriction enzyme sequence.

cut_pos

Numeric indicating the nucleotide position where restriction enzyme cuts (zero-based) (for example, for DpnII is 0).

digested_genome

Path for the digested genome file generated using the digestGenome function.

bowtie_index

Path and prefix of the bowtie index to use for the alignment.

threads

Number of threads to use in the analysis. Default=1.

numb_reads

Number of lines from the FastQ file to load in each loop. If having memory size problems, change it to a smaller number. Default=10e10.

rm_tmp

Logical indicating whether to remove temporary files (sam and intermediate bams). TRUE or FALSE. Default=TRUE.

min_flen

Minimal fragment length to use for selecting the fragments. Default=20

filter_bp

Integer indicating the bp upstream and downstream of the viewpoint to select for further analysis. Default=10e6

ref_gen

A BSgenome object of the reference genome.

Value

This function is a combination of calls to other functions that perform the necessary steps for processing UMI-4C data.

Examples

path <- downloadUMI4CexampleData(use_sample=TRUE)
#> Will begin downloading datasets to /var/folders/6s/7ydjk_816j12lnyvsx292_qr0000gn/T//RtmpyYrfv3/file10d82ff3c066
#> untar: using cmd = ‘/usr/bin/tar -xf '/var/folders/6s/7ydjk_816j12lnyvsx292_qr0000gn/T//RtmpyYrfv3/file10d82ff3c066' -C './'’
#> Done writing UMI4C example files to ./
hg19_dpnii <- digestGenome(cut_pos = 0, res_enz = "GATC", name_RE = "DpnII", ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19, out_path = file.path(path, "ref_genome", "digested_genome"))
#> Generating digested genome using: #> > Restriction enzyme sequence: GATC #> > Restriction enzyme cut position: 0 #> > Restriction enzyme name: DpnII #> > Reference genome: BSgenome.Hsapiens.UCSC.hg19 #> > Output path: .//UMI4Cats_data_sub/ref_genome/digested_genome
#> Finished genome digestion.
raw_dir <- file.path(path, "SOCS1", "fastq") contactsUMI4C(fastq_dir=raw_dir, wk_dir=file.path(path, "SOCS1"), bait_seq="CCCAAATCGCCCAGACCAG", bait_pad="GCGCG", res_enz="GATC", cut_pos=0, digested_genome=hg19_dpnii, bowtie_index=file.path(path, "ref_genome", "ucsc.hg19.chr16"), threads=1, numb_reads=10e10, ref_gen = BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19)
#> #> [2020-04-14 14:21:34] Starting prepUMI4C using: #> > Fastq directory: #> .//UMI4Cats_data_sub/SOCS1/fastq #> > Work directory: .//UMI4Cats_data_sub/SOCS1 #> > Bait sequence: CCCAAATCGCCCAGACCAG #> > Bait pad: GCGCG #> > Restriction enzyme: GATC #> > Number of reads loaded into memory: 1e+11
#> [2020-04-14 14:21:35] Finished sample sub_ctrl_hi19_SOCS1
#> #> [2020-04-14 14:21:35] Starting splitUMI4C using: #> > Work directory: .//UMI4Cats_data_sub/SOCS1 #> > Cut position: 0 #> > Restriction enzyme: GATC #> > Number of reads loaded into memory: 1e+11
#> [2020-04-14 14:21:35] Finished sample sub_ctrl_hi19_SOCS1_R1.fq.gz
#> [2020-04-14 14:21:36] Finished sample sub_ctrl_hi19_SOCS1_R2.fq.gz
#> #> [2020-04-14 14:21:36] Starting alignmentUMI4C using: #> > Work directory: .//UMI4Cats_data_sub/SOCS1 #> > Viewpoint position: chr16:11349721-11349748 #> > Reference genome: .//UMI4Cats_data_sub/ref_genome/ucsc.hg19.chr16 #> > Number of threads: 1
#> Warning: For argument `samOutput`, file exist:.//UMI4Cats_data_sub/SOCS1/align/sub_ctrl_hi19_SOCS1_R1.sam. It will be overwrited
#> [2020-04-14 14:21:36] Finished sample sub_ctrl_hi19_SOCS1_R1.fq
#> Warning: For argument `samOutput`, file exist:.//UMI4Cats_data_sub/SOCS1/align/sub_ctrl_hi19_SOCS1_R2.sam. It will be overwrited
#> [2020-04-14 14:21:36] Finished sample sub_ctrl_hi19_SOCS1_R2.fq
#> #> [2020-04-14 14:21:36] Starting counterUMI4C using: #> > Work directory: .//UMI4Cats_data_sub/SOCS1 #> > Viewpoint position: chr16:11349721-11349748 #> > Restriction enzyme: GATC #> > Digested genome: .//UMI4Cats_data_sub/ref_genome/digested_genome/BSgenome.Hsapiens.UCSC.hg19_DpnII
#> [2020-04-14 14:21:38] Finished sample sub_ctrl_hi19_SOCS1