With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) has become a popular way to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis methods, we presented Model-based Analysis of ChIP-Seq (MACS) for identifying transcription factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation. MACS can be used on ChIP-Seq data alone, or with a control sample to increase specificity. Moreover, as a general peak caller, MACS can also be applied to any “DNA enrichment assay” if the question is simply where read coverage is significantly higher than the random background.
This package is a wrapper of the MACS toolkit built on basilisk. The Python library macs3 that it depends on is installed automatically inside its own isolated Python environment, which basilisk manages.
There are 14 functions imported from MACS3, listed in the table below. Details of each function can be found in its manual page.
| Functions | Description |
|---|---|
| callpeak | Main MACS3 Function to call peaks from alignment results. |
| bdgpeakcall | Call peaks from bedGraph output. |
| bdgbroadcall | Call broad peaks from bedGraph output. |
| bdgcmp | Comparing two signal tracks in bedGraph format. |
| bdgopt | Operate the score column of bedGraph file. |
| cmbreps | Combine BEDGraphs of scores from replicates. |
| bdgdiff | Differential peak detection based on paired four bedGraph files. |
| filterdup | Remove duplicate reads, then save in BED/BEDPE format. |
| predictd | Predict d or fragment size from alignment results. |
| pileup | Pileup aligned reads (single-end) or fragments (paired-end) |
| randsample | Randomly choose a number/percentage of total reads. |
| refinepeak | Take raw reads alignment, refine peak summits. |
| callvar | Call variants in given peak regions from the alignment BAM files. |
| hmmratac | Dedicated peak calling based on Hidden Markov Model for ATAC-seq data. |
We have uploaded multiple test datasets from MACS to the data package
MACSdata on ExperimentHub. For example, here we download a pair of
single-end BED files to run the callpeak function.
eh <- ExperimentHub::ExperimentHub()
eh <- AnnotationHub::query(eh, "MACSdata")
CHIP <- eh[["EH4558"]]
#> see ?MACSdata and browseVignettes('MACSdata') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
CTRL <- eh[["EH4563"]]
#> see ?MACSdata and browseVignettes('MACSdata') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
Here is an example of calling narrow and broad peaks on the single-end (SE) BED files.
cp1 <- callpeak(CHIP, CTRL, gsize = 5.2e7, store_bdg = TRUE,
name = "run_callpeak_narrow0", outdir = tempdir(),
cutoff_analysis = TRUE)
#> Installing pyenv ...
#> Done! pyenv has been installed to '/github/home/.local/share/r-reticulate/pyenv/bin/pyenv'.
#> Using Python: /github/home/.pyenv/versions/3.10.20/bin/python3.10
#> Creating virtual environment '/github/home/.cache/R/basilisk/1.24.0/MACSr/1.20.0/env_macs' ...
#> + /github/home/.pyenv/versions/3.10.20/bin/python3.10 -m venv /github/home/.cache/R/basilisk/1.24.0/MACSr/1.20.0/env_macs
#> Done!
#> Installing packages: pip, wheel, setuptools
#> + /github/home/.cache/R/basilisk/1.24.0/MACSr/1.20.0/env_macs/bin/python -m pip install --upgrade pip wheel setuptools
#> Installing packages: 'macs3==3.0.2'
#> + /github/home/.cache/R/basilisk/1.24.0/MACSr/1.20.0/env_macs/bin/python -m pip install --upgrade --no-user 'macs3==3.0.2'
#> Virtual environment '/github/home/.cache/R/basilisk/1.24.0/MACSr/1.20.0/env_macs' successfully created.
#> INFO @ 07 Jun 2026 10:52:33: [583 MB]
#> # Command line:
#> # ARGUMENTS LIST:
#> # name = run_callpeak_narrow0
#> # format = AUTO
#> # ChIP-seq file = ['/github/home/.cache/R/ExperimentHub/122f6713acdc_4601']
#> # control file = ['/github/home/.cache/R/ExperimentHub/122f187bca31_4606']
#> # effective genome size = 5.20e+07
#> # band width = 300
#> # model fold = [5.0, 50.0]
#> # qvalue cutoff = 5.00e-02
#> # The maximum gap between significant sites is assigned as the read length/tag size.
#> # The minimum length of peaks is assigned as the predicted fragment length "d".
#> # Larger dataset will be scaled towards smaller dataset.
#> # Range for calculating regional lambda is: 1000 bps and 10000 bps
#> # Broad region calling is off
#> # Additional cutoff on fold-enrichment is: 0.10
#> # Paired-End mode is off
#>
#> INFO @ 07 Jun 2026 10:52:33: [583 MB] #1 read tag files...
#> INFO @ 07 Jun 2026 10:52:33: [583 MB] #1 read treatment tags...
#> INFO @ 07 Jun 2026 10:52:33: [588 MB] Detected format is: BED
#> INFO @ 07 Jun 2026 10:52:33: [588 MB] * Input file is gzipped.
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1.2 read input tags...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] Detected format is: BED
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] * Input file is gzipped.
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 tag size is determined as 101 bps
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 tag size = 101.0
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 total tags in treatment: 49622
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 user defined the maximum tags...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s)
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 tags after filtering in treatment: 48047
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 Redundant rate of treatment: 0.03
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 total tags in control: 50837
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 user defined the maximum tags...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s)
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 tags after filtering in control: 50783
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 Redundant rate of control: 0.00
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 finished!
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 Build Peak Model...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 looking for paired plus/minus strand peaks...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 Total number of paired peaks: 469
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 Model building with cross-correlation: Done
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 finished!
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 predicted fragment length is 228 bps
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 alternative fragment length(s) may be 228 bps
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2.2 Generate R script for model : /tmp/RtmpRQkG1j/run_callpeak_narrow0_model.r
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #3 Call peaks...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #3 Pre-compute pvalue-qvalue table...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #3 Cutoff vs peaks called will be analyzed!
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Analysis of cutoff vs num of peaks or total length has been saved in b'/tmp/RtmpRQkG1j/run_callpeak_narrow0_cutoff_analysis.txt'
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 In the peak calling step, the following will be performed simultaneously:
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Write bedGraph files for treatment pileup (after scaling if necessary)... run_callpeak_narrow0_treat_pileup.bdg
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Write bedGraph files for control lambda (after scaling if necessary)... run_callpeak_narrow0_control_lambda.bdg
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Pileup will be based on sequencing depth in treatment.
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Call peaks for each chromosome...
#> INFO @ 07 Jun 2026 10:52:34: [615 MB] #4 Write output xls file... /tmp/RtmpRQkG1j/run_callpeak_narrow0_peaks.xls
#> INFO @ 07 Jun 2026 10:52:34: [615 MB] #4 Write peak in narrowPeak format file... /tmp/RtmpRQkG1j/run_callpeak_narrow0_peaks.narrowPeak
#> INFO @ 07 Jun 2026 10:52:34: [615 MB] #4 Write summits bed file... /tmp/RtmpRQkG1j/run_callpeak_narrow0_summits.bed
#> INFO @ 07 Jun 2026 10:52:34: [615 MB] Done!
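The filtering lines in the log above report a "Redundant rate" for each sample. As a small worked check, the sketch below recomputes those rates from the tag counts printed in the log, assuming the rate is the fraction of tags removed by duplicate filtering. The helper `redundant_rate` is a hypothetical name used only for illustration and is not part of MACSr.

```r
# Tag counts copied from the callpeak log above.
total_treatment    <- 49622
filtered_treatment <- 48047
total_control      <- 50837
filtered_control   <- 50783

# Assumed definition: share of tags dropped by duplicate filtering.
redundant_rate <- function(total, kept) (total - kept) / total

rate_treatment <- redundant_rate(total_treatment, filtered_treatment)
rate_control   <- redundant_rate(total_control, filtered_control)

# Format to two decimals, the precision used in the log.
sprintf("%.2f", rate_treatment)
sprintf("%.2f", rate_control)
```

Under this assumption the two values round to the 0.03 and 0.00 shown in the log.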
cp2 <- callpeak(CHIP, CTRL, gsize = 5.2e7, store_bdg = TRUE,
name = "run_callpeak_broad", outdir = tempdir(),
broad = TRUE)
Here are the outputs.
cp1
#> macsList class
#> $outputs:
#> /tmp/RtmpRQkG1j/run_callpeak_narrow0_control_lambda.bdg
#> /tmp/RtmpRQkG1j/run_callpeak_narrow0_cutoff_analysis.txt
#> /tmp/RtmpRQkG1j/run_callpeak_narrow0_model.r
#> /tmp/RtmpRQkG1j/run_callpeak_narrow0_peaks.narrowPeak
#> /tmp/RtmpRQkG1j/run_callpeak_narrow0_peaks.xls
#> /tmp/RtmpRQkG1j/run_callpeak_narrow0_summits.bed
#> /tmp/RtmpRQkG1j/run_callpeak_narrow0_treat_pileup.bdg
#> $arguments: tfile, cfile, gsize, outdir, name, store_bdg, cutoff_analysis
#> $log:
#> INFO @ 07 Jun 2026 10:52:33: [583 MB]
#> # Command line:
#> # ARGUMENTS LIST:
#> # name = run_callpeak_narrow0
#> # format = AUTO
#> ...
cp2
#> macsList class
#> $outputs:
#> /tmp/RtmpRQkG1j/run_callpeak_broad_control_lambda.bdg
#> /tmp/RtmpRQkG1j/run_callpeak_broad_model.r
#> /tmp/RtmpRQkG1j/run_callpeak_broad_peaks.broadPeak
#> /tmp/RtmpRQkG1j/run_callpeak_broad_peaks.gappedPeak
#> /tmp/RtmpRQkG1j/run_callpeak_broad_peaks.xls
#> /tmp/RtmpRQkG1j/run_callpeak_broad_treat_pileup.bdg
#> $arguments: tfile, cfile, gsize, outdir, name, store_bdg, broad
#> $log:
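The narrow-peak run listed above produces a `_peaks.narrowPeak` file. The following is a minimal sketch of reading such a file with base R, assuming the standard ten-column ENCODE narrowPeak layout. The two records are made-up placeholder values so that the block runs on its own, and `read_narrowpeak` is a hypothetical helper, not a MACSr function.

```r
# Column names follow the ENCODE narrowPeak (BED6+4) layout.
narrowpeak_cols <- c("chrom", "start", "end", "name", "score", "strand",
                     "signalValue", "pValue", "qValue", "peak")

# Two made-up records so the sketch runs without MACSr; not real results.
demo_file <- tempfile(fileext = ".narrowPeak")
writeLines(c(
  "chr1\t100\t400\tdemo_peak_1\t120\t.\t6.5\t15.2\t12.0\t150",
  "chr1\t900\t1150\tdemo_peak_2\t45\t.\t3.1\t6.8\t4.5\t80"
), demo_file)

read_narrowpeak <- function(path) {
  peaks <- read.table(path, sep = "\t", header = FALSE,
                      col.names = narrowpeak_cols,
                      stringsAsFactors = FALSE)
  # 'peak' is the 0-based offset of the summit from 'start'.
  peaks$summit <- peaks$start + peaks$peak
  peaks$width  <- peaks$end - peaks$start
  peaks
}

peaks <- read_narrowpeak(demo_file)
# Strongest peaks first (in MACS output, qValue holds -log10 of the q-value).
peaks[order(-peaks$qValue), c("chrom", "start", "end", "name", "qValue", "summit")]
```

With a real run, the path could be taken from the output listing instead of `demo_file`, for example with `grep("narrowPeak$", cp1$outputs, value = TRUE)`.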
The macsList class is designed to contain everything about an
execution, including the function, inputs, outputs and logs, for the purpose
of reproducibility.
For example, we can check the function and input arguments.
cp1$arguments
#> [[1]]
#> callpeak
#>
#> $tfile
#> CHIP
#>
#> $cfile
#> CTRL
#>
#> $gsize
#> [1] 5.2e+07
#>
#> $outdir
#> tempdir()
#>
#> $name
#> [1] "run_callpeak_narrow0"
#>
#> $store_bdg
#> [1] TRUE
#>
#> $cutoff_analysis
#> [1] TRUE
The paths of all output files are collected.
cp1$outputs
#> [1] "/tmp/RtmpRQkG1j/run_callpeak_narrow0_control_lambda.bdg"
#> [2] "/tmp/RtmpRQkG1j/run_callpeak_narrow0_cutoff_analysis.txt"
#> [3] "/tmp/RtmpRQkG1j/run_callpeak_narrow0_model.r"
#> [4] "/tmp/RtmpRQkG1j/run_callpeak_narrow0_peaks.narrowPeak"
#> [5] "/tmp/RtmpRQkG1j/run_callpeak_narrow0_peaks.xls"
#> [6] "/tmp/RtmpRQkG1j/run_callpeak_narrow0_summits.bed"
#> [7] "/tmp/RtmpRQkG1j/run_callpeak_narrow0_treat_pileup.bdg"
The log is especially important to check for MACS. It records detailed
information about each step of the run.
cat(paste(cp1$log, collapse="\n"))
#> INFO @ 07 Jun 2026 10:52:33: [583 MB]
#> # Command line:
#> # ARGUMENTS LIST:
#> # name = run_callpeak_narrow0
#> # format = AUTO
#> # ChIP-seq file = ['/github/home/.cache/R/ExperimentHub/122f6713acdc_4601']
#> # control file = ['/github/home/.cache/R/ExperimentHub/122f187bca31_4606']
#> # effective genome size = 5.20e+07
#> # band width = 300
#> # model fold = [5.0, 50.0]
#> # qvalue cutoff = 5.00e-02
#> # The maximum gap between significant sites is assigned as the read length/tag size.
#> # The minimum length of peaks is assigned as the predicted fragment length "d".
#> # Larger dataset will be scaled towards smaller dataset.
#> # Range for calculating regional lambda is: 1000 bps and 10000 bps
#> # Broad region calling is off
#> # Additional cutoff on fold-enrichment is: 0.10
#> # Paired-End mode is off
#>
#> INFO @ 07 Jun 2026 10:52:33: [583 MB] #1 read tag files...
#> INFO @ 07 Jun 2026 10:52:33: [583 MB] #1 read treatment tags...
#> INFO @ 07 Jun 2026 10:52:33: [588 MB] Detected format is: BED
#> INFO @ 07 Jun 2026 10:52:33: [588 MB] * Input file is gzipped.
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1.2 read input tags...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] Detected format is: BED
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] * Input file is gzipped.
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 tag size is determined as 101 bps
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 tag size = 101.0
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 total tags in treatment: 49622
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 user defined the maximum tags...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s)
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 tags after filtering in treatment: 48047
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 Redundant rate of treatment: 0.03
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 total tags in control: 50837
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 user defined the maximum tags...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 filter out redundant tags at the same location and the same strand by allowing at most 1 tag(s)
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 tags after filtering in control: 50783
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 Redundant rate of control: 0.00
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 finished!
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 Build Peak Model...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 looking for paired plus/minus strand peaks...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 Total number of paired peaks: 469
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 Model building with cross-correlation: Done
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 finished!
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 predicted fragment length is 228 bps
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 alternative fragment length(s) may be 228 bps
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #2.2 Generate R script for model : /tmp/RtmpRQkG1j/run_callpeak_narrow0_model.r
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #3 Call peaks...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #3 Pre-compute pvalue-qvalue table...
#> INFO @ 07 Jun 2026 10:52:33: [593 MB] #3 Cutoff vs peaks called will be analyzed!
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Analysis of cutoff vs num of peaks or total length has been saved in b'/tmp/RtmpRQkG1j/run_callpeak_narrow0_cutoff_analysis.txt'
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 In the peak calling step, the following will be performed simultaneously:
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Write bedGraph files for treatment pileup (after scaling if necessary)... run_callpeak_narrow0_treat_pileup.bdg
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Write bedGraph files for control lambda (after scaling if necessary)... run_callpeak_narrow0_control_lambda.bdg
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Pileup will be based on sequencing depth in treatment.
#> INFO @ 07 Jun 2026 10:52:34: [614 MB] #3 Call peaks for each chromosome...
#> INFO @ 07 Jun 2026 10:52:34: [615 MB] #4 Write output xls file... /tmp/RtmpRQkG1j/run_callpeak_narrow0_peaks.xls
#> INFO @ 07 Jun 2026 10:52:34: [615 MB] #4 Write peak in narrowPeak format file... /tmp/RtmpRQkG1j/run_callpeak_narrow0_peaks.narrowPeak
#> INFO @ 07 Jun 2026 10:52:34: [615 MB] #4 Write summits bed file... /tmp/RtmpRQkG1j/run_callpeak_narrow0_summits.bed
#> INFO @ 07 Jun 2026 10:52:34: [615 MB] Done!
More details about MACS3 can be found at https://macs3-project.github.io/MACS/.
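Because the log is stored as a character vector of lines, key values can be pulled out of it with ordinary regular expressions. The sketch below shows one possible way to do this in base R. It uses a few lines copied from the log above in place of `cp1$log`, and `extract_int` is a hypothetical helper written for this illustration.

```r
# A few lines copied from the log above; with a real run, use cp1$log instead.
log_lines <- c(
  "INFO @ 07 Jun 2026 10:52:33: [593 MB] #1 tag size is determined as 101 bps",
  "INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 Total number of paired peaks: 469",
  "INFO @ 07 Jun 2026 10:52:33: [593 MB] #2 predicted fragment length is 228 bps",
  "INFO @ 07 Jun 2026 10:52:34: [615 MB] Done!"
)

# Return the integer captured by `pattern` in the first matching line,
# or NA if no line matches. `pattern` must contain one capture group.
extract_int <- function(lines, pattern) {
  hits <- regmatches(lines, regexec(pattern, lines))
  hits <- Filter(function(m) length(m) == 2, hits)
  if (length(hits) == 0) return(NA_integer_)
  as.integer(hits[[1]][2])
}

fragment_length <- extract_int(log_lines, "predicted fragment length is ([0-9]+) bps")
paired_peaks    <- extract_int(log_lines, "Total number of paired peaks: ([0-9]+)")

# A run that completed normally ends with a "Done!" line.
finished <- any(grepl("Done!$", log_lines))
```

Checks like these can be handy when many samples are processed in a loop and each log needs to be screened for a plausible fragment length.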
sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MACSdata_1.20.0 MACSr_1.20.0 BiocStyle_2.40.0
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.52.0 dir.expiry_1.20.0 xfun_0.58
#> [4] bslib_0.11.0 httr2_1.2.2 Biobase_2.72.0
#> [7] lattice_0.22-9 vctrs_0.7.3 tools_4.6.0
#> [10] generics_0.1.4 stats4_4.6.0 curl_7.1.0
#> [13] parallel_4.6.0 tibble_3.3.1 AnnotationDbi_1.74.0
#> [16] RSQLite_3.53.1 blob_1.3.0 pkgconfig_2.0.3
#> [19] Matrix_1.7-5 dbplyr_2.5.2 S4Vectors_0.50.1
#> [22] lifecycle_1.0.5 compiler_4.6.0 Biostrings_2.80.1
#> [25] Seqinfo_1.2.0 htmltools_0.5.9 sys_3.4.3
#> [28] buildtools_1.0.0 sass_0.4.10 yaml_2.3.12
#> [31] pillar_1.11.1 crayon_1.5.3 jquerylib_0.1.4
#> [34] cachem_1.1.0 ExperimentHub_3.2.0 AnnotationHub_4.2.0
#> [37] basilisk_1.24.0 tidyselect_1.2.1 digest_0.6.39
#> [40] dplyr_1.2.1 purrr_1.2.2 BiocVersion_3.23.1
#> [43] maketools_1.3.2 fastmap_1.2.0 grid_4.6.0
#> [46] cli_3.6.6 magrittr_2.0.5 withr_3.0.2
#> [49] filelock_1.0.3 rappdirs_0.3.4 bit64_4.8.2
#> [52] rmarkdown_2.31 XVector_0.52.0 httr_1.4.8
#> [55] bit_4.6.0 otel_0.2.0 reticulate_1.46.0
#> [58] png_0.1-9 memoise_2.0.1 evaluate_1.0.5
#> [61] knitr_1.51 IRanges_2.46.0 BiocFileCache_3.2.0
#> [64] rlang_1.2.0 Rcpp_1.1.1-1.1 glue_1.8.1
#> [67] DBI_1.3.0 BiocManager_1.30.27 BiocGenerics_0.58.1
#> [70] jsonlite_2.0.0 R6_2.6.1